Google DeepMind has released DiffusionGemma, an experimental open AI model that generates text in blocks and can run faster on local GPUs.
In short: Google DeepMind released DiffusionGemma, an experimental open AI model that can generate text much faster by creating many words at once.
Google DeepMind has released DiffusionGemma, a new model in its Gemma 4 family of open models. “Open” here means people can download the model files and run it on their own computers, under the Apache 2.0 license.
Most text chatbots write like a person typing, one small piece at a time from left to right. DiffusionGemma uses a different approach borrowed from image tools that start with visual “noise” and slowly clean it up. For text, you can think of it like filling in a whole crossword grid at once, then repeatedly correcting letters until the final grid makes sense.
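To make the crossword analogy concrete, here is a toy Python sketch of filling in a block of text over a few passes. It is an illustration only, not DiffusionGemma's actual algorithm: the names `toy_denoise_step` and `generate_block` are hypothetical, the block starts as blank placeholders rather than random noise, and the "model" simply looks up a known answer where a real system would predict tokens with a neural network.

```python
import random

MASK = "_"  # placeholder for a position that has not been filled in yet


def toy_denoise_step(block, target, rng, per_step):
    """Fill up to `per_step` masked positions anywhere in the block in one pass.

    Assumption: a real model would predict these tokens; this toy version
    copies them from `target` so the control flow stays easy to follow.
    """
    masked = [i for i, tok in enumerate(block) if tok == MASK]
    chosen = rng.sample(masked, min(per_step, len(masked)))
    new_block = list(block)
    for i in chosen:
        new_block[i] = target[i]
    return new_block


def generate_block(target, per_step=4, seed=0):
    """Start from a fully masked block and refine it until nothing is masked."""
    rng = random.Random(seed)
    block = [MASK] * len(target)
    steps = 0
    while MASK in block:
        block = toy_denoise_step(block, target, rng, per_step)
        steps += 1
    return block, steps


target = "the quick brown fox jumps over the lazy dog today".split()
block, steps = generate_block(target)
print(" ".join(block))
print(f"{steps} refinement passes for {len(target)} tokens")
```

With ten tokens and four positions filled per pass, the block is complete after three passes, whereas writing strictly left to right would need ten steps, one per token.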
Google says this method lets DiffusionGemma generate up to 256 tokens (tokens are small chunks of text, like short words or parts of words) in parallel. In tests, it produced about 700 tokens per second on an Nvidia RTX 5090 graphics card, and over 1,000 tokens per second on a single Nvidia H100 chip. Google said that is about four times faster than similarly sized Gemma models that generate text one token at a time.
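The rough arithmetic behind those figures can be sketched in a few lines of Python. The function name `seconds_for` is made up for this example, and it assumes the "about four times faster" claim applies to the RTX 5090 figure, which the source does not spell out.

```python
def seconds_for(num_tokens, tokens_per_second):
    """Time needed to produce `num_tokens` at a steady generation rate."""
    return num_tokens / tokens_per_second


rtx_5090_tps = 700   # reported: about 700 tokens per second
speedup = 4          # reported: about four times faster
block_size = 256     # reported: up to 256 tokens generated in parallel

# Assumption: the implied one-token-at-a-time rate on the same card.
implied_baseline_tps = rtx_5090_tps / speedup

fast = seconds_for(block_size, rtx_5090_tps)
slow = seconds_for(block_size, implied_baseline_tps)

print(f"Implied baseline: {implied_baseline_tps:.0f} tokens per second")
print(f"256 tokens at 700 tokens per second: {fast:.2f} s")
print(f"256 tokens at the implied baseline: {slow:.2f} s")
```

Under these assumptions, a full 256-token block takes roughly a third of a second instead of about a second and a half.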
DiffusionGemma is a “Mixture of Experts” model, meaning it has many specialized parts but only uses some of them for each request (like consulting a few specialists instead of calling the whole company). It has 26 billion parameters in total, but only 3.8 billion are active when it runs. Google says that helps it fit on a high-end consumer GPU with about 18 GB of memory.
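The "few specialists" idea can be sketched as a toy router in Python. Everything here is hypothetical: the source does not say how many experts DiffusionGemma has or how it picks them, so the four tiny "experts", their scores, and the top-2 rule are stand-ins chosen only to show the pattern.

```python
def route_top_k(scores, k):
    """Return the indices of the k highest-scoring experts."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def moe_forward(x, experts, scores, k=2):
    """Run only the chosen experts and blend their outputs by relative score."""
    chosen = route_top_k(scores, k)
    total = sum(scores[i] for i in chosen)
    output = sum(scores[i] / total * experts[i](x) for i in chosen)
    return output, chosen


# Hypothetical experts: simple functions standing in for large sub-networks.
experts = [
    lambda x: x + 1,
    lambda x: x * 2,
    lambda x: x - 3,
    lambda x: x * x,
]
scores = [0.1, 0.6, 0.1, 0.2]  # made-up router scores for a single input

output, chosen = moe_forward(3.0, experts, scores, k=2)
print("experts used:", chosen)
print("blended output:", output)

# Share of parameters active per request, from the reported figures.
active_fraction = 3.8 / 26
print(f"active share of parameters: {active_fraction:.1%}")
```

Only two of the four experts run for this input, which mirrors how the reported 3.8 billion active parameters are roughly 15 percent of the 26 billion total.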
Faster local AI can make tools feel more responsive and cheaper to run, especially for tasks like editing text where you want quick changes. Google also notes drawbacks, including a higher error rate for this style of text generation and wasted work when you only need a short answer.
Source: Ars Technica