Google released Multi-Token Prediction drafters for Gemma 4, which can make the open models run up to 3x faster on some devices without lowering output quality.
In short: Google released experimental add-ons for its Gemma 4 open models that can make them generate text up to three times faster on some hardware.
Google’s Gemma 4 models are designed to run locally, meaning on your own computer or phone instead of on a company’s servers. This can help with privacy because your prompts and files do not have to leave your device.
This week, Google released what it calls Multi-Token Prediction, or MTP, “drafters” for Gemma 4. These are smaller helper models that try to guess several upcoming words at once. Normally, AI systems like this write text one small piece at a time, called a token (think of a token as a short chunk of text, such as part of a word or a whole word).
MTP uses a method called speculative decoding. A simple way to picture it is a fast assistant writing a rough draft, while the main model acts like an editor. The main model quickly checks the draft, accepts the parts it agrees with, and then continues. Google says this can reduce waiting time because the device spends less time doing slow, repetitive steps.
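To make the draft-and-verify idea concrete, here is a minimal Python sketch. Everything in it is a toy assumption: `main_model` and `draft_model` are simple placeholder functions standing in for real neural networks, and the verification step checks draft tokens one at a time purely for readability, whereas a real system checks the whole draft in a single parallel pass, which is where the time saving comes from. It illustrates speculative decoding in general, not Google's actual MTP implementation.

```python
# Toy sketch: one-token-at-a-time generation versus draft-and-verify decoding.
# `main_model` and `draft_model` are placeholder functions, not real neural networks.

CANNED = ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]

def main_model(context):
    """Stand-in for the big, slow model: returns the one next token it wants."""
    return CANNED[len(context) % len(CANNED)]

def generate_plain(prompt, n=9):
    """Normal decoding: one slow call to the main model per token."""
    context = list(prompt)
    for _ in range(n):
        context.append(main_model(context))
    return context

def draft_model(context, k=4):
    """Stand-in for the cheap drafter: guesses k tokens ahead, sometimes badly."""
    guesses = []
    for i in range(k):
        tok = main_model(context + guesses)
        if (len(context) + i) % 5 == 0:
            tok = "umm"                      # a deliberately wrong guess now and then
        guesses.append(tok)
    return guesses

def generate_speculative(prompt, n=9, k=4):
    """Draft-and-verify decoding: accept the drafter's tokens until they disagree."""
    context = list(prompt)
    produced = 0
    while produced < n:
        draft = draft_model(context, k)      # fast rough draft of k tokens
        accepted = []
        for tok in draft:                    # the "editor" checks the draft
            expected = main_model(context + accepted)
            if tok == expected:
                accepted.append(tok)         # agreement: keep the drafted token
            else:
                accepted.append(expected)    # disagreement: keep the editor's word, stop
                break
        context.extend(accepted)
        produced += len(accepted)
    return context[:len(prompt) + n]

print(" ".join(generate_plain(["start:"])))        # 9 slow steps
print(" ".join(generate_speculative(["start:"])))  # same text, fewer verification rounds
```

In this toy, both loops produce exactly the same text; the speedup in a real system comes from the main model verifying a whole batch of drafted tokens per pass instead of producing a single token per pass.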
Google reports speedups of about 2.8x and 3.1x on Pixel phones for smaller Gemma models, and about 2.5x on Apple’s M4 chip for a larger Gemma model. Google also says there is “zero quality degradation” because the main Gemma model still verifies the draft tokens.
For regular people, faster on-device AI can mean more responsive apps and, on phones, potentially better battery life. “Up to 3x faster” is a best case, though, and results depend on your device. This also does not make the AI more accurate. It mainly makes it quicker at producing the same kind of answers.
Source: Ars Technica