Smartphones Gaining Native AI: What Changes When Artificial Intelligence Runs Directly on the Chip

Artificial intelligence in smartphones has undergone a fundamental transition: from cloud processing to local execution on the device. This shift, enabled by dedicated neural processing units (NPUs) in modern chips, has profound implications for privacy, latency, offline availability, and entirely new categories of features.
Cloud AI processing, the predominant model until 2023, works by sending device data to remote servers that run large models and return results. This model offers access to models with tens or hundreds of billions of parameters, but has critical disadvantages: connectivity dependency, variable network latency, growing infrastructure costs, and exposure of personal data to third-party servers.
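The latency trade-off above can be made concrete with a back-of-the-envelope budget. The sketch below uses purely illustrative round numbers (the round-trip time, queueing delay, and inference times are assumptions, not measurements) to show why network components dominate the cloud path:

```python
# Illustrative latency budget: cloud round trip vs. on-device inference.
# All millisecond values are hypothetical round figures for illustration.

def cloud_latency_ms(network_rtt_ms: float, queueing_ms: float,
                     server_inference_ms: float) -> float:
    """Total user-perceived wait for one cloud inference round trip."""
    return network_rtt_ms + queueing_ms + server_inference_ms

def local_latency_ms(device_inference_ms: float) -> float:
    """On-device inference has no network or queueing component."""
    return device_inference_ms

# Example: 80 ms mobile RTT, 50 ms server queueing, 120 ms inference.
cloud = cloud_latency_ms(80, 50, 120)   # 250 ms total
# Even if the phone's NPU is slower than the server GPU per token,
# dropping the network terms can still win overall.
local = local_latency_ms(180)           # 180 ms total
print(cloud, local)
```

The network terms also vary with signal conditions, which is why cloud latency is described as variable while local latency is comparatively stable.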
On-device AI runs models directly on smartphone hardware, with no network traffic. The Apple A18 Pro has a Neural Engine capable of 35 TOPS (trillions of operations per second), while the Snapdragon 8 Elite reaches 75 TOPS and the Google Tensor G4 sits at around 45 TOPS. This throughput makes it possible to run language models of 1 to 7 billion parameters with per-token latency acceptable for interactive use.
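Compute is only half the constraint; the model's weights must also fit in phone memory. A quick calculation (a sketch, assuming weight storage dominates and ignoring activations and KV cache) shows why the practical range sits at 1 to 7 billion parameters:

```python
# Approximate weight memory for a language model at a given bit width.
# Ignores activations, KV cache, and runtime overhead, so real usage
# is somewhat higher than these figures.

def model_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Weight memory in decimal gigabytes."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

print(model_memory_gb(3, 4))   # 1.5  -> a 3B model at 4-bit fits easily
print(model_memory_gb(7, 4))   # 3.5  -> a 7B model at 4-bit is borderline
print(model_memory_gb(7, 16))  # 14.0 -> the same 7B model at 16-bit
                               #         exceeds typical phone RAM budgets
```

This is why on-device models ship quantized: the same 7-billion-parameter model shrinks from roughly 14 GB at 16-bit to about 3.5 GB at 4-bit, at some cost in output quality.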
Apple Intelligence exemplifies the state of the art in on-device AI. The language models running locally on the iPhone 16 Pro handle: notification and email summaries; message classification and prioritization; contextual phrase completion on the keyboard; text editing with rewriting and tone adjustment; and image generation with Image Playground, all without personal data leaving the device. For tasks requiring larger models, like complex ChatGPT queries, the system uses Private Cloud Compute, Apple's server infrastructure with verifiable privacy guarantees.
Gemini Nano on Android offers similar functionality in specific contexts. Circle to Search uses computer vision to identify and search any element visible on screen. Pixel’s Summarize analyzes web page and article content offline. Magic Eraser and Photo Unblur use local diffusion models for image editing without uploading photos. The Android ecosystem’s advantage is the multiplicity of manufacturers implementing their own AI layers: Samsung’s Galaxy AI adds simultaneous call translation, meeting note generation, and contextual reply suggestions.
The privacy implications are significant. When voice recognition, personal message text analysis, and photo processing occur on the device, this data is never sent to servers where it could be intercepted, leaked in security breaches, or used for model training. This represents a structural change from the previous generation of voice assistants, which sent all audio to the cloud for processing.
The current limit of on-device AI lies in the size of models the hardware can execute efficiently. Models above 7 billion parameters require aggressive quantization that degrades quality, or specialized hardware not yet present in consumer smartphones. Progress in model distillation, which trains smaller models to replicate the behavior of larger ones, is steadily closing this gap.
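The quantization mentioned above can be sketched in a few lines. This is a pure-Python toy of symmetric 8-bit quantization with a single shared scale; production runtimes use per-channel scales, packed tensors, and calibration, but the core idea (and the rounding error that causes quality loss) is the same:

```python
# Minimal sketch of symmetric int8 weight quantization.
# Toy version with one shared scale; real runtimes quantize per channel.

def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map float weights onto the int8 range [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate floats; the residual rounding error is the
    quality loss that grows as bit width shrinks."""
    return [v * scale for v in q]

w = [0.42, -1.27, 0.05, 0.91]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q, max_err)  # error is bounded by about half the scale
```

Storing each weight in 8 bits instead of 16 halves memory; 4-bit schemes halve it again but introduce larger rounding error, which is exactly the quality degradation the paragraph above describes for aggressively quantized large models.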
Over the next two to three years, capabilities currently exclusive to large cloud models, such as complex multi-step reasoning and advanced code generation, are expected to be viable directly on the device with acceptable quality for everyday use.
