Cactus: Mobile-First Inference Engine
Definition
Cactus is an inference engine built from scratch for mobile devices, wearables, and custom hardware. It is designed to run models such as Needle efficiently on consumer devices with limited compute, memory, and power, which enables deployment of agentic models on phones, watches, and glasses.
Examples in the Wild
- Example 1: Running Needle (26M parameters) at 6000 tok/s prefill on consumer phones
- Example 2: Deploying tool-calling agents on smartwatches
- Example 3: Custom hardware optimization for edge inference
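To put the figures in Example 1 in perspective, the following back-of-envelope sketch converts the quoted prefill throughput into prompt-processing latency and the quoted parameter count into approximate weight storage. It does not use any Cactus API. The prompt lengths and weight precisions are illustrative assumptions that do not come from the source, and the helper names `prefill_latency_ms` and `weight_footprint_mb` are hypothetical.

```python
# Figures quoted in Example 1.
NEEDLE_PARAMS = 26_000_000   # 26M parameters
PREFILL_TOK_PER_S = 6000     # prefill throughput on consumer phones


def prefill_latency_ms(prompt_tokens: int, tok_per_s: float = PREFILL_TOK_PER_S) -> float:
    """Time to process a prompt at a fixed prefill throughput, in milliseconds."""
    return prompt_tokens / tok_per_s * 1000.0


def weight_footprint_mb(params: int, bits_per_weight: int) -> float:
    """Approximate weight storage in MB (10**6 bytes).

    Counts weights only; activations and any KV cache are ignored.
    """
    return params * bits_per_weight / 8 / 1_000_000


if __name__ == "__main__":
    # Assumed prompt lengths, chosen only for illustration.
    for tokens in (128, 512, 2048):
        print(f"{tokens:>5} prompt tokens: {prefill_latency_ms(tokens):.1f} ms prefill")

    # Assumed weight precisions; the source does not state which one is used.
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: {weight_footprint_mb(NEEDLE_PARAMS, bits):.1f} MB")
```

These are rough estimates: real latency also depends on decode speed, and real memory use includes activations and runtime overhead.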