Inference

LLM Inference

Definition

The process of running a trained LLM to generate a response, as opposed to training it. When your agent "thinks," it is performing inference. Inference speed, cost, and reliability are key metrics for agent builders choosing an LLM provider.

Examples in the Wild

  • Example 1: Local inference via Ollama keeps data on your machine
  • Example 2: Cloud inference via the Anthropic API is fast but costs per token
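Because cloud providers typically bill per token, a quick cost estimate helps when comparing local and cloud inference. A minimal sketch, assuming hypothetical prices quoted per million tokens (the function name and figures below are illustrative, not any provider's actual rates):

```python
def inference_cost(input_tokens: int, output_tokens: int,
                   price_in_per_mtok: float, price_out_per_mtok: float) -> float:
    """Estimate the dollar cost of one cloud inference request.

    Prices are quoted per million tokens, as most providers do;
    input and output tokens are usually billed at different rates.
    """
    return (input_tokens * price_in_per_mtok +
            output_tokens * price_out_per_mtok) / 1_000_000

# Hypothetical prices: $3 per 1M input tokens, $15 per 1M output tokens.
cost = inference_cost(input_tokens=2_000, output_tokens=500,
                      price_in_per_mtok=3.0, price_out_per_mtok=15.0)
print(f"${cost:.4f}")  # → $0.0135
```

Local inference via Ollama has no per-token fee, so the trade-off is this marginal cost against hardware, latency, and model-quality differences.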