Inference

LLM Inference

Definition

The process of running a trained LLM to generate a response, as opposed to training it. When your agent "thinks," it is performing inference. Inference speed, cost, and reliability are key metrics for agent builders choosing an LLM provider.

Examples in the Wild

  • Example 1: Local inference via Ollama keeps data on your machine
  • Example 2: Cloud inference via the Anthropic API is fast but costs per token
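Because cloud providers typically bill per token, a quick cost estimate helps when comparing local and cloud inference. A minimal sketch, assuming hypothetical prices quoted per million tokens (the function name and figures below are illustrative, not any provider's actual rates):

```python
def inference_cost(input_tokens: int, output_tokens: int,
                   price_in_per_mtok: float, price_out_per_mtok: float) -> float:
    """Estimate the dollar cost of one cloud inference request.

    Prices are quoted per million tokens, as most providers do;
    input and output tokens are usually billed at different rates.
    """
    return (input_tokens * price_in_per_mtok +
            output_tokens * price_out_per_mtok) / 1_000_000

# Hypothetical prices: $3 per 1M input tokens, $15 per 1M output tokens.
cost = inference_cost(input_tokens=2_000, output_tokens=500,
                      price_in_per_mtok=3.0, price_out_per_mtok=15.0)
print(f"${cost:.4f}")  # → $0.0135
```

Local inference via Ollama has no per-token fee, so the trade-off is this marginal cost against hardware, latency, and model-quality differences.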