Cross-Attention

Cross-Attention Mechanism

Definition

An attention mechanism that lets a model attend to external context (such as tool definitions, RAG documents, or structured knowledge) when generating output. In cross-attention, queries come from one sequence while keys and values come from another, unlike self-attention, where all three come from the same sequence. Needle's research identifies cross-attention as the right primitive for tool calling: it enables efficient query-to-tool matching and argument extraction without requiring large FFN parameters.
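The mechanism can be sketched in a few lines of NumPy. This is a minimal illustration of scaled dot-product cross-attention, not Needle's implementation: the shapes, weights, and the idea of treating context rows as "tool" tokens are all assumptions for demonstration.

```python
import numpy as np

def cross_attention(queries, context, w_q, w_k, w_v):
    """Scaled dot-product cross-attention: query tokens attend to an
    external context sequence (e.g. tool definitions) rather than to
    themselves, as they would in self-attention."""
    q = queries @ w_q                             # (n_q, d_k) project queries
    k = context @ w_k                             # (n_c, d_k) project context
    v = context @ w_v                             # (n_c, d_v)
    scores = q @ k.T / np.sqrt(k.shape[-1])       # (n_q, n_c) similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights                   # output and attention map

# Toy sizes: 2 query tokens, 3 context ("tool") tokens; all illustrative.
rng = np.random.default_rng(0)
d_model, d_k, d_v = 8, 4, 4
queries = rng.normal(size=(2, d_model))
context = rng.normal(size=(3, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d)) for d in (d_k, d_k, d_v))
out, attn = cross_attention(queries, context, w_q, w_k, w_v)
```

Each row of `attn` is a distribution over the context tokens, which is what makes the attention map directly readable as "which tool did the query match."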

Examples in the Wild

  • Example 1: Attending to tool schema definitions when deciding which tool to call
  • Example 2: Matching a user query to the tools available in context
  • Example 3: Extracting argument values from user input based on tool parameter definitions