Agentic AI Glossary
28 core terms every engineer building production AI agents should know — defined in plain English, the way they're used in the 2026 Agentic AI Engineer roadmap.
Agent
An LLM-powered system that can decide which actions to take to reach a goal — calling tools, reading results, and looping — rather than producing a single one-shot response. Contrast with a fixed "workflow" where the steps are hard-coded.
Large Language Model (LLM)
A neural network trained on large text corpora to predict the next token. It powers reasoning, generation, and tool selection in agents. Examples: GPT, Claude, Gemini, Llama.
Token
The unit of text an LLM reads and writes; in English, roughly ¾ of a word on average. Pricing and context limits are measured in tokens, and latency grows with the number of tokens processed.
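As a rough illustration of the ¾-word rule of thumb, the sketch below estimates a token count from a word count and converts it to a cost. The function names and the price per million tokens are made-up placeholders, not any provider's tokenizer or rates; a real tokenizer would give exact counts.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb only; real tokenizers vary by model and language


def estimate_tokens(text: str) -> int:
    """Estimate tokens from a whitespace word count (an approximation)."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def estimate_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """Convert a token count into dollars at a hypothetical price."""
    return tokens / 1_000_000 * usd_per_million_tokens


prompt = "Summarize the quarterly report in three bullet points"
tokens = estimate_tokens(prompt)  # 8 words, so about 11 tokens
cost = estimate_cost(tokens, usd_per_million_tokens=2.0)  # placeholder price
```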
Context Window
The maximum number of tokens a model can consider at once (prompt + response). Managing what goes into the window — "context engineering" — is central to reliable agents.
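One common way to stay inside the window is to keep the system prompt, reserve room for the response, and drop the oldest turns first. The sketch below assumes a crude word-based token estimate and hypothetical names (`count_tokens`, `fit_to_window`); it is one possible policy, not the only one.

```python
import math


def count_tokens(text: str) -> int:
    """Crude estimate: about 0.75 words per token. Swap in a real tokenizer in practice."""
    return math.ceil(len(text.split()) / 0.75)


def fit_to_window(system: str, history: list, window: int, reserve: int) -> list:
    """Keep the system prompt plus the newest messages that fit the token budget.

    window  = total tokens the model can consider (prompt + response)
    reserve = tokens held back for the model's response
    """
    budget = window - reserve - count_tokens(system)
    kept = []
    for message in reversed(history):  # walk from newest to oldest
        cost = count_tokens(message)
        if cost > budget:
            break  # this message and everything older is dropped
        kept.append(message)
        budget -= cost
    return [system] + kept[::-1]  # restore chronological order


messages = fit_to_window(
    system="be brief",
    history=["a b c", "d e f", "g h i"],
    window=20,
    reserve=8,
)
```

With these toy numbers the oldest message no longer fits, so only the two newest turns are sent along with the system prompt.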
Prompt Engineering
The practice of structuring instructions, examples, and constraints so a model produces reliable, useful output. Includes system prompts, few-shot examples, and output formatting.
Chain-of-Thought (CoT)
Prompting a model to reason step by step before answering, which improves accuracy on multi-step problems.
Reasoning Model
A model trained or tuned to "think" before answering (extended internal reasoning), trading latency and cost for higher accuracy on hard tasks. Contrast with standard chat models, which answer directly and faster.
Retrieval-Augmented Generation (RAG)
Fetching relevant documents from a knowledge base and inserting them into the prompt so the model answers from your data instead of memory — reducing hallucination and enabling up-to-date answers.
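The retrieve-then-insert flow can be sketched in a few lines. This toy version scores documents by word overlap as a stand-in for embedding similarity, and the prompt template, function names, and sample documents are illustrative assumptions; the resulting prompt would then be sent to a model.

```python
import re


def words(text: str) -> set:
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, docs: list, k: int = 2) -> list:
    """Return the k documents sharing the most words with the query."""
    q = words(query)
    ranked = sorted(docs, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list) -> str:
    """Insert retrieved passages into the prompt ahead of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


docs = [
    "Refunds are issued within 14 days.",
    "Our office is in Berlin.",
    "Shipping takes 3 days.",
]
query = "how many days do refunds take"
prompt = build_prompt(query, retrieve(query, docs, k=2))
```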
Embedding
A numeric vector that represents the meaning of a piece of text. Similar meanings produce nearby vectors, which is how semantic search and RAG retrieval work.
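"Nearby" is usually measured with cosine similarity. The sketch below uses hand-made three-dimensional vectors purely for illustration; real embeddings come from an embedding model and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two non-zero vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors invented for this example, not output from a real model.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
car = [0.1, 0.0, 0.9]

similar = cosine_similarity(cat, kitten)   # close to 1
different = cosine_similarity(cat, car)    # close to 0
```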
Vector Database
A store optimized for similarity search over embeddings (e.g. pgvector, Pinecone, Qdrant, Weaviate). The retrieval engine behind RAG.
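Conceptually, the core operation is "store vectors, return the top-k most similar to a query". The in-memory class below is a hypothetical brute-force sketch of that idea only; the products named above add indexing, persistence, and filtering, and their APIs differ from this one.

```python
import math


class TinyVectorStore:
    """Brute-force similarity search over a small list of (id, vector) pairs."""

    def __init__(self):
        self._items = []

    def add(self, item_id: str, vector: list) -> None:
        self._items.append((item_id, vector))

    def search(self, query: list, k: int = 1) -> list:
        """Return the ids of the k stored vectors most similar to the query."""
        scored = [(self._cosine(query, vec), item_id) for item_id, vec in self._items]
        scored.sort(reverse=True)  # highest similarity first
        return [item_id for _, item_id in scored[:k]]

    @staticmethod
    def _cosine(a: list, b: list) -> float:
        # Assumes non-zero vectors of equal length.
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


store = TinyVectorStore()
store.add("doc-cats", [0.9, 0.1])  # toy vectors, not real embeddings
store.add("doc-cars", [0.1, 0.9])
top = store.search([1.0, 0.0], k=1)
```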
Chunking
Splitting source documents into smaller passages before embedding, so retrieval returns focused, relevant context. Chunk size and overlap materially affect RAG quality.
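A minimal word-based chunker shows how chunk size and overlap interact. The function name and the choice to split on whitespace are assumptions for this sketch; production pipelines often split on sentences, headings, or tokens instead.

```python
def chunk_words(text: str, size: int, overlap: int) -> list:
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


chunks = chunk_words("a b c d e f g h", size=4, overlap=2)
```

Here each chunk repeats the last two words of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.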
Evaluation (Evals)
Systematic measurement of an AI system's quality (accuracy, faithfulness, relevance) using test sets and metrics, often through an evaluation framework such as Ragas. Without evals you can't tell whether a change helped.
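The simplest possible eval is exact-match accuracy over a fixed test set. The sketch below uses a stubbed system and an invented test set; exact match is a stand-in here for richer metrics such as faithfulness or relevance, which need more machinery.

```python
def exact_match_accuracy(system, test_set: list) -> float:
    """Fraction of test cases where the system's answer matches the expected one."""
    if not test_set:
        raise ValueError("test set is empty")
    correct = sum(
        1
        for question, expected in test_set
        if system(question).strip().lower() == expected.strip().lower()
    )
    return correct / len(test_set)


# Stub standing in for a real model or agent; one answer is wrong on purpose.
CANNED_ANSWERS = {
    "capital of France?": "Paris",
    "2 + 2?": "4",
    "largest planet?": "Saturn",
}


def stub_system(question: str) -> str:
    return CANNED_ANSWERS.get(question, "")


test_set = [
    ("capital of France?", "paris"),
    ("2 + 2?", "4"),
    ("largest planet?", "Jupiter"),
    ("author of Hamlet?", "Shakespeare"),
]
score = exact_match_accuracy(stub_system, test_set)
```

Re-running the same test set after each prompt or retrieval change turns "it feels better" into a number you can compare.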
Hallucination
When a model produces confident but false or unsupported output. Mitigated with RAG, grounding, guardrails, and evaluation.
Function Calling / Tool Use
Letting a model invoke external functions (search, database queries, APIs) by returning structured arguments, then reading the results — the mechanism that turns an LLM into an agent.
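The mechanics can be sketched without any model at all: the model's output is a structured call (here a JSON string), and the application looks up the function and runs it. The JSON shape, the `get_weather` tool, and the registry below are hypothetical; each provider defines its own tool-call format.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would call a weather API."""
    return f"Sunny in {city}"


TOOLS = {"get_weather": get_weather}  # registry of functions the model may call


def execute_tool_call(raw_call: str) -> str:
    """Parse the model's structured call, run the matching function, return its result."""
    call = json.loads(raw_call)
    function = TOOLS[call["name"]]
    return function(**call["arguments"])


# Stand-in for what a model might return when asked about the weather in Paris.
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
result = execute_tool_call(model_output)
```

The result string would then be passed back to the model so it can use it in its next step.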
Model Context Protocol (MCP)
An open standard for connecting models to tools and data sources through a consistent interface, so the same "MCP server" can serve many agents and clients.
ReAct
An agent pattern that interleaves Reasoning and Acting — the model thinks, takes an action (tool call), observes the result, and repeats until done.
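The think, act, observe cycle can be sketched as a loop with a step limit. The `scripted_model` below is a hard-coded stand-in for an LLM, and the step format (`thought`, `action`, `args`, `final`) is an assumption made for this example rather than a standard.

```python
def scripted_model(history: list) -> dict:
    """Stand-in for an LLM: asks for a tool first, then answers from the observation."""
    observations = [value for kind, value in history if kind == "observation"]
    if not observations:
        return {"thought": "I need the sum.", "action": "add", "args": [2, 3]}
    return {"thought": "I have the result.", "final": f"The answer is {observations[-1]}"}


def react_loop(model, tools: dict, max_steps: int = 5):
    """Alternate reasoning and acting until the model gives a final answer."""
    history = []
    for _ in range(max_steps):
        step = model(history)                       # reason
        history.append(("thought", step["thought"]))
        if "final" in step:
            return step["final"], history
        result = tools[step["action"]](*step["args"])  # act
        history.append(("observation", result))        # observe
    raise RuntimeError("step limit reached without a final answer")


tools = {"add": lambda a, b: a + b}
answer, history = react_loop(scripted_model, tools)
```

The step limit matters in practice: without it, a model that never produces a final answer would loop indefinitely.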
Memory
How an agent retains information across turns or sessions — short-term (conversation), long-term (stored facts), and semantic (embedded recall). Enables personalization and continuity.
Context Engineering
Deciding what information to put in the model's limited context window at each step — retrieval, summarization, compression, and memory — to maximize reliability and minimize cost.
Multi-Agent System
Multiple specialized agents collaborating on a task (e.g. a planner, researcher, and writer), often coordinated by a supervisor. Useful for complex workflows beyond a single agent's reliability.
Orchestration
Coordinating the steps, tools, and agents in a system — routing, branching, retries, and state — so the whole pipeline runs reliably.
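Two of the pieces named above, retries and state-based routing, fit in a short sketch. The flaky step, the state dictionary, and the route names are invented for illustration; frameworks provide more complete versions of the same ideas.

```python
def with_retries(step, attempts: int = 3):
    """Run a step, retrying on RuntimeError up to `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            return step()
        except RuntimeError as exc:
            last_error = exc
    raise last_error


calls = {"count": 0}


def flaky_search() -> str:
    """Hypothetical step that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "search results"


def run_pipeline() -> dict:
    """Carry shared state through the steps and choose the next step from it."""
    state = {}
    state["results"] = with_retries(flaky_search)
    state["route"] = "summarize" if state["results"] else "give_up"
    return state


state = run_pipeline()
```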
LangGraph
A framework for building stateful, multi-step agent workflows as graphs of nodes and edges, with explicit control over state, branching, and loops.
Guardrails
Checks on an agent's inputs, outputs, and actions — content filtering, schema validation, allow-lists, and policy enforcement — to keep behavior safe and on-spec in production.
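As a small example of an action guardrail, the sketch below validates a proposed tool call against an allow-list and a minimal shape check before anything is executed. The tool names and the call format are hypothetical.

```python
ALLOWED_TOOLS = {"search", "get_weather"}  # hypothetical allow-list


def check_tool_call(call) -> list:
    """Return a list of problems; an empty list means the call may proceed."""
    if not isinstance(call, dict):
        return ["call must be an object"]
    problems = []
    if call.get("name") not in ALLOWED_TOOLS:
        problems.append(f"tool not allowed: {call.get('name')!r}")
    if not isinstance(call.get("arguments"), dict):
        problems.append("arguments must be an object")
    return problems


ok = check_tool_call({"name": "search", "arguments": {"q": "refund policy"}})
blocked = check_tool_call({"name": "delete_database", "arguments": {}})
```

Returning a list of problems rather than raising on the first one makes it easier to log every violation for a rejected call.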
LLMOps
The operational practice of running LLM systems in production: observability, tracing, evaluation, versioning, cost control, and continuous improvement.
Observability / Tracing
Recording each step of an agent run — prompts, tool calls, latencies, costs, outputs — so you can debug failures and measure quality over time.
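A minimal tracer can be written as a decorator that records each step's name, duration, and output. The in-memory `TRACE` list and the `lookup` step are assumptions for this sketch; real setups send these records to a tracing backend.

```python
import functools
import time

TRACE = []  # in-memory trace; a real system would export these records


def traced(step_name: str):
    """Decorator that records the name, duration, and output of each call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            TRACE.append({
                "step": step_name,
                "seconds": time.perf_counter() - start,
                "output": result,
            })
            return result
        return wrapper
    return decorator


@traced("lookup")
def lookup(query: str) -> str:
    """Hypothetical tool call standing in for a real retrieval step."""
    return f"stub result for {query}"


answer = lookup("refund policy")
```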
Inference
Running a trained model to produce output. Inference cost and latency dominate the economics of production agents.
Fine-Tuning
Continuing to train a pretrained model on domain data to specialize its behavior. Often unnecessary for agents: good prompting plus RAG usually goes further, faster.
Prompt Caching
Reusing the model's processing of a stable prompt prefix across requests to cut latency and cost — valuable for agents with large, repeated system prompts.
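The saving can be illustrated with a toy simulation that counts how many words need fresh processing per request. This is only a model of the idea: real prompt caching happens on the provider's side and reuses internal model state, and the class and method names here are invented.

```python
import hashlib


class PrefixCache:
    """Toy simulation: a prefix seen before costs nothing to process again."""

    def __init__(self):
        self._seen = set()

    def fresh_words(self, prefix: str, suffix: str) -> int:
        """Return how many words must be processed from scratch for this request."""
        key = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
        suffix_words = len(suffix.split())
        if key in self._seen:
            return suffix_words  # cache hit: only the new part is processed
        self._seen.add(key)
        return len(prefix.split()) + suffix_words  # cache miss: process everything


cache = PrefixCache()
system_prompt = "You are a helpful support agent"
first = cache.fresh_words(system_prompt, "Where is my order")
second = cache.fresh_words(system_prompt, "Cancel it")
```

The hit depends on the prefix being byte-for-byte identical, which is why stable content such as the system prompt belongs at the start of the prompt.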
Capstone Project
An end-to-end build that proves a skill set. The roadmap's three capstones: a distributed RAG pipeline, a multi-agent system, and a production agent deployed on AWS.