Agentic AI Glossary

28 core terms every engineer building production AI agents should know — defined in plain English, the way they're used in the 2026 Agentic AI Engineer roadmap.

Agent

An LLM-powered system that can decide which actions to take to reach a goal — calling tools, reading results, and looping — rather than producing a single one-shot response. Contrast with a fixed "workflow" where the steps are hard-coded.

Large Language Model (LLM)

A neural network trained on large text corpora to predict the next token. It powers reasoning, generation, and tool selection in agents. Examples: GPT, Claude, Gemini, Llama.

Token

The unit an LLM reads and writes, roughly ¾ of an English word on average. Pricing and context limits are measured in tokens, and latency grows with the number of tokens processed and generated.
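
As a rough sketch of how that rule of thumb turns into a budget, the Python below estimates a token count from a word count and prices it. The 0.75 ratio is only the approximation from the definition, and the function names and the price are hypothetical; a real tokenizer will give different counts.

```python
import math

# Rule of thumb only: about 0.75 English words per token. Real tokenizers vary.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text):
    """Estimate the token count of text from its whitespace-separated word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def estimate_cost_usd(tokens, usd_per_million_tokens):
    """Convert a token count into a cost, given a per-million-token price."""
    return tokens * usd_per_million_tokens / 1_000_000


prompt = "Summarize the quarterly report in three bullet points"
tokens = estimate_tokens(prompt)  # 8 words, so about 11 tokens by this heuristic
cost = estimate_cost_usd(tokens, usd_per_million_tokens=3.0)  # made-up price
```

For anything billing-sensitive, count with the tokenizer that matches the model rather than with a word-based estimate.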

Context Window

The maximum number of tokens a model can consider at once (prompt + response). Managing what goes into the window — "context engineering" — is central to reliable agents.
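
One simple way to stay inside the window is to keep only the most recent messages that fit. The sketch below assumes a caller-supplied `count_tokens` function and made-up limits; the names `fit_to_window` and `reserve_for_response` are illustrative, not part of any library.

```python
def fit_to_window(messages, max_tokens, reserve_for_response, count_tokens):
    """Keep the newest messages whose total token count fits the prompt budget.

    The window covers prompt + response, so part of it is reserved for the reply.
    """
    budget = max_tokens - reserve_for_response
    kept = []
    used = 0
    # Walk from newest to oldest and stop at the first message that does not fit.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def count_words(text):
    """Stand-in token counter for the demo: one token per word."""
    return len(text.split())


history = ["a b c", "d e", "f"]
trimmed = fit_to_window(history, max_tokens=5, reserve_for_response=2,
                        count_tokens=count_words)
```

Dropping the oldest turns is the bluntest option; summarizing them instead is a common alternative when early context still matters.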

Prompt Engineering

The practice of structuring instructions, examples, and constraints so a model produces reliable, useful output. Includes system prompts, few-shot examples, and output formatting.

Chain-of-Thought (CoT)

Prompting a model to reason step by step before answering, which often improves accuracy on multi-step problems at the cost of extra output tokens.

Reasoning Model

A model trained or tuned to "think" before answering (extended internal reasoning), trading latency and cost for higher accuracy on hard tasks. Contrast with standard chat models that answer immediately.

Retrieval-Augmented Generation (RAG)

Fetching relevant documents from a knowledge base and inserting them into the prompt so the model answers from your data rather than from what it memorized during training. This reduces hallucination and enables up-to-date answers.
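
The retrieve-then-insert flow can be sketched in a few lines. To stay self-contained, this example scores documents by plain keyword overlap instead of embeddings, which is a deliberate simplification; the function names, the sample documents, and the prompt wording are all hypothetical.

```python
import re


def words(text):
    """Lowercased set of word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question, documents, k=2):
    """Return up to k documents sharing the most words with the question."""
    q_words = words(question)
    scored = [(len(q_words & words(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(question, passages):
    """Insert the retrieved passages into the prompt ahead of the question."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the context below. "
        "If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


docs = [
    "The refund window is 30 days.",
    "Shipping takes 5 days.",
    "Our office is in Austin.",
]
question = "How long is the refund window?"
prompt = build_prompt(question, retrieve(question, docs, k=1))
```

The resulting `prompt` string is what would be sent to the model; swapping `retrieve` for an embedding-based search leaves `build_prompt` unchanged.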

Embedding

A numeric vector that represents the meaning of a piece of text. Similar meanings produce nearby vectors, which is how semantic search and RAG retrieval work.
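
"Nearby" is usually measured with cosine similarity. The sketch below uses hand-written two-dimensional vectors purely for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy vectors invented for the example, not output from any embedding model.
cat = [0.9, 0.1]
kitten = [0.8, 0.2]
car = [0.1, 0.9]

cat_vs_kitten = cosine_similarity(cat, kitten)
cat_vs_car = cosine_similarity(cat, car)
```

With these toy values, `cat_vs_kitten` comes out much higher than `cat_vs_car`, which is the property semantic search relies on.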

Vector Database

A store optimized for similarity search over embeddings (e.g. pgvector, Pinecone, Qdrant, Weaviate). The retrieval engine behind RAG.

Chunking

Splitting source documents into smaller passages before embedding, so retrieval returns focused, relevant context. Chunk size and overlap materially affect RAG quality.
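
A minimal sketch of fixed-size chunking with overlap, splitting on words for simplicity. The function name and the default sizes are assumptions for illustration; production pipelines often split on sentences, headings, or tokens instead.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into chunks of chunk_size words, each sharing overlap words with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reached the end of the text
    return chunks


chunks = chunk_words("a b c d e f g h i j", chunk_size=4, overlap=1)
```

Here each chunk repeats the last word of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.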

Evaluation (Evals)

Systematic measurement of an AI system's quality (accuracy, faithfulness, relevance) using test sets and metrics, often with a framework such as Ragas. Without evals you can't tell whether a change helped.
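
The simplest possible eval is exact-match accuracy over a fixed test set, sketched below. It is far cruder than faithfulness or relevance metrics and does not use Ragas; `system` stands in for whatever callable wraps your model, and the test cases are invented.

```python
def exact_match_accuracy(system, test_set):
    """Fraction of (question, expected) pairs the system answers exactly, ignoring case and edge whitespace."""
    if not test_set:
        raise ValueError("test_set must not be empty")
    correct = 0
    for question, expected in test_set:
        answer = system(question)
        if answer.strip().lower() == expected.strip().lower():
            correct += 1
    return correct / len(test_set)


def fake_system(question):
    """Stand-in for a real model call, returning canned answers."""
    canned = {"2+2": "4", "capital of France": "paris"}
    return canned.get(question, "")


test_set = [("2+2", "4"), ("capital of France", "Paris"), ("3+3", "6")]
accuracy = exact_match_accuracy(fake_system, test_set)  # 2 of 3 correct
```

Running the same test set before and after a prompt or retrieval change gives a number to compare, which is the point of the definition above.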

Hallucination

When a model produces confident but false or unsupported output. Mitigated with RAG, grounding, guardrails, and evaluation.

Function Calling / Tool Use

Letting a model invoke external functions (search, database queries, APIs) by returning structured arguments, then reading the results — the mechanism that turns an LLM into an agent.
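
On the application side, tool use comes down to parsing the model's structured output and dispatching to a real function. In the sketch below the JSON shape (`name` plus `arguments`) is an assumed format for illustration, not any specific provider's schema, and `get_weather` is a stub.

```python
import json


def get_weather(city):
    """Stub tool; a real one would call a weather API."""
    return f"Sunny in {city}"


# Registry of functions the model is allowed to call.
TOOLS = {"get_weather": get_weather}


def execute_tool_call(raw_call):
    """Parse a JSON tool call and run the matching registered function."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# Pretend the model returned this string instead of a plain-text answer.
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
result = execute_tool_call(model_output)
```

The `result` string would then be sent back to the model as the tool's output so it can continue the conversation.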

Model Context Protocol (MCP)

An open standard for connecting models to tools and data sources through a consistent interface, so the same "MCP server" can serve many agents and clients.

ReAct

An agent pattern that interleaves Reasoning and Acting — the model thinks, takes an action (tool call), observes the result, and repeats until done.
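
The think, act, observe cycle can be sketched as a loop. To keep the example runnable without an API key, the "model" here is a scripted function; the step format (a dict with `thought` and either `action`/`input` or `answer`) is an assumption made for this sketch.

```python
def react_loop(model, tools, question, max_steps=5):
    """Alternate model reasoning and tool calls until the model gives an answer."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        step = model(transcript)
        transcript.append(f"Thought: {step['thought']}")
        if "answer" in step:
            return step["answer"], transcript
        observation = tools[step["action"]](step["input"])
        transcript.append(f"Action: {step['action']}({step['input']})")
        transcript.append(f"Observation: {observation}")
    raise RuntimeError("no answer within max_steps")


def scripted_model(transcript):
    """Stand-in for an LLM: call the tool once, then answer with what it observed."""
    if not any(line.startswith("Observation:") for line in transcript):
        return {"thought": "I should add the numbers.", "action": "add", "input": (2, 3)}
    last_observation = transcript[-1].split(": ", 1)[1]
    return {"thought": "I have the result.", "answer": last_observation}


TOOLS = {"add": lambda pair: pair[0] + pair[1]}
answer, transcript = react_loop(scripted_model, TOOLS, "What is 2 + 3?")
```

The `max_steps` cap matters in practice: without it, a model that never produces an answer would loop indefinitely.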

Memory

How an agent retains information across turns or sessions — short-term (conversation), long-term (stored facts), and semantic (embedded recall). Enables personalization and continuity.

Context Engineering

Deciding what information to put in the model's limited context window at each step — retrieval, summarization, compression, and memory — to maximize reliability and minimize cost.

Multi-Agent System

Multiple specialized agents collaborating on a task (e.g. a planner, researcher, and writer), often coordinated by a supervisor. Useful for complex workflows that a single agent cannot handle reliably.

Orchestration

Coordinating the steps, tools, and agents in a system — routing, branching, retries, and state — so the whole pipeline runs reliably.
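
Retries are the easiest of these concerns to show in isolation. The sketch below wraps one step with exponential backoff; `run_with_retries` and its parameters are hypothetical names, and the delay defaults to zero so the demo runs instantly.

```python
import time


def run_with_retries(step, max_attempts=3, base_delay_s=0.0):
    """Call step() until it succeeds, doubling the wait after each failure."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception as err:  # broad on purpose for the sketch
            last_error = err
            if attempt < max_attempts:
                time.sleep(base_delay_s * 2 ** (attempt - 1))
    raise RuntimeError(f"step failed after {max_attempts} attempts") from last_error


calls = {"count": 0}


def flaky_step():
    """Demo step that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ValueError("temporary failure")
    return "done"


outcome = run_with_retries(flaky_step, max_attempts=3)
```

A real orchestrator would usually retry only errors known to be transient, such as timeouts or rate limits, rather than every exception.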

LangGraph

A framework for building stateful, multi-step agent workflows as graphs of nodes and edges, with explicit control over state, branching, and loops.

Guardrails

Checks on an agent's inputs, outputs, and actions — content filtering, schema validation, allow-lists, and policy enforcement — to keep behavior safe and on-spec in production.
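
Two of the checks named above, an allow-list and basic schema validation, can be sketched as a function that inspects a proposed action before it runs. The action shape (`tool` plus `args`) and the tool names are assumptions made for this example.

```python
# Hypothetical allow-list: the only tools this agent may call.
ALLOWED_TOOLS = {"search", "get_order"}


def check_action(action):
    """Return a list of problems with a proposed action; an empty list means it may run."""
    problems = []
    tool = action.get("tool")
    if tool not in ALLOWED_TOOLS:
        problems.append(f"tool not allowed: {tool!r}")
    if not isinstance(action.get("args"), dict):
        problems.append("args must be an object of named arguments")
    return problems


ok = check_action({"tool": "search", "args": {"query": "refund policy"}})
blocked = check_action({"tool": "delete_database", "args": {}})
```

Returning a list of problems rather than raising on the first one makes it easier to log every violation for later review.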

LLMOps

The operational practice of running LLM systems in production: observability, tracing, evaluation, versioning, cost control, and continuous improvement.

Observability / Tracing

Recording each step of an agent run — prompts, tool calls, latencies, costs, outputs — so you can debug failures and measure quality over time.
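
A minimal sketch of step-level tracing: wrap each call, then store its name, status, and latency. The `Trace` class is a hypothetical stand-in for a real tracing backend and records only a small subset of what the definition lists.

```python
import time


class Trace:
    """Collects one span (name, status, latency) per recorded step."""

    def __init__(self):
        self.spans = []

    def record(self, name, fn, *args, **kwargs):
        """Run fn, append a span describing the call, and return fn's result."""
        start = time.perf_counter()
        status = "error"  # overwritten only if fn returns normally
        try:
            result = fn(*args, **kwargs)
            status = "ok"
            return result
        finally:
            self.spans.append({
                "name": name,
                "status": status,
                "latency_s": time.perf_counter() - start,
            })


trace = Trace()
total = trace.record("add", lambda a, b: a + b, 1, 2)
```

Because the span is appended in a `finally` block, failed steps are recorded too, which is usually when the trace is most useful.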

Inference

Running a trained model to produce output. Inference cost and latency dominate the economics of production agents.

Fine-Tuning

Further training of a base model on domain data to specialize its behavior. Often unnecessary for agents: good prompting plus RAG usually goes further, faster.

Prompt Caching

Reusing the model's processing of a stable prompt prefix across requests to cut latency and cost — valuable for agents with large, repeated system prompts.

Capstone Project

An end-to-end build that proves a skill set. The roadmap's three capstones: a distributed RAG pipeline, a multi-agent system, and a production agent deployed on AWS.

Ready to go deeper? Follow the free 26-week Agentic AI Engineer roadmap, read the complete 2026 guide, or reserve a seat in the next live masterclass.
By Balaji Chippada — The Agent Engineer · YouTube · balajichippada.com