Key terms and concepts explained simply.

Active Learning
A machine learning approach where the model selectively queries a human annotator to label the most informative examples, maximizing learning efficiency per labeled sample.
Adapter
A small set of trainable parameters inserted into a frozen pre-trained model, enabling efficient fine-tuning without modifying the original model weights.

Agent Swarm
A multi-agent orchestration pattern where a coordinator agent dispatches work across many parallel sub-agents, then aggregates their results — popularized in 2026 by Kimi K2.6's Agent Swarm runtime, which scales to 300 sub-agents over 4,000 reasoning steps.

Agentic AI
A design paradigm where AI systems autonomously plan, reason, use tools, and execute multi-step workflows — going beyond single-turn question answering to sustained, goal-directed behavior.

Agentic Coding
Software engineering performed by AI agents that plan multi-file changes, execute them across a codebase, and iterate based on test or build feedback — measured by benchmarks like SWE-Bench Verified and SWE-Bench Pro.

AI Agent
An autonomous software system that uses a large language model to perceive its environment, make decisions, and take actions to achieve goals — often with access to tools like file systems, APIs, browsers, and messaging platforms.
Annotation
The process of adding structured metadata, labels, or tags to raw data by human annotators or automated systems to create training datasets for supervised learning.

Attention
A mechanism in transformer models that allows each token to dynamically weigh and focus on the most relevant parts of the input sequence when computing its representation.

AWQ (Activation-aware Weight Quantization)
A 4-bit quantization method that protects salient weights based on activation magnitude, producing higher-quality compressed models than naive quantization at the same bit-width.
Base Model
A pre-trained foundation model that has been trained on a large general-purpose corpus and serves as the starting point for fine-tuning on domain-specific tasks.

Batch Size
The number of training examples processed simultaneously in one forward-backward pass during model training, affecting memory usage, training speed, and convergence behavior.

Benchmark
A standardized test suite with defined tasks and metrics used to evaluate and compare language model performance across different models and configurations.

BLEU Score
A metric that evaluates the quality of machine-generated text by measuring n-gram overlap between the generated output and one or more human reference texts.
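The n-gram overlap at the heart of this metric can be sketched in a few lines. The snippet below computes clipped n-gram precision for a single candidate/reference pair; full BLEU additionally combines several n-gram orders and a brevity penalty, which are omitted here:

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also appear in the reference,
    with each n-gram's count clipped to its count in the reference."""
    cand = [tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1)]
    if not cand:
        return 0.0
    ref_counts = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    cand_counts = Counter(cand)
    matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return matched / len(cand)

candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
unigram_p = ngram_precision(candidate, reference, 1)  # 5 of 6 unigrams match
bigram_p = ngram_precision(candidate, reference, 2)   # 3 of 5 bigrams match
```

Clipping prevents a candidate from scoring highly by repeating a reference word many times.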
Catastrophic Forgetting
A phenomenon where a neural network loses previously learned knowledge when fine-tuned on new data, degrading performance on tasks it previously handled well.

Chat Template
A formatting structure that defines how conversational messages (system, user, assistant) are tokenized and arranged as input to a language model.
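As an illustration, a minimal renderer for a ChatML-style template might look like the sketch below. The special tokens shown are one common convention, not a universal standard — each model family defines its own template, and in practice you should use the template shipped with the model:

```python
def render_chat(messages):
    """Render a list of {"role": ..., "content": ...} messages into a
    ChatML-style prompt string (illustrative format only)."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>")
    # Leave the assistant turn open so the model generates its reply from here
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = render_chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi!"},
])
```

Using the wrong template for a fine-tuned model is a common cause of degraded output quality.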
Checkpoint
A saved snapshot of a model's weights and training state at a specific point during training, enabling recovery, evaluation, and selection of the best-performing version.

CodeAct
An AI agent architecture where the LLM writes and executes Python (or another language) code as its primary action format, rather than choosing from a fixed list of tools via JSON function calls — popularized by Hugging Face's smolagents framework.

Context Window
The maximum number of tokens a language model can process in a single input-output sequence, determining how much text the model can 'see' at once.
Data Augmentation
A set of techniques for artificially increasing the size and diversity of a training dataset by creating modified copies of existing data points.

Data Deduplication
The process of identifying and removing duplicate or near-duplicate entries from a dataset to prevent memorization artifacts and improve training efficiency.
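A minimal exact-match deduplicator hashes a normalized form of each record; near-duplicate detection needs techniques like MinHash or SimHash, which are beyond this sketch:

```python
import hashlib

def dedupe(records):
    """Drop exact duplicates (after lowercasing and collapsing whitespace)
    by hashing each record's normalized form. Keeps first occurrence."""
    seen = set()
    unique = []
    for text in records:
        normalized = " ".join(text.lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique
```

Hashing keeps memory bounded by the number of unique records rather than total text size.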
Data Labeling
The process of assigning meaningful tags, categories, or annotations to raw data so that machine learning models can learn from structured examples.

Data Lineage
The practice of tracking data from its origin through every transformation, processing step, and usage in model training to maintain a complete audit trail.

Data Versioning
The practice of tracking and managing different versions of datasets over time, enabling reproducibility, rollback, and auditability in machine learning workflows.
DeepSeek Sparse Attention
A learned sparse attention mechanism introduced in DeepSeek-V3.2 and continued in V4 that routes each query token to a subset of key tokens rather than attending to all of them, dramatically reducing the compute cost of long-context inference.

Domain Adaptation
The process of adjusting a model trained on general data to perform well on a specific domain, such as healthcare, legal, or finance.

DPO (Direct Preference Optimization)
A simpler alternative to RLHF that directly optimizes a language model on human preference data without requiring a separate reward model or reinforcement learning.

Edge Inference
Running AI model inference locally on end-user devices or edge servers rather than in centralized cloud data centers, enabling offline operation and data privacy.

Effective Context Window
The portion of a model's advertised context window over which it actually retains high retrieval accuracy — typically substantially shorter than the advertised limit, with mid-context information loss running 10-25% on most current models.
Embedding
A dense vector representation of a token, word, or passage in a continuous mathematical space where semantic similarity corresponds to geometric proximity.
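Geometric proximity between embeddings is most often measured with cosine similarity, which can be sketched as:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors:
    1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Real embedding vectors typically have hundreds or thousands of dimensions; the arithmetic is identical.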
Epoch
One complete pass through the entire training dataset during the model fine-tuning process.

Few-Shot Learning
A technique where a model learns to perform a task from only a handful of labeled examples, typically provided as demonstrations within the prompt.

Fine-Tuning
The process of taking a pre-trained AI model and further training it on a smaller, domain-specific dataset to specialize its capabilities for a particular task or industry.

Function Calling
A capability that allows language models to generate structured function invocations with appropriate arguments, enabling them to interact with external tools and APIs.
GEPA (Generalized Experience-based Procedural Acquisition)
A self-improvement mechanism for AI agents that creates reusable skills from successful task completions and refines them through use, popularized by Nous Research's Hermes Agent framework.

GGUF
A binary file format designed for storing quantized large language models, optimized for fast loading and efficient CPU and GPU inference via llama.cpp and compatible runtimes.

GPTQ
A post-training 4-bit weight quantization method that uses second-order information from a calibration dataset to minimize quantization error layer by layer, producing higher-quality compressed models than naive quantization.

GPU VRAM
The dedicated high-bandwidth memory on a graphics processing unit that stores model weights, activations, and gradients during training and inference.

Gradient Accumulation
A training technique that simulates larger batch sizes by accumulating gradients over multiple forward passes before performing a single weight update.
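A toy sketch for a one-parameter linear model shows why accumulation works: summing gradients over micro-batches before a single update gives the same result as one step on the combined batch, so a memory-limited GPU can mimic a larger batch:

```python
def accumulated_step(w, micro_batches, lr):
    """One optimizer step for a 1-D linear model y = w*x with squared-error
    loss, accumulating gradients over several micro-batches before the
    single weight update at the end."""
    grad_sum, n = 0.0, 0
    for batch in micro_batches:               # one forward/backward per micro-batch
        for x, y in batch:
            grad_sum += 2 * (w * x - y) * x   # d/dw of (w*x - y)^2
            n += 1
    return w - lr * (grad_sum / n)            # single update with the mean gradient
```

Whether the data arrives as one batch of four or two micro-batches of two, the resulting weight is identical.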
Guardrails
Safety mechanisms and filters applied to LLM inputs and outputs to prevent harmful, off-topic, or policy-violating content from reaching users.

Hallucination
When a language model generates plausible-sounding but factually incorrect, fabricated, or unsupported information that is not grounded in its training data or provided context.

Hybrid Reasoning
A model architecture pattern that integrates extended chain-of-thought reasoning into a standard chat checkpoint, with a runtime control to toggle between fast direct responses and slower deliberative reasoning — replacing the older pattern of separate reasoning-only models.

Hyperparameter
A configuration value set before training begins that controls the learning process itself, as opposed to model parameters which are learned during training.

Inference
The process of running a trained AI model to generate predictions or outputs from new input data, as opposed to the training phase where the model learns from data.
Instruction Tuning
A fine-tuning approach where a language model is trained on instruction-response pairs to follow natural language directions and produce task-specific outputs.

JSONL
A text-based data format where each line is a valid JSON object, widely used for structuring fine-tuning datasets, logging, and streaming data pipelines in AI/ML workflows.
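For example, fine-tuning records can be round-tripped through the format as below (shown on in-memory strings; the `prompt`/`completion` field names are just one common layout, not a requirement of JSONL itself):

```python
import json

def to_jsonl(records):
    """Serialize records as JSONL: one compact JSON object per line."""
    return "".join(json.dumps(rec, ensure_ascii=False) + "\n" for rec in records)

def from_jsonl(text):
    """Parse JSONL text back into a list of records, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

records = [
    {"prompt": "Hi", "completion": "Hello!"},
    {"prompt": "2+2?", "completion": "4"},
]
serialized = to_jsonl(records)
```

Because each line is independent, JSONL files can be streamed, appended to, and split without parsing the whole file.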
Knowledge Distillation
A model compression technique where a smaller 'student' model is trained to replicate the behavior of a larger, more capable 'teacher' model.

KV Cache
A memory buffer that stores previously computed key and value tensors from the attention mechanism, avoiding redundant computation during autoregressive text generation.

Learning Rate
A hyperparameter that controls how much the model's weights are adjusted in response to each batch of training data, directly influencing training speed and stability.
LoRA (Low-Rank Adaptation)
A parameter-efficient fine-tuning technique that injects small, trainable low-rank matrices into a frozen pre-trained model, dramatically reducing the memory and compute needed to adapt large language models.

MCP (Model Context Protocol)
An open protocol introduced by Anthropic for connecting AI assistants to external data sources, tools, and systems — providing a standard interface that any model client can use to interact with any MCP-compatible server.

Mixture of Experts (MoE)
A neural network architecture that routes each input to a subset of specialized sub-networks (experts), enabling larger model capacity without proportionally increasing compute cost.

MLOps
A set of practices combining machine learning, DevOps, and data engineering to reliably deploy, monitor, and maintain ML models in production environments.

Model Card
A standardized documentation artifact that describes a machine learning model's intended uses, performance metrics, limitations, ethical considerations, and training data provenance.
Model Distillation
A technique for transferring knowledge from a large, capable 'teacher' model to a smaller, faster 'student' model, producing compact models that approach the teacher's performance on specific tasks at a fraction of the inference cost.

Model Evaluation
The systematic process of measuring a language model's performance using quantitative metrics, qualitative assessments, and domain-specific benchmarks.

Model Merging
The technique of combining the weights of two or more fine-tuned models into a single model that inherits capabilities from all source models.

Model Routing
Directing AI inference requests to different models or adapters based on request properties like task type, client identity, complexity, or cost constraints — enabling efficient multi-model deployments.

Multi-Tenant Serving
Serving multiple clients or tenants from a single model deployment using per-tenant LoRA adapters, reducing infrastructure costs by sharing the base model while delivering customized AI behavior per tenant.
ONNX
An open standard format for representing machine learning models, enabling interoperability between different training frameworks and inference runtimes.

Overfitting
A training failure mode where the model memorizes the specific examples in its training data rather than learning generalizable patterns, causing poor performance on unseen inputs.

Parameter
A learnable value in a neural network — including weights and biases — that the model adjusts during training to minimize prediction error.

Perplexity
A metric that measures how well a language model predicts a text sequence, with lower values indicating better prediction and more fluent language understanding.
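Concretely, perplexity is the exponential of the mean negative log-probability the model assigns to each token — a model that gives every token probability 1/k has perplexity exactly k:

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities:
    exp of the average negative log-probability."""
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)
```

So a perplexity of 4 means the model is, on average, as uncertain as if it were choosing uniformly among 4 tokens at each step.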
PII Redaction
The process of detecting and removing or masking personally identifiable information from datasets to protect individual privacy before using data for model training.

Prompt Engineering
The practice of designing and iterating on input prompts to elicit desired outputs from large language models without modifying the model's weights.

Prompt Template
A structured format with placeholders that defines how user inputs, context, and instructions are assembled into a complete prompt for a language model.

QLoRA (Quantized Low-Rank Adaptation)
A fine-tuning technique that combines 4-bit quantization with LoRA adapters, enabling large language models to be fine-tuned on a single consumer GPU.

Quantization
The process of reducing the numerical precision of a model's weights (e.g., from FP16 to INT8 or INT4) to shrink its memory footprint and accelerate inference without drastically sacrificing accuracy.
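A minimal sketch of symmetric per-tensor INT8 quantization is below; production methods such as GPTQ and AWQ are considerably more sophisticated (per-channel scales, calibration data, salient-weight protection), but the core map-to-integers idea is the same:

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: map floats onto integers
    in [-127, 127] using a single scale factor."""
    scale = max(abs(w) for w in weights) / 127
    if scale == 0:
        scale = 1.0  # all-zero tensor; any scale works
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]
```

Each weight is now stored in 1 byte instead of 2 or 4, at the cost of a rounding error of at most half a quantization step.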
Red Teaming
The practice of systematically probing an AI system with adversarial inputs to discover vulnerabilities, failure modes, and safety gaps before deployment.

Retrieval-Augmented Generation (RAG)
An architecture that enhances LLM responses by retrieving relevant documents from an external knowledge base and including them as context in the prompt.
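A toy end-to-end sketch of the pattern — here retrieval is simple word overlap standing in for embedding similarity, and the prompt layout is illustrative, not a standard:

```python
def retrieve(query, documents, k=2):
    """Rank documents by word overlap with the query (a crude stand-in
    for embedding similarity) and return the top k."""
    def words(text):
        return {w.strip(".,?!").lower() for w in text.split()}
    q = words(query)
    ranked = sorted(documents, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents):
    """Assemble retrieved passages as grounding context ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context above."

docs = [
    "Paris is the capital of France.",
    "The mitochondria is the powerhouse of the cell.",
    "France borders Spain.",
]
```

Production systems replace the overlap scorer with a vector database over embeddings, but the retrieve-then-prompt flow is the same.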
RLHF (Reinforcement Learning from Human Feedback)
A training technique that uses human preference judgments to fine-tune language models, aligning their outputs with human values and expectations.

Safetensors
A secure, fast, and memory-efficient file format for storing neural network weights, designed as a safer alternative to Python pickle-based formats.

Speculative Decoding
An inference acceleration technique that uses a small, fast draft model to propose multiple tokens at once, which the larger target model verifies in parallel.

Structured Output
The capability of a language model to generate responses in a specific, machine-parsable format such as JSON, XML, or YAML that conforms to a predefined schema.
Synthetic Data
Artificially generated training data created using frontier models, rule-based systems, or data augmentation techniques to supplement or replace real-world data for fine-tuning ML models.

System Prompt
A special instruction provided at the beginning of a conversation that defines the model's behavior, persona, constraints, and response format.

Temperature
A sampling parameter that controls the randomness of a language model's output — lower values produce more deterministic responses, higher values increase creativity and variety.
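Temperature works by rescaling logits before the softmax, as in this sketch — low temperature sharpens the distribution toward the top token, high temperature flattens it toward uniform:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to a probability distribution, dividing by the
    temperature first. Subtracting the max keeps exp() numerically stable."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

At temperature 0 (usually implemented as argmax rather than division), sampling becomes fully deterministic.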
TensorRT
NVIDIA's high-performance deep learning inference optimizer and runtime that maximizes throughput and minimizes latency on NVIDIA GPUs.

Token
The fundamental unit of text that a language model processes — typically a word, subword, or character that maps to an integer ID in the model's vocabulary.

Tokenizer
The component that converts raw text into a sequence of numerical tokens that a language model can process, and vice versa.

Tool Calling
The ability of an LLM to invoke external functions, APIs, or tools as part of its response generation — implemented through structured function-call schemas that the model produces and a runtime executes, foundational to all modern agent architectures.
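A minimal runtime loop can be sketched as follows: the model emits a JSON function call, and a dispatcher executes it against a registry of tools. The tool names and the call shape here are made up for illustration; real APIs define their own schemas and add argument validation:

```python
import json

# Toy tool registry; a real runtime registers many tools with typed schemas
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
    "add": lambda a, b: a + b,
}

def execute_tool_call(call_json):
    """Execute a model-emitted call of the form
    {"name": "...", "arguments": {...}} and return the tool's result."""
    call = json.loads(call_json)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])
```

The result is then appended to the conversation so the model can use it in its next turn.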
Top-p (Nucleus) Sampling
A sampling strategy that selects from the smallest set of tokens whose cumulative probability exceeds a threshold p, balancing output quality with diversity.
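The selection step can be sketched as follows — rank tokens by probability, keep the smallest prefix whose cumulative mass reaches p, and renormalize before sampling:

```python
def top_p_filter(probs, p=0.9):
    """Return (token_index, renormalized_probability) pairs for the
    smallest set of highest-probability tokens whose cumulative
    probability reaches p (the 'nucleus')."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for idx, prob in ranked:
        kept.append((idx, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(pr for _, pr in kept)
    return [(idx, pr / total) for idx, pr in kept]
```

Unlike a fixed top-k cutoff, the nucleus grows when the model is uncertain and shrinks when one token dominates.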
Training Data
The curated dataset of examples used to fine-tune a machine learning model, typically formatted as structured input-output pairs in formats like JSONL.

Transfer Learning
A machine learning technique where a model trained on one task is adapted for a different but related task, leveraging previously learned representations.
Transformer
The neural network architecture that underlies virtually all modern large language models, using self-attention mechanisms to process sequences in parallel.

Vector Database
A specialized database optimized for storing, indexing, and querying high-dimensional vector embeddings used in similarity search and retrieval-augmented generation.

Vibe Coding
A development approach where developers use AI-assisted coding tools like Cursor, Bolt.new, and Replit to build applications through natural language prompts and iterative AI collaboration rather than writing every line manually.

Weight
A numerical parameter in a neural network that is learned during training and determines how the model transforms input data into output predictions.

White-Label AI
AI products or services that are developed by one company and rebranded by another to appear as their own, allowing agencies and resellers to offer custom AI solutions without building models from scratch.

Zero-Shot Learning
The ability of a model to perform a task it was never explicitly trained on, using only natural language instructions without any demonstration examples.