AI & ML Glossary

    Key terms and concepts explained simply.

    Active Learning

    Techniques

    A machine learning approach where the model selectively queries a human annotator to label the most informative examples, maximizing learning efficiency per labeled sample.

    Adapter

    Techniques

    A small set of trainable parameters inserted into a frozen pre-trained model, enabling efficient fine-tuning without modifying the original model weights.

    Agent Swarm

    Techniques

    A multi-agent orchestration pattern where a coordinator agent dispatches work across many parallel sub-agents, then aggregates their results — popularized in 2026 by Kimi K2.6's Agent Swarm runtime, which scales to 300 sub-agents over 4,000 reasoning steps.

    Agentic AI

    AI Concepts

    A design paradigm where AI systems autonomously plan, reason, use tools, and execute multi-step workflows — going beyond single-turn question answering to sustained, goal-directed behavior.

    Agentic Coding

    Techniques

    Software engineering performed by AI agents that plan multi-file changes, execute them across a codebase, and iterate based on test or build feedback — measured by benchmarks like SWE-Bench Verified and SWE-Bench Pro.

    AI Agent

    AI Concepts

    An autonomous software system that uses a large language model to perceive its environment, make decisions, and take actions to achieve goals — often with access to tools like file systems, APIs, browsers, and messaging platforms.

    Annotation

    Techniques

    The process of adding structured metadata, labels, or tags to raw data by human annotators or automated systems to create training datasets for supervised learning.

    Attention

    ML Fundamentals

    A mechanism in transformer models that allows each token to dynamically weigh and focus on the most relevant parts of the input sequence when computing its representation.
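    The mechanism can be sketched in plain Python: a minimal scaled dot-product attention over toy 2-D vectors, with the example inputs invented for illustration (real transformers add batching, multiple heads, and masking on top of this core):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    # Scaled dot-product attention: each query scores every key,
    # then mixes the value rows by the softmaxed scores.
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0]]                  # one query token
K = [[1.0, 0.0], [0.0, 1.0]]      # two key tokens
V = [[1.0, 0.0], [0.0, 1.0]]      # matching value rows
out = attention(Q, K, V)          # the query attends mostly to the first key
```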

    AWQ

    Infrastructure

    Activation-aware Weight Quantization — a 4-bit quantization method that protects salient weights based on activation magnitude, producing higher-quality compressed models than naive quantization at the same bit-width.

    Base Model

    ML Fundamentals

    A pre-trained foundation model that has been trained on a large general-purpose corpus and serves as the starting point for fine-tuning on domain-specific tasks.

    Batch Size

    ML Fundamentals

    The number of training examples processed simultaneously in one forward-backward pass during model training, affecting memory usage, training speed, and convergence behavior.

    Benchmark

    ML Fundamentals

    A standardized test suite with defined tasks and metrics used to evaluate and compare language model performance across different models and configurations.

    BLEU Score

    ML Fundamentals

    A metric that evaluates the quality of machine-generated text by measuring n-gram overlap between the generated output and one or more human reference texts.

    Catastrophic Forgetting

    ML Fundamentals

    A phenomenon where a neural network loses previously learned knowledge when fine-tuned on new data, degrading performance on tasks it previously handled well.

    Chat Template

    Data Formats

    A formatting structure that defines how conversational messages (system, user, assistant) are tokenized and arranged as input to a language model.
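    As an illustration, a ChatML-style layout (the special tokens shown are one common convention and vary by model family; production code should call the tokenizer's own chat-template method rather than hand-rolling strings like this):

```python
def render_chatml(messages):
    # ChatML-style rendering; the <|im_start|>/<|im_end|> markers are
    # illustrative and differ between model families.
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # generation prompt for the reply
    return "".join(parts)

prompt = render_chatml([
    {"role": "system", "content": "You are concise."},
    {"role": "user", "content": "Define a token."},
])
```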

    Checkpoint

    ML Fundamentals

    A saved snapshot of a model's weights and training state at a specific point during training, enabling recovery, evaluation, and selection of the best-performing version.

    Code-Action Agent

    Techniques

    An AI agent architecture where the LLM writes and executes Python (or another language) code as its primary action format, rather than choosing from a fixed list of tools via JSON function calls — popularized by Hugging Face's smolagents framework.

    Context Window

    ML Fundamentals

    The maximum number of tokens a language model can process in a single input-output sequence, determining how much text the model can 'see' at once.

    Data Augmentation

    Techniques

    A set of techniques for artificially increasing the size and diversity of a training dataset by creating modified copies of existing data points.

    Data Deduplication

    Techniques

    The process of identifying and removing duplicate or near-duplicate entries from a dataset to prevent memorization artifacts and improve training efficiency.

    Data Labeling

    Techniques

    The process of assigning meaningful tags, categories, or annotations to raw data so that machine learning models can learn from structured examples.

    Data Lineage

    Compliance & Privacy

    The practice of tracking data from its origin through every transformation, processing step, and usage in model training to maintain a complete audit trail.

    Data Versioning

    Tools & Frameworks

    The practice of tracking and managing different versions of datasets over time, enabling reproducibility, rollback, and auditability in machine learning workflows.

    DeepSeek Sparse Attention (DSA)

    ML Fundamentals

    A learned sparse attention mechanism introduced in DeepSeek-V3.2 and continued in V4 that routes each query token to a subset of key tokens rather than attending to all of them, dramatically reducing the compute cost of long-context inference.

    Domain Adaptation

    Techniques

    The process of adjusting a model trained on general data to perform well on a specific domain, such as healthcare, legal, or finance.

    DPO (Direct Preference Optimization)

    Techniques

    A simpler alternative to RLHF that directly optimizes a language model on human preference data without requiring a separate reward model or reinforcement learning.

    Edge Inference

    Infrastructure

    Running AI model inference locally on end-user devices or edge servers rather than in centralized cloud data centers, enabling offline operation and data privacy.

    Effective Context Length

    ML Fundamentals

    The portion of a model's advertised context window over which it actually retains high retrieval accuracy — typically substantially shorter than the advertised limit, with mid-context information loss running 10-25% on most current models.

    Embedding

    ML Fundamentals

    A dense vector representation of a token, word, or passage in a continuous mathematical space where semantic similarity corresponds to geometric proximity.
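    Geometric proximity is usually measured with cosine similarity; a minimal sketch over plain Python lists, with the three toy vectors invented for illustration:

```python
import math

def cosine_similarity(a, b):
    # Similarity of direction, ignoring magnitude:
    # 1.0 = same direction, 0.0 = orthogonal.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-D embeddings: related words should land near each other.
king = [0.9, 0.8, 0.1]
queen = [0.85, 0.82, 0.12]
banana = [0.1, 0.05, 0.95]
```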

    Epoch

    ML Fundamentals

    One complete pass through the entire training dataset during the model fine-tuning process.

    Few-Shot Learning

    Techniques

    A technique where a model learns to perform a task from only a handful of labeled examples, typically provided as demonstrations within the prompt.

    Fine-Tuning

    ML Fundamentals

    The process of taking a pre-trained AI model and further training it on a smaller, domain-specific dataset to specialize its capabilities for a particular task or industry.

    Function Calling

    ML Fundamentals

    A capability that allows language models to generate structured function invocations with appropriate arguments, enabling them to interact with external tools and APIs.

    GEPA

    Techniques

    A reflective prompt-optimization method (short for Genetic-Pareto) that evolves an agent's prompts by mutating them based on natural-language feedback from execution traces and selecting candidates along a Pareto frontier of task scores, popularized through the DSPy framework.

    GGUF

    Data Formats

    A binary file format designed for storing quantized large language models, optimized for fast loading and efficient CPU and GPU inference via llama.cpp and compatible runtimes.

    GPTQ

    Infrastructure

    A 4-bit post-training weight quantization method (short for Generative Pre-trained Transformer Quantization) that uses approximate second-order information from a calibration dataset to minimize quantization error layer by layer, producing higher-quality compressed models than naive round-to-nearest quantization.

    GPU Memory (VRAM)

    Infrastructure

    The dedicated high-bandwidth memory on a graphics processing unit that stores model weights, activations, and gradients during training and inference.

    Gradient Accumulation

    Techniques

    A training technique that simulates larger batch sizes by accumulating gradients over multiple forward passes before performing a single weight update.
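    The equivalence is easy to verify on a toy linear model with mean-squared error: averaging the gradients of equal-sized micro-batches reproduces the full-batch gradient exactly, so one accumulated update matches one large-batch update (the data and learning rate below are invented for illustration):

```python
def grad_mse(w, xs, ys):
    # dL/dw for L = mean((w*x - y)^2)
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
w, lr = 0.0, 0.01

# One update with the full batch of 4 examples.
w_full = w - lr * grad_mse(w, xs, ys)

# The same update via two micro-batches of 2: accumulate the
# gradients, average them, then apply a single weight update.
acc = 0.0
for i in range(0, len(xs), 2):
    acc += grad_mse(w, xs[i:i + 2], ys[i:i + 2])
w_accum = w - lr * (acc / 2)
```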

    Guardrails

    Compliance & Privacy

    Safety mechanisms and filters applied to LLM inputs and outputs to prevent harmful, off-topic, or policy-violating content from reaching users.

    Hallucination

    ML Fundamentals

    When a language model generates plausible-sounding but factually incorrect, fabricated, or unsupported information that is not grounded in its training data or provided context.

    Hybrid Reasoning

    ML Fundamentals

    A model architecture pattern that integrates extended chain-of-thought reasoning into a standard chat checkpoint, with a runtime control to toggle between fast direct responses and slower deliberative reasoning — replacing the older pattern of separate reasoning-only models.

    Hyperparameter

    ML Fundamentals

    A configuration value set before training begins that controls the learning process itself, as opposed to model parameters which are learned during training.

    Inference

    ML Fundamentals

    The process of running a trained AI model to generate predictions or outputs from new input data, as opposed to the training phase where the model learns from data.

    Instruction Tuning

    Techniques

    A fine-tuning approach where a language model is trained on instruction-response pairs to follow natural language directions and produce task-specific outputs.

    JSONL

    Data Formats

    A text-based data format where each line is a valid JSON object, widely used for structuring fine-tuning datasets, logging, and streaming data pipelines in AI/ML workflows.
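    A minimal round trip using only the standard library; an in-memory buffer stands in for a file, and the two records are invented examples:

```python
import io
import json

records = [
    {"prompt": "2+2?", "completion": "4"},
    {"prompt": "Capital of France?", "completion": "Paris"},
]

# Write: exactly one JSON object per line.
buf = io.StringIO()
for rec in records:
    buf.write(json.dumps(rec) + "\n")

# Read: parse line by line, which streams without loading the whole file.
buf.seek(0)
loaded = [json.loads(line) for line in buf if line.strip()]
```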

    Knowledge Distillation

    Techniques

    A model compression technique where a smaller 'student' model is trained to replicate the behavior of a larger, more capable 'teacher' model.

    KV Cache

    Infrastructure

    A memory buffer that stores previously computed key and value tensors from the attention mechanism, avoiding redundant computation during autoregressive text generation.

    Learning Rate

    ML Fundamentals

    A hyperparameter that controls how much the model's weights are adjusted in response to each batch of training data, directly influencing training speed and stability.

    LoRA

    Techniques

    A parameter-efficient fine-tuning technique that injects small, trainable low-rank matrices into a frozen pre-trained model, dramatically reducing the memory and compute needed to adapt large language models.
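    The core idea in miniature: the effective weight is the frozen matrix plus a scaled rank-r product B·A, so only the two small factors are trained. A toy sketch with plain-Python matrices and made-up sizes (real implementations use framework tensors and apply this inside attention and MLP projections):

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

d, r, alpha = 3, 1, 2.0   # hidden size, LoRA rank, scaling numerator
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen
B = [[0.5], [0.0], [0.0]]   # d x r, trainable
A = [[0.0, 1.0, 0.0]]       # r x d, trainable

# Effective weight: W + (alpha / r) * B @ A  (a rank-r update).
delta = matmul(B, A)
W_eff = [[W[i][j] + (alpha / r) * delta[i][j] for j in range(d)]
         for i in range(d)]

trainable = d * r + r * d   # 6 trainable values versus d*d = 9 in W
```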

    MCP (Model Context Protocol)

    Tools & Frameworks

    An open protocol introduced by Anthropic for connecting AI assistants to external data sources, tools, and systems — providing a standard interface that any model client can use to interact with any MCP-compatible server.

    Mixture of Experts

    ML Fundamentals

    A neural network architecture that routes each input to a subset of specialized sub-networks (experts), enabling larger model capacity without proportionally increasing compute cost.
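    A minimal sketch of top-k routing, with toy scalar "experts" and invented gate scores (a real MoE layer learns the gate and experts jointly, and the experts are full feed-forward networks):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    return [e / sum(exps) for e in exps]

def moe_forward(x, experts, gate_scores, k=2):
    # Pick the k experts with the highest gate scores, renormalize the
    # gate over that subset, and mix the chosen experts' outputs.
    topk = sorted(range(len(experts)),
                  key=lambda i: gate_scores[i], reverse=True)[:k]
    weights = softmax([gate_scores[i] for i in topk])
    return sum(w * experts[i](x) for w, i in zip(weights, topk))

experts = [lambda x: 2 * x, lambda x: x + 1, lambda x: -x, lambda x: 10 * x]
y = moe_forward(3.0, experts, gate_scores=[0.1, 2.0, 0.2, 1.5], k=2)
```

    Only 2 of the 4 experts run for this input, which is the source of the compute savings.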

    MLOps

    Tools & Frameworks

    A set of practices combining machine learning, DevOps, and data engineering to reliably deploy, monitor, and maintain ML models in production environments.

    Model Card

    Compliance & Privacy

    A standardized documentation artifact that describes a machine learning model's intended uses, performance metrics, limitations, ethical considerations, and training data provenance.

    Model Distillation

    Techniques

    A technique for transferring knowledge from a large, capable 'teacher' model to a smaller, faster 'student' model, producing compact models that approach the teacher's performance on specific tasks at a fraction of the inference cost.

    Model Evaluation

    ML Fundamentals

    The systematic process of measuring a language model's performance using quantitative metrics, qualitative assessments, and domain-specific benchmarks.

    Model Merging

    Techniques

    The technique of combining the weights of two or more fine-tuned models into a single model that inherits capabilities from all source models.

    Model Routing

    Infrastructure

    Directing AI inference requests to different models or adapters based on request properties like task type, client identity, complexity, or cost constraints — enabling efficient multi-model deployments.

    Multi-Tenant Inference

    Infrastructure

    Serving multiple clients or tenants from a single model deployment using per-tenant LoRA adapters, reducing infrastructure costs by sharing the base model while delivering customized AI behavior per tenant.

    ONNX (Open Neural Network Exchange)

    Data Formats

    An open standard format for representing machine learning models, enabling interoperability between different training frameworks and inference runtimes.

    Overfitting

    ML Fundamentals

    A training failure mode where the model memorizes the specific examples in its training data rather than learning generalizable patterns, causing poor performance on unseen inputs.

    Parameter

    ML Fundamentals

    A learnable value in a neural network — including weights and biases — that the model adjusts during training to minimize prediction error.

    Perplexity

    ML Fundamentals

    A metric that measures how well a language model predicts a text sequence, with lower values indicating better prediction and more fluent language understanding.
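    Concretely, perplexity is the exponential of the average negative log-probability the model assigned to each observed token:

```python
import math

def perplexity(token_probs):
    # exp of the mean negative log-probability per token.
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model that always assigns p = 0.25 is as uncertain as a fair 4-way choice.
ppl = perplexity([0.25, 0.25, 0.25, 0.25])
```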

    PII Redaction

    Compliance & Privacy

    The process of detecting and removing or masking personally identifiable information from datasets to protect individual privacy before using data for model training.

    Prompt Engineering

    Techniques

    The practice of designing and iterating on input prompts to elicit desired outputs from large language models without modifying the model's weights.

    Prompt Template

    ML Fundamentals

    A structured format with placeholders that defines how user inputs, context, and instructions are assembled into a complete prompt for a language model.

    QLoRA

    Techniques

    Quantized Low-Rank Adaptation — a fine-tuning technique that combines 4-bit quantization with LoRA adapters, enabling large language models to be fine-tuned on a single consumer GPU.

    Quantization

    Techniques

    The process of reducing the numerical precision of a model's weights (e.g., from FP16 to INT8 or INT4) to shrink its memory footprint and accelerate inference without drastically sacrificing accuracy.
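    A minimal sketch of symmetric per-tensor int8 quantization with made-up weights (methods like GPTQ and AWQ are far more careful about which weights they round, but the quantize/dequantize round trip is the same shape):

```python
def quantize_int8(weights):
    # Symmetric quantization: one scale maps [-max|w|, +max|w|] onto [-127, 127].
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

w = [0.4, -1.27, 0.05, 1.0]
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)   # approximate reconstruction of w
```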

    Red Teaming

    Compliance & Privacy

    The practice of systematically probing an AI system with adversarial inputs to discover vulnerabilities, failure modes, and safety gaps before deployment.

    Retrieval-Augmented Generation (RAG)

    ML Fundamentals

    An architecture that enhances LLM responses by retrieving relevant documents from an external knowledge base and including them as context in the prompt.

    RLHF (Reinforcement Learning from Human Feedback)

    Techniques

    A training technique that uses human preference judgments to fine-tune language models, aligning their outputs with human values and expectations.

    SafeTensors

    Data Formats

    A secure, fast, and memory-efficient file format for storing neural network weights, designed as a safer alternative to Python pickle-based formats.

    Speculative Decoding

    Techniques

    An inference acceleration technique that uses a small, fast draft model to propose multiple tokens at once, which the larger target model verifies in parallel.

    Structured Output

    ML Fundamentals

    The capability of a language model to generate responses in a specific, machine-parsable format such as JSON, XML, or YAML that conforms to a predefined schema.

    Synthetic Data

    Techniques

    Artificially generated training data created using frontier models, rule-based systems, or data augmentation techniques to supplement or replace real-world data for fine-tuning ML models.

    System Prompt

    Techniques

    A special instruction provided at the beginning of a conversation that defines the model's behavior, persona, constraints, and response format.

    Temperature

    ML Fundamentals

    A sampling parameter that controls the randomness of a language model's output — lower values produce more deterministic responses, higher values increase creativity and variety.
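    Mechanically, the parameter just divides the logits before the softmax; a minimal sketch with invented logits, showing that a lower temperature concentrates probability on the top token:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Dividing logits by T < 1 sharpens the distribution; T > 1 flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    return [e / sum(exps) for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.5)   # more deterministic
hot = softmax_with_temperature(logits, 2.0)    # more diverse
```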

    TensorRT

    Infrastructure

    NVIDIA's high-performance deep learning inference optimizer and runtime that maximizes throughput and minimizes latency on NVIDIA GPUs.

    Token

    ML Fundamentals

    The fundamental unit of text that a language model processes — typically a word, subword, or character that maps to an integer ID in the model's vocabulary.

    Tokenizer

    ML Fundamentals

    The component that converts raw text into a sequence of numerical tokens that a language model can process, and vice versa.
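    A toy whitespace tokenizer shows the encode/decode round trip; production tokenizers instead use subword algorithms such as BPE so they never hit out-of-vocabulary words:

```python
def build_vocab(corpus):
    # Toy vocabulary: every distinct whitespace-separated word gets an ID.
    tokens = sorted(set(" ".join(corpus).split()))
    return {tok: i for i, tok in enumerate(tokens)}

def encode(text, vocab):
    return [vocab[tok] for tok in text.split()]

def decode(ids, vocab):
    inv = {i: tok for tok, i in vocab.items()}
    return " ".join(inv[i] for i in ids)

vocab = build_vocab(["the cat sat on the mat"])
ids = encode("the cat", vocab)
```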

    Tool Use

    Techniques

    The ability of an LLM to invoke external functions, APIs, or tools as part of its response generation — implemented through structured function-call schemas that the model produces and a runtime executes, foundational to all modern agent architectures.

    Top-p (Nucleus Sampling)

    ML Fundamentals

    A sampling strategy that selects from the smallest set of tokens whose cumulative probability exceeds a threshold p, balancing output quality with diversity.
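    A minimal sketch of selecting the nucleus from a toy next-token distribution (a real sampler would then renormalize the probabilities over this set and draw from it):

```python
def nucleus(probs, p=0.9):
    # Smallest set of tokens, in descending probability order,
    # whose cumulative probability reaches the threshold p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen, total = [], 0.0
    for i in order:
        chosen.append(i)
        total += probs[i]
        if total >= p:
            break
    return chosen

probs = [0.5, 0.3, 0.15, 0.05]   # made-up next-token distribution
```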

    Training Data

    Data Formats

    The curated dataset of examples used to fine-tune a machine learning model, typically formatted as structured input-output pairs in formats like JSONL.

    Transfer Learning

    Techniques

    A machine learning technique where a model trained on one task is adapted for a different but related task, leveraging previously learned representations.

    Transformer

    ML Fundamentals

    The neural network architecture that underlies virtually all modern large language models, using self-attention mechanisms to process sequences in parallel.

    Vector Database

    Infrastructure

    A specialized database optimized for storing, indexing, and querying high-dimensional vector embeddings used in similarity search and retrieval-augmented generation.

    Vibe Coding

    Tools & Frameworks

    A development approach where developers use AI-assisted coding tools like Cursor, Bolt.new, and Replit to build applications through natural language prompts and iterative AI collaboration rather than writing every line manually.

    Weight

    ML Fundamentals

    A numerical parameter in a neural network that is learned during training and determines how the model transforms input data into output predictions.

    White-Label AI

    Tools & Frameworks

    AI products or services that are developed by one company and rebranded by another to appear as their own, allowing agencies and resellers to offer custom AI solutions without building models from scratch.

    Zero-Shot Learning

    Techniques

    The ability of a model to perform a task it was never explicitly trained on, using only natural language instructions without any demonstration examples.