Key terms and concepts explained simply.

Active Learning
A machine learning approach where the model selectively queries a human annotator to label the most informative examples, maximizing learning efficiency per labeled sample.
Adapter
A small set of trainable parameters inserted into a frozen pre-trained model, enabling efficient fine-tuning without modifying the original model weights.

Agent Swarm
A multi-agent orchestration pattern where a coordinator agent dispatches work across many parallel sub-agents, then aggregates their results — popularized in 2026 by Kimi K2.6's Agent Swarm runtime, which scales to 300 sub-agents over 4,000 reasoning steps.

Agentic AI
A design paradigm where AI systems autonomously plan, reason, use tools, and execute multi-step workflows — going beyond single-turn question answering to sustained, goal-directed behavior.

Agentic Coding
Software engineering performed by AI agents that plan multi-file changes, execute them across a codebase, and iterate based on test or build feedback — measured by benchmarks like SWE-Bench Verified and SWE-Bench Pro.

AI Agent
An autonomous software system that uses a large language model to perceive its environment, make decisions, and take actions to achieve goals — often with access to tools like file systems, APIs, browsers, and messaging platforms.
Annotation
The process of adding structured metadata, labels, or tags to raw data by human annotators or automated systems to create training datasets for supervised learning.

Attention
A mechanism in transformer models that allows each token to dynamically weigh and focus on the most relevant parts of the input sequence when computing its representation.

AWQ (Activation-aware Weight Quantization)
A 4-bit quantization method that protects salient weights based on activation magnitude, producing higher-quality compressed models than naive quantization at the same bit-width.
Base Model
A pre-trained foundation model that has been trained on a large general-purpose corpus and serves as the starting point for fine-tuning on domain-specific tasks.

Batch Size
The number of training examples processed simultaneously in one forward-backward pass during model training, affecting memory usage, training speed, and convergence behavior.

Benchmark
A standardized test suite with defined tasks and metrics used to evaluate and compare language model performance across different models and configurations.

BLEU Score
A metric that evaluates the quality of machine-generated text by measuring n-gram overlap between the generated output and one or more human reference texts.
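The n-gram overlap at the heart of this metric can be sketched in a few lines. The snippet below computes clipped n-gram precision for a single candidate/reference pair; full BLEU additionally combines several n-gram orders and a brevity penalty, which are omitted here:

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also appear in the reference,
    with each n-gram's count clipped to its count in the reference."""
    cand = [tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1)]
    if not cand:
        return 0.0
    ref_counts = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    cand_counts = Counter(cand)
    matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return matched / len(cand)

candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
unigram_p = ngram_precision(candidate, reference, 1)  # 5 of 6 unigrams match
bigram_p = ngram_precision(candidate, reference, 2)   # 3 of 5 bigrams match
```

Clipping prevents a candidate from scoring highly by repeating a reference word many times.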
Catastrophic Forgetting
A phenomenon where a neural network loses previously learned knowledge when fine-tuned on new data, degrading performance on tasks it previously handled well.

Chat Template
A formatting structure that defines how conversational messages (system, user, assistant) are tokenized and arranged as input to a language model.
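As an illustration, a minimal renderer for a ChatML-style template might look like the sketch below. The special tokens shown are one common convention, not a universal standard — each model family defines its own template, and in practice you should use the template shipped with the model:

```python
def render_chat(messages):
    """Render a list of {"role": ..., "content": ...} messages into a
    ChatML-style prompt string (illustrative format only)."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>")
    # Leave the assistant turn open so the model generates its reply from here
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = render_chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi!"},
])
```

Using the wrong template for a fine-tuned model is a common cause of degraded output quality.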
Checkpoint
A saved snapshot of a model's weights and training state at a specific point during training, enabling recovery, evaluation, and selection of the best-performing version.

CodeAct
An AI agent architecture where the LLM writes and executes Python (or another language) code as its primary action format, rather than choosing from a fixed list of tools via JSON function calls — popularized by Hugging Face's smolagents framework.

Context Window
The maximum number of tokens a language model can process in a single input-output sequence, determining how much text the model can 'see' at once.
Data Augmentation
A set of techniques for artificially increasing the size and diversity of a training dataset by creating modified copies of existing data points.

Data Deduplication
The process of identifying and removing duplicate or near-duplicate entries from a dataset to prevent memorization artifacts and improve training efficiency.
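A minimal exact-match deduplicator hashes a normalized form of each record; near-duplicate detection needs techniques like MinHash or SimHash, which are beyond this sketch:

```python
import hashlib

def dedupe(records):
    """Drop exact duplicates (after lowercasing and collapsing whitespace)
    by hashing each record's normalized form. Keeps first occurrence."""
    seen = set()
    unique = []
    for text in records:
        normalized = " ".join(text.lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique
```

Hashing keeps memory bounded by the number of unique records rather than total text size.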
Data Labeling
The process of assigning meaningful tags, categories, or annotations to raw data so that machine learning models can learn from structured examples.

Data Lineage
The practice of tracking data from its origin through every transformation, processing step, and usage in model training to maintain a complete audit trail.

Data Versioning
The practice of tracking and managing different versions of datasets over time, enabling reproducibility, rollback, and auditability in machine learning workflows.
DeepSeek Sparse Attention
A learned sparse attention mechanism introduced in DeepSeek-V3.2 and continued in V4 that routes each query token to a subset of key tokens rather than attending to all of them, dramatically reducing the compute cost of long-context inference.

Domain Adaptation
The process of adjusting a model trained on general data to perform well on a specific domain, such as healthcare, legal, or finance.

DPO (Direct Preference Optimization)
A simpler alternative to RLHF that directly optimizes a language model on human preference data without requiring a separate reward model or reinforcement learning.

Edge Inference
Running AI model inference locally on end-user devices or edge servers rather than in centralized cloud data centers, enabling offline operation and data privacy.

Effective Context Window
The portion of a model's advertised context window over which it actually retains high retrieval accuracy — typically substantially shorter than the advertised limit, with mid-context information loss running 10-25% on most current models.
Embedding
A dense vector representation of a token, word, or passage in a continuous mathematical space where semantic similarity corresponds to geometric proximity.
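Geometric proximity between embeddings is most often measured with cosine similarity, which can be sketched as:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors:
    1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Real embedding vectors typically have hundreds or thousands of dimensions; the arithmetic is identical.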
Epoch
One complete pass through the entire training dataset during the model fine-tuning process.

Few-Shot Learning
A technique where a model learns to perform a task from only a handful of labeled examples, typically provided as demonstrations within the prompt.

Fine-Tuning
The process of taking a pre-trained AI model and further training it on a smaller, domain-specific dataset to specialize its capabilities for a particular task or industry.

Function Calling
A capability that allows language models to generate structured function invocations with appropriate arguments, enabling them to interact with external tools and APIs.
GEPA (Generalized Experience-based Procedural Acquisition)
A self-improvement mechanism for AI agents that creates reusable skills from successful task completions and refines them through use, popularized by Nous Research's Hermes Agent framework.

GGUF
A binary file format designed for storing quantized large language models, optimized for fast loading and efficient CPU and GPU inference via llama.cpp and compatible runtimes.

GPTQ
A post-training 4-bit weight quantization method that uses second-order information from a calibration dataset to minimize quantization error layer by layer, producing higher-quality compressed models than naive quantization.

GPU VRAM
The dedicated high-bandwidth memory on a graphics processing unit that stores model weights, activations, and gradients during training and inference.

Gradient Accumulation
A training technique that simulates larger batch sizes by accumulating gradients over multiple forward passes before performing a single weight update.
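A toy sketch for a one-parameter linear model shows why accumulation works: summing gradients over micro-batches before a single update gives the same result as one step on the combined batch, so a memory-limited GPU can mimic a larger batch:

```python
def accumulated_step(w, micro_batches, lr):
    """One optimizer step for a 1-D linear model y = w*x with squared-error
    loss, accumulating gradients over several micro-batches before the
    single weight update at the end."""
    grad_sum, n = 0.0, 0
    for batch in micro_batches:               # one forward/backward per micro-batch
        for x, y in batch:
            grad_sum += 2 * (w * x - y) * x   # d/dw of (w*x - y)^2
            n += 1
    return w - lr * (grad_sum / n)            # single update with the mean gradient
```

Whether the data arrives as one batch of four or two micro-batches of two, the resulting weight is identical.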
Guardrails
Safety mechanisms and filters applied to LLM inputs and outputs to prevent harmful, off-topic, or policy-violating content from reaching users.

Hallucination
When a language model generates plausible-sounding but factually incorrect, fabricated, or unsupported information that is not grounded in its training data or provided context.

Hybrid Reasoning
A model architecture pattern that integrates extended chain-of-thought reasoning into a standard chat checkpoint, with a runtime control to toggle between fast direct responses and slower deliberative reasoning — replacing the older pattern of separate reasoning-only models.

Hyperparameter
A configuration value set before training begins that controls the learning process itself, as opposed to model parameters which are learned during training.

Inference
The process of running a trained AI model to generate predictions or outputs from new input data, as opposed to the training phase where the model learns from data.
Instruction Tuning
A fine-tuning approach where a language model is trained on instruction-response pairs to follow natural language directions and produce task-specific outputs.

JSONL
A text-based data format where each line is a valid JSON object, widely used for structuring fine-tuning datasets, logging, and streaming data pipelines in AI/ML workflows.
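For example, fine-tuning records can be round-tripped through the format as below (shown on in-memory strings; the `prompt`/`completion` field names are just one common layout, not a requirement of JSONL itself):

```python
import json

def to_jsonl(records):
    """Serialize records as JSONL: one compact JSON object per line."""
    return "".join(json.dumps(rec, ensure_ascii=False) + "\n" for rec in records)

def from_jsonl(text):
    """Parse JSONL text back into a list of records, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

records = [
    {"prompt": "Hi", "completion": "Hello!"},
    {"prompt": "2+2?", "completion": "4"},
]
serialized = to_jsonl(records)
```

Because each line is independent, JSONL files can be streamed, appended to, and split without parsing the whole file.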
Knowledge Distillation
A model compression technique where a smaller 'student' model is trained to replicate the behavior of a larger, more capable 'teacher' model.

KV Cache
A memory buffer that stores previously computed key and value tensors from the attention mechanism, avoiding redundant computation during autoregressive text generation.

Learning Rate
A hyperparameter that controls how much the model's weights are adjusted in response to each batch of training data, directly influencing training speed and stability.
LoRA (Low-Rank Adaptation)
A parameter-efficient fine-tuning technique that injects small, trainable low-rank matrices into a frozen pre-trained model, dramatically reducing the memory and compute needed to adapt large language models.

MCP (Model Context Protocol)
An open protocol introduced by Anthropic for connecting AI assistants to external data sources, tools, and systems — providing a standard interface that any model client can use to interact with any MCP-compatible server.

Mixture of Experts (MoE)
A neural network architecture that routes each input to a subset of specialized sub-networks (experts), enabling larger model capacity without proportionally increasing compute cost.

MLOps
A set of practices combining machine learning, DevOps, and data engineering to reliably deploy, monitor, and maintain ML models in production environments.

Model Card
A standardized documentation artifact that describes a machine learning model's intended uses, performance metrics, limitations, ethical considerations, and training data provenance.
Model Distillation
A technique for transferring knowledge from a large, capable 'teacher' model to a smaller, faster 'student' model, producing compact models that approach the teacher's performance on specific tasks at a fraction of the inference cost.

Model Evaluation
The systematic process of measuring a language model's performance using quantitative metrics, qualitative assessments, and domain-specific benchmarks.

Model Merging
The technique of combining the weights of two or more fine-tuned models into a single model that inherits capabilities from all source models.

Model Routing
Directing AI inference requests to different models or adapters based on request properties like task type, client identity, complexity, or cost constraints — enabling efficient multi-model deployments.

Multi-Tenant Serving
Serving multiple clients or tenants from a single model deployment using per-tenant LoRA adapters, reducing infrastructure costs by sharing the base model while delivering customized AI behavior per tenant.
ONNX
An open standard format for representing machine learning models, enabling interoperability between different training frameworks and inference runtimes.

Overfitting
A training failure mode where the model memorizes the specific examples in its training data rather than learning generalizable patterns, causing poor performance on unseen inputs.

Parameter
A learnable value in a neural network — including weights and biases — that the model adjusts during training to minimize prediction error.

Perplexity
A metric that measures how well a language model predicts a text sequence, with lower values indicating better prediction and more fluent language understanding.
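Concretely, perplexity is the exponential of the mean negative log-probability the model assigns to each token — a model that gives every token probability 1/k has perplexity exactly k:

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities:
    exp of the average negative log-probability."""
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)
```

So a perplexity of 4 means the model is, on average, as uncertain as if it were choosing uniformly among 4 tokens at each step.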
PII Redaction
The process of detecting and removing or masking personally identifiable information from datasets to protect individual privacy before using data for model training.

Prompt Engineering
The practice of designing and iterating on input prompts to elicit desired outputs from large language models without modifying the model's weights.

Prompt Template
A structured format with placeholders that defines how user inputs, context, and instructions are assembled into a complete prompt for a language model.

QLoRA (Quantized Low-Rank Adaptation)
A fine-tuning technique that combines 4-bit quantization with LoRA adapters, enabling large language models to be fine-tuned on a single consumer GPU.

Quantization
The process of reducing the numerical precision of a model's weights (e.g., from FP16 to INT8 or INT4) to shrink its memory footprint and accelerate inference without drastically sacrificing accuracy.
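A minimal sketch of symmetric per-tensor INT8 quantization is below; production methods such as GPTQ and AWQ are considerably more sophisticated (per-channel scales, calibration data, salient-weight protection), but the core map-to-integers idea is the same:

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: map floats onto integers
    in [-127, 127] using a single scale factor."""
    scale = max(abs(w) for w in weights) / 127
    if scale == 0:
        scale = 1.0  # all-zero tensor; any scale works
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]
```

Each weight is now stored in 1 byte instead of 2 or 4, at the cost of a rounding error of at most half a quantization step.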
Red Teaming
The practice of systematically probing an AI system with adversarial inputs to discover vulnerabilities, failure modes, and safety gaps before deployment.

Retrieval-Augmented Generation (RAG)
An architecture that enhances LLM responses by retrieving relevant documents from an external knowledge base and including them as context in the prompt.
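A toy end-to-end sketch of the pattern — here retrieval is simple word overlap standing in for embedding similarity, and the prompt layout is illustrative, not a standard:

```python
def retrieve(query, documents, k=2):
    """Rank documents by word overlap with the query (a crude stand-in
    for embedding similarity) and return the top k."""
    def words(text):
        return {w.strip(".,?!").lower() for w in text.split()}
    q = words(query)
    ranked = sorted(documents, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents):
    """Assemble retrieved passages as grounding context ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context above."

docs = [
    "Paris is the capital of France.",
    "The mitochondria is the powerhouse of the cell.",
    "France borders Spain.",
]
```

Production systems replace the overlap scorer with a vector database over embeddings, but the retrieve-then-prompt flow is the same.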
RLHF (Reinforcement Learning from Human Feedback)
A training technique that uses human preference judgments to fine-tune language models, aligning their outputs with human values and expectations.

Safetensors
A secure, fast, and memory-efficient file format for storing neural network weights, designed as a safer alternative to Python pickle-based formats.

Speculative Decoding
An inference acceleration technique that uses a small, fast draft model to propose multiple tokens at once, which the larger target model verifies in parallel.

Structured Output
The capability of a language model to generate responses in a specific, machine-parsable format such as JSON, XML, or YAML that conforms to a predefined schema.
Synthetic Data
Artificially generated training data created using frontier models, rule-based systems, or data augmentation techniques to supplement or replace real-world data for fine-tuning ML models.

System Prompt
A special instruction provided at the beginning of a conversation that defines the model's behavior, persona, constraints, and response format.

Temperature
A sampling parameter that controls the randomness of a language model's output — lower values produce more deterministic responses, higher values increase creativity and variety.
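Temperature works by rescaling logits before the softmax, as in this sketch — low temperature sharpens the distribution toward the top token, high temperature flattens it toward uniform:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to a probability distribution, dividing by the
    temperature first. Subtracting the max keeps exp() numerically stable."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

At temperature 0 (usually implemented as argmax rather than division), sampling becomes fully deterministic.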
TensorRT
NVIDIA's high-performance deep learning inference optimizer and runtime that maximizes throughput and minimizes latency on NVIDIA GPUs.

Token
The fundamental unit of text that a language model processes — typically a word, subword, or character that maps to an integer ID in the model's vocabulary.

Tokenizer
The component that converts raw text into a sequence of numerical tokens that a language model can process, and vice versa.

Tool Calling
The ability of an LLM to invoke external functions, APIs, or tools as part of its response generation — implemented through structured function-call schemas that the model produces and a runtime executes, foundational to all modern agent architectures.
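A minimal runtime loop can be sketched as follows: the model emits a JSON function call, and a dispatcher executes it against a registry of tools. The tool names and the call shape here are made up for illustration; real APIs define their own schemas and add argument validation:

```python
import json

# Toy tool registry; a real runtime registers many tools with typed schemas
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
    "add": lambda a, b: a + b,
}

def execute_tool_call(call_json):
    """Execute a model-emitted call of the form
    {"name": "...", "arguments": {...}} and return the tool's result."""
    call = json.loads(call_json)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])
```

The result is then appended to the conversation so the model can use it in its next turn.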
Top-p (Nucleus) Sampling
A sampling strategy that selects from the smallest set of tokens whose cumulative probability exceeds a threshold p, balancing output quality with diversity.
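The selection step can be sketched as follows — rank tokens by probability, keep the smallest prefix whose cumulative mass reaches p, and renormalize before sampling:

```python
def top_p_filter(probs, p=0.9):
    """Return (token_index, renormalized_probability) pairs for the
    smallest set of highest-probability tokens whose cumulative
    probability reaches p (the 'nucleus')."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for idx, prob in ranked:
        kept.append((idx, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(pr for _, pr in kept)
    return [(idx, pr / total) for idx, pr in kept]
```

Unlike a fixed top-k cutoff, the nucleus grows when the model is uncertain and shrinks when one token dominates.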
Training Data
The curated dataset of examples used to fine-tune a machine learning model, typically formatted as structured input-output pairs in formats like JSONL.

Transfer Learning
A machine learning technique where a model trained on one task is adapted for a different but related task, leveraging previously learned representations.
Transformer
The neural network architecture that underlies virtually all modern large language models, using self-attention mechanisms to process sequences in parallel.

Vector Database
A specialized database optimized for storing, indexing, and querying high-dimensional vector embeddings used in similarity search and retrieval-augmented generation.

Vibe Coding
A development approach where developers use AI-assisted coding tools like Cursor, Bolt.new, and Replit to build applications through natural language prompts and iterative AI collaboration rather than writing every line manually.

Weight
A numerical parameter in a neural network that is learned during training and determines how the model transforms input data into output predictions.

White-Label AI
AI products or services that are developed by one company and rebranded by another to appear as their own, allowing agencies and resellers to offer custom AI solutions without building models from scratch.

Zero-Shot Learning
The ability of a model to perform a task it was never explicitly trained on, using only natural language instructions without any demonstration examples.