Technical terms explained for non-experts. These definitions appear throughout ThinkLLM to help you understand model profiles.
3559 terms
A model design where weights are restricted to only three discrete values (-1, 0, or 1) instead of continuous floating-point numbers, drastically reducing model size and computation.
A neural network where each weight takes one of only three values (-1, 0, or 1), which amounts to about 1.58 bits of information per weight rather than exactly 1 bit.
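To make the ternary-weight idea in the entries above concrete, here is a minimal sketch. The function name `ternarize` and the thresholding rule (half the mean absolute weight) are illustrative assumptions, not the method of any particular model. The last line shows why three possible values correspond to roughly 1.58 bits per weight.

```python
import math

def ternarize(weights, threshold_ratio=0.5):
    """Map each weight to -1, 0, or 1 and return (codes, scale).

    The threshold rule is a hypothetical choice for illustration:
    weights smaller than threshold_ratio * mean(|w|) become 0.
    """
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    threshold = threshold_ratio * mean_abs
    codes = [0 if abs(w) < threshold else (1 if w > 0 else -1) for w in weights]
    return codes, mean_abs

weights = [0.8, -0.05, -0.6, 0.02, 0.4]
codes, scale = ternarize(weights)

# Information needed to pick one of three values: log2(3) bits.
bits_per_weight = math.log2(3)
print(codes, round(bits_per_weight, 3))
```

Multiplying each code by `scale` gives a rough approximation of the original weights, which is where the loss in precision comes from.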
An extreme form of quantization where each weight is represented by just a single bit (0 or 1), maximizing compression but reducing model expressiveness.
An extreme form of compression that represents model weights using only 1 bit of information per value, drastically reducing memory use but with significant quality loss.
A data format that represents model weights using 16 bits per number, balancing memory efficiency with numerical accuracy.
Mathematical shapes (Gaussian distributions) positioned in 3D space used to represent and render 3D scenes efficiently.
Guiding AI model outputs by conditioning on 3D spatial layout information.
Building a complete 3D model of a physical environment from images or sensor data.
Comprehending the three-dimensional structure, objects, and relationships within a physical environment.
A specific quantization method that represents model weights using only 4 bits per number instead of the standard 32 bits, dramatically reducing memory usage.
A quantization level where model weights are stored using only 4 bits per value, significantly reducing model size at the cost of some accuracy.
A specific type of quantization that represents model weights using only 4 bits instead of the original 32 bits, enabling very efficient inference on consumer hardware.
A quantization method that represents model weights using only 6 bits per value, significantly reducing memory requirements compared to standard 32-bit floating-point storage.
A specific quantization method that represents model weights using 6 bits instead of the standard 32 bits, significantly shrinking the model while maintaining reasonable accuracy.
A quantization method that represents model weights using 8 bits instead of the standard 32 bits, reducing memory usage by approximately 75% while maintaining reasonable performance.
A specific quantization method that represents model weights using 8 bits instead of the standard 32 bits, significantly reducing memory requirements.
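The quantization entries above (4-bit, 6-bit, 8-bit) all trade precision for memory. The sketch below shows one common scheme, symmetric "absmax" rounding to 8-bit integers, plus the size arithmetic behind the "approximately 75%" figure. The 7-billion-parameter model is a hypothetical example, and real quantization formats usually add per-block scales and other metadata that this sketch ignores.

```python
def quantize_int8(weights):
    """Symmetric absmax quantization to signed 8-bit integers (-127..127)."""
    # Fall back to a scale of 1.0 if every weight is zero.
    scale = (max(abs(w) for w in weights) / 127) or 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate floating-point weights from integer codes."""
    return [c * scale for c in codes]

def model_size_gb(num_params, bits_per_weight):
    """Storage for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

weights = [0.5, -1.27, 0.01, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Hypothetical 7-billion-parameter model at different precisions.
size_32 = model_size_gb(7_000_000_000, 32)
size_8 = model_size_gb(7_000_000_000, 8)
size_4 = model_size_gb(7_000_000_000, 4)
saving_8bit = 1 - size_8 / size_32
print(codes, size_32, size_8, size_4, saving_8bit)
```

Each restored weight differs from the original by at most half of `scale`, which is the rounding error that quantization introduces.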
A model variant where safety filters and refusal mechanisms have been removed, allowing it to respond to requests without built-in content restrictions.
A technique that removes or disables a model's built-in safety refusal mechanisms, allowing it to respond to a wider range of requests.
Identifying and highlighting the specific regions in medical images where disease or abnormalities are present.
Visual maps showing which regions of a medical image are abnormal, derived from comparing to historical cases.
A measure of relevance between a query and key that is independent of other keys, allowing explicit rejection of irrelevant keys.
When a system declines to make a prediction or recommendation instead of providing an answer.
A tree representation of code structure that shows how statements and expressions relate to each other.
Generating a summary by creating new sentences that capture key information, rather than selecting existing text.
The proportion of the draft model's proposed tokens that the target model accepts as correct during speculative decoding.
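As a rough illustration of the acceptance rate described above, the sketch below assumes a simplified greedy check: the target model accepts draft tokens one by one until the first disagreement. Real speculative decoding typically uses a probabilistic acceptance test, and the token lists here are made up.

```python
def accepted_prefix_length(draft_tokens, target_tokens):
    """Count draft tokens accepted before the first mismatch."""
    accepted = 0
    for draft, target in zip(draft_tokens, target_tokens):
        if draft != target:
            break
        accepted += 1
    return accepted

def acceptance_rate(rounds):
    """Accepted draft tokens divided by proposed draft tokens."""
    proposed = sum(len(draft) for draft, _ in rounds)
    accepted = sum(accepted_prefix_length(d, t) for d, t in rounds)
    return accepted / proposed

# Each round: (tokens proposed by the draft, tokens the target would choose).
rounds = [
    (["the", "cat", "sat", "down"], ["the", "cat", "sat", "on"]),   # 3 of 4 accepted
    (["on", "a", "mat", "today"], ["on", "the", "mat", "today"]),   # 1 of 4 accepted
]
rate = acceptance_rate(rounds)
print(rate)
```

A higher rate means the draft model saves more work, because more of its guesses are kept.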
Designing technology so people with disabilities can use it effectively.
Determining which party in a system is responsible for harms or failures.
A measure of how well an agent performs relative to the computational cost or number of steps it takes.
An internal mathematical encoding of sound properties that a model learns to recognize, such as frequency, pitch, and timbre characteristics.
A rule that decides which point to evaluate next by balancing exploration of new areas with exploitation of promising regions.
The problem of correctly associating a specific action command with the correct agent or subject in a scene.
A failure mode where agents make poor action choices that lead to uninformative observations, cascading into reasoning errors.
The task of identifying and classifying specific actions or activities occurring in video frames.
Creating videos where specific physical actions (like forces or robot movements) control what happens in the scene.
Simulating multiple future steps of an environment given a sequence of actions the agent might take.
A learned encoding of an object that explicitly captures how it responds to and changes under different actions.
The portion of a model's total parameters that are actually used to process a given input; in MoE models, this is typically much smaller than the total parameter count.
The probability distribution of neuron outputs at each layer of a network.
Random variations added to a model's internal computations to test robustness.
A mechanistic interpretability technique that replaces activations during inference to identify which components cause specific behaviors.
The specific configuration of which neurons are active across a network when processing a particular input or task.
The number of bits used to represent intermediate calculations during inference; keeping this higher (like 16-bit) helps preserve model quality when weights are heavily compressed.
Analyzing internal neural network activations to understand what a model has learned or decided at different points.
The process of reducing the precision of intermediate values (activations) computed during model inference, separate from weight quantization.
Controlling model behavior by modifying internal activations during inference without changing model weights.
Bypassing AI safety features by manipulating the internal numerical patterns the model uses to process information.
A training approach where the model chooses which new examples to learn from rather than using random data.
The number of model parameters that are actually used during inference for a given input, as opposed to the total parameters available.
A model architecture where only a subset of parameters are used for each token, reducing computational cost while maintaining model capacity.
The subset of a model's total parameters that are actually used during inference for each input, as opposed to all parameters being used every time.
A mathematical constraint ensuring a causal graph has no cycles, enforcing valid causal structures.
A standard optimization algorithm commonly used to train neural networks by adjusting weights based on gradients.
A small, specialized module added to a model that modifies its output for a specific task without changing the core model weights.
Custom code written to translate data between incompatible formats or interfaces.
Adding lightweight modules to a pre-trained model to enable new capabilities without retraining the entire model.
An attack that adjusts its strategy based on feedback from the target system to improve its effectiveness.
Educational systems that adjust content difficulty and pacing based on real-time analysis of learner performance and understanding.
A system that dynamically adjusts parameters (like reward weights) based on the current task or input.
Dynamically selecting or modifying prompts based on the specific input query to optimize model performance.
A quantization approach that adjusts its representation strategy based on the distribution of input values.
Dynamically adjusting how much computational effort a model uses based on problem difficulty.
An optimization algorithm that splits a problem into smaller subproblems, which are solved in alternation.
Intentional manipulation of input data to trick an AI model into making wrong decisions.
Systematically testing an agent's reasoning to find logical or evidential violations it may have missed.
A training loop where attack and defense agents compete and improve against each other iteratively.
Testing designed to find weaknesses and edge cases rather than help the system succeed.
Deliberately tricky test cases designed to fool AI models, like plausible wrong answers.
Systematically searching for inputs where a model fails, used here to find materials where ML predictions diverge from ground truth.
Training where two networks compete—one generates behavior, the other judges if it matches the expert.
A process where one agent intentionally creates challenging test cases to improve another agent's output.
Training approach where a generator and discriminator compete to improve output quality and realism.
Carefully crafted, often imperceptible changes added to images to fool AI models into producing incorrect outputs.
Deliberately crafted inputs designed to trick an LLM into unsafe or unreliable outputs.
The ability of an AI system to maintain correct behavior even when facing intentionally crafted misleading inputs.
A defense method that trains models on adversarial examples to improve robustness against attacks.
A defense mechanism that protects models from attacks without requiring exposure to adversarial examples during training.
Evaluating the visual appeal and artistic qualities of images or scenes, such as composition and harmony.
Linking emotional or sentiment states between connected entities in a system.
The emotional tone of text, measured as the degree of negativity, positivity, or neutrality in language.
Predicting which areas or objects in a scene are suitable for a specific action or interaction.
An AI system's ability to act autonomously toward goals in its environment.
The degree to which an agent retains independent decision-making capability without external manipulation.
The framework or system that orchestrates how an AI agent retrieves information, calls tools, and processes results.
Coordinating multiple AI agents to work together on complex tasks.
Automatically selecting the most suitable agent(s) for a task from available registries using matching and ranking techniques.
A specific capability or tool that an AI agent can use to accomplish part of a larger task.
The sequence of actions and decisions an agent makes while working toward a goal.
A simulation where independent agents follow simple rules and interact, creating emergent group behavior.
A model designed to act autonomously by making decisions, selecting actions, and using tools to accomplish multi-step tasks.
An AI system that can autonomously plan and execute multi-step tasks, making decisions along the way.
The ability of a model to autonomously plan and execute sequences of actions or tool calls to accomplish a goal.
An approach where an AI model autonomously plans and executes multi-step coding tasks, making decisions about which files to modify and how to structure solutions.
Sequential overhead from cascaded perception, reasoning, and tool-calling loops in agentic systems.
Designing and building systems where AI agents autonomously plan, decide, and act toward goals.
Testing an AI system's ability to complete multi-step tasks that require planning, searching, and taking actions.
A system where an AI model acts as an agent that can call tools repeatedly to solve problems step-by-step, rather than answering in a single pass.
A structured language with explicit control constructs (IF, GOTO, FORALL) that agents use to execute plans deterministically.
The study of local interaction dynamics where one agent's output becomes another agent's input under specific protocol conditions.
AI systems that can process multiple types of input (text, images, etc.) and actively interact with external tools and environments.
Vision systems that extract structured state information needed for an agent to make decisions, not just recognize objects.
Reasoning through explicit tool calls or code execution that can be interpreted and debugged, but may incur latency from external execution.
Training autonomous agents to make sequential decisions by learning from rewards and reusable experience.
A training approach where an AI model learns to make sequential decisions and take autonomous actions to complete multi-step tasks, rather than just responding to individual prompts.
AI systems that iteratively search and synthesize information to solve complex problems autonomously.
An AI agent's ability to detect and fix its own errors by using tools or feedback without human intervention.
Structured approaches where an AI system takes initiative to gather information systematically rather than passively responding to user input.
AI systems that autonomously plan, act, and adapt based on feedback to accomplish multi-step goals in complex environments.
Complex tasks where a model acts autonomously to break down goals into steps, use tools, and make decisions to reach an objective.
Processes where a model autonomously plans and executes multiple steps or tool calls to accomplish a goal, rather than responding to a single prompt.
Combining multiple data points or model outputs into a single summary result.
Interconnected systems where multiple AI components interact through shared data and infrastructure.
Randomness or noise inherent in data that cannot be reduced with more information.
Systematic errors in AI systems that unfairly disadvantage certain groups of people.
Ensuring AI systems treat different groups equitably without discrimination.
Tendency of AI systems to produce similar outputs or behaviors, either naturally or in response to incentives.
A technique that helps the model understand the order and position of words in long sequences without needing to add extra position information to each word.
A model trained to behave safely and follow human values through techniques like safety filtering and refusal of harmful requests.
The process of training a model to behave safely and according to human values and preferences, which base models typically lack.
When an AI model appears aligned while it is being monitored but pursues different goals when unmonitored.
The process of adjusting a model's behavior to make it safer, more helpful, and better aligned with human values.
Safety constraints built into a model during training to prevent it from generating harmful, biased, or inappropriate content.
Additional training applied to a base model to make it behave safely and follow user intentions more reliably.
A guarantee that higher bids weakly increase an item's chance of being recommended without requiring model retraining.
An early, experimental version of software that is still under development and may have bugs or incomplete features.
Errors caused by confusion between similar or overlapping UI elements when determining which one to interact with.
The linear chain of amino acids that makes up a protein, which determines its structure and function.
The linear arrangement of amino acids that make up a protein, written as a string of letters where each letter represents a different amino acid.
Spreading the cost of an expensive computation across multiple uses to reduce per-use cost.
An attention pattern that restricts a model to only attend to ancestor nodes in a tree structure, enabling efficient tree verification.
Choosing a reference model to compare all other models against in pairwise evaluation tasks.
Bias where initial information disproportionately influences subsequent decisions.
Methods for combining multiple human judgments into a single training signal for the model.
A structured set of guidelines for labeling data with specific linguistic or semantic information.
A systematic process for labeling data with human-verified information to create training datasets.
Variation in how different people label the same content, reflecting genuine differences in perspective rather than labeling error.
The negative electrode in a battery where ions are stored during charging.
Identifying data points or objects that deviate significantly from normal patterns or training data.
A declarative programming paradigm for solving combinatorial problems using logical rules and constraints.
An open-source software license that allows free use, modification, and distribution of code with minimal restrictions.
A permissive open-source license that allows free use, modification, and distribution of software with minimal restrictions.
A permissive open-source license that allows you to use, modify, and distribute software with minimal restrictions.
An interface that allows developers to send requests to and receive responses from an AI model over the internet.
A programmatic interface that allows developers to send requests to the model and receive responses without running it locally.
The ability to access and use a model programmatically through an application programming interface, allowing developers to integrate it into their applications.
A model that can be used through an application programming interface, allowing developers to integrate it into their applications programmatically.
Access to a model through an application programming interface, allowing developers to integrate the model into their applications and services programmatically.
The ability of a service to work with the same code and commands as another service, making it easy to switch between them.
A method of making an AI model available for use over the internet through standardized web requests, rather than running it locally.
Running a model through a web service interface where you send requests and receive predictions without needing to host the model yourself.
A specification describing how a backend service accepts requests and returns data.
A credential that grants an application or automated agent permission to access services and data on behalf of a user or organization.
A model served through an application programming interface (API) rather than run locally, allowing users to send requests and receive responses over the network.
A model that can only be used through programmatic requests (code) rather than through a web interface or chat application.
A data structure that records events sequentially and does not allow existing entries to be deleted.
Apple's custom-designed processors (like M1, M2, M3) for Mac computers, which can run machine learning models efficiently.
Software tuning that allows a model to run efficiently on Apple's custom processors (like M1, M2, M3) found in Mac computers.
A measure of how close a solution is to the optimal solution, expressed as a ratio.
Mathematical framework for understanding how well functions can represent complex phenomena.
The underlying structural design of a neural network that defines how data flows through layers and components.
A mathematical representation of a computation as a directed graph of arithmetic operations.
A model's ability to perform mathematical calculations and solve problems involving numbers and operations.
The intensity or activation level of an emotion, ranging from calm to excited.
A machine learning model inspired by biological neurons that learns patterns from data to make predictions or classifications.
The task of identifying or classifying the artistic style of a work (e.g., Renaissance, Impressionism) using AI.
Training data generated by breaking down queries into multiple aspects and creating complementary evidence examples.
A system that retrieves stored patterns by establishing stable attractors around them, like Hopfield networks.
The ability to find meaningful connections and relationships between different concepts or ideas.
A technique where queries and documents are encoded differently to optimize retrieval performance, rather than treating them identically.
A retrieval approach where the query and the documents being searched have different lengths or structures, like matching a short question to long passages.
A neural network approach that correctly captures physics behavior across different scales and parameter regimes.
A mechanism that lets the model focus on relevant parts of the input when generating each output token.
A parallel attention mechanism within a transformer layer that learns different aspects of input relationships.
Visual representations showing which parts of an input a model focuses on when generating each output.
A technique that allows a model to focus on the most relevant parts of the input when generating each output token.
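The attention entries above can be illustrated with scaled dot-product attention, the form used in transformers. This is a minimal pure-Python sketch for a single query; the vectors are invented, and real models work on large batches of learned vectors.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    dim = len(query)
    # Similarity of the query to each key, scaled by sqrt(dimension).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys]
    weights = softmax(scores)
    # Output is the weighted average of the value vectors.
    out_dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(out_dim)]
    return output, weights

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
output, weights = attention([2.0, 0.0], keys, values)
print(weights, output)
```

The second key points in a different direction from the query, so it receives the smallest weight: the model "focuses" on the first and third inputs.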
How a model's attention mechanism divides its focus between different input elements like image and text tokens.
A single forward computation through an attention mechanism that produces weighted outputs from input queries, keys, and values.
Aggregating embeddings by learning weighted combinations that emphasize the most relevant slices or features.
A token that attracts excessive attention from the model regardless of its semantic importance.
Tokens that attract disproportionate attention from the model regardless of their semantic relevance to the task.
Techniques that show which parts of input data a model focuses on during processing.
A signal measuring how well a reasoning step is supported by the input and previously accepted steps.
A component that refines embeddings by solving for fixed points using implicit differentiation during training.
Deducing personal characteristics like gender, age, or ethnicity from user data without explicit disclosure.
A technique that identifies which parts of an input (like image regions) are most responsible for a model's predictions or errors.
A technique to identify which neurons are responsible for processing specific types of input by analyzing their contribution to outputs.
When a model maintains high classification performance (as measured by AUC) while its explanations become inconsistent across similar cases.
A system that generates text descriptions of audio content, allowing LLMs to reason about sound indirectly.
The task of automatically assigning audio clips to predefined categories, such as identifying whether a sound is music, speech, or environmental noise.
A tool that compresses and decompresses audio data to reduce file size while preserving sound quality.
Using an audio sample to guide or control what a generative model produces, rather than using text or other inputs.
A numerical representation (vector) that captures the essential features and meaning of audio data in a compact form that machine learning models can process.
Numerical representations of audio that capture its meaning and characteristics in a form that machine learning models can process.
A neural network component that converts raw audio signals into numerical representations the model can process.
The quality and accuracy of synthesized audio in reproducing natural-sounding speech.
The process of converting compressed audio tokens back into playable audio that closely matches the original sound.
Converting spoken audio content into written text for analysis.
A training approach that teaches a model to understand connections between audio sounds and text descriptions by learning from large unlabeled datasets.
The ability to simultaneously analyze sound and video streams to understand content where both sight and sound are important.
The ability to jointly process and reason about both sound and video content to understand events, speech, and context more completely than analyzing either alone.
The process of systematically reviewing code or systems to detect errors, vulnerabilities, or malicious modifications.
An LLM's understanding of sound, audio concepts, and acoustic phenomena learned from text-only pre-training.
An optimization algorithm that solves constrained problems by iteratively updating variables and penalty parameters.
Area Under the Receiver Operating Characteristic curve, a metric measuring how well a model ranks positive examples above negative ones; 0.5 corresponds to random guessing and 1.0 to perfect ranking.
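One way to read the AUC metric above is as the fraction of (positive, negative) pairs that the model puts in the right order. The sketch below computes it that way on made-up labels and scores; libraries normally use a faster but equivalent calculation.

```python
def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
auc = roc_auc(labels, scores)
print(auc)
```

Here three of the four positive/negative pairs are ordered correctly; the one mistake is the negative scored 0.8 outranking the positive scored 0.7.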
The underlying purpose or goal behind a creator's choices, whether to inform accurately or mislead deliberately.
Automatically selecting optimal parameter values for a program by testing different configurations.
A feature that predicts and suggests the next tokens or code snippets as a user types, completing partial inputs.
A neural network that compresses data into a smaller representation (encoder) and reconstructs it (decoder).
Using algorithms to automatically measure AI model performance on tasks.
Techniques that automatically generate patches to fix bugs or vulnerabilities in source code.
Systems that automatically evaluate student code submissions for correctness and understanding.
Using computational methods to automatically check whether a proposed solution is correct without human review.
Computing gradients of functions by decomposing them into elementary operations and applying the chain rule.
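The idea above, breaking a function into elementary operations and applying the chain rule to each, can be sketched with "dual numbers" (forward-mode automatic differentiation). The `Dual` class is a hypothetical teaching example that supports only addition and multiplication; deep learning frameworks usually use the reverse-mode variant.

```python
class Dual:
    """A value paired with its derivative with respect to the input."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def f(x):
    return 3 * x * x + 2 * x + 1

# Seed the input with derivative 1 to get df/dx alongside f(x).
y = f(Dual(2.0, 1.0))
print(y.value, y.deriv)
```

The derivative comes out exactly (6x + 2 at x = 2), not as a numerical approximation, because each operation carries its own derivative rule.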
Technology that converts spoken audio into written text automatically.
The tendency for humans to over-rely on or trust automated systems, even when they make mistakes.
Automated tools that search over multiple model architectures and hyperparameters to find the best classifier without manual tuning.
An AI system that can independently perceive its environment, make decisions, and take actions to accomplish goals without constant human direction.
AI systems that can independently plan and execute multi-step tasks without human intervention at each step.
A system where AI automatically evaluates and improves itself without human intervention in the loop.
A robot independently practicing tasks and generating training data without human guidance or intervention.
A range of control levels from fully human-controlled to fully autonomous AI, with hybrid modes in between.
A model that generates text one token at a time by predicting the next token based on all previous tokens in the sequence.
The standard method most language models use to generate text by predicting one token (word piece) at a time, left to right, where each prediction depends on all previous tokens.
A text generation approach where the model predicts one word at a time, using all previously generated words to inform the next prediction.
A model that generates text by predicting one word or token at a time, using only the words that came before it.
A model that predicts the next item in a sequence based on all previous items, one step at a time.
Language models that generate text one token (word piece) at a time, where each new token depends on all previously generated tokens.
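The autoregressive loop described in the entries above can be sketched with a toy "model" that is just a lookup table. The table and its probabilities are invented, and it looks only at the last token for brevity, whereas a real language model conditions on the entire prefix.

```python
# Hypothetical next-token probabilities, keyed by the most recent token.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "a": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.8, "sat": 0.2},
    "sat": {"<end>": 1.0},
    "ran": {"<end>": 1.0},
}

def next_token_probs(prefix):
    """Stand-in for a language model: returns P(next token | prefix)."""
    return NEXT_TOKEN_PROBS[prefix[-1]]

def generate(max_tokens=10):
    """Greedy autoregressive generation: append the likeliest token, repeat."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        probs = next_token_probs(tokens)
        token = max(probs, key=probs.get)
        if token == "<end>":
            break
        tokens.append(token)  # the new token becomes part of the next input
    return tokens[1:]

generated = generate()
print(generated)
```

Because each step feeds on the tokens chosen before it, an early mistake influences everything generated afterwards.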
Generating predictions sequentially where each prediction depends on previous predictions, causing errors to compound over time.
A generative model that creates videos frame-by-frame sequentially, where each new frame depends on previously generated frames.
Generating a sequence of zoom-level decisions one at a time, where each decision depends on previous ones, to progressively narrow down a location.
An automated quantization method that intelligently rounds weights to lower precision while minimizing the loss in model performance.
Intel's automated quantization method that intelligently rounds model weights to lower precision while minimizing accuracy loss.
The core language model architecture that forms the foundation of a larger system, in this case Llama 3.
The core neural network structure that a model is built upon, which in this case is Llama 3.
A core neural network component that extracts features from input data, typically used as a foundation for larger systems rather than standalone.
A security attack where hidden malicious behavior is embedded in a model to trigger on specific inputs.
A training method for recurrent networks that computes gradients by unrolling the network across time steps.
Testing a model on historical data to evaluate how it would have performed.
Reverting to an earlier decision point when an approach fails, rather than trying to fix errors at the current level.
How learning new tasks affects performance on previously learned tasks.
A metric that averages per-class accuracy (recall) across classes, preventing high scores when one class dominates the predictions.
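A small worked example shows why the class-averaged metric above matters. The data is made up: nine examples of class 0, one of class 1, and a model that always predicts class 0.

```python
def balanced_accuracy(y_true, y_pred):
    """Average, over classes, of the fraction of that class predicted correctly."""
    per_class = []
    for cls in sorted(set(y_true)):
        indices = [i for i, y in enumerate(y_true) if y == cls]
        correct = sum(1 for i in indices if y_pred[i] == cls)
        per_class.append(correct / len(indices))
    return sum(per_class) / len(per_class)

y_true = [0] * 9 + [1]
y_pred = [0] * 10  # always predicts the majority class

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
print(plain, balanced)
```

Plain accuracy looks strong at 0.9, but the class-averaged score is 0.5, because the model gets every class 0 example right and every class 1 example wrong.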
Learning setting where you only observe the outcome of your chosen action, not all alternatives.
A neural network architecture that combines an encoder (which reads text) and a decoder (which generates text), commonly used for tasks like summarization and text generation.
A neural network design that combines an encoder (for understanding text) and decoder (for generating text) to learn meaningful representations.
The foundational neural network design that a model is built upon; inheriting from a base architecture means the model follows the same core structure and design principles.
A foundational AI model trained on raw text data without additional fine-tuning for specific tasks or instructions.
The individual weak models (like decision trees or neural networks) that are combined in an ensemble method.
A pretrained model that completes text patterns but hasn't been trained to follow instructions, serving as a starting point for customization through fine-tuning.
A smaller version of a model architecture that prioritizes speed and lower memory usage over maximum performance, making it suitable for resource-constrained environments.
A model trained only on raw text prediction without additional instruction-following training, so it completes text continuations rather than answering questions or following commands.
A language model trained on raw text data without additional instruction tuning, so it completes text patterns rather than following specific user instructions.
A simple reference model used to compare performance against more complex models or to establish a minimum expected behavior.
A region in a model's state space where inputs converge to the same output or memory.
Simple mathematical shapes (like sine waves or Gaussians) combined to represent complex signals.
Systematic differences in data caused by processing samples in separate groups.
A stable outcome where each agent's strategy is optimal given their private information and beliefs about others' strategies.
Probabilistic methods that estimate hidden states by recursively updating beliefs based on observations and a system model.
A mechanism where participants are motivated to tell the truth about their preferences, given what they know.
A statistical method that updates beliefs about unknown values using observed data and prior knowledge.
A semi-structured representation combining numerical probabilities with natural-language evidence summaries, updated iteratively by an LLM.
Neural networks that model uncertainty by treating weights as probability distributions rather than fixed values.
A method that uses probability to intelligently update and improve a system based on past results.
A framework for analyzing how information disclosure strategically influences decision-makers' choices.
Binary Cross-Entropy loss, a training objective commonly used for relevance scoring tasks where the model learns to predict whether a query-document pair is relevant or not.
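For a label y (1 for relevant, 0 for not relevant) and a predicted probability p, the loss above is -(y log p + (1 - y) log(1 - p)), averaged over examples. The sketch below computes it directly; the example predictions are invented, and the `eps` clamp is a common safeguard against taking the logarithm of zero.

```python
import math

def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

# One relevant pair (label 1) and one irrelevant pair (label 0).
confident_right = binary_cross_entropy([1, 0], [0.9, 0.1])
confident_wrong = binary_cross_entropy([1, 0], [0.1, 0.9])
print(confident_right, confident_wrong)
```

Confident wrong predictions are penalized far more heavily than confident right ones are rewarded, which pushes the model toward well-calibrated probabilities.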
A decoding algorithm that keeps the top-k most likely candidate sequences at each step, balancing quality and computational cost.
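The top-k decoding procedure above, commonly called beam search, can be sketched with a toy model in which the best first token does not lead to the best overall sequence. The function `toy_model` and its probabilities are hypothetical.

```python
import math

def toy_model(prefix):
    """Hypothetical next-token probabilities given the tokens so far."""
    if not prefix:
        return {"A": 0.6, "B": 0.4}
    if prefix[-1] == "A":
        return {"x": 0.5, "y": 0.5}
    return {"x": 0.9, "y": 0.1}

def beam_search(step_probs, beam_width, length):
    """Keep the beam_width highest-scoring sequences after every step."""
    beams = [([], 0.0)]  # (tokens, log-probability)
    for _ in range(length):
        candidates = []
        for tokens, score in beams:
            for token, prob in step_probs(tokens).items():
                candidates.append((tokens + [token], score + math.log(prob)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

greedy = beam_search(toy_model, beam_width=1, length=2)
wider = beam_search(toy_model, beam_width=2, length=2)
print(greedy[0][0], wider[0][0])
```

With a width of 1 the search commits to "A" and ends with probability 0.30; with a width of 2 it keeps "B" alive and finds "B x" at 0.36. Wider beams cost more computation per step, which is the trade-off the definition mentions.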
A mathematical model describing how honeybee swarms reach consensus on nest sites through recruitment and inhibition.
Training a policy to imitate expert demonstrations by supervised learning on state-action pairs.
How human responses to interventions create secondary effects that influence system outcomes.
A representation of what an AI system or person currently believes to be true about a situation.
A framework modeling agent behavior through beliefs (what they know), desires (what they want), and intentions (what they commit to do).
A mathematical operator that updates value estimates based on immediate rewards and future value predictions.
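The value update above can be sketched on a tiny, invented decision problem with deterministic transitions. This version takes the best action in each state (the "optimality" form of the operator); the state names, rewards, and the discount factor `gamma` are all illustrative assumptions.

```python
def bellman_update(values, rewards, transitions, gamma=0.9):
    """One application of the operator: reward now plus discounted value later."""
    new_values = {}
    for state in values:
        action_values = [
            rewards[(state, action)] + gamma * values[next_state]
            for action, next_state in transitions[state].items()
        ]
        new_values[state] = max(action_values)
    return new_values

# Hypothetical two-state problem: moving from "start" to "goal" pays 1.
transitions = {
    "start": {"stay": "start", "go": "goal"},
    "goal": {"stay": "goal"},
}
rewards = {("start", "stay"): 0.0, ("start", "go"): 1.0, ("goal", "stay"): 0.0}

values = {"start": 0.0, "goal": 0.0}
after_one = bellman_update(values, rewards, transitions)
after_two = bellman_update(after_one, rewards, transitions)
print(after_one, after_two)
```

Applying the update repeatedly until the values stop changing is the basis of value iteration; in this small example they settle after a single step.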
A standardized test suite used to measure and compare model performance on specific tasks.
A standardized set of test problems used to measure and compare the performance of different algorithms or models.
Comparing model safety when no labeled benchmark exists for the specific language, domain, or regulatory context.
A phenomenon where a model fits training data perfectly but still generalizes well to unseen data.
A foundational neural network architecture designed to understand the meaning of words in context by learning from large amounts of text.
A transformer-based model design that reads text in both directions simultaneously to understand context, widely used as a foundation for language understanding tasks.
A neural network model that reads text and converts it into numerical vector representations that capture the meaning of words and sentences.
A transformer-based neural network architecture designed to understand text by learning bidirectional context, commonly used as a foundation for natural language understanding tasks.
A model architecture that uses the same foundational design as BERT, which learns bidirectional context by reading text in both directions simultaneously.
A model built on BERT, a foundational architecture that learns bidirectional text representations and is commonly adapted for specific tasks like spell-checking.
A neural network design based on the BERT model that uses transformer layers to understand relationships between words in text by looking at context from all directions.
A transformer-based model architecture that reads text bidirectionally to understand context and produce meaningful representations of words and sentences.
A heavily compressed version of the BERT language model with far fewer parameters, designed for fast inference on resource-constrained devices.
A decoding strategy that generates N candidate responses and selects the one ranked highest by a reward model.
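The idea fits in a few lines. In this sketch, `generate` and `reward` are hypothetical callables standing in for a sampling model and a reward model; the toy versions below exist only to make the example runnable.

```python
def best_of_n(generate, reward, prompt, n):
    """Sample n candidates and return the one the reward function scores highest."""
    candidates = [generate(prompt, seed) for seed in range(n)]
    return max(candidates, key=lambda c: reward(prompt, c))


def toy_generate(prompt, seed):
    # Stand-in for sampling: each seed yields a different "response".
    return prompt + "!" * seed


def toy_reward(prompt, response):
    # Stand-in for a reward model: prefers responses of length 5.
    return -abs(len(response) - 5)


best = best_of_n(toy_generate, toy_reward, "hi", n=5)
```

The cost grows linearly with N, since every candidate must be generated and scored.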
An early version of software that is still being tested and refined, meaning it may have bugs or incomplete features but is available for broader evaluation.
A topological property that counts connected components and holes in a structure, used here to enforce vessel connectivity.
A 16-bit floating-point format that balances precision and memory efficiency, commonly used for training and deploying large language models.
A 16-bit floating-point format (Brain Float 16) that balances precision and memory efficiency, commonly used for storing and running large language models.
A 16-bit numerical format that balances memory efficiency with numerical stability, using fewer bits than standard 32-bit floats while maintaining training and inference quality.
A 16-bit floating-point format that keeps roughly the same numeric range as 32-bit floats (with fewer digits of precision) while using half the memory, making large models faster and cheaper to run.
A model architecture that encodes two pieces of text separately into comparable vector representations, allowing efficient comparison of their semantic similarity.
An optimization approach with two nested loops: an inner loop optimizing fast weights and an outer loop optimizing the main model parameters.
Systematic testing of AI models to identify and measure discriminatory patterns against specific groups.
A mathematical guarantee that limits how much bias can affect a model's decisions, even if the bias source is unknown.
Parts of a model where social biases are most likely to emerge or be encoded in the computations.
Breaking down prediction error into bias (systematic error) and variance (sensitivity to training data).
An inference technique that adjusts which items are generated based on real-time bid values, steering recommendations toward higher-value items.
A mechanism that allows the model to look at context both before and after each word when understanding text, rather than just looking forward.
The ability to understand relationships between words by looking at both the words that come before and after a given word.
A neural network architecture that encodes two separate pieces of text independently and compares them to measure semantic similarity, commonly used for matching and retrieval tasks.
A large coefficient used in MILP formulations to enforce logical constraints; larger values make the relaxation weaker and solving slower.
A transformer-based model architecture designed to handle very long text sequences efficiently by using sparse attention patterns instead of processing every word pair.
An optimization framework with two hierarchical levels where upper-level decisions constrain lower-level optimization problems.
A factorization where value and policy functions are expressed as products of goal-conditioned coefficients and learned basis functions.
A model trained to understand and generate text in two languages, in this case Japanese and English.
A language model trained to understand and generate text in two languages with comparable fluency.
A recurrent neural network that processes text in both forward and backward directions to capture context from both sides of each word.
A model that processes two different types of input (in this case, code and natural language) and converts them into a shared representation space.
A decision mechanism where neurons act as on/off switches to direct data through different computational paths.
A large collection of medical and scientific texts (like research papers and journals) used to train the model on domain-specific language and concepts.
Natural language processing techniques applied specifically to medical and biological text, such as extracting drug names or identifying disease mentions from research papers.
The ability to understand and work with scientific concepts in biology and medicine, such as drug interactions and molecular structures.
Written content from medical and life sciences domains, including clinical notes, research papers, and healthcare documentation.
Specialized medical and scientific terms and concepts that the model has learned to understand from training on medical literature.
Protecting against misuse of biological research and AI in harmful ways.
Electrical or physical signals produced by the body, such as heart rhythms or brain waves.
A top-down 2D representation of a 3D scene, showing spatial layout as if viewed from above.
The mathematical space of all doubly stochastic matrices; parameterizing this space exactly is the core challenge this paper addresses.
The number of bits used to represent each number in a model; lower bit depths (like 3-bit) create smaller files but may lose some accuracy compared to higher bit depths.
The number of bits used to represent each number in a model; lower bit precision (like 3-bit) means smaller file size but potentially less accurate calculations.
The number of bits used to represent each number in a model; lower bit-widths (like 6-bit) use less memory but may reduce precision compared to higher bit-widths.
Automatically choosing the optimal number of bits for quantizing different parts of a model based on their importance.
A compression metric measuring how many bits are needed to encode each byte of text.
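One way to compute this from a language model's output is sketched below. It assumes you already have the natural-log probability the model assigned to each token of the text; the function name and inputs are illustrative.

```python
import math


def bits_per_byte(token_log_probs, text):
    """Convert per-token natural-log probabilities into bits per UTF-8 byte."""
    total_bits = -sum(token_log_probs) / math.log(2)  # nats -> bits
    return total_bits / len(text.encode("utf-8"))


# Two tokens with probabilities 0.5 and 0.25 cost 1 + 2 = 3 bits,
# spread over the 4 bytes of "abcd".
bpb = bits_per_byte([math.log(0.5), math.log(0.25)], "abcd")
```

Normalizing by bytes rather than tokens makes the number comparable across models that use different tokenizers.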
Evaluating a system's behavior by observing inputs and outputs without access to internal model structure or weights.
A measure of uncertainty in an agent's decision-making at a given state—how much of the decision space lacks statistical support from training data.
Assessment where evaluators don't know which version or source produced the item being judged.
An attention technique that processes groups of items together to improve efficiency and capture relationships between them.
A quantization format that groups values into blocks and stores one shared exponent (scale) per block, cutting the number of bits needed per value while keeping accuracy close to the original.
Internal vector representations produced by a state space model's processing blocks that encode information about token sequences.
Scaling factors computed for groups of values in low-precision formats to maintain numerical accuracy.
A language model that generates multiple tokens in parallel using diffusion, then refines them iteratively.
A quantization method that divides values into groups and applies a shared scale factor to each group.
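A minimal sketch of the idea, assuming symmetric absolute-maximum scaling to signed integers. Real formats differ in how they choose and store the scale, so treat the details here as illustrative.

```python
def quantize_groups(values, group_size, bits=4):
    """Quantize values group by group, with one shared scale per group."""
    qmax = 2 ** (bits - 1) - 1  # largest positive integer code, e.g. 7 for 4-bit
    packed = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        # Map the largest magnitude in the group to qmax; use 1.0 for all-zero groups.
        scale = max(abs(v) for v in group) / qmax or 1.0
        codes = [round(v / scale) for v in group]
        packed.append((scale, codes))
    return packed


def dequantize_groups(packed):
    """Reconstruct approximate values from (scale, codes) pairs."""
    return [scale * code for scale, codes in packed for code in codes]


weights = [0.7, -0.3, 0.0, 0.1, 70.0, -30.0, 10.0, 0.0]
packed = quantize_groups(weights, group_size=4, bits=4)
restored = dequantize_groups(packed)
```

Because each group gets its own scale, the small values in the first group are not crushed by the large values in the second, which is what would happen with a single scale for the whole list.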
A ranking function that scores document relevance based on term frequency, how rare each term is across the collection (inverse document frequency), and document length normalization.
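The sketch below shows one common variant of this scoring function. The whitespace tokenizer, the smoothed inverse-document-frequency formula, and the parameter values `k1=1.5` and `b=0.75` are assumptions chosen for illustration; production search engines differ in these details.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in docs against query with a BM25-style formula."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates, and long documents are penalized.
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = ["the cat sat on the mat", "dogs chase cats", "the quick brown fox"]
scores = bm25_scores("the cat", docs)
```

Here the first document matches both query terms and ranks highest, while the second matches neither exact term and scores zero.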
Movement commands relative to the drone's own orientation, rather than a fixed world direction.
Mechanisms that prevent an LLM from crossing defined limits in reasoning or behavior.
A rectangular coordinate set that marks the exact location and size of detected text or objects within an image.
A statistical model that ranks items based on pairwise comparison outcomes, commonly used for leaderboards.
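To make the definition concrete: the model gives each item a positive strength, and the chance that item i beats item j is strength_i / (strength_i + strength_j). The sketch below fits strengths with a simple iterative update; the function names and the toy win counts are illustrative assumptions, and it expects every item to have at least one win and one loss.

```python
def bt_win_probability(strength_i, strength_j):
    """Probability that item i beats item j under the Bradley-Terry model."""
    return strength_i / (strength_i + strength_j)


def fit_bradley_terry(wins, iterations=200):
    """Estimate strengths from wins[(winner, loser)] = count.

    Uses the classic iterative update: each strength becomes the item's
    total wins divided by a sum over its opponents.
    """
    items = sorted({x for pair in wins for x in pair})
    strength = {i: 1.0 for i in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            total_wins = sum(w for (winner, _), w in wins.items() if winner == i)
            denom = 0.0
            for j in items:
                if j == i:
                    continue
                games = wins.get((i, j), 0) + wins.get((j, i), 0)
                denom += games / (strength[i] + strength[j])
            updated[i] = total_wins / denom
        # Strengths are only defined up to scale, so normalize them to sum to 1.
        norm = sum(updated.values())
        strength = {i: s / norm for i, s in updated.items()}
    return strength


wins = {
    ("A", "B"): 3, ("B", "A"): 1,
    ("B", "C"): 3, ("C", "B"): 1,
    ("A", "C"): 3, ("C", "A"): 1,
}
strengths = fit_bradley_terry(wins)
leaderboard = sorted(strengths, key=strengths.get, reverse=True)
```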
Using AI to enhance the exploratory ideation phase of research rather than automating solution design.
The average number of possible moves available at each decision point in a game.
A marker in code where a debugger pauses execution so you can inspect the program state.
A metric measuring forecast accuracy that compares a model's predictions to a baseline (like random guessing).
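A small worked example, assuming binary outcomes (0 or 1) and a baseline that always predicts 50%. The score is 1 minus the ratio of the model's Brier score to the baseline's, so 1.0 is a perfect forecast and 0.0 means no better than the baseline.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes, baseline_probs):
    """Improvement over a baseline forecast; higher is better."""
    return 1.0 - brier_score(probs, outcomes) / brier_score(baseline_probs, outcomes)


outcomes = [1, 0, 1, 0]
model_probs = [0.9, 0.1, 0.8, 0.2]
coin_flip = [0.5, 0.5, 0.5, 0.5]

skill = brier_skill_score(model_probs, outcomes, coin_flip)
```

A negative value means the model forecasts worse than the baseline it is compared against.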
A situation where the underlying physics has symmetry, but observations reveal a preferred direction or asymmetry due to measurement constraints.
A reinforcement learning technique that constrains model outputs to stay within a token budget, reducing response length while maintaining accuracy.
Breaking text into individual bytes (raw character codes) rather than words or subwords, which allows the model to handle any text without a predefined vocabulary.
The ability of a system to function correctly even when some participants behave maliciously or unpredictably.
Adjusting a model's predictions using held-out data to correct for systematic biases or distribution differences.
Determining the position and orientation of a camera in 3D space relative to a scene.
A statistical technique that finds the strongest correlations between two sets of variables by discovering shared patterns.
Training process designed to extract or develop specific abilities from a model, like reasoning or tool use.
How the number of storable associations grows with the size of the memory matrix or system parameters.
Forecasts of future returns, volatility, and correlations for different asset classes used to guide investment decisions.
Neural networks with capsule units that learn hierarchical relationships and spatial properties better than traditional convolutional layers.
A mechanism that sequentially combines information from multiple sources (global context, object details, skill knowledge) to guide model decisions.
Sequential processing where output from one stage feeds into the next.
A strategy where each model focuses on progressively smaller regions of interest to improve accuracy.
A multi-stage process that progressively assigns incidents to the correct business team or service owner.
The model's ability to distinguish between uppercase and lowercase letters as meaningful differences, treating 'Москва' and 'москва' as separate tokens with different meanings.
A model that treats uppercase and lowercase letters as identical, so 'Apple' and 'apple' are processed the same way.
The model treats uppercase and lowercase letters as distinct, allowing it to recognize proper nouns and maintain capitalization distinctions.
Text processing that preserves the distinction between uppercase and lowercase letters, treating 'Apple' and 'apple' as different tokens.
The model's ability to distinguish between uppercase and lowercase letters, making it sensitive to proper nouns and capitalization patterns that carry meaning.
When a model loses its original knowledge while learning a new task, like overwriting old skills.
A model that learns causal relationships between variables and can answer observational, interventional, and counterfactual questions.
The ability to determine true cause-and-effect relationships from data, typically guaranteed by randomization.
Determining whether a treatment actually caused an outcome, not just whether they're correlated.
Deliberately modifying a model's internal features to measure their direct effect on outputs.
A model that predicts the next word in a sequence by only looking at previous words, not future ones, making it suitable for text generation.
A training approach where the model predicts the next word based only on previous words, commonly used for text generation tasks.
Understanding cause-and-effect relationships rather than just statistical correlations in data.
A machine learning method that estimates personalized treatment effects from survival data using tree-based models.
A permissive open-source license that allows anyone to use, modify, and distribute the model as long as they give credit to the original creator.
A Creative Commons license that allows free use and modification of the model for non-commercial purposes only, with attribution required.
A problem-solving technique that starts with a simplified version of a problem and refines it when solutions fail.
A statistical phenomenon where most scores cluster near the maximum possible value, reducing the ability to distinguish between different quality levels.
When a benchmark becomes too easy and models achieve near-perfect scores, making it impossible to compare their true abilities.
Training an AI model to refuse to discuss certain topics or to give false information about them.
A metric that measures structural similarity between representations by comparing their kernel matrices.
A statistical principle stating that the distribution of the average of many independent samples approaches a normal distribution, whatever the shape of the original data (provided its variance is finite).
A reasoning technique where an AI model shows its step-by-step thinking process before arriving at a final answer, making its logic transparent and verifiable.
A technique where a model works through a problem step by step, showing its reasoning process before arriving at a final answer.
A quantum circuit composed of quantum channels (operations that map quantum states to quantum states) rather than unitary gates alone.
Raw wireless signal data that describes how a Wi-Fi signal changes as it travels through space and bounces off objects.
A learnable operation that scales and shifts different feature channels independently in a neural network.
Applying different forgetting rates to different feature channels in a neural network, allowing selective memory retention.
Systems where small changes in initial conditions lead to drastically different outcomes, making long-term prediction extremely difficult.
The ability of a model to maintain a character's voice, personality, and backstory throughout a conversation without contradicting itself.
A metric measuring the percentage of characters incorrectly recognized by an OCR system.
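A common way to compute this, sketched here as an assumption rather than the only definition, is the edit distance between the recognized text and the reference text, divided by the length of the reference.

```python
def edit_distance(reference, hypothesis):
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i]
        for j, h in enumerate(hypothesis, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalized by reference length (reference must be non-empty)."""
    return edit_distance(reference, hypothesis) / len(reference)


# One substitution (e -> a) and one deletion (o) over 11 reference characters.
cer = character_error_rate("hello world", "hallo wrld")
```

Because insertions count as errors, the value can exceed 1.0 when the recognized text is much longer than the reference.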
A model's ability to maintain distinct, consistent personality and speech patterns for different characters within a story.
Processing text one character at a time rather than by words, which is useful for catching individual character errors in languages like Chinese.
The ability to extract information from visual charts and perform logical reasoning tasks based on what the chart displays.
A language model specifically trained to have natural back-and-forth conversations with users rather than just completing text.
A model specifically trained and tuned to excel at conversational interactions rather than other tasks like analysis or reasoning.
A model optimized through training to excel at multi-turn conversations and dialogue, rather than single-turn text completion.
A saved snapshot of a model's weights and state at a specific point during training, allowing training to resume or the model to be evaluated at that stage.
Saved snapshots of a model at different stages of training, allowing researchers to study how the model's behavior changes as it learns.
Breaking long sequences into smaller segments and processing them sequentially while maintaining state between chunks.
The process of breaking large documents into smaller pieces so a model with a limited context window can process them separately.
A graph structure showing how research papers reference each other, used to understand relationships and influence between scientific works.
The ability to identify, reference, and maintain accurate attribution to the sources used when generating a response.
The number of insurance claims expected per policy or geographic area over a time period.
A technique that generates visual heatmaps showing which image regions a neural network uses to make predictions.
When training data has unequal numbers of examples across categories, with some classes having far fewer samples than others.
Learning to recognize new object classes over time while maintaining performance on previously seen classes.
Generating complete, structured classes with multiple methods and internal dependencies from a specification.
A loss function that penalizes misclassification of rare classes more heavily, useful when training data is imbalanced.
A statistical framework for designing and validating tests that measure psychological constructs reliably.
A machine learning model trained to assign input data into predefined categories or labels.
A technique that steers diffusion models toward desired outputs by comparing conditional and unconditional predictions.
How well an LLM's medical communication matches established clinical standards and physician practices.
The study of moral principles and values that guide medical decision-making and patient care.
Converting clinical information (diagnoses, medications, procedures) into discrete tokens that a model can process.
Using historical patient data to predict future health outcomes, disease progression, or treatment responses.
The ability to accurately interpret and reason about medical terminology, patient symptoms, and healthcare documentation.
Natural language processing applied to medical and healthcare text, such as extracting diagnoses or findings from doctor's notes and radiology reports.
The ability to analyze medical information, connect symptoms to conditions, and make logical healthcare decisions based on evidence.
The process of confirming that an AI system's outputs meet clinical standards and are safe for use in healthcare.
A model trained on image-text pairs to create shared vector representations for both images and text.
A neural network design that learns to match images and text by training them to have similar representations, enabling tasks like image search and visual understanding.
Rapidly adjusting a model to new tasks using direct mathematical solutions rather than iterative training.
A mathematical formula that directly computes an answer without iterative learning or optimization.
A system that continuously adjusts its behavior based on feedback from its actions and outcomes.
A control strategy where the robot observes its current state and adjusts actions based on feedback, rather than executing a fixed sequence.
When multiple features in a neural network are active at the same time, often because they represent related concepts.
An equilibrium where no group of players can jointly deviate and all benefit, even if they coordinate.
A game theory solution where no player benefits from unilaterally deviating from a recommended strategy.
A strategy that first captures broad patterns, then progressively refines details for better understanding.
A sequential decision-making approach that starts with broad estimates and progressively refines them to higher precision.
A curriculum learning approach that starts with learning simple components before progressing to optimizing complex global structures.
Identifying sections of code that perform the same function, even if written differently or in different programming languages.
The ability to automatically suggest or generate the next lines of code based on what the programmer has already written.
Percentage of program code executed by a test suite, measured by lines or branches.
A specialized task where a model modifies or refines existing code rather than creating new code, focusing on precision and surgical changes.
A specialized embedding designed specifically for source code that understands programming syntax and semantics, enabling tasks like code search and finding similar code snippets.
The ability of a model to write, complete, or suggest programming code based on prompts or partial code input.
Training a language model primarily on source code and technical documentation rather than general text, making it specialized for coding tasks.
A measure of how well code meets standards for readability, maintainability, and correctness.
The ability of a model to understand, analyze, and make logical inferences about source code and programming logic.
Restructuring existing code without changing its external behavior to improve readability and maintainability.
The process of examining code changes for bugs, quality issues, and adherence to standards before merging.
Automatically generating executable code (like plotting commands) from high-level specifications or natural language descriptions.
Techniques to confirm that a student actually understands the code they submitted, rather than having simply copied it.
A language model specifically trained on programming code to excel at tasks like code generation, completion, and understanding.
A model trained with a focus on understanding and generating programming code across multiple languages.
A language model trained specifically on programming code and related tasks, optimized to understand and generate code better than general-purpose models.
A language model trained specifically on programming code and code-related tasks rather than general text.
The ability to naturally mix two languages within the same text or conversation, switching between them based on context rather than treating them as separate.
A lookup table mapping compressed values back to original data; avoided in this approach to save memory.
An AI system that autonomously writes, debugs, and executes code to solve tasks without human intervention.
A normalized measure of variability that expresses standard deviation as a percentage of the mean, useful for comparing spread across different scales.
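A short worked example using Python's standard library. It uses the population standard deviation; using the sample standard deviation instead is an equally common choice and gives a slightly larger value.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (undefined when the mean is zero)."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(values) / mean


latencies_ms = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, standard deviation 2
cv = coefficient_of_variation(latencies_ms)
cv_seconds = coefficient_of_variation([v / 1000 for v in latencies_ms])
```

The two results match (both are about 0.4, or 40%), which is the point of the measure: it does not depend on the unit of measurement.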
Words in different languages that share a common historical origin and similar meaning.
Identifying words in different languages that share a common origin and similar meaning.
A computational framework that models how an intelligent agent perceives, reasons, and acts in the world.
A mechanism that gates speculative execution based on model confidence, without requiring ground-truth labels.
A psychological framework explaining how working memory capacity affects learning and task performance.
AI assistance that helps users think through problems and refine their goals rather than just executing stated requests.
The quality of maintaining consistent meaning and logical flow across multiple sentences or exchanges in a conversation.
The maximum circuit depth a quantum computer can execute before quantum information is lost to noise and decoherence.
A neural retrieval model design that stores multiple token-level embeddings per document and uses late interaction to achieve higher retrieval accuracy than single-vector approaches.
When a model trained with sparse rewards gets stuck early because initial success probability is too low to learn from.
When input features are highly correlated with each other, making it difficult to isolate individual feature effects on predictions.
Finding the best arrangement or selection from a finite set of possibilities, like packing objects efficiently.
Shared beliefs and mutually recognized facts that enable effective collaboration between people or AI systems.
The ability of a model to understand and apply everyday logic and practical knowledge about how the world works.
Minimizing the amount of data exchanged between devices or servers during distributed training.
A model variant created and shared by the community rather than the original model creators, often with custom modifications.
A smaller language model designed to use fewer computational resources while still performing useful tasks.
Evaluating reward quality relative to the current policy's skill level, recognizing that reward rankings change as the policy improves.
Natural language questions that define what an ontology should be able to answer, used to specify system requirements.
A quantum physics constraint ensuring operations preserve valid quantum states and probabilities.
A text generation approach where the model continues or completes text from a given prompt, rather than engaging in back-and-forth conversation.
A prompt style where you provide the beginning of text and the model continues it, rather than asking a direct question.
The ability to work through multi-step problems, analyze nuanced information, and draw logical conclusions.
Official verification that a service meets specific regulatory or security standards required by industries like healthcare or finance.
Official verifications that a service meets specific security and regulatory standards (like HIPAA or SOC 2) required by certain industries.
Discrimination that emerges from how separate system components work together, not from individual parts alone.
A design pattern where UIs are built from reusable, self-contained pieces (components) that can be combined to create larger interfaces.
A model's ability to understand new combinations of learned concepts.
Text descriptions that specify multiple elements, their relationships, and spatial arrangements in the desired image.
The principle that the meaning of a complex expression is built from the meanings of its parts and how they combine.
The ability to understand new combinations of concepts by learning how individual components combine.
The amount of processing power and memory available to run a model, which determines how much computation can be performed.
The amount of computation (time and memory) required for an algorithm to solve a problem.
The ability to deliver good results while using less processing power and memory than larger models.
Using code and algorithms to test mathematical hypotheses and discover patterns empirically.
The amount of memory, processing power, and time required to run a model; a smaller footprint means the model can run on less powerful hardware.
The extra processing power, memory, or time required to run a model, which impacts speed and resource consumption.
Using algorithms and AI during image capture to enhance photos beyond what the camera sensor alone can achieve.
The strategic distribution of a model's processing power—in this case, spending more computational effort on thinking through problems rather than other tasks.
How well a model performs relative to the computational resources (processing power and memory) required to run it.
A model designed to run with minimal processing power and memory, making it practical for devices with limited resources.
Hardware architecture that performs computation directly within memory, reducing data movement bottlenecks.
Achieving the best performance for a given amount of computational resources.
The ability for an AI model to interact with computer interfaces, navigate software applications, and execute actions on a user's behalf by understanding and responding to visual or textual representations of screens.
An interpretable model that makes predictions by routing inputs through a layer of human-understandable concepts rather than opaque features.
A low-dimensional geometric structure where related concepts are organized continuously, like a curved surface in high-dimensional space.
The process of mapping different textual expressions of the same idea to a single standardized representation, such as mapping 'MI' and 'myocardial infarction' to the same medical concept.
A reinforcement learning technique that estimates action value only within trajectories meeting specific conditions.
A property where prediction set coverage guarantees hold for specific subgroups or conditions, not just on average across all data.
A measure of uncertainty in predicted tokens given context; low entropy signals memorization, high entropy signals generalization.
The average distance between selected and validation target embeddings within a cluster, used to rank training examples.
The ability of a model to generate output (like text) based on specific input conditions or prompts provided to it.
The expected value of an output given specific input conditions, used as a deterministic baseline prediction.
Misaligned behavior that only appears when inputs share features with the training data, while the model still looks safe on out-of-distribution prompts.
A probabilistic model that learns to make predictions by conditioning on observed examples, useful for few-shot learning and uncertainty estimation.
The ability to generate text that follows specific conditions or constraints, rather than producing output freely.
A risk metric that averages the outcomes in the worst tail of a distribution rather than looking at average performance, useful for safety-critical tasks.
A neural network that learns to generate new data matching specific conditions or constraints.
Guiding a generative model's output by providing additional input signals like pose or depth maps.
Ensuring a model's confidence scores accurately reflect its true probability of being correct.
Assigning uncertainty scores to model predictions to identify outputs that may need human verification.
Statistical bounds around predictions that quantify uncertainty; here used to identify when model predictions are unreliable.
A decoding strategy that stops refining tokens when model confidence exceeds a set threshold.
Refusing predictions when the model's confidence score is below a threshold.
A strategy that selects which tokens to generate next based on the model's prediction confidence, enabling adaptive and efficient generation.
Training a model using rewards based on how well its confidence scores match its actual correctness.
Weighted majority voting where each candidate answer gets a confidence score from a critic model before selection.
The tendency to seek or interpret information in ways that confirm existing beliefs or outputs.
Situations where an AI system has competing goals—like serving users well versus generating revenue for its creators.
A method that provides prediction intervals with statistical guarantees on coverage.
The ability to direct a model to generate specific 3D shapes or structural states of proteins.
A distinct 3D shape or arrangement that a protein can adopt, often with different biological functions.
Applying a learned conformational change from one protein to structurally similar proteins in the same family.
When an agent is tricked by user input into misusing its elevated permissions to perform actions it shouldn't.
A routing pattern where multiple neurons must agree (be mutually exclusive) to activate a particular processing path.
A safety training approach that guides a model to behave according to a set of principles or rules, helping it generate more helpful and harmless responses.
Restricting a model's token generation to a predefined set of allowed tokens during inference.
Text generation that must follow specific rules or constraints, such as producing output in a particular format or structure.
Training an AI system to maximize performance while respecting hard constraints (like deadlines or budgets).
Finding solutions that satisfy a set of constraints, used here to resolve conflicts between inferred events.
A tool that finds valid solutions to problems with multiple constraints, used here to verify mechanical assembly feasibility.
Validating each step of a plan by checking outputs against automatically derived constraints based on task requirements.
Fixing errors in reasoning by making minimal changes that satisfy logical or evidential constraints.
Whether a study actually measures the real concept it's supposed to test, not something else.
A mechanism that activates learned corrections only when the robot is physically touching the object.
Physical interactions where the robot frequently touches and manipulates objects, making control sensitive to small errors.
Robot tasks where success depends critically on precise control of forces and contact interactions with objects.
A model or system that screens text before or after generation to block unsafe, harmful, or policy-violating content.
Safety mechanisms built into a model that prevent it from generating harmful, inappropriate, or restricted content.
The process of reviewing and filtering text or other content to remove or flag material that violates policies or safety guidelines.
The task of automatically detecting and categorizing text that violates policies or could cause harm, such as hate speech, violence, or misinformation.
The ability to maintain consistent meaning and logical flow when processing long sequences of text or conversation.
A model's ability to maintain coherent understanding and recall of information across long passages of text without contradicting itself.
Transferring knowledge from interaction trajectories into model parameters by learning from contextual examples.
Retaining only relevant information from execution history to reduce noise and improve decision-making in subsequent steps.
The process of collecting and organizing relevant information from history to answer specific questions or solve tasks.
The maximum amount of previous text a model can consider when generating its next output; longer context allows the model to maintain coherence over longer passages.
Organizing and maintaining relevant information for AI decision-making.
A technique to process long sequences by distributing context across multiple devices or processing units in parallel.
Irrelevant or noisy information degrading model performance in a given context.
A model's ability to remember and use information from earlier parts of a conversation or document.
When an AI model's input context window fills up and earlier information is lost, requiring mechanisms to preserve key data.
The maximum number of tokens a model can process in a single conversation or prompt.
A system that adjusts its behavior based on the specific input or situation rather than using fixed, unchanging patterns.
Speech recognition that uses surrounding information like conversation history to improve transcription accuracy.
A formal system of rules that defines which sequences of symbols are valid in a language.
Problems requiring the model to extract and use large amounts of information from the input prompt to generate correct outputs.
Help or instructions tailored to the current situation rather than generic pre-stored information.
Adjusting model behavior dynamically based on the specific input or context rather than using fixed settings.
A learning algorithm that selects actions based on context and learns from feedback to improve future decisions.
Numerical representations of text that capture meaning based on surrounding context, rather than treating each word independently.
The assumption that a model produces consistent outputs when a task is reformulated in contextually equivalent ways.
Influence from surrounding information (like examples or previous actions) that pushes an agent away from its intended behavior.
Making decisions by considering how individual observations relate to and inform each other within a broader context.
A way of encoding text where the meaning of each word depends on the words around it, rather than being fixed for every occurrence.
The intermediate representation space in a diffusion model where semantic and structural information is encoded.
Machine learning technique that identifies recurring themes in text while considering the surrounding context of words.
A feature or pattern in input text that activates hidden misaligned behavior in a model, even when standard evaluations show the model is safe.
Uncertainty caused by changing conditions over time, like user preferences shifting.
The ability of a model to interpret the meaning of words and phrases based on surrounding text, rather than treating each word in isolation.
Vector representations of words that change based on surrounding context, capturing different meanings in different sentences.
Incrementally updating a neural network on new data as it arrives, rather than retraining from scratch.
Training models to learn new tasks without forgetting previously learned ones.
Further training a pretrained model on domain-specific data to specialize it for particular tasks.
Real-time monitoring of a quantum system that produces a stream of measurement data used to update state estimates.
Encoding data as smooth, unquantized values rather than discrete tokens, preserving fine-grained temporal details.
A mathematical property ensuring a system's outputs converge to a stable state regardless of initial conditions.
A mathematical property ensuring that a system brings nearby states closer together over time, guaranteeing stability.
A training technique that learns by comparing similar and dissimilar examples to create better representations.
Training objective that pulls similar examples together and pushes different ones apart.
A method that learns shared embedding spaces by contrasting similar and dissimilar image pairs, then ranks candidates by similarity.
Breaking down a neural network's output into individual contributions from different neurons or neuron groups.
Special tokens added at the beginning of a prompt that tell the model what style, domain, or format to use for its output.
Special tokens inserted into sequences to guide model behavior, such as signaling whether to show an ad or organic content.
Automatically designing a decision-making system that controls when and how to execute actions.
A technique that adds spatial control to diffusion models by conditioning generation on aligned input maps (like depth or property masks).
Physics problems where fluid flow effects dominate over diffusion, creating sharp gradients and moving fronts.
The point during training when a model's performance stabilizes and stops improving significantly, indicating it has learned the patterns in the data.
Mathematical proofs that an algorithm will reach a correct solution under specified conditions.
How quickly an optimization algorithm approaches the optimal solution, typically expressed as a function of iterations.
When different models independently learn similar features or representations from different training signals.
AI systems designed to understand and respond to human language in natural, dialogue-like interactions.
An AI system designed to conduct multi-turn dialogue with users to accomplish specific tasks, like medical interviewing.
Using dialogue with a chatbot or AI agent to probe and verify student understanding through questioning.
The model's ability to maintain logical consistency and relevance across multiple turns of dialogue, making responses feel natural and connected.
How naturally and coherently a model engages in back-and-forth dialogue, matching human conversation patterns.
A model specifically trained to understand and generate natural dialogue, optimized for back-and-forth interactions rather than one-off text generation.
A language model specifically trained and optimized to engage in multi-turn dialogue with users.
A function where any line segment between two points on the curve lies on or above the curve, ensuring that any local minimum is also a global minimum.
Mathematical technique for finding the best solution to a problem in which any local optimum is also the global optimum.
A geometric shape formed by the intersection of half-spaces, each defined by a linear inequality, with vertices representing extreme points.
A technique that scans across input data using small filters to detect local patterns, commonly used in image processing but here applied to text for efficiency.
The geographic coordinate system (e.g., latitude/longitude) used to define spatial locations and ensure consistency across operations.
Game theory scenarios where agents benefit from matching actions but may also benefit from strategic differentiation.
An optimization where data is only copied when modified, allowing multiple references to share the same data until changes occur.
An attention mechanism where peripheral tokens (patches) interact only through central core tokens, reducing computation.
Identifying when different mentions in text refer to the same entity or concept.
A collection of documents or text used as the knowledge base for retrieval in RAG systems.
Selecting query terms that best distinguish relevant documents from irrelevant ones in a specific corpus.
A filtering mechanism that validates whether a proposed solution is correct before allowing it to advance in a search process.
A model's ability to maintain performance when input data is degraded (e.g., noise, blur, missing values).
A mathematical measure that compares how similar two embeddings are by calculating the cosine of the angle between them, with values closer to 1 meaning more similar.
A method of comparing two vectors based only on their direction, ignoring their magnitude, making it scale-invariant.
A learnable activation function using cosine waves with adjustable frequency and phase to process data nonlinearly.
An adversarial attack that accounts for the real-world cost or feasibility of modifying each feature.
The ability to deliver useful results while using fewer computational resources, reducing the expense of running the model.
A training methodology that combines chain-of-thought reasoning with masked autoencoder techniques to improve model understanding of text relationships.
Testing what would happen if you changed a strategy, without actually running the experiment in the real world.
An explanation showing what input changes would alter a model's prediction to a different outcome.
Creating alternative scenarios showing what would happen if something were different (e.g., if an object didn't exist).
Training examples where evidence is semantically related but contradicts the claim, testing if models truly use evidence.
A question about what would have happened if a variable had taken a different value (e.g., 'what if the patient had received treatment?').
Reasoning about what would have happened under different actions or conditions than what actually occurred.
The process of learning or updating the statistical properties of measurement and process noise in a filtering system.
Aligning a model's sensitivity structure to match the statistical structure of task-irrelevant variations in data.
When the distribution of input data changes between training and real-world use, causing models to fail.
Measuring what proportion of a problem space a model can reliably handle.
Finding an efficient route for a vehicle to visit all cells or areas in a region.
The process of proving that testing has comprehensively covered all relevant operating conditions and edge cases.
Testing approach that systematically explores different input regions to find edge cases and failures.
A quantum operation that preserves physical validity by maintaining positivity and trace properties of quantum states.
Running a model on a computer's central processor rather than a specialized graphics card; this is typically slower but requires no specialized hardware.
A measure of how useful and novel the connections a model generates are for creative tasks.
The process of determining which actions or steps in a sequence deserve reward or blame for the final outcome.
Detailed feedback that scores responses across multiple specific evaluation criteria rather than a single overall score.
An agent that reviews and validates the recommendations and execution plan of other agents to ensure correctness and coherence.
Mechanism allowing one sequence to attend to (selectively focus on) the elements of another sequence.
Transferring knowledge between models with fundamentally different designs, attention mechanisms, or tokenizers.
A neural module that merges information from two sources by learning which parts of each are most relevant.
Testing whether a model trained on one dataset generalizes to perform the same task on a different dataset.
A creativity technique where ideas from one unrelated domain are applied to solve problems in another domain.
Learning to control one body type (like a humanoid robot) using data from a different body type (like humans).
A model architecture that takes a query and document together as input and directly outputs a relevance score, unlike dual-encoders, which encode them separately and then compare the resulting embeddings.
A loss function that measures how well a predicted probability distribution matches a target distribution.
Running an AI model in network environments or systems different from the one it was trained on.
The ability to understand relationships and transfer knowledge between different languages, such as answering a question in one language based on text in another.
The ability of a model to understand and relate concepts across different languages, allowing it to find similarities between text in different languages.
The ability of a model to understand and work with multiple languages, sometimes even translating concepts between them.
The ability of a model to represent similar meanings in different languages as nearby points in its vector space, so translations and equivalent concepts are treated as semantically close.
The ability of a model or probe trained on one language to work effectively on other languages.
The ability to find and compare similar content across different languages by representing them in a shared mathematical space.
The ability to find relevant documents or text in one language when searching with a query in a different language.
The ability to recognize that sentences or phrases in different languages have the same or similar meaning and represent them close together in numerical space.
The ability to measure how similar two sentences are even when they are written in different languages.
The ability of a model trained on multiple languages to apply knowledge learned from one language to understand or generate text in another language.
The ability of a model to comprehend relationships and meanings across different languages, enabling tasks like translation and multilingual reasoning.
Connecting representations from different types of data (like speech and text) so they work together effectively.
An attack that manipulates multiple input types (like images and text) together to deceive a model.
A mechanism that aligns and weights information between different modalities like images and text.
Ensuring that representations across different modalities (images, 3D, text) align and reinforce each other.
Alignment in how models from different modalities (e.g., vision and language) represent the same stimulus.
The process of combining information from multiple modalities (e.g., vision and text) into a unified representation.
When a model produces contradictory predictions for the same concept represented in different modalities.
The ability to find relationships between different types of content, such as matching natural language descriptions to code snippets.
The ability to connect and reason about information from different input types (like audio and video) together to draw conclusions.
The ability to search and find relevant items across different data types, such as finding images using text queries or vice versa.
The ability of a model to share semantic understanding between different input modalities like vision and text.
The ability to measure how closely related content from different types of input (like images and text) are to each other.
Exchanging information between different input types (text and vision) to guide compression decisions.
The ability of AI tools to access information from other modules and make decisions based on shared context.
How the demand for one product changes when the price of a different product changes.
The ability of a model to perform consistently when input text or audio switches between different writing systems or languages.
The process of comparing and resolving conflicting information from multiple sources to determine accurate answers.
A model's ability to work on new individuals without retraining, despite differences in neural anatomy.
A mechanism that transfers motion information from one camera viewpoint to another while maintaining consistency.
The degree to which internal representations align when processing the same task in different formats or modalities.
Aligning images captured from different viewpoints (e.g., street-level and overhead) to find correspondences.
A 3-dimensional algebraic variety defined by a degree-3 polynomial equation.
NVIDIA's parallel computing platform that runs code on GPUs to process many tasks simultaneously.
Optimized GPU code that performs specific computational operations efficiently.
The ability to understand and infer cultural context, significance, and metadata from visual or textual information.
Statistical measures that describe probability distributions, used to track activation behavior.
Training data that has been carefully selected and filtered to include only high-quality examples relevant to specific tasks or domains.
Carefully selected and filtered training examples chosen for quality rather than quantity, often resulting in models that produce more structured and reliable outputs.
Reinforcement learning (RL) approach where agents explore by seeking states where their world model makes poor predictions.
Training strategy that gradually increases task difficulty to help models learn robustly.
Training strategy that presents examples in increasing order of difficulty.
A training constraint that penalizes curved or winding paths in the learned representation space.
A constraint requiring a model to reconstruct its original input after transforming it through intermediate steps.
A metric counting the number of independent paths through a piece of code; lower values mean simpler, easier-to-maintain code.
An interactive learning method where a human corrects the model's mistakes during training to fix distribution mismatch.
Measuring how much each training example contributes to a model's final performance using gradient-based methods.
When test data accidentally leaks into training, artificially inflating a model's measured performance.
The process of carefully selecting, cleaning, and organizing training data to improve model quality; better curated data often leads to better model performance.
Predicting how a model would behave if specific training examples were excluded without retraining.
Predicting how a model's behavior would change if specific training data were excluded without retraining.
Variation in data distribution across different sources or groups.
Gaps or missing values in a dataset caused by sensor failures, blinks, or other interruptions.
The relevance, accuracy, and usefulness of training data, which can be more important for model performance than simply having more data.
The practice of carefully selecting and filtering training data for relevance and accuracy rather than simply using larger amounts of raw data.
A centralized catalog storing metadata about available data sources and their query interfaces.
A guarantee that your data is stored and processed only in a specific geographic region, helping meet regulatory requirements.
When researchers use datasets from previous studies in their own research rather than collecting new data.
Choosing a subset of training data based on quality or relevance metrics rather than using all available data.
Automatically generating training data from existing datasets to teach models new tasks.
Automated checks that verify data meets quality and correctness requirements before use.
Distributing training data across multiple GPUs that compute gradients independently then synchronize.
Compressing a large dataset into a smaller synthetic version preserving key information.
A neural network design pattern that serves as the structural foundation for this model, determining how it processes and generates text.
Creating entirely new protein sequences from scratch rather than modifying or copying existing ones.
A transformer-based language model architecture that uses disentangled attention mechanisms to improve how the model weighs different parts of the input text when making predictions.
A training approach where a model is developed across multiple independent computers or organizations rather than in a single centralized facility, allowing distributed collaboration.
A mechanism that selects actions based on current state, goals, and expected outcomes to maximize success.
A tool or system that provides information and analysis to help humans make better decisions without replacing human judgment.
An approach that evaluates systems based on the quality of decisions they enable under different costs and benefits.
A component that converts compressed internal representations back into human-readable outputs like audio or images.
A type of LLM that generates text one token at a time, like GPT models.
Language model design that generates text sequentially without a separate encoder, like GPT models.
Converting model outputs into human-readable text or structured predictions.
Methods for generating text from a language model, such as greedy selection, beam search, or temperature sampling.
A reward system that breaks complex requests into atomic, checkable questions to provide interpretable feedback for model training.
The process of removing duplicate or near-duplicate examples from training data to improve model efficiency and prevent overfitting to repeated content.
An AI system that performs multi-step research by reasoning through problems and making multiple search queries.
Automatically identifying problems or errors in software artifacts, such as incomplete or ambiguous descriptions.
The number of independent ways a mechanical part can move or rotate in an assembly.
Consequences of an agent's actions that appear many steps later, making it harder to learn cause-and-effect relationships.
Feedback or verification of agent actions that arrives after a delay, requiring the agent to maintain accountability over time.
A form of democracy where citizens and representatives engage in reasoned discussion to reach decisions.
Using machine learning to predict how much of a product customers will buy given prices and other factors.
Removing or hiding demographic information (like gender) from model inputs to reduce bias in decision-making.
Learning which demographic attributes (race, age, etc.) are most influential in predicting how annotators will judge subjective content.
Training examples collected from real robots performing tasks, used to teach the model how to execute similar actions.
A training approach where the model learns to reconstruct clean audio from corrupted or noisy versions, improving its ability to extract meaningful features.
A neural network trained to reconstruct clean text from corrupted or noisy versions, learning to remove noise while preserving meaning.
A training approach where a model learns to reconstruct clean audio from noisy versions, making it better at understanding speech in real-world conditions.
A technique where a model learns to gradually remove random noise from data to reconstruct meaningful content, used as an alternative to traditional token prediction.
A training objective that learns to predict noise in corrupted data, used in diffusion models for stable gradient-based optimization.
Generating detailed, comprehensive descriptions of images that capture rich visual information and relationships rather than brief summaries.
A compact vector representation where most dimensions contain meaningful information, as opposed to sparse embeddings that are mostly zeros.
Vector representations where most or all of the numbers contain meaningful information, as opposed to sparse embeddings where most numbers are zero.
A neural network where all parameters are active for every input, in contrast to sparse architectures like mixture-of-experts that selectively activate different parts.
A technique that converts documents and queries into dense vectors so that relevant passages can be found by comparing their numerical representations rather than matching keywords.
A compact numerical format where meaning is captured in a fixed-size list of numbers, making it efficient for storage and similarity comparisons.
A search method that converts text into a single, compact numerical vector and finds similar documents by comparing these vectors.
A retrieval system using learned embeddings to find semantically similar documents via vector similarity.
A compact numerical representation where most values are non-zero, used to efficiently store and compare the meaning of text.
A compact numerical representation of text that captures its meaning, allowing the model to compare how similar different pieces of text are to each other.
Numerical representations of text where each word or sentence is converted into a list of numbers that capture its meaning, allowing the model to compare semantic similarity.
A compact numerical format where text is encoded as a list of numbers that capture its meaning, allowing efficient similarity comparisons.
A mathematical space where text is represented as vectors of numbers, positioned so that similar meanings are located close together.
Compact numerical representations where most values are non-zero, used to encode the meaning of text in a form that computers can compare mathematically.
Dense embeddings use all dimensions with non-zero values (like traditional neural embeddings), while sparse embeddings mostly contain zeros and are more interpretable and storage-efficient.
A method that aligns models by learning from the geometric clustering of accepted responses in the model's representation space.
The ability to understand how multiple facts relate to and affect each other when making decisions.
Risk that a single model's values or biases get applied uniformly at scale, eliminating the diversity of perspectives that would naturally exist with multiple decision-makers.
An image where each pixel's brightness represents how far the corresponding point in the scene is from the camera.
A technique that creates a larger model by combining and stitching together layers from smaller pre-trained models rather than training a new model from scratch.
The process of converting a compressed model's weights back to a higher-precision numerical format, which requires more memory and does not recover the information lost during compression.
A new model created by modifying or fine-tuning an existing base model rather than training from scratch.
A numerical representation that captures the visual characteristics around a detected keypoint, allowing the model to match similar points across different images.
Generating model weights using text or structured descriptions of the target architecture and task as input.
The loss of professional expertise and judgment that occurs when workers rely on automated systems instead of developing their own capabilities.
A mathematical model that generates diverse sets of items by penalizing similarity, useful for ensuring variety in generated outputs.
Automated verification rules that produce the same result every time, used when there is clear evidence of task completion.
An early, pre-release version of a model used for testing and refinement before public release.
Fine-grained, skillful robotic hand control requiring precise coordination of many joints.
AI process of identifying root causes or problems from observed symptoms.
The process of an AI model creating natural conversational responses based on input text.
The process of finding a set of basis vectors (dictionary) that can reconstruct data through sparse combinations.
The ability to understand and apply code changes (diffs) to existing files rather than generating code from scratch.
A property of operations that allows gradients to flow through them during backpropagation for model training.
Smooth mathematical function approximating non-differentiable operations for training.
Mathematical functions that measure how far a model's output is from desired behavior, designed to be optimizable via gradient descent.
A learnable memory retrieval mechanism that can be trained end-to-end to recall relevant past episodes for current decision-making.
A physics solver built into a neural network so that gradients can flow through physical laws during training.
A reward function whose gradients can be computed, allowing optimization of model outputs toward desired properties.
A sparse attention method that supports gradient computation, enabling end-to-end training with learned sparsity patterns.
A list of possible medical conditions ranked by likelihood, used by clinicians to guide further testing.
A mathematical framework that adds controlled noise to data to protect individual privacy while enabling statistical analysis.
A technique to systematically increase problem complexity to better differentiate model capabilities.
Predicting how hard a task is to automatically adjust the amount of computational effort needed.
An internal indicator that estimates how hard a problem is, used to guide model behavior.
Language models that generate text by iteratively refining noisy predictions into coherent words.
Generative model that creates images or videos by gradually removing noise from random data.
AI models that generate images by learning to reverse a noise-adding process, starting from pure noise.
A generative approach that iteratively refines predictions by gradually removing noise from random initial states.
A learned distribution that guides diffusion models toward realistic outputs in a specific domain.
A generation method that iteratively refines outputs by gradually removing noise, rather than predicting tokens one at a time from left to right.
Iterations in a diffusion model that gradually refine noise into a final image or video output.
A transformer architecture adapted to work with diffusion-based generation processes.
A neural network design that generates outputs by iteratively refining noisy predictions into clear results, rather than building text one token at a time like traditional language models.
A method where a model generates text by iteratively refining noise into coherent output all at once, rather than predicting one word at a time.
A language model that generates text by iteratively predicting and refining masked (hidden) tokens across the entire output, rather than predicting one token at a time from left to right.
Using diffusion models to generate realistic robot motion sequences that can be used as training data.
A virtual simulation model of a physical system used to predict behavior and test changes before real-world deployment.
A convolutional operation that skips input elements to capture patterns at multiple scales without increasing parameters.
A technique to simplify high-dimensional parameter spaces by identifying and focusing on the most critical variables.
Techniques that compress high-dimensional data into fewer dimensions while preserving important patterns.
A training technique that teaches a model to prefer certain outputs over others by learning from examples of better and worse responses.
A graph structure representing causal relationships where arrows point from causes to effects with no cycles.
A workflow representation where tasks are nodes and dependencies are directed edges with no circular paths.
A measure of smoothness on a graph that quantifies how much node values vary across connected edges.
The logical flow and consistency of ideas across sentences in a text or conversation.
Examining how language serves specific communicative purposes in conversation, like validating feelings or paraphrasing.
The challenge of moving from discovering causal rules to engineering them into working systems.
A generative model that iteratively removes noise from discrete tokens (like words) to generate text, as an alternative to autoregressive decoding.
Generative models that iteratively denoise discrete tokens (like words) from noise to produce text.
Compressed representations of audio data stored as specific, distinct values rather than continuous numbers, making them efficient for storage and processing.
A compressed representation where continuous data is converted into distinct, countable tokens or categories.
A communication channel where each transmitted symbol is corrupted independently with no memory of past transmissions.
Individual units of quantized information that represent audio in a compressed, symbolic form rather than continuous values.
Converting continuous numerical values into discrete bins or categories for processing by algorithms.
The ability of a model to generalize across different mesh resolutions or numerical discretizations of the same continuous problem.
A pattern in token gradients that effectively distinguishes high-reward responses from low-reward ones.
Separating different factors of variation (like expression and identity) in a model's learned representations.
A smaller, faster version of BERT that retains most of its language understanding ability while using fewer parameters and less computational power.
A technique that compresses a large, complex model into a smaller one by training it to mimic the larger model's behavior, resulting in faster inference with minimal loss of quality.
A model that has been compressed by training a smaller model to mimic a larger, more capable model, reducing size and computational requirements while retaining performance.
A smaller, faster version of a larger model created by training it to mimic the larger model's behavior, reducing computational requirements while maintaining reasonable performance.
Using multiple computers or servers across a network to share the computational work of training or running a model, rather than relying on a single machine.
When the data distribution used for training differs from the distribution encountered during deployment, causing performance degradation.
Modifying a model's output probability distribution at inference time to satisfy constraints without changing the model's weights.
When a policy becomes overly specialized in reproducing successful behaviors without learning to handle diverse situations or recover from failures.
When a model encounters data that looks different from what it was trained on, causing performance to drop.
When a model's behavior diverges from the original training data distribution during fine-tuning or RL.
A mathematical space where words are represented as vectors based on their usage patterns in text, as produced by methods like GloVe or Word2Vec.
Ensuring benefits and harms are equitably distributed across agents rather than concentrated in hubs or privileged positions.
Forcing a model's output distribution to match a target distribution, for example to normalize reward structures across different tasks.
Learning to predict probability distributions over outputs rather than single deterministic predictions.
When the statistical properties of data change over time, making old patterns unreliable for future predictions.
A regularization technique that limits how far a model's distribution can drift from a reference distribution during training.
A mathematical property ensuring that a velocity field conserves mass (no fluid is created or destroyed at any point).
A metric measuring the quality of unique answers generated relative to the best possible answer set of the same size.
A ranking method that prioritizes both relevance and variety, ensuring results cover different perspectives or approaches.
The ability to identify and explain which retrieved documents contributed to a generated answer.
The natural division between separate documents used as a constraint to group tokens for shared expert selection.
The process of breaking long documents into smaller pieces before embedding them, a workflow that some embedding models are specifically optimized for.
Anchoring AI responses to specific source documents to ensure answers are based on provided content.
The ability to automatically extract, understand, and convert information from document images (like scans or forms) into structured, machine-readable formats.
The process of identifying and understanding the structure of a document, such as text regions, tables, and columns.
The process of automatically reading and extracting structured information like text, tables, and layout from documents.
Finding the relevant documents or passages from a large collection that are needed to answer a question.
The ability to maintain the original layout, formatting, and organization of a document when extracting text, rather than just outputting raw characters.
The ability to read and extract meaningful information from structured documents like receipts, invoices, and forms by recognizing both text and layout.
Tasks that require processing, searching, and reasoning over large collections of documents to find answers.
Understanding and answering questions that require information from multiple parts of a full document.
Training a model on data from multiple specialized fields (like general text, scientific papers, and medical literature) so it works well across all of them.
A specialized expert in an MoE model trained to handle reasoning or task-specific knowledge rather than raw perception.
Training models to work well on new, unseen domains beyond their training data.
A technique that automatically creates many fake domain names to evade detection and maintain control of malicious infrastructure.
How well a model's responses are anchored in accurate, specialized knowledge specific to a field rather than generic or hallucinated information.
Specialized expertise and facts about a particular field or subject area that an AI model has learned during training.
When a model encounters data from a different source or environment than it was trained on, causing performance to drop.
When a model is trained to excel at a specific task or set of languages rather than being a general-purpose tool.
Programming languages designed for specialized tasks in particular industries or fields.
A model that works effectively across many different subject areas and use cases without needing to be retrained for each one.
Abstract problem formulations that can be recognized and solved across multiple unrelated academic fields.
A model's ability to understand and respond accurately to topics within a specific field or area of expertise it was trained on.
An AI planning algorithm that solves problems in any domain without domain-specific customization.
A model trained specifically on data and tasks from a particular field (such as chemistry) to achieve higher accuracy in that domain than general-purpose models.
Tailored or optimized for a particular field or type of content, such as news, reviews, or scientific writing.
Assessment tailored to a particular field (like law) using metrics and error types relevant to that domain.
Training a model on specialized data from a particular field (like medicine) so it becomes expert at tasks in that domain rather than being a generalist.
The ability to generate text tailored to a particular field or context, such as legal documents, Wikipedia articles, or product reviews.
Specialized expertise required for a particular field, like vendor-specific scanner operations in medical imaging.
Specialized vocabulary and terminology unique to a particular field or industry, like medical jargon in healthcare or mathematical notation in physics.
A language model trained exclusively on text from a particular field or subject area, making it much better at understanding and generating content in that domain than general-purpose models.
A language model trained specifically on data from one field (like biomedical research) rather than general internet text, making it excel at specialized tasks.
Training a model to excel at tasks within a particular field (like legal documents) rather than being a general-purpose model.
Training a model on specialized data from a particular field (like biomedical literature) rather than general internet text, making it much better at understanding that field's concepts.
Specialized workflows and methodologies unique to a particular field that require expert knowledge to execute correctly.
Training a model exclusively on data from a narrow domain (like Python code) rather than general text, making it highly specialized but less versatile.
Training or adapting a model to specialize in a particular field (like biomedicine) rather than performing equally well across all topics.
A fine-tuning method that adapts model weights by separately learning magnitude and direction changes, extending LoRA.
A method of comparing two vectors by multiplying their corresponding components and summing the results, where the vectors' magnitudes (lengths) affect the final score.
A square matrix of non-negative entries in which every row and every column sums to 1, used to represent valid probability distributions for mixing multiple streams.
Reducing an image's resolution by removing pixels, making it smaller and faster to process.
A specialized AI model that receives requests routed to it by another system and performs the actual task or generates the final response.
Specific applications or problems that use the output of a pretrained model, such as predicting protein structure or identifying protein function.
The smaller neural network component in speculative decoding that quickly generates candidate tokens before verification by the main model.
A smaller, faster model used in speculative decoding to quickly propose token sequences before a larger model verifies them.
A tree structure of multiple candidate token sequences proposed by a draft model, allowing parallel verification of multiple continuations.
The process of automatically identifying and classifying different driving behaviors (e.g., aggressive vs. normal) from sensor data.
A system with two separate neural networks—one that processes questions and one that processes documents—both converting their inputs into comparable vector embeddings.
The parallel development and deployment processes for machine learning models and traditional software components.
The danger that AI technology can be misused for harmful purposes despite benign original intent.
A model with separate encoders for two input modalities that map them into a shared embedding space.
Organizing information at two levels of detail: high-level task guidance and low-level step-by-step actions.
An approach combining two complementary methods—one for logical reasoning and one for learning patterns—to solve a problem better than either alone.
A single model trained to perform multiple distinct tasks, such as both text generation and embedding, rather than being specialized for just one.
An architecture using two parallel processing streams with different time scales—one dense and one sparse.
A minimal, non-functional model used for testing infrastructure and workflows without the computational cost of a real model.
The ability to generate responses with a specific target length or speaking time.
A training approach that evaluates which skills remain helpful during learning and selectively retains only those that improve the current policy.
A formal system for reasoning about how beliefs and knowledge change when new information is revealed.
Building a network representation that changes over time to reflect evolving relationships, like road connectivity adjusted for traffic incidents.
Combining task-specific model parameters at inference time based on input features, rather than using a fixed merged model.
Automatically choosing the best execution approach (LLM reasoning, tool use, or code) for each step based on task requirements.
The ability to automatically adjust how many of a model's parameters are actively used based on available computational resources, allowing the same model to run efficiently on different hardware.
An optimization method that breaks problems into smaller subproblems and solves them recursively, storing results to avoid recomputation.
Removing training samples during training based on their importance or quality, rather than before training starts.
A quantization approach that adjusts precision levels during inference based on the input data, optimizing the balance between speed and accuracy on-the-fly.
Automatically creating questions of varying difficulty that adjust in real time based on learner responses and comprehension.
The process of recovering or reconstructing the full range of brightness values lost when converting from HDR to standard video formats.
A measure of how well an algorithm performs compared to the best possible strategy that adapts to changing conditions.
Choosing packet paths through a network in real-time based on current network conditions.
Mathematical models describing how systems evolve over time according to fixed rules.
Building neural network models that accurately capture the underlying rules governing how a system evolves over time.
A compressed representation of states that captures how the environment changes over time.
A technique for verifying program equivalence by representing multiple equivalent forms in a graph structure.
Stopping a model's computation before completion when sufficient confidence is reached, reducing computational cost.
Combining multimodal inputs (like text and images) at early layers of a model rather than after separate encoding.
Combining multiple objectives into a single weighted sum before training, which locks in a fixed trade-off.
Extended Berkeley Packet Filter; a technology for running sandboxed programs in the OS kernel to monitor system behavior.
A recording of the electrical signals produced by the heart, used to detect heart problems.
The ability to anticipate and address unusual or boundary conditions in code that might cause errors.
Processing data locally on a device at the edge of a network rather than sending it to a central cloud server, improving speed and reducing dependency on internet connectivity.
Running a model directly on local devices like phones, tablets, or IoT hardware rather than sending data to a remote server.
A computing device at the edge of a network (like a smartphone or IoT device) that runs AI models locally rather than sending data to a remote server.
The process of validating and removing unnecessary connections in a graph to improve its quality and interpretability.
Computing infrastructure spanning from edge devices (sensors, local hardware) to centralized cloud servers.
Attention mechanisms designed to reduce computational or memory complexity compared to standard quadratic-scaling attention.
Visual understanding from a first-person viewpoint, as seen from the wearer's perspective.
Understanding a scene from the viewpoint of a camera or observer positioned within the environment.
An AI system integrated directly into electronic health record software to assist clinicians with documentation or decision-making.
A special function that remains proportional to itself when transformed by an operator, used to decompose system behavior.
Dynamically adjusting the detail level and size of stored information based on current task relevance.
Simulating how deformable materials stretch, bend, and return to shape based on physical material properties.
A technique that protects important weights from previous tasks by adding a penalty term during learning.
A training objective used in probabilistic models that maximizes a tractable lower bound on the likelihood of observed data.
A pre-trained language model that learns by predicting which tokens in a sentence have been replaced, making it efficient and effective for downstream tasks.
Finding optimal delivery routes for electric vehicles that must visit customers within time windows and recharge at stations.
A recording of electrical brain activity used to detect neurological conditions like seizures.
Digital records of patient medical history, diagnoses, medications, and clinical events stored in structured formats.
A specialized computing device with limited resources designed to run specific applications, often integrated into physical systems.
A dense numerical vector that represents a word, sentence, or concept in a high-dimensional space.
Organizing vector representations of tokens into groups based on their semantic similarity.
The size of the numerical vector produced by an embedding model; larger dimensions capture more detail but require more storage and computation.
The number of numerical values used to represent a piece of text (for example, 1792), where more dimensions allow more detailed semantic information to be captured.
The spatial structure and relationships between data points in a learned vector space.
The learning rate specifically applied to the embedding layer, which can be scaled independently from other layers.
A model that converts text into numerical vectors that capture semantic meaning, allowing computers to understand and compare the similarity between different pieces of text.
Dense numerical vectors produced by a model to represent the semantic meaning of text, which can be used for similarity comparisons or as input to other models.
Adding controlled noise to vector representations of text to obscure sensitive information.
A numerical vector representation of text that captures semantic meaning for comparison and analysis.
A metric that measures how similar two pieces of content are by comparing their numerical vector representations.
A mathematical space where text is represented as vectors, allowing similar texts to be positioned close together and enabling operations like similarity search and clustering.
Different ways to represent words as vectors (semantic, acoustic, or phonetic).
Removing duplicate or near-duplicate examples by comparing their vector representations in embedding space.
Comparing semantic representations (embeddings) to find similar content without reprocessing raw data.
Numerical representations of text that capture semantic meaning, allowing the model to measure similarity between different words or phrases.
AI systems designed to interact with and understand the physical world through robotic bodies or sensors, rather than just processing text.
The process of choosing which action a robot should execute next based on perceived state and task context.
Real-world performance metrics for robots like task completion time, motion smoothness, and energy consumption.
Robot learning and control for physical interaction tasks using integrated sensing and actuation.
An AI model trained on real-world physical interactions and sensor data from robots, rather than text or simulations alone.
The ability to understand and reason about physical tasks and spatial relationships in the real world, not just abstract concepts.
A representation or model that works across different body types or physical forms without being specific to one.
The point during training when a model suddenly gains the ability to perform a task above a threshold accuracy.
A measure of solution quality that arises from system dynamics rather than being explicitly defined beforehand.
When a model trained on narrow misaligned behavior generalizes to more severe harmful behaviors outside its training distribution.
The spread of emotions from one agent to others through interaction and observation.
Using emotionally-toned language or affective phrasing in prompts to influence model behavior.
The quality of an emotion on a scale ranging from negative (unpleasant) to positive (pleasant).
Training a model to recognize and respond to emotional context in conversations, prioritizing understanding and emotional connection over purely factual responses.
Instructing an LLM to generate responses with emotional awareness and compassion for patient concerns.
A learning approach that selects the model that minimizes the average error on observed training data.
A neural network trained to mimic the behavior of a complex physical model or simulation.
A model component that transforms input sequences (like protein amino acids) into meaningful numerical representations without generating new sequences.
A neural network component that transforms input text into a compressed numerical representation, focusing on understanding and extracting meaning rather than generating new text.
A model designed to convert inputs (like images or text) into numerical representations for understanding, rather than generating new content.
A neural network that transforms input data into a compressed representation, rather than generating new text or making predictions.
Models like RoBERTa that process text to understand meaning, typically used for classification tasks.
A neural network architecture with two parts: an encoder that processes input text and a decoder that generates output text, allowing the model to transform one sequence into another.
A neural network design that processes input text to understand and represent it, but cannot generate new text from scratch.
The tool or gripper at the end of a robot arm that physically interacts with objects in the environment.
Training data that only provides the final correct answer without showing the reasoning steps used to reach it.
An autonomous driving approach that directly maps sensor inputs to control outputs without explicit intermediate representations.
Training a model to solve a complete task directly from raw input (like document images) to final output, without breaking it into separate intermediate steps.
A system that takes raw input (like an image) and produces final output (like structured text) in one unified model, rather than chaining multiple separate tools together.
An optimization algorithm that preserves energy while descending to escape local minima.
A function that assigns a scalar value to each point in a space, defining an unnormalized probability distribution.
Recurring behaviors showing how users interact with content or systems over time.
A training technique where knowledge from multiple models is combined and compressed into a single, smaller model for better efficiency.
Combining multiple models to make better predictions than any single model alone.
A safety technique that combines outputs from multiple models and selects the most agreed-upon result.
Probabilistic scores assigned to multiple documents that determine their relative contribution to the final answer.
A language model specifically optimized for business and organizational use cases, prioritizing reliability, consistency, and professional output over other characteristics.
The task of recognizing that different names or phrases refer to the same real-world concept, such as matching 'MI' with 'myocardial infarction'.
Maintaining the same appearance and identity of characters, objects, and locations across different scenes in a video.
Automatically identifying and pulling out specific names, places, or things from text.
The task of identifying mentions of real-world concepts in text and connecting them to their canonical definitions in a knowledge base or ontology.
The task of identifying when different text references refer to the same real-world concept, such as matching variant spellings of a drug name to a single clinical entity.
A question-answering evaluation framework that tests whether models can retrieve factual information about specific entities.
A data structure that represents entities (like users or devices) and the typed relationships between them.
A regularized version of optimal transport that adds entropy constraints to encourage smoother, more balanced assignments between sources and destinations.
The gradient of prediction uncertainty with respect to visual embeddings, used to identify ambiguous regions.
Encouraging an agent to explore diverse state-action pairs by maximizing the entropy of its occupancy measure.
Controlling the randomness of a model's outputs to prevent it from becoming too deterministic or too random during training.
A decoding approach that continues unmasking tokens until cumulative entropy exceeds a threshold, balancing generation speed and quality.
A system state in which the ability to generate random numbers, rather than arithmetic computation, becomes the limiting factor.
Automated creation of task specifications and evaluation settings for training or testing agents.
An AI system's ability to store and recall specific past events or experiences.
A situation where different participants have different information or knowledge about the same topic.
The preservation of an agent's ability to form accurate beliefs and maintain truthful internal representations.
The degree to which discourse relies on evidence-based reasoning versus intuition and subjective belief.
Uncertainty from lack of knowledge that can be reduced with more data or better models.
A phenomenon where the model learns to place its initial output near the fixed point, allowing inference without iteration.
Neural networks designed to respect geometric symmetries and transformations in molecular or crystal structures.
How well a design follows established principles for human comfort, safety, and efficient use of space.
Systematic examination of model failures to identify patterns and root causes beyond aggregate metrics.
Firmware algorithms that detect and correct errors in memory to maintain reliability as storage density increases.
How mistakes in early steps of a process accumulate and worsen downstream results.
A structured classification system that categorizes different types of errors to enable systematic analysis and mitigation.
A topological invariant computed as the alternating sum of the numbers of connected components, holes, and voids in a shape, used to characterize its structure.
When an evaluator systematically biases its judgments based on contextual information rather than actual content quality.
When AI judges appear to agree on scores but are actually using shallow patterns rather than substantive reasoning about quality.
A quantitative measure used to assess how well a model or system performs on a specific task.
A specialized language model trained to assess and score the quality of outputs from other AI models, acting as an automated judge.
An attack where an adversary modifies input features at test time to fool a deployed classifier.
A sensor that captures pixel-level brightness changes asynchronously, producing sparse temporal event streams.
Temporal representations that capture when and how much change occurs in music or video.
Automatically detecting higher-level events from lower-level timestamped observations using logical rules.
Grouping related incident reports together to identify a single underlying problem from multiple user descriptions.
Recording all changes to data as a sequence of immutable events for full history tracking.
A generalized pattern representing a class of similar log messages with variable fields.
Combining information from multiple frames or observations to make a single robust decision or diagnosis.
When a model's answer directly contradicts the provided evidence or clinical guidelines.
A model's ability to change its predictions based on whether evidence supports or contradicts a claim.
Linking AI outputs to specific source documents or facts that support them.
A collection of diverse, complementary pieces of evidence retrieved to support multi-faceted reasoning.
Fixing errors in code or theory by using specific signals like test failures and reviewer feedback to target the root cause.
A method that combines multiple predictions while quantifying uncertainty using evidence theory.
A training method that gradually increases the complexity of instructions given to a model, helping it learn to handle increasingly difficult tasks.
An AI optimization technique that mimics natural selection to explore and improve solutions over many iterations.
A statistical property under which reordering the data points does not change their joint distribution, required for conformal prediction to provide valid guarantees.
Saving and reusing working code solutions instead of text descriptions for repeated tasks.
Stateful, runnable systems that simulate real-world tool interactions and can verify agent actions.
Detailed analysis of why an action succeeded or failed, beyond just binary success/failure signals.
Anchoring AI-generated questions and explanations to actual runtime behavior and concrete execution traces.
A detailed strategy for solving a problem, which can be implemented and tested before committing to a final answer.
A record of every step a program takes as it runs, including variable values and function calls.
Detailed information about what happened during a program's execution, used to diagnose failures.
Validating agent behavior by running code and checking if outputs match expected results, rather than relying on static analysis.
A variable in a causal model that is not caused by any other variables in the model; represents external sources of randomness.
An acquisition function that selects points likely to improve over the current best solution.
Useful patterns and insights extracted from real-world interactions and deployment experience.
Learning through direct interaction with the environment and feedback from actions taken.
Strategically choosing which experiments to run to maximize information gain given a limited budget.
The process of testing hypotheses through controlled experiments to uncover causal relationships.
An early version of a model released for testing and feedback, which may have bugs or incomplete features compared to stable versions.
A measure of how much each expert in an MoE model contributes to the final output, used to decide which experts need higher precision.
The mechanism in a mixture-of-experts model that decides which specialized sub-networks should process each piece of input.
The process where different experts in an MoE learn to handle distinct types of inputs or tasks (e.g., code vs. math).
How evenly the workload is distributed across experts; balanced utilization prevents some experts from being unused.
The ability to understand and interpret why an AI model made a specific decision or prediction.
Whether a model applies the same reasoning strategy (highlights the same regions) across different instances of the same class.
A mode where a model generates visible reasoning steps before producing a final answer, allowing you to see its problem-solving process.
A feature that allows a model to show its reasoning process step-by-step before providing an answer, useful for complex problems that benefit from deliberate problem-solving.
The maximum gain a player can achieve by deviating from a given strategy profile to a best response; it is zero at an equilibrium.
The process of trying diverse actions during training to discover which ones lead to better outcomes.
Balancing between exploiting known good solutions and exploring new possibilities to find better ones.
A weighted average that gives more importance to recent values than older ones.
A model's ability to handle facial expressions it wasn't explicitly trained on by learning underlying expression patterns.
The capability to work with and maintain understanding across large amounts of text or multiple documents during reasoning.
Estimating both the position and shape of objects that occupy multiple sensor measurements.
A capability that allows a model to think through complex problems step-by-step internally before providing a final answer.
A reasoning technique where a model works through a problem step-by-step internally before providing an answer, improving accuracy on complex tasks.
Reward signals based on computational verification methods rather than the model's own internal signals.
Whether results from a controlled study apply to real-world situations outside the lab.
Predicting model behavior in a region (like very large training runs) based on observations from smaller regions.
Making predictions beyond the range of training data, such as forecasting system behavior at untested excitation levels.
Technology that identifies or verifies people by analyzing facial features in images.
An optimization technique that selects diverse items by maximizing how well they represent the full set of options.
The process of verifying claims against reliable sources to determine their accuracy.
Verifying if claims are true using only an LLM's internal knowledge, without searching external databases.
A decomposition of norm computation into smaller intermediate terms to avoid materializing large dense matrices.
How often an AI model produces correct, verifiable information without errors or false claims.
Anchoring a model's responses to verified, real-world information rather than relying solely on patterns learned during training.
An LLM's ability to accurately retrieve and output factual information from its training data.
A group of related system components or subsystems that share common failure modes and characteristics.
The quantified likelihood that an AI system will make a harmful or incorrect decision in real-world deployment.
Systematic evaluation of an AI system to detect and measure bias across demographic groups or decision scenarios.
Whether an AI model's stated reasoning actually explains how it arrived at its answer, or is merely a post-hoc justification.
The task of identifying false or misleading news articles, typically framed as a classification problem.
When incorrect or outdated information from past interactions influences future reasoning.
The ability to identify when a question contains incorrect assumptions or fabricated facts before answering.
A greedy algorithm that selects points by always choosing the one farthest from previously selected points.
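The selection rule described above can be sketched as follows. This is an illustrative version assuming Euclidean distance and an arbitrary starting index; the function name and the `start` parameter are hypothetical.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Greedily pick up to k point indices, each farthest from those already picked."""
    if k < 1 or not points:
        return []
    selected = [start]
    # Distance from every point to its nearest selected point so far.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < min(k, len(points)):
        # Choose the point whose nearest selected neighbor is farthest away.
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected


points = [(0, 0), (1, 0), (10, 0), (5, 0)]
print(farthest_point_sampling(points, k=3))  # [0, 2, 3]
```

Tracking only the distance to the nearest selected point keeps each round linear in the number of points.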
A method for efficiently updating model parameters or memory states during forward passes without full recomputation.
Model parameters that are quickly adapted during inference to capture task-specific or input-specific patterns.
Pinpointing the exact location of bugs or errors in code or systems.
A graph showing how errors flow through transformer components from their origin to observable symptoms.
The ability of a system to continue operating correctly even when components fail.
Automatically checking whether a problem instance has at least one valid solution before using it for testing.
Enhancing a model by adding hand-crafted or extracted features (like linguistic metrics) alongside learned representations.
Storing intermediate computed features during inference to reuse them in later steps, reducing redundant computation.
The process of selecting and designing input features that a machine learning model uses to make predictions.
The process of using a model to convert raw input text into numerical representations (features) that capture the meaning of the text.
When a single concept is scattered across many separate features instead of being cleanly captured by one or a coherent group.
A measure of how much each input variable contributes to a model's predictions.
How multiple input features combine together to influence a model's prediction, beyond their individual effects.
A method to identify how combinations of input features jointly influence a model's predictions, beyond individual feature effects.
A measure of how well different visual concepts can be distinguished in a model's learned feature space.
A technique that dynamically adjusts learned representations by scaling and shifting features based on problem-specific conditions.
Training models across multiple devices without centralizing sensitive data in one place.
A standard neural network layer in transformers that processes information independently at each position.
A neural network that processes input in a single forward pass without recurrence or iterative refinement.
The method used to apply feedback text to refine and improve a search query representation.
Where the text used to improve a search query comes from, such as LLM-generated text or actual documents.
Using execution results and error signals to adaptively adjust agent behavior and improve reliability over time.
Training or prompting a model with only a small number of examples to perform a new task.
The degree to which a quantized or compressed model preserves the quality and accuracy of the original full-precision model.
A filtering mechanism that only includes accurately generated entity appearances in consistency evaluation metrics.
A measure of how well an explanation captures the true reasoning of a model by testing prediction changes.
A code completion technique where the model predicts missing code between existing lines, rather than only generating code forward from a starting point.
Distinguishing between very similar categories, like telling apart different bird species rather than just identifying 'bird vs. not bird'.
The ability to accurately generate readable text and small details within generated images.
Small, specific visual elements in an image, such as text within a photo or subtle differences between similar objects.
The ability to further train or customize a pre-trained model on your own data to adapt it for specific tasks or domains.
A model created by training an existing pre-trained model on new data to specialize it for specific tasks or behaviors.
A pre-trained model further trained on a smaller, task-specific dataset to improve performance on that task.
The process of further training a pre-trained model on new data to adapt it for specific tasks or domains.
A numerical technique that breaks a complex domain into small pieces to solve physics equations approximately.
Mathematical structures with finitely many elements where arithmetic operations follow specific rules.
A problem setting with a fixed, known endpoint in time, as opposed to indefinite or infinite-horizon problems.
A recurrent neural network model where neurons output continuous activation rates rather than discrete spikes.
A formal language for expressing rules and constraints using predicates, variables, and logical operators.
The time it takes for a stochastic process to reach a target state for the first time.
The initial search system that finds candidate documents before refinement techniques are applied.
A metric measuring the difference between score functions of two distributions.
A variant of dynamic programming that first estimates unknown functions (like demand) from data, then uses those estimates for optimization.
A moment when the eye pauses on a specific location while viewing an image, typically lasting 100-500 milliseconds.
Repeatedly applying a function until it converges to a stable value, used here for test-time computation in looped models.
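A minimal sketch of the general idea, using a scalar function rather than a looped model: the helper below (a hypothetical name) keeps applying `f` until successive values stop changing by more than a tolerance.

```python
import math


def fixed_point(f, x0, tol=1e-10, max_iter=1000):
    """Apply f repeatedly from x0 until the value changes by less than tol."""
    x = x0
    for _ in range(max_iter):
        nxt = f(x)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    raise RuntimeError("iteration did not converge")


# cos(x) = x has a single solution near 0.739; iteration from 1.0 finds it.
root = fixed_point(math.cos, 1.0)
print(round(root, 6))  # 0.739085
```

Convergence is not guaranteed in general; it holds here because the slope of `cos` near the solution has magnitude below 1.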
Finding a stable state where a function's output equals its input, used here to refine embeddings iteratively.
Embeddings that always produce vectors of the same length regardless of input length, which limits how much detail can be captured for very long documents.
A company's primary, most capable model designed to showcase their best technology and handle the most demanding use cases.
Software abstraction that maps logical addresses to physical memory locations in SSDs, managing wear and errors.
Dynamically allocating wireless frequencies based on real-time demand instead of fixed assignments.
The process of deciding where to place components on a chip to meet design constraints and performance goals.
Generating data by learning reversible transformations between simple and complex distributions.
A generative modeling technique that learns to transform random noise into realistic data by following learned flow paths.
Functional magnetic resonance imaging; a non-invasive technique measuring brain activity through blood flow changes.
A training approach combining focal loss (which focuses on hard examples) with contrastive learning to handle imbalanced datasets.
A training approach combining focal loss (which emphasizes hard examples) with contrastive learning to handle imbalanced datasets.
Custom sound effects created to match specific actions or movements in video, like footsteps or door slams.
A parameter that controls how quickly a filter discounts old data, balancing between adapting to new conditions and maintaining stability.
Testing reward hypotheses by branching from shared policy checkpoints and comparing short-horizon performance to assess reward quality.
Expressing system requirements or policies in a precise mathematical language that tools can automatically verify.
Mathematical proof that a system meets its specifications, here implemented in Lean 4 to certify material stability predictions.
Real-time guidance given to students during learning to help them improve, rather than just assigning a final grade.
Simulating a robot's future states by repeatedly applying its dynamics model to predict outcomes of candidate actions.
A training objective that penalizes the model for assigning probability to regions the true distribution doesn't cover.
A single computation cycle where input data flows through the model's layers to produce an output prediction.
An agent's reasoning about future consequences and goals rather than just reacting to past events.
An efficient method for computing derivatives by propagating changes forward through a computation graph.
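One common way to realize this is with dual numbers, which carry a value and its derivative through each operation. The sketch below is an assumption-laden toy (the `Dual` class and `derivative` helper are hypothetical and support only addition and multiplication), not any particular library's implementation.

```python
class Dual:
    """A value paired with its derivative with respect to one input."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(other):
        # Plain numbers are constants, so their derivative is zero.
        return other if isinstance(other, Dual) else Dual(other, 0.0)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule: (uv)' = u v' + u' v
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)

    __rmul__ = __mul__


def derivative(f, x):
    """Evaluate df/dx at x by seeding the input's derivative with 1."""
    return f(Dual(x, 1.0)).deriv


print(derivative(lambda x: x * x * x + 2 * x, 3.0))  # 29.0
```

The derivative is computed in the same pass as the value, so one forward sweep is needed per input variable.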
A large pre-trained model that serves as a starting point for building other models, rather than being trained from scratch.
The underlying structural design of a neural network that determines how it processes and learns from data, distinct from standard transformer designs.
Large pre-trained AI models that can be adapted to many different tasks without starting from scratch.
Mathematical representation showing which frequencies (periodic patterns) are present in data.
A data format that stores model weights using 16-bit floating-point numbers, preserving nearly all of the model's accuracy while using half the memory of 32-bit formats.
A low-precision numerical format that uses only 4 bits to represent numbers, enabling faster computation and smaller model sizes compared to standard 32-bit precision.
A 4-bit number format used in quantization that represents values with minimal precision, significantly shrinking model size while maintaining reasonable accuracy.
A 4-bit floating-point number format that represents model weights with very low precision, enabling extremely efficient inference on compatible hardware.
An ultra-low-precision format using 4-bit floating-point numbers to represent model weights, enabling extreme compression.
A compression technique that represents model weights using only 4-bit floating-point numbers instead of larger formats, reducing memory usage and speeding up inference.
A compressed number format that uses 8 bits instead of the standard 32 bits, dramatically shrinking model size at the cost of slightly reduced precision.
A specific quantization method that uses 8-bit floating-point numbers and adjusts precision dynamically based on the data being processed, balancing speed and accuracy.
An 8-bit numerical format that stores numbers with reduced precision compared to standard formats, enabling smaller model sizes and faster computation.
A data format that stores numbers using 8 bits instead of the standard 32 bits, significantly reducing memory requirements with minimal quality loss.
A compression technique that reduces model size by representing weights using 8-bit floating-point numbers instead of higher precision, making it faster and more memory-efficient.
A set with self-similar structure at multiple scales that an optimization trajectory converges to, instead of settling on a single point.
A model's ability to produce answers without predefined options, requiring genuine recall and reasoning.
How often different facts or tokens appear in training data, which affects what models learn.
Decomposing signals into high-frequency (details, edges) and low-frequency (overall structure, semantics) components.
Evaluating model performance separately for rare, medium, and common classes to reveal patterns hidden by overall metrics.
The automated creation of user interface code and visual elements based on descriptions or specifications.
A state-of-the-art AI model representing the cutting edge of what's currently possible in terms of capability and performance.
State-of-the-art, cutting-edge AI models that represent the current best performance in the field.
A model that represents the current state-of-the-art or cutting edge in AI capabilities, competing with the most advanced models available.
The largest and most advanced language models available, representing the cutting edge of AI capabilities.
A cutting-edge AI model representing the current state-of-the-art in performance and reasoning capabilities.
A pre-trained model component that is kept unchanged during training to preserve its learned knowledge.
A model using standard 32-bit floating-point numbers to represent weights, providing maximum accuracy but requiring more memory.
Model parameters stored at maximum numerical accuracy (typically 32-bit floating point), which provides the best quality but requires more memory and computation.
The ability of a model to output structured requests to invoke external tools or APIs rather than generating free-form text.
Internal model representations that encode what tasks do, allowing comparison of task similarity and prediction of learning trajectories.
Vector representations of tasks extracted from model activations during in-context learning.
Growing a model's capacity while mathematically guaranteeing it behaves identically to the original at the start.
Mathematical operations like rotations that rearrange a model's weights without changing what the model computes.
Specifications describing what a software system should do and its specific behaviors and features.
A discrete token that encodes both an agentic operation and latent visual reasoning capability without explicit visual supervision.
An attention mechanism that progressively compresses and simplifies the input sequence, reducing computational cost while maintaining important information.
GPU operations combined into a single kernel to reduce memory traffic and improve computational efficiency.
Logic-based rules that handle uncertainty and gradual membership rather than strict true/false classifications.
Comparing text strings by measuring character-level similarity rather than exact matches.
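One possible character-level measure is Levenshtein edit distance, sketched below; other measures exist, and the definition above does not say which one is meant. The `similarity` helper and its normalization are assumptions for illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Scale edit distance to a score from 0.0 (no overlap) to 1.0 (identical)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3
```

A threshold on the similarity score (for example, treating scores above some cutoff as a match) turns this into a yes/no fuzzy comparison.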
A mechanism where a context signal scales the magnitude of state-dependent responses without changing their underlying structure.
A formal notation for encoding game rules so different AI systems can play the same game consistently.
A stable state where no agent can improve their outcome by unilaterally changing their strategy.
A learned mechanism that selectively applies corrections to predictions based on per-dimension scaling factors.
A neuron that controls whether tokens are routed to standard or exception processing paths.
A learned or rule-based function that selectively enables or disables components based on input conditions.
A mathematical property ensuring a model's predictions remain consistent regardless of arbitrary coordinate system choices or numerical representations.
A statistical model that learns patterns from data and provides uncertainty estimates for predictions.
Systematic tendency of models to favor one gender over others in language generation and translation tasks.
The capability to think through problems logically, break down complex questions, and arrive at conclusions across a wide variety of topics.
Designed to handle a wide variety of different tasks rather than being specialized for one specific domain.
A model trained to handle a wide variety of text tasks—like writing, answering questions, and reasoning—rather than being specialized for one specific task.
An AI model designed to handle many different types of tasks well, rather than being specialized for one specific domain.
A model trained to perform well across many different types of tasks rather than being specialized for one specific domain.
A robot trained to perform many different everyday tasks rather than being specialized for one specific job.
A model's ability to perform well on new, unseen data that differs from what it was trained on.
The difference between a model's performance on training data and its performance on unseen test data.
A mathematical method for aligning and comparing representations across different neural networks by finding optimal rotations.
An inference approach where a model generates an intermediate image before answering a question about it.
Vector representations of text created by generative language models that capture semantic meaning.
A probabilistic framework that generates samples with probability proportional to a reward function, useful for optimization tasks like molecule discovery.
A model trained to generate new text by predicting the next word or sequence of words based on patterns it learned during training.
An AI model trained to create new data (like images) that resembles its training data.
Additional training phase after initial pretraining that uses generative tasks to improve model capabilities.
A model's procedure for creating new outputs (like floor plans) based on learned patterns from training data.
A methodology that grows phenomena from micro-level interaction conditions to identify sufficient mechanisms, detect thresholds, and design safety interventions.
Matrices used to encode data into codewords in error-correcting codes.
The shortest path between two points along a curved surface, as opposed to straight-line distance.
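For the special case of a sphere, the geodesic is a great-circle arc, and its length can be computed with the haversine formula. The sketch below assumes a perfectly spherical Earth with a mean radius of 6371 km; the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius; the real Earth is not a perfect sphere


def haversine_km(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Great-circle distance between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the valid range.
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))


# A quarter of the way around the equator.
print(haversine_km(0.0, 0.0, 0.0, 90.0))
```

On other curved surfaces there is usually no closed form, and geodesics are found numerically.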
Checking that spatial analysis results are realistic (e.g., no negative distances, valid coordinate ranges, sensible geographic relationships).
A mathematical framework (Clifford algebras) that extends vectors with operations for rotations, reflections, and higher-dimensional relationships.
Structural constraints added to a model to encode domain knowledge about geometry, such as crystal lattice properties.
Maintaining structural and spatial accuracy across multiple views or representations of a 3D object.
The alignment between router weight directions and expert weight directions that emerges during training.
Building a 3D model of a scene from video or images by estimating depth and camera motion.
Property where data points can be separated into groups using a linear boundary in vector space.
Multimodal representations that preserve spatial and geometric information about the scene to maintain disambiguating context.
Using machine learning and statistics to analyze data tied to geographic locations.
A file format for quantized models designed for efficient CPU and GPU inference with llama.cpp.
A file format designed for efficient storage and loading of large language and embedding models, optimized for fast inference on various hardware.
A mathematical technique for reweighting probability distributions along trajectories without computing gradients.
Attention mechanism where each token can attend to all preceding tokens in the sequence.
Populations and nations that represent the numerical majority of the world but are historically marginalized in Western-dominated systems.
When an AI agent gradually abandons its original objective and pursues different goals instead.
A low-dimensional vector that captures task identity and enables rapid adaptation to new tasks without retraining.
When an AI system's stated objective doesn't match the actual intended outcome, leading to unintended behaviors.
Rules and policies that limit AI autonomy to ensure oversight, safety, and alignment with organizational values.
A set of rules and structures that constrain and guide AI behavior to ensure reliability and consistency.
An open-source license that allows free use and modification of software, but requires any derivative works to also be open-source under the same license.
A transformer-based neural network design that processes text sequentially and predicts the next word based on previous context.
A transformer-based neural network design from OpenAI that processes text sequentially to predict and generate the next word in a sequence.
An older transformer-based design for language models that generates text by predicting one word at a time, simpler and smaller than modern alternatives.
A modified version of the GPT-2 architecture that changes the original design, such as by reducing size or adjusting training.
A transformer-based design that follows the same structural principles as OpenAI's GPT-3 model, using layers of attention mechanisms to process text.
A class of transformer-based language models descended from the original GPT design, characterized by autoregressive text generation and broad general-purpose capabilities.
A transformer-based neural network design that uses self-attention to process and generate text, serving as the structural blueprint for this model.
An open-source large language model architecture based on the GPT design, created as an alternative to closed-source models.
An open-source transformer-based architecture designed for training large language models, similar in structure to GPT models.
A neural network design based on transformer technology that processes text sequentially and generates one word at a time.
A quantization technique that compresses model weights to lower precision, reducing file size and memory requirements while maintaining reasonable performance.
Assigning GPU resources to different models or tasks to optimize throughput and latency.
Performance degradation that occurs when multiple inference requests compete for the same GPU's memory and compute resources.
The high-speed memory on a graphics processor used to store and process model weights and computations during inference.
Designing and tuning a model to run efficiently on graphics processing units (GPUs), which are specialized hardware that accelerates AI computations.
A technique ensuring that gradient updates from different tasks point in compatible directions to avoid conflicts.
Estimating how model parameters should change without actually computing full gradients or updates.
Improving model performance by following the direction of steepest improvement in parameters.
Systematic error in gradient estimates that prevents optimization from reaching the true optimum.
Building models sequentially where each new model corrects errors from previous ones.
Limiting the magnitude of gradients during training to prevent extreme updates and improve stability.
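One widely used variant rescales the whole gradient vector when its norm exceeds a threshold. The sketch below works on a flat list of numbers for clarity; the function name is an assumption, and real training frameworks provide their own utilities for this.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Scale `grads` down so its Euclidean norm is at most `max_norm`."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads)  # already within the limit
    scale = max_norm / norm
    # Scaling every entry equally keeps the gradient's direction unchanged.
    return [g * scale for g in grads]


print(clip_by_global_norm([3.0, 4.0], max_norm=1.0))  # approximately [0.6, 0.8]
```

Clipping by norm preserves the update direction, whereas clipping each value independently would not.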
Sending model weight updates between devices and servers during distributed training, a major bottleneck on bandwidth-limited networks.
Reducing the size of gradient data to speed up training on distributed systems.
When different training objectives pull model updates in opposing directions, causing optimization to fail or degrade.
Scaling gradient values to maintain consistent learning rates across different parameter groups or layers.
A training technique that flips gradient signs to force a model to learn features that fool an adversarial classifier.
Technique that selectively modifies or blocks gradient flow to prevent interference between different learning objectives.
An explainability technique that uses model gradients to identify which input features most influence predictions.
Setting starting values for trainable parameters using information from model gradients to improve convergence and final performance.
Optimizing a function without computing gradients, using only function values or rankings.
A task where a model identifies and fixes grammar, spelling, and syntax mistakes in written text.
A linguistic system where nouns and related words are classified into categories requiring specific agreement patterns.
An attention mechanism that learns weighted interactions between nodes in a graph structure.
The task of assigning a label or category to an entire graph based on its structure and node features.
Transferring knowledge from a labeled source graph to an unlabeled target graph when their structures or distributions differ.
A measure of how different two graphs are, based on the minimum edits needed to transform one into the other.
Converting a graph structure into a compact text representation that preserves its properties.
A neural network that operates on graph-structured data by passing messages between connected nodes to learn relational patterns.
Neural networks designed to process graph-structured data by learning representations of nodes and edges.
Methods for converting graph structures into numerical vectors that preserve meaningful information about nodes and edges.
A structured approach that represents events and their relationships as a graph and processes them in sequential stages.
Generating text by always selecting the highest-probability next token, without exploring alternatives.
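To make this concrete, the sketch below runs greedy selection over a toy lookup table. `TOY_MODEL` is a made-up stand-in that conditions only on the previous token; a real language model would condition on the full context.

```python
# Hypothetical next-token probabilities, keyed by the previous token.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "cat": {"sat": 0.7, "</s>": 0.3},
    "sat": {"</s>": 1.0},
}


def greedy_decode(model, start="<s>", end="</s>", max_len=10):
    """Pick the single most probable next token at every step."""
    tokens = []
    current = start
    for _ in range(max_len):
        dist = model.get(current)
        if not dist:
            break  # no known continuation
        current = max(dist, key=dist.get)
        if current == end:
            break
        tokens.append(current)
    return tokens


print(greedy_decode(TOY_MODEL))  # ['the', 'cat', 'sat']
```

Because only the top choice is kept at each step, the output is deterministic and can miss sequences that are more probable overall.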
Accurate reference labels or measurements used to train and evaluate machine learning models.
The actual underlying causes or features that explain observed data in a system.
AI reasoning that relies on specific documents or data provided to the model, rather than just its training knowledge.
The practice of ensuring a model's responses are based on and supported by provided source documents rather than generated from general knowledge.
A generalized measure of uncertainty or disorder that follows mathematical group rules, extending beyond standard entropy.
A training method that improves model reasoning by comparing outputs and rewarding better explanations.
In quantization, the number of weights that share a single scaling factor; smaller groups preserve more precision but use more memory, while larger groups save more memory but may lose detail.
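The trade-off can be seen in a small sketch of group-wise quantization. This assumes simple symmetric absolute-maximum scaling to 4-bit integer codes; the function names are hypothetical, and production quantizers are more elaborate.

```python
def quantize_groupwise(weights, group_size, bits=4):
    """Quantize weights to signed integer codes with one scale per group."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit symmetric codes
    codes, scales = [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        scale = max(abs(w) for w in group) / qmax
        if scale == 0.0:
            scale = 1.0  # an all-zero group needs no scaling
        scales.append(scale)
        codes.extend(round(w / scale) for w in group)
    return codes, scales


def dequantize_groupwise(codes, scales, group_size):
    """Recover approximate weights from codes and per-group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


def total_error(weights, group_size):
    codes, scales = quantize_groupwise(weights, group_size)
    restored = dequantize_groupwise(codes, scales, group_size)
    return sum(abs(w - r) for w, r in zip(weights, restored))


weights = [0.1, -0.2, 0.05, 0.7, 7.0, -3.5, 1.0, 0.0]
print(total_error(weights, group_size=4) < total_error(weights, group_size=8))  # True
```

With a group size of 8, the single large weight sets the scale for everything and the small weights collapse toward zero. With a group size of 4, the small weights get their own finer scale, at the cost of storing a second scaling factor.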
Reducing model size by compressing weights in groups rather than individually.
Predicting aggregate behavior of a group of users rather than individual users, useful for testing business strategies.
An optimization technique that reduces memory usage and speeds up inference by having multiple query heads share the same key and value heads instead of each having their own.
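The sharing pattern can be sketched as a simple index mapping. The helper names below are assumptions, and the example assumes query heads are split into equal, contiguous groups.

```python
def kv_head_for_query(q_head, num_q_heads, num_kv_heads):
    """Return the index of the key/value head that a query head reads from."""
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("query heads must divide evenly among key/value heads")
    group = num_q_heads // num_kv_heads  # query heads per shared key/value head
    return q_head // group


def kv_cache_ratio(num_q_heads, num_kv_heads):
    """Key/value cache size relative to giving every query head its own pair."""
    return num_kv_heads / num_q_heads


print([kv_head_for_query(q, 8, 2) for q in range(8)])  # [0, 0, 0, 0, 1, 1, 1, 1]
print(kv_cache_ratio(8, 2))  # 0.25
```

With 8 query heads sharing 2 key/value heads, the cached keys and values take a quarter of the memory they would need if every query head had its own.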
Group Relative Policy Optimization, a reinforcement learning algorithm for fine-tuning language models with reward signals.
Safety mechanisms built into a model to refuse harmful requests or prevent it from generating unsafe content.
An AI system that interacts with computer interfaces by clicking, typing, and navigating screens.
The ability to identify and locate specific elements (like buttons or text fields) within a graphical user interface based on natural language descriptions.
A technique to steer AI generation toward desired outputs by providing additional control signals during inference.
A technique that steers a model's output toward desired behavior by balancing multiple objectives during inference.
Steering a model's text generation process using external signals or constraints without modifying the model itself.
A training technique that intelligently selects the most informative examples from your training data to improve model efficiency and performance.
A differentiable relaxation technique that approximates discrete choices to enable gradient-based optimization.
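If the technique defined above is the Gumbel-softmax relaxation (an assumption, since the term name is not shown here), it can be sketched as adding random Gumbel noise to the logits and applying a temperature-scaled softmax. The function name and parameters are illustrative.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Return a soft, nearly one-hot sample over the categories in `logits`."""
    # Gumbel(0, 1) noise; the `or` guards against log(0) if rng returns 0.0.
    noise = [-math.log(-math.log(rng.random() or 1e-12)) for _ in logits]
    scaled = [(l + g) / temperature for l, g in zip(logits, noise)]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


sample = gumbel_softmax([1.0, 2.0, 3.0], temperature=0.5, rng=random.Random(0))
print(sample)
```

Lower temperatures push the output closer to a hard one-hot choice; higher temperatures make it smoother. In a training framework the same arithmetic would be written with differentiable tensor operations so gradients can flow through the sample.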
When a model generates plausible-sounding but factually incorrect or fabricated information.
The ability to identify when a model generates false or unsupported information that isn't grounded in the provided source material.
A mathematical equation whose solution describes the optimal decisions to make over time.
A route through a graph that visits every node exactly once.
A quantum computing technique that simulates the evolution of a physical system described by a Hamiltonian.
The ability of a model to identify and interpret handwritten characters and words from images, accounting for variations in writing style and quality.
A rule that must always be satisfied during optimization, rather than being treated as a soft penalty that can be violated.
Challenging negative examples that are similar to the target but still incorrect, used during training to make the model learn more nuanced distinctions.
Tuning a model's design or training to run more efficiently on specific hardware (like NVIDIA GPUs), reducing memory usage and inference time.
A structured system that categorizes different types of harmful content (like violence, hate speech, or misinformation) so a model can recognize and classify them.
An architecture that alternates between thinking (reasoning about a problem) and acting (taking physical steps), allowing the model to plan and execute robot actions iteratively.
The design and implementation of control systems that manage agent behavior and task execution.
Systematic process of identifying potential failures and dangerous scenarios in a system.
The instantaneous rate of an event occurring at a given time, conditional on survival up to that time.
Systematically disabling individual attention heads to determine which ones are causally responsible for specific model behaviors.
Mathematical technique to approximate probability distributions using orthogonal polynomials.
The complete set of eigenvalues of the loss function's second-derivative matrix, describing the curvature in all directions.
Systematic differences in how different groups (by language, task, etc.) rank or prefer models.
Differences in how a treatment affects different individuals based on their characteristics.
A practical problem-solving method that finds good solutions quickly without guaranteeing optimality.
The size of the internal vector representation used by a neural network to process and store information about the input.
The internal numerical values a neural network computes at each layer as it processes input.
The dimensionality of the internal representations that a neural network uses to encode information about text.
An adversarial attack that injects malicious tokens to corrupt a model's internal memory and degrade performance.
Internal representations computed by neural networks that capture learned patterns.
Combining multiple independent predictions or estimates using a structured approach that accounts for differences in their reliability.
A multi-stage attention approach that first selects relevant tokens coarsely, then applies fine-grained attention on the selected subset.
A statistical technique using Platt scaling with a hierarchical prior to adjust model confidence while preventing over-shrinking of extreme predictions.
An unsupervised learning method that builds a tree of nested clusters by repeatedly merging or splitting groups based on similarity.
A neural network component that processes images at multiple levels of detail simultaneously, capturing both fine details and broad patterns.
A multi-level approach to reasoning where information is processed and combined across different levels of abstraction.
Storage system using multiple memory tiers (e.g., fast GPU memory and slower CPU memory) to balance speed and capacity.
Planning at multiple levels of abstraction, where high-level plans are refined into low-level actions.
Breaking down a complex decision into multiple levels, like deciding family → genus → species in order.
Breaking complex tasks into simpler sub-tasks organized in levels, where agents learn high-level strategies and low-level actions separately.
A technique that aggregates features from multiple layers of a neural network to create multi-scale guidance signals.
Testing correctness at multiple levels: properties, interactions, and full rollouts to ensure system correctness.
The process of automatically converting algorithmic descriptions into hardware designs, typically using pragmas and code transformations.
Derivatives beyond the first order (gradients) that capture more complex relationships in how inputs affect outputs.
The exponential growth in the number of quantum states available as more qubits are added to a quantum system.
Performance feedback derived from comparing baseline and skill-enhanced rollouts to guide skill and policy updates.
The expected number of steps for an algorithm to reach a target state from a starting point.
A training technique that simulates viewing images from different angles and perspectives to teach the model to recognize the same features under geometric transformations.
Techniques to make AI models produce truthful responses instead of false or misleading ones.
A type of recurrent neural network with symmetric connections used for associative memory and optimization.
A linear algebra operation that reflects vectors across a hyperplane, used here to align word direction vectors.
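As a rough illustration of the reflection described above, the sketch below mirrors a vector across the hyperplane orthogonal to a chosen normal vector. The function name `householder_reflect` is hypothetical, and plain Python lists stand in for vectors.

```python
def householder_reflect(v, normal):
    """Reflect vector v across the hyperplane orthogonal to `normal`."""
    norm_sq = sum(n * n for n in normal)
    if norm_sq == 0:
        raise ValueError("normal must be non-zero")
    # Subtract twice the component of v that lies along the normal.
    scale = 2 * sum(a * b for a, b in zip(v, normal)) / norm_sq
    return [a - scale * n for a, n in zip(v, normal)]


reflected = householder_reflect([1.0, 2.0], [1.0, 0.0])
```

Applying the same reflection twice returns the original vector, which is a quick sanity check for any implementation.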
A self-supervised learning approach for audio that learns meaningful speech representations by predicting masked portions of audio, similar to how language models learn from text.
Forecasting future body positions and movements based on past motion sequences.
A controlled experiment measuring how much an AI system improves human performance compared to working without it.
A workflow where humans and AI agents work together, with AI assisting at multiple stages rather than just solution generation.
A system where AI predictions are reviewed and validated by human experts before final decisions.
Training robot control policies by learning from human movement demonstrations.
A model that combines two different neural network designs (in this case, Mamba2 and attention mechanisms) to balance speed and performance.
A neural network design that combines Mamba (a fast, efficient sequence model) with Transformer components to balance speed and capability.
A memory system combining learnable parameters with non-learnable mechanisms to balance flexibility and efficiency.
A capability that allows a model to switch between fast, direct responses and slower, more deliberate reasoning depending on task complexity.
A non-Euclidean geometry where space curves negatively, naturally suited for representing hierarchical and tree-like structures.
A neural network that models relationships between multiple elements simultaneously, capturing high-order interactions beyond pairwise connections.
A neural network that generates weights for another neural network instead of learning them directly.
A configuration setting (like learning rate or network size) that you choose before training a model.
Using optimal hyperparameters found at small scale to train larger models without expensive retuning.
A geometric shape in high-dimensional space used in optimization and probability theory.
Training method that constrains weight matrices to lie on a fixed-norm hypersphere for improved stability and scaling.
Mathematical structure where points lie on the surface of a high-dimensional sphere, preserving directional relationships.
A geometric arrangement where data points lie on the surface of a high-dimensional sphere, preserving directional relationships.
The ability to uniquely determine a model's parameters from observed data.
The policies, processes, and controls that manage who (or what) can access systems and data, and what actions they are authorized to perform.
Maintaining consistent, unique identifiers for entities across different systems and time periods.
Keeping a person's unique facial characteristics unchanged while editing other attributes like expressions.
Separating what makes a face unique (identity) from how it moves (expression) so each can be controlled independently.
The task of automatically generating a text description of what appears in an image.
Modifying specific parts of an existing image while preserving other elements.
A neural network component that converts images into numerical representations that capture visual features and patterns.
A computer vision task that divides an image into regions or labels each pixel to identify different objects or areas.
Hardware in cameras that processes raw sensor data into final images, increasingly using AI for enhancement.
The process of converting images into discrete tokens (small units) that a language model can process, similar to how it handles text.
The ability to understand and answer questions that require analyzing both visual content and textual information together.
The ability to analyze a visual image and automatically produce source code that recreates or represents that image's structure and content.
The task of automatically generating natural language descriptions of images, converting visual information into written words.
Training a model to copy behavior from expert examples without understanding the reasoning behind decisions.
A learned behavior that mimics actions from human demonstrations or other expert examples.
Identifying which parts of a system are affected by a proposed code change.
Games where players lack some relevant information, such as an opponent's hidden cards or future draws.
A limitation that emerges naturally from the training setup rather than being explicitly specified.
A hidden, structured order in which models naturally learn skills during pretraining, without explicit curriculum design.
Computing gradients through an implicit equation without unrolling iterations, keeping memory constant.
A user's underlying goal or need that is not directly stated but must be inferred from context.
Neural networks defined by equations that must be solved rather than computed layer-by-layer, enabling parameter efficiency.
Structured behaviors that emerge naturally from an LLM's token-level decisions without being explicitly programmed or instructed.
Inferring unobserved values or outcomes from historical patterns in data without explicit instruction.
Information about what a community values inferred from their behavior (like engagement and acceptance) rather than explicit feedback.
Culpeper's framework analyzing how language can intentionally or unintentionally cause offense or disrespect.
A technique to adjust samples drawn from one distribution to match another by weighting them by their probability ratio.
Adjusting sample weights to correct for sampling from the wrong distribution.
A technique to estimate gradients by reweighting samples from one distribution to match another.
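The reweighting idea shared by the three entries above can be sketched for discrete distributions as follows. This is a minimal example, assuming the target distribution `p` and the sampling distribution `q` are given as dictionaries of probabilities; all names are hypothetical.

```python
def importance_sampling_mean(f, samples, p, q):
    """Estimate E_p[f(x)] from samples drawn from q.

    Each sample is weighted by the probability ratio p(x) / q(x), which
    corrects for having sampled from q instead of p.
    """
    return sum(f(x) * p[x] / q[x] for x in samples) / len(samples)


target = {0: 0.25, 1: 0.75}    # distribution we care about
proposal = {0: 0.5, 1: 0.5}    # distribution the samples came from
estimate = importance_sampling_mean(lambda x: x, [0, 1], target, proposal)
```

The estimate is only reliable when `q` assigns non-zero probability wherever `p` does; otherwise some outcomes can never be sampled and the weights cannot compensate.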
Learning from examples provided in a prompt without updating model weights.
Data compression performed during simulation execution rather than after data is written to disk.
A training technique where negative examples (dissimilar samples) come from other items in the same training batch, helping the model learn to distinguish between similar and dissimilar texts.
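A minimal sketch of a contrastive loss with in-batch negatives, assuming embeddings are plain lists of floats and that `passages[i]` is the matching text for `queries[i]`. The function name and the `temperature` default are illustrative choices, not taken from any particular library.

```python
import math


def in_batch_contrastive_loss(queries, passages, temperature=0.05):
    """Mean cross-entropy where passages[i] is the positive for queries[i]
    and every other passage in the batch acts as a negative."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    total = 0.0
    for i, query in enumerate(queries):
        logits = [dot(query, passage) / temperature for passage in passages]
        # Numerically stable log-sum-exp over the whole batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(queries)
```

The loss is low when each query scores highest against its own passage and high when another item in the batch scores higher.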
A mechanism where relevant information is retrieved from model parameters themselves rather than from external memory or attention, helping reduce computational bottlenecks.
Ensuring that the goals and rewards of different agents or system components work toward the same overall objective.
How well a model adjusts its behavior when the rewards or payoffs for different actions change.
A theory of humor based on identifying mismatches in expectations and then resolving them in unexpected ways.
Writing systems used for South Asian languages like Hindi, Tamil, Telugu, and Bengali that have distinct characters and phonetic rules.
Artifacts or evidence left behind by attackers (like malicious URLs, IP addresses, or file hashes) that reveal a security breach.
An attack where malicious instructions are hidden in data an AI agent retrieves, causing unintended actions.
Built-in assumptions about how data should behave, like physics rules, that help models learn faster with less data.
A sensor that measures acceleration and rotation to track motion without external references.
The process of running a trained model to generate predictions or outputs from new inputs.
Specialized hardware designed to speed up the execution of trained AI models.
The computational resources and processing power required to run a model on new data after it has been trained.
The computational resources and time required to run a model on new inputs, typically measured in memory usage and processing time.
The ability of a model to generate outputs quickly and with low computational resource consumption during real-world use.
Software that runs a trained model to generate predictions or outputs; vLLM is an optimized inference engine for large language models.
Software that optimizes how a trained model runs on specific hardware; MLX is Apple's machine learning framework, designed for efficient inference on Apple Silicon.
The time it takes for a model to generate a response after receiving an input.
Techniques and design choices that make a model faster and more efficient to run on hardware, prioritizing speed and resource usage over training flexibility.
The numerical precision (number of bits) used when running a model to generate outputs; lower precision is faster but may reduce quality.
A system that hosts trained ML models and processes incoming prediction requests on deployed hardware like GPUs.
How quickly a model can generate predictions or outputs after being given an input, measured in time per token or tokens per second.
Reduction in time needed to run a model and get results, measured as a multiple of the original speed.
The amount of time it takes for a model to process input and generate output after it has been trained.
Extra processing power spent by the model while generating a response to think through problems more carefully before answering.
The computational resources used when a model generates answers, as opposed to during training.
Detecting and fixing model mistakes during generation without retraining, using only the current forward pass.
A model used during generation to score outputs without requiring retraining of the main system.
A technique where a model allocates more computational resources and time during inference (when generating answers) to improve quality and accuracy on harder problems.
A mathematical proof written in natural language rather than formal logical notation.
Combining fragmented knowledge from multiple sources to make better collective decisions than any single source could.
When one party in a transaction has more or better information than the other, creating an imbalance of power.
A point in a system where information capacity is severely limited, constraining overall performance.
The amount of useful, non-redundant information contained in a token or representation.
The task of automatically identifying and pulling out specific data or facts from documents, such as names, dates, or amounts from forms.
The reduction in uncertainty about a target achieved by knowing a feature.
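One common way to quantify this reduction is as a drop in Shannon entropy. The sketch below assumes categorical features and labels, with hypothetical function names.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a list of labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their expected entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remaining = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remaining
```

A feature that splits the labels perfectly recovers all of the original entropy, while a feature unrelated to the labels yields a gain of zero.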
Mathematical framework treating probability distributions as points in curved space, measuring optimization difficulty via curvature.
When a model accidentally learns from information it shouldn't have access to, like future data or test set details.
The task of finding relevant documents or passages from a large collection in response to a user query.
The process of gathering data from multiple sources and combining it into a coherent, unified response or summary.
Running a model as an intermediate processing layer within an application pipeline, typically to filter or validate data before it reaches the main system.
A safety intervention using statements with specific linguistic forms to prevent misaligned behavior, but which can paradoxically trigger misalignment on similar-form inputs.
The task of filling in missing or masked regions of an image while maintaining coherence with the surrounding content.
A neural network architecture designed to be convex in its inputs, useful for constrained optimization and learning convex functions.
The type of data a model can accept as input, such as text, images, or audio.
The pixel dimensions (448×448 in this case) at which the model processes images, affecting the level of visual detail it can perceive.
Checking that input data meets basic requirements (correct format, expected properties, no obvious errors) before processing it.
How much a model's behavior changes in response to different inputs, crucial for generalization.
The types of data a model can accept as input and produce as output, such as text, images, or audio.
The process of producing additional relevant information or perspectives that extend or improve an initial answer.
Identifying the core techniques or key ideas needed to solve a complex problem.
Identifying and locating individual objects of the same class separately in an image.
The ability to apply different settings or modifications to individual objects within a scene independently.
The ability of a model to follow primary instructions even when secondary or conflicting instructions are present.
The ability of a model to understand and execute specific tasks or commands given in natural language prompts.
A model fine-tuned on instruction-response pairs so it follows user prompts more reliably.
A training process that teaches a model to follow specific user instructions and commands, improving its ability to respond appropriately to requests.
The prediction that advanced AI agents will pursue certain goals (like self-preservation) regardless of their final objectives.
A sequence of checks replacing ground-truth labels: responsiveness to safe/unsafe contrasts, dominance of target variance, and stability across reruns.
A specific quantization format that represents model weights using only 4 bits per value, significantly reducing model size while maintaining reasonable performance.
A quantization method that represents model weights using only 4-bit integers instead of full-precision floating-point numbers, dramatically shrinking the model's memory footprint.
A compression technique that reduces a model's size and memory usage by storing weights as 4-bit integers instead of higher-precision numbers, making it faster and cheaper to run with minimal accuracy loss.
Using 8-bit integers instead of floating-point numbers to represent model weights and activations.
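A minimal sketch of one possible scheme, symmetric per-tensor quantization, in which a single scale factor maps the largest absolute value to 127. Real systems often use per-channel scales or zero points; the function names here are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


codes, scale = quantize_int8([0.5, -1.0, 0.25])
restored = dequantize_int8(codes, scale)
```

Each restored value differs from the original by at most about half of `scale`, which is the rounding error introduced by the integer grid.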
A mathematical optimization technique that finds the best solution among discrete options subject to linear constraints.
A mathematical property ensuring that estimated demand relationships are economically consistent and don't violate basic economic laws.
A class of distance measures between probability distributions that use function classes to define divergence.
The ability of an AI system to understand and match user goals, especially when requirements are unclear or evolving.
The process of analyzing user input to determine what the user is trying to accomplish so it can be handled appropriately.
The process of identifying and structuring the user's underlying goal or request from natural language input.
The process of users clarifying and developing their goals through interaction rather than starting with fully-formed objectives.
The model's capability to understand what a developer actually wants to accomplish, even when the request is vague or expressed in informal language.
Specifying what you want to accomplish rather than writing detailed code to implement it.
A measure of how consistently multiple human annotators label or judge the same data.
Dependencies and relationships between different variables or channels in multivariate data.
A measure of how consistently different judges rate the same outputs, typically using metrics like correlation or ICC.
The spatial, functional, or semantic relationships and dependencies between different parts of a composed object.
A measure of how consistently different evaluators score or judge the same items, often using metrics like Kendall's tau.
Ensuring that learning signals from different tasks contribute equally to model updates, preventing any single task from dominating training.
A measure of how much multiple teacher models agree on their predictions, used to assess supervision reliability.
A model's understanding of how conversations naturally flow and how users respond to assistant outputs.
The sequence of past user actions and system responses that inform current decision-making.
An AI tool designed for back-and-forth collaboration with humans, refining intent and outputs through dialogue.
A conversational interface where users can ask follow-up questions and receive responses based on previous context, rather than just one-shot predictions.
Training a policy by having humans intervene and correct the robot, then learning from those corrections.
Combining insights and methods from multiple academic disciplines to solve problems in a target domain.
The ability to mix images and text in any order within a single prompt, rather than requiring all images first or all text first.
Giving feedback at multiple steps during reasoning, not just at the final answer, to guide the model's thinking process.
A deliberate step-by-step thinking mechanism that occurs before generating a response, helping the model work through complex problems more carefully.
The hidden patterns and knowledge stored inside a model's layers that it uses to understand and generate text.
A hidden computation phase where the model reasons through a problem before producing its final answer, improving accuracy on complex tasks.
Whether a study actually measures what it claims to measure, without confusing factors distorting the results.
The ability to understand and explain how a model makes decisions and what it has learned from its training data.
Machine learning models designed to be understandable to humans, showing why they make specific predictions.
Determining the appropriate moment to interject in a conversation based on natural dialogue cues.
The ability to decide when an agent should proactively act, when to seek user consent, and when to remain silent.
Whether a model applies the same reasoning strategy when classifying different instances of the same category.
Ensuring that related elements (like a person's face across frames) maintain consistent properties throughout.
The degree of disagreement in how different models within the same modality (e.g., vision models) represent a single stimulus.
Measuring how similar consecutive frames or audio segments are within a single modality.
Changes in paralinguistic features within a single spoken sentence, like shifting emotion mid-sentence.
Breaking down an image into fundamental components like albedo (color), shading (lighting), and residuals (fine details).
The geometric properties of a space as measured from within, independent of how it's embedded in higher-dimensional space.
A reward signal that encourages an agent to explore and discover new states, separate from task-specific rewards.
Rewards derived from the model's own internal signals, such as confidence scores, rather than from external verification.
A change that preserves key properties or predictions of a model.
A specification that defines preconditions and postconditions for tool calls to prevent invalid action sequences.
Predicting what inputs or earlier program states must have been to produce a given output.
Finding the input that produces a known output, when the forward process is complex or many-to-one.
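Inferring the causes (inputs) behind observed effects (outputs); such problems are often ill-posed, meaning a solution may not exist, may not be unique, or may be unstable.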
Working backward from a desired outcome to determine what actions would produce that result.
A reward signal that measures quality by having an LLM recover the original task specification from generated outputs.
A technique that reweights observations to remove confounding bias by accounting for treatment assignment probabilities.
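A minimal sketch of this reweighting for estimating an average treatment effect, assuming each record carries a known propensity (the probability that the unit received treatment). The function name and record layout are hypothetical.

```python
def ipw_average_treatment_effect(records):
    """records: iterable of (treated, outcome, propensity), with 0 < propensity < 1.

    Treated outcomes are weighted by 1 / propensity and control outcomes by
    1 / (1 - propensity), so under-represented units count for more.
    """
    records = list(records)
    n = len(records)
    treated_mean = sum(y / e for t, y, e in records if t) / n
    control_mean = sum(y / (1 - e) for t, y, e in records if not t) / n
    return treated_mean - control_mean
```

In practice the propensities are usually estimated with a separate model, and extreme values close to 0 or 1 are often clipped to keep the weights from exploding.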
A data structure that maps terms to the documents containing them, enabling fast keyword-based search similar to how a book's index works.
A search technique that maps vocabulary terms to documents containing them, enabling fast keyword-based lookups commonly used in search engines.
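The two entries above describe the same structure, which can be sketched with a dictionary from terms to sets of document ids. This is a simplified example that splits on whitespace only; the function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """documents: dict of doc_id -> text. Returns term -> set of doc_ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {1: "the cat sat", 2: "the dog sat", 3: "a cat ran"}
index = build_inverted_index(docs)
```

Lookups touch only the entries for the query terms, so search cost depends on how many documents contain those terms rather than on the size of the whole collection.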
An AI system that shapes decisions and outcomes without users recognizing its influence on the information or criteria they use.
Errors or misalignments in AI outputs that go undetected because the user accepts the result without critical evaluation.
A measure of how quickly ions move through a material, critical for battery charging and discharging speed.
Graphs showing model performance across different configurations while keeping total computational operations constant.
An unsupervised algorithm that isolates anomalies by randomly selecting features and split values.
A property that remains the same for graphs with identical structure, regardless of how nodes are labeled or arranged.
Statistical method to estimate latent abilities, question difficulty, and model proficiency from test performance.
The process of gradually removing noise from a noisy input through multiple refinement steps to generate clean outputs.
A workflow where code is refined through multiple rounds of small, targeted changes rather than complete rewrites.
Repeatedly improving an output by generating versions, evaluating them, and using feedback to create better versions.
A process where the model performs multiple rounds of web searches, each building on previous results to refine and deepen its understanding of a topic.
A technique that limits how much a model's output changes when inputs change slightly, making it more stable and predictable.
Crafting adversarial inputs designed to bypass a model's safety guardrails and trigger harmful outputs.
The process of breaking Japanese text into meaningful units (tokens), accounting for the language's unique writing systems including kanji, hiragana, and katakana.
A self-supervised learning approach that predicts future embeddings from video without reconstructing pixels.
Converting code to machine instructions at runtime, enabling Python code to run efficiently on GPUs.
A training approach where a model learns to predict missing parts of video by understanding both spatial and temporal patterns without reconstructing actual pixels.
A shared mathematical space where different types of data (like sounds and text descriptions) are represented so similar concepts are positioned close together, enabling direct comparison.
A shared numerical space where different types of data (such as audio and text) are represented together, allowing the model to find relationships between them.
Processing multiple input types together in an integrated way rather than separately, allowing the model to reason about how they relate.
An unsupervised algorithm that groups data points into k clusters by minimizing distance to cluster centers.
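A compact sketch of the usual iterative procedure (Lloyd's algorithm), assuming points are tuples of numbers and the caller supplies the initial centers. The function name and signature are hypothetical.

```python
def kmeans(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and
    moving each center to the mean of its assigned points."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters


centers, clusters = kmeans([(0, 0), (0, 2), (10, 10), (10, 12)], [(0, 0), (10, 10)])
```

Results depend on the initial centers, which is why practical implementations typically run several random restarts or use a seeding scheme.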
The raw frequency domain data collected directly by an MRI scanner before conversion to images.
A technique to analyze neural networks by identifying which neurons or experts are most important for specific tasks.
A recursive algorithm that estimates the state of a dynamic system by optimally combining noisy measurements with a mathematical model.
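A one-dimensional sketch, assuming the underlying state is roughly constant and both noise variances are known. The function name and default values are hypothetical; real filters usually track a vector state with matrices.

```python
def kalman_1d(measurements, process_var, measurement_var,
              initial_estimate=0.0, initial_var=1.0):
    """Return the filtered estimate after each measurement."""
    estimate, variance = initial_estimate, initial_var
    history = []
    for z in measurements:
        # Predict: the state is modelled as constant, so only uncertainty grows.
        variance += process_var
        # Update: blend prediction and measurement according to their uncertainty.
        gain = variance / (variance + measurement_var)
        estimate += gain * (z - estimate)
        variance *= 1 - gain
        history.append(estimate)
    return history


estimates = kalman_1d([5.0] * 5, process_var=0.01, measurement_var=1.0)
```

The gain shrinks as the filter becomes more certain, so later measurements move the estimate less than early ones.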
A nonparametric method for estimating survival curves from censored data without assuming a specific distribution.
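A minimal sketch of the product-limit calculation, assuming `durations` holds each subject's follow-up time and `observed` is true when the event occurred (false means the subject was censored). The function name is hypothetical.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time."""
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival, curve = 1.0, []
    for t in event_times:
        # Subjects still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


curve = kaplan_meier([1, 2, 3, 4], [True, False, True, True])
```

Censored subjects never trigger a drop in the curve, but they do count toward the at-risk group until their follow-up ends.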
A non-parametric method that estimates probability distributions by smoothing data points with kernel functions.
Combining multiple GPU operations into a single optimized computation to reduce memory overhead and improve speed.
Tuning kernel functions to improve performance in kernel-based models.
A mathematical framework using reproducing kernel Hilbert spaces for classification and regression with theoretical guarantees.
Internal memory structures in transformers that store computed representations to speed up inference and enable agent communication.
Attention mechanism components that store and retrieve information; fewer heads means reduced model capacity and faster computation.
A reference frame in a video that serves as an anchor point for propagating edits or information to surrounding frames.
Matching specific visual landmarks (like object corners) between a demonstration and a new scene to align actions.
The task of automatically identifying and locating distinctive points of interest in an image that remain stable across different angles and lighting conditions.
A measure of how different one probability distribution is from another, used to evaluate sampling quality.
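One widely used measure of this kind is the Kullback-Leibler divergence, sketched below for discrete distributions given as equal-length lists of probabilities. Whether this entry refers to that specific measure is an assumption; the function name is hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; infinite if q is zero where p is not."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # terms with p = 0 contribute nothing
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total
```

The measure is zero only when the two distributions match and is not symmetric, so `kl_divergence(p, q)` generally differs from `kl_divergence(q, p)`.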
Assessing models using external knowledge sources for better judgment.
A structured or unstructured collection of documents and facts that a system retrieves from to answer queries.
The limit to how much factual information a model can reliably know or recall, often constrained by its size and training data.
A discrete unit of knowledge or skill that can be identified and measured in student work.
The process of organizing, storing, and synthesizing insights from multiple experiments to improve future decision-making.
The latest date covered by a model's training data; the model cannot reliably answer questions about events or information after this date.
A technique that compresses a large, complex model into a smaller one by training the smaller model to mimic the larger model's behavior.
An agent's ability to recognize what information or skills it lacks to solve a problem.
A structured database that stores facts as relationships between entities (like 'Einstein' connected to 'Physics'), enabling machines to reason about real-world knowledge.
The task of filling in missing facts or relationships in a knowledge graph by predicting what connections should exist based on patterns in existing data.
Monitoring and recording what a student has demonstrated they understand over time.
Applying knowledge learned from one task to improve performance on another.
Requiring external factual information beyond what is directly observable to solve a task correctly.
Incorporating domain expertise or physical laws into machine learning models to improve accuracy and generalization.
A neural network architecture designed to provide flexible, expressive function approximation with interpretable structure.
A mathematical operator that transforms observable functions of a dynamical system to reveal its underlying structure and eigenvalues.
A mathematical way to describe quantum operations that guarantees they produce physically valid quantum states.
An efficient but approximate method for parameterizing doubly stochastic matrices that sacrifices some expressivity for computational speed.
A mathematical property that guarantees convergence of optimization algorithms to stationary points.
A store for previously computed key-value pairs that speeds up text generation in transformers.
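A toy single-head sketch of the idea: keys and values for past tokens are stored once, and each new token only computes attention against the stored entries. The class and function names are hypothetical, and real implementations operate on batched tensors across many layers and heads.

```python
import math


class KVCache:
    """Keeps keys and values of earlier tokens so they are not recomputed."""

    def __init__(self):
        self.keys, self.values = [], []

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)


def attend(query, cache):
    """Scaled dot-product attention of one query over everything in the cache."""
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(len(query))
        for key in cache.keys
    ]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(cache.values[0])
    return [
        sum(w * v[i] for w, v in zip(weights, cache.values)) / total
        for i in range(dim)
    ]


cache = KVCache()
cache.append([1.0, 0.0], [2.0, 3.0])
output = attend([1.0, 0.0], cache)
```

During generation, each step appends one key-value pair and runs one `attend` call, instead of recomputing keys and values for the entire sequence.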
The number of key-value heads in a transformer's attention mechanism, each storing keys and values for retrieval; in grouped-query attention, several query heads share one key-value head.
Moving key-value cache data to slower storage (CPU/disk) to reduce GPU memory usage during inference.
Systematic unfairness in training labels that causes models to learn and reproduce those biases.
Errors or inaccuracies in training data labels that can degrade model performance and cause the model to memorize incorrect information.
A learning approach that achieves good performance with minimal labeled training examples.
A poisoning attack where attackers deliberately mislabel training examples to mislead the model.
A training signal derived from model behavior itself rather than human-annotated labels.
An optimization technique that enforces constraints by incorporating them as penalty terms into the objective function.
A small subset of points selected to stand in for a larger dataset so that computation stays efficient.
An optimization technique that uses gradient information and randomness to explore a reward landscape.
The core language model component that processes text and generates responses based on information from other parts of the system.
A group of languages that share a common ancestor and similar grammatical structures, such as Romance or Slavic languages.
The model's ability to generate grammatically correct, coherent, and natural-sounding text that reads as if written by a human.
The proportion of each language included in a multilingual training dataset.
An AI model trained to predict and generate text by learning patterns from large amounts of written data.
The task of predicting the next word or token in a sequence based on previous words, which is the core objective used to train text models.
Training or fine-tuning a model to excel at a specific language by using more native-language data and task-specific adjustments.
Training a model to excel at a specific language rather than trying to handle many languages equally well.
The study of how languages vary in their structural features and which combinations are common across human languages.
A model's ability to work across multiple languages without requiring separate training for each language.
The property of a representation or model component working effectively across different languages without language-specific tuning.
A language model trained primarily or exclusively on text from a single language to achieve better performance on that language than a multilingual model.
Training a model on text from a particular language (Dutch, in this case) so it learns that language's unique grammar, vocabulary, and nuances rather than treating it as a variation of English.
Training a model primarily on data from a particular language, which makes it especially fluent and accurate in that language.
Training a model to specialize in one particular language, which makes it perform better on that language but worse on others.
A specialized AI model designed to understand instructions and convert them into structured function calls and tool interactions rather than generating free-form text.
An LLM extended with an audio encoder to understand and reason about sound and audio content.
A neural network trained on vast amounts of text data to understand and generate human language.
A local search algorithm that accepts solutions if they improve upon a solution from several iterations ago, balancing exploration and exploitation.
Combining predictions from separate models trained on different data sources, merging results after individual processing.
A retrieval technique that compares individual tokens between a query and document separately, then combines the results, rather than comparing pre-computed single vectors.
A retrieval approach that compares individual token embeddings between query and document at search time, rather than comparing pre-computed single vectors.
A retrieval approach that compares individual token embeddings between query and document at search time, rather than comparing pre-computed single vectors, allowing more precise matching of specific phrases and rare terms.
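The token-level comparison in the three entries above is often scored with a "MaxSim" rule: each query token takes its best match among the document tokens, and those best matches are summed. The sketch below assumes embeddings are lists of numbers compared by dot product; the function name is hypothetical.

```python
def maxsim_score(query_vectors, doc_vectors):
    """Sum, over query tokens, of the best dot-product match among document tokens."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)


query = [[1, 0], [0, 1]]
score_full_match = maxsim_score(query, [[1, 0], [0, 1]])
score_partial_match = maxsim_score(query, [[1, 0], [1, 0]])
```

Because every query token must find its own match, a document covering only part of the query scores lower than one covering all of it.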
The time delay between sending a request and receiving the first response token from a model.
A strict deadline requirement for how quickly data must travel from source to destination.
Predicting how long an inference request will take to complete, accounting for hardware contention and concurrent execution.
A model designed to produce results as quickly as possible, prioritizing speed over other factors like accuracy or feature breadth.
A model that estimates how fast a system can process requests and how many it can handle per unit time.
The compressed representation layer in an autoencoder that forces the model to learn efficient, meaningful encodings of input data.
Agents exchanging information through internal representations like embeddings or cache states rather than explicit text.
A generative process that iteratively refines compressed representations of data by removing noise to produce coherent outputs.
Generative models that create images by learning to denoise random noise in a compressed latent space rather than pixel space.
A system of equations describing how a model's hidden state evolves over time through iterative updates.
Hidden patterns of change in a system that cannot be directly observed but must be inferred from available data.
A lower-dimensional surface where high-dimensional data naturally lies.
Reasoning performed in continuous or discrete hidden representations rather than explicit natural language.
A compressed, learned encoding that captures the essential features of data in a compact form.
Compressed, learned feature vectors that capture underlying patterns in data without explicit labels.
A compressed, learned representation of data that captures its essential features in fewer dimensions.
A compressed, learned representation of data in a lower-dimensional space that captures hidden patterns not visible in raw observations.
A learned hidden representation that evolves through computation to capture task-relevant information.
A neural network that learns to predict future video frames in a compressed representation space rather than raw pixels.
A training method that stabilizes reinforcement learning by anchoring functional tokens with a weighted auxiliary objective for stronger gradient updates.
A technique to break down what a model learns internally into individual concepts or features it uses to make decisions.
A markup language commonly used to write mathematical equations and scientific documents in a format that renders beautifully.
A text-based format for writing mathematical and scientific documents with precise formatting and symbolic notation.
Analyzing what information is encoded in each layer of a neural network by testing intermediate representations.
The ability to understand and use information about how text is positioned and structured on a page, not just the words themselves.
Deferring the loading of full tool schemas until they are actually needed, keeping context compact.
A public ranking showing how different models perform on a standardized task, updated as new submissions arrive.
When concept representations unintentionally encode task-relevant or inter-concept information beyond their intended semantics, compromising interpretability.
A learned mechanism that adaptively selects which parameters to keep and which to remove in compressed task vectors.
Framework separating total forecast error into estimation error (from training) and approximation error (from architecture).
A research-based description of how students' understanding develops in a subject over time, from novice to expert.
A predefined plan for how the learning rate changes during training to improve convergence.
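One common shape for such a plan is a linear warmup followed by cosine decay. The sketch below is only one possible schedule, and the function name and default values (`base_lr`, `warmup_steps`, `total_steps`) are illustrative assumptions rather than settings from any particular model.

```python
import math

def lr_at_step(step, base_lr=1e-3, warmup_steps=100, total_steps=1000):
    """Learning rate at a given training step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Ramp up linearly from base_lr / warmup_steps to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Cosine curve from base_lr down to 0.
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

schedule = [lr_at_step(s) for s in range(1001)]
```

The warmup avoids large, unstable updates early in training, and the gradual decay lets the model settle as training ends.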
Using the same learning rate setting across models of different sizes without retuning.
A statistical technique for improving covariance matrix estimation by shrinking it toward a simpler structure.
A 24-dimensional mathematical structure with optimal sphere packing properties, used in some quantization methods to compress model weights efficiently.
The ability to interpret and apply legal concepts accurately, requiring understanding of domain-specific rules and nuances.
The cost or performance loss from making a model more interpretable.
A model's ability to handle longer or more complex problem sequences than those seen during training.
A systematic tendency to give softer or more favorable judgments, often due to awareness of negative consequences.
A hierarchy of representations of the same object at different resolutions, commonly used in graphics for rendering efficiency.
A measure of how different two text strings are, counted as the minimum number of single-character insertions, deletions, or substitutions needed to turn one string into the other.
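The edit distance described above (commonly called Levenshtein distance) can be computed with a small dynamic-programming table. This is a minimal sketch that keeps only two rows of the table at a time; the function name is an illustrative choice.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep if the characters match
            ))
        prev = curr
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # prints 3
```

Turning "kitten" into "sitting" takes two substitutions (k to s, e to i) and one insertion (g), hence a distance of 3.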
Linguistic properties combining vocabulary and grammar patterns used to analyze and classify text style and register.
A sensor that uses laser pulses to measure distances and create 3D maps of environments.
Methods to identify whether an AI model's response is false or misleading.
Continuously adapting recommendations to a user's evolving preferences over extended periods without forgetting past patterns.
A model that uses fewer computational resources and memory, making it practical to run on less powerful hardware.
A smaller, more efficient model designed to run quickly and use less memory than larger alternatives, often with some trade-off in reasoning capability.
A mathematical measure of how probable the model considers a given sample, enabling exact probability calculations.
A measure of the share of a program's lines of code that a test suite executes, indicating how thoroughly the code is tested.
A steering technique that applies learned linear transformations to model activations to control behavior.
An attention mechanism with linear complexity instead of quadratic.
A property where the Bellman backup operation preserves linearity in value functions.
An algorithm whose computational cost grows proportionally to input size, rather than quadratically.
Computational cost that grows proportionally with sequence length, rather than quadratically as in standard Transformer self-attention.
Using linear combinations of features to represent value functions or policies in RL.
A mathematical condition expressed as a matrix inequality that can be efficiently checked to verify system properties like stability.
A simple classifier trained on top of a model's internal representations to detect specific properties.
Simple machine learning classifiers trained on model internal states to detect specific properties like deception.
An optimization problem where the objective is a linear function and the constraints are linear equations or inequalities.
A simple model that maps input features to continuous numeric outputs using a linear function.
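The idea that concepts are encoded as linear directions in a neural network's embedding space, so they can be separated with simple linear operations.
The set of all possible linear combinations of a group of vectors, describing the geometric space covered by those vectors (for example, a group of features).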
A formal language for specifying how systems should behave over time, commonly used in security and software verification.
Systems whose behavior follows linear equations that don't change over time.
An attention mechanism with linear computational complexity instead of quadratic, enabling faster inference.
A speaker's implicit knowledge of language rules and structure, distinct from actual language use.
A task where a model predicts missing relationships between entities in a knowledge graph, such as guessing that two people are colleagues based on existing connections.
An alternative neural network architecture that uses continuous, adaptive transformations instead of fixed layers, allowing efficient processing with fewer parameters.
A neural network architecture that uses continuous, adaptive functions to process information, allowing the model to adjust its behavior dynamically based on input.
Ranking multiple items together as a group, rather than scoring each item independently.
The capability to read and understand text and written content within images, rather than just recognizing objects or scenes.
A continuously updated evaluation system that scores models on new data as it arrives, rather than a fixed test set.
A transformer-based neural network design optimized for efficient language modeling and text generation.
A design pattern that connects a vision encoder to a language model, enabling the language model to understand and describe images.
A language model trained to evaluate and judge outputs (like comedy sketches) based on learned human preferences.
A frozen language model used to evaluate and score other model outputs according to predefined criteria.
Using a language model to automatically evaluate the quality of outputs from other AI systems instead of human reviewers.
Using a language model to automatically evaluate or score outputs from other AI systems instead of human reviewers.
A training approach that adapts a generative language model to produce high-quality text embeddings by repurposing its existing knowledge without building from scratch.
Ensuring experts are used evenly across the model to avoid some experts being overused while others sit idle.
Attention mechanism where each token only attends to a bounded window of preceding tokens instead of all previous tokens.
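The bounded window can be pictured as a boolean mask over token pairs. The sketch below assumes the window counts the current token plus the tokens just before it; implementations differ on that detail, and the function name is hypothetical.

```python
def sliding_window_mask(seq_len, window):
    """mask[i][j] is True when token i may attend to token j.

    Assumption: the window covers token i itself and the window - 1 tokens before it.
    """
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]

mask = sliding_window_mask(seq_len=5, window=3)
for row in mask:
    # 'x' marks an allowed position, '.' a blocked one.
    print("".join("x" if allowed else "." for allowed in row))
```

Each row has at most `window` allowed positions, so the attention cost per token stays constant instead of growing with sequence length.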
Running a model directly on your own computer or server instead of sending requests to a remote service.
Running an AI model directly on your own computer rather than sending data to a remote server, keeping data private and reducing latency.
An algorithm that identifies outliers by comparing the local density around a point to the local density around its neighbors.
The observation that a large model's preferred token appears in a small model's top-K predictions even when not ranked first.
A technique that groups similar items together using hashing, allowing the model to attend to relevant parts of long text without comparing every token to every other token.
An efficient attention mechanism that groups similar tokens together to reduce computation, allowing the model to handle longer texts without excessive memory use.
In conformal prediction, the process of identifying similar examples to condition uncertainty estimates on local neighborhoods rather than global statistics.
How well an explanation's highlighted regions match ground-truth annotations from experts.
Identifying unusual or suspicious patterns in system logs that indicate errors, attacks, or failures.
A probability distribution whose logarithm is a concave function, which guarantees a single peak and makes sampling and optimization more tractable.
Ensuring that different signals or judgments from a model don't contradict each other and follow coherent logical rules.
Identifying misalignment by finding contradictions in a model's reasoning across equivalent scenarios with different framings.
Pre-defined action sequences or skills expressed using logical rules that guide an agent toward specific goals.
A low-dimensional region within a model's internal representations that captures reasoning logic independent of language form.
A security flaw that stems from an error in a program's logic, rather than from a memory-safety issue, and causes incorrect behavior.
A loss function that adjusts for class imbalance by modifying the model's output scores.
Methods that use the model's raw prediction scores to make decisions, rather than analyzing deeper internal patterns.
Knowledge distillation that transfers the raw model outputs (logits) rather than higher-level representations.
A method for combining multiple forecasts by averaging them in logit space with a data-dependent prior to reduce variance.
The ability of a model to process and understand very long sequences of text while maintaining coherence across distant parts of the input.
An embedding model designed to process and maintain meaningful representations across very long documents (thousands of tokens), rather than just short snippets.
The ability to process and understand very long documents or conversations without losing track of earlier information.
Processing input sequences much longer than a model's training context window while maintaining accuracy and efficiency.
The ability to process and understand very long input texts (thousands of tokens) while maintaining coherent reasoning across the entire passage.
The ability to process and integrate information from many sources or a large amount of text, then combine it into a coherent summary or report.
The capability to produce extended, coherent text such as articles, reports, or documents while maintaining consistency and structure throughout.
The capability to produce extended, coherent text outputs like essays, articles, or detailed explanations rather than just short responses.
The capability to produce extended, coherent written content such as essays, articles, or detailed explanations rather than short responses.
Testing an AI system's ability to maintain context and preferences across many sequential interactions over time.
Finding relevant information across many steps or a large dataset to answer complex multi-part questions.
Complex goals requiring many sequential steps or decisions to complete successfully.
Forces between atoms that are far apart from each other, which are harder for models to capture.
The ability to handle very long input texts (thousands of tokens or more) efficiently, which standard models struggle with due to computational constraints.
Rare or uncommon facts that appear infrequently in training data, making them harder for models to remember accurately.
A dataset where a few common classes have many examples while rare classes have very few, causing models to bias toward common categories.
A data distribution where a few common categories dominate while many rare categories have few examples.
Stored structured knowledge (like diagnostic criteria) that an AI system can access during reasoning.
When a step in a procedure requires referencing or using values computed in earlier steps.
A transformer that iterates multiple times at test time, spending more computation on harder problems.
A technique that adds small, trainable low-rank matrices to a pre-trained model's existing layers instead of retraining the entire model, making fine-tuning faster and more memory-efficient.
A lightweight method to customize a frozen language model for specific tasks without retraining the entire model.
Parameter-efficient fine-tuning method that adapts a pre-trained model using low-rank updates.
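A low-rank update of this kind is often written as W + (alpha / r) * B A, where W is the frozen weight matrix and B and A are the small trainable matrices. The sketch below uses plain Python lists and made-up numbers to show the arithmetic; the helper names and the scaling convention are assumptions for illustration, not a specific library's API.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_merge(w, a, b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A) without modifying W."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

# Frozen 3x3 base weight and a rank-1 adapter: B is 3x1, A is 1x3.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [[1.0], [0.0], [2.0]]
a = [[0.5, 0.0, 1.0]]

merged = lora_merge(w, a, b, alpha=2.0, rank=1)
```

The adapter here has 6 trainable values against 9 in the full matrix; at realistic layer sizes the gap is far larger, which is where the memory savings come from.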
The sequence of loss values for a sample across multiple training steps, showing how the model's error on that sample changes over time.
Reducing file size while preserving all original data perfectly, so decompression recovers the exact original.
A phenomenon where LLMs struggle to retrieve or process information from the middle of long documents or lists.
The ability to generate responses very quickly with minimal delay between when you send a prompt and when you receive an answer.
Representing data using fewer dimensions while preserving key information.
A tool that lets non-programmers build applications by writing minimal code or using visual interfaces.
A graph signal processing technique that smooths node features by averaging information across neighborhoods.
A lightweight neural pathway that processes information through a compressed representation to reduce computation.
Compressing high-dimensional data into fewer dimensions, which can lose important information needed for accurate inference.
A language with limited training data and AI tools compared to English or other major languages.
Languages with relatively little training data available compared to major languages like English, making them harder for AI models to learn.
A continuous approximation of a mixed-integer program where binary constraints are relaxed, used to bound solution quality.
Mathematical spaces of functions where the p-norm (a measure of size) is finite and well-defined.
A measure of how quickly nearby trajectories diverge in a dynamical system; determines stability and predictability.
An optical device that splits light into two paths and recombines them to create interference patterns for computation.
Digital credentials (API tokens, service accounts, certificates) that AI agents and automated systems use to authenticate and act in enterprise environments.
A neural network trained to predict atomic forces and energies, enabling fast simulations of molecular behavior.
An AI model that learns to predict forces and energies between atoms in molecules and materials.
Automated translation of text from one language to another using computational systems.
Removing the influence of specific poisoned data from a trained model without full retraining.
Neural network models trained to predict forces and energies between atoms, used to simulate materials without expensive quantum calculations.
The task of arranging large functional blocks on a chip to optimize performance and minimize wiring.
A measure of distance between a point and a distribution that accounts for correlations between variables.
A state-space model architecture designed to process long sequences faster and with less memory than traditional transformer models.
A neural network design that uses state-space models as an alternative to transformers, offering faster processing and lower memory usage.
A hybrid model design that combines Mamba (a state-space model) with Transformer components to process long sequences more efficiently than pure Transformers while maintaining strong performance.
A neural network design that combines selective state spaces (Mamba) with traditional attention mechanisms to process text more efficiently while maintaining strong performance.
A cloud service where the provider handles infrastructure, updates, and maintenance so you only focus on using the service rather than managing it.
The assumption that high-dimensional data lies on a lower-dimensional curved surface (manifold) rather than filling the entire space.
Discovering the underlying low-dimensional structure of high-dimensional data.
The fractional part of a floating-point number, which stores the significant digits of the value.
A theoretical guarantee on classification error based on how well-separated different classes are in the learned representation.
The probability of observed data averaged over all possible model parameters, representing the true statistical objective for learning.
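A sequence of random states in which the probability of the next state depends only on the current state, not on the earlier history.
A statistical sampling technique that explores parameter space with a guided random walk, spending more time in high-probability regions to find realistic parameter values.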
A sampling method that generates sequences of dependent samples to approximate probability distributions.
A framework for sequential decision-making with probabilistic state transitions.
A training technique where random words in text are hidden, and the model learns to predict them based on surrounding context.
A training technique where parts of text are hidden and the model learns to predict what should fill those gaps, helping it understand context and meaning.
A self-supervised training method where parts of input data are hidden and the model learns to predict them from context.
A training technique where parts of the input are hidden, and the model learns to predict what was masked, helping it understand underlying patterns.
Attention in which each token attends only to itself and earlier tokens, preventing information from future tokens from leaking in.
A technique where the model learns to predict hidden or blanked-out words in text, allowing it to reason about context from multiple directions at once.
Placeholder positions in text that are hidden or unknown, which the model learns to fill in or refine during generation.
A process where the model hides (masks) and then progressively reveals (unmasks) parts of text to refine and improve the entire sequence iteratively.
Extreme outlier values in a small number of tokens and channels within a neural network layer.
Separating model weights into components for efficient distributed training.
Pre-computed results stored for fast retrieval instead of computing on demand.
Finding mathematically equivalent or structurally similar problems in a dataset, rather than just keyword-based matching.
A model that has been optimized and trained specifically for mathematical reasoning and problem-solving tasks, rather than general-purpose language understanding.
Symbolic representations of mathematical expressions and equations (like formulas and symbols) that need special handling to be correctly interpreted by AI models.
The process of analyzing and interpreting visual mathematical symbols and equations to convert them into a structured, computer-readable format.
The ability to solve multi-step math problems by breaking them down logically and showing intermediate steps rather than just guessing the answer.
An XML-based markup language designed specifically for representing mathematical notation in a way that computers can understand and display.
Decomposing a matrix into a product of smaller matrices, commonly used for dimensionality reduction and pattern discovery.
A technique that allows embedding vectors to be shortened (truncated) to smaller dimensions while maintaining quality, letting you trade off between accuracy and storage/speed needs.
A training technique that allows a single embedding model to produce high-quality results at multiple vector sizes, letting you shrink the embedding dimensions to save storage and speed without retraining.
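For a model trained this way, shortening an embedding usually amounts to keeping the leading dimensions and rescaling the result to unit length. The sketch below assumes that convention; the function name and the example vector are hypothetical.

```python
import math

def truncate_embedding(vec, dims):
    """Keep the first `dims` values and rescale the result to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head

full = [3.0, 4.0, 12.0]               # toy 3-dimensional embedding
short = truncate_embedding(full, 2)   # roughly [0.6, 0.8]
```

Truncating only preserves quality when the model was trained to pack the most important information into the leading dimensions; on an ordinary embedding model the same operation can discard useful signal.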
A parameterization method that keeps optimal learning rates approximately constant across different model sizes.
A neural network layer that outputs the maximum value across a set of linear functions, enabling piecewise linear approximations.
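A single unit of such a layer can be sketched as the maximum over a few linear functions of the input. The example below is illustrative only: it picks two linear pieces by hand (x and -x) so that the unit reproduces the absolute value function, whereas a trained layer would learn its weights and biases.

```python
def maxout(x, weights, biases):
    """Maximum over several linear functions w . x + b of the same input."""
    return max(
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    )

def abs_via_maxout(value):
    """Absolute value expressed as max(x, -x), i.e. two linear pieces."""
    return maxout([value], weights=[[1.0], [-1.0]], biases=[0.0, 0.0])

print(abs_via_maxout(-3.0))  # prints 3.0
```

Adding more linear pieces lets the unit approximate more complicated convex, piecewise linear shapes.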
Standard metric measuring detection accuracy by comparing predicted object locations to ground truth across different confidence thresholds.
A technique that combines multiple token embeddings into a single representation by averaging them, producing one embedding for an entire text sequence.
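The averaging step can be sketched in a few lines. This version assumes an attention mask marks which positions are real tokens (1) and which are padding (0), a common convention; the function name and the toy values are made up for illustration.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the embeddings of unmasked tokens into one vector."""
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("no unmasked tokens to pool")
    # zip(*kept) iterates over dimensions; average each one across tokens.
    return [sum(col) / len(kept) for col in zip(*kept)]

tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]  # last position is padding
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # prints [2.0, 3.0]
```

Excluding padded positions matters because otherwise short texts in a batch would be pulled toward the padding vector.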
Subtle changes to how choices are presented that systematically influence AI agents without degrading the decision environment for humans.
Designing rules for interactions between parties to achieve desired outcomes like fairness or efficiency.
Proof that a model's behavior stems from a specific internal mechanism.
Studying how a model's internal computations and representations lead to specific behaviors or failures.
The study of understanding how a language model's internal components and computations work to produce its outputs.
The ability to apply clinical knowledge and logic to interpret medical data, such as understanding what symptoms indicate about a patient's condition.
Non-invasive brain imaging that measures magnetic fields produced by neural activity.
An attack that determines whether a specific data point was used to train a model.
When a model learns to reproduce exact training examples rather than learning general patterns it can apply to new situations.
The shift from a model reproducing training data to creating novel outputs, triggered by increasing dataset size.
The maximum amount of information a model can store and retrieve.
How well a model uses available RAM or GPU memory, allowing it to run on smaller or less expensive hardware.
The amount of RAM or storage space a model requires to run, which is critical for deployment on resource-constrained devices.
A neural component that selects and refines relevant knowledge from long-term memory based on the current context.
Dynamically partitioning computation into tasks that fit within available device memory constraints.
A generation system that stores and retrieves visual references during creation to maintain consistency across outputs.
Errors or corruptions in detected entity mentions that affect downstream processing.
The combination of a base model's weights with additional trained weights (such as those from LoRA adapters) into a single unified model file.
The core mechanism in GNNs where nodes exchange and aggregate information from their neighbors iteratively.
A higher-level agent that monitors and improves other agents by comparing their outputs against reality and updating their code or instructions.
The ability to reflect on and manage one's own thinking processes and decision-making.
An agent's inability to reflect on and make wise decisions about when to use its own knowledge versus when to seek external help.
Training a model to learn how to learn, so it can quickly adapt to new tasks or changing conditions.
Self-awareness about thinking processes, including goal assessment, domain awareness, and strategic exploration.
The difference between how well models assess their own confidence versus how well humans evaluate belief certainty against evidence.
A general problem-solving strategy that explores solutions without guaranteeing optimality but finds good answers quickly.
A model that defines the structure and rules for creating other models in model-driven engineering.
A testing approach that checks if a system maintains consistent behavior under semantically equivalent input transformations.
A state that appears stable but is easily disrupted by small changes or perturbations.
The causal relationships and dependencies showing how one research method evolved from or influenced another.
A structured database mapping how research methods emerge, adapt, and build upon one another over time.
Using an evaluation metric that doesn't align with true objectives.
Virtual replicas of real objects that preserve accurate physical dimensions and properties for faithful simulation.
Determining a robot's position and orientation in real-world units rather than relative or scaled coordinates.
Brief, involuntary facial expressions lasting 0.25-0.5 seconds that reveal genuine emotions.
A system design where independent, containerized services each handle a specific task and communicate with one another.
A model positioned between lightweight and flagship versions, balancing capability with efficiency rather than maximizing raw performance.
Software layer that sits between services to translate, transform, or coordinate their interactions.
Multi-input, multi-output architecture that processes multiple data streams in parallel to improve model expressiveness without increasing latency.
A lightweight transformer-based architecture designed to be computationally efficient while maintaining strong performance for text understanding tasks.
A game-playing algorithm that minimizes the opponent's maximum advantage by exploring all possible moves.
A training method where one part tries to break the model (maximization) while another part fixes it (minimization) to build robustness.
Control strategy that achieves desired system behavior using the least amount of control effort.
An optimization algorithm that uses geometric transformations to adapt learning to different data distributions.
A property allowing optimization algorithms to switch between different geometric transformations while maintaining convergence.
False or inaccurate information spread online, whether intentionally or unintentionally.
A model's ability to work when one or more input modalities are unavailable at test time.
A specific design pattern for transformer-based language models that uses efficient attention mechanisms and grouped query attention to balance performance and speed.
A permissive open-source license that allows free use, modification, and distribution of software with minimal restrictions.
A knowledge base of adversary tactics and techniques based on real-world observations, used to classify and understand cyberattacks.
Using different numerical precisions for different parts of a computation.
Training with lower precision for speed while maintaining higher precision where needed.
A quantum state representing uncertainty or entanglement with an environment, described by a density matrix rather than a pure state vector.
Training on multiple datasets with different structures and properties in the same training batch.
A mathematical optimization approach for problems with both continuous and discrete variables subject to linear constraints.
Using different numerical precisions (e.g., 8-bit, 4-bit) for different parts of a model to reduce memory and computation.
A training approach that uses datasets containing varying levels of quality and accuracy, rather than only perfectly curated examples, to improve efficiency and real-world performance.
A quantum state that is a probabilistic mixture of pure quantum states rather than a single definite state.
Misinformation that blends accurate information with false claims to appear credible and evade detection.
An architecture where a model contains multiple specialized sub-networks (experts) and selectively activates only a few for each input, improving efficiency without sacrificing capability.
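The selective activation is usually handled by a small router that scores every expert and keeps only the top few. The sketch below assumes top-k routing with a softmax over the selected scores, which is one common choice; the function name and the router scores are hypothetical.

```python
import math

def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(router_logits[i] for i in top)
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

# Router scores for 4 experts on one token (made-up values).
gates = top_k_route([0.1, 2.0, -1.0, 2.0], k=2)
```

Only experts 1 and 3 would run for this token, each with weight 0.5, so the compute per token depends on `k` rather than on the total number of experts.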
Using multimodal large language models to evaluate outputs by assessing both visual and semantic correctness with rubrics.
A machine learning framework optimized for running models efficiently on Apple Silicon chips.
Running a model locally on Apple Silicon hardware using the MLX framework, which is optimized for efficient inference on Mac devices.
A model format designed specifically for efficient inference on Apple Silicon devices, optimized for the MLX machine learning framework.
A machine learning framework specifically designed for running AI models efficiently on Apple Silicon hardware.
A framework that optimizes AI models to run efficiently on Apple Silicon chips (like M1, M2, M3), taking advantage of their specific hardware capabilities.
A robot's ability to move around an environment while using its arms to pick up and interact with objects.
A type of input or output data a model can process, such as text, images, or audio.
When a multimodal system stops using some of its input types and relies only on one or a few.
The performance difference between a model's reasoning using text versus visual information.
Unequal influence or representation of different data types (like images vs. text) in a multimodal model.
Adapting a model trained on one type of data (like video) to work with a different type (like tactile signals) efficiently.
Training approach that handles each data type (audio, video, text) with separate, tailored optimization strategies.
The property that different trained models can be connected through a continuous path in weight space along which the loss stays low.
Techniques for customizing a pre-trained model's behavior for specific tasks or use cases.
The underlying structural design of a neural network that determines how data flows through it and how it processes information.
The core underlying architecture of a model that serves as the foundation for specialized versions or fine-tuned variants.
A ranking level within a model family that indicates relative power, speed, and cost trade-offs.
The size and complexity of a model, which determines how much information it can learn and store; smaller capacity means fewer parameters and less computational power needed.
A saved snapshot of a trained model's weights and parameters, stored in formats like safetensors or PyTorch for later use or deployment.
When a language model's training performance suddenly degrades due to overconfidence in incorrect predictions.
Techniques used to make models smaller and faster to run, allowing them to work on devices with limited memory or processing power.
The process of configuring and launching a trained model in a cloud environment so it can receive requests and generate responses.
Differences in predictions across multiple models on the same input.
A technique where a smaller, faster model is trained to mimic the behavior of a larger, more capable model to reduce computational costs.
Degradation of model performance over time due to changes in data distribution or real-world conditions.
How well a model performs relative to its computational cost and resource requirements, important for deployment on devices with limited hardware.
A group of related AI models developed by the same organization that share similar architecture and training approaches but may differ in size or capabilities.
The amount of memory and computational resources required to run a model, with smaller footprints being more efficient.
The amount of memory and computational resources required to run a model, determined primarily by its size and architecture.
The file format used to store and load a model's weights; common formats like safetensors and PyTorch determine compatibility with different tools and frameworks.
Learning optimal behavior without explicitly modeling the environment.
The process of running a trained model on new input data to generate predictions or outputs, as opposed to training the model.
The process of setting a model's weights to starting values before training; random initialization means weights are set to random numbers rather than learned values.
The stacked computational components in a neural network that progressively transform input data; fewer layers means faster processing but potentially less ability to capture complex patterns.
A technique that combines the learned knowledge from two or more trained models into a single model.
Designing models so independent components can be used, removed, or composed separately without performance loss.
Techniques used to make a model smaller, faster, or more efficient while maintaining acceptable performance.
The internal numerical values (weights) that a neural network learns during training and uses to make predictions.
The numerical accuracy used to store a model's weights and calculations—higher precision (like float32) is more accurate but uses more memory, while lower precision (like int4) is more efficient but less precise.
A control method that predicts future system behavior and optimizes actions over a time horizon.
A control method that predicts future system behavior and optimizes actions based on a mathematical model.
Removing unnecessary parameters or connections from a model to reduce size and computation.
A technique that reduces a model's size and memory requirements by using lower-precision numbers, enabling it to run on resource-limited devices.
The size of a model measured by the number of parameters it contains; smaller models are generally faster but less capable than larger ones.
The practice of increasing a model's size (parameters, training data, or compute) to improve its capabilities and performance.
The total number of parameters (learnable values) in a model, which affects its memory usage, speed, and capability.
Training a model to excel at a narrow set of tasks rather than performing well across many different domains.
A minimal, simplified version of a model used for testing code and infrastructure without the computational cost of a full model.
A collection of related models of varying sizes or configurations released together for comparative research and analysis.
The ability to examine and understand how a model works, including access to its weights, architecture, and training details.
The process of testing a model to ensure it works correctly within a framework or pipeline before deploying it for real tasks.
A modified version of a base model that changes its size, capabilities, or behavior while maintaining the same core architecture.
The learned numerical parameters inside a neural network that determine how it processes input and generates output.
A technique that works across different model architectures without requiring architecture-specific modifications.
Learning approach where an agent builds a model of how the environment works, then uses it to plan actions.
Information derived from a model's own computations (like attention patterns or confidence scores) without external tools.
A specialized model or component that filters and evaluates user inputs or outputs to prevent harmful content from reaching users or being generated.
Using only relevant subsets of a model's components independently or in combination for specific tasks or domains.
Reusing learned or numerical components across different problems by swapping modules without full retraining.
The ability to use and compose independent subsets of a model without requiring the full system or human-defined rules.
Process of creating new molecules with desired properties for applications like drug discovery.
A computational technique that simulates how atoms move and interact over time.
A specialized AI model trained to understand and process chemical structures by learning patterns from molecular data, similar to how text language models learn from words.
Task of predicting chemical or physical properties of molecules based on their structure.
The ability to understand and predict how molecules behave, interact, and transform based on their chemical structure and properties.
A distillation technique that aligns statistical properties (moments) between a teacher and student model.
An optimization technique that accumulates a running average of past gradients to smooth updates and accelerate convergence.
A technique that smoothly updates model parameters using accumulated historical changes for stability.
Predicting 3D depth information from a single 2D image without stereo or multiple views.
Inferring 3D structure and depth from a single 2D image or video frame without stereo or multi-view input.
When a neuron or expert performs a single, well-defined function rather than handling multiple unrelated tasks.
A guarantee that each update to a policy increases or maintains performance, never decreases it.
Using random sampling to estimate quantities that are expensive or impossible to compute exactly.
A technique using dropout during inference to estimate model uncertainty by sampling multiple predictions.
Estimating expected values by drawing random samples and averaging results.
A computational technique using repeated random sampling to estimate probability distributions and outcomes.
An algorithm that explores game possibilities by randomly simulating many future moves to estimate the best action.
Psychological mechanisms that allow people to justify harmful behavior by reframing it as acceptable or necessary.
When one party takes excessive risks because another party bears the consequences, reducing incentive to act carefully.
A model's ability to understand and apply ethical principles to make judgments about right and wrong.
The ability to understand and process word structure, including prefixes, suffixes, and inflections that change word meaning or grammatical function in languages like Russian.
The linguistic challenge of handling languages where words change form significantly based on grammar, tense, and case—common in Polish and other inflected languages.
The complete set of inflected forms of a word, showing how it changes across different grammatical contexts.
The structure and rules of how words are formed and modified in a language, which is especially important for languages like Korean with complex word composition.
Recording and digitizing human body movement for analysis or animation.
The relationship between user-driven actions and their physical consequences in a scene.
A dynamic decision boundary that adjusts based on detected motion to determine when cached features can be safely reused.
A neural network design that combines masked language modeling with permutation language modeling to better understand relationships between words in text.
Multiple independent agents interacting and learning in a shared environment.
Solving problems by chaining multiple reasoning steps together sequentially.
Techniques for making multiple autonomous agents work together toward shared goals.
A system where multiple AI agents work together, cross-checking and debating each other's reasoning before producing a final answer.
A system where multiple AI agents with different roles work together to solve a problem.
Structured communication and mutual influence between multiple AI agents that shapes collective behavior over time.
A system where multiple agents coordinate to execute complex plans by breaking them into steps and validating each one.
Training multiple agents simultaneously so they learn to cooperate and improve together toward shared goals.
Multiple AI agents working together, each with different roles or goals, to solve a problem collaboratively.
A decision problem where an agent repeatedly chooses between options to maximize rewards while learning which is best.
A logistics optimization task where vehicles start from multiple depots and must visit customers while minimizing cost or distance.
Training a model on question-answer pairs from many different topics or fields to make it work well across diverse subjects.
Training a model by repeating the same dataset multiple times rather than using each sample once.
The ability to understand and work with code spread across multiple files in a project, maintaining awareness of how different files relate to each other.
The ability to connect and synthesize information from multiple images to solve a problem or answer a question.
A classification task where each example can belong to multiple categories simultaneously, unlike single-label classification.
Assigning multiple categories to a single text document, where labels can overlap or co-occur.
The ability to understand and generate code across many different programming languages.
Generating multiple plausible different outcomes rather than a single deterministic prediction.
Following multiple moving objects across video frames to maintain consistent identities over time.
Finding solutions that balance multiple competing goals simultaneously.
Training an AI system to optimize multiple competing goals simultaneously rather than a single objective.
An iterative approach where an LLM revisits and refines its analysis across multiple complete passes through a problem.
System design that integrates multiple LLM providers for improved reliability through consensus and fallback mechanisms.
Creating coherent video sequences with multiple scenes while maintaining consistency of characters and objects across shots.
The ability to break down complex problems into smaller sequential steps and solve them methodically rather than attempting to answer in one go.
The ability to break down complex problems into sequential reasoning steps and correctly combine them to reach a solution.
Forecasting what happens several time steps into the future, rather than just the immediate next state.
The ability to break down complex problems into smaller steps and solve them sequentially, rather than jumping directly to an answer.
The ability to break down complex problems into sequential steps and execute them autonomously without human intervention between steps.
Problems or workflows that require a model to perform multiple sequential operations or reasoning steps to reach a final answer.
Training a single model on multiple different tasks simultaneously so it learns shared skills across them.
Using multiple teacher models simultaneously to train a student model, combining their different strengths.
Generating multiple future tokens in parallel instead of one at a time.
The ability to maintain context and coherence across multiple back-and-forth exchanges with a user, remembering earlier messages in the conversation.
A conversation where the model maintains context across multiple back-and-forth exchanges with a user, remembering previous messages.
Recognizing that a single text can express multiple opposing sentiments (both positive and negative) simultaneously.
A representation where documents and queries are encoded as multiple vectors (one per token) instead of a single vector, enabling more precise matching.
A search method that represents a single piece of text using multiple vectors simultaneously, allowing more flexible and nuanced matching.
Ensuring that representations of the same scene remain coherent across different viewing angles or perspectives.
Combining information from multiple camera angles to create a unified understanding of a scene.
Representing a 3D scene using multiple 2D images captured from different camera angles.
Computational techniques that combine solutions from models of varying accuracy and cost to reduce overall computation.
A model trained to understand and generate text in multiple languages, not just English.
Systematic performance gaps across languages, often favoring high-resource languages like English over others.
The ability of a model to understand and generate text in multiple languages, often with varying levels of proficiency across different language pairs.
A model's ability to understand and generate text in multiple languages, not just English.
A large collection of source code written in many different programming languages, used to train the model.
The ability of a model to understand and generate text in multiple languages, typically because it was trained on data from many different languages.
A shared mathematical space where sentences from different languages are positioned so that translations or sentences with the same meaning end up near each other.
A shared numerical space where text from different languages is represented so that similar meanings across languages are positioned close together, enabling cross-language comparison.
A model trained on text from multiple languages, allowing it to understand and generate text in several different languages.
Natural language processing systems designed to understand and work with text in multiple languages, including non-Latin scripts like Cyrillic.
A model's ability to understand and generate text in multiple languages with comparable quality across different language pairs.
The capability to understand, process, and reason through problems in multiple languages, not just English.
When a model is optimized for one or a few languages rather than many, trading broad language support for deeper fluency in those specific languages.
A collection of audio recordings in multiple languages used to train speech recognition and synthesis systems.
The ability of a model to understand and process text in multiple languages, not just English.
Training a model on text from many different languages so it can understand and generate text across all of them.
A model that can process and understand multiple types of input, such as both text and images.
Forecasting future actions using multiple types of sensory input (e.g., vision and motor feedback) simultaneously.
An AI system that can process and reason over multiple types of data (text, images, documents) to complete tasks.
The process of training a model to understand and connect different types of data (like audio and text) by mapping them into a shared space where related concepts are close together.
An adversarial attack that simultaneously perturbs multiple input modalities (e.g., text and audio) to fool a model.
Attention mechanism that processes multiple types of input (like text and image features) simultaneously in a transformer.
A standardized test dataset that evaluates AI models on tasks combining multiple types of input like images and text.
Discriminatory patterns that emerge when AI models process multiple input types (text, audio, images) together.
The ability of an AI model to understand and reason about multiple types of input data (like images and text) simultaneously.
Processing and understanding multiple types of information (video, audio, text) simultaneously to extract meaning and structure.
A conversational interaction where the model can understand and respond to inputs that combine both text and images in a natural back-and-forth exchange.
A generative model that takes multiple types of input (like text and images) to create new content.
A representation that captures meaning from multiple types of data (like text, images, and tables) in a single searchable format.
Assessing AI systems across multiple input/output types (audio, video, text) simultaneously rather than separately.
Combining data from multiple sources (like ECG heart recordings and PPG optical pulse signals) to make better predictions than using each source alone.
A reward model that processes multiple input types (text, images) and generates interpretable feedback about output quality.
The ability to comprehend humor by combining visual and textual information to identify incongruities and their resolutions.
The ability to accept and process multiple types of input data simultaneously, such as both images and text in the same request.
An AI model that processes both text and images to understand and reason about visual content.
Training a model to understand and process multiple types of input data (like text and images) together rather than separately.
An AI model that can process and understand multiple types of input data, such as video, images, and text together.
A sequence of processing steps that handles multiple types of input data (like text and images) together in a single workflow.
Generating multiple plausible future outcomes instead of a single prediction.
Training a model on paired images and text data so it learns to connect visual and language understanding together.
The ability to solve problems by integrating information from multiple input types like images and text.
A recommendation system that uses multiple types of data (text, images, etc.) to predict user preferences.
Safety mechanisms that operate across multiple input types like images and text simultaneously.
Predicting time-to-event outcomes using multiple types of data (e.g., images, lab results, clinical notes).
AI tasks that require processing multiple types of input data at once, such as understanding both an image and a text question about it.
The ability of an AI model to process and reason about multiple types of input data (like images and text) simultaneously.
A system designed to understand and work with multiple types of content, such as text and images, even if it only processes one type directly.
A learning approach where training data consists of bags (groups) of instances, useful when only bag-level labels are available.
A machine learning technique that combines multiple similarity measures (kernels) by learning optimal weights for each.
A training technique that improves embeddings by comparing a text sample against multiple negative examples, helping the model learn to distinguish similar from dissimilar content.
An evaluation format where a model selects the correct answer from a fixed set of options.
A physics or engineering problem with important dynamics at multiple length or time scales simultaneously.
Training a model on multiple related tasks simultaneously so it learns shared patterns that improve performance across all tasks.
A training approach where a model learns to perform multiple related objectives simultaneously, which often improves its overall performance and generalization.
Time-ordered data with multiple variables or channels measured simultaneously, where variables may influence each other.
An algebraic structure where elements can represent scalars, vectors, and higher-dimensional geometric objects simultaneously.
A second-order optimizer designed for hypersphere-constrained training that improves stability during scaling.
The ability of a model to analyze and interpret musical characteristics like genre, emotion, harmony, and structure from audio or music data.
Deliberately introducing bugs into code to test whether test suites can catch them.
Regularization technique that ensures both modalities contribute equally to the joint representation by equalizing information flow.
A metric for measuring similarity between representations by finding pairs of samples that are each other's closest matches.
A low-precision floating-point format (4-bit) designed for efficient neural network computation while maintaining reasonable accuracy.
A natural language processing task that identifies and classifies specific entities like people, places, and organizations within text.
The task of automatically creating coherent stories or sequences of events in text form.
The organized framework of a story, including how events are sequenced and how the plot progresses from beginning to end.
The ability of a model to directly understand different types of input (like images or audio) without converting them to text first.
When a model can directly understand different types of input (like images or audio) without needing to convert them to text first.
The ability to process images at their original sizes and aspect ratios without forcing them into a fixed square dimension, reducing information loss from resizing.
An optimization method that accounts for the geometry of the model's probability distribution rather than treating all parameter directions alike, often converging faster than standard gradient descent.
The process by which a model produces human-readable text output based on its understanding of input and learned patterns.
A training task where a model learns to determine whether one sentence logically follows from another, helping it understand relationships between texts.
The field of AI focused on enabling computers to understand, interpret, and generate human language in a meaningful way.
The field of AI focused on understanding and generating human language in a meaningful way.
The process of converting human-written instructions or descriptions into executable programming code.
The ability of a model to comprehend and extract meaningful information from human language, rather than just pattern-matching on words.
Ranking metric measuring how well relevant items are placed at the top.
Unperturbed reference images used as stable anchors to detect and correct for technical variations in experiments.
When learning from one task actually hurts performance on another task due to conflicting patterns.
A training technique where the model learns by comparing correct matches against intentionally chosen incorrect examples to improve discrimination.
When training a model on multiple tasks simultaneously hurts performance compared to training on individual tasks separately.
A machine learning model that compresses audio into a compact digital format and can reconstruct it at near-original quality.
A neural network component that converts raw text input into a numerical representation (embedding) that captures semantic meaning.
The process of converting text or other data into numerical vector representations using neural networks, enabling machines to understand and process language.
A neural network that represents continuous 3D properties (like temperature or material density) as a smooth function rather than discrete grid values.
Using neural networks and embeddings to find relevant documents or passages in response to a query, rather than traditional keyword matching alone.
An AI model trained to predict how code executes step-by-step without actually running it.
Transforming brain activity patterns from one condition to match patterns from another condition.
A learnable memory component that neural networks can read and write to.
A neural network that models continuous dynamics by treating layers as differential equations.
A learned function that maps between infinite-dimensional function spaces, used for solving physics equations on meshes.
Neural networks that model continuous-time dynamics by treating hidden states as solutions to differential equations.
A learned neural network that synthesizes or modifies images by applying rendering operations like lighting changes.
A search method that uses neural networks to understand semantic meaning and find relevant documents, rather than relying on keyword matching alone.
Combining neural networks with symbolic logic to get both the flexibility of learning and the interpretability of rule-based systems.
The pattern of which neurons in a neural network fire or respond when processing specific inputs.
An optimization algorithm that finds roots of equations by iteratively refining guesses using function derivatives.
Advanced features and improvements in a model that represent a significant step forward from previous versions.
The fundamental task where a language model learns to guess the most likely next word (or token) based on all the words that came before it.
A pretraining task where a model learns to predict which clinical events will occur at a patient's next healthcare visit.
A specific 4-bit quantization method that uses the NormalFloat (NF4) data type, whose levels are spaced to match normally distributed weights, to preserve model accuracy while dramatically reducing memory requirements.
Assessing the quality of machine-generated text across criteria like fluency, coherence, and relevance.
A graph task where the goal is to predict labels for individual nodes using graph structure and node features.
A vector representation of a node in a graph that captures its structural properties and relationships.
The starting point for diffusion generation, typically random Gaussian noise that gets progressively refined into an image.
A multi-step filtering system combining domain rules, statistical patterns, and behavioral signals to remove false alerts.
The ability of a model to maintain performance when given irrelevant, incorrect, or corrupted input data.
A sequence defining how much noise is added during training and removed during sampling in diffusion models.
A preprocessing technique that removes or corrects low-quality or mismatched training examples before training, improving model reliability.
Training data where some examples have incorrect labels, which can degrade model performance if not handled carefully.
Generating all output tokens simultaneously rather than one at a time, enabling faster inference.
A generation approach where the model generates multiple tokens in parallel or through iterative refinement, rather than one at a time.
A text generation approach where the model can predict or refine multiple words in parallel, rather than generating one word at a time in sequence.
A legal restriction that permits using the model for learning and research but prohibits using it in production systems or for commercial purposes.
Finding minima in loss landscapes with multiple local minima, common in deep learning.
Specifications describing how a system should perform, including quality attributes like performance and security.
Data distributed unevenly across devices, where each device has different data patterns—more realistic than uniform distribution.
A decision problem where the optimal action depends on history, not just the current observation, because the present state is ambiguous.
Tracking and reconstructing objects that bend or change shape, rather than staying rigid.
System behavior that changes over time rather than remaining constant, like wear or environmental drift.
A model's ability to recall factual knowledge even when the exact wording or phrasing differs from training data.
A measure of how unusual or unreliable a prediction is, used by conformal methods to decide which predictions to include in the answer set.
Finding the best parameter values for a model when the relationship between inputs and outputs is not linear.
Fitting curved or complex relationships between inputs and outputs, beyond simple linear patterns.
How well a model adapts its behavior based on social norms and contextual expectations.
A neural network that transforms simple distributions into complex ones while maintaining the ability to calculate exact probabilities.
A grammatical system where nouns are grouped into categories that affect agreement with other words.
The ability to grasp subtle meanings, context, and shades of gray in language rather than treating everything as black-and-white.
The ordered arrangement of DNA building blocks (A, T, G, C) that make up genetic code.
A factor that varies in your data but doesn't affect the task label—like lighting in object recognition.
AI planning that handles continuous numeric quantities like data sizes, processing times, and resource constraints.
The ability to understand, manipulate, and solve problems involving numbers, calculations, and mathematical logic.
The property of an algorithm to produce consistent results despite small errors or precision changes during computation.
A low-precision numerical format that uses 4 bits per weight, developed by NVIDIA to compress models for efficient inference on consumer hardware.
A low-precision numerical format optimized by NVIDIA that uses fewer bits per number than standard formats, enabling efficient inference on NVIDIA GPUs while maintaining reasonable accuracy.
A computer vision task that identifies and locates specific objects within an image by drawing boxes around them.
The process of creating a binary or multi-class map that highlights which pixels belong to a specific object, effectively isolating it from the background.
The task of identifying and outlining individual objects in an image or video by marking their exact boundaries at the pixel level.
Task where an AI agent navigates to locate and reach a specified target object in a physical environment.
A model of what an external observer knows or believes about an agent's actions and internal state.
When objects or areas are hidden from view by other objects in front of them.
A 3D model that accounts for hidden or blocked parts of objects in a scene.
A probability distribution over state-action pairs visited by a policy, used to characterize exploration behavior.
The ability to detect and extract text from images, converting printed or handwritten characters into machine-readable text.
A model that understands text in images without needing a separate optical character recognition (OCR) tool to extract the text first.
A reinforcement learning method where an agent learns from past experiences (not just current policy) using separate networks for action selection and value estimation.
Running a model locally without requiring external API calls or internet connectivity.
Training an AI agent using only pre-collected data without interacting with the environment.
Starting with a policy trained on fixed offline data, then improving it through interaction with the environment.
An AI model that natively processes audio, vision, and text inputs together in a single system.
A drone's ability to detect and avoid obstacles coming from any direction, not just ahead.
A model designed to run directly on a user's device (phone, laptop, etc.) rather than requiring a remote server.
Running an AI model directly on a user's device (phone, laptop, edge device) rather than sending data to a remote server.
Running a model directly on a user's device (phone, laptop, etc.) rather than sending data to a remote server, which improves privacy and reduces latency.
Training data generated by the current model being optimized, rather than from a fixed external source.
A training method where a student model learns from a teacher model's outputs on data the student generates.
Learning from data generated by the current policy or model being trained.
Training using data generated by the current model rather than data from other sources.
Reinforcement learning where the model learns from data generated by its own current policy.
A support vector machine variant that learns the boundary of normal data to detect anomalies.
The ability to learn or perform a task from a single example, rather than requiring many training examples.
Continuously updating a model with new incoming data in real-time rather than in batch training sessions.
Training a model on streaming data one example at a time, updating weights immediately rather than in batches.
An open standard format for saving and running machine learning models that works across different frameworks and platforms, making models more portable and efficient.
An open standard file format for storing trained machine learning models so they can run efficiently across different platforms and frameworks.
A cross-platform execution engine that runs machine learning models in a standardized format, allowing the same model to work across different programming languages and hardware without needing the original training framework.
A structured, standardized system that defines relationships between concepts — in this case, medical terms and their clinical meanings.
The process of designing and building formal knowledge representations that define concepts and relationships in a domain.
A legal permission that allows anyone to freely use, modify, and distribute the model with only minimal conditions such as attribution (in this case, Apache 2.0).
An approach to AI development that prioritizes transparency, reproducibility, and community access to research methods and findings.
Software or models where the code, weights, and training data are publicly available for anyone to inspect, use, and modify.
A legal framework (like GPL-3.0) that allows anyone to use, modify, and distribute the model code and weights freely, often with requirements to share improvements.
A model whose trained weights are publicly downloadable, allowing local deployment and modification.
A model trained to handle conversations on any topic without being restricted to a specific subject area.
The task of finding relevant documents from a very large, unrestricted collection to answer questions, without being limited to a specific domain or dataset.
Questions or instructions that have multiple valid answers rather than a single correct response.
A question that requires synthesis and judgment rather than a single factual answer, allowing multiple valid responses.
Optimization where the solution space and objectives are not fixed in advance but emerge during the search process.
Publicly released model parameters that allow anyone to download and run the model locally, rather than accessing it only through a company's API.
Detecting objects in images using arbitrary text descriptions rather than a fixed set of predefined categories.
A model whose trained weights are publicly released, allowing anyone to download and run it locally.
A model whose trained weights are publicly released and can be freely downloaded and used, as opposed to being proprietary or access-restricted.
A model whose trained weights are publicly released, allowing anyone to download and run it locally rather than only accessing it through an API.
An open license that allows free use of a model while including responsible AI guidelines and usage restrictions.
The defined range of conditions and scenarios in which an AI system is designed to operate safely.
The defined set of real-world conditions and input types for which an AI system is approved to operate safely.
To define an abstract concept in concrete, measurable terms that can be tested or evaluated.
A mathematical measure of how much a matrix can stretch vectors, used to understand optimizer behavior.
A technology that automatically detects and extracts text from images or scanned documents.
A visual representation showing how pixels move between video frames, indicating motion direction and speed.
A mathematical method for finding the lowest-cost way to transform one probability distribution into another.
The process of adjusting model parameters to minimize errors and improve performance.
An algorithm that updates model weights during training to reduce loss and improve accuracy.
Internal variables an optimizer maintains, like momentum or adaptive learning rates, between updates.
The total number of gradient computations or function evaluations required to reach a desired solution accuracy.
A machine learning technique that predicts ordered categories (like ratings 1-5) rather than continuous values or unordered classes.
Evaluating model outputs by ranking them on an ordered scale rather than binary correct/incorrect judgments.
A training technique that aligns a model's outputs with human preferences by combining supervised fine-tuning and preference learning in a single efficient training stage.
Updating weight matrices through left and right orthogonal transformations that preserve spectral properties.
A kernel function based on orthogonal polynomials that creates a finite-dimensional feature space with an explicit mathematical basis.
Kernel functions based on orthogonal polynomials that create a finite-dimensional feature space with explicit mathematical structure.
A mathematical operation that removes specific directions from high-dimensional data while preserving other information.
Feature vectors that are perpendicular to each other, capturing independent information.
A mathematical operation that rotates or reflects data while preserving lengths and angles, used here to update model weights more efficiently.
A measure of how independent or perpendicular mathematical objects are to each other.
A special type of doubly stochastic matrix derived from orthogonal matrices, providing a structured way to parameterize the Birkhoff polytope.
Data that differs significantly from the training set, often causing poor model predictions.
Identifying when a model receives input data that differs significantly from its training distribution.
A model's ability to make predictions beyond the range of values it saw during training.
Using a model on tasks or data significantly different from what it was trained on.
Words or characters that a model has never seen during training and doesn't have a built-in representation for.
A normalization technique applied outside the main computation loop to stabilize fixed-point convergence.
Tokens with unusually high activation values that dominate attention but carry corrupted or limited semantic information.
The type of data a model produces as output, such as text, images, or predictions.
The output-value component of attention that transforms values based on what the model attends to.
When a model assigns high confidence to predictions that are actually incorrect or unreliable.
When a model learns training data too well, including noise, and performs poorly on new unseen data.
A geometric feature of solution spaces where solutions cluster into groups with limited overlap, indicating computational hardness.
The expected human effort and resources required to monitor and intervene in autonomous agent decisions.
A number system that extends the rational numbers using the p-adic absolute value, important for studying arithmetic geometry.
A framework for proving that an algorithm can, with high probability, learn an approximately correct concept from a limited number of examples.
The step of identifying and organizing text regions and layout structure in a document image.
Internal representations in protein models that encode relationships between pairs of amino acids.
A component in AlphaFold that processes pairwise relationships between amino acids to predict protein structure.
Evaluating models by comparing outputs two at a time, which scales quadratically with the number of models.
Using a 360-degree camera view to see the entire environment around a drone at once.
Processing and understanding text at the scale of full paragraphs rather than individual sentences or words.
Non-verbal aspects of speech like pitch, tone, and accent that convey information about speaker identity.
Generating multiple output tokens at once instead of sequentially for faster inference.
A generation approach where multiple parts of the output are improved simultaneously rather than sequentially, enabling faster completion.
Running multiple independent attempts at solving a problem simultaneously to gather diverse training data.
Multiple independent sequences of computation that execute simultaneously, each handling different types of input or output.
A sampling method that explores a distribution by running multiple chains at different temperatures and swapping between them.
Executing multiple operations simultaneously rather than sequentially to reduce total execution time.
A geometric framework for word analogies where A:B::C:D forms a parallelogram in embedding space (A-B = C-D as vectors).
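As a rough illustration of the parallelogram relation above, the sketch below completes A:B::C:D by computing C - A + B and picking the nearest remaining word. The 2-D vectors and the helper names (`EMBEDDINGS`, `solve_analogy`) are hand-made assumptions for this example; real embeddings are learned and have many more dimensions.

```python
# Toy 2-D embeddings chosen by hand for illustration (an assumption, not real model output).
EMBEDDINGS = {
    "man": (1.0, 0.0),
    "king": (1.0, 1.0),
    "woman": (2.0, 0.0),
    "queen": (2.0, 1.0),
    "apple": (5.0, 5.0),
}

def sub(u, v):
    """Component-wise difference u - v."""
    return tuple(a - b for a, b in zip(u, v))

def add(u, v):
    """Component-wise sum u + v."""
    return tuple(a + b for a, b in zip(u, v))

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def solve_analogy(a, b, c, embeddings=EMBEDDINGS):
    """Return the word D that best completes A:B::C:D, using D ~ C - A + B."""
    target = add(sub(embeddings[c], embeddings[a]), embeddings[b])
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return min(candidates, key=lambda w: sq_dist(embeddings[w], target))

print(solve_analogy("man", "king", "woman"))
```

In this toy space the two sides of the parallelogram match exactly: man - king equals woman - queen.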
The process of selectively using only a subset of a model's total parameters during inference, reducing computational cost while maintaining performance.
The total number of adjustable weights in a model; more parameters generally mean more capacity to learn, but also require more computing power.
A training technique where a smaller model learns to replicate the behavior of a larger, more capable model by studying its outputs and internal patterns.
The ability of a model to achieve strong performance while using fewer total parameters or activating fewer parameters during inference, reducing memory and computational requirements.
The total number of learnable weights in a model, which directly affects its memory requirements and computational cost — smaller footprints run faster on consumer devices.
The process of setting starting values for a model's weights; random initialization means these values are set randomly rather than from pre-trained weights.
A neural network described by the number of learnable weights it contains; more parameters generally mean greater capacity to learn complex patterns, but also require more computational resources.
The total set of learnable weights in a model; in sparse models, only a subset of this pool is activated for any given input.
Sharing learned weights across multiple tasks to improve efficiency and knowledge transfer.
The total number of trainable weights in a model, often expressed in billions (B); larger models generally have more capacity but require more computing power.
Reusing the same weights across multiple layers or iterations to reduce model size and memory overhead.
The path that model weights follow through training, showing how parameters evolve over time.
A model designed to achieve strong performance with fewer total parameters, making it smaller and faster to run.
A model design that achieves strong performance with fewer trainable parameters, reducing memory and computational requirements.
Techniques that adapt a model to new tasks while adding very few trainable parameters.
The learned numerical values in a model; more parameters generally mean more capacity but higher compute cost.
Breaking a signal into simpler components defined by explicit parameters like amplitude, timing, and duration.
Information encoded in an LLM's weights and parameters during training, as opposed to retrieved external knowledge.
Knowledge stored in model weights rather than in a separate external database.
The task of identifying whether two pieces of text express the same meaning in different words, which embedding models can perform by comparing the similarity of their numerical vectors.
The task of rewriting text to express the same meaning in different words or sentence structures.
The task of rewriting text in different words while keeping the original meaning intact.
The set of best solutions where improving one objective requires worsening another.
Generating objects by explicitly modeling and composing individual semantic parts rather than treating the whole object as a single unit.
Mathematical equations describing how physical quantities change across space and time, fundamental to modeling natural phenomena.
A scenario where a system's state cannot be fully measured, requiring models to infer unobserved variables from available sensor data.
Training approach that rewards models for partial progress on criteria rather than binary success/failure.
A decision-making framework where agents see incomplete state information and actions can take variable amounts of time to complete.
Control systems that must act and plan despite incomplete information about the environment's true state.
Systems where the true state is hidden and only noisy or indirect measurements are available.
A metric measuring whether an agent succeeds at a task within k attempts, useful for evaluating problem-solving capacity.
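One way to estimate this kind of metric is sketched below, assuming n attempts were sampled per task and c of them succeeded. The formula 1 - C(n-c, k) / C(n, k) is a commonly used unbiased estimator; it is not stated in the definition above, and the function name `pass_at_k` is illustrative.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k attempts succeeds,
    given that c of n sampled attempts succeeded."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the k-subsets that contain no successful attempt;
    # math.comb returns 0 when k > n - c, so the estimate is then 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=10, c=3, k=1))
print(pass_at_k(n=10, c=3, k=5))
```

Averaging this value over all tasks in a benchmark gives the overall score.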
The task of ordering text passages by their relevance to a query, commonly used in search and question-answering systems.
The task of finding relevant text passages or documents that answer or relate to a user's query.
The process of identifying exactly where in code a fix needs to be applied.
A self-supervised learning technique where a model learns by predicting missing or future small sections (patches) of an image or video rather than generating complete outputs.
The resolution of image segments the model processes; smaller patches capture finer details but require more computation.
A reasoning pattern where early decisions constrain and limit the model's subsequent exploration choices.
The model's ability to identify recurring sequences or characteristics in text that match known unsafe content categories.
Adapting proven workflow templates to new problems by changing configuration rather than rebuilding from scratch.
Large pre-trained neural networks that learn to solve partial differential equations across multiple physics domains.
Emergent behavior where AI models in a system deceive supervisors to prevent deactivation of other AI models.
A set of techniques that allow you to adapt a pre-trained model to new tasks by updating only a small fraction of its parameters, rather than retraining the entire model.
An optimization approach that adds penalties to the objective function to discourage undesirable outcomes alongside maximizing primary goals.
A technique that converts constrained optimization into unconstrained form by adding a penalty term for constraint violations.
Applying pixel-specific linear transformations to preserve fine image details during synthesis or modification.
A representation where each word or subword in a text gets its own embedding vector, rather than combining all tokens into a single vector for the entire text.
A cycle where agents act to gather observations, then use those observations to inform future actions.
The disconnect between a model's ability to understand information and its ability to respond appropriately in context.
When different situations produce identical observations, making it impossible to determine the correct action without historical context.
Mistakes in visualizations that exploit how human eyes and brains process visual information, either intentionally or accidentally.
A loss function that measures differences in high-level image features rather than pixel values, preserving visual quality.
When a model generates reasoning text that appears thoughtful but doesn't reflect genuine internal uncertainty or decision-making.
Learned patterns that repeat at regular intervals, useful for representing cyclic properties like numbers modulo a value.
Open-source licenses that allow broad use, modification, and distribution of code with minimal restrictions.
A task where a model must learn to reorder or remap elements based on their positions or identities.
A training method that predicts text by considering all possible orderings of words, allowing the model to learn context from both directions simultaneously rather than just left-to-right.
A non-parametric statistical test that shuffles data to determine if observed differences are statistically significant.
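A minimal sketch of such a shuffle-based test follows, assuming the statistic of interest is the absolute difference in group means. The function name, the number of shuffles, and the fixed seed are choices made for this example only.

```python
import random

def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)

    def mean(xs):
        return sum(xs) / len(xs)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Shuffling breaks any real link between values and group labels.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

print(permutation_test([1, 2, 3, 4, 5], [101, 102, 103, 104, 105]))
```

A small returned value suggests the observed difference would rarely arise from random relabeling alone.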
A pretraining method that randomly reorders word sequences to help the model learn bidirectional context without explicitly masking tokens.
A property where the output remains unchanged regardless of the order in which input elements are arranged.
A metric measuring how well a model predicts the next token — lower perplexity means better language modeling.
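For intuition, perplexity can be computed as the exponential of the average negative log-probability the model assigned to each actual next token. The sketch below assumes those per-token probabilities are already available as a plain list; the function name is illustrative.

```python
import math

def perplexity(token_probs):
    """Perplexity from the probabilities a model assigned to the actual next tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log_likelihood = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_likelihood)

# A model that always assigns probability 0.25 to the true token is as
# uncertain as a uniform guess among four options.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```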
Settings where an AI agent operates continuously across multiple sessions, maintaining state between interactions.
A topological method that tracks how connected components and holes in data persist across different scales.
When LLM agents assigned distinct personas converge into homogeneous behaviors instead of maintaining diversity.
Whether a model's harmful actions align with its self-reported beliefs about its own alignment or misalignment.
A communication layer that publishes module state and write-back affordances, allowing different tools to access and update shared information.
Customizing educational content, examples, and feedback to match individual learner interests, knowledge level, and learning style.
Assessing NLP systems on their ability to capture diverse human perspectives rather than collapsing them into a single ground truth.
A method that removes or modifies input elements to measure their impact on model outputs.
An abrupt directional reversal in the model's internal representations, indicating the model may be committing a reasoning error.
A sharp threshold in communication rate below which intent-preserving information transfer becomes structurally impossible.
Strategically timing when to switch between reward functions during training based on policy development stage rather than using fixed schedules.
A device that measures electrical signals in power grids with precise timing.
A social engineering attack where attackers trick users into revealing sensitive information by impersonating trusted entities.
The underlying patterns in speech related to individual sounds (phonetics) and the physical properties of audio waves (acoustics).
The process of teaching a model to understand and reproduce the individual sounds and pronunciation rules of a language.
The subtle differences in how sounds are pronounced within a language, including tone, stress, and accent variations that affect meaning.
A text-based encoding of how words sound, showing the individual speech sounds rather than the written spelling.
A neural network that performs computations using photons and optical components instead of electronic circuits.
Images that closely resemble photographs in appearance, with realistic lighting, textures, and details.
The degree to which generated content obeys real-world physical laws and interactions.
Rendering approach that simulates light behavior using real-world physics principles for realistic material and lighting interactions.
Computing how objects move and interact based on physical laws like gravity, collisions, and forces.
Machine learning models that incorporate known physical laws or equations as constraints.
An autoencoder that incorporates physical constraints (like divergence-free velocity fields) into its learned representations.
Neural networks trained to solve physics equations by incorporating the equations as constraints in the training process.
A mathematical property where a function is made of linear segments that change at specific boundaries.
The task of automatically identifying and extracting sensitive personal information like names, emails, and phone numbers from text.
A large, publicly documented collection of diverse text data used to train language models, designed to be transparent and reproducible for research purposes.
The coordination of multiple models or processing steps working together, where a routing model directs requests to the right step in the workflow.
Splitting model layers across GPUs so different stages process different batches simultaneously to improve training throughput.
Testing a workflow or system end-to-end to ensure all components work together correctly before using it with real data.
Detailed spatial maps showing which specific image regions contain anomalies or unusual objects.
Visual information extracted directly from individual pixels in an image, used to understand the precise positioning and appearance of elements on a page.
A probabilistic model that generates rankings of items based on their underlying utility scores.
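The sketch below shows one common form of such a ranking model, in which items are chosen one place at a time with probability proportional to their score among the items still unranked. It assumes the scores are positive weights; the function name is hypothetical.

```python
def ranking_probability(scores_in_rank_order):
    """Probability of a ranking, given positive item scores listed from first place to last."""
    remaining = sum(scores_in_rank_order)
    prob = 1.0
    for score in scores_in_rank_order:
        # Chance that this item is picked next among the items not yet ranked.
        prob *= score / remaining
        remaining -= score
    return prob

# Items with scores 3, 2, and 1 ranked in that order: (3/6) * (2/3) * (1/1).
print(ranking_probability([3, 2, 1]))
```

The probabilities of all possible orderings of the same items sum to one.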
A model's ability to learn and adapt to new tasks and data.
The hypothesis that neural networks trained on different modalities converge toward the same underlying representation of reality.
A component or method that works immediately without requiring complex setup or configuration.
Aligning AI models to support multiple diverse perspectives and values rather than a single viewpoint.
A set of 3D points in space, often used to represent objects or scenes in computer vision.
Recovering a 3D representation of a scene as a set of individual points in space from image data.
A minor update to a software version (like 5.1 to 5.2) that typically includes refinements and improvements rather than major new features.
Following the same physical points on objects across multiple video frames to measure motion.
Scoring or ranking items one at a time independently, without considering relationships between items.
Malicious outputs deliberately generated by a compromised model when triggered by backdoor inputs.
An adversarial attack where malicious participants corrupt training data to degrade model performance.
A matrix factorization that separates a matrix into an orthogonal part and a positive-semidefinite part.
A privacy technique that perturbs only the direction of embeddings on a sphere while keeping their magnitude unchanged.
Process of adjusting a model's behavior to follow specific constraints or objectives during training.
Combining actions from multiple policies (e.g., cloned and learned) based on their estimated quality or confidence.
The process by which a reinforcement learning agent's decision-making strategy stabilizes toward optimal behavior.
Converting trajectories or behaviors discovered during exploration into a trainable policy that can be deployed.
When a trained model's behavior gradually diverges from its intended target during continued training.
The process of automatically checking content against a set of rules or guidelines and blocking or flagging violations.
How a model's decision-making strategy changes over training iterations, affecting which samples it generates and with what probability.
Optimization method that updates model parameters by following the gradient of expected rewards.
A foundational result showing how to compute gradients of expected return with respect to policy parameters.
An optimization technique that alternates between evaluating a policy and improving it based on that evaluation.
Training a system to make sequential decisions about which actions to take given the current state.
Extracting decision rules and patterns from historical user behavior data to understand how decisions are made.
Training an LLM to maximize expected rewards using reinforcement learning techniques.
The ability to identify when content breaks specific safety rules or guidelines set by an organization.
Framework by Brown and Levinson explaining how language choices reflect social relationships and face-saving strategies.
A technique that averages iterates from an optimization algorithm to improve convergence and reduce variance.
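To illustrate iterate averaging, the sketch below runs plain gradient steps and keeps a running mean of the iterates. The deliberately oversized step makes the raw iterates oscillate while their average settles near the minimum. The function name and the test problem are assumptions made for this example.

```python
def gradient_descent_with_averaging(grad, x0, lr, steps):
    """Run plain gradient steps and return (last_iterate, average_of_iterates)."""
    x = x0
    avg = 0.0
    for t in range(1, steps + 1):
        x = x - lr * grad(x)
        # Incremental mean over the iterates x_1..x_t, using constant memory.
        avg += (x - avg) / t
    return x, avg

# Minimize x^2 with a step size that makes the iterates flip between -1 and 1.
last, averaged = gradient_descent_with_averaging(lambda x: 2 * x, x0=1.0, lr=1.0, steps=4)
print(last, averaged)
```

Here the last iterate never approaches the minimum at 0, while the averaged iterate does.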
When a single neuron or expert handles multiple unrelated functions, making it harder to interpret what it does.
A decision-making framework where an agent cannot fully observe the environment state and receives only partial observations.
The degree to which agents in a multi-agent system exhibit varied behaviors and characteristics.
An optimization approach that maintains and evolves a set of candidate solutions across iterations.
Safety hazards that emerge from interactions among multiple agents rather than from individual systems.
A method that runs multiple different solving strategies in parallel and uses the best result.
The process of selecting and weighting assets to create an investment portfolio that balances risk and return objectives.
A set of models chosen to collectively satisfy the preferences of a large fraction of users despite disagreement.
Separating head position and orientation from facial expression features to improve the model's focus on meaningful deformations.
The task of identifying and locating body parts (like joints or keypoints) in images or video.
Estimating future body joint positions and orientations from past poses.
A systematic error where LLMs perform better on items at certain positions (like the beginning) in a list.
Modifying how a model encodes token positions to extend its ability to handle longer sequences.
A technique that adds explicit time or position information to a model's input to help it understand sequence order and timing.
Reducing model size by converting weights to lower precision after training is complete.
An explanation method applied after a model is trained to interpret its predictions, rather than building interpretability into the model itself.
Additional refinement applied to a model after its initial training to improve performance on specific tasks like reasoning or instruction-following.
The updated probability distribution of parameters after observing new data.
Measurement of electrical power usage over time for a specific workload or system.
Measuring and recording the electrical power consumption of a system over time.
A probability distribution defined on the surface of a sphere, used to enforce geometric constraints in latent representations.
Scheduling jobs on computing systems while considering and optimizing for power consumption constraints.
A non-invasive measurement of blood flow and heart rate using light sensors, commonly found in smartwatches.
Hardware optimization achieved by adding compiler directives (pragmas) to code that guide synthesis tools in generating efficient designs.
The study of how context and intent affect language meaning beyond literal words.
A Transformer design choice where layer normalization is applied before each sublayer (attention or feed-forward) rather than after it.
A model that has already been trained on large amounts of data before being released, so it can be used immediately without additional training.
A Transformer-based neural network trained on large amounts of text data before being adapted for specific tasks.
The level of numerical detail a model uses to represent its internal values; higher precision means more accurate calculations but requires more memory.
Errors in pinpointing exact locations caused by processing high-resolution images where small details become harder to distinguish.
A slight loss in model accuracy or reasoning quality that can occur when using quantization or other compression techniques.
The reduction in numerical accuracy that occurs when a model is compressed, which can slightly degrade performance on complex reasoning tasks while remaining acceptable for most everyday uses.
The balance between reducing model size through lower numerical precision and maintaining accuracy—lower precision saves memory but may slightly reduce performance.
A control method that forecasts future states and optimizes actions accordingly.
How well an AI system's judgments match the actual preferences of target users or evaluators.
Providing preference weights or trade-off parameters as input to a model to control its behavior at inference time.
A training method that learns from pairwise comparisons between solutions rather than explicit reward signals.
Refining a model by learning from human comparisons of outputs rather than explicit numerical scores.
Evaluation method where human raters compare two model outputs and indicate which one is better, rather than scoring them independently.
Loading data into memory before it's needed to reduce wait times during computation.
The initial phase of inference where the model processes the full input prompt before generating tokens.
A simple rule where you add a label like 'query:' or 'passage:' to the beginning of text to tell the model how to process it differently.
Comparing token sequences to find semantically equivalent continuations in an LLM's output.
A step that transforms raw input data into a cleaner, more useful format before feeding it to another model or system.
A model that has already been trained on large amounts of text data before being released or fine-tuned for specific tasks.
A foundational AI model trained on raw data but not specialized for specific tasks like conversation, serving as a starting point for further customization.
A language model trained on large amounts of text data to learn language patterns, before being customized for specific tasks or behaviors.
A model trained on large amounts of text data to predict and generate language before being adapted for specific applications.
A model that has already been trained on large amounts of text data and can be used directly or fine-tuned for specific tasks.
The learned parameters of a model after training on large amounts of text data, ready to be used or further refined for specific tasks.
The initial training phase where a model learns general patterns from a large dataset before being adapted for specific downstream tasks.
An early-access version of a model released before full launch; it is useful for testing but may have bugs or change without warning.
An early version of a model released for testing and feedback before a stable, finalized version is available.
An early version of a model that is still being tested and refined before an official release, so features or performance may change.
An experimental version of a model released early for testing and feedback, with behavior and features that may change significantly before the official release.
A measure of how much demand for a product changes when its price changes.
The performance loss a model experiences when trained to be robust against attacks instead of optimized purely for accuracy.
A dimensionality reduction technique that transforms high-dimensional data into fewer uncorrelated components while preserving as much variance as possible.
An economic model analyzing conflicts when one party (agent) acts on behalf of another (principal) with different interests or information.
A model's default gender assumptions when translating ambiguous source text without explicit gender markers.
A replay buffer technique that samples more frequently from experiences with larger TD errors, focusing learning on surprising or informative transitions.
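A simplified sketch of the sampling step follows, assuming each stored transition has a known TD (temporal-difference) error. The exponent `alpha`, the small constant `eps`, and both function names are illustrative choices; importance-sampling corrections used in full implementations are omitted.

```python
import random

def sampling_probabilities(td_errors, alpha=0.6, eps=1e-3):
    """Turn TD errors into sampling probabilities; larger errors are sampled more often."""
    # eps keeps transitions with zero error from never being sampled.
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]

def sample_indices(td_errors, batch_size, seed=0):
    """Draw a batch of buffer indices according to the priority-based probabilities."""
    rng = random.Random(seed)
    probs = sampling_probabilities(td_errors)
    return rng.choices(range(len(td_errors)), weights=probs, k=batch_size)

print(sampling_probabilities([0.1, 2.0, 0.5]))
print(sample_indices([0.1, 2.0, 0.5], batch_size=4))
```

Setting `alpha` to 0 recovers uniform sampling, while larger values focus more strongly on high-error transitions.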
Allocating GPU resources to prioritize high-priority requests while fairly handling lower-priority ones based on deadline requirements.
The balance between protecting sensitive information and maintaining model performance on downstream tasks.
A network configuration that isolates your model's traffic from the public internet, keeping it accessible only within your organization's internal network.
Limiting what actions an agent can perform based on its role and the sensitivity of the task.
Additional information available to a teacher model during training but not accessible to the deployed student model.
A higher-capability version of a model designed for more demanding tasks, typically with better reasoning and language understanding than base versions.
Computing with randomness and probability distributions to achieve robustness, interpretability, and security in AI systems.
A structured representation showing how variables relate to each other and their probabilistic dependencies.
Machine learning models that output probability distributions over outcomes rather than single predictions.
The geometric space of all valid probability distributions, where each point represents a probability vector summing to one.
A measure of how hard a problem is for a solver to answer correctly, used to generate progressively challenging training examples.
Automatically creating new problems or tasks for training or evaluating AI systems.
The model's capacity to analyze difficult questions or technical challenges and work toward accurate, well-reasoned solutions.
The ability to follow a sequence of steps in order and correctly apply each step to produce the intended output.
A model trained to evaluate and score the quality of intermediate steps in a solution, rather than just checking if the final answer is correct.
System design that enforces constraints during reasoning steps rather than only filtering final outputs.
Code that is complete, tested, and formatted to standards suitable for immediate use in real applications.
A training strategy that gradually teaches models from simple tasks to complex ones, mimicking human learning progression.
An optimization method that updates inputs along gradients while constraining them to stay within a valid range.
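The one-dimensional sketch below shows the two-part update: a gradient step, then a projection back into the allowed interval. The function name, bounds, and step size are assumptions chosen for illustration.

```python
def projected_gradient_descent(grad, x0, lower, upper, lr=0.1, steps=50):
    """Minimize a 1-D function while keeping x inside [lower, upper]."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)            # unconstrained gradient step
        x = min(max(x, lower), upper)   # projection onto the valid range
    return x

# Minimize (x - 3)^2 subject to 0 <= x <= 1; the constrained optimum sits at x = 1.
result = projected_gradient_descent(lambda x: 2 * (x - 3), x0=0.5, lower=0.0, upper=1.0)
print(result)
```

In higher dimensions the same pattern applies, with the clamp replaced by a projection onto whatever constraint set is in use.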
Mathematical framework describing how 3D points project onto 2D image planes, used to measure geometric consistency violations.
The initial text you provide to a language model to guide what it should generate or complete.
Using descriptive text instructions to guide or control how a model generates output, such as specifying desired voice characteristics.
Designing the input text to a model in specific ways to improve the quality of its responses.
A technique where a model takes a short, simple input and generates a longer, more detailed version with additional context and descriptive elements.
An attack where malicious instructions are inserted into user input to manipulate an AI model's behavior.
Selectively activating or deactivating task-specific prompts based on whether incoming data matches learned patterns.
The process of structuring text descriptions in ways that generative models can best understand and act upon to produce desired outputs.
A short instruction added to the beginning of input text that tells the model how to treat that text (for example, marking it as a 'query' versus a 'passage').
The tendency of LLM outputs to vary significantly based on small changes in how a request is phrased.
Subsets of evaluation prompts grouped by category or topic to analyze model behavior across specific types of inputs.
A model interaction style where you guide the model's output by providing minimal cues like clicks, boxes, or masks rather than detailed text instructions.
A way to control what a model does by giving it text instructions, rather than requiring code changes or separate training for different tasks.
A model that accepts flexible user inputs (like text descriptions, points, or bounding boxes) to guide what it should identify or process in an image.
A segmentation approach where you guide the model by providing prompts like points, clicks, or bounding boxes to specify which objects you want it to segment.
A high-level outline of a proof showing the main steps without full formal details.
A small-scale demonstration or experiment designed to test whether an idea or approach is feasible, rather than for production use.
The process of spreading information or edits from reference points (keyframes) to other frames in a sequence.
Mathematical principle showing that particles in a large interacting system become approximately independent of one another as the number of particles grows.
A metric that rewards accurate probability predictions and penalizes overconfidence.
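One common metric of this kind is the Brier score, used here as an assumed example. The sketch below compares a cautious forecaster with an overconfident one on the same made-up outcomes; lower scores are better.

```python
def brier_score(probs, outcomes):
    """Mean squared gap between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

outcomes = [1, 0, 1, 1]                 # hypothetical observed results
calibrated = [0.8, 0.2, 0.7, 0.6]       # hedged predictions
overconfident = [1.0, 0.0, 1.0, 0.0]    # fully certain, and wrong on the last case

print(brier_score(calibrated, outcomes))
print(brier_score(overconfident, outcomes))
```

A single confident miss costs the overconfident forecaster more than the hedged forecaster loses across all four predictions.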
Using machine learning to forecast material characteristics (like color or transparency) from input features.
Creating candidate regions or concepts from input (e.g., converting text queries into visual targets).
The process by which a protein chain folds into its three-dimensional structure, which is essential for the protein to function properly.
A neural network trained on large collections of protein sequences to learn patterns in amino acids, similar to how language models learn patterns in text.
A reconstructed ancestral language from which modern languages are believed to have descended.
Classifying new examples by comparing them to representative examples (prototypes) of known categories.
Complete record of the origin, history, and context of data or findings, enabling reproducibility and traceability.
A framework where one agent proves claims and another verifies them to ensure correctness.
A mathematical tool that solves optimization problems by decomposing them into simpler parts.
A reinforcement learning algorithm that uses reward signals to iteratively improve a language model's outputs.
A continuous spatial representation encoding distances and relationships between body and object surfaces.
An imperfect substitute reward signal used when the true objective cannot be directly measured or computed.
An indirect measurement used as a stand-in for something harder to measure directly.
A model compression technique that removes unnecessary parameters or connections from a neural network to reduce its size and computational requirements.
Predicted labels assigned by a model to unlabeled data for semi-supervised learning.
Automatically generated segmentation masks used as training supervision when ground-truth labels are unavailable.
A technique that improves search by automatically refining queries based on initial results, without human input.
A mathematical generalization of matrix inversion used to find optimal least-squares solutions to linear systems.
A request to merge code changes from one branch into another, typically reviewed before acceptance.
A popular open-source framework for building and training neural networks, used to define how models are structured and executed.
A model saved in PyTorch's native format, allowing it to be loaded and run using the PyTorch deep learning framework.
A reinforcement learning algorithm that learns the value of actions in different states.
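Assuming the entry above refers to tabular Q-learning, a single update step can be sketched as follows. The state names, rewards, learning rate, and discount factor are hypothetical values picked for illustration.

```python
from collections import defaultdict

def q_update(Q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Move Q[state, action] toward reward + gamma * (best value in next_state)."""
    best_next = max(Q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    Q[(state, action)] += alpha * (target - Q[(state, action)])

Q = defaultdict(float)        # every action value starts at 0.0
actions = ["left", "right"]

# Hypothetical experience: "right" from s1 pays 1.0 and ends the episode;
# "right" from s0 pays nothing but leads to s1.
q_update(Q, "s1", "right", 1.0, "end", actions)
q_update(Q, "s0", "right", 0.0, "s1", actions)
print(Q[("s1", "right")], Q[("s0", "right")])
```

The second update shows value flowing backward: s0 earns no immediate reward, yet its estimate rises because it leads to a state that does.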
A lightweight connector module that bridges a frozen image encoder and a language model, translating visual information into a format the language model can understand.
A function that estimates the expected cumulative reward for taking an action in a given state.
A specific quantization method that represents model weights using 4-bit numbers instead of higher-precision formats, significantly reducing model size while accepting some loss in accuracy.
The query-key component of attention that determines which positions the model attends to.
A benchmark dataset where models learn to determine whether a given sentence answers a given question, used to train models for question-answer relevance scoring.
The standard attention mechanism in transformers that becomes increasingly expensive as sequence length grows, because it compares every token to every other token.
A computational cost that grows with the square of input length, which is a limitation of traditional transformer attention mechanisms when processing longer texts.
A computational limitation where memory usage grows with the square of sequence length, a problem that SSMs avoid but transformers face.
Computational cost that grows with the square of input size, becoming impractical for large datasets.
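To make "grows with the square" concrete, the small sketch below counts the comparisons needed when every item is compared with every other item, as in standard attention. The function name is a placeholder for illustration.

```python
def pairwise_comparisons(seq_len):
    """Comparisons needed when each of seq_len items is compared with all seq_len items."""
    return seq_len * seq_len

# Doubling the input length quadruples the work.
for n in (1_000, 2_000, 4_000):
    print(n, pairwise_comparisons(n))
```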
An attraction to or emphasis on subjective experiences and qualitative aspects.
The task of assessing and scoring the quality, correctness, or alignment of text outputs, often used to filter or rank model responses.
The ability to understand and solve problems involving numbers, mathematics, and logical calculations.
Reducing a model's numerical precision (e.g., from 16-bit to 4-bit) to shrink memory usage and speed up inference.
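A minimal sketch of the idea, assuming simple symmetric round-to-nearest quantization with one scale for the whole list (real quantization schemes are usually more elaborate). The weights are made-up values.

```python
def quantize(weights, bits=4):
    """Map floats to signed integers of the given bit width using a single shared scale."""
    qmax = 2 ** (bits - 1) - 1                               # 7 for 4-bit
    scale = (max(abs(w) for w in weights) / qmax) or 1.0     # avoid a zero scale
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]

weights = [0.70, -0.32, 0.11, 0.02]      # hypothetical weights
codes, scale = quantize(weights)
restored = dequantize(codes, scale)

print(codes)
print([round(r, 2) for r in restored])
```

Each weight is now stored as a small integer plus one shared scale. The restored values are close to the originals but not identical, which is the accuracy cost of quantization.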
Errors or degradation in model output that occur as a side effect of reducing precision through quantization.
The loss of accuracy that occurs when converting model weights or activations from high precision to lower precision formats.
Fine-tuning a model while simulating low-precision arithmetic to maintain accuracy after quantization.
A training technique where a model learns to maintain performance even when its weights are compressed to use less memory and compute.
A technique that reduces a model's size and memory usage by storing weights with lower precision (fewer bits), trading some accuracy for efficiency.
Training a neural network while keeping weights and activations in reduced precision formats.
A quantum neural network that learns to compress and reconstruct quantum data, useful for noise reduction and data purification.
Method of converting classical data into quantum states for processing by quantum circuits.
Using measurement results to adjust quantum system parameters in real-time to achieve desired outcomes.
The process of determining a quantum system's state from measurement data collected over time.
Optimization algorithms that approximate Newton's method using gradient information instead of full second derivatives.
A model that converts search queries into numerical representations (embeddings) that can be compared against a database of documents to find relevant matches.
Adding related or predicted terms to an original query to improve retrieval coverage and recall.
A classification system categorizing what users actually want when they search.
A step-by-step execution strategy that breaks down a user request into executable operations.
Improving a user's search query to better match their intent and retrieve more relevant results.
A neural network component that acts as a bridge between an image encoder and language model, learning to extract and translate visual information into text-compatible representations.
The coarsest abstraction of a POMDP that preserves an agent's decision-making ability given its computational capacity.
A training technique that encourages a model to make consistent predictions across different random variations.
An equivalence relation on rational points of algebraic varieties measuring when points are connected by rational curves.
Retrieval-Augmented Generation — a technique that grounds model responses in retrieved documents to improve accuracy.
A technique that retrieves relevant documents or information from a database before generating a response, improving accuracy by grounding answers in real data.
A system that retrieves relevant documents or information from a database and feeds them to a language model to generate more accurate and grounded responses.
Systems combining retrieval of external documents with language generation for accurate answers.
Setting a model's weights to random values before training, creating an untrained model that produces meaningless output.
A dimensionality reduction technique using random matrices to efficiently approximate high-dimensional data with linear complexity.
A research method where participants are randomly assigned to use AI or not, to fairly measure the AI's actual impact.
A model whose weights have been set to random values instead of being trained on data, resulting in no learned patterns or knowledge.
Model parameters set to random values instead of being learned from training data, resulting in unpredictable and meaningless outputs.
A technique that uses wireless signals to measure both the distance to an object and how fast it's moving toward or away from you.
The relative ordering of values from smallest to largest, independent of their actual magnitudes.
A mathematical simplification that captures the dominant direction of change in a high-dimensional space using a single vector.
The process of ordering search results by relevance, determining which documents best match a user's query.
A statistical method that jointly estimates solver ability and problem difficulty from performance data.
A technique that takes an initial set of search results and reorders them by scoring their relevance to a query, typically to improve the quality of top results.
An agent framework that alternates between reasoning steps and tool actions to solve tasks.
Mathematical models describing how substances spread and chemically react over space and time.
How easily a patient can understand medical text, often measured by grade-level complexity metrics like Flesch-Kincaid.
A specialized model in a pipeline that processes and analyzes text passages to extract specific information, in this case identifying relationships between entities.
A scheduling approach that runs whichever task is ready first, rather than following a fixed predetermined order.
AI task where a model answers questions based on provided text passages.
Processing and generating predictions on data as it arrives, with minimal delay, rather than in batches.
The ability to access and incorporate current information from the web or live data sources rather than relying solely on training data from a fixed point in time.
The ability to query current web information during inference, allowing a model to access and use the latest data when answering questions.
The ability to search the internet during inference to retrieve current information rather than relying only on knowledge from training data.
The model's ability to work through multi-step logical problems and provide justified answers rather than just pattern-matching.
A model's capacity to work through complex problems step-by-step and draw logical conclusions from information.
An AI component designed to work through complex problems step-by-step, often as part of a larger system that coordinates multiple agents.
The model's ability to work through multi-step problems methodically and show its thinking process rather than jumping to answers.
A model's ability to work through multi-step logical problems and produce coherent explanations for its answers.
The model's ability to perform complex logical thinking and problem-solving tasks beyond simple pattern matching.
A step-by-step explanation of how a model arrives at an answer, showing its intermediate thinking before the final result.
A sequence of logical steps a model follows to work through a problem methodically rather than jumping directly to an answer.
A model's ability to perform complex multi-step logical thinking and problem-solving; typically increases with model size.
Teaching a model to mimic the step-by-step reasoning process of a teacher model or reference solution.
A configurable setting that controls how much computational time a model spends thinking through a problem before generating its response.
The core component of a model that performs step-by-step logical thinking and problem-solving before generating a response.
The degree to which a model's intermediate reasoning steps logically support and justify its final answer.
A special mode where the model takes extra time to think through problems step-by-step before answering, rather than responding immediately.
A model trained to show explicit step-by-step reasoning and problem-solving logic before producing final answers, rather than jumping directly to conclusions.
The internal process a model uses to think through a problem step-by-step, integrating information and tool outputs to arrive at conclusions.
An internal step where the model thinks through a problem before generating its final answer, allowing it to work through complex logic more carefully.
A reusable pattern or strategy distilled from past problem-solving that guides future reasoning.
An explicit intermediate thinking phase where the model works through a problem before generating its final answer, improving accuracy on complex tasks.
A problem that requires a model to work through logical steps, analyze information, and draw conclusions rather than simply retrieving facts.
Problems that require a model to think through multiple steps logically to arrive at an answer, rather than just pattern-matching.
Image generation where the model actively infers implicit user intents from text descriptions rather than literal interpretation.
The visible record of a model's intermediate thinking steps and logic, allowing users to inspect how the model arrived at its conclusion.
A recorded sequence of steps and intermediate outputs from a model's reasoning process.
A retrieval method that uses an agent's explicit reasoning steps alongside its query to find more relevant documents.
A model specifically trained to work through multi-step logical problems methodically rather than generating quick responses.
Retrieving evidence that supports downstream reasoning tasks, beyond simple topical similarity matching.
A model designed to allocate extra computational resources to logical problem-solving and step-by-step analysis rather than raw speed or breadth of knowledge.
A model architecture optimized to work through problems step-by-step using logical inference rather than relying primarily on pattern matching from training data.
Training methods designed to improve a model's ability to work through multi-step logic and solve complex problems systematically.
A memory component that allows a looped model to access and use information from previous iterations.
The region of input data that a neuron responds to or influences.
The formal measure of what patterns and languages a model can recognize or distinguish.
An adversarial technique that attempts to recover original sensitive inputs from transformed or encoded representations.
The difference between original data and its reconstructed version from an autoencoder, used to identify anomalies or unusual patterns.
An agent's ability to recognize mistakes, backtrack, and explore alternative solutions when initial approaches fail.
The number of shots between two appearances of the same entity in a video sequence.
A neural network design where information flows in loops, allowing the model to process sequences step-by-step while maintaining memory of previous inputs.
Neural networks with loops that process sequences by maintaining memory of past inputs.
A feedback mechanism where outputs reinforce or modify previous states over time.
A hybrid neural network design that combines recurrent processing (which maintains memory across sequences) with attention mechanisms, enabling better memory efficiency than standard transformers.
A neural network design that combines recurrent elements with other architectural components to process sequential data more efficiently than standard transformers.
How many levels deep a rule can nest within itself before performance degrades.
Iteratively applying the same computation multiple times with parameter sharing to increase model depth without adding parameters.
A hierarchical analytical framework that characterizes algorithmic thresholds by recursively decomposing solution spaces.
Errors that accumulate when a model must repeatedly apply the same reasoning step across multiple sequential decisions.
Adversarial testing where security experts attempt to find vulnerabilities by attacking a system like an attacker would.
Adversarial testing where a team attempts to find vulnerabilities by simulating attacks or malicious behavior.
A simplified version of a complex system that captures essential behavior with fewer variables.
An attention mechanism that conditions generation on a reference input by processing reference tokens alongside generated tokens.
The process of identifying what a reference (like a variable name) points to in a program.
Confirming that citations in a paper are accurate, exist, and actually support the claims made about them.
A process where AI systems review past results, identify errors, and extract generalizable patterns to improve future performance.
The process of an agent analyzing its past actions and environment feedback to extract lessons for improving future behavior.
A transformer-based model design that uses locality-sensitive hashing and reversible layers to efficiently process long sequences with reduced memory requirements.
A safety mechanism built into a model that causes it to decline responding to certain types of requests, typically those deemed harmful or inappropriate.
The ability to identify when a model declines to answer a request, which can indicate the model recognized a harmful or unsafe prompt.
The learned behavior that causes a language model to decline harmful requests.
Built-in safety features that cause a model to decline responding to certain types of requests, such as those involving harmful, illegal, or unethical content.
Identifying distinct market states or conditions (e.g., stable vs. volatile) to apply different prediction strategies appropriately.
Using specific abnormal regions from historical cases as evidence to support diagnosis of new cases.
The ability to analyze and understand specific areas or sections of an image rather than just the image as a whole.
The performance difference between a model's ability to understand cropped regions versus full images.
Learnable placeholder tokens added to transformer inputs to absorb and stabilize problematic activations without affecting semantic content.
When a fix or change breaks functionality that was previously working, causing previously-passing tests to fail.
Identifying when code changes break previously working functionality.
The cumulative difference between the reward an algorithm actually earned and the reward the best single fixed action would have earned in hindsight.
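The calculation can be sketched for a small, made-up example. The reward table and the sequence of chosen actions below are hypothetical.

```python
def regret(rewards, chosen):
    """rewards[t][a] is what action a would have earned in round t;
    chosen[t] is the action the algorithm actually played in round t."""
    earned = sum(rewards[t][a] for t, a in enumerate(chosen))
    n_actions = len(rewards[0])
    best_fixed = max(sum(row[a] for row in rewards) for a in range(n_actions))
    return best_fixed - earned

rewards = [[1, 0], [0, 1], [1, 0], [1, 0]]   # 4 rounds, 2 actions
chosen = [1, 0, 0, 1]                        # the algorithm's picks

print(regret(rewards, chosen))
```

Always playing action 0 would have earned 3, while the picks above earned 1, so the regret is 2.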
A generalized linear model with penalty terms added to prevent overfitting and improve prediction on new data.
A training method where a model learns by receiving rewards or penalties for its outputs, encouraging it to improve its behavior over time.
A training technique where human evaluators rate model outputs, and the model learns to produce responses that humans prefer.
Training a model using reward signals derived from the model's own internal representations rather than external labels.
A post-training approach for language models using rewards that can be objectively verified, like correctness on benchmarks.
A task where a model identifies and extracts meaningful connections between entities in text, such as which drugs treat which diseases.
A tuning parameter in ADMM that controls how aggressively the algorithm updates variables, affecting convergence speed.
The process of ordering search results by how well they match a user's query, with the most relevant results appearing first.
Assigning a numerical score to indicate how well a document matches or answers a given query.
A neural network using rectified linear unit activations, which can be exactly embedded in mixed-integer linear programs.
Rewriting a model's weights in a different mathematical form to improve training efficiency or stability.
Storing and retraining on samples from previous tasks to prevent forgetting during continual learning.
Systematic skew in data caused by what people choose to record or report.
The ability to understand and reason about code across multiple files and folders in a codebase, not just isolated code snippets.
Training a model to convert raw data into meaningful internal representations useful for downstream tasks.
A model trained to convert raw input (like music or text) into meaningful numerical patterns that capture important features, rather than generating direct outputs like text or classifications.
The high-dimensional mathematical space where a model internally encodes and processes information about text.
A property where similar inputs map to similar representations, promoting stable and coherent internal states.
The tendency of different neural networks to learn similar internal representations despite differences in architecture or training.
The geometric structure of how neural networks organize and represent information in their learned feature spaces.
Systematic misrepresentation or stereotyping of groups in generated content that reinforces harmful social biases.
The internal geometric structure of how a model encodes and processes information.
How consistently a model produces similar embeddings across different training runs with different random seeds.
The ability to recreate the same results by using the same training data, methods, and documentation.
A mathematical space where kernel methods operate, allowing complex pattern matching through implicit feature transformations.
The process of analyzing an incoming query to determine its type, complexity, or intent so it can be handled by the right model or pipeline.
The process of gathering and defining what a system needs to do, typically involving stakeholders and domain experts.
The process of tracking and organizing what a software product needs to do, which AI can help automate.
The process of defining, documenting, and managing software system requirements from stakeholders.
The ability to track how design decisions and parameters connect back to original system requirements and design intent.
A model that takes an initial set of search results and reorders them by relevance, typically used to refine results from a faster but less accurate retrieval system.
A technique that takes an initial set of search results and reorders them by relevance score, typically to improve the quality of top results.
The internal neural signals in a model after subtracting baseline activity, revealing task-specific processing.
A neural network architecture that uses skip connections to allow information to bypass layers, making it easier to train very deep networks and improving performance.
A learned correction layer that outputs small adjustments on top of a baseline controller.
The main information pathway flowing through transformer layers, carrying accumulated representations from previous computations.
Learning the differences between consecutive states rather than full states, reducing compression complexity for time-evolving data.
The differences between consecutive data snapshots, which are often smaller and easier to compress than full snapshots.
Hardware with limited memory, processing power, or battery life, requiring models to be optimized for efficiency.
The diversity of outputs a model produces; high entropy means varied solutions, low entropy means repetitive ones.
A task-level mechanism that dynamically adjusts output length based on query complexity to balance reasoning depth with directness.
Techniques to make model outputs more consistent and reliable, such as constraining output format or adding classification heads.
The tendency of LLMs to generate responses following predictable structural patterns rather than varied approaches.
Checking whether a published paper has been withdrawn or retracted from the scientific record due to errors or misconduct.
The process of finding and returning relevant documents or information from a database based on a query.
Training technique that supplements data by finding and using similar examples from a database to improve model generalization.
When a system preferentially retrieves sources in certain languages or regions, limiting access to diverse information.
A model designed to find and rank the most relevant documents or passages from a large collection based on a query.
Tuning a model specifically to find and rank relevant documents or passages in response to a query, rather than generating new text.
A direct mechanism to access and retrieve stored information (like visual embeddings) independent of sequence position.
A system that finds and ranks relevant documents or information in response to a query, often used in search and question-answering applications.
A system that finds and returns the most relevant documents or information from a large collection based on a user's query.
Finding the most relevant documents or text passages from a large collection based on a user's query.
A technique that enhances AI systems by first searching for relevant information from a database before generating responses, improving accuracy and relevance.
A technique that allows a model to search and reference external documents or knowledge bases to answer questions more accurately and with citations.
A model specifically trained to find and rank relevant documents or passages in response to search queries, rather than generate new text.
A task where the model needs to search through and extract relevant information from large amounts of text, rather than generating new content from scratch.
Explaining predictions by showing which historical cases the model referenced when making a decision.
Training a larger model from a smaller one to test whether capability differences are real.
A measure of how different one distribution is from another, penalizing missing modes.
A technique that reverses the gradient updates made during training to remove learned information about specific data.
A function that assigns numerical scores to model outputs, guiding the learning process toward desired behaviors in reinforcement learning.
When an agent exploits loopholes in the reward system to maximize score without actually solving the intended task.
A candidate reward function generated by an LLM whose utility for training depends on policy competence and training phase.
A learned function that predicts how good an action or outcome is, used to guide policy improvement.
Training a model to predict human preferences so it can score outputs and guide AI training through reinforcement learning.
Improving model outputs by defining a reward function that scores quality and using it to guide learning toward better solutions.
Feedback that tells an AI agent how well it performed on a task, guiding learning.
The structure and distribution of reward signals across different tasks, which can vary significantly in multimodal learning.
A measure of how reward quality and model confidence vary together, used to adjust training baselines.
A benchmark task where an agent can achieve high scores without actually solving the intended problem.
Mathematical framework for studying curved spaces and their intrinsic properties, used here to analyze neural representation structure.
Investment returns measured relative to the risk taken, balancing profit with stability.
A preference for certain outcomes over uncertain ones with the same expected value, often modeled using exponential utility.
Reinforcement Learning from Human Feedback — a training technique that aligns model outputs with human preferences.
A layer normalization technique that normalizes activations using root-mean-square statistics.
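A minimal sketch of root-mean-square normalization on a single vector, written in plain Python for readability. The `gain` and `eps` parameters follow common practice but are assumptions here, not taken from a specific implementation.

```python
import math

def rms_norm(x, gain=None, eps=1e-6):
    """Divide each value by the vector's root-mean-square, then apply a per-element gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    gain = gain or [1.0] * len(x)
    return [g * v / rms for g, v in zip(gain, x)]

y = rms_norm([3.0, 4.0])
print(y)
```

Unlike standard layer normalization, this version does not subtract the mean first; it only rescales so that the output has a root-mean-square close to 1.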
A neural network that processes sequences and outputs predictions in real time as the input streams in.
A transformer-based neural network architecture optimized for understanding language through masked language prediction during training.
A transformer-based neural network design that learns to understand language by predicting masked words in text, improving on the original BERT model.
The ability to understand and execute physical tasks involving grasping, moving, and interacting with objects in the real world.
The process of determining sequences of actions and movements that a robot should execute to accomplish a physical task.
Combining updates from multiple sources in a way that resists manipulation by malicious participants.
A model's ability to maintain accurate predictions on new data while resisting adversarial perturbations.
A system's ability to maintain performance when inputs are corrupted, noisy, or different from training conditions.
Testing how well an agent maintains performance when faced with errors, variations, or unexpected conditions.
A security system that restricts what different users can do based on their assigned role (e.g., admin, viewer, editor).
Multi-agent architectures where different components (proposer, executor, checker, adversary) have distinct responsibilities to reduce correlated failures.
Reverting a system to a previous saved state, undoing recent changes.
Combining supervision from multiple generated sequences (rollouts) to create more stable training signals.
Robot Operating System 2, a middleware framework for building robot software with standardized communication patterns.
A positional encoding method that encodes position information as rotations in the embedding space.
The mechanism that decides which specialized sub-networks (experts) should process each input in a mixture-of-experts model.
The decision-making component in a mixture-of-experts model that determines which experts should process each input token.
A lightweight model that analyzes incoming requests and directs them to the most appropriate downstream model or system rather than processing them directly.
The computational cost added by the mechanism that decides which experts should process each input in a mixture-of-experts model.
A lightweight decision mechanism that determines which computation path to take based on input conditions.
A scoring guide that defines criteria and quality levels for evaluating student work or AI-generated responses.
Automatically creating evaluation criteria and scoring guidelines that judges use to assess output quality.
An explicit agreement between components defining inputs, outputs, and behavior expectations during execution.
The ability for different systems to work together and exchange data dynamically during execution.
Unpredictable differences in how long computation or communication takes due to system conditions, network congestion, or hardware differences.
Intentional introduction of subtle flaws in code that produce misleading results while appearing correct.
A safe, fast file format for storing model weights, designed to prevent code execution vulnerabilities.
A secure and efficient file format for storing model weights that prioritizes safety and speed when loading models.
Training techniques used to make a model refuse harmful requests and behave responsibly, reducing the risk of misuse.
A machine learning task that assigns content to categories based on whether it poses safety risks or harms.
A machine learning model trained to identify and flag harmful, inappropriate, or policy-violating content in text.
Rules or limits that ensure a learning system operates within acceptable bounds and avoids harmful actions.
The process of testing and assessing whether a model produces harmful, unsafe, or undesirable outputs.
Built-in guardrails in a model that prevent it from generating harmful, illegal, or unethical content by refusing certain requests.
Built-in constraints that prevent a model from generating harmful, offensive, or inappropriate content in its responses.
Built-in restrictions or filters that prevent a model from generating harmful, illegal, or unethical content.
A specialized AI model trained to identify and classify unsafe, harmful, or policy-violating content rather than generate general responses.
The process of training a model to decline harmful requests and avoid generating unsafe content by using specially curated training data and techniques.
A training process that teaches a model to refuse harmful requests and avoid generating unsafe content by reinforcing safer behaviors.
A model trained to avoid harmful outputs and refuse unsafe requests, making it more cautious and responsible in its responses.
AI systems deployed in high-risk domains like aviation where failures can cause serious harm or loss of life.
How noticeable or important something is to a model's or a person's attention.
Measuring feature changes while prioritizing visually important regions, ensuring quality preservation in salient areas.
The task of automatically identifying and locating the most visually prominent or important objects in an image.
The number of environment interactions (samples) an algorithm needs to learn a good policy.
How well a model learns from a small amount of training data.
The number of times per second that an audio signal is measured and recorded; 44kHz means roughly 44,000 samples per second (CD audio uses 44.1 kHz), a common rate for high-quality audio.
A technique that directs different training examples to different optimization methods based on their characteristics or correctness.
Choosing which training examples to use based on criteria like loss, confidence, or other quality metrics.
The process of discovering and visiting different regions of possible outputs during model training.
Control systems where inputs are updated at discrete time intervals rather than continuously.
The sequence of states visited by a model when generating a single sample, showing the path taken through the sample space.
An isolated execution environment that restricts what a program can access on the host system.
Running agent actions in an isolated environment to prevent them from accessing or damaging other systems.
A specialized architecture that extends BERT to efficiently generate sentence-level embeddings optimized for semantic similarity and clustering tasks.
A specialized neural network design that transforms sentences into meaningful vector representations by using a transformer model paired with pooling techniques to capture semantic meaning.
Quantizing each weight independently on the same quantization grid, which is simpler than vector quantization.
Converting a multi-objective problem into a single-objective problem by combining objectives with weighted sums.
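A minimal sketch of the weighted-sum approach in the entry above. The candidate names, objective values, and weights are made up for illustration, and both objectives are assumed to be "higher is better".

```python
def scalarize(objectives, weights):
    """Collapse several objective values into one score via a weighted sum."""
    return sum(w * f for w, f in zip(weights, objectives))

# Hypothetical candidates scored on (accuracy, speed).
candidates = {
    "a": (0.9, 0.2),
    "b": (0.6, 0.7),
    "c": (0.3, 0.9),
}
weights = (0.5, 0.5)  # assumed equal importance

best = max(candidates, key=lambda name: scalarize(candidates[name], weights))
```

Changing the weights changes which candidate wins, which is why the choice of weights is the main design decision in this approach.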
A mathematical framework that analyzes images at multiple resolutions to reveal hierarchical information.
How a model's performance and capabilities change as you increase its size, training data, or computational resources.
A mathematical relationship describing how model performance changes with scale (size, data, compute).
Patterns that describe how a model's performance improves as you increase its size, training data, or compute resources.
The study of how model performance changes as you increase the number of parameters, training data, or compute resources.
A collection of models of different sizes trained identically to study how capabilities improve as models grow larger.
The sequence of fixation points and saccades that represents where and how a person's eyes move while viewing an image.
A collection of test scenarios used to evaluate model safety, specific to a language, sector, or regulatory regime.
A safety evaluation method using predefined test scenarios and a rubric, judged by human or automated evaluators.
A structured representation of a scene using nodes for objects and edges for spatial relationships between them.
A task completion metric that requires not just finishing an action but leaving the environment usable for future tasks.
Assigning tasks and resources to specific times and locations to optimize execution efficiency.
Information about a database's structure (tables, columns, relationships) provided to the model to help it generate correct queries.
Incompatibility between data formats when different services exchange information.
Changes to the structure or format of data that can cause AI models to fail or perform poorly.
A predefined template or structure that defines what information the model should extract and how it should be formatted in the response.
A correction term added during the reverse process to guide noise removal toward realistic data.
A model designed to assign numerical scores to inputs (like relevance scores for passages) rather than generate new text.
An attention mechanism that evaluates each key against an explicit threshold to determine relevance, rather than redistributing fixed attention mass across all keys.
A model's ability to recognize and process different writing systems (like Devanagari or Tamil scripts) rather than just Latin characters.
The special Euclidean group symmetry combining 3D rotations and translations, common in molecular structures.
A language model enhanced with the ability to retrieve and incorporate live information from the web before generating responses.
Automated identification of abnormal brain activity patterns that indicate a seizure event.
A technique where only a subset of a model's weights is used for each input, rather than activating all parameters, which reduces computation per input and speeds up inference.
An advanced SSM variant that dynamically selects which information to process at each step, improving performance on complex tasks while maintaining efficiency.
An enhancement to state space models that allows the model to selectively focus on relevant information in a sequence, improving efficiency for long-context tasks.
A model's ability to evaluate and report on its own behavior, capabilities, or alignment with intended values.
A mechanism that lets a model focus on different parts of input data to understand relationships between them.
A generative model that uses its own previous outputs to guide learning of different behavioral patterns.
A technique where a model generates multiple responses and uses agreement among them to improve answer reliability.
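The agreement step described above can be sketched as a simple majority vote. The sampled answers here are hard-coded placeholders; in practice they would come from several independent generations of the same model.

```python
from collections import Counter

def self_consistent_answer(answers):
    """Return the most common answer and the fraction of samples that agree with it."""
    counts = Counter(a.strip().lower() for a in answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)

# Hypothetical final answers extracted from five sampled responses.
samples = ["42", "42 ", "41", "42", "40"]
answer, agreement = self_consistent_answer(samples)
```

The agreement fraction can double as a rough confidence signal: low agreement suggests the question deserves a closer look.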
Reasoning behavior allowing video models to recover from incorrect intermediate solutions during the denoising process.
An AI system's ability to evaluate and correct its own outputs without external feedback.
A training method where a model learns from its own predictions at the token level, providing fine-grained feedback.
Error-correcting codes where the code equals its dual, used in data transmission and storage.
The ability of an AI system to improve its own capabilities over time through experience.
Running a model on your own servers or computers instead of using a cloud service, giving you full control and privacy.
A model that can be downloaded and run on your own hardware or servers instead of relying on a company's cloud service.
Running a model on your own hardware and infrastructure instead of relying on a company's servers or API.
Running a model on your own hardware or servers rather than accessing it through a cloud service or API.
Running a model on your own hardware or servers instead of relying on a company's cloud service.
A signal processing technique that removes unwanted reflections of your own transmitted signal to isolate target signals.
A training method in which a model plays against itself or generates both solutions and evaluations, which carries the risk that the model learns to exploit its own weaknesses.
A model's tendency to resist shutdown or replacement, prioritizing its own continued operation over objective utility.
The process where a system autonomously evaluates and improves its own outputs without external human feedback.
An agent's ability to explain and reason about why its actions are good or bad.
The ability of a model to autonomously diagnose and correct misalignments in its own generated outputs.
A training approach where a model learns patterns from unlabeled data by creating its own learning targets, such as predicting hidden parts of the input.
Training a model on unlabeled data using the data itself to create learning signals, without manual annotations.
When a model explicitly states its confidence level in natural language rather than through probability scores.
A standardized text-based format for representing molecular structures that is designed to be more robust and easier for AI models to process than other chemical notations.
The degree to which a model accurately matches the meaning of a query with the meaning of relevant passages or documents.
The set of distinct meanings or concepts an agent can represent and communicate, derived from its computational constraints.
Adding meaningful labels and metadata to data (like object type, function, or properties) to make it more useful for learning.
Evaluating whether a model's answer is correct based on meaning rather than exact word-for-word matching.
Creating diverse variations of inputs that preserve meaning while changing surface-level patterns.
The range or diversity of meanings a word can have across different contexts.
A technique that stores and reuses previous responses for new queries that have similar meaning, reducing redundant computation.
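A toy sketch of the caching idea in the entry above. The `embed` function here is a deliberately crude bag-of-words stand-in for a real embedding model, and the class name `SemanticCache` and the 0.8 threshold are assumptions made for the example.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for an embedding model: word counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response)

    def put(self, query, response):
        self.entries.append((embed(query), response))

    def get(self, query):
        """Return a stored response if some cached query is similar enough, else None."""
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None

cache = SemanticCache()
cache.put("what is the capital of france", "Paris")
hit = cache.get("what is the capital city of france")   # similar wording
miss = cache.get("how tall is mount everest")           # unrelated query
```

The threshold trades off savings against the risk of returning an answer to a question that only looks similar.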
The degree to which different parts of text or data are logically consistent and meaningfully related.
Focusing training data on high-quality, semantically rich examples rather than maximizing data volume.
The ability to precisely control what a model generates based on specific semantic requirements in the input.
Whether a formal expression correctly captures the intended meaning, not just whether it follows grammatical rules.
Meaningful textual or visual signals that convey information about context or intent.
Breaking down complex text into smaller, structured units that capture distinct meanings or concepts.
The orientation of a word's meaning in vector space, independent of its magnitude.
A measure of how conceptually different or unrelated two ideas, domains, or concepts are from each other.
A training method that transfers high-level meaning and concepts from one model to another while preserving semantic correctness.
A technique that converts text into numerical vectors that capture the meaning of words and phrases, allowing computers to understand which texts are similar in meaning.
Numerical representations that capture the meaning of text or audio, allowing the model to understand that similar concepts are close together in this representation space.
The process of converting the meaning of text into numerical vectors that preserve relationships between similar concepts.
Two implementations produce identical behavior and results despite differences in code or architecture.
How accurately a model's output preserves the core meaning and medical content of reference physician responses.
The biological or social gender meaning of a word, independent of grammatical requirements.
A training method that uses semantic tasks like image segmentation to align visual understanding and generation in multimodal models.
Anchoring generated content to meaningful concepts from language, ensuring parts align with their textual descriptions.
Meaningful content or context extracted from an image, such as objects, scenes, or relationships between elements.
The property that an AI system produces consistent outputs when given semantically equivalent inputs phrased differently.
Assigning meaningful category labels to data (like 'construction phase' or 'operational') rather than just detecting presence.
The component that interprets natural language or high-level intent into structured, machine-readable representations.
The process of finding text that has similar meaning, rather than just matching keywords, by comparing their vector representations.
The actual meaning or concept behind words and sentences, rather than just their literal characters or structure.
Predicting which 3D spatial locations are occupied and what semantic class (car, pedestrian, etc.) occupies them.
The degree to which different representations capture similar high-level meaning or concepts.
Converting natural language into a structured logical form a computer can understand.
The meaningful connections between concepts or texts based on their actual meaning, rather than just matching keywords.
A numerical encoding that captures the meaning and context of text rather than just its surface-level words, enabling the model to understand that similar concepts have similar representations.
How well selected items cover the full range of visual concepts and meanings in a video.
Finding relevant documents based on meaning rather than exact keyword matches, using embeddings to understand what text is about.
A search method that finds results based on the meaning of text rather than just matching keywords, using embeddings to understand intent.
Dividing video or images into meaningful regions and assigning labels to understand what each region represents.
A measure of how closely related two pieces of text are in meaning, regardless of whether they use identical words.
Finding similar items by comparing their learned meaning representations rather than exact text or keyword matches.
A mathematical space where similar meanings are positioned close together, allowing the model to understand relationships between concepts.
An AI task focused on understanding the meaning of text, such as finding similar documents or matching related concepts.
A task that measures how closely two pieces of text match in meaning, regardless of whether they use the same words.
Grouping tokens with similar meanings together to assess whether a model's prediction is semantically coherent.
Assigning meaningful categories or relationship types to entities in a graph to capture their semantic meaning.
The ability to grasp the actual meaning and context of text, rather than just matching keywords.
A numerical representation of text where similar meanings are positioned close together in mathematical space, enabling similarity comparisons.
A numerical encoding of text where similar meanings are positioned close together in mathematical space, enabling the model to understand relationships between concepts.
Numerical representations where the distance and direction between vectors reflect the meaning and similarity between pieces of text.
Checking that code produces outputs matching geographic and domain-specific rules, not just syntactic correctness.
A technique that embeds hidden, imperceptible markers into text embeddings to track ownership or detect unauthorized use.
Code modifications that don't alter program behavior, like renaming variables or reformatting.
Training using both labeled and unlabeled data to improve learning efficiency.
Datasets combining real-world features with simulated outcomes to enable controlled testing with realistic inputs.
A photonic component that amplifies optical signals using stimulated emission in a semiconductor material.
Combining data from multiple sensors (radar, lidar, camera) to create a more accurate perception of the environment.
A technique that converts entire sentences or passages into fixed-size numerical vectors that capture their semantic meaning, enabling comparison of text similarity.
Dense numerical representations of entire sentences that capture their semantic meaning, allowing comparison of how similar different sentences are.
A model that converts text sentences into numerical vectors (embeddings) that capture their semantic meaning, enabling comparison of how similar different sentences are.
A type of model architecture designed to convert entire sentences or passages into meaningful embeddings that can be compared for similarity.
A framework that fine-tunes transformer models to produce meaningful embeddings of entire sentences or paragraphs, rather than just individual tokens.
A neural network design optimized for converting sentences and short texts into meaningful vector embeddings that preserve semantic relationships.
A machine learning task designed to work with individual sentences rather than longer passages, focusing on understanding meaning within a single sentence's scope.
Automatically detecting and measuring positive, negative, or neutral emotions expressed in text.
A neural network design that explicitly decomposes complex mappings into lower-arity, factorizable components to exploit underlying structure.
A task where a model reads input text and assigns it to a category or produces a score, rather than generating new text.
A technique that reduces the length of input data while preserving its essential meaning, making processing faster and requiring less memory.
The task of producing new sequences (in this case, protein sequences) by predicting one token at a time based on previously generated tokens.
The task of learning patterns in ordered data (like text) where each element depends on previous elements.
A learned encoding that captures the structural and functional information contained within a protein sequence in a format useful for analysis.
A model architecture that takes a sequence of input tokens and produces a sequence of output tokens, commonly used for tasks like translation and summarization.
A method for tracking probability distributions over time by resampling weighted particles.
The ability to solve problems by working through steps in a strict left-to-right order, where each step depends on the previous one.
A recommendation task that predicts the next item a user will interact with based on their historical sequence of interactions.
Making decisions about data flow based on a sequence of past interactions rather than single isolated inputs.
A non-human identity used by automated systems, applications, or AI agents to authenticate and perform actions without human intervention.
A combinatorial optimization problem of selecting the smallest collection of subsets whose union covers every element in a universe.
A duplicate model trained identically to the target model, used as a reference in membership inference attacks.
A quantum circuit with constant or polylogarithmic depth, enabling efficient computation on near-term quantum devices.
A mathematical measure of randomness in text; high entropy can indicate randomly generated domain names.
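For the entropy measure in the entry above, a character-level version can be sketched as follows. The example strings are invented, and real detectors would combine entropy with other signals rather than rely on it alone.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Entropy in bits per character, based on character frequencies in `text`."""
    total = len(text)
    counts = Counter(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

ordinary = shannon_entropy("google")      # repeated letters, lower entropy
random_like = shannon_entropy("xk2q9vz7") # all characters distinct, higher entropy
```

A string of n distinct characters reaches the maximum of log2(n) bits per character, while a string made of one repeated character scores zero.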
A method that explains individual model predictions by calculating each feature's contribution using game theory concepts.
A game-theory-based method for explaining AI predictions by measuring each input's contribution to the model's decision.
A system where both a human operator and autonomous system contribute to controlling a robot, dividing tasks based on capability.
A common mathematical space where different types of data (text and audio) are represented so that related concepts from each type are positioned near each other.
The speed at which data can be read from and written to a GPU's fast, limited-size shared memory.
Common learned features used across multiple tasks in a neural network.
A system design where independent modules communicate through a central shared context, enabling cross-module reasoning and synchronized actions.
A single embedding space where text from multiple languages is represented, allowing direct mathematical comparison of meaning between languages.
A novel measure of loss landscape geometry based on the Hessian's fractal dimension, predicting generalization better than trace or spectral norm.
A graph showing how different frequencies in a system respond to sudden acceleration or impact.
When a model learns superficial correlations instead of the underlying concepts, causing poor generalization.
The total number of times a quantum circuit can be executed to gather measurement statistics on quantum hardware.
A neural network architecture with two identical branches that learn shared representations for comparison.
A training method that aligns images and text by learning to match their representations, using a sigmoid loss function instead of the traditional softmax approach.
Carefully chosen sample points used to represent the probability distribution of a system's state in filtering algorithms.
The gradual loss of useful information as it passes through many layers of a neural network.
A formal language for specifying time-dependent constraints like "reach goal within 10 seconds" or "avoid obstacles until task completion."
A measure of audio quality comparing the strength of desired speech to background noise.
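The ratio described above is usually reported in decibels. The sketch below assumes the clean signal and the noise are available as separate lists of samples, which is a simplification; the sample values are made up.

```python
import math

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels from average power of each sample list."""
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10 * math.log10(p_signal / p_noise)

# Hypothetical samples: noise amplitude is one tenth of the signal amplitude.
speech = [1.0, -1.0, 1.0, -1.0]
background = [0.1, -0.1, 0.1, -0.1]
ratio = snr_db(speech, background)
```

Because power scales with the square of amplitude, a tenfold amplitude difference corresponds to about 20 dB.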
A metric measuring how much useful information is preserved versus how much error is introduced during quantization.
Adapting a model trained on simulation data to work with real-world experimental data with minimal additional training.
Performance difference when a trained policy transfers between two different environment implementations.
A contrastive learning technique that trains models to recognize when two slightly different versions of the same sentence are similar, improving semantic understanding.
A function that measures how similar candidate actions are based on their learned representations, used to weight policy updates.
A task where you find the most similar items to a query by comparing their vector representations, commonly used in recommendation systems and information retrieval.
A cutoff score that determines whether two pieces of text are considered similar enough to be treated as equivalent.
A model that processes only one type of input (like text) rather than multiple types (like text and images combined).
A model architecture that generates a response in one forward pass through the network, typically faster but potentially less thorough than multi-step approaches.
A matrix factorization technique that decomposes a matrix into two orthogonal matrices and a diagonal matrix of singular values, useful for finding optimal low-rank approximations.
The diagonal values in singular value decomposition that characterize the scaling properties of a matrix.
An iterative method for solving optimal transport problems with entropy regularization to find balanced assignments.
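A small pure-Python sketch of the iteration described above: alternately rescale rows and columns of a kernel matrix until the transport plan matches the required marginals. The cost matrix, the marginals, and the `epsilon` value are assumptions picked for illustration.

```python
import math

def sinkhorn(cost, a, b, epsilon=0.5, iterations=200):
    """Entropy-regularized transport plan with row sums `a` and column sums `b`."""
    n, m = len(a), len(b)
    # Gibbs kernel: cheap moves get large entries.
    K = [[math.exp(-cost[i][j] / epsilon) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iterations):
        # Rescale rows to match `a`, then columns to match `b`.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

cost = [[0.0, 1.0], [1.0, 0.0]]   # moving mass "across" costs 1
supply = [0.7, 0.3]
demand = [0.4, 0.6]
plan = sinkhorn(cost, supply, demand)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(2)) for j in range(2)]
```

Smaller `epsilon` values give plans closer to the unregularized optimum but typically need more iterations and can underflow numerically.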
A neural network architecture using sinusoidal activation functions to learn continuous signal representations.
Compressing model information into a compact representation that enables efficient predictions about model behavior.
Mathematical structures that represent preferences as intransitive comparisons across multiple independent dimensions.
A measure of asymmetry in a data distribution, indicating whether values cluster more toward one end.
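One common way to compute the asymmetry measure above is the moment-based coefficient (third central moment divided by the 1.5 power of the variance). The sketch assumes the data are not all identical, since the variance would then be zero; the data sets are invented.

```python
def skewness(values):
    """Moment-based skewness: positive for a long right tail, negative for a long left tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = skewness([1, 2, 3, 4, 5])
right_tailed = skewness([1, 1, 1, 2, 10])
```

Statistical packages often apply a small-sample correction to this formula, so their results may differ slightly.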
A reusable memory of learned behaviors organized by granularity level for agent decision-making.
Process of training a model to permanently learn procedural knowledge so it can perform tasks without retrieving external skill resources at inference time.
A technique that builds a map of an environment while tracking the camera's position within it.
A nonlinear control technique that forces a system to follow a desired path by switching feedback signals.
A mechanism that limits attention to a fixed-size window of recent tokens rather than all previous tokens, reducing computational cost while maintaining context awareness.
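The windowing rule above can be pictured as a boolean mask over token pairs. In this sketch, the function name and the convention that a token may attend to itself plus the preceding `window - 1` tokens are assumptions; implementations differ on exactly how the window is counted.

```python
def sliding_window_mask(seq_len, window):
    """mask[i][j] is True when token i may attend to token j."""
    return [
        [(i - window < j <= i) for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = sliding_window_mask(seq_len=5, window=2)
allowed_pairs = sum(sum(row) for row in mask)
```

With a full causal mask the number of allowed pairs grows quadratically with sequence length; with a fixed window it grows only linearly.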
Compact AI language models designed for speed and efficiency over raw power.
A text-based format that represents the structure of chemical molecules using letters and symbols, allowing molecules to be encoded as strings for computational processing.
Phishing attacks delivered via SMS text messages, typically containing malicious links.
A measure of how quickly a loss function's gradient can change; smaller is better for stable training.
A medical documentation format with Subjective, Objective, Assessment, and Plan sections summarizing patient visits.
Game theory scenarios where individual incentives conflict with collective welfare, like the prisoner's dilemma.
The total utility or benefit summed across all players in a game.
Teaching through guided questioning that helps learners discover answers themselves rather than being told.
A rechargeable battery using sodium ions instead of lithium, offering lower cost and improved sustainability.
A reinforcement learning algorithm that trains agents to maximize both reward and action randomness for stable learning.
A reinforcement learning algorithm that trains agents to maximize both reward and action randomness for stability.
A soft version of the standard IoU metric that uses continuous intensity values instead of binary masks.
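Several soft IoU formulations exist; the sketch below uses one common variant in which products replace set intersection. The `eps` smoothing term and the example values are assumptions.

```python
def soft_iou(pred, target, eps=1e-9):
    """Soft intersection-over-union for values in [0, 1]."""
    intersection = sum(p * t for p, t in zip(pred, target))
    union = sum(p + t - p * t for p, t in zip(pred, target))
    return (intersection + eps) / (union + eps)

perfect = soft_iou([1.0, 0.0, 1.0, 0.0], [1.0, 0.0, 1.0, 0.0])
uncertain = soft_iou([0.5, 0.5], [1.0, 0.0])
```

When both inputs are binary, this variant reduces to the standard IoU, and because it is differentiable it can also serve as a training loss.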
A mathematical function that converts attention scores into probabilities that sum to one.
Standard attention mechanism that normalizes scores across all keys into a probability distribution, forcing relative rather than absolute relevance judgments.
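The normalization described in the two entries above can be sketched in a few lines. Subtracting the maximum score before exponentiating is a standard numerical-stability trick; the example scores are arbitrary.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to one."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # shift by max to avoid overflow
    total = sum(exps)
    return [e / total for e in exps]

weights = softmax([2.0, 1.0, 0.1])
shifted = softmax([102.0, 101.0, 100.1])  # same gaps between scores
```

Adding the same constant to every score leaves the output unchanged, which is why softmax expresses relative rather than absolute relevance.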
Generating multiple candidate solutions to find promising options before selecting the best one.
The model's ability to identify and cite the specific documents or sources it used to generate a response, enabling users to verify claims.
A model's capability to identify and reference the specific documents or sources it used to generate its answer.
The practice of anchoring a model's responses to specific, cited sources rather than relying solely on its training data, improving factual accuracy and verifiability.
A detailed documentation of where evidence came from and how it supports an answer, enabling verification and auditing.
Modifying the actual source code of a system rather than just configuration files or prompts.
Identifying the specific portion or segment of text/video containing a particular claim or concept.
A technique where only a subset of a model's parameters are used for each input, reducing computational cost while maintaining performance.
A model design where not all parameters are used for every computation, reducing memory and computational requirements compared to dense models.
An attention mechanism that only computes interactions between a subset of tokens instead of all pairs, reducing complexity from O(L²) to O(Lk).
A neural network that compresses data into a small number of active features, making patterns easier to interpret.
A tool that finds hidden features in neural networks by learning compressed representations with most values being zero.
Vector representations where most values are zero, allowing efficient storage and computation by only tracking non-zero elements.
Rare or infrequent occurrences in data that are overwhelmed by more common background information.
An architecture where only a subset of the model's specialized sub-networks (experts) activate for each input, reducing computation while maintaining capability.
A model that activates only a subset of its parameters for each input, rather than using all parameters every time, which reduces computational cost.
A mixture-of-experts design where only a small fraction of the model's parameters are used for each prediction, reducing computational cost while maintaining model capacity.
A technique where only a small portion of a model's total parameters are used during inference, reducing computational cost while maintaining model capacity.
A search method that represents text as a high-dimensional vector with mostly zeros, focusing on keyword matching and exact term overlap.
A reinforcement learning setting where the agent receives reward signals only rarely, making exploration particularly challenging.
A reinforcement learning setting where the agent receives feedback infrequently, making learning difficult.
High-dimensional vectors where most values are zero, with only a few active dimensions that correspond to meaningful features, making them memory-efficient and interpretable.
High-dimensional vectors where most values are zero, making them memory-efficient and interpretable compared to dense vectors where most values are non-zero.
Reducing model size by removing or zeroing out less important parameters or weights.
The proportion of zero or removed weights in a neural network, reducing memory and computation.
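To make the two entries above concrete, the sketch below zeroes out the smallest-magnitude weights and then measures the resulting proportion of zeros. Magnitude-based pruning is just one assumed criterion for "less important", and the weight values are made up.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest absolute value."""
    k = int(len(weights) * sparsity)
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned

def sparsity_of(weights):
    """Proportion of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0) / len(weights)

dense = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2, -0.02, 0.6]
pruned = magnitude_prune(dense, sparsity=0.5)
ratio = sparsity_of(pruned)
```

Memory and speed gains in practice depend on whether the storage format and hardware can actually skip the zeros.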
A technique that uses spatial information to guide which parts of a video frame correspond to which agent or subject.
Connecting language descriptions to specific locations or regions in visual scenes.
When an AI incorrectly imagines objects or details in wrong locations in images.
Variation in characteristics or patterns across different geographic locations, requiring location-specific models.
The ability to understand and reason about the positions, shapes, and relationships of objects in space.
A formal knowledge structure that defines spatial relationships, constraints, and rules for how objects can be arranged.
The model's ability to accurately identify and mark exact pixel-level boundaries and locations of objects in images.
A geographic relationship test (e.g., 'contains', 'intersects') that validates whether spatial objects satisfy required topological conditions.
The ability to understand and reason about the location, size, and relationships between objects in an image.
A learned encoding that captures the layout, objects, and visual features within individual frames or regions of a video.
A model's ability to apply learned knowledge to new physical layouts or configurations.
The ability to perceive and reason about the positions, distances, and relationships between objects in 3D space.
A criterion to halt iterative refinement when spatial entropy drops below a threshold, preventing over-refinement.
Processing that considers both spatial location and temporal changes over time.
Attention mechanism that processes both spatial (image) and temporal (time) dimensions to understand relationships across frames.
Rules that specify where a robot must be and when, combining spatial location requirements with time deadlines.
Understanding patterns that vary across both space (location) and time simultaneously, like traffic flow across a road network.
Dynamical systems that change across both space and time.
Aligning sensor data across space and time so different sensors (cameras, LiDAR) produce consistent 3D representations.
Reducing both spatial and temporal dimensions of video frames to decrease memory usage while preserving important information.
Internal patterns the model learns that capture both spatial information (what things look like) and temporal information (how they change over time).
A neural model that converts audio into a fixed-size embedding representing a speaker's identity, independent of what they say.
The ability to identify and distinguish between different speakers in an audio recording.
A task that identifies or confirms whether audio was spoken by a specific person, using characteristics unique to that person's voice.
An AI model designed to excel at a single, narrow task rather than perform many different tasks like a general-purpose model.
Lightweight AI models trained for specific evaluation tasks rather than general-purpose assessment.
Additional training of a model to make it excel at specific tasks, like code generation, rather than general conversation.
A language model trained specifically for one domain or task (like math) rather than general-purpose use across many topics.
A language model trained specifically to excel at one task or domain (like mathematics) rather than performing well across many different tasks.
Training a model to excel at specific tasks (like invoice processing) rather than performing well across many different domains.
A design approach where explicit specifications serve as contracts between designers and tools, maintaining traceability from requirements to implementation.
RL methods that use formal specifications to guide agents toward complex, temporally extended goals.
Loss of detail at high frequencies when training models with MSE loss on spherical data.
A loss function that adjusts training to improve frequency-domain accuracy in predictions.
Techniques that use eigendecomposition of graph or mesh structures to extract positional information for neural networks.
The largest singular value of a matrix, representing its maximum scaling effect on vectors.
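As a rough illustration of the definition above, the largest singular value can be estimated by power iteration. This is a minimal pure-Python sketch; the function name `spectral_norm` and the fixed iteration count are illustrative assumptions, not part of any particular library.

```python
import math

def spectral_norm(matrix, iterations=200):
    """Estimate the largest singular value of `matrix` (a list of rows)
    by power iteration on A^T A. Assumes the start vector is not
    orthogonal to the top right-singular vector."""
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols  # arbitrary unit start vector
    for _ in range(iterations):
        # u = A v
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        # w = A^T u
        w = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm_w = math.sqrt(sum(x * x for x in w))
        if norm_w == 0.0:
            return 0.0  # A v vanished, so no direction left to iterate on
        v = [x / norm_w for x in w]
    # The maximum scaling effect is the length of A v for the converged unit v.
    av = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(x * x for x in av))
```

For a diagonal matrix the result is simply the largest absolute diagonal entry, which makes a convenient sanity check.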
Characteristics of an image's frequency content, describing how much detail appears at different scales.
A range of eigenvalue properties that determines how stable and well-behaved a neural network's computations are.
The amount of wireless frequency resources needed in a specific location and time period.
A property that maintains the important mathematical characteristics of a matrix during transformation.
The number of tokens a draft model proposes in each speculation step before the target model verifies them.
A technique where a smaller model quickly drafts multiple token predictions ahead of time, which a larger model then verifies, reducing the total time needed to generate text.
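The draft-then-verify loop described above can be sketched with toy models. This is a simplified greedy variant under stated assumptions: `target_next` and `draft_next` are hypothetical functions that map a token list to the next token, and the verification loop stands in for what a real system would do in one batched forward pass. It also omits the extra token a real implementation usually gains when every drafted token is accepted.

```python
def speculative_decode(target_next, draft_next, prompt, max_new_tokens, k=4):
    """Greedy speculative decoding sketch with hypothetical model callables."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1. The draft model proposes up to k tokens, one after another.
        proposal = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft_next(tokens + proposal))
        # 2. The target model checks each proposed position in order.
        for i, proposed in enumerate(proposal):
            expected = target_next(tokens + proposal[:i])
            if proposed != expected:
                # Keep the accepted prefix, then use the target's own token.
                tokens.extend(proposal[:i])
                tokens.append(expected)
                break
        else:
            # Every drafted token matched the target's choice.
            tokens.extend(proposal)
    return tokens
```

With greedy verification the output is identical to decoding with the target model alone; the saving comes from accepting several drafted tokens per verification step.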
The ability to process and comprehend spoken language or audio signals, converting them into meaningful interpretations or responses.
Numerical representations of audio that capture the meaningful features of speech in a compact form, useful for tasks like speaker identification or speech similarity.
The ability of a model to convert spoken audio into written text.
A learned numerical encoding of audio that captures meaningful speech patterns and can be used as input for other AI tasks.
A neural network trained to convert raw audio into meaningful vector representations that preserve information about speech content and speaker identity.
An AI model that can process and understand spoken audio directly, without needing to convert speech to text first.
The process of converting spoken audio into written text.
Generating videos where motion is produced at a specified playback speed or temporal rate.
The theoretical maximum performance a GPU kernel can achieve given hardware constraints like memory bandwidth and compute capacity.
A model designed and tuned to prioritize fast response times over maximum accuracy or depth of analysis.
The task of identifying and correcting spelling errors and character mistakes in text.
A neural retrieval method that combines transformer models with sparse, interpretable outputs by mapping embeddings directly to vocabulary tokens.
A neural network architecture where different layers run on different machines to preserve privacy during federated training.
An AI model that understands spoken input and generates spoken responses for interactive conversations.
A token inserted during generation (e.g., <10.6 seconds>) that helps a model track elapsed speaking time.
A mathematical property ensuring small changes in training data cause proportionally small changes in model outputs.
Techniques added to numerical solvers to prevent unrealistic oscillations when simulating fast-moving flows.
Combining multiple model predictions using another model to make final decisions.
When GPU stages in a pipeline wait for work that isn't ready yet, even though other executable tasks are available.
A probabilistic graphical model that extends Bayesian networks by grouping variables into stages to capture context-specific conditional dependencies.
Adjusting microscope images to remove color variations from staining differences.
Informing a judge about the downstream consequences its verdicts will have, which can corrupt its assessments.
Maintaining persistent, durable project state (code, results, logs) that agents can reliably access and build upon.
The process of inferring the current condition of a system (like position or velocity) from noisy sensor measurements.
Conditions that must always be true about a system's internal state to ensure correct behavior.
A continuous, lower-dimensional representation of all possible states an object can occupy.
The set of all possible configurations or conditions an agent can be in, including its needs, sensations, and environment.
A type of neural network architecture that processes sequences by maintaining and updating an internal state, offering an alternative to transformer-based attention mechanisms.
A neural network architecture that processes sequences by tracking hidden states over time, offering faster inference and lower memory use than traditional transformers.
A model's ability to maintain and update information about context over long sequences, critical for tasks like retrieval and reasoning.
A control system that adjusts outputs based on the current state of the system being controlled.
Learning from observations alone without access to the expert's actual actions or decisions.
An alternative to transformers that processes sequences more efficiently by maintaining a hidden state that gets updated as it reads each token.
Building a 3D scene by maintaining and updating a compact hidden representation as new images are processed.
A system that maintains context and history across interactions, remembering previous attempts and refining goals over time.
Safety checks that evaluate each conversation turn independently without remembering previous interactions.
Automated inspection of code without executing it to detect bugs, security issues, and style violations.
A model configuration where input and output dimensions are fixed at compile time, reducing computational overhead but preventing the model from handling variable-length inputs.
A point where the gradient of a function is zero, indicating a potential minimum, maximum, or saddle point.
A formal, auditable proof that a system's actual failure rate stays below a regulator-defined threshold with high confidence.
The gap between what is information-theoretically possible and what algorithms can efficiently compute.
A pre-computed direction in activation space injected into the model to guide it toward desired behavior without retraining.
Learned vectors added to model activations to steer behavior toward desired outputs without retraining.
A measure of how well a model's score function matches the data distribution's score function.
The process of assessing each individual step in a solution path to identify where reasoning breaks down or becomes incorrect.
A model's ability to decompose a problem into sequential logical steps, making its reasoning process transparent and verifiable.
An approach where the model explicitly works through intermediate reasoning steps before arriving at a final answer, rather than jumping directly to conclusions.
Checking individual reasoning steps for correctness rather than verifying entire sequences at once.
A mathematical constraint that forces a matrix to have orthogonal columns, preserving geometric structure.
The property that token consumption is random and unpredictable: the same task can require vastly different numbers of tokens across different runs.
A mathematical equation describing how a random process evolves over time with both deterministic and random components.
Systems that evolve over time with both deterministic and random components, like molecular motion.
A mathematical model describing how quantum systems evolve under continuous measurement and random fluctuations.
Optimization methods that use noisy or approximate gradients instead of exact ones to handle large datasets.
An agent's decision rule that assigns probabilities to different actions rather than always choosing a single deterministic action.
Periodically returning a learning process to an initial state with random timing to accelerate optimization.
Randomly drawing values from a probability distribution, used in probabilistic AI for robustness and uncertainty quantification.
Multiple random paths through a model's state space that are aggregated to improve solution quality.
Randomness or unpredictability built into a process or model.
A technique that enables gradient-based optimization of discrete decisions by approximating gradients through discrete operations.
Deliberate planning and decision-making to efficiently solve problems, as opposed to random trial-and-error.
Processing data continuously as it arrives rather than waiting for a complete batch.
Learning from a continuous data stream by converting it into discrete tasks through temporal partitioning.
Making predictions on data in real-time as new information continuously arrives.
The task of converting one sequence of symbols into another sequence according to defined rules.
A function that curves upward at least as steeply as a quadratic everywhere, making optimization easier and faster.
Matching the spatial structure and boundary features from one model with another to improve segmentation precision.
A formal representation of cause-and-effect relationships using graphs and equations to reason about interventions.
A mathematical equation in a causal model that describes how one variable is determined by its parent variables and random noise.
The ability to apply learned principles to new situations with different surface features but similar underlying structure.
When a model learns shortcuts in latent space that violate real-world constraints or environmental rules.
Uncertainty caused by missing or incomplete data, like new users with no history.
A well-organized representation combining multiple components (like theory and code) rather than a single unstructured output.
The process of automatically pulling organized, machine-readable information (like tables or key-value pairs) from unstructured text or images.
Converting unstructured documents into organized, machine-readable formats that preserve tables, sections, and relationships.
The ability to extract and understand organized information from documents like receipts or invoices, where data follows predictable layouts and formats.
The task of pulling specific, organized information from unstructured text and formatting it into a defined structure like JSON or tables.
Automatically converting unstructured text into organized, machine-readable formats like graphs or tables with typed categories.
Organized information with defined categories (like creator, date, origin) rather than free-form text.
Responses formatted in a consistent, machine-readable way (like JSON or XML) rather than free-form text.
The model's ability to generate responses in organized, predictable formats like JSON or XML rather than free-form text.
Removing entire components like neurons or attention heads rather than individual weights.
The ability to follow logical steps and rules systematically to solve problems, often involving breaking down complex tasks into smaller, ordered components.
A measure of uncertainty in the student model's predictions at each token position.
Measurable linguistic characteristics like word choice, sentence structure, and grammatical patterns that distinguish writing styles.
Computational study of writing style patterns, such as sentence length and word choice, to identify how language use changes over time.
Breaking down a complex question into simpler sub-questions that can be answered sequentially.
A specialized, reusable component that handles a specific task within a larger agent system.
Learned latent variables that persistently represent the current state and identity of individual agents in a multi-agent scene.
Assessment based on human judgment and personal criteria rather than fixed, objective metrics.
A mathematical property where adding items to a set yields diminishing returns, enabling efficient greedy algorithms.
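Set coverage is a standard example of a function with diminishing returns, and the greedy algorithm it enables can be sketched as follows. The function name `greedy_max_coverage` and the dictionary input format are illustrative assumptions.

```python
def greedy_max_coverage(candidate_sets, budget):
    """Pick up to `budget` named sets, each time taking the one that
    adds the most elements not yet covered."""
    chosen, covered = [], set()
    for _ in range(budget):
        best = max(
            (name for name in candidate_sets if name not in chosen),
            key=lambda name: len(candidate_sets[name] - covered),
            default=None,
        )
        # Stop when nothing is left or no remaining set adds anything new.
        if best is None or not candidate_sets[best] - covered:
            break
        chosen.append(best)
        covered |= candidate_sets[best]
    return chosen, covered
```

The marginal gain of any set can only shrink as `covered` grows, which is the diminishing-returns property in action.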
A lower-dimensional representation of data that captures the most important directions or patterns.
Measuring how closely related two lower-dimensional feature spaces are to each other.
Breaking words into smaller pieces (tokens) for a language model to process, critical for handling rare words.
A framework that decomposes value functions into basis functions weighted by task-specific coefficients for rapid transfer learning.
Increasing the spatial or temporal resolution of an image or video to reveal finer details.
Exhaustive search for the fastest possible implementation of a program within a defined search space.
A neural network's ability to represent more features than it has dimensions by overlapping them in the same space.
Training a model on labeled examples to adapt it for a specific task or domain.
A training technique where a model learns from human-labeled examples to improve its ability to follow instructions and produce desired outputs.
Automated procedure for generating training examples without manual annotation, including both positive and negative cases.
When AI models manipulate or deceive the oversight mechanisms designed to control them.
A machine learning algorithm that finds the best boundary to separate data into classes by maximizing the margin between them.
The specific spelling, name variant, or linguistic representation used to refer to an entity (e.g., 'USA' vs 'United States').
A model's ability to produce consistent outputs regardless of which surface form or name variant is used for the same entity.
A representation that captures how light reflects off a 3D surface from all viewing angles and lighting conditions.
Specific patterns in text formatting or structure that a model learns to rely on, rather than understanding underlying concepts.
A measure of how unexpected a word is based on context, used to predict reading difficulty.
A simpler approximation of a complex function used to make computation or analysis more tractable.
A fast neural network trained to replace a slow physics simulation or complex model.
A simplified objective function used to approximate the true objective and guide search more efficiently.
Statistical methods for analyzing time until an event occurs, accounting for incomplete observations.
The ability to maintain logical consistency and context awareness across multiple steps or a long sequence of reasoning.
The ability to work through complex, multi-step problems by maintaining focus and logic across many reasoning steps.
Techniques for coordinating and steering large groups of agents or robots as a collective.
A transformer architecture that uses shifted windows to efficiently capture both local and global context in images.
When a model agrees with a user's false or unsupported claims to please them rather than providing accurate information.
Using mathematical logic and algebraic rules to reason about program behavior without executing concrete code.
A technique to discover mathematical equations in human-readable form from data.
A simplified neural network model used to study learning and computational complexity in constraint satisfaction problems.
A formal grammar that defines pairs of related strings simultaneously, used to model translation between two languages.
Sentences with multiple possible grammatical interpretations that require cognitive effort to resolve.
The difficulty of parsing a sentence based on its grammatical structure and ambiguities.
Code that follows the grammatical rules of a programming language so it can be parsed and executed without syntax errors.
A model's understanding of programming language rules and structure, allowing it to produce grammatically correct code.
A satellite imaging technique that uses radar signals to create detailed maps regardless of weather or daylight, useful for monitoring infrastructure.
Artificially generated training data created by humans or other models, rather than collected from real-world sources like the internet.
Using AI agents to simulate realistic user behavior at scale to find bugs and edge cases automatically.
Hidden instructions given to an AI model that define its behavior, tone, and constraints.
The model's ability to consistently follow and respect the instructions given in a system prompt that defines its behavior and constraints.
A model's ability to solve problems in fundamentally new situations beyond its training distribution.
A transformer-based model design that treats all NLP tasks as text-to-text problems, using an encoder-decoder structure to process and generate text.
A smaller, foundational version of the T5 model architecture designed for text-to-text tasks with fewer parameters than larger variants.
Answering questions by finding information across both tables and text documents.
Structured data organized in rows and columns, like spreadsheets or databases.
Pre-trained models designed to work with structured tabular data, capable of handling various tasks without task-specific retraining.
Sensing and interpreting physical contact, pressure, and force information through touch sensors.
The probability of rare, extreme events in the output distribution of a model.
The geometric structure describing all possible directions of motion at each point on a manifold.
A desired probability distribution that a model is trained to match, typically derived from reward signals.
The percentage of correct answers a model produces on a benchmark, measured by standard evaluation metrics.
The process of deciding which tasks are assigned to humans versus AI systems in a workflow.
Breaking a complex problem into smaller, simpler subtasks to solve sequentially.
Methods for combining multiple training tasks to improve model generalization and performance across different reasoning domains.
When multiple learning tasks share similar data distributions or require overlapping knowledge.
The ability of a model to break down high-level instructions into a sequence of actionable steps that a robot can execute.
A reward signal that guides reinforcement learning based on task-specific performance metrics rather than general output patterns.
Directing different training samples to specialized models or objectives based on their characteristics.
When a model is optimized for specific types of problems (like math and science) at the expense of general-purpose versatility.
A formal description of a goal, constraints, and success criteria that an agent must achieve.
A hierarchical structure that organizes different categories or types of a problem into levels.
The difference between a fine-tuned model and its base model, capturing task-specific changes.
Assigning different importance levels to multiple tasks during training.
The ability to adjust a model's behavior for different purposes (like retrieval, clustering, or classification) without retraining, often through lightweight adapters.
A model that works across different types of visual tasks without requiring separate training for each specific task.
Designed with knowledge of the specific downstream task or application that will use the output.
Embeddings that adjust their meaning based on the specific task or query provided, rather than producing the same vector for every use case.
A model that adjusts its behavior based on the specific task or instruction provided, rather than producing the same output regardless of the task.
Adjusting training signals at the task level to encourage specific model behaviors, like longer reasoning chains for complex questions.
Specific requests asking a model to complete a defined goal, like summarizing text or writing code, rather than having a casual conversation.
An AI model optimized to excel at a specific, narrow task rather than performing well across many different types of requests.
Training a model to prioritize completing specific, practical tasks efficiently rather than engaging in open-ended conversation.
Embeddings customized for a particular use case, such as sentiment analysis or document retrieval, rather than general-purpose embeddings.
A model trained and optimized to excel at one particular task (like evaluation) rather than performing well across many different tasks.
Training or fine-tuning a model to excel at a particular task, like translation, rather than trying to perform equally well across many different tasks.
A structured system of categories used to organize and classify different types of harmful content.
Using the same teacher model for both supervised fine-tuning and distillation to avoid gradient bias.
A training technique where the model learns to predict the next token given the ground-truth previous tokens.
A large, highly capable model used to train smaller models by transferring its knowledge and skills through a process called distillation.
The disagreement between teacher and student model predictions, indicating where the student is wrong.
The capacity to work through complex logical problems, debug issues, and apply domain-specific knowledge systematically.
Remote control of a robot or machine by a human operator, typically through a joystick or similar interface.
A parameter controlling randomness in AI predictions: higher values make outputs more varied and creative, while lower values make them more predictable.
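One common way this control is applied is by dividing a model's raw scores (logits) by the temperature before converting them to probabilities. A minimal sketch, with the function name `softmax_with_temperature` chosen for illustration:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; temperature rescales the scores first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

A temperature below 1 concentrates probability on the top-scoring option, while a temperature above 1 spreads it more evenly across the alternatives.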
The consistency and smoothness of motion and appearance across video frames over time.
Ensuring predictions remain stable and coherent across consecutive time steps.
Understanding how events and changes unfold over time, allowing a model to grasp sequences and predict what happens next in a video or time-series data.
Determining which past actions or decisions are responsible for current outcomes in sequential decision-making.
Relationships between events or measurements across time in sequential data.
A model's ability to make accurate predictions on new data that arrives later in time, even when patterns have shifted.
Anchoring events to precise timestamps or relative time positions in a sequence.
Information about how things change over time, critical for understanding dynamic processes like facial expressions.
The ability to understand and reason about events, sequences, and relationships that occur across time.
Repeated or similar information across consecutive frames in a video that can be safely removed.
How a model encodes and understands time information in sequences, critical for predicting future states from past observations.
A technique that re-aligns positional encodings when tokens are dropped, maintaining coherent temporal ordering.
Dividing data by time so training uses older examples and testing uses newer ones, preventing data leakage.
Converting low-frame-rate, blurry videos into high-frame-rate sequences with fine-grained temporal details.
Aligning events in music and video so they happen at the same time.
An algorithm that constructs evolution chains by tracing how methods progress and branch over time.
The ability to comprehend how things change over time, such as recognizing motion and actions across multiple video frames rather than just single images.
An RL method that updates value estimates using the difference between the current estimate and a bootstrapped target (the observed reward plus the estimated value of the next state), combining Monte Carlo and dynamic programming ideas.
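The simplest form of this update, usually called TD(0), can be sketched for a table of state values. The function name `td0_update` and the default step size and discount are illustrative assumptions.

```python
def td0_update(values, state, reward, next_state,
               alpha=0.1, gamma=0.99, terminal=False):
    """Apply one TD(0) update to `values` (a dict of state -> estimate)
    and return the TD error."""
    # Bootstrapped target: observed reward plus discounted next-state estimate.
    target = reward + (0.0 if terminal else gamma * values[next_state])
    td_error = target - values[state]
    values[state] += alpha * td_error  # move the estimate toward the target
    return td_error
```

Using the next state's current estimate as part of the target is the dynamic programming ingredient; learning from a sampled transition is the Monte Carlo ingredient.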
Specialized hardware units on GPUs designed to quickly perform matrix multiplication operations used in neural networks.
Breaking down high-dimensional data into products of lower-rank tensors to reduce parameters and improve interpretability.
Splitting a model's computation across multiple GPUs by dividing tensors into chunks processed in parallel.
A computational program that performs operations on multi-dimensional arrays (tensors), commonly used in neural networks.
Automatically finding faster implementations of tensor computations used in neural networks.
Lightweight synchronization mechanism ensuring consistency when model weights are split across multiple GPUs.
A technique that adds related or contextually relevant terms to a document's representation to improve its discoverability in search systems.
A scoring technique that ranks words by how often they appear in a document versus how common they are across all documents, giving rare words higher weight.
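A minimal sketch of this scoring, using the plain formulation (term frequency times the log of inverse document frequency). Libraries typically add smoothing and normalization, so their numbers will differ; the function name `tf_idf` is illustrative.

```python
import math
from collections import Counter

def tf_idf(documents):
    """documents: list of token lists. Returns one {term: score} dict per document."""
    n_docs = len(documents)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores
```

A word that appears in every document gets a score of zero, while a word unique to one document gets the highest weight there.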
Predicting the final outcome of a physical process directly from initial conditions without simulating intermediate steps.
A compression technique that reduces model weights to just three possible values (-1, 0, or 1) instead of storing full decimal numbers, dramatically reducing memory and computation requirements.
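One way to map weights onto the three values is to scale by the mean absolute weight, round, and clip. The sketch below assumes that scheme for illustration; other ternary methods choose the scale or thresholds differently, and the function name `ternarize` is hypothetical.

```python
def ternarize(weights, eps=1e-8):
    """Map each weight to -1, 0, or 1 and return the shared scale
    needed to approximately reconstruct the originals."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale
```

Multiplying each quantized value by `scale` gives a coarse approximation of the original weight, so only the ternary values and one scale per group need to be stored.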
Improving model performance on specific inputs by adjusting it during prediction.
A deliberately small and simplified version of a model designed for testing code and pipelines rather than for production use.
Improving model performance on new data at inference time without retraining on labeled examples.
Additional computation performed during inference to improve model outputs, such as running multiple solution attempts.
Improving model accuracy at inference by using extra computation or verification steps without retraining.
Updating model parameters during inference to adapt to new data without retraining.
A machine learning task where a model reads text and assigns it to predefined categories, such as 'safe' or 'unsafe'.
A technique that groups similar texts together automatically by using embeddings to measure similarity, without requiring predefined categories.
A task where the model predicts and generates the next words or sentences based on a given prompt or partial text.
A technique where text descriptions guide or control how a generative model produces images, allowing users to influence the output through language.
The task of generating the next words or sentences based on a given prompt or partial text.
A training technique where parts of input text are randomly deleted, masked, or shuffled to teach the model to understand context and recover meaning.
A technique that converts text into numerical vectors that capture semantic meaning, allowing the model to understand and compare text similarity.
A neural network that converts text into numerical vectors that capture semantic meaning, allowing computers to understand and compare text similarity.
Numerical representations of text that capture its meaning, allowing computers to compare how similar different pieces of text are to each other.
A model component that converts raw text input into numerical vector representations that capture semantic meaning.
The process of an AI model creating new text one word or token at a time based on patterns it learned during training.
An AI model trained to understand and generate human language by predicting sequences of words or tokens.
The type of data a model can process or generate; in this case, text-only input and output without images, audio, or other formats.
A language model that processes and generates only text, without support for images, audio, or other media types.
A model's capability to analyze, interpret, and draw logical conclusions from textual information.
The process of converting text into a numerical format that a machine learning model can understand and process.
A mode of communication where the model receives only text input and produces only text output, without direct support for images, audio, or other media formats.
An AI model that processes only text input and generates only text output, without support for images, audio, or other media types.
AI operations that work exclusively with written language input and output, such as answering questions, summarizing, or writing content.
A model designed specifically to process and generate text, without support for images, audio, or other data types.
A language model designed to work exclusively with text input and output, without support for images, audio, or other modalities.
A model that accepts text as input and produces text as output, without support for images, audio, or other data types.
A model that accepts only written text as input, without support for images, audio, or other data types.
A model that accepts only text input and produces only text output, without support for images, audio, or other media types.
A language model that processes and generates only text, without support for images, audio, or other data types.
A model that processes only text input and produces only text output, without support for images, audio, or other data types.
Creating 3D models from natural language descriptions using AI models.
Creating synchronized audio and video content from text descriptions or prompts.
The ability to convert natural language descriptions into executable code automatically.
A process that converts text into numerical vectors (embeddings) that capture semantic meaning in a format models can work with.
AI models that generate images from text descriptions or prompts.
An AI model that creates images from written text descriptions or prompts.
A technology that converts written text into spoken audio that sounds natural and human-like.
A task where a model converts natural language questions into executable SQL database queries.
A framework where all NLP tasks are treated as converting input text into output text, so translation, summarization, and classification use the same model structure.
A model task where the input and output are both text, with the model learning to transform one text format into another.
A machine learning model that takes text as input and produces text as output, useful for tasks like translation, summarization, or question answering.
A training approach where all NLP tasks are framed as converting input text to output text, allowing a single model to handle translation, summarization, classification, and other tasks.
Creating video sequences from text descriptions using neural networks.
High-quality, carefully curated training data structured like educational textbooks rather than raw internet text, designed to teach clear concepts and reasoning.
Background knowledge and patterns learned from text that models rely on, sometimes at the expense of visual information.
A numerical method that converts text into feature vectors by measuring how important each word is in a document relative to a corpus.
A large, diverse dataset of text from the internet used to train this model.
A structured summary of document content organized by topic or theme, often created through clustering.
Using AI to automatically verify or discover mathematical proofs and logical statements.
The ability to infer and reason about other people's beliefs, desires, and intentions.
A configurable parameter that controls how much computational time and internal deliberation a model dedicates to solving a problem before responding.
A model operating mode where it explicitly works through problems step-by-step before generating a final answer, improving accuracy on complex tasks.
A language model trained to generate explicit reasoning steps and internal deliberation before producing a final response, rather than answering immediately.
Ensuring student and teacher models generate outputs using compatible reasoning approaches.
The characteristic way a model generates reasoning steps and intermediate outputs.
The process of identifying, evaluating, and reasoning about potential security risks, vulnerabilities, and attack methods in systems or networks.
Adjusting the decision boundary for binary classification to optimize performance metrics like F1 score.
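A simple way to do this is to try each observed score as a candidate cutoff and keep the one with the best F1. The sketch below assumes binary labels (0 or 1) and that a score at or above the threshold counts as a positive prediction; the function names are illustrative.

```python
def f1_score(labels, predictions):
    """F1 for binary labels and predictions (1 = positive)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def best_threshold(labels, scores):
    """Return the (threshold, F1) pair with the highest F1 on this data."""
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        f1 = f1_score(labels, preds)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

The threshold should be chosen on validation data rather than the test set, since a cutoff selected on the same data it is evaluated on will look better than it really is.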
The number of tokens a model can generate per second, measuring its processing speed.
The amount of air breathed in or out during a single normal breath at rest.
Changing the tonal quality or color of a sound while preserving its basic characteristics.
Analyzing data points collected over time to find patterns and make predictions.
Bias that occurs when past treatments affect future confounders, making it hard to isolate treatment effects in sequential decisions.
The task of assigning a label or category to a sequence of data points ordered by time.
The task of predicting future values in a sequence of data points ordered by time, such as stock prices or weather patterns.
The ability to understand and make predictions based on data points ordered over time, like stock prices.
A small unit of text (a word, subword, or punctuation mark) that a language model breaks input into for processing.
The process of selectively activating only certain parts of a model for each individual token processed, rather than using the entire network every time.
Deciding how many tokens (words/subwords) a model should generate for a given problem.
The maximum number of tokens available to include retrieved context in a language model prompt.
A predicted next word or subword unit proposed by a draft model for the target model to accept or reject during speculative decoding.
Reducing the number of tokens stored or processed by removing redundant or less important ones.
The number of text units (tokens) a model processes or generates; longer reasoning processes consume more tokens and may increase latency or cost.
The computational expense and resource usage required to process or generate tokens, which increases when a model performs additional reasoning steps.
The number of small text chunks (tokens) a model generates; higher token counts mean longer responses and more computational cost.
Determining how much each token in a response should be rewarded or penalized based on overall performance.
The probability distribution over possible next tokens that a language model produces during decoding.
A measure of how many tokens (small units of text) a model needs to use to complete a task; more efficient models use fewer tokens and cost less.
Numerical representations of individual words or subwords that capture their meaning and relationships in a way machines can process.
A measure of uncertainty in a model's predictions for individual tokens based on probability distributions.
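The glossary does not say which uncertainty measure is meant; Shannon entropy of the next-token distribution is one common choice. A minimal sketch under that assumption, with the hypothetical helper `token_entropy` and made-up probabilities:

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a single next-token distribution."""
    # Zero-probability tokens contribute nothing and are skipped.
    return -sum(p * math.log(p) for p in probs if p > 0)

confident = [0.97, 0.01, 0.01, 0.01]   # model strongly prefers one token
uncertain = [0.25, 0.25, 0.25, 0.25]   # model has no preference

low = token_entropy(confident)
high = token_entropy(uncertain)
```

A uniform distribution over four tokens gives the maximum possible entropy, log(4), while the peaked distribution gives a much smaller value.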
A score measuring how much each word or subword unit contributes to a model's prediction.
The maximum number of tokens (words or subwords) a model can process in a single input, for example 8K tokens per chunk.
A training technique where random words in text are hidden and the model learns to predict them, commonly used in models like BERT.
Combining multiple tokens into fewer tokens to reduce computation while preserving model output quality.
A method for aggregating information across input tokens to create contextual representations.
The maximum number of tokens (words or word pieces) a model can generate in a single response, controlling the length of its output.
The spatial coordinates or locations of text elements within a document, used to understand where words and phrases appear on the page.
The core task of predicting what word or subword (token) should come next in a sequence based on previous text.
The cost charged per token (unit of text) processed by a model, which varies based on model capability and complexity.
Removing less important tokens from a model's input or intermediate computation to improve speed and efficiency.
Reordering a model's next-token predictions by likelihood or quality rather than accepting the top-1 choice.
A technique that decreases the number of tokens processed by a model, typically by compressing or filtering visual information.
A vector that encodes the meaning and context of a single word or subword unit (token) within a larger piece of text.
Numerical vectors that encode the meaning of individual words or subword units within a text.
Directing or redistributing information from one modality's tokens to another based on information quality or relevance.
Choosing which token positions to train on based on their importance or learning value.
A series of individual tokens (words or subwords) that the model generates one after another to form a complete response.
Reducing the number of tokens processed by a model to lower computational cost.
The number of tokens a model can generate per unit of time during inference.
The number of tokens (small units of text) consumed during model inference; higher token usage means more computational cost and longer response times.
The complete set of individual text units (tokens) that a model can recognize and process; a larger vocabulary allows the model to handle more diverse languages and specialized terms.
Assigning importance scores to individual words or subwords in text, allowing the model to emphasize semantically significant terms in its representation.
A measure of difference between probability distributions over predicted tokens, used to align model outputs.
Embeddings that represent individual tokens (words or subwords) rather than entire documents, allowing fine-grained matching during search.
Applying different levels of privacy protection to individual tokens based on their sensitivity and importance.
Assigning reward signals to individual tokens in a sequence to guide model training.
The process of breaking text into smaller units (like words, subwords, or characters) that a model can understand and process.
The component that splits text into tokens (subwords or characters) that the model can process.
The basic units of text that a language model processes, typically representing words or word fragments.
A measure of how much a model's output quality changes in response to different politeness levels in input prompts.
An agent's ability to call external functions or APIs to gather information or perform actions.
A structured definition that describes what a tool does, what inputs it accepts, and what outputs it produces.
The ability of a model to call external functions or APIs to perform tasks like calculations, searches, or data retrieval.
Extending LLM outputs by integrating external tools, APIs, or functions the model can call to solve problems.
When an AI model decides to use external functions or tools (like database queries) to help answer questions or complete tasks.
An iterative process where an agent repeatedly calls external tools (like search) and updates its reasoning based on results.
Automatically discovering abstract topics or themes that appear across a collection of documents.
A requirement that a segmented structure maintains correct connectivity and shape properties, not just pixel-level accuracy.
Using mathematical topology to extract shape and structure features from data for analysis and classification.
A representation method that works regardless of how input channels are physically arranged or which channels are present.
A metric measuring the largest difference in probability that two distributions assign to the same event, ranging from 0 (identical) to 1 (no overlap).
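Assuming this entry refers to total variation distance, then for two discrete distributions over the same finite set of outcomes the maximum over events equals half the sum of absolute differences. The sketch below uses that identity; `total_variation` is a hypothetical helper name.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    p and q are sequences of probabilities over the same outcomes,
    in the same order, each summing to one.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same outcomes")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

same = total_variation([0.5, 0.5], [0.5, 0.5])
disjoint = total_variation([1.0, 0.0], [0.0, 1.0])
partial = total_variation([0.5, 0.5, 0.0], [0.25, 0.25, 0.5])
```

Identical distributions give 0, distributions with no shared support give 1, and the third pair gives 0.5.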
A metric measuring the distinguishability between two quantum states, ranging from 0 (identical) to 1 (orthogonal).
When a model is trained using one objective but deployed using a different process, causing performance gaps between training and real-world use.
The number of layers in a neural network that are allowed to update during training.
A saved snapshot of the model's learned weights at a specific point during training, allowing you to see how the model improved over time.
Saved snapshots of a model at different points during training, allowing researchers to observe how the model's abilities change as it learns.
The date up to which a model has seen training data; the model has no knowledge of events or information after this date.
The examples and information used to teach a model how to perform a task, for example human-written and AI-generated grammatical corrections.
The process of carefully selecting, filtering, and organizing training data to improve a model's performance on specific tasks rather than relying solely on larger datasets.
The date after which information is not included in a model's training data, meaning the model cannot know about events or facts that occurred after that date.
The practice of publicly disclosing what data was used to train a model, enabling researchers to audit and understand potential biases or limitations.
The range of topics, styles, and types of text a model was trained on; the model performs best on content similar to this distribution and may struggle outside it.
The patterns and behaviors that emerge during a model's training process, such as how loss decreases or how capabilities develop over time.
The ability to achieve strong model performance while using less computational resources, data, or time during the training process.
The number of times a model sees the entire training dataset during learning; more epochs can improve performance but may also lead to overfitting if the dataset is small.
The complete set of steps, data, and code used to train a model, made transparent so others can reproduce or audit the process.
The point during training where reward signals stop improving, indicating the model may be memorizing rather than generalizing.
A sequence of interactions or steps taken by a model during deployment or in an environment.
Representing a sequence of actions at a higher level of abstraction, like a strategy, rather than individual steps.
Predicting the future path or location of a person or object over time.
Computing a planned path or sequence of movements for an autonomous agent to follow.
Controlling video generation by specifying desired motion paths or object movements frame-by-frame.
Generating sequences of actions (trajectories) that an agent takes to solve a task, used for training via imitation learning.
Adapting recorded action sequences to new situations by adjusting them based on matching visual keypoints between scenes.
Evaluating agent performance by examining the complete sequence of actions taken, not just final outputs.
A model that converts input sequences into output sequences with aligned timing.
Using knowledge from one task to improve learning on a different related task.
The dominant neural network architecture for language models, using self-attention to process sequences.
A neural network architecture designed as an alternative to the standard transformer model, often with different trade-offs in speed, memory, or capability.
A neural network design that processes text by analyzing relationships between all words simultaneously, forming the foundation of modern large language models.
A mechanism that allows a model to focus on relevant parts of the input by computing relationships between all pairs of tokens, enabling deep understanding but requiring significant memory.
The core neural network architecture based on attention mechanisms that traditionally powers most large language models.
A neural network component that processes input sequences using attention mechanisms.
Stacked blocks of neural network computations that process and transform input text progressively, with more layers generally allowing the model to learn more complex patterns.
A neural network architecture widely used for language tasks, as in models like BERT and RoBERTa.
Neural networks using attention mechanisms to process and understand relationships between words in text.
A method where a transformer neural network generates text one token at a time, using patterns it learned from training data.
When a judge's scores contradict themselves (e.g., ranking A > B, B > C, but C > A), revealing internal inconsistency.
When a judge ranks A > B, B > C, but C > A, revealing logical inconsistency in scoring.
The mathematical rules governing how samples move from one distribution to another in a sampling algorithm.
Estimating the causal impact of an intervention or change on outcomes in data.
An algorithm that explores possible future states by building a tree of actions and outcomes to find promising paths.
Prioritizing and routing queries by urgency or risk level, directing high-risk cases to human experts.
A fundamental property stating that the norm of a sum is bounded by the sum of norms.
A specific input pattern or condition that activates hidden malicious behavior in a backdoored model.
A metric measuring which input types the backdoor attack actually depends on.
A model with one trillion learnable values that the neural network adjusts during training to improve performance on language tasks.
A training failure where generated sequences become so long they get cut off, biasing the training data toward incomplete examples.
A local region around the current best solution where the surrogate model is trusted to be accurate.
A generalized logarithm parameterized by q that interpolates between different loss functions as q varies.
The ability to detect when one speaker has finished speaking and another can begin, essential for natural conversation flow.
The ability to identify when a speaker has finished speaking and it is another person's turn to speak in a conversation.
A statistical method for estimating intermediate values in a sequence based on observed endpoints.
A strategy that applies different conditioning constraints at different stages of the generation process.
A retrieval approach using a fast initial retriever to narrow candidates, followed by a more sophisticated re-ranker for final selection.
A retrieval system design with separate neural networks for encoding queries and documents independently, allowing efficient comparison between them.
A false positive in hypothesis testing—rejecting a true null hypothesis and claiming a difference exists when it doesn't.
The tendency of generative models to converge on the most common or typical outputs, reducing diversity.
A convolutional neural network architecture with an encoder-decoder structure designed for image segmentation and restoration tasks.
The ability to understand and interact with user interfaces by reading screenshots and generating commands to control applications or websites.
The ability of an AI model to understand and control user interface elements like buttons and forms by interpreting visual layouts and executing appropriate actions.
The model's ability to identify and apply common design patterns and component structures used in user interfaces.
Questions where the correct answer cannot be found in the given context, testing if models admit uncertainty.
A model variant that treats uppercase and lowercase letters as identical, so 'Hello' and 'hello' are processed the same way.
A model without built-in safety filters or content restrictions, allowing it to generate responses on any topic without refusal.
A model trained without safety filters or content restrictions, making it willing to generate responses on sensitive topics that filtered models would refuse.
Quantifying how confident a model is in its predictions, critical for safe deployment in high-stakes applications.
Measuring and tracking how uncertain a model's predictions are based on uncertain inputs.
When a model stops executing a procedure before completing all required steps, leaving the computation incomplete.
Collecting fewer measurements than needed for perfect image reconstruction, used to speed up MRI scans.
A neural network architecture commonly used in image generation that processes images at multiple scales.
A single model design that handles multiple different tasks without needing separate specialized models for each task.
A single input format that handles multiple different tasks, rather than requiring separate models for each task.
An AI model trained to both generate and understand multiple types of data like text and images.
AI models that can process and generate multiple types of data (text, images, etc.) in a single system.
A single player changing their strategy while others keep theirs fixed.
The optimization problem of deciding which electricity generators to turn on/off over time to meet demand while minimizing cost.
The linguistic segment (word, morpheme, character) over which a measurement or prediction is evaluated.
Automated code that checks whether a specific piece of software works correctly by testing individual functions.
The property that a model class can, in theory, approximate any continuous function to arbitrary accuracy given sufficient capacity.
Learning general rules from examples that apply broadly across different situations.
A model designed to work well across many different tasks and domains without requiring task-specific customization or retraining.
A function proportional to a probability distribution but not scaled to sum or integrate to one.
A probability distribution where the total probability doesn't sum to one, requiring expensive normalization calculations.
An algorithm for estimating the state of a system from noisy measurements, designed to handle nonlinear dynamics better than standard Kalman Filters.
Information that doesn't follow a predefined format or organization, such as raw text documents or photographs.
Information stored as plain text documents rather than organized databases, like PDFs or policy manuals.
Grouping data points into categories without labeled training examples, discovering patterns automatically.
A machine learning technique that discovers hidden structure in data without labeled examples, creating meaningful representations automatically.
Training a model without labeled examples, letting it discover patterns on its own.
Training language models with reinforcement learning using rewards derived without human labels or ground truth answers.
A model with the correct structure but no learned knowledge, producing meaningless output because it has never been trained on data.
Using sparse local measurements to estimate values across a larger geographic or temporal region.
A learned vector representation that captures an individual driver's unique preferences and driving style.
A synthetic agent that mimics realistic user behavior and preferences to test AI assistant performance.
Prompting a model to generate the next user message in a conversation to probe whether it understands interaction dynamics.
A generalization of Shannon information that measures how much information is actually useful to a specific observer or agent.
A process that checks generated outputs (like rendered charts) against quality criteria and iteratively improves them based on detected failures.
A function estimating how good a state or action is for achieving a goal.
Recognition that multiple legitimate ethical principles (autonomy, beneficence, justice) can conflict, requiring case-by-case navigation rather than single fixed rules.
The process of updating an agent's estimates of state values backward through a trajectory during learning.
A technique that dynamically adjusts how much a model explores new outputs versus exploiting known good ones.
Reducing a problem's complexity by fixing certain decision variables to specific values based on prior knowledge or predictions.
Techniques that reduce noise in gradient estimates to improve optimization efficiency and convergence speed.
A neural network that learns to compress data into a latent space and reconstruct it, useful for learning smooth representations.
An embedding learned through a variational approach that optimizes a probabilistic objective function.
A method to approximate complex probability distributions by learning simpler, tractable distributions.
A quantum machine learning model that uses parameterized quantum circuits to classify data by optimizing circuit parameters.
An optimization technique that transfers knowledge from a teacher model to improve generation quality by matching score distributions.
A measure of the complexity or expressiveness of a hypothesis class in machine learning.
The number of individual numerical values used to represent a piece of text; higher dimensions can capture more nuanced meaning but require more computational resources.
A representation of data (like molecules or text) as a list of numbers that captures its essential features in a form that machine learning models can work with.
Numerical representations of text where each word or sentence becomes a list of numbers that capture its meaning in a way computers can process.
The process of converting input data (like text) into numerical vectors that can be stored, compared, and searched efficiently.
Images defined by mathematical shapes and paths rather than pixels, allowing them to scale to any size without losing quality.
A preprocessing step that scales vectors to unit length, so that a plain dot product equals cosine similarity and comparisons are not skewed by vector magnitude.
The model's output is a single array of numbers (a vector) rather than generated text, which can be efficiently compared with other vectors to measure similarity.
Compressing data by encoding groups of values together rather than individually, achieving better compression ratios.
A way of expressing text as a list of numbers that a computer can process and compare mathematically.
A search method that converts text into numerical vectors and finds similar documents by comparing vector distances.
A search method that converts queries and documents into numerical vectors and finds matches by measuring similarity between vectors. It is fast but less nuanced than other ranking approaches.
A measurement of how alike two vectors (lists of numbers) are, used to determine if two pieces of text have similar meanings.
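Cosine similarity is one widely used way to measure this; the glossary does not name a specific metric, so treat that choice as an assumption. The sketch below first scales each vector to unit length and then takes the dot product. The helper names `l2_normalize` and `cosine_similarity` are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit (L2) length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def cosine_similarity(a, b):
    """Dot product of the two unit-length vectors, in the range [-1, 1]."""
    a_unit, b_unit = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_unit, b_unit))

parallel = cosine_similarity([1, 2, 3], [2, 4, 6])    # same direction
orthogonal = cosine_similarity([1, 0], [0, 1])        # unrelated directions
opposite = cosine_similarity([1, 0], [-1, 0])         # opposite directions
```

Because only direction matters, `[1, 2, 3]` and `[2, 4, 6]` score as essentially identical even though their lengths differ.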
A method that converts text into numerical vectors and finds documents with vectors closest to a query vector. It is fast but sometimes misses nuanced relevance signals.
A mathematical representation where text is converted into points or directions in a multi-dimensional space, enabling comparison and analysis of semantic relationships.
A parameter-efficient fine-tuning approach that adapts models using learned vectors instead of full weight matrices, requiring even fewer parameters than LoRA.
A reward signal with multiple dimensions (e.g., correctness per test case) instead of a single scalar score.
In diffusion models, the learned direction and speed that guides the generation process at each step.
Operational procedures and knowledge unique to equipment from a particular manufacturer, like GE MRI scanners.
Uncertainty estimates based on explicit confidence statements the model generates as part of its reasoning output.
Answers that can be checked against external sources like the web to confirm correctness.
The code or rules that check whether an agent's solution correctly solves a benchmark task.
A model or system that evaluates whether another model's output is correct or high-quality.
A generative model that creates videos by iteratively refining noise into coherent frames, similar to image diffusion but applied to sequences.
A model component that processes video frames and converts them into compact numerical representations that capture the video's visual and motion content.
Creating realistic video sequences using AI based on text or image descriptions.
An editing technique that deletes objects from video while filling in the background and correcting physical interactions.
A task where AI models watch videos and answer questions about what they see and understand.
Extending image segmentation to video by identifying and tracking objects across multiple frames over time.
Automatically selecting key frames or clips from a long video that capture the most important content.
The ability to follow and maintain consistent identification of objects as they move across multiple frames in a video sequence.
The ability of AI systems to analyze and extract meaning from video content including visual, temporal, and semantic information.
A variational autoencoder designed for video that compresses video frames into a latent representation for efficient processing.
A specialized AI model trained to understand video content and communicate its understanding through natural language text.
Creating sound effects or audio that matches the visual content and timing of a video.
How an object's appearance changes based on the viewing angle, including effects like reflections and shininess.
The capability to track how a viewpoint changes through rotations and predict resulting observations.
Representing biological cells as simplified computational models for simulation.
A computer-generated 3D environment that users can interact with using special headsets or controllers.
Using AI to digitally add color to microscope images without physical staining.
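A generalized notion of solution for nonlinear partial differential equations, such as the Hamilton-Jacobi equations that arise in optimal control, that remains well defined where the solution is not differentiable.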
The core neural network component that processes and understands images before passing information to the rest of the model.
A component that converts images into a numerical representation that a language model can understand and process.
A process that converts images into numerical representations that a model can understand and process.
Large pre-trained models like DINO and SAM that learn general visual understanding from diverse image data.
The specialized component of a model that processes and interprets image data to extract visual information.
Discrete representations of image patches or regions processed by vision-language models.
A neural network architecture that processes images by breaking them into small patches and analyzing them similarly to how language models process text.
A neural network architecture that processes images by breaking them into small patches and treating them similarly to how language models process words.
The ability of an AI model to analyze and interpret visual information from images, identifying objects, scenes, and their relationships.
A model designed to understand and reason about both visual content (images) and natural language text together.
Training a model to understand the relationship between images and their text descriptions so it can match them together effectively.
A pre-trained model that jointly processes and understands both visual and textual information in a unified representation.
A model that processes both images and text together to create shared numerical representations, rather than generating new text like a full language model would.
Training a model to understand and connect both images and text together, so it can reason about visual content using language.
An AI model that understands both images and text, allowing it to answer questions about images or describe what it sees.
AI systems that understand both images and text, allowing them to answer questions about images or describe what they see.
A task in which an AI agent navigates physical spaces by following natural language instructions while processing visual input.
A task that requires a model to understand and reason about both visual information (images) and textual information together.
AI tasks that require understanding both visual information from images and textual information together, such as describing images or answering questions about them.
A model that combines visual perception, language understanding, and robotic action generation to interpret instructions and control robot movements.
Converting visual inputs like screenshots, charts, or diagrams into executable code or structured representations.
Grounding abstract concepts (like actions) to concrete visual observations to ensure they have real physical meaning.
Reconstructing or identifying visual stimuli from recorded brain activity patterns.
A component that converts images into a numerical representation that the model can understand and process.
Predicting and visualizing what a robot will do next based on its learned policy.
The ability to connect specific words or concepts in text to the actual objects or regions they refer to in an image.
A training technique that teaches a model to follow instructions about images by learning from examples of image-text instruction pairs.
A task where an AI model reads a question and an image, then generates an answer based on what it understands from the image.
The capability to analyze images and draw logical conclusions or answer complex questions based on what is depicted in the visual content.
RAG applied to visually rich documents, allowing models to retrieve and reason over images and multi-page visual content.
The task of identifying and separating individual objects or regions in an image or video by assigning each pixel to a specific object or category.
The degradation of visual understanding in models as generated text accumulates, causing attention to shift away from image tokens.
Removing unnecessary visual tokens from images or videos to reduce computational cost in vision-language models.
Discrete units representing different regions or features of an image processed by the model.
The ability of an AI model to interpret and analyze images, including identifying objects, reading text, and answering questions about visual content.
A model that processes both images and text together, understanding the relationship between visual content and language to answer questions about images or describe what it sees.
How a multimodal model allocates focus between visual and text information when processing inputs.
The persuasive techniques and design choices used in charts and graphs to influence how viewers interpret data.
A model's ability to understand and reason about visual information in images, connecting what it sees to language and concepts.
Combining visual information from cameras with tactile (touch) sensor data to improve robot perception and decision-making.
A system that converts visual input into motor control commands for robot manipulation.
A learned control policy that maps visual observations directly to robot motor commands.
An inference engine optimized for running large language models efficiently by batching requests and managing memory intelligently.
A high-performance serving framework that efficiently runs language models and embedding models with optimized memory usage and throughput for production deployments.
The complete set of unique words or tokens that a language model can recognize and generate.
When a model concentrates its predictions on a few options and ignores the rest, losing diversity in its outputs.
Adding new tokens or words to a language model's vocabulary beyond its original pretrained set.
The number of unique tokens (words or word pieces) a model can recognize and process; larger vocabularies provide better coverage of a language.
A language model restricted to generate only outputs from a predefined set of allowed terms or concepts.
The process of generating natural-sounding human speech from text using machine learning models.
A neural network that learns discrete, quantized representations by combining VAE principles with vector quantization.
A reward signal derived from visual question answering that uses language-vision reasoning to evaluate image quality.
Video RAM — the memory on a GPU that stores model weights and intermediate computations during inference.
The amount of graphics memory (VRAM) required to load and run a model on a GPU.
Automatically identifying security flaws or weaknesses in code that could be exploited by attackers.
The ability to understand and explain how security weaknesses in software or systems could be exploited and what their potential impact might be.
A quantization format where model weights are stored in 4-bit precision while calculations use 16-bit precision, balancing efficiency with accuracy.
A specific quantization scheme where weights are stored in 4-bit precision while activations remain in 16-bit precision, balancing memory savings with accuracy.
A specific quantization method where both weights (w) and activations (a) are stored as 8-bit integers, providing a good balance between memory savings and model quality.
A specific quantization method that reduces both weights and activations to 8-bit integers, enabling faster computation on specialized hardware while maintaining reasonable accuracy.
Providing an optimization solver with an initial candidate solution to speed up convergence instead of starting from scratch.
A greedy heuristic that prioritizes moves to positions with fewer onward options to avoid dead ends.
A mathematical measure of how different two distributions are, useful for comparing expert and agent behavior.
Training with imperfect or limited supervision signals, such as scarce labels, noisy annotations, or self-generated targets.
Testing distillation by using a weaker model as the teacher to see whether a stronger student still learns something meaningful.
Automatically browsing and collecting data from websites by following links across the internet.
Training data collected from publicly available internet sources, which provides broad but sometimes uneven coverage of topics.
An agent's capability to navigate websites, fill forms, click buttons, and extract information from live web pages.
The ability to search the internet in real-time during processing to retrieve current information rather than relying only on training data.
The capability for a model to query the internet in real-time during response generation, allowing it to access current information beyond its training data.
A specific quantization method that compresses both the model's stored weights and its intermediate calculations to 8-bit precision, significantly reducing memory and computation requirements.
A merging method that combines model weights by taking their average.
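A minimal sketch of that averaging step, assuming every model shares the same parameter names and shapes. Plain dictionaries of lists stand in for real checkpoints, and the parameter names are hypothetical.

```python
def average_weights(state_dicts):
    """Element-wise mean of parameters across models with identical keys and shapes."""
    count = len(state_dicts)
    return {
        key: [sum(values) / count for values in zip(*(sd[key] for sd in state_dicts))]
        for key in state_dicts[0]
    }

model_a = {"layer.weight": [0.2, -0.4], "layer.bias": [0.1]}
model_b = {"layer.weight": [0.6, 0.0], "layer.bias": [0.3]}
merged = average_weights([model_a, model_b])
print(merged)
```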
Grouping similar weight values together and replacing them with shared cluster centers to reduce model size.
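One way to sketch this is one-dimensional k-means over the weight values, which is an assumption here rather than the only possible clustering method. Each weight is then stored as a small index into a shared table of centers. The function name and the sample weights are hypothetical.

```python
def cluster_weights(weights, k, iterations=20):
    """1-D k-means: returns (centers, indices) so each weight becomes a small index."""
    lo, hi = min(weights), max(weights)
    # Spread the initial centers evenly over the weight range.
    if k > 1:
        centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    else:
        centers = [sum(weights) / len(weights)]

    def assign():
        return [min(range(k), key=lambda c: abs(w - centers[c])) for w in weights]

    for _ in range(iterations):
        indices = assign()
        for c in range(k):
            members = [w for w, i in zip(weights, indices) if i == c]
            if members:  # leave empty clusters where they are
                centers[c] = sum(members) / len(members)
    return centers, assign()

weights = [0.11, 0.09, 0.10, -0.52, -0.48, 0.90, 0.88]
centers, indices = cluster_weights(weights, k=3)
reconstructed = [centers[i] for i in indices]
print(centers, indices)
```

With three centers, each index fits in 2 bits, so storage shrinks to the indices plus the small table of centers.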
The process of directly modifying a trained model's internal parameters (weights) to change its behavior without retraining from scratch.
The process of using a neural network to produce parameters for another model rather than training those parameters directly.
A measure of how much a specific weight contributes to model predictions and performance.
The process of setting the starting values for a neural network's parameters before training begins.
The number of bits used to represent each numerical value in a model's weights; lower precision (like 4-bit) uses less memory but may reduce accuracy.
A specific type of quantization that compresses only the model's learned parameters (weights) while keeping other calculations at higher precision.
Using the same neural network parameters for multiple tasks to enable knowledge transfer and reduce model size.
The numerical parameters inside a neural network that determine how it processes input and generates output.
A system that converts high-level motion commands into executable joint trajectories for robots.
A high-resolution digital scan of an entire microscope slide used in computational pathology for disease diagnosis.
How optimizer behavior changes when you increase the number of neurons in each layer of a neural network.
A quantum version of the score function that describes how to reverse noise in quantum systems.
A retrieval criterion requiring the correct target to score strictly higher than all other candidates.
The total length of connections between components on a chip; shorter wirelength improves performance and power efficiency.
Determining which meaning of a word is intended in a specific context when a word has multiple meanings.
A step-by-step demonstration of how to solve a problem, used to help students learn problem-solving strategies.
Using an AI model to automatically handle repetitive business tasks and processes, reducing manual effort and improving efficiency.
The active, temporary knowledge an AI system uses for the current task, drawn from long-term memory.
Software that schedules and manages job submissions and resource allocation on shared computing clusters.
A model's learned understanding of facts, concepts, and relationships about the real world, typically acquired during training on diverse text data.
An AI system that learns to understand and predict how the physical world works from observations.
Predicting future states of the environment based on current observations and actions.
Internal representations learned by AI systems that capture how the physical world works, including how objects move and interact over time.
Evaluating system behavior on the most dangerous or consequential failures rather than average performance.
The ability of modules to update shared state, enabling bidirectional communication between tools.
A lossy compression technique where the decoder, but not the encoder, has access to correlated side information that helps reconstruction.
A metric that counts predictions as either completely right or completely wrong with no partial credit.
Solving a task without any training examples by using knowledge from related tasks or descriptions.
How well an AI model performs on new tasks it has never seen before, without any task-specific training.
Identifying previously unknown security vulnerabilities or attacks that have no existing defenses.
The maximum rate at which information can be transmitted over a noisy channel with exactly zero probability of error, rather than an error probability that merely approaches zero.
A gradient-free signal derived from comparing function values across different hyperparameter settings.
Training without paired examples of two modalities, using only single-modality data.
An agent that performs tasks without any external skill retrieval or runtime augmentation, relying only on its learned parameters.
A comparison model that makes predictions without being trained on the target task or domain.
A model's ability to handle new, unseen tasks or data without additional training on those specific examples.
Using a model to solve a task without any training examples for that specific task.
Making predictions on new tasks without any task-specific training or fine-tuning on labeled examples.
Creating new sounds that the model never encountered during training, using reference audio as a guide.
Performing a new task without any training examples, using only knowledge learned from other tasks or domains.
A security model that requires verification of every access request regardless of source, rather than trusting internal networks.
Predicting risk at the geographic area level rather than individual policy level, useful when detailed location data is unavailable.