AI’s glossary is the antidote to term overload

Artificial intelligence isn’t just changing products; it’s changing language. A regularly updated AI glossary breaks down terms like AGI, AI agents, LLMs, RLHF, and chain-of-thought, along with the hardware pressures behind them, turning jargon into plain, usable definitions.
Artificial intelligence is rewriting the world, and then quietly inventing a whole new language to explain what it’s doing.
Walk into a product meeting, sit through a pitch, or tune in to one of the AI panels happening every week, and you’ll hear acronyms land like door slams: LLMs, RAG, RLHF. Even smart people in the tech industry can feel a little unsteady once the conversation starts moving faster than their understanding.
This is the gap a living AI glossary is trying to close. It offers plain-English definitions of the AI terms readers are most likely to run into, whether they’re building with the technology, investing in it, or simply trying to keep up by reading AI coverage or listening to related podcasts. The glossary is updated regularly as the field evolves.
At the top of that translation effort is AGI, a term that remains slippery even among experts. It generally refers to AI that’s more capable than the average human at many, if not most, tasks. OpenAI CEO Sam Altman once described AGI as the “equivalent of a median human that you could hire as a co-worker.” OpenAI’s charter defines AGI as “highly autonomous systems that outperform humans at most economically valuable work.” Google DeepMind’s framing differs slightly: the lab views AGI as “AI that’s at least as capable as humans at most cognitive tasks.”
In other words, the same letters can point to different thresholds—so confusion isn’t a sign of ignorance. It’s part of the terminology itself.
Another term that travels fast, but often means different things to different people, is “AI agent.” The glossary describes an agent as a tool that uses AI technologies to perform a series of tasks on your behalf, beyond what a more basic AI chatbot could do. It gives everyday examples like filing expenses, booking tickets or a table at a restaurant, and even writing and maintaining code. It notes that the space is still messy, with many moving pieces and infrastructure still being built to deliver on the capabilities people imagine. The basic idea, though, is an autonomous system that may draw on multiple AI systems to carry out multistep tasks.
There’s also the practical layer beneath agents: API endpoints. Think of them as “buttons” on the back of a piece of software that other programs can press to make it do things. Developers use these interfaces to build integrations: for example, allowing one application to pull data from another, or enabling an AI agent to control third-party services directly without a human manually operating each interface. The glossary adds that most smart home devices and connected platforms have these hidden buttons available, even if ordinary users never interact with them. And as AI agents grow more capable, they can increasingly find and use these endpoints on their own.
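To make the “button” metaphor concrete, here is a minimal sketch using only Python’s standard library. A toy smart-light service exposes hypothetical `/lights/on` and `/lights/off` endpoints, and a separate client call presses one over HTTP, the way an agent might. The service, routes, and JSON response shape are invented for illustration; real devices publish their own documented APIs.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical device state for a toy smart light.
STATE = {"lights": "off"}


class LightHandler(BaseHTTPRequestHandler):
    """Serves two illustrative endpoints: GET /lights/on and GET /lights/off."""

    def do_GET(self):
        if self.path in ("/lights/on", "/lights/off"):
            STATE["lights"] = self.path.rsplit("/", 1)[-1]  # "on" or "off"
            body = json.dumps(STATE).encode()
            self.send_response(200)
        else:
            body = json.dumps({"error": "unknown endpoint"}).encode()
            self.send_response(404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def press_button(path: str) -> dict:
    """Start the toy service, call one endpoint as another program would, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), LightHandler)  # port 0 = any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        with urllib.request.urlopen(f"http://127.0.0.1:{port}{path}") as resp:
            return json.loads(resp.read())
    finally:
        server.shutdown()
        server.server_close()


print(press_button("/lights/on"))  # {'lights': 'on'}
```

The client never touches the light’s internals; it only knows the route, which is what lets an agent drive services it did not build.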
For readers who’ve heard the phrase chain-of-thought reasoning but never felt confident about what it meant, the glossary draws a clear line. In an AI context, it refers to breaking a problem into smaller, intermediate steps to improve the quality of the end result. It usually takes longer to get an answer, but the answer is more likely to be correct, especially in logic or coding contexts. Reasoning models, the glossary says, are developed from traditional large language models and optimized for chain-of-thought thinking through reinforcement learning.
That “step by step” theme also shows up in coding agents. The glossary positions a coding agent as a specialized version of an AI agent applied to software development. Instead of just suggesting code for a human to review and paste in, it can write, test, and debug code autonomously, handling the iterative trial-and-error work that typically consumes a developer’s day. It can operate across entire codebases, spot bugs, run tests, and push fixes with minimal human oversight. The comparison offered is simple: it’s like hiring a fast intern who never sleeps and never loses focus, with the familiar caveat that a human still needs to review the work.
Then there’s the infrastructure story inside the jargon. Compute is described as the vital computational power that allows AI models to operate, fueling the AI industry by enabling both training and deployment. It’s often used as shorthand for the hardware providing that power: GPUs, CPUs, TPUs, and other forms of infrastructure considered the bedrock of modern AI.
The glossary also explains deep learning as a subset of self-improving machine learning built on a multi-layered artificial neural network (ANN) structure, inspired by the interconnected pathways of neurons in the human brain. It says deep learning systems can identify important characteristics in data themselves rather than relying on human engineers to define features. They can learn from errors and improve their own outputs through repetition and adjustment. The trade-offs are blunt: they require lots of data points (millions or more) and typically take longer to train than simpler machine learning algorithms, pushing development costs higher.
Some generative AI terms are less about model architecture and more about how the output is made. Diffusion, inspired by physics, is described as the tech behind many art-, music-, and text-generating AI models. Diffusion systems slowly “destroy” the structure of data by adding noise until there’s nothing left. In physics, diffusion is spontaneous and irreversible; the AI approach, the glossary says, is to learn a sort of “reverse diffusion” process that restores the destroyed data by recovering it from noise.
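The “destroy with noise” half has a well-known closed form in DDPM-style diffusion models: after t steps, a clean value x0 becomes sqrt(ᾱ_t)·x0 + sqrt(1 − ᾱ_t)·ε, where ε is Gaussian noise and ᾱ_t shrinks toward zero as t grows. The sketch below implements only that forward process with an illustrative linear noise schedule; the learned network that runs the process in reverse, by predicting and removing noise step by step, is deliberately left out.

```python
import math
import random


def linear_beta_schedule(steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> list[float]:
    """Per-step noise amounts, increasing linearly (values are illustrative)."""
    return [beta_start + (beta_end - beta_start) * t / (steps - 1) for t in range(steps)]


def alpha_bars(betas: list[float]) -> list[float]:
    """Cumulative product of (1 - beta): how much of the original signal survives by each step."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def noise_sample(x0: list[float], t: int, abar: list[float], rng: random.Random) -> list[float]:
    """Forward diffusion: jump straight to step t by mixing the data with Gaussian noise."""
    signal = math.sqrt(abar[t])
    noise = math.sqrt(1.0 - abar[t])
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]


abar = alpha_bars(linear_beta_schedule(1000))
rng = random.Random(0)
x0 = [1.0, -0.5, 0.25]  # a tiny stand-in for an image or audio clip
print(f"signal kept at t=0: {math.sqrt(abar[0]):.4f}, at t=999: {math.sqrt(abar[-1]):.4f}")
print(noise_sample(x0, 999, abar, rng))  # almost pure noise by the last step
```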
Distillation, meanwhile, is presented as a technique using a ‘teacher-student’ setup. Developers send requests to a teacher model, record the outputs, and sometimes compare them with a dataset to check accuracy. Then a student model is trained to approximate the teacher’s behavior. The glossary adds that distillation can be used to create a smaller, more efficient model from a larger one with minimal distillation loss, citing OpenAI’s development of GPT-4 Turbo, “a faster version of GPT-4,” as an example. It also warns that while AI companies use distillation internally, distilling from a competitor usually violates the terms of service of AI APIs and chat assistants.
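The glossary doesn’t specify how the student is scored against the teacher. One widely used choice, sketched below under that assumption, softens both models’ output probabilities with a temperature and measures the KL divergence between them, scaled by the temperature squared as is customary. The logits are made-up scores for three possible answers.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Turn raw scores into probabilities; a higher temperature makes them softer."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: list[float], student_logits: list[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened outputs, scaled by T^2.

    Zero when the student reproduces the teacher exactly; grows as they diverge.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


teacher = [4.0, 1.0, 0.2]        # hypothetical teacher scores
close_student = [3.8, 1.1, 0.3]  # a student that mimics the teacher well
far_student = [0.1, 3.0, 2.5]    # a student that disagrees
print(distillation_loss(teacher, close_student) < distillation_loss(teacher, far_student))  # True
```

Training would adjust the student’s weights to push this number down, which is what “approximate the teacher’s behavior” means in practice.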
Fine-tuning gets its own place in the glossary as further training to optimize performance for a specific task or area. Many startups, it says, start with large language models and then “amp up utility” for a target sector by supplementing earlier training cycles with fine-tuning based on domain-specific knowledge.
Not all generative breakthroughs are treated equally. The glossary defines GANs (generative adversarial networks) as machine learning frameworks underpinning some generative AI developments that produce realistic data, including deepfake tools. It describes the contest between a generator and a discriminator: the generator tries to produce outputs that fool the discriminator, while the discriminator tries to detect artificially generated data. This structured back-and-forth can optimize outputs to be more realistic without additional human intervention. But the glossary also cautions that GANs work best for narrower applications, like producing realistic photos or videos, rather than general-purpose AI.
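The generator-versus-discriminator contest is usually written as two opposing losses. The sketch below computes them from the discriminator’s probability estimates (“how likely is this sample real?”), assuming the common binary cross-entropy form with the non-saturating generator loss. The networks and training loop are omitted, and the probabilities are invented numbers.

```python
import math


def bce(prob: float, target: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one prediction (prob = estimated chance the sample is real)."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def discriminator_loss(d_real: list[float], d_fake: list[float]) -> float:
    """The discriminator wants real samples scored near 1 and generated ones near 0."""
    losses = [bce(p, 1.0) for p in d_real] + [bce(p, 0.0) for p in d_fake]
    return sum(losses) / len(losses)


def generator_loss(d_fake: list[float]) -> float:
    """Non-saturating form: the generator wants its fakes scored as real (near 1)."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)


# Early in training: the discriminator easily spots fakes, so the generator's loss is high.
print(discriminator_loss([0.9, 0.95], [0.1, 0.05]), generator_loss([0.1, 0.05]))
# Later: fakes fool the discriminator more often, so the losses shift the other way.
print(discriminator_loss([0.6, 0.55], [0.5, 0.45]), generator_loss([0.5, 0.45]))
```

Each side trains to lower its own loss, and because the losses pull in opposite directions, that tug-of-war is what drives outputs toward realism without extra human labeling.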
The glossary doesn’t soften the field’s most damning word: hallucination. It defines hallucination as the AI industry’s term for models making stuff up, that is, generating incorrect information. It calls this a huge problem for AI quality and warns that hallucinations can mislead and lead to real-life risks, including dangerous consequences like harmful medical advice from a health query that returns incorrect information.
It ties the problem to gaps in training data, and links that to a push toward increasingly specialized and/or vertical AI models—domain-specific AIs with narrower expertise—described as a way to reduce the likelihood of knowledge gaps and shrink disinformation risks.
Inference, the glossary says, is the process of running an AI model, setting a model loose to make predictions or draw conclusions from previously seen data. It adds a key constraint: inference can’t happen without training. The hardware picture is spelled out too. Inference can be performed by smartphone processors, beefy GPUs, or custom-designed AI accelerators. But the glossary notes that not all hardware can run models equally well: very large models would take ages on a laptop compared with a cloud server with high-end AI chips.
Then come large language models, the umbrella most people actually interact with. LLMs are described as the AI models used by AI assistants such as ChatGPT, Claude, Google’s Gemini, Meta’s Llama, Microsoft Copilot, and Mistral’s Le Chat. When you chat with an AI assistant, you interact with a large language model that processes your request directly or with help from available tools like web browsing or code interpreters.
LLMs are described as deep neural networks made of billions of numerical parameters (or weights) that learn relationships between words and phrases and create a representation of language, a sort of multidimensional map of words. They’re created by encoding patterns found in billions of books, articles, and transcripts. When you prompt an LLM, it generates the most likely pattern that fits the prompt.
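To make “the most likely pattern that fits the prompt” concrete, here is a deliberately tiny stand-in: a word-level bigram model that counts which word follows which, then extends a prompt by repeatedly appending the most frequent next word. Real LLMs learn billions of weights over subword tokens and usually sample rather than always taking the top choice; this sketch only illustrates the predict-append-repeat loop.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus: str) -> dict[str, Counter]:
    """Count which word follows which: a vastly simplified stand-in for learned weights."""
    words = corpus.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def continue_prompt(model: dict[str, Counter], prompt: str, max_words: int = 5) -> str:
    """Greedy decoding: repeatedly append the single most likely next word."""
    words = prompt.lower().split()
    for _ in range(max_words):
        options = model.get(words[-1])
        if not options:
            break  # this word was never followed by anything in the corpus
        words.append(options.most_common(1)[0][0])
    return " ".join(words)


corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."
model = train_bigrams(corpus)
print(continue_prompt(model, "the dog", max_words=4))  # the dog sat on the cat
```

The output is fluent-looking but not true to the corpus, which is a small-scale echo of why likely-sounding text and correct text are not the same thing.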
The glossary then zooms in on what happens while the model is responding. Memory cache is described as an optimization technique designed to make inference more efficient by saving computations for future user queries and operations. It notes different kinds of caching, with KV (key-value) caching highlighted for transformer-based models. KV caching boosts efficiency and speeds up results by reducing the time, and the algorithmic labor, needed to generate answers.
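Here is a minimal sketch of why KV caching saves work, assuming a single attention head with random weights and pure-Python math (no layers, no batching). When a transformer generates text one token at a time, each new token attends over the keys and values of every earlier token. Without a cache, those are recomputed for the whole prefix at every step; with a cache, only the newest token is projected. The loop checks that both paths give the same output.

```python
import math
import random


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def attend(q: list[float], keys: list[list[float]], values: list[list[float]]) -> list[float]:
    """Scaled dot-product attention of one query over all stored positions."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(len(values[0]))]


class CachedAttention:
    """One attention head that stores past keys/values instead of recomputing them."""

    def __init__(self, wq, wk, wv):
        self.wq, self.wk, self.wv = wq, wk, wv
        self.k_cache: list[list[float]] = []
        self.v_cache: list[list[float]] = []
        self.projections = 0  # tokens whose key/value projections we have computed

    def step(self, x: list[float]) -> list[float]:
        # Only the newest token is projected; earlier keys/values come from the cache.
        self.k_cache.append(matvec(self.wk, x))
        self.v_cache.append(matvec(self.wv, x))
        self.projections += 1
        return attend(matvec(self.wq, x), self.k_cache, self.v_cache)


def uncached_step(wq, wk, wv, tokens: list[list[float]]) -> list[float]:
    """Without a cache, every step re-projects keys/values for the entire prefix."""
    keys = [matvec(wk, t) for t in tokens]
    values = [matvec(wv, t) for t in tokens]
    return attend(matvec(wq, tokens[-1]), keys, values)


rng = random.Random(42)
d = 4
wq, wk, wv = ([[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)] for _ in range(3))
tokens = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(6)]

head = CachedAttention(wq, wk, wv)
for t in range(len(tokens)):
    cached = head.step(tokens[t])
    full = uncached_step(wq, wk, wv, tokens[: t + 1])
    assert all(abs(a - b) < 1e-9 for a, b in zip(cached, full))  # same answer either way

print("key/value projections with cache:", head.projections)  # 6
print("without cache:", sum(range(1, len(tokens) + 1)))        # 21
```

The gap grows quadratically with output length, which is why caching matters most for long responses; the trade-off is the memory needed to hold the cache.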
For the connectivity layer, there’s Model Context Protocol, or MCP. MCP is described as an open standard that lets AI models connect to outside tools and data (files, databases, or apps like Slack and Google Drive) without a developer building a custom connector for every pairing. It’s framed as a USB-C port for AI. The glossary says Anthropic introduced MCP in 2024 and later handed it over to the Linux Foundation, then notes it has been adopted by OpenAI, Google, and Microsoft, making it one of the fastest-spreading standards in recent AI history.
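Under the hood, MCP messages are JSON-RPC 2.0. As we understand the published specification, a client discovers a server’s tools with a `tools/list` request and invokes one with `tools/call`, passing the tool’s `name` and an `arguments` object. The sketch below only builds and parses such messages with Python’s `json` module. The `search_files` tool, its arguments, and the sample reply are hypothetical, and a real client would send these messages over a transport such as stdio or HTTP, typically via an MCP SDK.

```python
import json
from itertools import count
from typing import Optional

_request_ids = count(1)  # JSON-RPC requests carry an id so replies can be matched up


def rpc_request(method: str, params: Optional[dict] = None) -> str:
    """Build a JSON-RPC 2.0 request string."""
    msg = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)


# Ask what the server offers, then call one tool (tool name and arguments are hypothetical).
list_tools = rpc_request("tools/list")
call_tool = rpc_request(
    "tools/call",
    {"name": "search_files", "arguments": {"query": "Q3 budget"}},
)
print(list_tools)
print(call_tool)

# An illustrative reply: tool results come back as a list of typed content items.
reply = json.loads(
    '{"jsonrpc": "2.0", "id": 2, "result": '
    '{"content": [{"type": "text", "text": "Found 3 files"}], "isError": false}}'
)
texts = [item["text"] for item in reply["result"]["content"] if item["type"] == "text"]
print(texts)  # ['Found 3 files']
```

Because every server speaks this same message shape, one client can talk to a file server, a database server, or a chat-app server without a bespoke connector for each.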
Other terms push readers into the mechanics behind speed and cost. Mixture of Experts is explained as splitting a neural network into many smaller, specialized sub-networks (“experts”) and activating only a handful for any given task. The idea is that a router picks the right specialists for the job instead of sending every request through the entire model. The glossary says this makes it possible to build enormous models that stay relatively fast and cheap to run, because only a fraction of the network does work at any one time. It cites Mistral AI’s Mixtral as a well-known example and says OpenAI’s newer GPT models are widely believed to use some version of the approach, while noting OpenAI has never officially confirmed it.
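A minimal top-k routing sketch follows. Mixtral, for instance, routes each token to 2 of its 8 experts, and the same idea is shown here at toy scale. The router scores every expert, only the k best actually run, and their outputs are blended using a softmax over the chosen scores. The experts here are trivial hypothetical functions rather than neural networks, and the router weights are made up.

```python
import math


def top_k_moe(x: list[float], router: list[list[float]], experts, k: int = 2):
    """Route input x to the k highest-scoring experts and mix their outputs.

    router: one weight vector per expert; its dot product with x is that expert's score.
    Returns the combined output and the indices of the experts that actually ran.
    """
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # A softmax over only the chosen experts' scores gives the mixing weights.
    m = max(scores[i] for i in chosen)
    exps = {i: math.exp(scores[i] - m) for i in chosen}
    total = sum(exps.values())
    outputs = {i: experts[i](x) for i in chosen}  # unchosen experts never run
    combined = [sum(exps[i] / total * outputs[i][j] for i in chosen) for j in range(len(x))]
    return combined, chosen


# Four toy "experts" (hypothetical): each is just a different elementwise function.
experts = [
    lambda v: [2 * a for a in v],
    lambda v: [a + 1 for a in v],
    lambda v: [-a for a in v],
    lambda v: [a * a for a in v],
]
router = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0], [0.5, 0.5]]

out, used = top_k_moe([3.0, 1.0], router, experts, k=2)
print(used)  # [0, 3]: only 2 of the 4 experts did any work
print(out)
```

The cost saving comes from the `outputs` line: compute scales with k, not with the total number of experts, even though every expert’s parameters still have to be stored.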
Parallelization gets described as doing many things at the same time rather than sequentially. In AI, it’s fundamental to both training and inference because modern GPUs perform thousands of calculations in parallel, one reason GPUs became a hardware backbone. As models grow larger and more complex, the ability to parallelize across chips and machines is framed as a major factor in how quickly and cost-effectively models can be built and deployed. The glossary adds that research into better parallelization strategies is now a field in its own right.
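The most common pattern, data parallelism, splits a batch into shards, processes them simultaneously, and stitches the results back together in order. The sketch below shows that shape with Python’s `concurrent.futures`; squaring numbers stands in for real per-example work such as a forward pass. One caveat: in CPython, a thread pool will not speed up pure-Python CPU-bound code because of the global interpreter lock, so this illustrates the structure only. Real AI workloads shard across GPUs or machines.

```python
from concurrent.futures import ThreadPoolExecutor


def process_shard(shard: list[int]) -> list[int]:
    """Stand-in for per-example work (e.g. a forward pass); here we just square numbers."""
    return [x * x for x in shard]


def split(items: list[int], n: int) -> list[list[int]]:
    """Split a batch into at most n roughly equal shards, one per worker."""
    size = -(-len(items) // n)  # ceiling division
    return [items[i : i + size] for i in range(0, len(items), size)]


batch = list(range(10))
shards = split(batch, 4)

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() returns results in shard order, so they can be concatenated safely.
    parallel = [y for part in pool.map(process_shard, shards) for y in part]

sequential = process_shard(batch)
print(shards)                  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
print(parallel == sequential)  # True
```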
Even the hardware shortages have a term in the glossary. RAMageddon is described as a growing shortage of random access memory (RAM), the chips that power pretty much every everyday tech product. As the AI industry blossomed, the glossary says, major tech companies and AI labs bought so much RAM for data centers that there wasn’t much left for everyone else. That supply bottleneck, it adds, is making the remaining RAM increasingly expensive. It connects the shortage to gaming, consumer electronics, and general enterprise computing, including the idea that gaming companies have raised prices on consoles because it’s harder to find memory chips for devices. It also says consumer electronics could see the biggest dip in smartphone shipments in more than a decade. The glossary ends the RAMageddon section with the bleak note that prices are only expected to stop rising once the shortage ends, with “not really much of a sign” it will happen anytime soon.
A few definitions circle the longer-term fears people attach to AI’s trajectory. Recursive self-improvement (RSI) is described as a threshold for how smart AI can get and how little it may rely on humans. In some tellings, models improve themselves without human intervention, accelerating their own capabilities and autonomy. The glossary says this is sometimes framed as a cataclysmic moment akin to the singularity, when AI models become immune to outside intervention, but it also says RSI can describe a more basic capability: whether an AI model can design its own successor. It notes that several startups aim to build recursively self-improving models, but most dismiss the apocalyptic implications and frame RSI as simply the next frontier for research.
Reinforcement learning rounds out the training vocabulary. It’s described as training in which a system learns by trying things and receiving rewards for correct answers, much like training a pet with treats, except the “treat” is a mathematical signal indicating success. Unlike supervised learning, the glossary says, reinforcement learning lets a model explore its environment, take actions, and update its behavior based on feedback. It notes reinforcement learning’s power for training AI to play games, control robots, and, more recently, sharpen reasoning ability in large language models. It also places RLHF (reinforcement learning from human feedback) at the center of how leading labs fine-tune models to be more helpful, accurate, and safe.
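The explore, act, and learn-from-reward loop can be seen in its simplest setting, a multi-armed bandit. The sketch below uses an epsilon-greedy agent: it usually picks the option it currently believes is best, occasionally tries a random one, and updates its estimates from the reward “treat.” The payout probabilities are invented, and this is far simpler than RLHF on language models, which involves states, a learned reward model, and much larger policies.

```python
import random


def run_bandit(true_means: list[float], steps: int = 2000, epsilon: float = 0.1,
               seed: int = 0) -> list[float]:
    """Epsilon-greedy agent on a multi-armed bandit.

    Arm i pays a reward of 1 with probability true_means[i], which the agent does not know.
    It mostly exploits its best current estimate but explores at random with probability
    epsilon, updating its estimates from the reward signal.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)
    pulls = [0] * len(true_means)
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_means))  # explore
        else:
            arm = max(range(len(true_means)), key=lambda i: estimates[i])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0  # the "treat"
        pulls[arm] += 1
        # Incremental average: nudge the estimate toward the observed reward.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
    return estimates


est = run_bandit([0.2, 0.5, 0.8])
print([round(e, 2) for e in est])
print("best arm:", est.index(max(est)))
```

The `epsilon` parameter captures the core trade-off: too little exploration and the agent can lock onto a mediocre option, too much and it wastes effort on options it already knows are poor.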
Finally, the glossary addresses the terms that most users never see but always pay for. Tokens are described as the basic building blocks of human-AI communication, representing discrete segments of data processed or produced by an LLM. They’re created through tokenization, which breaks raw text into bite-sized units a language model can digest, a process the glossary compares to a compiler translating human language into binary. In enterprise settings, tokens also determine cost, because most AI companies charge on a per-token basis.
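Production tokenizers use learned subword schemes such as byte-pair encoding, so the crude regex split below will not match any vendor’s token counts. It only illustrates text becoming discrete units, and how per-token pricing turns those units into a bill. Many providers price input and output tokens differently; the $3 and $15 per-million rates here are placeholders, not any provider’s actual prices.

```python
import re


def toy_tokenize(text: str) -> list[str]:
    """Split text into words and punctuation marks (a rough stand-in for a real tokenizer)."""
    return re.findall(r"\w+|[^\w\s]", text)


def estimate_cost(input_tokens: int, output_tokens: int,
                  usd_per_million_in: float, usd_per_million_out: float) -> float:
    """Per-token billing: each direction is charged at its own rate per million tokens."""
    return (input_tokens * usd_per_million_in + output_tokens * usd_per_million_out) / 1_000_000


prompt = "Summarize this contract, please!"
tokens = toy_tokenize(prompt)
print(tokens)  # ['Summarize', 'this', 'contract', ',', 'please', '!']

# Hypothetical monthly usage: 40M input and 8M output tokens at placeholder rates.
print(estimate_cost(40_000_000, 8_000_000, 3.0, 15.0))  # 240.0
```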
Throughput is described as how much can be processed in a given period. Token throughput, specifically, is framed as how much AI work a system can handle at once, which is key to serving many users simultaneously and responding quickly. The glossary cites AI researcher Andrej Karpathy describing the anxiety he feels when his AI subscriptions sit idle, likening it to the feeling he had as a grad student when expensive hardware wasn’t being fully utilized. The glossary uses that sentiment to capture why maximizing token throughput has become an obsession.
Training closes the loop on the vocabulary: it’s the process of feeding data into a model so it can learn patterns and generate useful outputs. The glossary says training is expensive because it requires lots of inputs and those volumes have been trending upward, adding that hybrid approaches, like fine-tuning a rules-based AI with targeted data, can help manage costs without starting entirely from scratch.
It then defines transfer learning as using a previously trained AI model as the starting point for a new but related task, reapplying knowledge gained in earlier training cycles. It says this can drive efficiency savings and help when data for the task is limited, but it also warns of limitations: models relying on transfer learning for generalized capabilities will likely require additional training data to perform well in their domain of focus.
Validation loss is presented as a metric that tells how well a model is learning during training, with lower being better. It’s described as a real-time report card used to decide when to stop training, adjust hyperparameters, or investigate a potential problem, especially overfitting, where a model memorizes training data rather than learning patterns that generalize. The glossary compares this to the difference between a student who genuinely understands the material and one who memorized last year’s exam.
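Using validation loss to decide when to stop is commonly done with “early stopping.” The sketch below assumes the loss is recorded once per epoch: it remembers the best epoch and halts once the loss has failed to improve for a set number of epochs (the “patience”). The loss curve is made up to show the typical overfitting shape, falling and then creeping back up.

```python
def early_stopping(val_losses: list[float], patience: int = 3, min_delta: float = 0.0):
    """Return (best_epoch, stop_epoch) given validation loss recorded after each epoch.

    Training stops once the loss has failed to beat the best value by more than
    min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


# Validation loss falls, then rises as the model starts memorizing (illustrative numbers).
history = [0.92, 0.71, 0.58, 0.52, 0.50, 0.51, 0.53, 0.56, 0.60]
print(early_stopping(history))  # (4, 7)
```

In practice, the weights saved at the best epoch (here epoch 4) are the ones kept, not the final ones.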
Weights are defined as core to training, determining how much importance is given to different features or input variables used during training and thereby shaping the model’s output. They’re described as numerical parameters applied to inputs by multiplication. Training typically begins with randomly assigned weights, which then adjust as the model seeks to match target outputs. A housing price example is given: a model trained on historical real estate data for a target location could include weights for features like the number of bedrooms and bathrooms, whether a property is detached or semi-detached, whether it has parking, and whether it has a garage.
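The housing example maps naturally onto a minimal linear-regression sketch. Each listing’s features are multiplied by weights and summed, the weights start out random, and plain gradient descent adjusts them toward the target prices. The feature set mirrors the glossary’s example, but the listings, the “true” weights used to generate prices, and the learning rate are all invented; real price models are rarely this clean or this linear.

```python
import random

# Hypothetical listings: [bedrooms, bathrooms, has_parking, has_garage, is_detached].
# Prices (in $1000s) are synthetic, generated from made-up "true" weights for illustration.
TRUE_WEIGHTS = [40.0, 25.0, 10.0, 15.0, 30.0]
BASE_PRICE = 100.0

rng = random.Random(7)
houses = [
    [rng.randint(1, 5), rng.randint(1, 3), rng.randint(0, 1), rng.randint(0, 1), rng.randint(0, 1)]
    for _ in range(50)
]
prices = [BASE_PRICE + sum(w * f for w, f in zip(TRUE_WEIGHTS, h)) for h in houses]


def predict(weights: list[float], bias: float, features: list[int]) -> float:
    """Each input is multiplied by its weight; the products are summed with a bias term."""
    return bias + sum(w * f for w, f in zip(weights, features))


def mse(weights: list[float], bias: float) -> float:
    """Mean squared error of the predictions over all listings."""
    return sum((predict(weights, bias, h) - p) ** 2 for h, p in zip(houses, prices)) / len(houses)


# Training starts from randomly assigned weights...
weights = [rng.uniform(-1.0, 1.0) for _ in TRUE_WEIGHTS]
bias = 0.0
initial_loss = mse(weights, bias)

# ...and gradient descent repeatedly nudges each weight to reduce the error.
learning_rate = 0.02
n = len(houses)
for _ in range(3000):
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for h, p in zip(houses, prices):
        err = predict(weights, bias, h) - p
        for i, f in enumerate(h):
            grad_w[i] += 2.0 * err * f / n
        grad_b += 2.0 * err / n
    weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
    bias -= learning_rate * grad_b

print(f"training loss: {initial_loss:.1f} -> {mse(weights, bias):.4f}")
print("learned weights:", [round(w, 1) for w in weights], "bias:", round(bias, 1))
```

After training, each learned weight reads as “how much this feature adds to the price,” which is the sense in which weights decide how much importance each input gets.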
Across all these terms, the purpose stays steady: if AI is building products and redefining how machines think, the least it can do is stop leaving readers to guess what people mean when they say the words. This glossary is designed to make that guessing unnecessary, because the world is moving too fast to lose time to jargon.