Forum AI and the key question: Who decides what AI tells you?

AI evaluation – Campbell Brown’s Forum AI builds benchmarks and AI judges for high-stakes topics, aiming for human-level consistency on accuracy and bias.
A single, confident answer from an AI chatbot can feel like certainty. But the real fight, Campbell Brown argues, is over who sets the rules for what AI says in the first place.
Brown, once Meta’s first and only dedicated news chief, now runs Forum AI and is trying to prevent history from repeating itself as AI becomes a primary gateway to information. She discussed her approach recently with TechCrunch’s Tim Fernholz in San Francisco. In her view, the moment ChatGPT became publicly available marked a turning point: she says it quickly became clear that AI would act as a funnel for how people receive information, and that the early results were not good enough.
Forum AI focuses on evaluating foundation models on what Brown calls “high-stakes topics,” including geopolitics, mental health, finance, and hiring. These are areas with no easy yes-or-no answers, situations she describes as murky, nuanced, and complex, which means generic performance benchmarks may miss the kinds of failures that actually matter in the real world.
Her company’s method starts with assembling what she calls the world’s foremost experts to design benchmarks. Forum AI then trains AI judges to evaluate model outputs at scale. For its geopolitics work, Brown says she has brought in Niall Ferguson, Fareed Zakaria, former Secretary of State Tony Blinken, former House Speaker Kevin McCarthy, and Anne Neuberger, who led cybersecurity policy in the Biden administration. The target is for the AI judges to reach roughly 90% consensus with those human experts, a threshold Brown says Forum AI has been able to hit.
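Forum AI’s pipeline isn’t public, but the consensus target Brown describes corresponds to a standard measurement: label each model answer with both a panel of experts and an AI judge, then compute how often the judge matches the expert majority. Below is a minimal, hypothetical sketch of that agreement check in Python; the pass/fail rubric, the data layout, and the 90% bar are illustrative assumptions, not Forum AI’s implementation.

```python
from collections import Counter

def expert_majority(labels: list[str]) -> str:
    """Collapse several expert labels into one consensus label."""
    return Counter(labels).most_common(1)[0][0]

def judge_expert_agreement(items: list[dict]) -> float:
    """Fraction of items where the AI judge matches the expert majority.

    Each item is assumed (hypothetically) to look like:
      {"judge": "pass", "experts": ["pass", "pass", "fail"]}
    where labels come from an expert-designed rubric criterion,
    e.g. "acknowledges opposing perspectives".
    """
    matches = sum(
        1 for item in items
        if item["judge"] == expert_majority(item["experts"])
    )
    return matches / len(items)

# Toy evaluation set, fabricated purely for illustration.
evaluations = [
    {"judge": "pass", "experts": ["pass", "pass", "fail"]},
    {"judge": "fail", "experts": ["fail", "fail", "fail"]},
    {"judge": "pass", "experts": ["fail", "fail", "pass"]},
]

agreement = judge_expert_agreement(evaluations)
print(f"judge-expert consensus: {agreement:.0%}")  # 67% on this toy set
# A team targeting Brown's stated bar would only trust the AI judge
# to score outputs at scale once agreement reaches roughly 90%.
```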
But Brown’s concern is not only that AI makes mistakes; it’s that the industry has often prioritized the wrong goals. She argues that foundation model companies tend to focus heavily on coding and math, while the kinds of skills needed for trustworthy information are harder to measure. And she says “harder” is not a reason to stop trying when the stakes are high, especially for people who rely on AI to understand the world.
When Forum AI began testing leading models, Brown said the outcomes were not reassuring. She cited examples including Gemini pulling from Chinese Communist Party websites for stories she says had nothing to do with China, and she pointed to left-leaning political bias across nearly all models. She describes additional failure modes that are more subtle than outright misinformation: missing context, missing perspectives, and straw-manning arguments without acknowledging opposing considerations. “There’s a long way to go,” Brown said, while adding that she believes there are “easy fixes” that could improve results substantially.
Brown’s perspective is shaped by her earlier experience at Facebook, where she says she learned what happens when a platform optimizes for the wrong thing. She told Fernholz that the fact-checking effort she built at Facebook no longer exists, and she frames the lesson as broader than any one program: optimizing for engagement has been “lousy for society” and left many people less informed.
That history is part of why she sees a window of opportunity for AI to interrupt the engagement-first cycle. Brown says the future could go either way: companies might give users what they want, or they might “give people what’s real and what’s honest and what’s truthful.” She acknowledges that the ideal version of the goal, optimizing for truth, can sound naive, but she believes enterprise incentives may make the difference.
In her account, businesses that use AI for credit decisions, lending, insurance, and hiring have more to lose when systems are wrong, and that creates demand for performance tied to correctness and liability. Those needs, she argues, push the market toward models that are evaluated not just on speed or accuracy claims but on whether their outputs hold up under scrutiny.
Forum AI’s business bet aligns with that idea, though Brown says turning compliance interest into consistent revenue remains difficult. She argues that much of the market is still satisfied with “checkbox audits” and standardized benchmarks she considers inadequate. In her view, the real problem is that compliance efforts often fail to capture what goes wrong in practice.
Brown also criticizes the way evaluation works when domain knowledge is missing. She called the compliance landscape “a joke,” pointing to New York City’s first hiring bias law, which required AI audits; she said the state comptroller found that more than half of those audited had violations that went undetected. Real evaluation, she argues, requires domain expertise to work through not only known scenarios but also edge cases, situations that can create harm in ways people don’t anticipate. And because this kind of testing takes time, she says “smart generalists” are not enough.
She also emphasizes the gap between the AI industry’s self-image and what many users experience. Brown says big tech leaders often tell audiences that AI will change the world, eliminate work, or even cure cancer. But for ordinary users who ask chatbots basic questions, she says the reality can be “slop and wrong answers,” and she argues that this mismatch helps explain why trust in AI is so low.
In her view, consumer skepticism is not only widespread but often justified. She draws a contrast between the conversation she says is happening in Silicon Valley and the one happening among consumers, implying that the industry may be debating the wrong issues while everyday reliability, and the mechanisms deciding what AI tells people, remain unresolved.
Brown’s company, based in New York, was founded 17 months ago, and she traces its origin to a specific moment during her time at Meta, when ChatGPT was first released publicly. Forum AI raised $3 million last fall in a round led by Lerer Hippeau, as Brown tries to turn the question of evaluation into a practical system, one designed to make “high-stakes” answers measurable, accountable, and harder to get wrong.