Seven models. Same prompt. Same public data. Some calls that held up. One that needs an asterisk. And a surprise from the Governor’s race that the framework got partly right — and partly wrong.

Election week is the worst time to pretend AI is an oracle.

Most California races are still being counted, and the state won’t certify results for weeks. That’s exactly why this is the right moment to talk about questions, not margins — and about methodology, not magic.

For this project, our team asked seven leading AI models to analyze three high-profile California contests:

Los Angeles Mayor
California Governor
LA County Measure ER

Each model received the same prompt, the same public online signals, and the same analytical framework: Discoverability, Credibility, and Momentum.

Where B3 Fits In

This framework is part of the same diagnostic B3 Media Solutions applies when a brand hires us to audit its competitive visibility. We’ve used it for brands ranging from consumer startups to Fortune 500 teams, and the questions it’s designed to answer are always a variation of these two:

“Why is our competitor always in the conversation and we’re not?”
“We’re spending heavily. Why doesn’t it feel like we have momentum?”

For this experiment, our team swapped “customer” for “voter.” The logic is identical: before you can win someone’s vote or their business, they have to be able to find you, trust you, and feel like momentum is on your side.

The election was the case study. The product is the diagnostic.

The Framework: Discoverability, Credibility, Momentum

Before touching any predictions, we locked in a clear, brand-style structure to govern how each model was asked to reason.

1. Discoverability

Can a normal voter — or customer — easily encounter this candidate, measure, or brand without already knowing the name?

2. Credibility

When people do find them, does what they see inspire trust? Does this feel like a serious, plausible choice?

3. Momentum

Where is the energy moving right now, and is it the kind of momentum that converts?

We asked every model to reason inside that structure. Same races, same time window, same inputs. The interesting part isn’t just whether they got it right. It’s how they traded off those three lenses when the signals disagreed. We’ve written about the full methodology separately; what matters here is how the models applied it under pressure.

One wrinkle we had to confront: in 2026, the ‘public signals’ these models read are no longer neutral or organic. Platforms like Reddit are now explicit targets for brands and campaigns trying to steer AI search results and chat answers, which is one reason we separate Discoverability, Credibility, and Momentum instead of just asking ‘what’s popular online.’

What We Can Already Grade: The “Obvious” Questions

Even with partial counts, a few questions are already safe to assess.

Question 1: Who was the most likely first-place finisher for LA Mayor?

On this, all seven models agreed: Karen Bass.

Bass scored highest across models on Discoverability (incumbent, saturation media presence, official channels) and Credibility (sitting mayor, established coalitions).
Momentum was noisy (Pratt and Raman generated more online spikes), but the models largely treated that as volatility, not a credible threat to Bass’s first-place floor.

Early returns and projections support that call. Bass has secured her spot in the November runoff.

Provisional grade: “Did the models correctly identify Bass as the most likely first-place finisher?” So far, yes.

What it says about the framework: When Discoverability and Credibility both point hard in one direction, the models are willing to treat short-term Momentum spikes as noise rather than destiny. That’s a meaningful calibration for a brand to understand and one that tracks with how strong institutional brands behave in competitive markets.

Question 2: Would Becerra advance to the November runoff?

On this, too, all seven models converged: yes. Each one had Xavier Becerra advancing, and each one had him finishing first.

Becerra dominated on Discoverability (search volume, news coverage, institutional footprint) and Credibility (former California Attorney General, former U.S. HHS Secretary, deep Democratic Party infrastructure).
Momentum was more evenly distributed as Steve Hilton and Tom Steyer both had real pockets of intensity, but the models treated those as ceilings, not floors.

Unofficial statewide results (100% reporting) confirm Becerra is advancing to the November runoff. However, they also surface an important nuance: Steve Hilton is leading the raw vote at 27.6% to Becerra’s 25.5%. Every model predicted Becerra would finish first. In reality, the models correctly identified him as a runoff lock — they just had the order inverted. The models weren’t hallucinating a favorite. They were reading a genuine structural advantage.

Provisional grade: “Did the models correctly identify Becerra as a runoff advancer?” Yes — confirmed by unofficial results. “Did they correctly predict he would finish first?” No. Hilton leads the raw vote at 27.6%; Becerra is second at 25.5%.

What it says about the framework: The models correctly read Becerra’s structural floor. His strong institutional Credibility and durable Discoverability made him a reliable runoff call. What they missed was Hilton’s ceiling. The models treated Hilton’s Momentum as a lane-consolidation story with limited upside; unofficial results suggest that lane ran deeper than the public signal data indicated.

What We Cannot Grade Yet: The Second-Place Knife Fights

If the first-place calls were a test of floor, the second-place calls are a test of edge. That’s where model behavior gets genuinely interesting.

Updated as of June 7, 2026

As of June 7, with roughly 70% of expected votes counted, unofficial statewide results have Becerra and Hilton occupying the top two spots with Steyer in third, which is directionally consistent with the models that favored a Becerra + Hilton runoff while keeping the exact order provisional.

Question 3: Who makes the LA Mayor runoff with Bass — Raman or Pratt?

This is the question everyone is watching, and the models all saw the same messy picture:

Discoverability: Bass > Raman ≈ Pratt
Credibility: Bass > Raman > Pratt
Momentum: Pratt’s virality versus Raman’s coalition strength

Six models said Bass + Raman. One model said Bass + Pratt. Each camp had a coherent internal argument. In our coding, the Bass + Raman forecasts consistently pointed to Raman’s existing role as a councilmember, her issue‑focused platform on housing and homelessness, and her ties to renter and progressive constituencies as reasons she might be better aligned with the kind of Angelenos who actually show up for a municipal primary. To be precise, this is a statement about how our models interpreted public signals around the race, not a direct claim about the electorate itself.

In our analysis, the lone Bass + Pratt model leaned into Momentum and a “don’t underrate outsider anger” thesis: if turnout skews toward disengaged but angry voters, online virality might translate into ballots more directly than traditional models expect. What’s interesting is that a May UC Berkeley Institute of Governmental Studies / LA Times poll, taken before the primary, already showed deep dissatisfaction: in Bass–Raman and Bass–Pratt head‑to‑head runoff scenarios, a sizable chunk of likely voters said they would choose ‘neither’ or would not vote at all.

In our own sentiment scan of social and comment‑thread conversations, we saw the same pattern: people who don’t want an incumbent they blame for the status quo, but who are also uneasy about handing the city to a reality‑TV outsider. The vote is still being counted. We already know Bass advances. We do not yet know whether Raman or Pratt joins her.

Provisional grade: “Did the models correctly call the second runoff spot?” Not yet gradable. We can evaluate their arguments but not their accuracy.

Updated as of June 7, 2026

As of June 6, about 78% of expected votes are in, with Bass well ahead and Pratt holding a narrow lead over Raman for the second runoff slot; the race for #2 remains too close to grade.

Updated as of June 8, 2026

As of June 7, Los Angeles County’s unofficial results show Bass well ahead in first, with Raman and Pratt effectively tied for the second runoff slot—Raman at 27.1% and Pratt at 26.7%, a margin of just over 3,000 votes. The race for #2 remains too close to grade.

What this already tells us about the framework

When Discoverability and Momentum pull against Credibility, models split. The “Pratt in the runoff” prediction wasn’t a hallucination. It was a deliberate bet that online Momentum plus outsider appeal can overpower conventional Credibility signals in a low-turnout race.

The “Raman in the runoff” camp is saying something different: virality cannot fully compensate for being less embedded in LA’s civic and political infrastructure. The final canvass will determine which weighting was smarter.

But the disagreement itself is the finding, because it maps directly onto a question brands face every day: when does a viral moment translate into durable market position, and when does it stay a moment?

Question 4: Would Hilton or Steyer join Becerra in the November runoff?

Every model agreed Becerra would advance. They split on whether Hilton or Steyer would claim the second slot. Unofficial results (100% reporting) now give us a clearer picture: Hilton finishes first at 27.6%, Becerra second at 25.5%, and Steyer third at 19.7%. Ballots are still being counted, though, so nothing is set in stone just yet.

That means four of seven models (Claude Anthropic, Perplexity, DeepSeek, and Gemini) correctly predicted the Hilton + Becerra runoff pair. Two models (Claude Co-Work and ChatGPT) predicted Becerra + Steyer, which did not hold. Grok called it a toss-up.

Provisional grade: “Did the models correctly predict the runoff pair?” Four of seven — yes. Three — no. The pair (Hilton + Becerra) is provisionally confirmed by unofficial results; final certification pending.

What this tells us: This is where the distinction between “got the pair right” and “got the rank right” matters. The four models that predicted Hilton + Becerra were correct on the candidates, but all seven models predicted Becerra would lead the field, and so far he doesn’t.

That’s a meaningful calibration gap. The models read Becerra’s institutional Credibility as a first-place signal when it was more accurately a floor signal: strong enough to guarantee a runoff spot, not strong enough to predict vote-share rank in a competitive field.
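One way to keep those two grades from blurring together is to score every forecast twice: once on which candidates advance, and once on who leads. The sketch below is illustrative only. The `Forecast` structure and the function names are hypothetical, this is not the scoring code used in the experiment, and Grok’s toss-up is represented as a missing second pick. The vote shares are the unofficial figures quoted above and may change before certification.

```python
from dataclasses import dataclass
from typing import Optional

# Unofficial vote shares quoted in this article (percent); not certified.
UNOFFICIAL = {"Hilton": 27.6, "Becerra": 25.5, "Steyer": 19.7}


@dataclass
class Forecast:
    model: str
    first: str              # predicted first-place finisher
    second: Optional[str]   # predicted second runoff slot; None = toss-up


# Encoded from the summary above: every model had Becerra finishing first.
FORECASTS = [
    Forecast("Claude Anthropic", "Becerra", "Hilton"),
    Forecast("Perplexity", "Becerra", "Hilton"),
    Forecast("DeepSeek", "Becerra", "Hilton"),
    Forecast("Gemini", "Becerra", "Hilton"),
    Forecast("Claude Co-Work", "Becerra", "Steyer"),
    Forecast("ChatGPT", "Becerra", "Steyer"),
    Forecast("Grok", "Becerra", None),
]


def top_two(results):
    """Return the two leading candidates, highest vote share first."""
    ranked = sorted(results, key=results.get, reverse=True)
    return ranked[0], ranked[1]


def pair_correct(forecast, results):
    """True if the forecast names both advancers, in any order."""
    if forecast.second is None:  # a toss-up is not a committed pair call
        return False
    return {forecast.first, forecast.second} == set(top_two(results))


def rank_correct(forecast, results):
    """True if the forecast names the actual first-place finisher."""
    return forecast.first == top_two(results)[0]


pair_hits = sum(pair_correct(f, UNOFFICIAL) for f in FORECASTS)
rank_hits = sum(rank_correct(f, UNOFFICIAL) for f in FORECASTS)
print(f"Pair right: {pair_hits} of {len(FORECASTS)}")
print(f"Rank right: {rank_hits} of {len(FORECASTS)}")
```

Run against the unofficial numbers, the pair grade comes out four of seven and the rank grade zero of seven, which is the gap described above. Treating a toss-up as a miss on the pair grade is a design choice; a softer rubric could give it partial credit.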

Steyer’s underperformance (19.7% versus the models treating him as a serious #2 threat) also reinforces the point the framework was designed to make: paid Discoverability does not automatically convert into votes, even with Steyer spending approximately $195 million. We’ll know in July which signal was more predictive.

The Order Matters

There is a clean story inside the Governor’s race that deserves its own line: four of seven models correctly predicted the Hilton + Becerra runoff pair. They got the candidates right. What they got wrong was the order.

Every model predicted Becerra would lead the raw vote; as of today, Hilton does. That distinction, between correctly identifying who advances and correctly predicting who leads, is exactly the kind of precision gap the B3 Visibility framework is built to surface. In a market context, this translates directly: knowing your competitor will be in the room is not the same as knowing they will dominate it. Both calls matter, and they require different signals to make.

The Big Question Mark: Measure ER

If the candidate races test who advances, Measure ER tests whether anything passes at all. And here, the models genuinely disagreed.

The NO camp argued:

Low-salience ballot drop-off tends to hurt complex, technical measures
General voter skepticism toward tax increases was measurable in the data
Historically, funding measures with detailed implementation language underperform their pre-election polling

The YES camp argued:

The state of LA’s safety-net healthcare infrastructure created real urgency
A dense endorsement coalition (hospital systems, unions, community clinics) provided strong Credibility signals
The healthcare framing — “a public hospital closes without this” — was visceral enough to cut through generic anti-tax sentiment

Early returns show Measure ER trailing, which leans toward the NO camp. But the count is still in progress, and late-arriving ballots have historically moved results on county measures just like this one.

Provisional grade: “Did the models correctly call pass or fail?” Unclear and ungraded.

Why this is the most important test for the framework: Measure ER pushes on every assumption the Discoverability / Credibility / Momentum structure makes.

Discoverability for a low-profile county measure is inherently uneven. Many voters only encounter it in the voting booth.
Credibility is mediated through proxies, like endorsement slates, local organizations, and trusted community voices, which are harder for models to read from public online signals alone.
Momentum can be deeply localized: church bulletins, neighborhood forums, union phone trees, none of which surfaces cleanly in the data we fed the models. That gap matters even more when the signals that do surface can be manipulated, as seen in this subreddit thread. Let’s be real, peptides and biohacking are not the only topics being gamed.

Measure ER is where the framework runs into the genuine messiness of democracy. That’s not a failure of the experiment. It’s the most honest finding in it.

Act One Takeaway: This Is About Behavior, Not Bragging Rights

If what you want is a scoreboard, come back after certification and we’ll show you who called it.

But the more important questions are already visible before the races are officially called, and they’re the same questions that matter when you’re using AI to read a market or industry, not a ballot:

When do models trust institutional strength, and when do they bet on backlash or novelty?
How do they behave when online visibility and likely buyer (or voter) behavior point in different directions?
Where do guardrails and refusals kick in, and how much prompt engineering is required to get a model to engage with a genuinely contested question at all?

However, there is a fourth question lurking under all of this: How much of what the models ‘see’ is genuinely organic?

In 2026, Reddit isn’t just a window into public sentiment; it’s become an explicit target for companies and campaigns trying to manipulate what AI systems say, as recent reporting has documented. In this experiment, we deliberately treated Reddit and similar forums as part of the Momentum signal while recognizing that any model leaning too hard on those channels is vulnerable to whatever the most aggressive actors decide to post next.

That’s why we separated Discoverability, Credibility, and Momentum instead of dumping everything into ‘what’s popular online.’

If your model over‑weights Momentum in spaces that are easy to game, you’re not measuring sentiment, you’re measuring who bought the loudest megaphone.

Act One was the methodology story. Act Two, which will come after California certifies, will be the full grade sheet:

Which models were right for the right reasons
Which were right for the wrong reasons (lucky noise, not signal)
Where the Discoverability / Credibility / Momentum framework itself needs to evolve
What inputs could potentially be compromised

From Ballots to Brands

The diagnostic our team applied to these races is similar to the one we use when a brand wants to understand how a new buyer actually encounters it today, before that buyer has ever heard the pitch, visited the website, or talked to a salesperson.

If you’re curious what an outside-in visibility audit looks like applied to your category, that’s what the B3 Brand Visibility Diagnostic does.

Request the Diagnostic →

Or reach out directly and we’ll walk you through how we’d apply the framework to your brand. Act Two drops after California certifies the results. Stay tuned!

If AI is becoming part of how your team reads public signals, then the issue is no longer access to the tool. The issue is whether anyone is interpreting the output well enough to act on it.
That is the reason the B3 Brand Visibility Diagnostic exists. It applies the same outside-in logic to brands instead of ballots, showing where a company is easy to find, where trust is thin, where momentum is rising or fading, and where ambiguity is being mistaken for clarity. If you want to know how a new buyer actually encounters your brand right now, that is the starting point:

Brand Visibility Diagnostic

For one week only, the Diagnostic is $99; after that, it returns to its regular $599 price. After the primary results are in, the model scorecard will show what these systems got right. Before then, the more immediate question is whether the public signal around your brand or business is strong, weak, or just being misread.