Audits in Cohort

57 Requested

Reports Analyzed

48 of 51 Completed

Full 3-Model Agreement

20.8% (10 / 48)

Semantic Gap Rows

235 across 46 Reports

The Data

This analysis used a first-party subscription-audit cohort with 57 requested cases, 51 completed cases, and 48 analyzable reports after quality filtering. The final sample was transformed into a standardized analysis dataset that enabled cross-model comparison of sentiment outcomes and semantic-gap outcomes at the report level.

Method Snapshot

- Reports were pulled from securely stored internal reporting artifacts and mapped to a consistent analysis shape.
- Records that failed completeness checks were excluded prior to metric computation.
- The transformed dataset was used to compare sentiment classifications across OpenAI, Claude, and Gemini.
- The same dataset supported semantic-gap analysis using normalized topic and gap representations.
- Outputs were aggregated at cohort level for agreement rates, divergence patterns, and supported topic summaries.
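The filtering and agreement steps above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used for the study: the record shape, the `is_complete` check, and the four sample records are all hypothetical.

```python
# Hypothetical record shape: one dict per report, one sentiment label per model.
MODELS = ("openai", "claude", "gemini")
VALID_LABELS = {"positive", "neutral", "negative"}


def is_complete(report):
    """Assumed completeness check: every model has a recognized label."""
    return all(report.get(m) in VALID_LABELS for m in MODELS)


def full_agreement_rate(reports):
    """Return (agreeing, total, rate) over the reports that pass the check."""
    usable = [r for r in reports if is_complete(r)]
    agreeing = sum(1 for r in usable if len({r[m] for m in MODELS}) == 1)
    rate = agreeing / len(usable) if usable else 0.0
    return agreeing, len(usable), rate


# Illustrative records only; these are not rows from the cohort.
sample_reports = [
    {"openai": "positive", "claude": "neutral", "gemini": "positive"},
    {"openai": "positive", "claude": "positive", "gemini": "positive"},
    {"openai": "positive", "claude": "neutral", "gemini": "neutral"},
    {"openai": "neutral", "claude": "neutral", "gemini": "neutral"},
    {"openai": "positive", "claude": None, "gemini": "positive"},  # dropped as incomplete
]

agreeing, total, rate = full_agreement_rate(sample_reports)
print(f"{agreeing} of {total} reports in full agreement ({rate:.1%})")
```

Applied to the real cohort, the same calculation would correspond to the headline figure of 10 of 48 reports in full agreement.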

Sentiment Distribution by Model

Each model was asked about the same brand, yet the three distributions differ sharply. OpenAI and Gemini skew strongly positive; Claude trends neutral.

OpenAI

Positive: 46 (96%)

Neutral: 2 (4%)

Negative: 0 (0%)

Claude

Positive: 10 (21%)

Neutral: 37 (77%)

Negative: 1 (2%)

Gemini

Positive: 44 (92%)

Neutral: 4 (8%)

Negative: 0 (0%)
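As a quick check, the percentages in this section can be reproduced from the raw counts. The sketch below uses only the counts reported here; the dictionary layout and the `shares` helper are illustrative names, not part of the audit tooling.

```python
# Counts as reported in this section (48 analyzed reports per model).
COUNTS = {
    "OpenAI": {"positive": 46, "neutral": 2, "negative": 0},
    "Claude": {"positive": 10, "neutral": 37, "negative": 1},
    "Gemini": {"positive": 44, "neutral": 4, "negative": 0},
}


def shares(counts):
    """Convert label counts to whole-number percentages of the model's total."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


for model, counts in COUNTS.items():
    pct = shares(counts)
    line = ", ".join(f"{label} {counts[label]} ({pct[label]}%)" for label in counts)
    print(f"{model}: {line}")
```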

Top Tone Terms by Model

The vocabulary each model uses to describe brand tone reveals fundamentally different interpretive frames. Claude focuses on technical infrastructure signals; OpenAI and Gemini describe brand persona.

OpenAI

corporate ×32, mission-driven ×17, professional ×11, authoritative ×7, innovation-focused ×6, sustainability-oriented ×5, purpose-driven ×4, trust-oriented ×4

Claude

structured content ×91, llms.txt ×67, JSON-LD ×59, robots.txt ×56, schema ×53, signals ×43, sitemap.xml ×38, crawlers ×36

Gemini

authoritative ×43, mission-driven ×18, corporate ×16, structured ×16, sustainability ×5, governance ×5, transparency ×5, institutional ×5

Sentiment Divergence Examples

38 of 48 reports (79%) had at least one model that disagreed with the others. The most common pattern: OpenAI and Gemini classify the brand as positive while Claude returns neutral, including in stronger-readiness samples.

Industry	Sample	GEO Score	OpenAI	Claude	Gemini
Automotive	Sample A	48	positive	neutral	positive
Automotive	Sample B	42	positive	neutral	positive
Automotive	Sample C	69	positive	neutral	positive
Technology Services	Sample A	66	positive	neutral	positive
Consumer Electronics	Sample A	71	positive	neutral	neutral
Enterprise Technology	Sample A	75	positive	neutral	positive
Enterprise Technology	Sample B	70	positive	neutral	neutral
Healthcare	Sample A	58	positive	neutral	positive
Healthcare	Sample B	60	positive	neutral	positive
Healthcare	Sample C	55	positive	neutral	positive
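One way to summarize a table like this is to tally the label patterns and identify which model dissents in each row. The sketch below does this for the ten example rows only; the tuple layout and the `dissenting_model` helper are hypothetical, and the ten rows are illustrative examples rather than the full set of 38 divergent reports.

```python
from collections import Counter

# The ten example rows from the table: (industry, sample, geo_score, openai, claude, gemini).
ROWS = [
    ("Automotive", "Sample A", 48, "positive", "neutral", "positive"),
    ("Automotive", "Sample B", 42, "positive", "neutral", "positive"),
    ("Automotive", "Sample C", 69, "positive", "neutral", "positive"),
    ("Technology Services", "Sample A", 66, "positive", "neutral", "positive"),
    ("Consumer Electronics", "Sample A", 71, "positive", "neutral", "neutral"),
    ("Enterprise Technology", "Sample A", 75, "positive", "neutral", "positive"),
    ("Enterprise Technology", "Sample B", 70, "positive", "neutral", "neutral"),
    ("Healthcare", "Sample A", 58, "positive", "neutral", "positive"),
    ("Healthcare", "Sample B", 60, "positive", "neutral", "positive"),
    ("Healthcare", "Sample C", 55, "positive", "neutral", "positive"),
]


def dissenting_model(labels):
    """Return the model whose label differs from the other two, or None
    when all three agree or all three differ."""
    counts = Counter(labels.values())
    if len(counts) != 2:
        return None
    minority = min(counts, key=counts.get)
    return next(model for model, label in labels.items() if label == minority)


patterns = Counter(row[3:] for row in ROWS)
dissenters = Counter(
    dissenting_model({"OpenAI": o, "Claude": c, "Gemini": g})
    for _, _, _, o, c, g in ROWS
)

print(patterns.most_common())
print(dissenters.most_common())
```

In these ten rows Claude is the lone dissenter eight times, and OpenAI is the dissenter in the two rows where Gemini also returns neutral.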

Semantic Gap Categories

Re-analysis of 235 semantic-gap rows across 46 reports shows a strong concentration in Structural gaps, followed by Entity/Definition and Intent/Context categories.

Category	Core Question	Primary Challenge	Rows	Share	Reports	Avg Gap
Intent/Context	Why are they asking?	Missing the underlying goal or hidden sub-questions.	19	8.1%	19	30.42
Entity/Definition	What are we talking about?	Conflicting definitions of the same term across teams.	34	14.5%	26	36.18
Sensory/Representation	How does it look/feel?	Translating raw signals (pixels/sound) into meaning.	9	3.8%	9	33.33
Structural	Where is the data?	Technical layers that fail to connect related concepts.	171	72.8%	46	46.98
Linguistic	How do we say it?	Moving from fuzzy human talk to rigid code.	2	0.9%	2	27.5
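The Share column follows directly from the Rows column. A minimal sketch of that arithmetic, using the row counts from the table (the variable and function names are illustrative):

```python
# Row counts per category, taken from the table above.
GAP_ROWS = {
    "Intent/Context": 19,
    "Entity/Definition": 34,
    "Sensory/Representation": 9,
    "Structural": 171,
    "Linguistic": 2,
}


def category_shares(rows):
    """Return each category's percentage of all rows, rounded to one decimal."""
    total = sum(rows.values())
    return {name: round(100 * n / total, 1) for name, n in rows.items()}


total_rows = sum(GAP_ROWS.values())
gap_shares = category_shares(GAP_ROWS)

print(f"total rows: {total_rows}")
for name, share in sorted(gap_shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share}%")
```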

Lessons Learned

Four actionable takeaways from analyzing 48 enterprise brands across three LLMs.

01

Single-model sentiment scores are not a reliable GEO signal

OpenAI and Gemini classified 96% and 92% of brands as positive respectively, while Claude gave 77% a neutral rating. A team relying on one model alone could draw completely opposite strategic conclusions about the same brand.

02

Claude's tone vocabulary is technical, not emotional

Claude's top tone terms - structured content, llms.txt, JSON-LD, robots.txt - reveal that it interprets tone through a technical infrastructure lens rather than a brand persona lens. This makes Claude valuable for surfacing machine-readability risk, not sentiment alone.

03

Machine-readable authority gaps persist regardless of sentiment

Structural gaps dominate the semantic-gap landscape, accounting for 72.8% of all semantic-gap rows (171 of 235). With an average gap of about 47, these machine-readable authority and technical-readiness problems persist even in brands that all three models rate as positive - showing that positive sentiment does not imply citation-readiness.

04

Consensus is rare and should be the benchmark, not the exception

Only 10 of 48 brands (20.8%) achieved full three-model agreement. Enterprise GEO strategy should target consensus across model families as a performance bar, rather than optimizing for a high score from a single preferred model.

Know How Three Models Read Your Brand

An AIEthos LLC audit generates a full three-model semantic profile - so you can see where OpenAI, Claude, and Gemini agree, where they diverge, and which structural gaps are hiding behind a positive headline score.


Data Note - This study is based on first-party AIEthos LLC subscription audit report data from securely stored internal systems. The requested cohort was 57 audits; 51 were completed at extraction time and 48 reports were fully analyzable. All three model sentiment outputs (OpenAI, Claude, Gemini) were present in the analyzed set. The sample is not statistically random. Findings reflect brand state at audit time and may not represent current deployments.

How Three LLMs Interpret the Same 48 Enterprise Brands Differently