AIEthos Research | LLM Semantic Case Study
By AIEthos LLC Research | May 11, 2026
We pulled 48 subscription-audit reports and ran cross-model sentiment and semantic-gap analysis across OpenAI, Claude, and Gemini. What we found exposes a major blind spot in single-model GEO benchmarking.
Audits in Cohort
57 Requested
Reports Analyzed
48 of 51 Completed
Full 3-Model Agreement
20.8% (10 / 48)
Semantic Gap Rows
235 across 46 Reports
This analysis used a first-party subscription-audit cohort with 57 requested cases, 51 completed cases, and 48 analyzable reports after quality filtering. The final sample was transformed into a standardized analysis dataset that enabled cross-model comparison of sentiment outcomes and semantic-gap outcomes at the report level.
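The cohort funnel described above can be restated as simple retention rates. The sketch below uses only the three counts given in the text; the variable names are illustrative, and the percentages are derived here rather than reported in the study.

```python
# Cohort counts as stated in the study.
requested, completed, analyzable = 57, 51, 48

# Derived retention rates (not reported in the source; plain arithmetic).
completion_pct = round(100 * completed / requested, 1)
analyzable_of_completed_pct = round(100 * analyzable / completed, 1)
analyzable_of_requested_pct = round(100 * analyzable / requested, 1)

print(f"Completed / requested:   {completion_pct}%")
print(f"Analyzable / completed:  {analyzable_of_completed_pct}%")
print(f"Analyzable / requested:  {analyzable_of_requested_pct}%")
```

Under these counts, roughly 84% of requested audits reach the analyzed set, so the findings describe the audits that completed and passed quality filtering, not the full requested cohort.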
Each model assessed the same brands - yet the three sentiment distributions are strikingly different. OpenAI and Gemini skew strongly positive (96% and 92% of brands); Claude trends neutral (77%).
The vocabulary each model uses to describe brand tone reveals fundamentally different interpretive frames. Claude focuses on technical infrastructure signals; OpenAI and Gemini describe brand persona.
38 of 48 reports (79%) had at least one model disagreement. The most common pattern: OpenAI and Gemini classify positive while Claude returns neutral, including in stronger-readiness samples.
| Industry | Sample | GEO Score | OpenAI | Claude | Gemini |
|---|---|---|---|---|---|
| Automotive | Sample A | 48 | positive | neutral | positive |
| Automotive | Sample B | 42 | positive | neutral | positive |
| Automotive | Sample C | 69 | positive | neutral | positive |
| Technology Services | Sample A | 66 | positive | neutral | positive |
| Consumer Electronics | Sample A | 71 | positive | neutral | neutral |
| Enterprise Technology | Sample A | 75 | positive | neutral | positive |
| Enterprise Technology | Sample B | 70 | positive | neutral | neutral |
| Healthcare | Sample A | 58 | positive | neutral | positive |
| Healthcare | Sample B | 60 | positive | neutral | positive |
| Healthcare | Sample C | 55 | positive | neutral | positive |
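One way to tally disagreement patterns like those in the table is sketched below. It is a minimal illustration, not the study's actual pipeline: the `agreement_pattern` helper is hypothetical, and the ten rows are only the examples shown above, not the full 48-report cohort.

```python
from collections import Counter

# Rows transcribed from the table above:
# (industry, sample, GEO score, OpenAI, Claude, Gemini)
ROWS = [
    ("Automotive", "Sample A", 48, "positive", "neutral", "positive"),
    ("Automotive", "Sample B", 42, "positive", "neutral", "positive"),
    ("Automotive", "Sample C", 69, "positive", "neutral", "positive"),
    ("Technology Services", "Sample A", 66, "positive", "neutral", "positive"),
    ("Consumer Electronics", "Sample A", 71, "positive", "neutral", "neutral"),
    ("Enterprise Technology", "Sample A", 75, "positive", "neutral", "positive"),
    ("Enterprise Technology", "Sample B", 70, "positive", "neutral", "neutral"),
    ("Healthcare", "Sample A", 58, "positive", "neutral", "positive"),
    ("Healthcare", "Sample B", 60, "positive", "neutral", "positive"),
    ("Healthcare", "Sample C", 55, "positive", "neutral", "positive"),
]


def agreement_pattern(openai: str, claude: str, gemini: str) -> str:
    """Hypothetical helper: label a report by how the three models split."""
    if openai == claude == gemini:
        return "full agreement"
    return f"OpenAI={openai}, Claude={claude}, Gemini={gemini}"


# Count how often each split occurs among the example rows.
patterns = Counter(agreement_pattern(o, c, g) for *_, o, c, g in ROWS)
for pattern, count in patterns.most_common():
    print(f"{count:2d}  {pattern}")

# Cohort-level figure reported in the study: 10 of 48 reports in full agreement.
cohort_full_agreement_pct = round(100 * 10 / 48, 1)
print(f"Full agreement across cohort: {cohort_full_agreement_pct}%")
```

Among these ten examples, Claude is the lone neutral vote in eight rows and Gemini joins it in two, which matches the dominant pattern described in the text.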
Re-analysis of 235 semantic-gap rows across 46 reports shows a strong concentration in Structural gaps, followed by Entity/Definition and Intent/Context categories.
| Category | Core Question | Primary Challenge | Rows | Share | Reports | Avg Gap |
|---|---|---|---|---|---|---|
| Intent/Context | Why are they asking? | Missing the underlying goal or hidden sub-questions. | 19 | 8.1% | 19 | 30.42 |
| Entity/Definition | What are we talking about? | Conflicting definitions of the same term across teams. | 34 | 14.5% | 26 | 36.18 |
| Sensory/Representation | How does it look/feel? | Translating raw signals (pixels/sound) into meaning. | 9 | 3.8% | 9 | 33.33 |
| Structural | Where is the data? | Technical layers that fail to connect related concepts. | 171 | 72.8% | 46 | 46.98 |
| Linguistic | How do we say it? | Moving from fuzzy human talk to rigid code. | 2 | 0.9% | 2 | 27.50 |
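The Share column can be reproduced from the Rows column alone. The short check below recomputes it, assuming Share is each category's row count divided by the 235 total rows; the dictionary name is illustrative.

```python
# Row counts per category, transcribed from the table above.
GAP_ROW_COUNTS = {
    "Intent/Context": 19,
    "Entity/Definition": 34,
    "Sensory/Representation": 9,
    "Structural": 171,
    "Linguistic": 2,
}

total_rows = sum(GAP_ROW_COUNTS.values())

# Share = category rows / total rows, as a percentage rounded to one decimal.
shares = {
    category: round(100 * rows / total_rows, 1)
    for category, rows in GAP_ROW_COUNTS.items()
}

print(f"Total rows: {total_rows}")
for category, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category:<24} {share:>5}%")
```

The recomputed shares match the table. Because each is rounded independently, they sum to 100.1% rather than exactly 100%.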
Four actionable takeaways from analyzing 48 enterprise brands across three LLMs.
OpenAI and Gemini classified 96% and 92% of brands as positive, respectively, while Claude rated 77% as neutral. A team relying on one model alone could draw opposite strategic conclusions about the same brand.
Claude's top tone terms - structured content, llms.txt, JSON-LD, robots.txt - reveal that it interprets tone through a technical infrastructure lens rather than a brand persona lens. This makes Claude valuable for surfacing machine-readability risk, not sentiment alone.
Structural gaps dominate the semantic-gap landscape, accounting for 72.8% of all gap rows (171 of 235) and appearing in all 46 reports that had gap rows. With an average gap of 46.98, these machine-readable authority and technical-readiness problems persist even in brands that all three models rate as positive - showing that positive sentiment does not imply citation-readiness.
Only 10 of 48 brands (20.8%) achieved full three-model agreement. Enterprise GEO strategy should treat consensus across model families as the performance bar, rather than optimizing for a high score from a single preferred model.
An AIEthos LLC audit generates a full three-model semantic profile - so you can see where OpenAI, Claude, and Gemini agree, where they diverge, and which structural gaps are hiding behind a positive headline score.
Data Note - This study is based on first-party AIEthos LLC subscription audit report data from securely stored internal systems. The requested cohort was 57 audits; 51 were completed at extraction time and 48 reports were fully analyzable. All three model sentiment outputs (OpenAI, Claude, Gemini) were present in the analyzed set. The sample is not statistically random. Findings reflect brand state at audit time and may not represent current deployments.