Quick answer. Across the AI search audits we run on B2B SaaS, ecommerce, and luxury brands, the same 12 GEO mistakes appear over and over, and they are the ones costing brands the most AI citations. They cluster into four categories: entity authority failures (the “you don’t exist to AI” mistakes), content extraction failures (the “AI can’t read you” mistakes), crawler/discovery failures (the “AI can’t find you” mistakes), and signal/trust failures (the “AI doesn’t trust you” mistakes). Each mistake below is named, illustrated with what it actually looks like on real sites, explained in terms of why it kills AI citation, paired with a detection method, and closed with a tactical fix. This is the canonical reference. Going forward, we’ll deep-dive on individual mistakes in a monthly series with anonymized real-world examples.

How we use this framework
Category A: Entity authority failures
Category B: Content extraction failures
Category C: Crawler / discovery failures
Category D: Signal / trust failures
Cross-cutting patterns
Self-audit checklist
FAQ

How we use this framework

A short note on what this is and isn’t.

What it is: the canonical list of mistakes we screen for in every generative engine optimization audit Resocial runs. We built it iteratively across SaaS, ecommerce, and luxury client engagements, and the same patterns kept recurring. The list is calibrated against the public research consensus, including Profound (4 billion AI citations) and Semrush (248K Reddit posts), and against our internal citation-pattern research.

What it isn’t: a single audit script you run once. The 12 mistakes don’t have equal weight on every site. A B2B SaaS company with strong brand recognition fails mostly on Categories B and C. A young consumer brand fails mostly on A. A technically strong dev-tool brand fails mostly on D. The framework’s value is the disambiguation: knowing which category your specific gaps fall into, then prioritizing.

Methodology disclaimer: we haven’t run audits in the hundreds yet. This is the framework, not a quantitative survey. Where we say “across audits” below, we mean “this is a recurring pattern in the audits we have run, corroborated by the public research”, not “we measured 500 audits.” When we have larger quantitative data, we’ll publish it as a separate research piece. The frame is honest editorial pattern recognition, not bench science.

The series continues: each month we’ll deep-dive on one of the 12 with a fresh anonymized example, before/after data, and the implementation steps.

Category A: Entity authority failures

This is the most common failure cluster for B2B brands trying to enter AI citations. The brand exists in Google’s index, but it doesn’t exist as a recognized entity in the knowledge graph that AI search engines use to disambiguate references. If ChatGPT doesn’t know who you are as an entity, it won’t cite you, even when your content is technically better than what gets cited.

#1, No stable Organization @id

What it looks like: the site has schema.org Organization markup, but each page either omits the @id field or uses a different value per page. Sometimes the @id is the canonical URL of the current page (which means it changes per page) instead of a stable site-wide identifier.

Why it kills AI citation: AI engines build internal entity graphs by linking schema fragments together via stable identifiers. A stable @id lets the engine recognize that the Organization on the homepage is the same Organization referenced on a blog post, a press release, and a LinkedIn profile. Without it, every reference creates a fresh, orphaned entity. The brand authority signal gets fragmented.

How to detect: open three different pages from the site in view-source. Search for "@type": "Organization". Check the @id field. If it differs across pages, or doesn’t exist, you have the mistake.
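The view-source check above can be scripted. The sketch below is a minimal Python example, assuming you have already saved the raw HTML of each page as a string; it uses only the standard library, and the helper names (organization_ids, has_stable_org_id) are hypothetical. A set of pages passes only if every page emits exactly one non-empty Organization @id and that value is identical across all of them.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def _nodes(value):
    """Yield every object in a JSON-LD tree, including those nested in @graph arrays."""
    if isinstance(value, dict):
        yield value
        for child in value.values():
            yield from _nodes(child)
    elif isinstance(value, list):
        for child in value:
            yield from _nodes(child)


def _is_organization(node):
    types = node.get("@type")
    return types == "Organization" or (isinstance(types, list) and "Organization" in types)


def organization_ids(html):
    """Set of Organization @id values on one page; None stands for a missing @id."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    ids = set()
    for block in extractor.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is a separate problem; skip it here
        ids.update(node.get("@id") for node in _nodes(data) if _is_organization(node))
    return ids


def has_stable_org_id(pages):
    """True only if every page emits one identical, non-empty Organization @id."""
    per_page = [organization_ids(html) for html in pages]
    combined = set().union(*per_page)
    return all(len(ids) == 1 for ids in per_page) and len(combined) == 1 and None not in combined
```

Reference objects such as "publisher": {"@id": ...} carry no @type, so they are not counted as extra Organization nodes. Sites that declare a subtype (LocalBusiness, Corporation) instead of Organization would need _is_organization extended.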

The fix: pick one canonical URL fragment for the Organization (we use https://yoursite.com/#organization; the hash signals it’s an identifier, not a real page). Use that exact @id value on every page that emits Organization schema. Reference it everywhere else as a reference object, for example "provider": { "@id": "https://yoursite.com/#organization" } on a Service (Person schemas use worksFor, Article schemas use publisher). We cover the pattern in detail in our Schema Markup Complete Guide.
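As a sketch of the fix, the Python below builds a page’s JSON-LD with one site-wide Organization @id and a Service that points at it through provider. The domain, brand name, and service path are placeholders, and wrapping the nodes in a single @graph is one common layout rather than a requirement; the output would go inside a <script type="application/ld+json"> tag.

```python
import json

# Placeholder values: substitute your own domain and brand name.
SITE = "https://yoursite.com"
ORG_ID = f"{SITE}/#organization"  # the one stable, site-wide identifier


def organization_node():
    """Full Organization node; every page emits it with this exact @id."""
    return {"@type": "Organization", "@id": ORG_ID, "name": "Acme Corp", "url": SITE}


def service_node(name, path):
    """Service that references the Organization by @id instead of redefining it."""
    return {
        "@type": "Service",
        "@id": f"{SITE}{path}#service",
        "name": name,
        "provider": {"@id": ORG_ID},
    }


def page_jsonld(*nodes):
    """Serialize the nodes as one @graph for a single JSON-LD script block."""
    return json.dumps({"@context": "https://schema.org", "@graph": list(nodes)}, indent=2)


print(page_jsonld(organization_node(), service_node("Local SEO", "/services/local-seo/")))
```

Because ORG_ID is defined once and only referenced afterwards, the identifier cannot drift from page to page the way a per-page canonical URL does.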

#2, Thin sameAs array

What it looks like: schema.org Organization markup exists, but the sameAs array is missing entirely or contains only 1–3 obvious entries (the homepage, Twitter, LinkedIn). It’s missing GitHub, Crunchbase, G2/Capterra, Wikipedia, Wikidata, conference speaker pages, podcast guest appearances, AngelList, and Glassdoor.

Why it kills AI citation: sameAs is how AI engines disambiguate “Acme Corp the SaaS company” from “Acme Corp the tire manufacturer”. A thin sameAs leaves the brand entity weakly connected to the broader web graph. Engines preferentially cite entities with rich sameAs networks because the disambiguation risk is lower.

How to detect: view source on any service or about page. Look for the sameAs array inside Organization schema. Count entries. Fewer than 6 is thin. Fewer than 3 is critical.
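Once the Organization JSON-LD is parsed into a dict (for example with json.loads), the count takes a few lines of Python. This is a sketch with a hypothetical function name; the “below target” band for 6–7 entries is our reading of the 8-entry target in the fix, not a separate rule.

```python
def same_as_status(org):
    """Classify an Organization node's sameAs array against the thresholds above."""
    same_as = org.get("sameAs", [])
    if isinstance(same_as, str):  # schema.org also allows a single URL instead of a list
        same_as = [same_as]
    count = len(set(same_as))  # duplicate URLs add no disambiguation value
    if count < 3:
        status = "critical"
    elif count < 6:
        status = "thin"
    elif count < 8:
        status = "below the 8-entry target"
    else:
        status = "ok"
    return count, status
```

Counting distinct URLs rather than raw entries keeps a copy-pasted duplicate from masking a thin array.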

The fix: build the sameAs array systematically. Target at least 8 entries, ideally 8–12. Include: LinkedIn company page, Twitter/X, GitHub (if you have any open source), Crunchbase, G2 listing, Capterra listing, AngelList, Wikipedia entry (if one exists), Wikidata Q-number (more on this below), and industry-specific directories (e.g., Built In, Slashdot, Product Hunt). Every entry must be a real page that actually mentions you. Hand the list to your developer to add to the Organization schema under its stable @id.

#3, No Wikidata presence

What it looks like: no Wikidata Q-number exists for the brand. Sometimes there’s a placeholder entry that was auto-created from a single news mention, but it’s thin (just brand name + URL, no industry classification, no founder linkage, no key claims).

Why it kills AI citation: this is the single biggest entity-authority killer for B2B brands trying to enter AI citations. Wikipedia and Wikidata are core sources for the training data and knowledge-graph grounding behind ChatGPT, Claude, Perplexity, and Google AI Overviews. A brand without a Wikidata entry is effectively invisible to the entity disambiguation these engines rely on.

How to detect: search wikidata.org/wiki/Special:Search for your brand. If there’s no result, you have no entity. If the result is thin, see the fix below.

The fix: create a Wikidata entry. This is a 30–60 minute task for anyone willing to learn the Wikidata interface. Required minimum fields: instance of (P31; Q4830453 “business” or a more specific class), industry (P452), country (P17), inception (P571), founded by (P112) linked to each founder’s own item, and official website (P856). Back each statement with an independent reference where you can, since unsourced items for small companies risk deletion under Wikidata’s notability policy. Then add the Wikidata item URL to your site’s Organization sameAs array. It is the single highest-leverage 1-hour investment any B2B brand can make in AI search visibility today.
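To check property completion rather than mere existence, you can read the item’s JSON from Wikidata’s Special:EntityData endpoint and look for statements under each property ID. A minimal Python sketch, assuming you already know the item’s Q-number; the function names and the User-Agent string are placeholders, and a merged item may come back under a different Q-number than the one you requested.

```python
import json
from urllib.request import Request, urlopen

# Minimum properties from the fix above (P31 is Wikidata's "instance of").
REQUIRED = {
    "P31": "instance of",
    "P452": "industry",
    "P17": "country",
    "P571": "inception",
    "P112": "founded by",
    "P856": "official website",
}


def fetch_entity(qid):
    """Download an item's JSON from Wikidata's Special:EntityData endpoint."""
    url = f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"
    # Wikimedia asks clients to send a descriptive User-Agent; this one is a placeholder.
    req = Request(url, headers={"User-Agent": "geo-audit-sketch/0.1 (you@yoursite.com)"})
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)


def missing_properties(entity_json, qid):
    """Labels of required properties that have no statements on the item."""
    claims = entity_json["entities"][qid].get("claims", {})
    return [label for pid, label in REQUIRED.items() if not claims.get(pid)]
```

Usage would be missing_properties(fetch_entity(qid), qid); an empty list means every minimum field has at least one statement, though it says nothing about whether those statements are referenced.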

#4, Missing Person schema for founders/team

What it looks like: the team or about page lists founders and key team members with photos and bios, but has no Person schema. Or Person schema exists but lacks the knowsAbout array, alumniOf, or sameAs links to LinkedIn, Twitter, or speaker bios.

Why it kills AI citation: AI engines build attribution chains. When the engine reads a piece of content authored by “Christos at Resocial,” it tries to disambiguate: who is Christos, what’s his area of expertise, is he authoritative on this topic? Without Person schema for the named team, the attribution chain breaks. Content from your domain gets cited at lower rates than equivalent content from sites with full author entity graphs.

How to detect: view source on your team/about page. Search for "@type": "Person". Check that each named team member has Person markup with: name, jobTitle, worksFor linked to Organization, sameAs to LinkedIn, knowsAbout array of expertise areas, description paragraph.

The fix: add Person schema with full attribution chain on every page where senior team members are named or quoted. Tie each Person to the Organization via worksFor. Reference Persons in Article schema as author. This builds the authorship graph AI engines reward.
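A sketch of that chain in Python, emitting JSON-LD dicts. Everything here is hypothetical: the person, the URLs, and the /team/{slug}/#person @id scheme are placeholders that follow the same hash-fragment convention as the Organization @id in mistake #1.

```python
import json

# Placeholder identifier: reuse the same Organization @id as in mistake #1.
ORG_ID = "https://yoursite.com/#organization"


def person_node(slug, name, job_title, profiles, expertise, bio):
    """Person carrying the attribution fields listed above, tied to the Organization."""
    return {
        "@type": "Person",
        "@id": f"https://yoursite.com/team/{slug}/#person",
        "name": name,
        "jobTitle": job_title,
        "worksFor": {"@id": ORG_ID},
        "sameAs": profiles,
        "knowsAbout": expertise,
        "description": bio,
    }


def article_node(headline, url, author):
    """Article that names the Person by @id, completing Article -> Person -> Organization."""
    return {
        "@type": "Article",
        "headline": headline,
        "mainEntityOfPage": url,
        "author": {"@id": author["@id"]},
        "publisher": {"@id": ORG_ID},
    }


jane = person_node(
    "jane-doe", "Jane Doe", "Head of Technical SEO",
    ["https://www.linkedin.com/in/jane-doe-example"],
    ["technical SEO", "schema.org markup"],
    "Jane leads technical SEO audits.",
)
post = article_node("What is local SEO?", "https://yoursite.com/blog/local-seo/", jane)
print(json.dumps({"@context": "https://schema.org", "@graph": [jane, post]}, indent=2))
```

Referencing the Person by @id from each Article, rather than repeating the full Person object, keeps one canonical author node that every post points at.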

Category B: Content extraction failures

Even with full entity authority, content needs to be structurally extractable. AI engines paraphrase content at 0.53 similarity (Semrush, 248K Reddit posts): they extract concepts, not sentences. Content optimized for human narrative often fails extraction tests entirely.

#5, Preamble before the answer

What it looks like: page opens with “In this article, we’ll explore…” or “When clients ask us about X, we often find…” or anecdote-driven introduction. The actual definitional answer to the page’s topic appears 2–4 paragraphs in.

Why it kills AI citation: AI extraction is greedy for the first definitional sentence. The pattern "X is Y..." near the top of the page is the canonical extraction target. Preamble pushes the definition below the fold of AI’s effective context window. Result: the page can be the most authoritative on the topic, but the AI engine extracts the definition from a competing site that put it first.

How to detect: open the page. Read the first 60 words. Does it directly answer “what is X?” or describe what the article is about? If the latter, you have the mistake.
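If you want to screen many pages, the 60-word read can be roughed out in code. This is a deliberately crude Python heuristic built on our own assumptions: the preamble phrases and the list of definitional verbs are illustrative, and it will misfire on openings like “This is why…”, so treat a result as a prompt for a human read, not a verdict.

```python
import re

# Rough heuristic, not a real parser: common preamble openers vs. a definitional verb.
PREAMBLE = re.compile(r"^(in this (article|post|guide)|when clients ask|have you ever|imagine)\b", re.I)
DEFINITION = re.compile(r"\b(is|are|refers to|means)\b", re.I)


def opens_with_definition(text, window=60):
    """Does the first sentence within the first `window` words define something?"""
    opening = " ".join(text.split()[:window])
    first_sentence = re.split(r"(?<=[.!?])\s+", opening, maxsplit=1)[0]
    if PREAMBLE.search(first_sentence):
        return False
    return bool(DEFINITION.search(first_sentence))
```

Checking only the first sentence mirrors the “greedy for the first definitional sentence” behavior described above; a definition that arrives in sentence three still fails.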

The fix: rewrite the page opening to a definitional sentence. “Local SEO is the discipline of optimizing for geographically-bound search queries. It includes Google Business Profile, NAP consistency, LocalBusiness schema, and local citation building.” That structure, direct definition + 2–3 key components, is what AI engines extract. Move the anecdotal preamble below a Quick Answer Block.

#6, No Quick Answer Block

What it looks like: page has long-form content, but no visually-distinct 40–80 word summary block at the top that directly answers the page’s primary query. The content has the answer somewhere in the middle but no extractable summary.

Why it kills AI citation: AI engines preferentially extract from short, dense, structured paragraphs. A Quick Answer Block (the qab pattern visible at the top of every Resocial post, including this one) creates a single high-density extraction target. Without it, AI engines either skip the page or extract from less canonical sentences scattered through the body.

How to detect: open any priority page (service page, top blog post, glossary entry). Is there a distinct opening summary block of 40–80 words that directly answers the page’s topic? If not, you have the mistake.

The fix: add a Quick Answer Block as the first content element on every priority page. 40–80 words. Direct answer. Bold the key terms. Wrap in a <div class="qab"> or equivalent for visual distinctiveness. This is one of the highest-leverage single edits any site can make for AI citation. We cover the implementation in our Generative Engine Optimization service.
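A minimal sketch of the check, assuming the block is marked up as <div class="qab"> as described above (substitute your own class name otherwise). It uses Python’s standard html.parser, counts the words inside the first such div, and tests them against the 40–80 range; it does not verify that the block is the first content element.

```python
from html.parser import HTMLParser


class QabFinder(HTMLParser):
    """Collects the text of the first <div class="qab">, including nested divs."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # > 0 while inside the qab div
        self.found = False
        self.done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag != "div":
            return
        if self.depth:
            self.depth += 1   # nested div inside the block
        elif "qab" in (dict(attrs).get("class") or "").split():
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            self.done = self.depth == 0

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def qab_word_count(html):
    """Words in the first qab block, or None if the page has none."""
    finder = QabFinder()
    finder.feed(html)
    return len(" ".join(finder.chunks).split()) if finder.found else None


def has_valid_qab(html, low=40, high=80):
    count = qab_word_count(html)
    return count is not None and low <= count <= high
```

Tracking div depth means a block that wraps its text in an inner div is still measured as one unit.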

#7, FAQ schema missing or malformed

What it looks like: page has an FAQ section visually (3–7 Q&A pairs), but no FAQPage schema markup. Or schema markup exists but the questions are awkwardly phrased (“Here are some questions you might have:”) instead of matching real user query patterns.

Why it kills AI citation: in our audits, FAQPage schema is one of the most reliable eligibility signals for AI Overviews. Without it, AI Overviews and answer engines extract Q&A from sites that have it, even when your content is better. With it, you become a preferred extraction target. Malformed questions (vague, marketing-toned) miss query-match opportunities.

How to detect: open the page in view-source. Search for "@type": "FAQPage". If absent on pages with FAQ content, you have the mistake. If present, audit the questions: do they match real user search phrasing? Or do they read like agency-written headers?

The fix: add FAQPage schema with 3–7 Q&A pairs per priority page. Phrase the questions in the way users actually search (start with “What”, “How”, “Why”, “When”, “Can”). Keep answers under 80 words each, direct. Mirror the visual FAQ on the page exactly. Detailed guidance in our Schema Markup Complete Guide.
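The rules above translate directly into a small builder. This Python sketch is one strict reading of them: it rejects pages with fewer than 3 or more than 7 pairs, questions that don’t start with one of the listed openers, and answers over 80 words. Real user queries also start with words like “Is” or “Does”, so widen QUESTION_STARTS to suit your topic. The output follows schema.org’s FAQPage → Question → acceptedAnswer → Answer shape.

```python
QUESTION_STARTS = ("What", "How", "Why", "When", "Can")  # openers listed above; extend as needed


def faq_page(pairs, max_answer_words=80):
    """Build FAQPage JSON-LD from (question, answer) pairs, enforcing the rules above."""
    if not 3 <= len(pairs) <= 7:
        raise ValueError("use 3-7 Q&A pairs per page")
    entities = []
    for question, answer in pairs:
        if not question.startswith(QUESTION_STARTS):
            raise ValueError(f"rephrase as a user query: {question!r}")
        if len(answer.split()) > max_answer_words:
            raise ValueError(f"answer over {max_answer_words} words: {question!r}")
        entities.append({
            "@type": "Question",
            "name": question,
            "acceptedAnswer": {"@type": "Answer", "text": answer},
        })
    return {"@context": "https://schema.org", "@type": "FAQPage", "mainEntity": entities}
```

Failing loudly on a bad question, instead of silently emitting it, keeps agency-style headers like “Here are some questions you might have:” out of the markup.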

#8, All-prose content with no extractable structure

What it looks like: page is 2,000 words of flowing prose. No headings beyond H1 and a handful of H2s. No bullet lists. No comparison tables. No code blocks. No definitional callouts. Reads like an essay.

Why it kills AI citation: AI engines extract from structured units: bullet lists (each bullet becomes a potential citation), comparison tables (each row becomes a potential answer), headings (each H2/H3 is a potential question hook), code blocks (treated as authoritative reference). Prose-only content has fewer extraction surfaces. Result: lower citation rate per word.

How to detect: open the page. Count: heading hierarchy levels, bullet/numbered lists, comparison tables, definitional callout blocks. A 2,000-word article should have at least 8 H2/H3 headings, 3+ structured lists, and 1+ table or comparison block. Fewer than that = the mistake.
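The count can be automated with the standard library. A sketch, assuming you pass in only the article body HTML (navigation menus and footers are usually lists too and would inflate the numbers); it counts H2/H3 headings, ul/ol lists, and tables against the thresholds above, and does not try to detect definitional callouts.

```python
from collections import Counter
from html.parser import HTMLParser

SURFACES = {"h2", "h3", "ul", "ol", "table"}


class StructureCounter(HTMLParser):
    """Counts opening tags for the extraction surfaces named above."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in SURFACES:
            self.counts[tag] += 1


def structure_report(article_html):
    counter = StructureCounter()
    counter.feed(article_html)
    c = counter.counts
    report = {
        "headings": c["h2"] + c["h3"],
        "lists": c["ul"] + c["ol"],
        "tables": c["table"],
    }
    report["passes"] = report["headings"] >= 8 and report["lists"] >= 3 and report["tables"] >= 1
    return report
```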

The fix: rewrite long-form content with extraction surfaces. Break sections with H2/H3 every 200–300 words. Convert “first… second… third…” prose into numbered lists. Convert “X is better at A while Y is better at B” prose into comparison tables. Add definitional callouts (our qab pattern works in mid-article too). Each structural element creates an additional extraction target.

Category C: Crawler / discovery failures

You can have perfect entity authority and perfect content extractability, but if AI crawlers can’t reach the content, none of it matters. This category is where many technically-strong sites unknowingly cap their AI search ceiling.

#9, robots.txt blocks AI crawlers

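What it looks like: robots.txt has an explicit Disallow: / for GPTBot, ClaudeBot, PerplexityBot, anthropic-ai, or Google-Extended. Not mentioning them is not a block in itself, since robots.txt is allow-by-default, but restrictive WAF or bot-management rules can still deny these user agents at the edge even when robots.txt permits them.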

Why it kills AI citation: the major AI crawlers state that they respect robots.txt. If you’ve blocked them, sometimes accidentally by copying outdated 2023 “block AI” guidance, your content can’t be crawled, can’t be cited, and can’t appear in any AI search engine that uses live retrieval.

How to detect: fetch yoursite.com/robots.txt. Search for GPTBot, OAI-SearchBot, ClaudeBot, anthropic-ai, Claude-Web, PerplexityBot, Perplexity-User, Google-Extended, Applebot-Extended, and CCBot. If any of them is disallowed from /, either in its own group or through a generic User-agent: * group with Disallow: / and no more specific group overriding it, you have the mistake.
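The robots.txt half of this check can be scripted with Python’s urllib.robotparser. A minimal sketch: it tests whether each crawler named above may fetch the site root. Two caveats: robotparser applies the first matching rule rather than RFC 9309’s longest-match rule, so complex files can score differently than a real crawler would, and it cannot see WAF or bot-management blocks that happen outside robots.txt.

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = [
    "GPTBot", "OAI-SearchBot", "ClaudeBot", "anthropic-ai", "Claude-Web",
    "PerplexityBot", "Perplexity-User", "Google-Extended", "Applebot-Extended", "CCBot",
]


def blocked_ai_crawlers(robots_txt, url="https://yoursite.com/"):
    """Return the AI user agents that may not fetch `url` under this robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, url)]
```

Parsing text you already fetched, instead of calling RobotFileParser.read() on a live URL, makes it easy to test a proposed robots.txt before deploying it.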

The fix: explicitly allow the AI crawlers you want indexing your content. Our resocial.us robots.txt has explicit Allow: / for all 11 major AI crawlers, plus a generic Allow: / for User-agent: *. Document the policy with comments so future you knows why it’s permissive. Strategic note: if you genuinely don’t want training-data scraping, block the training crawlers (GPTBot, ClaudeBot, anthropic-ai, CCBot) but allow the search and live-retrieval crawlers (OAI-SearchBot, PerplexityBot). Most brands want both kinds allowed.
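One way to keep that policy explicit and commented is to generate the file. A sketch in Python, with the crawler groupings as our assumption: it lists only GPTBot and CCBot as training crawlers and OAI-SearchBot and PerplexityBot as retrieval crawlers, so extend both lists to match the policy you actually choose.

```python
from urllib.robotparser import RobotFileParser

TRAINING = ["GPTBot", "CCBot"]                  # training-data crawlers (extend as needed)
RETRIEVAL = ["OAI-SearchBot", "PerplexityBot"]  # search / live-retrieval crawlers


def build_robots(block_training=False):
    """Return a commented robots.txt with one explicit group per AI crawler."""
    lines = ["# AI crawler policy: retrieval crawlers are always allowed so pages can be cited."]
    for agent in RETRIEVAL:
        lines += [f"User-agent: {agent}", "Allow: /", ""]
    lines.append("# Training crawlers: " + ("blocked by policy." if block_training else "allowed by policy."))
    for agent in TRAINING:
        lines += [f"User-agent: {agent}", "Disallow: /" if block_training else "Allow: /", ""]
    lines += ["User-agent: *", "Allow: /"]
    return "\n".join(lines) + "\n"


# Sanity-check the generated file with the standard-library parser.
parser = RobotFileParser()
parser.parse(build_robots(block_training=True).splitlines())
print(parser.can_fetch("GPTBot", "https://yoursite.com/"))         # False: training crawler blocked
print(parser.can_fetch("OAI-SearchBot", "https://yoursite.com/"))  # True: retrieval crawler allowed
```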

#10, No llms.txt at site root

What it looks like: yoursite.com/llms.txt returns 404. The brand has no machine-readable site index for AI crawlers.

Why it kills AI citation: llms.txt is an emerging convention, proposed at llmstxt.org and not yet a formal standard, for letting AI crawlers efficiently discover your most important content. Without it, a crawler has to infer priority from the whole site and spends less of its budget on your canonical pages. With it, you direct crawler attention to your highest-leverage content.

How to detect: fetch yoursite.com/llms.txt. If 404, you have the mistake.

The fix: create llms.txt at site root following the emerging spec. At minimum, list your priority service pages, top 10–20 blog posts, and core glossary entries with descriptive labels. Add llms-full.txt for deeper content. This is a 1–2 hour build once, then maintained as content evolves. Resocial’s llms.txt is at resocial.us/llms.txt as a reference implementation.
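As a sketch of that minimum, here is a small Python generator that follows the layout proposed at llmstxt.org: an H1 with the site name, a blockquote summary, then H2 sections containing markdown link lists with a short note per link. The site name, sections, and URLs in the example are placeholders.

```python
def build_llms_txt(site_name, summary, sections):
    """Render llms.txt: H1 name, blockquote summary, then H2 sections of link lists.

    `sections` maps a heading to a list of (title, url, note) tuples.
    """
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        lines += [f"- [{title}]({url}): {note}" for title, url, note in links]
        lines.append("")
    return "\n".join(lines)


example = build_llms_txt(
    "Acme Corp",
    "Acme Corp is a B2B SaaS company; these are its canonical pages.",
    {
        "Services": [("Local SEO", "https://yoursite.com/services/local-seo/", "What the service covers")],
        "Guides": [("Schema guide", "https://yoursite.com/blog/schema/", "Organization and FAQPage markup")],
    },
)
print(example)
```

Generating the file from the same page inventory your CMS already knows about makes the “maintained as content evolves” step a rebuild rather than a manual edit.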

#11, JS-rendered content invisible to AI crawlers

What it looks like: the site is built on a JavaScript framework (React, Vue, Angular, or a meta-framework such as Next.js, Nuxt, or Remix configured for client-side rendering), and content is rendered client-side. View source shows minimal HTML; the actual content appears only after JS executes.

Why it kills AI citation: not all AI crawlers execute JavaScript. ChatGPT-User runs limited JS. PerplexityBot has improved but is inconsistent. ClaudeBot has variable behavior. The result: content visible to users and to Googlebot is invisible to a non-trivial fraction of AI crawlers. Only the crawlers that render JavaScript can see, and therefore cite, that content, which caps your AI citation ceiling.

How to detect: in Chrome DevTools, disable JavaScript (Settings → Preferences → Debugger → Disable JavaScript). Reload your priority pages. Is the content still visible? If not, you have the mistake.
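DevTools is fine for spot checks; across many pages you can fetch the raw HTML, which is what a crawler that doesn’t execute JavaScript receives, and look for a sentence you know should be there. A minimal Python sketch; the phrase-matching approach and the User-Agent string are our assumptions, and stripping tags with regular expressions is crude but adequate for a presence check.

```python
import re
from html import unescape
from urllib.request import Request, urlopen


def fetch_initial_html(url):
    """Fetch the server response only; no JavaScript is executed."""
    req = Request(url, headers={"User-Agent": "geo-audit-sketch/0.1"})  # placeholder UA
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def _normalize(text):
    return " ".join(text.split()).lower()


def visible_without_js(html, phrase):
    """True if `phrase` appears in the HTML text outside <script> and <style> blocks."""
    html = re.sub(r"<(script|style)\b.*?</\1\s*>", " ", html, flags=re.I | re.S)
    text = unescape(re.sub(r"<[^>]+>", " ", html))
    return _normalize(phrase) in _normalize(text)
```

Usage would be visible_without_js(fetch_initial_html(url), "a sentence from the page"). Script blocks are removed first because a client-rendered page often ships its content inside a JSON or JS payload that a non-rendering crawler receives but does not treat as page text.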

The fix: deploy server-side rendering (SSR) or static site generation (SSG) for priority content. Astro, Next.js with getStaticProps, Nuxt with nuxt generate, and Remix’s loader pattern are all valid options. The principle: AI crawlers should see your content in the initial HTML response, not after a JS bundle executes. We chose Astro for resocial.us specifically because its static-by-default model eliminates this entire failure mode. For an existing dynamic site, the migration path is incremental: start with your top 20 pages and expand.

Category D: Signal / trust failures

The hardest category to fix, because it depends on external editorial decisions outside your control. But it’s also the category that separates “technically optimized for AI” from “actually cited by AI.”

#12, Zero third-party editorial coverage

What it looks like: the brand has no coverage in tier-1 industry publications. No Search Engine Land mention, no G2 listing, no Capterra review aggregation, no Forbes/Inc/Fast Company feature, no podcast appearance, no conference speaker billing, no industry-analyst report inclusion. The brand exists only on its own domain + LinkedIn.

Why it kills AI citation: AI engines reward triangulated trust. A brand cited only by itself is structurally weaker than a brand cited by 5 independent editorial sources. ChatGPT in particular leans heavily on Wikipedia + tier-1 editorial when building entity confidence. No editorial = the brand exists in AI’s mind as “claimed but unverified.”

How to detect: search "[your brand name]" on Search Engine Land, Search Engine Journal, the Moz blog, G2, Capterra, Forbes/Inc/Fast Company, and your industry’s top podcast directory. If you have fewer than 5 independent mentions across these surfaces, you have the mistake.

The fix: this is a 6–12 month digital PR initiative, not a code change. Tactics: HARO/Qwoted expert quotes (build a habit of responding to journalist queries within the hour), original research (one piece per quarter, sized to attract press), conference speaking submissions (apply to 8–10 per year), industry awards (apply to 3–5 per year), guest articles on top SEO/marketing blogs (target 1 per quarter), podcast appearances (warm-pitch 5 per quarter). Resocial’s Link Building & Digital PR service runs this program for clients; the Link Building Complete Guide covers the playbook in depth.

Cross-cutting patterns

If we step back from the individual 12, three meta-patterns emerge across audits.

Pattern 1, The order matters. Brands that fix Category A first, then B, then C, then D get the fastest AI citation lift. Categories C and D won’t compound until A and B are solid. We routinely see brands invest in expensive digital PR (Category D) without fixing schema (Category A) and wonder why the editorial mentions aren’t producing AI citations. The editorial coverage works only when the brand entity is recognizable.

Pattern 2, The 80/20 is Category B for B2B SaaS. For B2B SaaS specifically, content extraction failures (Category B) account for the largest single citation gap. Most B2B SaaS brands have decent schema and decent crawlability; they fail on Quick Answer Blocks, FAQ schema, and definitional openings. In the engagements we’ve tracked, fixing all four Category B mistakes typically produces a 2–4× AI citation lift in 90 days.

Pattern 3, Wikidata is the single highest-ROI hour in GEO. Of all 12 mistakes, fixing #3 (no Wikidata presence) is the single fix that produces the most measurable AI citation lift relative to time invested. A well-constructed Wikidata entry with full sameAs linkage to your owned properties typically begins compounding within 4–6 weeks across all major AI engines. We rarely see anything else in this list with a higher impact-to-effort ratio.

Self-audit checklist

For readers who want to run a quick self-audit on their own site, here’s the 12-question version of the framework. Each one answerable in under 5 minutes.

1. Pick three pages. View source. Is "@id" for Organization the same value on all three?
2. View source. Count entries in sameAs for Organization. Is it 8+?
3. Search Wikidata for your brand. Is there an entry with full property completion?
4. View source on your team page. Does each named senior team member have Person schema with knowsAbout, sameAs, worksFor?
5. Open your highest-traffic page. Read the first 60 words. Does it start with “X is Y…” or similar definitional phrasing?
6. Same page. Is there a visually distinct 40–80 word Quick Answer Block at the top?
7. Open a page with an FAQ section. View source. Does FAQPage schema exist? Are questions phrased as users would search?
8. Open a long-form post. Count headings, lists, tables. Is the page broken into 8+ structural units?
9. Fetch your robots.txt. Search for AI crawler names. Are GPTBot, ClaudeBot, PerplexityBot, etc. allowed?
10. Fetch yoursite.com/llms.txt. Does it return 200 with a valid index?
11. Disable JavaScript in Chrome DevTools, reload priority pages. Is content still visible?
12. Search your brand name on Search Engine Land, G2, Capterra, top 3 industry blogs. 5+ independent mentions?

Score: 12/12 = excellent foundation. 8–11 = strong, gaps fixable in 30–60 days. 4–7 = significant exposure, needs structured remediation program. 0–3 = the GEO program is not yet started; treat this list as the entry-level checklist.
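The scoring is simple arithmetic, but pairing it with the A → B → C → D ordering from the cross-cutting patterns tells you where to start. A sketch in Python, assuming the checklist questions map one-to-one onto mistakes #1–#12 in order (so questions 1–4 are Category A, 5–8 B, 9–11 C, and 12 D); the function name is hypothetical.

```python
# Zero-based question indexes per category, following mistakes #1-#12 in order.
CATEGORIES = {"A": range(0, 4), "B": range(4, 8), "C": range(8, 11), "D": range(11, 12)}


def score_self_audit(answers):
    """Score 12 yes/no answers (question 1 first) and name the first category with a gap."""
    if len(answers) != 12:
        raise ValueError("expected exactly 12 answers")
    score = sum(bool(a) for a in answers)
    if score == 12:
        band = "excellent foundation"
    elif score >= 8:
        band = "strong; gaps fixable in 30-60 days"
    elif score >= 4:
        band = "significant exposure; needs structured remediation"
    else:
        band = "GEO program not yet started"
    first_gap = next((cat for cat, idx in CATEGORIES.items() if not all(answers[i] for i in idx)), None)
    return score, band, first_gap
```

Reporting the first failing category, not just the total, reflects the point that fixing D before A is wasted effort: an 11/12 score with the gap in Category B calls for different next steps than the same score with the gap in D.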

FAQ

Are all 12 mistakes equally important? No. The categories matter in order: A (entity authority) → B (content extraction) → C (crawler discovery) → D (signal trust). Fixing D before A is wasted effort because editorial coverage of an unrecognizable entity doesn’t compound. Within Category A, mistake #3 (Wikidata) typically has the highest single-fix impact for B2B brands.

How long does it take to fix all 12? Categories A and B are typically 30–60 days of focused work for a small team. Category C is 1–7 days of technical work (or 30–90 days if you need an SSR migration). Category D is the long pole: 6–12 months of digital PR with consistent execution. Most brands ship a meaningful fix to 8 of 12 in their first quarter.

Will fixing these guarantee AI citations? No. They establish eligibility for citations; they don’t guarantee the engine will choose your content. AI search engines have model-internal preferences that no amount of optimization fully predicts. But brands that fix all 12 move from “structurally invisible to AI” to “structurally eligible.” That eligibility is the precondition for everything else.

How is this different from the AI Search Optimization Complete Guide? The Complete Guide is the broader framework: what GEO is, why it matters, how it relates to traditional SEO, and the seven service tracks underneath. This piece is the focused audit checklist. The Guide is theory and strategy; this is detection and fixes.

Will you publish individual deep-dives? Yes. This piece is the canonical reference. Going forward, each month we’ll deep-dive on one of the 12 with an anonymized real-world example, the specific implementation steps, and before/after data where we can share it. The series will live under the audit tag on /blog/.

If you’d like to skip the self-audit and have us run the full 12-point framework on your site, our generative engine optimization service starts with this audit as the first 90-minute conversation. For brands at the entity-authority stage, the ChatGPT visibility service covers categories A and B in depth. For brands at the crawler-discovery stage, the Technical SEO service handles category C, and our llms.txt setup guide covers the discovery layer specifically.

For the broader context on AI search as a discipline, see the Complete Guide to AI Search Optimization in 2026 and The Agentic SEO Operating Model for how Resocial runs audits at scale using our 25-agent workforce. For brands operating outside US English, the AI Search Outside the US companion piece covers how the same 12 mistakes interact with regional engines (Baidu Ernie, Yandex Neuro, Le Chat, Naver Cue:).

Yuki & Petros, AI Search & Technical SEO leads

The 12 Most Common GEO Mistakes We See in Live Audits
