
Technical GEO: Structured Data, Servers and Extractability

Last updated on 1/4/2026


Technical GEO in SEO: the essentials to master (as a complement to GEO referencing)

 

If you've already laid the groundwork with GEO referencing, you've understood the shift in objective: it's no longer only about "winning a click", but about becoming a source that assistants quote. Here, we zoom in on technical GEO: entity-oriented Schema.org markup, llms.txt, "AI agent" conventions, and server hygiene for AI crawlers. The aim isn't to rehash the SEO fundamentals, but to push them to the point where extraction and attribution become reliable. Most importantly: everything below must remain verifiable, maintainable, and consistent with what users can actually see on the page.

 

What technical GEO covers (and what you don't need to repeat here)

 

Technical GEO covers everything that makes your site "consumable" by generative systems: crawling, rendering, fact extraction, entity disambiguation, then attribution/citation. It builds on modern SEO (indexability, canonicalisation, performance, HTML structure), but goes further on machine-readability and data governance. It does not replace editorial strategy or external authority; it removes technical friction that prevents an AI from selecting you as a source. In practice, it comes down to simple symptoms: "Can the agent find the right canonical URL, understand what the page is about, and quote a precise passage with clear attribution?"

 

The practical goal: readable, extractable and citable in generative search

 

A generative engine doesn't "read" a page like a human. It reconstructs context from fragments and explicit signals. Your technical goal is therefore to provide stable footholds: descriptive headings, self-contained opening sentences, lists, tables, and consistent structured data. Add governance on top: one canonical page per concept, consistent authors and dates, and reusable entities via stable identifiers. In environments where "zero-click" searches are rising (60% according to Squid Impact, 2025), extractability becomes a survival KPI, not a nice-to-have.

 

Generative Engine Optimisation principles: how generative search works, rendering and citation

 

A generative system draws on data and signals, then produces a probabilistic synthesis: it doesn't "understand" in the human sense and can be wrong when data is incomplete, outdated or ambiguous. That constraint makes technical GEO very practical: reduce ambiguity, increase verifiability, and make elements citable in isolation. Incremys sources summarise this well: "AI is only as good as its data"; output quality depends directly on input quality and structure. Hence the importance of technical conventions that guide ingestion "at question time" (inference time) as well as classic crawling.

 

From crawl to citation: the technical stages that determine visibility

 

To be cited, your content has to make it through a chain of dependencies: access, rendering, understanding, extraction, then selection. The breaking points are rarely "mystical": they sit in redirects, blocking JavaScript, inconsistent canonicals, or markup that doesn't match the visible content. Here's the typical technical pipeline, with the associated levers:

  1. Discovery: internal linking, sitemaps, pages listed in llms.txt (guidance).
  2. Access: stable HTTP statuses, no WAF/CDN blocking for "unknown" user agents.
  3. Rendering: HTML contains critical content (SSR or pre-rendering), non-blocking JS.
  4. Extraction: clean headings, short paragraphs, lists/tables, "answer" blocks.
  5. Disambiguation: explicit entities and relationships via Schema.org (Organization, Article, etc.).
  6. Attribution: author, date, publisher, canonical URL, references, evidence.
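The steps above can be smoke-tested on a fetched page. A minimal sketch using only the Python standard library (the sample HTML and the checks themselves are illustrative, not a complete test suite):

```python
import re

def extraction_report(html: str) -> dict:
    """Check that rendered HTML offers basic footholds for steps 3-6:
    headings, a canonical URL, and JSON-LD for disambiguation/attribution."""
    return {
        "has_h1": bool(re.search(r"<h1[\s>]", html, re.I)),
        "h2_count": len(re.findall(r"<h2[\s>]", html, re.I)),
        "has_canonical": 'rel="canonical"' in html,
        "has_jsonld": "application/ld+json" in html,
    }

page = """<html><head>
<link rel="canonical" href="https://www.example.com/offer">
<script type="application/ld+json">{"@type":"Article"}</script>
</head><body><h1>Offer</h1><h2>Scope</h2><h2>Pricing</h2></body></html>"""

report = extraction_report(page)
```

A page that fails any of these checks is unlikely to survive the extraction and attribution stages, whatever its editorial quality.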

 

Common constraints: JavaScript, fragmented content, access, quotas and rendering limits

 

The most common issue is the gap between what a browser sees and what an agent fetches: empty HTML, late client-side rendering, or content hidden behind interactions. Assistants may also limit crawl depth, ignore heavy resources, or reject long redirect chains. Some on-demand systems favour text/Markdown or highly structured pages, because turning complex HTML into reliable context is expensive. Your job: deliver a fast "machine-readable" version without breaking the user experience.

 

Visibility criteria in AI assistants: what technical work actually influences

 

Technical GEO mainly affects three families of criteria: clarity, stability and attribution. It has limited influence over external authority, but it determines whether a model can choose you without taking risks. In practice, it impacts:

  • Semantic clarity: a clearly identifiable main entity (organisation, product, service) with standardised attributes.
  • Stability: consistent canonicals, clean URLs (no parameters), minimal redirects.
  • Verifiability: dates, authors, sources, and figures that are contextualised and traceable.
  • Cite-worthiness: extractable passages (lists, tables, steps) with aligned Schema.org markup.

 

GEO vs SEO: what changes technically (without re-explaining the basics)

 

SEO targets a ranking position and a click; GEO targets reuse inside an answer (explicit citation, mention, consolidation of a source). The technical implications follow: your page must remain strong for Google, but also "compilable" into reliable fragments. According to Incremys data, 99% of AI Overviews cite pages from the organic top 10: your SEO foundation remains non-negotiable, but it's no longer sufficient on its own to trigger citations. Technical GEO reduces the AI's interpretation cost.

 

SEO: indexing, SERPs and classic signals vs GEO: extraction, synthesis and attribution

 

In SEO, indexing and relevance are often enough to generate clicks. In GEO, the critical stage is extraction: the agent must identify "facts" and attribute them to a source. That pushes you to optimise format (HTML structure) as much as substance (verifiable statements). It's also why tables and lists are over-represented in cited pages (80% according to Incremys sources): they reduce ambiguity and make passage selection easier.

 

Architecture impact: templates, internal linking, canonicals, duplication and entity consolidation

 

Technically, GEO doesn't tolerate silent duplication: the same content across multiple URLs, facets, parameters, archives, or language versions that aren't correctly canonicalised. The right reflex is to manage by template: "one intent = one canonical page", with satellites that are clearly connected. The other pivot is the entity: reuse the same Schema.org @id for your organisation, product or service, so systems consolidate your signals instead of scattering them. In short: fewer ambiguous URLs, more stable identifiers.

 

Measurement impact: tracking citations, source traceability and quality control

 

GEO measurement goes beyond rankings: a large share of citations have no clickable link (72% according to Incremys), so "classic" attribution underestimates impact. You need traceability: consistent authors and dates, versioning, and easy-to-identify "reference" pages. Measurement then becomes controlled testing and non-regression thinking: "Are we still extractable and citable after every release?" Google Search Console and Google Analytics remain useful, but they're not sufficient on their own to measure presence in generative answers.

 

Auditing technical GEO: diagnosis, prioritisation and an implementation plan

 

A technical GEO audit doesn't start with "mark up everything". It starts by removing the root causes of non-citation: non-indexable pages, fragile rendering, inconsistent canonicals, missing evidence. The goal is to prioritise the templates with the highest business value and citation potential. To frame the approach, you can use an AI-oriented audit, then turn findings into a testable technical backlog. Work in batches: one template, one rule, one validation.

 

Map templates and content with high citation value

 

Start with a "templates × intents" map: offer pages, documentation, comparison pages, glossaries, expert articles, proof pages (reviews, case studies). The goal isn't volume; it's centrality: which pages must become canonical sources when an agent answers? Once the list is set, adopt a simple rule: for each topic, one "reference" page and satellites that do not contradict it. This reduces the risk of an AI citing a secondary page (archive, tag, listing) instead of the master content.

 

Technical checkpoints: indexability, canonicalisation, server errors and stability

 

Before Schema.org and llms.txt, check the plumbing. Here's a short, "no surprises" technical checklist:

  • Consistent canonical URLs (rel=canonical, clean redirects, no chains).
  • Stable statuses: 200 for key pages, 404/410 for removed content, no intermittent 5xx.
  • No accidental blocking by robots.txt, WAF rules, CDN rules or geo-restrictions.
  • Content accessible without interaction (avoid client-side-only essential tabs/accordions).
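As a sketch, the canonical chains this checklist targets can be detected from a crawl export: map each URL to the canonical it declares, then flag any target that is not itself self-canonical (the mapping below is hypothetical data):

```python
def canonical_issues(canonical_of: dict) -> list:
    """Flag canonical chains (A -> B -> C), given a mapping of each
    URL to the canonical it declares. Self-canonical targets are fine."""
    issues = []
    for url, target in canonical_of.items():
        if target != url and canonical_of.get(target, target) != target:
            issues.append(f"{url} -> {target} -> {canonical_of[target]}")
    return issues

pages = {
    "/offer?utm=x": "/offer",     # parameterised URL canonicalised: fine
    "/offer": "/offer",           # self-canonical: fine
    "/old-offer": "/offer-2024",  # chain: target is itself canonicalised
    "/offer-2024": "/offer",
}
problems = canonical_issues(pages)
```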

 

Control extractability: HTML, headings, tables, lists and answer blocks

 

Extractability is tested in the final rendered HTML: the agent must be able to capture a definition, a step, or a condition without "guessing". To standardise, enforce patterns per template: one H2 = one idea, first sentence = self-contained answer, then a list/table for detail. Add "Key takeaways" blocks where a page contains cite-worthy points. Above all: avoid long paragraphs; 3–4 sentences max keeps granularity usable.

| Element | What AI extracts well | Anti-pattern to avoid |
|---|---|---|
| Headings (H2/H3) | Descriptive titles that make sense out of context | Vague titles ("Good to know", "Details") |
| Lists | Steps, criteria, specifications | Marketing lists ("best", "unique") with no facts |
| Tables | Comparisons and attributes with values | Decorative tables or tables without headers |
| First sentence | A self-contained, factual answer | Abstract hooks and overly long intros |

 

QA and non-regression: how to validate without getting scattered

 

Validate in two steps: (1) technical testing (HTTP, rendering, markup), (2) cite-worthiness testing (business questions, verification of quoted passages and cited URLs). Document a simple QA recipe per template and automate what you can (JSON-LD linting, HTTP 200 tests, redirect detection). The goal isn't to "optimise everywhere"; it's to prevent silent regressions after a release. Robust technical GEO looks like quality engineering: reproducible, versioned, observable.
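The JSON-LD linting mentioned above can start very small. A minimal sketch, assuming you already have the rendered HTML as a string (the required-property lists are illustrative minimums, not a standard):

```python
import json
import re

REQUIRED = {  # illustrative minimums; adjust per template
    "Article": {"headline", "author", "datePublished", "publisher"},
    "FAQPage": {"mainEntity"},
    "Organization": {"name", "url"},
}

def lint_jsonld(html: str) -> list:
    """Extract JSON-LD blocks from rendered HTML and report
    invalid JSON or missing key attribution properties."""
    errors = []
    blocks = re.findall(
        r'<script[^>]*application/ld\+json[^>]*>(.*?)</script>', html, re.S)
    for raw in blocks:
        try:
            data = json.loads(raw)
        except ValueError:
            errors.append("invalid JSON-LD")
            continue
        missing = REQUIRED.get(data.get("@type"), set()) - data.keys()
        if missing:
            errors.append(f"{data.get('@type')}: missing {sorted(missing)}")
    return errors

html = ('<script type="application/ld+json">'
        '{"@type": "Article", "headline": "Technical GEO"}</script>')
errs = lint_jsonld(html)
```

Run this on the final rendered HTML in CI, alongside the HTTP 200 and redirect tests, and regressions surface before release rather than after.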

 

GEO-specific structured data: Schema.org markup and entity signals

 

Schema.org remains one of the most concrete levers to reduce ambiguity and improve attribution—provided you treat it as an entity model (not an "SEO badge"). For priorities and testing methodology, see GEO structured data. Here, we focus on "entity-first" implementation: stable @id values, explicit relationships, and strict alignment with what's visible. This supports cite-worthiness and governance (avoiding broken or divergent markup at scale).

 

Which schemas to prioritise by page type (Article, Product, Organization, FAQPage, HowTo)

 

Pick one primary type per template, then add helpful supporting types for attribution. These recur in clean technical GEO implementations:

  • Article or BlogPosting: clarify author, publisher and dates.
  • FAQPage: make questions/answers extractable.
  • HowTo: step-by-step guides.
  • Organization: disambiguate the brand and connect official profiles.
  • Product / SoftwareApplication / Service: offer pages with verifiable attributes.

 

Align JSON-LD with visible content: avoid inconsistencies that undermine trust

 

Non-negotiable rule: JSON-LD must reflect exactly what the user can verify on the page. Marking up a review that isn't displayed, a price that isn't shown, a different date, or an author who can't be found creates distrust and can cause markup to be ignored. In GEO, this is critical: an AI will favour content it can cross-check and attribute without ambiguity. Your QA should therefore verify "visible ↔ structured" consistency on every template update.

 

Key properties for cite-worthiness: author, date, sources, entities, relationships and sameAs

 

The most useful properties aren't the ones that "look nice"; they're the ones that support attribution. Prioritise:

  • author (Person) plus an accessible bio: who is speaking?
  • publisher (Organization) plus logo: who is publishing?
  • datePublished and dateModified: is it up to date?
  • @id stability to reuse the same entity across the whole site.
  • sameAs links to official profiles (disambiguation).

 

Implementation guides: JSON-LD code examples ready to adapt

 

The examples below follow a "linked entities" pattern: a stable organisation @id reused everywhere, and one primary type per page. Prefer JSON-LD, then validate the real rendering (final HTML) before rolling out at scale. Only include what you actually display. Finally, version your snippets: technical GEO needs governance, not guesswork.

 

Example 1: Article + author + date + organisation
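A sketch to adapt: all URLs, names and dates here are placeholders, and every property must mirror what the page visibly displays. The publisher is referenced by a stable organisation @id that is reused site-wide:

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "@id": "https://www.example.com/blog/technical-geo#article",
  "headline": "Technical GEO: Structured Data, Servers and Extractability",
  "datePublished": "2025-11-12",
  "dateModified": "2026-01-04",
  "author": {
    "@type": "Person",
    "name": "Jane Doe",
    "url": "https://www.example.com/authors/jane-doe"
  },
  "publisher": { "@id": "https://www.example.com/#organization" },
  "mainEntityOfPage": "https://www.example.com/blog/technical-geo"
}
```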

 

 

Example 2: FAQPage (without over-optimising)
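A minimal sketch with placeholder content. Only include question/answer pairs that are visibly rendered on the page, and keep answers short and factual rather than exhaustive:

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is technical GEO?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Technical GEO covers the optimisations that make a site readable, extractable and attributable for generative AI engines."
      }
    }
  ]
}
```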

 

 

Example 3: Organization, sameAs and brand identity
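A sketch with placeholder identifiers. The @id is the one to reuse everywhere (publisher in articles, provider in services), and sameAs should point only to profiles you actually control:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "@id": "https://www.example.com/#organization",
  "name": "Example Company",
  "url": "https://www.example.com",
  "logo": "https://www.example.com/assets/logo.png",
  "sameAs": [
    "https://www.linkedin.com/company/example",
    "https://x.com/example"
  ]
}
```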

 

 

AI-oriented technical files and conventions: llms.txt and AI agent files

 

Beyond Schema.org, conventions are emerging to help agents find the right pages quickly, in the right format, with less confusion. The llms.txt file fits that logic: less about "controlling", more about "guiding" towards your canonical sources. In parallel, many teams publish clean Markdown versions or documentation endpoints to simplify ingestion. These do not replace robots.txt or canonicals; they add a curation layer for conversational use cases.

 

What llms.txt is for, and how to structure it properly

 

llms.txt primarily works as an intelligent table of contents: it tells an agent where to find your reference pages (offers, docs, proof, definitions) and which URLs to avoid (parameters, archives, internal search). It's an emerging, unofficial format with inconsistent adoption, so treat it as governance and ambiguity reduction, not a guarantee. Technically, publish it at the root, return HTTP 200, and serve it as text/plain; charset=utf-8 (or text/markdown if you explicitly choose that). Keep it short, explicit, versioned, and aligned with your SEO signals (sitemaps, canonicals).

 

AI agent files: what to expose, in which format, and with what limits

 

By "AI agent files", we mean resources designed for agents: Markdown versions of key pages, documentation indexes, or highly structured reference pages. The right principle: expose what stabilises answers (definitions, offer scope, methodology, sourced figures, dates) and avoid what adds noise (listings, tags, duplication). Never use these files as a security mechanism: if content is sensitive, protect it with authentication and application-level access control. Also avoid including URLs that reveal internal endpoints or test environments.

 

Robots.txt, sitemaps and exposure consistency: avoid contradictory signals

 

The number one risk is contradiction: guiding an agent to a page you block elsewhere, or listing non-canonical URLs. Use each file for its intended role:

| File | Main role | What you must align |
|---|---|---|
| robots.txt | Control crawl access | Don't accidentally block your reference pages |
| sitemap.xml | Declare indexable URLs | Include canonicals, exclude duplicates |
| llms.txt | Guide towards sources and clean versions | Point to canonicals, avoid parameters and noise |

If you need an operational refresher, use a dedicated checklist and adapt it to your templates.
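The alignment described in the table above can be enforced with a simple check. A sketch, assuming you have already parsed the three files into URL lists (the data below is hypothetical):

```python
def exposure_conflicts(llms_urls, sitemap_urls, disallowed_prefixes):
    """Flag llms.txt URLs that contradict other signals: blocked in
    robots.txt, or absent from the sitemap (likely non-canonical)."""
    conflicts = []
    for url in llms_urls:
        if any(url.startswith(p) for p in disallowed_prefixes):
            conflicts.append((url, "blocked by robots.txt"))
        elif url not in sitemap_urls:
            conflicts.append((url, "missing from sitemap.xml"))
    return conflicts

llms = ["https://www.example.com/offer",
        "https://www.example.com/search?q=geo"]
sitemap = {"https://www.example.com/offer"}
disallow = ["https://www.example.com/search"]
found = exposure_conflicts(llms, sitemap, disallow)
```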

 

Commented examples: minimal vs advanced llms.txt

 

An effective llms.txt is not a dump of your site. Here are two intentionally minimal formats:

```
# Example (site)
> Reference pages to answer accurately about our offer and expertise.

## Reference pages
- [Offer](https://www.example.com/offer): factual description and scope.
- [Documentation](https://www.example.com/docs): up-to-date technical guides.
- [Case studies](https://www.example.com/case-studies): results, context, methodology.
- [Glossary](https://www.example.com/glossary): canonical definitions.

## Optional
- [Blog](https://www.example.com/blog): deeper dives when needed.
```

Advanced version: add testable guidance (prefer the pricing page, ignore parameterised URLs), a contact point, and optionally links to clean Markdown versions of key pages.

 

Server configurations and technical optimisations for AI crawlers

 

AI crawlers and agents are sensitive to stability: intermittent errors, inconsistent redirects, strange encodings, and blocked resources. Your goal is straightforward: make access predictable, fast and repeatable. Many gains come from solid HTTP hygiene applied without exceptions to reference pages and AI-oriented files. Treat it as service quality: if the fetch fails, the citation doesn't happen.

 

HTTP, caching, compression and stability: reduce errors that break extraction

 

Serve critical pages with consistent statuses and headers, and avoid "different behaviour by user agent". For resources like llms.txt, use a short cache whilst iterating (for example Cache-Control: public, max-age=60), then increase the TTL once stable (example: max-age=3600), with purges on updates. Add consistent ETag or Last-Modified headers where possible to help agent-side validation. gzip/brotli compression is fine as long as proxies do not corrupt UTF-8 encoding.
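As an illustration, assuming an nginx front end (the directive values are the ones discussed above and should be adapted to your stack):

```nginx
# Serve llms.txt predictably: stable type, UTF-8, short cache whilst iterating
location = /llms.txt {
    default_type "text/plain; charset=utf-8";
    add_header Cache-Control "public, max-age=60";  # raise to 3600 once stable
    etag on;          # lets agents revalidate cheaply
    gzip on;
    gzip_types text/plain;
}
```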

 

Status codes (4xx/5xx), redirects and chains: protect cite-worthiness

 

Redirects are inevitable, but they are costly in on-demand contexts. Limit chains and standardise your rules (http→https, www→non-www, trailing slash, case) to converge in one hop. Monitor 5xx errors closely: server instability alone can be enough for a page to "disappear" from candidate sources. When you remove content, return clean 404/410 responses and remove those URLs from guidance files (sitemap, llms.txt) to avoid noise.
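"Converge in one hop" means all URL rules are expressed as a single normalisation, so any variant maps directly to its final form. A sketch (the rule set shown here, https, no www, no trailing slash, lowercase host, is an example policy, not a recommendation for every site):

```python
from urllib.parse import urlsplit, urlunsplit

def normalise(url: str) -> str:
    """Apply every URL rule in one pass so a redirect can go straight
    to the final form instead of chaining rule by rule."""
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    path = parts.path.rstrip("/") or "/"   # keep "/" for the root
    return urlunsplit(("https", host, path, parts.query, ""))

clean = normalise("http://WWW.Example.com/Offer/")
```

Because the function is idempotent, the redirect target is always already in final form: one request, one 301, done.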

 

Rendering and delivery: SSR, critical content in HTML, and non-blocking resources

 

If your site relies heavily on JavaScript, provide a stable rendering path: SSR or pre-render the critical content (definition, key features, tables, FAQ). Ensure headings and answer blocks exist in the initial HTML, not only after execution. Defer non-essential assets so they don't block content display (and therefore extraction). The test is simple: "If I fetch the HTML, do I already have something I can quote?"

 

Logs and monitoring: identify AI crawlers, measure coverage and detect regressions

 

Without logs, you're flying blind. Instrument your server/CDN to spot agent patterns (user agents, IPs, paths) and, above all, errors on reference pages: 4xx, 5xx, timeouts, redirects. Add simple health alerts for strategic endpoints (docs, offer pages, glossary, llms.txt). The goal isn't to identify every bot; it's to detect quickly when a segment of agents can no longer access your content.
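A sketch of the kind of summary meant here, assuming a simplified "status path user-agent" line format (real server/CDN logs will differ, and the bot names are only illustrative):

```python
from collections import Counter

def agent_error_summary(log_lines, watch_paths):
    """Count non-200 hits on strategic paths, per (agent, status) pair,
    to surface agents that can no longer fetch reference pages."""
    errors = Counter()
    for line in log_lines:
        status, path, agent = line.split(" ", 2)
        if path in watch_paths and status != "200":
            errors[(agent, status)] += 1
    return errors

logs = [
    "200 /llms.txt GPTBot",
    "503 /docs PerplexityBot",
    "503 /docs PerplexityBot",
    "301 /offer ClaudeBot",
]
summary = agent_error_summary(logs, {"/llms.txt", "/docs", "/offer"})
```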

 

Credibility signals for AI: E-E-A-T, AI trust signals and technical evidence

 

Generative systems favour content that is attributable and verifiable: who wrote it, when, using which methodology, with what evidence. Strong technical GEO makes these signals available and keeps them consistent between visible HTML and structured data. External mentions matter, but on-site you control the foundations: identity, authors, dates, and traceability. Don't overstate it: trust signals must be proven, not merely declared.

 

Structure expertise: authors, bios, methodology, updates and references

 

Give each piece of content an identifiable author with a bio page (role, expertise) and a clear link back to the organisation. Show publication date and updated date when you change the substance, and mirror those in JSON-LD. Add a "methodology" section where you make technical recommendations (assumptions, scope, limits). These are elements AI can quote without risky paraphrasing.

 

Data, citations and traceability: make it easy to refer back to the source

 

Whenever you use a figure, anchor it: source, year, context. For instance, the reported +300% increase in referral traffic from generative AI platforms (Coalition Technologies, 2025) or the 2.6% CTR when an AI Overview is present (Squid Impact, 2025) should remain tied to their provenance. The more your information is cross-checkable, the more an AI can reuse it without reputational risk. This also protects you against approximation: a dated, sourced figure ages better than a vague claim.

 

Avoid weak signals: anonymous pages, inconsistent dates and unattributed content

 

The most damaging weak signals are often invisible in traditional SEO: pages with no author, missing or contradictory dates (HTML vs JSON-LD), offers whose scope changes with no version trail. Avoid unverifiable "proof" too (reviews not displayed, numbers with no context). In technical GEO, trust is fragile: one inconsistency can push an agent towards a more stable source. Your rule: every structured element must be observable.

 

Content strategy for AI assistants: formats, templates and extractability rules

 

Content is still the fuel, but technical GEO determines whether it's usable. Build templates that produce reusable answers by default: definitions, criteria, steps, limits, use cases, and examples. Assistants are more likely to quote well-bounded units of information than long narratives. If you want to scale, standardise patterns and control non-regression with every publication. For the editorial side (angles, formats, structure), AI-optimised content is a useful complement.

 

Design reusable answers: definition, steps, conditions, limits and use cases

 

A good GEO-friendly template answers first, then proves, then sets boundaries. Here's a simple structure to replicate across your pages:

  1. Definition in one self-contained sentence.
  2. Steps as a numbered list (for "how to").
  3. Criteria as a bullet list (for "how to choose / evaluate").
  4. Table of attributes (for comparisons or specifications).
  5. Limits and conditions (what you don't cover).

 

Structure proof: sourced figures, citations, "key takeaways" and "limits" sections

 

Proof needs structure. Add short, factual call-outs and avoid untestable superlatives. Example "Key takeaways" block (adapt to your topic):

  • More than 50% of searches reportedly show an AI Overview (Squid Impact, 2025).
  • The click-through rate for position 1 reportedly drops to 2.6% when an AI Overview appears (Squid Impact, 2025).
  • Referral traffic from generative AI platforms increased by +300% in one year (Coalition Technologies, 2025).

Then add a "limits" section: what your data cannot conclude, and what depends on context (industry, country, technical maturity).

 

Balancing SEO and GEO: where technical work truly changes the outcome

 

The trade-off isn't choosing one over the other: GEO extends SEO. Technical work changes the outcome when it reduces losses linked to zero-click behaviour and improves attribution (being cited even without a click). It also changes the outcome when it consolidates an entity (Organization, product, service) across all pages—something keyword-led SEO doesn't always handle directly. Finally, it forces you to maintain time-based data (offers, pricing, versions) to prevent assistants from reusing outdated information.

 

Operating model: production, validation, updating and citation tracking

 

Run operations like a quality loop: produce, validate, publish, measure, update. A minimal sequence works well:

  1. Write using an extractable template (definition, lists, table, limits).
  2. Implement Schema.org and verify alignment with visible content.
  3. Publish and test cite-worthiness with real business prompts (varied personas).
  4. Plan updates: many bots appear to favour recent content (Incremys sources mention a bias towards the last two years).

To accelerate implementation, follow an implementation-focused tutorial.

 

Set up a tool-supported workflow (without piling on tools)

 

 

Centralise SEO & GEO auditing, prioritisation and tracking with Incremys (with Google Search Console / Google Analytics API integration)

 

At some point, the challenge is no longer "what to do", but "how to do it at scale without losing control". Incremys mainly helps you structure the workflow: SEO & GEO auditing, impact-led prioritisation, industrialised production and updates, then reporting—whilst integrating Google Search Console and Google Analytics via API to keep a unified measurement foundation. The value is organisational: avoid tool sprawl, keep decision traceability, and speed up non-regression checks. For an overview of the framework and control points, the article on GEO tools is a useful reference.

 

FAQ about technical GEO

 

 

What is technical GEO in SEO?

 

Technical GEO in SEO covers the optimisations that make a website readable, extractable and attributable for generative AI engines and assistants. It includes HTML structuring, Schema.org structured data, canonicalisation, server stability, and conventions such as llms.txt to guide agents towards the right sources. It complements SEO: without a strong SEO baseline, long-term citations are rare. The end goal is attribution (citation, source mention), not just clicks.

 

How does technical GEO differ from traditional SEO?

 

Traditional SEO optimises visibility and traffic through indexing and rankings in the SERPs. Technical GEO optimises fragment extraction and attribution inside AI-generated summaries. Technically, this means strengthening entity signals (Organization, author, dates), reducing URL ambiguity (canonicals, duplication), and structuring content so it can be reused (lists, tables, self-contained answers). In short: SEO = ranking, GEO = reuse and citation.

 

GEO vs SEO: what does it change technically in practice?

 

In practical terms, you move from "a page optimised for a click" to "passages optimised to be quoted". That means stricter templates (descriptive headings, short paragraphs, answer blocks), entity-first Schema.org markup with stable @id values, and URL governance to prevent assistants from citing a secondary page. It also means stronger stability monitoring (5xx errors, redirects, rendering). Technical GEO adds guidance conventions (llms.txt) to reduce noise.

 

Why is technical GEO becoming essential with generative AI search engines?

 

Because visibility is shifting towards generative answers where users don't need to click: 60% of searches end without a click (Squid Impact, 2025). When an AI Overview is present, the CTR for position 1 can drop to 2.6% (Squid Impact, 2025), which mechanically reduces the returns of "SEO only". Technical GEO is essential to stay present within the answer itself via clear attribution. It also reduces the risk of errors or outdated reuse by making your sources more explicit.

 

Which Generative Engine Optimisation principles should you understand before implementing?

 

Key principles: (1) AI extracts fragments, not whole pages; (2) it favours verifiable, attributable information (author, dates, sources); (3) it struggles with ambiguity (duplicate URLs, poorly defined entities); (4) it is probabilistic and data-dependent, so you must provide explicit signals. Implementation then becomes: structure content and entities, and test cite-worthiness using real business questions. Finally, keep content fresh: outdated time-based data directly degrades answer quality.

 

Which technical optimisations strengthen technical GEO on a website?

 

  • Reliable rendering: critical content present in the HTML (SSR/pre-rendering if needed).
  • Extractable structure: descriptive headings, answer blocks, lists and tables.
  • Strict canonicalisation: one canonical URL per topic, duplication controlled.
  • HTTP stability: 200 on reference pages, minimal redirects, low 5xx.
  • Entity-oriented structured data: Organization, Article, FAQPage, HowTo, with stable @id values.
  • Agent guidance: concise, maintained llms.txt aligned with robots.txt and sitemaps.

 

Which technical optimisations improve visibility criteria in AI assistants?

 

AI assistants primarily "reward" what reduces interpretation cost and the risk of error: clean HTML, key information above the fold, cite-worthy passages, and disambiguated entities. Add attribution signals (author, dates, publisher, canonical URL) and contextualised evidence (sourced figures). Reduce contradictions: different definitions in multiple places, old pages still indexable, or JSON-LD schemas that vary across templates. Ensure stable access too (avoid WAF/CDN blocking for unfamiliar user agents).

 

How do you use GEO-specific structured data with Schema.org?

 

Use Schema.org as a small graph of linked entities rather than minimal markup. Define a stable @id for your organisation and reuse it across the site. Choose one primary type per template (Article, Service, SoftwareApplication, FAQPage, HowTo) and link supporting entities (Organization, BreadcrumbList, Review if displayed). Keep markup strictly aligned with visible content, and test after every template change. The GEO goal: make attribution easier and reduce ambiguity.

 

Which Schema.org schemas should you prioritise to improve understanding and reuse?

 

  • BlogPosting/Article: author, publisher, dates, language.
  • FAQPage: directly extractable Q&A.
  • HowTo: steps, duration, requirements, conditions.
  • Organization: brand identity, logo, sameAs.
  • Service / SoftwareApplication / Product: offer scope and verifiable attributes.

 

Which signals help generative engines cite a source via technical GEO?

 

The most cite-worthy signals are those that make a source attributable and verifiable: an identified author, consistent dates, a clear publisher, a stable canonical URL, structured information (lists, tables), and sourced data. Entity consolidation via stable @id values and sameAs also helps prevent confusion between similar brands or products. Finally, technical stability (no 5xx, no redirect chains, reliable rendering) prevents agents from abandoning your page. In GEO, a strong source is first and foremost a source with no ambiguity.

 

Which signals help generative engines cite a source (author, date, sources, entities)?

 

| Signal | What to display | What to structure (Schema.org) |
|---|---|---|
| Author | Name, role, bio | Person (author), link to author page |
| Dates | Published + updated | datePublished, dateModified |
| Publisher | Organisation, logo | Organization (publisher), logo |
| Sources | Explicit references | At minimum, structure attribution (Article + publisher) |
| Entities | Exact product/service naming | Stable @id values, sameAs, relationships (provider, itemReviewed…) |

 

Which AI engines and assistants are most influenced by technical GEO?

 

The environments most sensitive to technical GEO are those that generate summaries and select sources: Google AI Overviews (highly correlated with the organic top 10), ChatGPT (strong correlation with Bing according to Incremys sources), and "answer + sources" engines such as Perplexity. In all cases, technical work mainly helps you be understood and attributed; it doesn't replace authority or content quality. The best approach is to test cite-worthiness across multiple assistants using real business prompts and different personas.

 

Is llms.txt mandatory, and what should it include at a minimum?

 

No—llms.txt isn't mandatory and adoption varies by platform. At a minimum, it should include: site identification, a short factual description, then a list of reference pages (offer, docs, case studies, glossary) with one descriptive line each. Serve it at the root with HTTP 200, UTF-8, and no unstable redirects. Keep it maintainable: an outdated file creates more confusion than it solves.

 

What should you include in AI agent files, and how do you expose them safely?

 

Include stable, verifiable resources: clean Markdown versions of canonical pages, technical documentation, a glossary, security/trust pages if you have them, and case studies with methodology. Only expose them on public URLs if the content is truly public. Never include sensitive URLs (exports, internal endpoints, test environments), and don't treat these files as protection—use authentication and access control for premium content. Finally, keep "canonical page ↔ agent version" strictly synchronised to avoid contradictions.

 

How do you check that content is rendered and extractable without relying on complex JavaScript?

 

Check the server-rendered HTML: critical content (definition, steps, table, FAQ) must be present without interaction. Then test extraction: each section should begin with a self-contained sentence followed by a list or table. If you use a JS framework, prefer SSR or pre-rendering for reference pages. Finally, validate consistency between visible content and JSON-LD: perfect markup on content that isn't rendered is pointless.

 

Which technical issues cause you to lose citations (redirects, 5xx, canonicals, duplication)?

 

  • Redirect chains and non-standardised URL rules (slash, www, http/https).
  • Intermittent 5xx errors, timeouts, or WAF/CDN blocks on unusual user agents.
  • Inconsistent canonicals (A canonicalises to B, B to C) or contradictory signals.
  • Mass duplication (tags, archives, facets, parameters) that blurs the reference page.
  • Critical content only rendered client-side, absent from initial HTML.

 

Which server configurations and technical optimisations for AI crawlers have the most impact?

 

The most impactful are the ones that guarantee a reliable fetch: stable HTTP 200 responses, minimal redirects, compression without encoding corruption, and controlled caching with purges on updates. For llms.txt, serve a predictable Content-Type and UTF-8 encoding, and avoid 301/308 behaviour that varies by agent. Monitor errors via logs and simple alerts on strategic endpoints. Keep rate limiting sensible: protect your infrastructure without blindly blocking "unknown" agents.

 

How do you define a content strategy for AI assistants without cannibalising existing SEO?

 

Define one canonical page per intent, and use GEO-focused content as satellites that enrich the reference page rather than duplicating it. Standardise extractable templates (definition, steps, table, limits) and connect them with clear internal linking back to the master page. Then measure cite-worthiness and optimise the sections being quoted, instead of creating multiple near-identical pages. To frame the stakes, use the order-of-magnitude benchmarks from GEO statistics and, if needed, LLM statistics to ground the usage context.

To keep learning without spreading yourself too thin, find more analyses and guides on the Incremys Blog.
