Microdata for SEO in HTML5: Mark Up Your Content Without Errors

Last updated on 22/2/2026

Microdata in HTML is one of the historical formats for embedding structured data directly into a page's source code. If you want the wider picture first — definitions, SEO benefits, GEO impact and how to prioritise — Incremys' pillar guide to structured data provides the full framework. Here, we focus exclusively on the Microdata format: the attributes you will use, realistic use cases in 2026, and a practical comparison with JSON-LD and RDFa.

Understanding HTML5 Microdata: a Structured Data Format Embedded in HTML

What This Guide Adds to the Main Structured Data Article

The pillar article explains the "why" (visibility, rich results, machine readability, GEO). This guide concentrates on the Microdata-specific "how":

how the attributes work (itemscope, itemtype, itemprop, itemid, itemref) and what that means for the DOM;
the maintenance trade-offs versus JSON-LD (the format Google recommends in most cases) and RDFa;
a quality-led approach to testing and monitoring, to avoid markup that is "valid but unusable" (inconsistencies, missing required fields and poorly defined entities).

Why SEO Markup Helps Search Engines and Generative AI Systems

Search engines can index a page without precisely understanding what a price, availability status, author or breadcrumb trail represents. Semantic markup reduces ambiguity by describing entities and their properties using a shared vocabulary (most commonly Schema.org). This benefits SEO (potential eligibility for certain rich-result formats and better interpretation), and it also supports GEO structured data strategies: generative systems (LLMs) tend to favour information that is explicit, consistent and straightforward to extract.

The broader context underlines the stakes: Google remains a dominant channel (89.9% global market share according to Webnyxt, 2026), and search is shifting towards more synthesised SERPs (Semrush, 2025, reports that 60% of searches are "zero-click"). In an environment where visibility and citations are never guaranteed, making facts machine-readable becomes a genuine defensive advantage.

Microdata: a Practical Definition and How It Works in HTML

The itemscope / itemtype / itemprop Model: Connecting Attributes to On-Page Content

According to MDN documentation, Microdata (part of the HTML Standard maintained by the WHATWG) is designed to embed metadata within the existing content of a page: you annotate existing HTML elements so that crawlers can extract "items", each of which is a group of name-value pairs called "properties".

In practice, the key attributes are:

itemscope: starts an item and defines its scope;
itemtype: indicates the item type via a vocabulary URL (most commonly Schema.org);
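itemprop: adds a property to the enclosing item; the value comes from the element that carries the attribute (its text content, or a URL or machine-readable attribute such as href on a link, src on an image, content on a meta element or datetime on a time element);
itemid: provides a unique global identifier for the item (only valid on an element that also carries itemscope and itemtype);
itemref: lists the ids of elements elsewhere in the document whose properties should be added to the item (useful, but a common source of errors when misused).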

The crucial point is that this format is attached to the rendered HTML. That closeness can be an advantage — the structured data is literally wrapped around the visible content — but it is also why Microdata can be harder to maintain on dynamic sites with templates, components, A/B tests and redesigns.
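As a minimal sketch of how the attributes above fit together, the fragment below marks up a hypothetical organisation. Every name, URL and phone number is a placeholder chosen for illustration, and the id value org-contact is likewise an assumption.

```html
<!-- itemscope opens the item, itemtype names its Schema.org type,
     itemref pulls in properties found under the element with id="org-contact". -->
<div itemscope itemtype="https://schema.org/Organization" itemref="org-contact">
  <!-- The value of "name" is the element's text content. -->
  <h2 itemprop="name">Example Ltd</h2>
  <!-- On a link, the value of the property is the href URL. -->
  <a itemprop="url" href="https://www.example.com/">Visit our website</a>
  <!-- On an image, the value of the property is the src URL. -->
  <img itemprop="logo" src="https://www.example.com/logo.png" alt="Example Ltd logo">
</div>

<!-- Sits elsewhere in the DOM (for example in the footer) but still feeds
     the Organization item because its id is listed in itemref above. -->
<p id="org-contact">
  Call us on <span itemprop="telephone">+44 20 7946 0000</span>
</p>
```

Note how the markup and the visible content are the same elements: renaming the footer id or moving the heading into another component silently changes what a parser extracts.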

Choosing a Schema.org Vocabulary and Aligning It with a Consistent SEO Schema

Microdata only delivers value when the vocabulary is understood by the systems reading it. In practical SEO terms, the most widely reusable standard remains Schema.org (launched in 2011 by Google, Microsoft and Yahoo, with Yandex joining later that year). If you want to formalise types, properties and relationships as a proper "data contract", Incremys' guide to schema SEO helps you map page templates (product, article, organisation, local, FAQ and so on) to the schemas search engines expect.

A simple illustration: if you are describing a music event, the MusicEvent type can be expressed using itemtype='https://schema.org/MusicEvent', then populated with properties such as startDate and location. The logic of "type → properties → values" is identical regardless of format (Microdata, JSON-LD or RDFa); only the placement within the document changes.
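The MusicEvent illustration could look like the sketch below in Microdata. The event name, date and venue are invented placeholders; only the type and the startDate and location properties come from the text above.

```html
<article itemscope itemtype="https://schema.org/MusicEvent">
  <h2 itemprop="name">Example Quartet Live</h2>
  <!-- The time element exposes a machine-readable ISO 8601 value via datetime. -->
  <p>
    Date:
    <time itemprop="startDate" datetime="2026-06-12T20:00:00+01:00">12 June 2026, 8 pm</time>
  </p>
  <!-- location is itself an item (a Place), so it gets its own itemscope. -->
  <p itemprop="location" itemscope itemtype="https://schema.org/Place">
    Venue: <span itemprop="name">Example Hall</span>,
    <span itemprop="address">1 Example Street, London</span>
  </p>
</article>
```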

What Google Actually Interprets: Requirements, Limitations and Realistic Use Cases

Google can parse all three formats (JSON-LD, Microdata and RDFa), but whether the markup is practically useful depends on three constraints, whichever format you choose:

Consistency with visible content: marking up information that does not appear on the page (price, reviews, availability) can lead to non-display and reduced trust signals;
Minimum completeness: a Schema.org type can be syntactically valid yet still ineligible for rich results if required fields are absent (for example, a Product without an Offer or price, depending on the guidelines);
Editorial control: the more frequently a site changes — stock levels, prices, reviews, product variants — the higher the risk of drift between the DOM and the markup.

Crucially, display is never guaranteed: Google decides whether to show any given enhancement. You should therefore think in terms of "eligibility + reliability + usability" rather than treating markup as an automatic mechanism.

Structured Data Formats: Comparing JSON-LD, Microdata and RDFa

JSON-LD: Strengths, Limitations and Recommended Deployment Scenarios

JSON-LD is typically implemented as a dedicated block (<script type='application/ld+json'>) placed in the <head> or <body>, entirely separate from visible HTML. This decoupling makes it considerably easier to:

maintain (far less impact during front-end redesigns);
deploy at scale (templates, CMS injection and versioning);
quality-control (diffs, automated tests and validation pipelines).

A common limitation: because JSON-LD sits alongside the DOM rather than inside it, teams need strong discipline to keep it synchronised with visible content — particularly on sites where prices and stock levels change frequently.
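For comparison, here is a sketch of a JSON-LD block describing the MusicEvent type mentioned earlier. All values are placeholders, and the nested Place is an assumption about how the location would be modelled.

```html
<!-- The block is independent of the visible markup, so it can sit in the head or the body. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "MusicEvent",
  "name": "Example Quartet Live",
  "startDate": "2026-06-12T20:00:00+01:00",
  "location": {
    "@type": "Place",
    "name": "Example Hall",
    "address": "1 Example Street, London"
  }
}
</script>
```

Nothing in this block is tied to a specific element on the page, which is what makes it easy to template and diff, and also why it can drift from the visible content if nobody checks.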

JSON-LD vs Microdata: Key Differences for Maintenance and Code Quality

The distinction is not semantic (you often declare the same Schema.org types in both formats); it is structural:

Microdata: attributes placed directly inside HTML tags. Advantage: strict proximity to what is rendered. Drawbacks: more verbose HTML, harder to read and refactor, and a higher risk of breakage when templates evolve.
JSON-LD: a separate data layer. Advantage: readable, maintainable and scalable. Drawback: requires governance to remain aligned with the content actually displayed.

This is the primary reason Google highlights JSON-LD across most of its documentation: separating code and content reduces errors during UI, component and CMS changes, and makes iterative improvement far more straightforward.

When HTML5 Microdata Is Still Relevant Despite Evolving Standards

Microdata can still make sense in a handful of typical situations:

you need to annotate very specific HTML fragments closely — for example, a complex editorial component — and your organisation has tight control over templates;
you are working on a legacy site where Microdata is already in place and the short-term priority is fixing existing issues rather than undertaking a full migration;
you want to minimise the risk of drift between injected JSON-LD and the DOM by keeping structured data directly attached to rendered elements.

That said, as soon as a site is heavily component-based, multilingual or subject to frequent front-end testing, maintenance debt accumulates quickly. In many projects, Microdata is retained mainly for historical reasons and then gradually replaced during redesigns.

RDFa: Compatibility, Complexity and Implementation Risks

RDFa (Resource Description Framework in Attributes) also enriches HTML via attributes, drawing on semantic web principles. Google supports it, but it is widely considered more complex to implement correctly than JSON-LD and is less prevalent in everyday SEO practice. It tends to make most sense for organisations already structured around knowledge graphs and RDF vocabularies.
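To give a sense of the syntax, the sketch below uses the RDFa Lite attributes vocab, typeof and property to describe a hypothetical organisation. The company name and URLs are placeholders.

```html
<!-- vocab sets the default vocabulary, typeof names the type,
     property plays the role that itemprop plays in Microdata. -->
<div vocab="https://schema.org/" typeof="Organization">
  <h2 property="name">Example Ltd</h2>
  <a property="url" href="https://www.example.com/">Visit our website</a>
  <img property="logo" src="https://www.example.com/logo.png" alt="Example Ltd logo">
</div>
```

This simple case looks close to Microdata. The extra complexity tends to appear once prefixes, multiple vocabularies and explicit resource identifiers come into play.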

Decision Grid: Choosing Based on Your CMS, Teams and Technical Debt

High-volume site, multiple teams, frequent redesigns: favour JSON-LD (simpler deployment and QA).
Legacy site already annotated, limited budget: fix what exists in Microdata and plan a phased migration starting with the highest-ROI templates.
Semantic web or knowledge graph-driven organisation: RDFa can be justified, but it requires specialised skills and strict conventions.

What Google Expects for Structured Data Eligibility and Rich Results

Google Structured Data: Consistency, Visibility and Compliance with Official Guidelines

Two principles should guide every decision:

Eligible ≠ shown: compliant markup makes a rich result possible, not automatic.
Marked up = visible: everything you mark up must be verifiable within the main on-page content (price, stock, author, reviews and so on).

This is precisely why quality control is essential after every change: template updates, review modules, pricing logic, internationalisation, front-end redesigns, AI-driven automation and more.

Structured Data for SEO: Prioritising the Pages and Types with the Greatest Impact

Rather than marking up content indiscriminately, prioritise the templates that carry genuine business value: product and offer pages, service pages, local pages, pillar content and pages with high impression volumes. The aim is robust coverage where it matters most, then gradual expansion from there.

To manage this work effectively, rely on actionable metrics (impressions, CTR, clicks and conversions) and cross-reference them with Search appearance reporting. For a more ROI-led perspective, Incremys' resources on SEO statistics, SEA statistics and GEO statistics help connect visibility, acquisition and overall performance.

Why JSON-LD Became the Recommended Standard: Separating Code and Content for Long-Term Maintainability

Google's preference is largely an operational reality: at scale, the most common failure point is rarely Schema.org theory — it is keeping markup consistent across releases. A separate markup layer reduces side effects when the UI changes and enables automated controls (validation, testing and versioning). As a result, JSON-LD proves more reliable over time for most organisations, even though Microdata remains fully supported.

Implementation: From HTML Microdata to Robust Schema Markup Without Errors

Mapping Your Information to the Right Schema Markup: Required Properties, Recommended Properties and Validity

Work template by template:

define the primary entity (e.g. Product, BlogPosting, Organization);
list the required properties for Google eligibility (when the goal is a rich result);
add recommended properties only if you can guarantee their long-term accuracy — otherwise you are creating debt;
validate syntax and consistency systematically (types, properties at the correct level, date formats and accessible URLs).

A practical tip: document an internal "contract" (required fields, sources of truth and mapping rules) so that each team does not end up implementing its own variant.

Common Pitfalls: Nesting, Duplication, Missing Data and Non-Visible Content

Inconsistent nesting: entities not properly closed, or properties attached to the wrong item.
Duplication: multiple competing representations of the same entity, often arising after a partial migration where JSON-LD is added without removing the existing inline markup.
Missing data: a schema that appears plausible but is incomplete for Google — for example, a product page missing an Offer.
Non-visible data: marking up reviews, prices or availability that are not actually displayed on the page is one of the classic reasons for non-display.
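The nesting and missing-data pitfalls are easiest to see on a product page. The sketch below assumes a hypothetical product with a single offer; the product name, image URL and price are placeholders.

```html
<div itemscope itemtype="https://schema.org/Product">
  <h1 itemprop="name">Example Trail Shoe</h1>
  <img itemprop="image" src="https://www.example.com/shoe.jpg" alt="Example Trail Shoe">
  <p itemprop="description">Lightweight shoe for off-road running.</p>

  <!-- Pitfall: an itemprop="price" placed at this level would attach to the
       Product rather than to an Offer. It belongs inside the nested item below. -->
  <div itemprop="offers" itemscope itemtype="https://schema.org/Offer">
    <!-- The currency code is not displayed as such, so it is carried by a meta element. -->
    <meta itemprop="priceCurrency" content="GBP">
    £<span itemprop="price">89.00</span>
    <!-- availability takes a Schema.org enumeration URL, expressed with a link element. -->
    <link itemprop="availability" href="https://schema.org/InStock">In stock
  </div>
</div>
```

If the closing tag of the inner div is lost during a template change, any properties that follow it end up on the Offer instead of the Product, which is the kind of error that only shows up when you test the rendered HTML.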

Focused Examples Without Over-Markup: Article, Breadcrumb, Organization, Product

The goal is to stay minimal, faithful to visible content and robust over time:

Article / BlogPosting: title, publication date, last-updated date, author (Person or Organization) and the primary image — provided it is genuinely displayed on the page.
Breadcrumb trail (BreadcrumbList): the real navigational hierarchy (even if SERP display evolves, the signal continues to support interpretation).
Organization: name, URL, logo and contact points where present and consistent with the site.
Product + Offer: name, description and image, then an Offer with standardised currency, price and availability. For reviews, only declare aggregated ratings and individual reviews that are genuinely collected, displayed and compliant (see AggregateRating on Schema.org).

MDN also provides concrete examples showing numeric properties within a SoftwareApplication schema (e.g. a ratingValue of 4.6 and a ratingCount of 8,864), which is useful for understanding the level of granularity expected — provided those values genuinely exist on the page.
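Of the types listed above, the breadcrumb trail has the least obvious nesting, so a short sketch may help. The two levels and their URLs are placeholders; the structure (BreadcrumbList, ListItem, item, name, position) follows the Schema.org vocabulary.

```html
<ol itemscope itemtype="https://schema.org/BreadcrumbList">
  <!-- Each crumb is a ListItem nested under the itemListElement property. -->
  <li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
    <a itemprop="item" href="https://www.example.com/blog/">
      <span itemprop="name">Blog</span>
    </a>
    <!-- position is not displayed, so it is carried by a meta element. -->
    <meta itemprop="position" content="1">
  </li>
  <li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
    <a itemprop="item" href="https://www.example.com/blog/seo/">
      <span itemprop="name">SEO</span>
    </a>
    <meta itemprop="position" content="2">
  </li>
</ol>
```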

Testing and Validation: a Method, a Testing Approach and Quality Criteria

Setting Up Reproducible Testing: Environments, Logs and Pre-Release Checks

Avoid ad-hoc testing on a single production URL. Instead, put the following in place:

a representative staging environment (same templates, same data as production);
a reference list of pages per template (1 to 5 URLs);
a systematic check after every front-end or CMS release;
validation logs (date, URL, errors and fixes) to strengthen governance over time.

Choosing the Right Testing Approach: Diagnosis, Prioritisation and Fixing SEO Markup Errors

To diagnose and correct issues effectively, combine:

the Schema.org validator (vocabulary and syntax validation);
Google's Rich Results Test (search-engine eligibility);
and, on the Incremys side, the methodology set out in our guide on how to test structured data systematically — covering before/after comparisons, blocking errors versus warnings and prioritisation.

If you are working with Microdata, always test the HTML that is actually rendered rather than a theoretical template, because the markup depends on the final DOM.

Tracking Impact in Google Search Console: Linking Impressions, Clicks and Conversions

Within Google Search Console, monitor:

Enhancements and rich result reports (detection, errors and impacted URLs);
performance filtered by "Search appearance" where available, to isolate affected pages;
CTR alongside positions (the purpose of an enhancement is often to improve click-through at a stable ranking).

Then connect that traffic to conversions in your analytics platform. For B2B sites in particular, the objective is not just the click — it is lead quality (form submissions, demo requests, contact enquiries), which requires end-to-end tracking.

GEO Angle: Impact on Visibility in Generative AI Answers

From Extraction to Synthesis: How LLMs Identify Entities, Attributes and Relationships

AI-driven systems do not read a page as a human would: they extract entities, attributes and relationships, then synthesise. Structured signals — types, properties, identifiers and explicit relationships such as product→offer, article→author and organisation→contact details — help stabilise interpretation, especially when multiple pages contradict one another.
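As an illustration of an explicit article→author relationship, the sketch below nests a Person and an Organization inside a BlogPosting. The headline, author, publisher and URL are all placeholders, and how any given generative system actually consumes such markup is not something this example can demonstrate.

```html
<article itemscope itemtype="https://schema.org/BlogPosting">
  <h1 itemprop="headline">Example headline</h1>
  <p>
    By
    <!-- The author is a separate item, linked to the post by the author property. -->
    <span itemprop="author" itemscope itemtype="https://schema.org/Person">
      <a itemprop="url" href="https://www.example.com/authors/jane-doe/">
        <span itemprop="name">Jane Doe</span>
      </a>
    </span>,
    published
    <time itemprop="datePublished" datetime="2026-01-15">15 January 2026</time>
  </p>
  <!-- Same pattern for the publishing organisation. -->
  <p itemprop="publisher" itemscope itemtype="https://schema.org/Organization">
    Published by <span itemprop="name">Example Ltd</span>
  </p>
</article>
```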

Two pieces of quantified context illustrate the trend: Similarweb (2025) estimates 1.13 billion monthly visits generated by AI worldwide, and IPSOS (2026) indicates that 39% of people in France use AI search engines for their queries. Generative answers are becoming a visibility surface in their own right.

GEO-Ready Best Practice: Cross-Page Consistency, Reliable Sources and Trust Signals

Cross-page consistency: maintain the same facts (prices, contact details, authorship) across all pages that present them — and in the associated markup.
Faithful to visible content: avoid "marketing" fields that cannot be independently verified; contradictions reduce citability.
Editorial structure: pages with a clear H1–H2–H3 hierarchy are 2.8 times more likely to be cited by AI (State of AI Search, 2025, cited by Incremys), and 80% of cited pages use lists. This does not replace Schema.org, but it reinforces machine readability.

Scaling Quality with Incremys: Governance, Automation and Measurement

Audit and Prioritisation: Aligning Technical, Content and Business Priorities

Incremys primarily functions as a governance layer: helping you prioritise high-impact templates, detect inconsistencies and structure a clear action plan. In practice, the SEO 360° Audit module combined with a thorough technical SEO audit helps pinpoint where markup debt — errors, duplicates and missing fields — is causing lost visibility or reduced reliability in SERPs and generative AI answers.

Centralising Measurement via API: Google Search Console, Google Analytics and Performance Steering

To move from markup that is merely present to markup that is actively managed, centralised measurement is essential. Incremys integrates Google Search Console and Google Analytics via API, linking error detection, changes in impressions, clicks and CTR, and business performance — all without the need for constant manual exports.

FAQ: Microdata, Structured Data and Structured Data Formats

What Is Microdata on a Web Page?

Microdata is a set of attributes added to HTML tags to annotate elements (entities) and their properties using a shared vocabulary (most commonly Schema.org). The goal is to make certain information easier for search engines and analysis bots to extract and interpret accurately.

What Is the Difference Between JSON-LD and Microdata?

JSON-LD groups structured data in a dedicated block (typically a script tag), entirely separate from visible HTML. Microdata, by contrast, places properties directly inside HTML tags using attributes. The former is generally easier to maintain at scale; the latter stays tightly bound to the DOM.

Why Does Google Recommend JSON-LD Rather Than HTML Microdata?

Because separating the data layer from the presentation layer reduces risk during redesigns and makes maintenance, quality assurance and large-scale deployment considerably easier. Microdata is still interpreted by Google, but it tends to increase code complexity and accumulate technical debt over time.

Microdata, RDFa or JSON-LD: Which Should a B2B SEO-Focused Site Choose?

In most B2B cases, JSON-LD is the most pragmatic choice: it is straightforward to deploy at scale, to maintain and to quality-check. Microdata can suit a relatively stable site or one that is already annotated that way for historical reasons. RDFa is typically only justified where there is an existing advanced semantic web approach and dedicated in-house expertise.

Is Microdata Still Supported by Google?

Yes. Google continues to support Microdata, JSON-LD and RDFa. In practice, however, many teams migrate progressively to JSON-LD during redesigns because it is significantly easier to maintain at scale.

Can You Combine Multiple Structured Data Formats on the Same Page Safely?

It is technically possible, but risky if you describe the same entity more than once with differing values. The most dangerous scenario is a partial migration — adding JSON-LD without removing the older inline markup — which creates duplicates or outright contradictions. If you do combine formats, apply a clear strategy (one entity, one consistent representation) and test systematically.

Which Schema.org Types and SEO Schemas Should You Prioritise for Better Displays in Google?

Prioritise the types tied to your most important templates: Product + Offer (and AggregateRating where compliant) for e-commerce, Article/BlogPosting for content attribution, BreadcrumbList for site structure, and Organization/LocalBusiness for brand identity and local signals. The right choice ultimately depends on which templates are driving impressions and conversions.

Which Testing Approach Should You Use to Detect and Fix Markup Errors?

Use a Schema.org validator for vocabulary compliance and Google's Rich Results Test for eligibility. For a fully repeatable approach — covering before/after comparisons, prioritisation and post-release checks — follow the method described by Incremys for how to test structured data and systematically reduce blocking errors.

Does Structured Data Directly Influence SEO Rankings?

It does not replace content quality or domain authority, and rich-result display is never guaranteed. The impact is often indirect: better machine understanding, eligibility for certain rich formats and a potential uplift in CTR. An analysis published by Digitalkeys reports a 20–40% higher CTR from rich results. Measure this case by case within Search Console.

How Does the GEO Angle Improve Visibility in Generative AI Answers?

In GEO, the goal is to be citable: clear entities, consistent attributes, explicit relationships and an absence of contradictions across pages. Combine this with a readable editorial structure (clear headings and lists) and verifiable, accurate facts. To keep exploring these topics — SEO, GEO, AI and digital performance — the logical next step is the Incremys Blog.