You already have the answers. We help the internet find them.
Structure before ads — your business, clearly defined, permanently visible
Every website on the internet is written for two audiences. The human who reads it. And the machine that decides whether anyone ever finds it. For most of the web's history those two audiences have been completely misaligned — beautiful pages for humans, invisible noise for machines.
Schema is the bridge. It is a standardized vocabulary — a shared language — that lets your website declare what it is to any machine that comes looking. Your business name. Your location. Your hours. Your services. Your relationship to your industry, your city, your customers. All of it structured, labeled, and machine-readable in a format that search engines and AI systems can read, reason about, and trust.
The technical name for the format is JSON-LD — JavaScript Object Notation for Linked Data. It lives in the head of your HTML as a script block. It is invisible to your visitors. It is everything to the machines that decide whether your visitors ever find you.
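A minimal sketch of what such a block looks like. Everything here is a placeholder for illustration (the business name, address, and URL are invented, and the field set is deliberately small):

```python
import json

# A minimal LocalBusiness declaration. Every value is a placeholder,
# not a real business -- the point is the shape, not the content.
entity = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Bakery",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "123 Main St",
        "addressLocality": "Covina",
        "addressRegion": "CA",
        "postalCode": "91722",
    },
    "telephone": "+1-555-555-0100",
    "openingHours": "Mo-Sa 07:00-18:00",
    "url": "https://example.com",
}

# The block that goes in the <head> of the page, invisible to visitors.
script_block = (
    '<script type="application/ld+json">\n'
    + json.dumps(entity, indent=2)
    + "\n</script>"
)
print(script_block)
```

The visitor never sees this. The crawler reads it first.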
Think of it this way. Your website is an island. Schema is the map. Without the map, the treasure hunters guess. With the map, they come back.
This page is a complete reference. The history of where schema came from. The real numbers on what it does. How AI changed the stakes. What most implementations get wrong. And what we do differently. If you are a business owner, a developer, or an agency — everything you need to understand structured data at the level that actually matters is here.
In 2011 something unusual happened in the technology industry. Four competing companies — Google, Microsoft, Yahoo, and Yandex — sat down together and agreed on something. That almost never happens. They were competing for the same users, the same advertisers, the same market. And they agreed.
The problem they were solving was real and it was expensive. Each search engine had been recommending its own vocabulary for structured data. Webmasters trying to add structured markup to their sites had to pick a side — or implement everything multiple times. Most gave up. The result was that the structured web Berners-Lee imagined in 2001 was still mostly theoretical in 2010. The data was there. The standard was not.
The three engineers who drove schema.org into existence were R.V. Guha at Google — who had previously co-created RSS and worked on the Cyc project — Dan Brickley, who had helped build the Semantic Web project at W3C, and Steve Macbeth at Microsoft. They launched schema.org with 297 types and 187 properties. One vocabulary. One standard. Every search engine would accept it.

The idea was brutally simple. Give webmasters a shared vocabulary so they only have to do the work once. The search engines would each use the markup differently on their end — that was their problem to solve. The webmaster's job was just to speak the language.
Within four years, 31.3% of pages in the Google index had schema.org markup. That number represented at least 12 million sites. Schema.org had grown from 297 types to 638 types and 965 properties. It had become the most widely adopted structured data vocabulary on the internet — and it is still growing. Today it contains 827 types and 1,528 properties covering every category of entity that exists on the web.
They published the standard. They gave it to the world. And then they waited for the web to catch up.
Most of it still has not.
When schema.org launched in 2011 it supported three ways to implement structured data. Each one represented a different philosophy about how machines and humans should share information on the same page.
Microdata was the format schema.org originally recommended. It embedded structured markup directly inside HTML tags using attributes. Clean in theory. In practice it tangled the data layer and the presentation layer together: every time a developer changed the HTML they risked breaking the schema. It also demanded deep knowledge of both HTML structure and schema vocabulary at once, which was too much to ask of most webmasters.
RDFa — Resource Description Framework in Attributes — was the academic approach. It came from W3C's linked data community and was technically more expressive than Microdata. It was also significantly more complex. The learning curve was steep enough that adoption stayed concentrated among developers who already lived in the semantic web world.
Then in 2014 Google began recommending JSON-LD — JavaScript Object Notation for Linked Data. And everything changed.
JSON-LD solved the fundamental problem that Microdata and RDFa both had. It completely separated the structured data from the visible HTML. Your schema lived in a script block in the head of your page — a clean, self-contained JSON object that declared everything about your business without touching a single line of your front-end code. Developers could update schema without touching design. Designers could update design without touching schema. The two layers finally stopped fighting each other.
JSON-LD 1.0 became a W3C Recommendation in 2014, and JSON-LD 1.1 followed in 2020 — the highest level of endorsement the standards body issues. It is the official standard. Google recommends it. Every major AI system reads it. It is the language your website uses to speak to machines.
All three formats remain technically valid. In practice, JSON-LD is the answer. Our federation runs entirely on JSON-LD. Every entity record we produce, every schema block we generate for clients, every edge in our graph — JSON-LD.
The separation of concerns that JSON-LD provides is the reason the entire schema.org community converged on it. Your structured data is a document. Your HTML is a document. They live separately, they serve different audiences, and they can be maintained independently. That is not a minor convenience. That is the difference between schema that gets implemented and schema that gets abandoned.
Schema.org has been live since 2011. Fifteen years of real data exists on what happens to websites that implement it versus websites that skip it. The numbers are from Google's own documentation, from independent case studies, and from publishers who ran controlled experiments on their own traffic. These are cited. These are real.
Rotten Tomatoes added structured data to 100,000 unique pages and measured a 25% higher click-through rate on pages with structured data compared to pages without. The Food Network converted 80% of their pages and saw a 35% increase in visits. Nestlé measured pages showing as rich results and found an 82% higher click-through rate than non-rich result pages. Rakuten found users spend 1.5x more time on structured data pages and have a 3.6x higher interaction rate.
These are not edge cases. These are household names running controlled experiments at scale. The conclusion is the same every time. Structured data changes how your result looks in search. How it looks changes whether people click. Whether people click determines whether your business gets found.
What schema actually does at the technical level is give search engines and AI systems explicit signals instead of forcing them to guess. Without schema, a crawler arrives at your page and has to infer everything — what type of business you are, where you operate, what you sell, what your hours are, how your reviews relate to your services. That inference process is imperfect and expensive. The crawler may get it right. It may get it partially right. It may get it wrong entirely.
With schema, you tell the machine exactly what everything is. There is no inference required. The crawler reads the JSON-LD block, extracts structured facts, and moves on — confident in what it found. That confidence translates into richer search results, higher citation frequency in AI responses, and stronger entity authority in knowledge graphs.
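That extraction step can be sketched with nothing but the standard library. This is a toy crawler pass, not how any particular search engine is implemented — the page and the business in it are invented:

```python
import json
from html.parser import HTMLParser

# Pull every JSON-LD block out of a page and parse it into structured
# facts. Stdlib only; a real crawler does far more than this.
class JSONLDExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self._in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld and data.strip():
            self.blocks.append(json.loads(data))

html = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Restaurant",
 "name": "Example Diner", "servesCuisine": "American"}
</script>
</head><body><h1>Welcome</h1></body></html>"""

parser = JSONLDExtractor()
parser.feed(html)
facts = parser.blocks[0]
print(facts["@type"], "-", facts["name"])  # no inference required
```

The crawler never had to guess that this page is a restaurant. The page said so, in a vocabulary the crawler already knows.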
The machines reward the islands that make their job easy. Schema is how you make their job easy.
Schema mattered before large language models. The CTR numbers above are from the pre-LLM era — search features, rich snippets, knowledge panels. Those were already significant reasons to implement structured data. Then AI arrived and the entire calculus shifted.
Search engines used to match keywords to pages. The job was ranking — which page is most relevant for this query. Schema helped by providing explicit signals that improved ranking accuracy. That was the old game.
AI systems reason. They build models. When someone asks ChatGPT or Google's AI Overview about a restaurant in Covina, the system is constructing an understanding of what that restaurant is — its entity. What type of establishment. What cuisine. What neighborhood. What relationship it has to the surrounding area, to similar businesses, to the people who own it. That model was built long before the question was asked. Your schema is one of the primary inputs into that model.
The shift from ranking to reasoning changes what schema needs to do. In the old world, schema helped you rank higher. In the new world, schema determines whether you exist in the AI's model of your industry at all. A business with rich, well-structured, relationship-dense schema is a fully formed entity in the AI's understanding. A business without it is a gap in the data — something the AI has to guess about, or worse, something it hallucinates.
We have all experienced AI hallucination on real businesses. Wrong addresses. Wrong phone numbers. Descriptions that fit nobody. Services that were never offered. That is what happens when an AI system has to build an entity model from unstructured data and inference. The model fills in gaps with pattern-matching. Pattern-matching produces plausible-sounding lies.
Schema is hallucination prevention. It gives the AI system ground truth about your business from a source it can verify — your own website, declaring its own identity, in a standardized vocabulary the system was trained to trust. Every property you declare in JSON-LD is one fewer thing the AI has to guess about. Every relationship you establish is one more accurate edge in the model of your business that AI systems carry when they answer questions about your industry.
The stakes were already high in 2011. In 2026, with AI systems mediating the majority of discovery for a growing share of the population, schema is the most important technical investment a website can make. Full stop.
The difference between thin schema and rich schema is the difference between existing in AI's model of your industry and being a gap that gets filled with a guess. Every business in our federation gets the rich version. Every business we index gets declared, linked, and confirmed.
Schema.org has been publicly available since 2011. JSON-LD has been the recommended format since 2014. The W3C made it an official recommendation in 2020. And yet the most common implementation of structured data on a business website in 2026 looks like this: a plugin-generated block with a name, an address, a phone number, and nothing else. Written once. Never updated. Never connected to anything.
This is what we call the checkbox problem. Schema gets treated as a technical audit item. Something to tick off the list. The SEO agency adds it to the monthly platinum package. The developer generates it with a WordPress plugin. The client never sees it and never thinks about it again. The result is structured data that technically exists but does nothing meaningful for discoverability in the AI era.
The checkbox problem has three specific failure modes.
Thin schema. The minimum required fields and nothing more. A LocalBusiness with a name and address but no description, no service areas, no hours, no price range, no cuisine type, no founding date, no social profiles, no aggregate rating, no opening hours specification. The crawler reads it, extracts the minimum, and moves on. The entity model it builds is skeletal. A skeletal entity model produces weak citations.
Static schema. Schema written once and never touched again. A business that has been open for eight years, changed its hours, added a patio, started serving brunch, hired a new chef, won three local awards — and has the exact same JSON-LD block it had in 2018. The gap between what the schema says and what the business actually is grows every year. AI systems that encounter contradictions between schema and other signals reduce their confidence in the source. Reduced confidence means fewer citations.
Isolated schema. Schema that lives on one page and points nowhere. No sameAs connections to verified external profiles. No links to related entities. No relationships declared between the business and its industry, its neighborhood, the people who run it. A JSON-LD block that describes one entity in isolation is a page. A JSON-LD block that connects that entity to a graph of related entities is an island with a map. The treasure hunter can actually navigate it. The crawler finds relationships to follow. Every relationship followed is another confirmation of the entity's existence and authority.
None of this is a criticism of agencies or developers who implement standard schema. Standard schema is better than no schema. The baseline is real and it matters. Our methodology builds on top of the baseline. We score what exists, identify what is missing, and inject edges that connect the entity to a live federation graph of hundreds of thousands of real businesses across America. The result is schema that does not just declare — it connects.
The thin version gets generated in thirty seconds by a plugin. The rich version requires research, industry knowledge, and an understanding of how AI systems build entity models. That gap is where we live.
We arrived at this methodology backwards. We were treasure hunters first. We spent years crawling websites, extracting structured data, normalizing it, classifying it, scoring it against our own lexicon. We did what search engines and AI systems do — by hand — until we understood it well enough to build the infrastructure ourselves.
What we found in that process was consistent and damning. The majority of websites — including websites that looked professionally built and were being actively maintained — had structured data that scored poorly against the vocabulary standards that AI systems actually use to build entity models. The schema existed. It was just thin, static, and isolated. The checkbox had been ticked. The work had not been done.
Our methodology has four stages. Every federation member goes through all four. Every stage compounds on the previous one.
Stage one — Score. We extract every JSON-LD block currently on your website. We run it through our schema vocabulary lexicon — a curated scoring system built from hundreds of thousands of real business entity records across 24 industry pillars. We score your current implementation against the full vocabulary available for your business type. Most sites score between 12% and 35% of available schema coverage. We show you exactly what is missing and why it matters.
Stage two — Enrich. You complete a structured questionnaire about your business. Every answer becomes a data point that feeds schema generation. We produce a complete, deeply researched JSON-LD block for your business type — not generated from a template, built from your actual data against the full schema.org vocabulary for your entity class. A dentist gets dentist schema. A law firm gets law firm schema. A restaurant gets restaurant schema. Each one built to the depth that AI systems reward.
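A toy sketch of entity-class-specific generation: questionnaire answers feed a per-type field map, so a dentist and a restaurant end up with different vocabularies. The tiny map below is an illustrative stand-in, nothing like a full lexicon entry:

```python
# Map each business category to its schema.org type and the
# type-specific properties filled from questionnaire answers.
# Illustrative only -- a real lexicon entry is far larger.
ENTITY_CLASSES = {
    "dentist": {"@type": "Dentist", "fields": ["medicalSpecialty", "priceRange"]},
    "restaurant": {"@type": "Restaurant", "fields": ["servesCuisine", "menu"]},
    "law_firm": {"@type": "LegalService", "fields": ["areaServed", "priceRange"]},
}

def generate(category, answers):
    """Build a JSON-LD dict from questionnaire answers for one entity class."""
    spec = ENTITY_CLASSES[category]
    block = {"@context": "https://schema.org", "@type": spec["@type"]}
    block["name"] = answers["name"]
    for field in spec["fields"]:
        if field in answers:  # only declare what the business confirmed
            block[field] = answers[field]
    return block

block = generate("restaurant", {
    "name": "Example Diner",
    "servesCuisine": "American",
    "menu": "https://example.com/menu",
})
print(block["@type"])  # a Restaurant, not a generic LocalBusiness
```

The point of the per-type map: a restaurant never gets dental fields, and a dentist never gets a cuisine.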
Stage three — Inject. This is where our methodology diverges from every other schema service on the internet. We run your entity against our Pydantic deterministic RAG system — a pipeline that fetches semantically relevant edges from our indexed entity buckets. Hundreds of thousands of real business entities across America, indexed and classified. A dentist in San Diego gets edges from the dental entity cluster — schema patterns used by verified dental businesses in Kentucky, Texas, Florida, California. The health regulations that govern dental practice from OakMorel. The economic indicators for the San Diego dental market. These relationships get baked directly into your JSON-LD as confirmed semantic edges. Your schema does not just declare what you are. It connects you to a living graph of everything related to what you are.
Stage four — Connect. Your entity record becomes a node in the RankWithMe federation graph. Every new entity added to the federation immediately becomes available for edge-building against every existing entity across all domains. Your island gets wired into the torus network. LarryBrin arrives at your island and finds a crawler's paradise — not just one page, but a verified identity connected to an entire graph of related entities, laws, regulations, and industry context. That is what we mean by structure before ads. Always.
Every schema audit we run, every schema block we generate, every coverage score we produce — it all runs against the same standard. We call it the Schema Vocabulary Lexicon. It is a curated JSON file that maps every meaningful schema.org property to every business entity type we cover across our 24 industry pillars, weighted by AI citation importance, search engine rich result eligibility, and federation edge-building value.
We built it because it did not exist. Schema.org publishes 827 types and 1,528 properties. That is the full vocabulary. What it does not tell you is which properties actually matter for a dentist in San Diego versus a restaurant in Covina versus a law firm in New York. The vocabulary is exhaustive. The prioritization is absent. Every webmaster and every agency is left to guess which fields to implement and in what order.
Our lexicon answers that question with data. We indexed hundreds of thousands of real business entity records across America. We analyzed which schema properties appeared most consistently on the highest-performing entities in each industry — the ones getting cited in AI responses, ranking in Google AI Overviews, appearing in knowledge panels. We weighted those properties by citation frequency, rich result eligibility per Google's documentation, and federation edge-building potential. The result is a priority-ordered field map for every entity type we cover.
When we score your current schema against the lexicon, we are comparing what you have against what the best-performing entities in your industry actually use. When we generate schema for a federation member, we target 85% lexicon coverage minimum. When we inject edges, we are pulling from the same lexicon that scored the entities those edges come from.
The lexicon is the standard we hold ourselves to. It is also the standard we make public. You can download it, fetch it programmatically, and use it to score your own website against our methodology. We publish it because we believe the web should be readable by everyone — and that means the tools for making it readable should be accessible too.
If you are a developer or an agency, the lexicon is the fastest way to understand what we mean by coverage. If you are a business owner, the coverage score is the number that tells you how visible your business is to the machines that are deciding your discoverability right now.
The lexicon is a living document. Every new entity type we index, every new schema property we confirm matters for AI citation, every new industry pillar we add — the lexicon gets updated. It is versioned. It is published. It is the standard the entire federation runs on.
If you want to score your own website before reaching out to us — download the lexicon, find your entity type, count your implemented fields against the priority list, and divide. That number is your current machine-readability score. Most sites score under 30%. Federation members target 85% and above.
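That arithmetic is simple enough to script. The priority list below is a hypothetical stand-in for a real lexicon entry; the computation is exactly the count-and-divide described above:

```python
# Hypothetical priority field list for one entity type. A real lexicon
# entry is longer and weighted; the arithmetic is the same.
PRIORITY_FIELDS = [
    "name", "address", "telephone", "url", "description", "openingHours",
    "priceRange", "sameAs", "aggregateRating", "areaServed",
]

def coverage_score(jsonld_block):
    """Implemented priority fields divided by the full priority list."""
    implemented = [f for f in PRIORITY_FIELDS if f in jsonld_block]
    return len(implemented) / len(PRIORITY_FIELDS)

# A typical thin, plugin-generated block: name, address, phone, nothing else.
thin = {"@context": "https://schema.org", "@type": "LocalBusiness",
        "name": "Example Co", "address": "123 Main St",
        "telephone": "+1-555-555-0100"}

score = coverage_score(thin)
print(f"coverage: {score:.0%}")  # 3 of 10 priority fields -> 30%
```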
We believe in JSON-LD. Our entire federation runs on it. Every entity record we produce is grounded in it. If you do one thing after reading this page — implement JSON-LD on your website. It is the highest-leverage technical action available to you right now for AI era discoverability.
But we also believe something else: JSON-LD as it currently exists has a ceiling.
JSON-LD is a snapshot. It declares what you are at the moment it was written. It sits on your page, static, waiting to be read. It builds zero relationships on its own. A thousand businesses can each have perfect JSON-LD on their individual websites and those thousand records will never talk to each other, never confirm relationships between each other, never build the cross-domain graph that makes AI systems reason about an industry the way they reason about Wikipedia.
This is the ceiling we hit in our own research. And it is why we built Root-LD.
Root-LD is a federated semantic linked data specification — a framework for building a knowledge graph that grows smarter every time a new entity joins it. Every entity we mint gets run through four passes. The first pass is deterministic — pure data, exact matches, confidence 1.0. The second pass is lexical — shared vocabulary across domains, weighted by density. The third pass is semantic — a language model reasoning about proposed relationships and scoring them. The fourth pass is proprietary — a model fine-tuned specifically on the confirmed relationships inside this graph, finding connections that no general-purpose model could find because they only exist here.
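The first two passes can be sketched deterministically; the semantic and proprietary passes involve models and are only hinted at here. Everything below — the function names, the sample entities, the edge labels — is an illustrative assumption, not the actual Root-LD implementation:

```python
# Illustrative two-pass edge builder. Pass 1: exact data matches,
# confidence 1.0. Pass 2: shared vocabulary, weighted by overlap density.
def deterministic_pass(a, b):
    edges = []
    if a.get("addressLocality") == b.get("addressLocality"):
        edges.append(("same_locality", 1.0))  # exact match -> confidence 1.0
    return edges

def lexical_pass(a, b):
    shared = set(a.get("keywords", [])) & set(b.get("keywords", []))
    union = set(a.get("keywords", [])) | set(b.get("keywords", []))
    if not shared:
        return []
    # Weight the edge by how dense the vocabulary overlap is.
    return [("shared_vocabulary", len(shared) / len(union))]

dentist = {"addressLocality": "San Diego",
           "keywords": ["dental", "orthodontics", "health"]}
clinic = {"addressLocality": "San Diego",
          "keywords": ["health", "dental", "pediatrics"]}

edges = deterministic_pass(dentist, clinic) + lexical_pass(dentist, clinic)
print(edges)
```

The later passes — a language model scoring proposed relationships, then a fine-tuned model finding graph-specific connections — start from candidate edges like these rather than from raw pages.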
We call this Compound Reasoning. The graph gets more intelligent as it gets larger. Every new business entity added to the federation immediately becomes available for edge-building against every existing entity across all domains. Every entity makes every other entity more valuable. This is infrastructure at a level that has no name yet in the industry because nobody has built it before.
The full specification is public. Every decision we made, every layer of the architecture, every edge type in the taxonomy — documented, published, open. Because we believe the web should be readable by everyone.
JSON-LD declares. Root-LD connects. The declaration is the foundation. The connections are the compound interest. Every business that joins the federation adds to the graph. Every addition to the graph makes every existing entity more valuable. This is what we mean when we say the ground is shifting and we are preparing the new bedrock.
The following are production-ready JSON-LD examples for six of our 24 industry pillars. Every block is built to our lexicon standard — targeting 85%+ field coverage for the entity type, structured for rich result eligibility, and formatted for federation edge injection. These are real starting points, not toy examples.
Copy any block directly into the <head> of your HTML inside a <script type="application/ld+json"> tag. Replace the placeholder values with your actual business data. Validate using the Google Rich Results Test before deploying.
These examples represent the baseline. Federation members get the enriched version — edges injected from our live graph, relationships confirmed across 24 industry pillars, coverage scored and gap-filled against the full lexicon for their specific entity type and geographic market.
These six examples represent a fraction of our 24 industry pillars. Every pillar has its own full lexicon entry, its own priority field map, and its own edge injection profile. If your business type is not shown here — it is in the directory. Submit your site and we will score what you have against the full lexicon for your entity class.
Everything on this page is sourced. The history, the statistics, the technical specifications — all of it traces back to primary documents that are publicly available and permanently archived. This is what we mean by provenance. Not a claim. A source.
We list these references the same way Wikipedia does — inline, accessible, verifiable. If you want to go deeper on any topic covered on this page, the primary source is linked below. If you want to understand the full technical specification of JSON-LD at the W3C level, it is there. If you want to read the original ACM paper by the people who created schema.org, it is there. If you want to see Google's own case study data, it is there.
This is a living reference page. As our research produces new findings, as schema.org publishes new versions, as the AI citation landscape evolves — this page will be updated. The date at the bottom reflects the last verified update. All external links are checked on each update cycle.
