    The agentic web is here: Why NLWeb makes schema your greatest SEO asset

    Microsoft’s NLWeb bridges websites and AI agents. Learn how to make your schema work harder — powering smarter discovery and visibility.

    The web’s purpose is shifting. Once a link graph – a network of pages for users and crawlers to navigate – it’s rapidly becoming a queryable knowledge graph.

    For technical SEOs, that means the goal has evolved from optimizing for clicks to optimizing for visibility and even direct machine interaction.

    Enter NLWeb – Microsoft’s open-source bridge to the agentic web

    At the forefront of this evolution is NLWeb (Natural Language Web), an open-source project developed by Microsoft. 

    NLWeb simplifies the creation of natural language interfaces for any website, allowing publishers to transform existing sites into AI-powered applications where users and intelligent agents can query content conversationally – much like interacting with an AI assistant.

    Developers suggest NLWeb could play a role similar to HTML in the emerging agentic web.

    Its open-source, standards-based design makes it technology-agnostic, ensuring compatibility across vendors and large language models (LLMs). 

    This positions NLWeb as a foundational framework for long-term digital visibility.

    Schema.org is your knowledge API: Why data quality is the NLWeb foundation

    NLWeb proves that structured data isn’t just an SEO best practice for rich results – it’s the foundation of AI readiness. 

    Its architecture is designed to convert a site’s existing structured data into a semantic, actionable interface for AI systems. 

    In the age of NLWeb, a website is no longer just a destination. It’s a source of information that AI agents can query programmatically.

    The NLWeb data pipeline

    The technical requirements confirm that a high-quality schema.org implementation is the primary key to entry.

    Data ingestion and format

    The NLWeb toolkit begins by crawling the site and extracting the schema markup. 

    The schema.org JSON-LD format is the preferred and most effective input for the system. 

    This means the protocol consumes every detail, relationship, and property defined in your schema, from product types to organization entities. 

    For any data not in JSON-LD, such as RSS feeds, NLWeb is engineered to convert it into schema.org types for effective use.
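
    To make the ingestion step concrete, here is a minimal sketch of pulling JSON-LD out of a page using only Python's standard library. It is not NLWeb's own crawler code; the class name, function name, and sample page are hypothetical and exist only for illustration.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                parsed = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return  # skip malformed blocks instead of failing the whole page
            # A single script tag may hold one object or a list of objects.
            self.items.extend(parsed if isinstance(parsed, list) else [parsed])


def extract_jsonld(html):
    """Return every JSON-LD object found in an HTML string (hypothetical helper)."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical sample page, invented for this example.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product",
 "name": "Trail Shoe", "brand": {"@type": "Brand", "name": "Acme"}}
</script>
</head><body><p>Hello</p></body></html>
"""

items = extract_jsonld(SAMPLE_PAGE)
print(items[0]["@type"], items[0]["name"])
```

    Anything the extractor cannot parse is silently skipped, which is the point of the section above: markup that is malformed never reaches the downstream system at all.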

    Semantic storage

    Once collected, this structured data is stored in a vector database. This element is critical because it moves the interaction beyond traditional keyword matching. 

    Vector databases represent text as mathematical vectors, allowing the AI to search based on semantic similarity and meaning. 

    For example, the system can understand that a query using the term “structured data” is conceptually the same as content marked up with “schema markup.” 

    This capacity for conceptual understanding is essential for genuine conversational functionality.
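
    The idea behind semantic similarity can be sketched with cosine similarity over vectors. The three-dimensional "embeddings" below are hand-made numbers invented for illustration; a real system would get its vectors from an embedding model, and nothing here reflects NLWeb's actual storage layer.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, made up for this example (assumption: not from any real model).
TOY_EMBEDDINGS = {
    "structured data": [0.9, 0.8, 0.1],
    "schema markup": [0.85, 0.9, 0.15],
    "chocolate cake recipe": [0.1, 0.05, 0.95],
}


def nearest(query, candidates):
    """Return the candidate whose toy vector is closest to the query's."""
    q = TOY_EMBEDDINGS[query]
    return max(candidates, key=lambda c: cosine_similarity(q, TOY_EMBEDDINGS[c]))


best = nearest("structured data", ["schema markup", "chocolate cake recipe"])
print(best)
```

    The two phrases share no keywords, yet their vectors point in nearly the same direction, so the lookup pairs them. That is the behavior keyword matching cannot provide.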

    Protocol connectivity

    The final layer is the connectivity provided by the Model Context Protocol (MCP). 

    Every NLWeb instance operates as an MCP server, and MCP itself is an emerging standard for packaging and consistently exchanging data between AI systems and agents. 

    MCP is currently the most promising path forward for ensuring interoperability in the highly fragmented AI ecosystem.
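
    For a sense of what an agent sends over MCP, the sketch below builds a tool-call message. MCP messages follow JSON-RPC 2.0, and tool invocations use the `tools/call` method. The tool name `ask`, the `query` argument, and the helper function are assumptions made for illustration, not a documented NLWeb contract.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 message for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Assumed tool name and argument key; check the server's tool listing for real ones.
request = build_tool_call(1, "ask", {"query": "waterproof trail shoes under $150"})
print(json.dumps(request, indent=2))
```

    Because the envelope is the same for every MCP server, an agent that can send this message to one site can send it to any other, which is the interoperability argument in a nutshell.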

    The ultimate test of schema quality

    Since NLWeb relies entirely on crawling and extracting schema markup, the precision, completeness, and interconnectedness of your site’s content knowledge graph determine success.

    The key challenge for SEO teams is addressing technical debt. 

    Custom, in-house solutions to manage AI ingestion are often costly and slow to adopt, and they tend to create systems that are difficult to scale or incompatible with emerging standards like MCP. 

    NLWeb addresses the protocol’s complexity, but it cannot fix faulty data. 

    If your structured data is poorly maintained, inaccurate, or missing critical entity relationships, the resulting vector database will store flawed semantic information. 

    This leads inevitably to suboptimal outputs, potentially resulting in inaccurate conversational responses or “hallucinations” by the AI interface.

    Robust, entity-first schema optimization is no longer just a way to win a rich result; it is the fundamental barrier to entry for the agentic web. 

    By leveraging the structured data you already have, NLWeb allows you to unlock new value without starting from scratch, thereby future-proofing your digital strategy.

    NLWeb vs. llms.txt: Protocol for action vs. static guidance

    The need for AI crawlers to process web content efficiently has led to multiple proposed standards. 

    A comparison between NLWeb and the proposed llms.txt file illustrates a clear divergence between dynamic interaction and passive guidance.

    The llms.txt file is a proposed static standard designed to improve the efficiency of AI crawlers by:

    • Providing a curated, prioritized list of a website’s most important content – typically formatted in markdown.
    • Attempting to solve the legitimate technical problems of complex, JavaScript-loaded websites and the inherent limitations of an LLM’s context window.
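
    For comparison, an llms.txt file is simple enough to generate in a few lines. The sketch below follows the layout described in the llms.txt proposal (a title, a short summary, then sections of annotated links); the function name, site name, and URLs are hypothetical.

```python
def render_llms_txt(title, summary, sections):
    """Render a curated link list in the markdown layout the llms.txt proposal describes."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site content, invented for this example.
llms_txt = render_llms_txt(
    "Acme Outdoor",
    "Trail running shoes and gear guides.",
    {
        "Products": [
            ("Trail Shoe", "https://example.com/trail-shoe.md", "Flagship waterproof shoe"),
        ],
        "Guides": [
            ("Sizing guide", "https://example.com/sizing.md", "How to pick a size"),
        ],
    },
)
print(llms_txt)
```

    The output is a static document: it tells a crawler where to look, but it cannot answer a question.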

    In sharp contrast, NLWeb is a dynamic protocol that establishes a conversational API endpoint. 

    Its purpose is not just to point to content, but to actively receive natural language queries, process the site’s knowledge graph, and return structured JSON responses using schema.org. 

    NLWeb fundamentally changes the relationship from “AI reads the site” to “AI queries the site.”

    | Attribute | NLWeb | llms.txt |
    |---|---|---|
    | Primary goal | Enables dynamic, conversational interaction and structured data output | Improves crawler efficiency and guides static content ingestion |
    | Operational model | API/Protocol (active endpoint) | Static Text File (passive guidance) |
    | Data format used | Schema.org JSON-LD | Markdown |
    | Adoption status | Open project; connectors available for major LLMs, including Gemini, OpenAI, and Anthropic | Proposed standard; not adopted by Google, OpenAI, or other major LLMs |
    | Strategic advantage | Unlocks existing schema investment for transactional AI uses, future-proofing content | Reduces computational cost for LLM training/crawling |

    The market’s preference for dynamic utility is clear. Despite addressing a real technical challenge for crawlers, llms.txt has failed to gain traction so far. 

    NLWeb’s functional superiority stems from its ability to enable richer, transactional AI interactions.

    It allows AI agents to dynamically reason about and execute complex data queries using structured schema output.

    The strategic imperative: Mandating a high-quality schema audit

    While NLWeb is still an emerging open standard, its value is clear. 

    It maximizes the utility and discoverability of specialized content that often sits deep in archives or databases. 

    This value is realized through operational efficiency and stronger brand authority, rather than immediate traffic metrics.

    Several organizations are already exploring how NLWeb could let users ask complex questions and receive intelligent answers that synthesize information from multiple resources – something traditional search struggles to deliver. 

    The ROI comes from reducing user friction and reinforcing the brand as an authoritative, queryable knowledge source.

    For website owners and digital marketing professionals, the path forward is undeniable: mandate an entity-first schema audit.

    Because NLWeb depends on schema markup, technical SEO teams must prioritize auditing existing JSON-LD for integrity, completeness, and interconnectedness. 

    Minimalist schema is no longer enough – optimization must be entity-first.

    Publishers should ensure their schema accurately reflects the relationships among all entities, products, services, locations, and personnel to provide the context necessary for precise semantic querying. 
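
    One way to start such an audit is a script that walks your JSON-LD and flags two kinds of gaps: entities missing properties you consider essential, and `@id` references that point at nothing. The sketch below is a starting point under stated assumptions. The required-property lists are illustrative choices, not schema.org or search engine requirements, and all function names and sample data are hypothetical.

```python
def iter_nodes(value):
    """Yield every dict nested anywhere inside a JSON-LD structure."""
    if isinstance(value, dict):
        yield value
        for child in value.values():
            yield from iter_nodes(child)
    elif isinstance(value, list):
        for child in value:
            yield from iter_nodes(child)


def is_reference(node):
    """A node holding only "@id" points at an entity defined elsewhere."""
    return set(node) == {"@id"}


def audit(items, required):
    """Return a list of human-readable problems found in the JSON-LD items."""
    nodes = list(iter_nodes(items))
    defined = {n["@id"] for n in nodes if "@id" in n and not is_reference(n)}
    problems = []
    for node in nodes:
        if is_reference(node):
            if node["@id"] not in defined:
                problems.append(f"dangling reference: {node['@id']}")
            continue
        node_type = node.get("@type")
        # Only plain string types are checked; list-valued @type is skipped here.
        if isinstance(node_type, str):
            for prop in required.get(node_type, []):
                if prop not in node:
                    problems.append(f"{node.get('@id', node_type)}: missing {prop}")
    return problems


# Assumed audit rules and sample graph, invented for this example.
REQUIRED = {"Product": ["name", "brand", "offers"], "Organization": ["name", "url"]}

GRAPH = [
    {"@type": "Organization", "@id": "https://example.com/#org", "name": "Acme"},
    {
        "@type": "Product",
        "@id": "https://example.com/#shoe",
        "name": "Trail Shoe",
        "brand": {"@id": "https://example.com/#brand"},
    },
]

problems = audit(GRAPH, REQUIRED)
for problem in problems:
    print(problem)
```

    In this sample the product's brand points at an entity that is never defined, which is exactly the kind of broken relationship that leaves an AI system without the context it needs.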

    The transition to the agentic web is already underway, and NLWeb offers the most viable open-source path to long-term visibility and utility. 

    It’s a strategic necessity to ensure your organization can communicate effectively as AI agents and LLMs begin integrating conversational protocols for third-party content interaction.


    Contributing authors are invited to create content for Search Engine Land and are chosen for their expertise and contribution to the search community. Our contributors work under the oversight of the editorial staff and contributions are checked for quality and relevance to our readers. Search Engine Land is owned by Semrush. Contributor was not asked to make any direct or indirect mentions of Semrush. The opinions they express are their own.


    About the Author

    Elmer Boutin

    Elmer is currently Digital Manager at Perry Johnson, Inc., where he supports Perry Johnson companies around the world with organic and paid search optimization. He has 25+ years of experience in digital marketing with a major focus on SEO. He has worked with businesses ranging from SMBs to Fortune 5-size companies, and from local to global. Brands he has worked with include Rocket, Ford, Mars Corporation, Banfield Pet Hospital, Wilsonart, and Corner Bakery Cafe. Prior to his career in digital marketing, he was in the US Army for 14 years as a translator and intelligence analyst.