Validate or format XML (not HTML) with the free XML Formatter — runs entirely in your browser.
The Core Difference in Purpose
HTML (HyperText Markup Language) is a specific application for describing web pages. It has a fixed set of elements (<p>, <div>, <form>) with defined semantics, and browsers know how to render them visually.
XML (Extensible Markup Language) is a general-purpose data format. It has no predefined elements — you define your own vocabulary. XML carries data structure and meaning, not presentation. RSS feeds, SOAP messages, Android layouts, and Maven build files are all XML with application-specific vocabularies.
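As a minimal sketch of what "define your own vocabulary" means in practice, the snippet below parses a small, hypothetical invoice document with Python's standard-library ElementTree. The element names, the id attribute, and the read_invoice helper are illustrative assumptions, not part of any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical invoice vocabulary: XML itself predefines none of these names.
INVOICE_XML = """\
<invoice id="INV-001">
  <product>Widget</product>
  <quantity>3</quantity>
</invoice>"""


def read_invoice(xml_text: str) -> dict:
    """Parse the invoice and pull out the fields the application cares about."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "product": root.findtext("product"),
        "quantity": int(root.findtext("quantity")),
    }


invoice = read_invoice(INVOICE_XML)
print(invoice)
```

The parser only checks that the markup is well-formed; what <quantity> means is decided entirely by the application that reads it.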
Side-by-Side Comparison
- Tag names
  - HTML: a fixed set defined by the spec (<p>, <table>, <video>, …).
  - XML: you define whatever tag names you need (<invoice>, <product>, <quantity>).
- Case sensitivity
  - HTML: tag and attribute names are case-insensitive; <P> and <p> are the same element.
  - XML: fully case-sensitive; <Note> and <note> are distinct elements, and each must be closed by a tag with exactly matching case.
- Unclosed tags
  - HTML: void elements like <br>, <img>, and <input> have no closing tag, and the browser handles them correctly.
  - XML: every element must close, either with </tagname> or as a self-closing empty element <tagname/>.
- Attribute values
  - HTML: attribute values may be unquoted if they contain no spaces, and boolean attributes like disabled need no value at all.
  - XML: all attribute values must be quoted (single or double), and every attribute must have an explicit value.
- Error handling
  - HTML: browsers implement a detailed error-recovery algorithm, so malformed HTML is still displayed as best the browser can guess.
  - XML: parsers stop on any well-formedness error and report it as fatal. There is no recovery.
- Multiple root elements
  - HTML: a page is implicitly rooted at <html>, and parsers handle fragments gracefully.
  - XML: exactly one root element is required. A document with two top-level elements is not well-formed.
- Whitespace handling
  - HTML: runs of whitespace are collapsed by default when rendered; the CSS white-space property controls this per element.
  - XML: whitespace is significant and preserved by default unless the parser or application explicitly strips it.
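The XML side of the comparison above can be checked with any conforming parser. The sketch below uses Python's standard-library ElementTree; the well_formedness_error helper and the sample strings are made up for illustration. Each HTML-style shortcut is rejected, and only the last sample parses.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def well_formedness_error(text: str) -> Optional[str]:
    """Return None if text is well-formed XML, otherwise the parser's message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Illustrative samples, one per rule in the comparison.
samples = {
    "case mismatch": "<Note>hi</note>",
    "unclosed element": "<p>line one<br>line two</p>",
    "unquoted attribute": "<input type=text/>",
    "valueless attribute": "<input disabled/>",
    "two root elements": "<a/><b/>",
    "well-formed": "<p>line one<br/>line two</p>",
}

for label, text in samples.items():
    error = well_formedness_error(text)
    print(f"{label}: {'ok' if error is None else error}")
```

The exact error wording depends on the underlying parser, so application code should test for the exception rather than match on the message text.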
The XHTML Middle Ground
XHTML is HTML reformulated as a valid XML application. An XHTML document must obey XML's strict rules — all tags closed, all attributes quoted, exactly one root element — while using HTML's element vocabulary (<p>, <div>, etc.).
XHTML was popular in the early 2000s as a stepping stone toward modular web formats, and it fell out of favor with HTML5's arrival. You may still encounter it in legacy codebases or feed validators. To get strict XML parsing in browsers, serve it as application/xhtml+xml; served as text/html, it goes through the lenient HTML parser instead.
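Because XHTML obeys XML's rules, a generic XML parser can read it directly. The sketch below feeds a small, made-up XHTML page to Python's standard-library ElementTree. The page content is an assumption for illustration; the namespace URI is the standard XHTML one.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Illustrative page: every tag is closed and every attribute has a quoted value.
XHTML_DOC = f"""\
<html xmlns="{XHTML_NS}">
  <head><title>Example</title></head>
  <body>
    <p>First line<br/>second line</p>
    <input type="checkbox" disabled="disabled"/>
  </body>
</html>"""

root = ET.fromstring(XHTML_DOC)
ns = {"x": XHTML_NS}  # prefix chosen here only for the queries below

title = root.findtext("x:head/x:title", namespaces=ns)
paragraphs = root.findall(".//x:p", ns)

print(root.tag)
print(title, len(paragraphs))
```

Note that the element names come back qualified with the XHTML namespace, which is why the queries need the prefix mapping.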
When to Use XML vs HTML
- Use HTML for anything displayed in a browser as a web page — content, forms, dashboards, and applications rendered visually by browser engines.
- Use XML for data interchange between systems (APIs, feeds, configuration), document formats with semantic structure (DOCX, SVG, EPUB), and any format that needs a custom schema.
- Do not try to parse HTML with an XML parser. Most real-world HTML is not well-formed XML. Use an HTML-specific parser (like DOMParser with text/html) for HTML, and an XML parser for XML.
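To illustrate the last point outside the browser, the sketch below runs the same snippet through both kinds of parser using only the Python standard library. Here html.parser stands in for an HTML-specific parser; it is a tolerant tokenizer rather than a full browser-grade tree builder, and the TagCollector class and parses_as_xml helper are hypothetical names for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Typical HTML: unquoted attribute, unclosed <br>, valueless boolean attribute.
HTML_SNIPPET = "<p class=intro>Hello<br>world <input disabled></p>"


class TagCollector(HTMLParser):
    """Record each start tag and its attributes as the parser encounters them."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append((tag, dict(attrs)))


def parses_as_xml(text: str) -> bool:
    """Return True only if an XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


collector = TagCollector()
collector.feed(HTML_SNIPPET)
collector.close()

print(collector.start_tags)
print("parses as XML:", parses_as_xml(HTML_SNIPPET))
```

The HTML parser reports all three elements, while the XML parser refuses the input outright.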
SVG: Where XML and the Browser Meet
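SVG (Scalable Vector Graphics) is the primary XML format that lives inside web pages. A standalone .svg file is parsed as XML, so it must follow XML rules: closed or self-closing tags, quoted attributes, no overlapping elements. Inline SVG embedded in an HTML document is read by the HTML parser, which is more forgiving, but writing it as well-formed XML keeps the same markup usable in both contexts. This is one of the few cases where browser developers deal with strict XML rules alongside lenient HTML parsing.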
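Since SVG is an XML vocabulary, a standalone SVG document can be inspected with an ordinary XML parser. The sketch below is one way to do that with Python's standard-library ElementTree. The drawing itself is a made-up example; the namespace URI is the standard SVG one.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Illustrative drawing with self-closing shapes and quoted attributes.
SVG_DOC = f"""\
<svg xmlns="{SVG_NS}" viewBox="0 0 100 100">
  <circle cx="50" cy="50" r="40" fill="teal"/>
  <rect x="10" y="10" width="20" height="20"/>
</svg>"""

root = ET.fromstring(SVG_DOC)

# Strip the "{namespace}" prefix ElementTree adds to each tag name.
shapes = [child.tag.split("}", 1)[1] for child in root]
view_box = root.get("viewBox")

print(shapes, view_box)
```

Attribute names are case-sensitive under XML parsing, so looking up viewbox instead of viewBox returns nothing.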
The XML Formatter can format, validate, and minify SVG files — paste the SVG content and click Format to clean up the indentation.