HTML — HyperText Markup Language

Status: 🟩 COMPLETE
Last updated: 2026-06-20
Plain-English tagline: The language web pages are written in. It describes what’s on the page and how it’s structured, not how it looks.


In plain English

HTML is a markup language. That means it doesn’t do things the way a programming language does — it just describes things. Specifically, it describes the structure and content of a web page: this is a heading, this is a paragraph, this is a list, this is a link, this is an image.

If a web page were a magazine article, HTML would be the equivalent of writing it with editorial markup — “this line is the headline, this is a sub-heading, this is body text, this is a quote, this is a photo caption.” The browser’s job is to read that markup and turn it into something humans can look at.

Tim Berners-Lee developed HTML at CERN between 1989 and 1993, originally as a way for scientists to share documents. It is one of three core languages of the web, alongside CSS (which controls how things look) and JavaScript (which makes things interactive). HTML alone gives you the structure; the other two add the visual style and the behaviour.

People still say “HTML5,” but the current specification is the HTML Living Standard, maintained by the WHATWG. It is updated continuously, so there are no longer numbered versions like HTML 4.01 or HTML 5.2; there is just “HTML.”


Why it matters

HTML is the substrate of the web. Every website you’ve ever visited (Wikipedia, Google, your bank, this encyclopedia rendered in a markdown viewer) ultimately ends up as an HTML document in the browser. Frameworks like React and Next.js don’t replace HTML. They either generate it on the server or build the equivalent DOM with JavaScript in the browser.

Understanding HTML matters because:

  1. Semantics affect accessibility and SEO. Using <button> instead of <div onclick=...> is the difference between a screen reader announcing “Submit, button” and reading out plain text that can’t be reached or activated from the keyboard. Using headings like <h1> correctly helps search engines such as Google understand what your page is about.
  2. It’s the lowest layer where you can actually look. When something looks broken, opening DevTools and reading the live HTML (the DOM as the browser actually built it) is the most reliable debugging move.
  3. React and JSX look like HTML. If you understand HTML, you understand 80% of JSX. (The other 20% is the gotchas — see below.)

How it works

Elements and tags

An HTML document is made of elements. Each element is written with tags that wrap content:

<p>This is a paragraph.</p>
  • <p> is the opening tag.
  • </p> is the closing tag.
  • This is a paragraph. is the element’s content.

Some elements are void — they have no content and no closing tag (e.g. <br>, <img>, <input>).
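To see the difference between normal and void elements, here is a minimal sketch using Python’s standard-library html.parser. The choice of parser is ours for illustration; browsers use their own, far more thorough parsers. The class name TagTracer is hypothetical. It records one event per opening tag, closing tag, and run of text.

```python
from html.parser import HTMLParser


class TagTracer(HTMLParser):
    """Records one event per opening tag, closing tag, and run of text."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("open", tag))

    def handle_endtag(self, tag):
        self.events.append(("close", tag))

    def handle_data(self, data):
        if data.strip():  # ignore whitespace-only text between tags
            self.events.append(("content", data))


tracer = TagTracer()
# <INPUT> is uppercase on purpose: the parser lowercases tag names.
tracer.feed("<p>This is a paragraph.</p><br><INPUT>")
tracer.close()
for event in tracer.events:
    print(event)
```

The <p> produces an opening event, its content, and a closing event. The void <br> and <input> only ever open, because there is nothing for them to close around.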

Attributes

Tags can carry attributes, which are extra information about the element:

<a href="https://example.com" target="_blank">Visit example</a>
<img src="/photo.jpg" alt="A black cat sitting on a windowsill" />
  • href says where a link points.
  • src says where an image file lives.
  • alt gives a text description of the image, which screen readers read aloud and browsers display if the image fails to load.
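Attributes are plain name/value pairs attached to a start tag. As a rough illustration, the sketch below uses Python’s html.parser to pull them out of the two example tags. AttributeCollector is a hypothetical name. Note that the parser treats the self-closing `<img ... />` as a start tag too.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Collects (tag, attributes) pairs for every start tag in the input."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) tuples; dict() eases lookup.
        self.found.append((tag, dict(attrs)))


collector = AttributeCollector()
collector.feed(
    '<a href="https://example.com" target="_blank">Visit example</a>'
    '<img src="/photo.jpg" alt="A black cat sitting on a windowsill" />'
)
collector.close()
for tag, attrs in collector.found:
    print(tag, attrs)
```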

The page skeleton

A standard HTML document has this skeletal shape:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <title>My page</title>
  </head>
  <body>
    <h1>Hello, world</h1>
    <p>This is my first web page.</p>
  </body>
</html>

Breaking that down line by line:

  • <!DOCTYPE html> — tells the browser to use standards mode (modern HTML rendering rules) rather than legacy “quirks mode.” It must come first in the file.
  • <html lang="en"> — the root element of the whole document. lang helps screen readers and translation tools.
  • <head> — metadata about the page that the user doesn’t see directly: title, encoding, viewport settings, stylesheets, scripts.
  • <meta charset="utf-8"> — the character encoding. Without this, accented characters like “é” can break. See Text encodings & UTF-8.
  • <meta name="viewport" ...> — tells mobile browsers how to scale the page. Essential for responsive design.
  • <title> — the text shown in the browser tab.
  • <body> — everything the user actually sees.
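Several of these requirements can be checked mechanically. The following is a minimal sketch, built on Python’s html.parser, that flags four of them: doctype first, a lang attribute, a charset declaration, and a title. The class, function, and messages are our own illustration and are far less thorough than a real validator.

```python
from html.parser import HTMLParser


class SkeletonChecker(HTMLParser):
    """Flags four of the skeleton requirements described above."""

    def __init__(self):
        super().__init__()
        self.seen_content = False   # has any tag or non-blank text appeared?
        self.doctype_first = False
        self.has_lang = False
        self.has_charset = False
        self.has_title = False

    def handle_decl(self, decl):
        # The parser passes "<!DOCTYPE html>" in as "DOCTYPE html".
        if " ".join(decl.lower().split()) == "doctype html" and not self.seen_content:
            self.doctype_first = True
        self.seen_content = True

    def handle_starttag(self, tag, attrs):
        self.seen_content = True
        attrs = dict(attrs)
        if tag == "html" and attrs.get("lang"):
            self.has_lang = True
        elif tag == "meta" and "charset" in attrs:
            self.has_charset = True
        elif tag == "title":
            self.has_title = True

    def handle_data(self, data):
        if data.strip():
            self.seen_content = True

    def problems(self):
        checks = {
            "<!DOCTYPE html> is not the first thing in the file": self.doctype_first,
            "<html> has no lang attribute": self.has_lang,
            "no <meta charset> declared": self.has_charset,
            "no <title> element": self.has_title,
        }
        return [message for message, ok in checks.items() if not ok]


def check_skeleton(source):
    checker = SkeletonChecker()
    checker.feed(source)
    checker.close()
    return checker.problems()


skeleton = """<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <title>My page</title>
  </head>
  <body><h1>Hello, world</h1></body>
</html>"""

print(check_skeleton(skeleton))  # → []
print(check_skeleton("<html><body><p>Hi</p></body></html>"))
```

The second call reports all four problems, since that fragment has no doctype, no lang, no charset, and no title.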

The DOM tree

When a browser reads HTML, it builds a tree-shaped data structure in memory called the Document Object Model (DOM). Each element is a node; nested elements are children. This is what JavaScript reads and modifies when it changes a page after it loads.

html
├── head
│   ├── meta (charset)
│   ├── meta (viewport)
│   └── title
└── body
    ├── h1
    └── p
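The tree above can be reproduced with a short Python sketch. It pushes each element onto a stack as it opens and pops it when it closes. Void elements like <meta> are never pushed, because they never close. This is a simplification: a real browser’s parser also creates text and comment nodes and repairs malformed markup, and this sketch does neither. The Node class, the helper names, and the way <meta> nodes are labelled are our own choices, made to match the diagram.

```python
from html.parser import HTMLParser

# Void elements listed in the HTML standard: they never close,
# so they must never stay open on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, label=""):
        self.tag = tag
        self.label = label   # optional detail to display, e.g. "charset"
        self.children = []


def meta_label(attrs):
    """Labels a <meta> by its name= value, else by its first attribute's name."""
    for name, value in attrs:
        if name == "name" and value:
            return value
    return attrs[0][0] if attrs else ""


class TreeBuilder(HTMLParser):
    """Builds a tree of element nodes; text and comment nodes are skipped."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]   # path from the root to the current element

    def handle_starttag(self, tag, attrs):
        node = Node(tag, meta_label(attrs) if tag == "meta" else "")
        self.stack[-1].children.append(node)
        if tag not in VOID_ELEMENTS:
            self.stack.append(node)   # following tags nest inside this one

    def handle_endtag(self, tag):
        # Close the most recently opened element with this name, if any.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break


def render(node, prefix=""):
    """Yields one line per descendant, drawn with box-drawing characters."""
    for i, child in enumerate(node.children):
        last = i == len(node.children) - 1
        text = child.tag + (f" ({child.label})" if child.label else "")
        yield prefix + ("└── " if last else "├── ") + text
        yield from render(child, prefix + ("    " if last else "│   "))


page = """<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <title>My page</title>
  </head>
  <body>
    <h1>Hello, world</h1>
    <p>This is my first web page.</p>
  </body>
</html>"""

builder = TreeBuilder()
builder.feed(page)
builder.close()
html_node = builder.root.children[0]
lines = [html_node.tag, *render(html_node)]
print("\n".join(lines))
```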

Block vs inline elements

Elements have a default display behaviour:

  • Block elements take up the full width of their container and start on a new line. Examples: <div>, <p>, <h1> to <h6>, <section>, <article>, <nav>, <header>, <footer>, <ul>, <ol>, <li>.
  • Inline elements flow within the surrounding text. Examples: <span>, <a>, <strong>, <em>, <img>, <code>.

You can change this behaviour with CSS (display: block, display: inline, display: flex, etc.). The HTML default is just a starting point.


A concrete example

A small but realistic page:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <title>George's Cat Blog</title>
  </head>
  <body>
    <header>
      <h1>George's Cat Blog</h1>
      <nav>
        <a href="/">Home</a>
        <a href="/about">About</a>
      </nav>
    </header>
 
    <main>
      <article>
        <h2>Why my cat is the best</h2>
        <p>Published <time datetime="2026-06-19">19 June 2026</time></p>
        <p>My cat is named <strong>Biscuit</strong>. He sleeps 18 hours a day.</p>
        <img src="/biscuit.jpg" alt="An orange tabby cat curled up on a blue blanket" />
      </article>
    </main>
 
    <footer>
      <p>&copy; 2026 George</p>
    </footer>
  </body>
</html>

Notice the semantic elements: <header>, <nav>, <main>, <article>, <footer>. These don’t change how things look; they tell browsers, search engines, and screen readers what each chunk of the page means. A <nav> is announced as “navigation”; a <main> lets a screen reader user jump straight to the content; an <article> marks a self-contained piece of content, such as a blog post, that would still make sense on its own.

You could write the same page using only <div>s and it would look the same. But the semantic version is more accessible, more discoverable, and easier to read for the next person (or LLM) who works on it.
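The screen-reader behaviour described above comes from each semantic element’s implicit ARIA landmark role. The simplified Python sketch below lists the landmarks on a trimmed version of the cat-blog page. We have added an extra <header> inside the <article> to show one scoping rule: a <header> or <footer> nested in sectioning content is not a page-level landmark. The role table covers only four elements and ignores explicit role attributes and the rest of the official mapping (the W3C’s HTML Accessibility API Mappings). Treat it as an illustration, not a complete implementation. The class name is hypothetical.

```python
from html.parser import HTMLParser

# Simplified implicit ARIA landmark roles for four semantic elements.
LANDMARK_ROLES = {"header": "banner", "nav": "navigation",
                  "main": "main", "footer": "contentinfo"}
# <header>/<footer> are page-level landmarks only when not inside one of these.
SCOPING_ELEMENTS = {"article", "aside", "main", "nav", "section"}


class LandmarkFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open = []        # currently open landmark/sectioning elements
        self.landmarks = []   # (element, role) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in LANDMARK_ROLES:
            scoped = tag in ("header", "footer") and any(
                t in SCOPING_ELEMENTS for t in self.open)
            if not scoped:
                self.landmarks.append((tag, LANDMARK_ROLES[tag]))
        if tag in LANDMARK_ROLES or tag in SCOPING_ELEMENTS:
            self.open.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open:
            # Drop the most recently opened element with this name.
            index = len(self.open) - 1 - self.open[::-1].index(tag)
            del self.open[index]


page = """<body>
  <header><h1>George's Cat Blog</h1><nav><a href="/">Home</a></nav></header>
  <main>
    <article>
      <header><h2>Why my cat is the best</h2></header>
      <img src="/biscuit.jpg" alt="An orange tabby cat curled up on a blue blanket" />
    </article>
  </main>
  <footer><p>&copy; 2026 George</p></footer>
</body>"""

finder = LandmarkFinder()
finder.feed(page)
finder.close()
for element, role in finder.landmarks:
    print(f"<{element}> -> {role}")
```

The <header> inside the <article> is skipped, so the page exposes four landmarks: banner, navigation, main, and contentinfo.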


Common HTML elements — the working set

A short list of the elements that come up daily. Memorize this set and you can read 90% of any HTML page:

Element | Purpose
--- | ---
<h1> to <h6> | Headings, biggest to smallest. Use one <h1> per page.
<p> | Paragraph
<a href="..."> | Anchor / link
<img src="..." alt="..."> | Image
<ul> / <ol> / <li> | Unordered (bullet) and ordered (numbered) lists; <li> is each item
<div> | Generic block container, for when no semantic element fits
<span> | Generic inline container
<strong> / <em> | Bold (semantically: strong importance) / italic (semantically: emphasis)
<button> | A button. Use this, not a clickable <div>
<input> | A form field
<form> | Wraps inputs that submit together
<label> | A label tied to an input (essential for accessibility)
<header> / <main> / <footer> / <nav> / <article> / <section> / <aside> | Semantic page regions

Common gotchas

  • HTML is forgiving — sometimes too forgiving. Browsers will try to render even badly-formed HTML. This makes errors easy to miss. Use the W3C validator when something looks off but you can’t tell why.

  • alt is not optional. Every <img> should have one. If the image is decorative, set alt="" (an empty string) explicitly. Without alt, some screen readers fall back to reading out the file name, and search engines lose a description of the image.

  • Don’t use headings as styling. <h2> is not “make this medium-sized.” It’s “this is the second-level heading of the document.” If you want big bold text, use CSS — not the wrong heading level.

  • Self-closing tags differ between HTML and JSX. In plain HTML5, void elements can be written <br> or <br />. In JSX (React), you must write <br /> — the slash is required. Forgetting this is a common error when moving between the two.

  • Tag case sensitivity. HTML tag names are case-insensitive (<DIV> works), but JSX is case-sensitive: lowercase names are HTML elements, capitalized names are React components. So <button> is an HTML button; <Button> is a React component called Button. Mixing these up confuses both Claude and humans.

  • Inline style attributes vs CSS classes. Both work, but inline styles take precedence over almost any stylesheet rule (only !important beats them), which makes them hard to override later. Prefer CSS classes (or Tailwind utility classes) unless you have a specific reason.

  • <button> defaults to type="submit" inside a form. That can accidentally submit the form. Add type="button" to any button that isn’t meant to submit.

  • HTML comments leak. <!-- comment --> is visible in “View Source.” Don’t put secrets there.

  • The <head> is not the same as <header>. <head> is the invisible metadata zone. <header> is the visible top section of a page or article. They are unrelated.
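Two of these gotchas are mechanical enough to check automatically. The following is a minimal sketch, again with Python’s html.parser. It flags <img> tags with no alt attribute at all (an explicit alt="" is accepted) and <button>s inside a <form> that have no type. The class name and messages are hypothetical. The sketch also ignores buttons tied to a form via the form attribute. In practice, a validator or accessibility linter does this more thoroughly.

```python
from html.parser import HTMLParser


class GotchaLinter(HTMLParser):
    """Flags <img> tags with no alt and untyped <button>s inside a <form>."""

    def __init__(self):
        super().__init__()
        self.form_depth = 0   # greater than 0 while inside a <form>
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        line, _ = self.getpos()   # source line where this tag starts
        if tag == "form":
            self.form_depth += 1
        elif tag == "img" and "alt" not in attrs:
            # alt="" (decorative image) is fine; only a missing attribute is flagged.
            self.warnings.append(f"line {line}: <img> has no alt attribute")
        elif tag == "button" and self.form_depth and "type" not in attrs:
            self.warnings.append(
                f'line {line}: <button> inside a form defaults to type="submit"')

    def handle_endtag(self, tag):
        if tag == "form" and self.form_depth:
            self.form_depth -= 1


page = """<form action="/subscribe">
  <img src="/logo.png">
  <img src="/divider.png" alt="">
  <input name="email">
  <button>Subscribe</button>
  <button type="button">Cancel</button>
</form>"""

linter = GotchaLinter()
linter.feed(page)
linter.close()
for warning in linter.warnings:
    print(warning)
```

Only the logo image (line 2) and the Subscribe button (line 5) are flagged. The decorative divider and the explicitly typed Cancel button pass.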


See also


Sources