From doctest to runnable Markdown
This idea goes back more than twenty years and shaped how Python documentation evolved.
Recently, while working on the doc-builder project at Hugging Face, we added support for runnable code blocks in Markdown documentation. The goal is simple: documentation examples should never silently break as libraries evolve.
More importantly, the example itself should remain a primary source of truth: the same Markdown snippet does not need to be rewritten separately for documentation and testing, although projects can still duplicate examples when it is more practical.
This feature keeps the original spirit of executable documentation, but updates it for pytest and modern documentation workflows.
To see why that matters, it helps to look at the history of doctest.
The origin of doctest
The first widely used tool for executable documentation in Python was doctest, written by Tim Peters and added to the Python standard library in Python 2.1 (2001).
The idea was elegant. Documentation examples often looked like interactive interpreter sessions:
```py
>>> add(2, 3)
5
```
doctest could parse these examples, run the code, and verify that the output matched the expected result. Documentation examples suddenly doubled as regression tests.
For Python, this was a natural fit. The language encouraged interactive exploration, and documentation frequently used interpreter-style examples. doctest simply automated the verification of those examples.
For small projects and simple APIs, this approach worked extremely well.
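The verification loop doctest performs can be reproduced with the standard library alone. The sketch below collects and runs the interpreter-style example embedded in a function's docstring (the `add` function is illustrative, not from any particular project):

```python
import doctest

def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 3)
    5
    """
    return a + b

# Collect the interpreter-style examples in add's docstring and run them.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner()
for test in finder.find(add, "add", globs={"add": add}):
    runner.run(test)

# runner.failures counts examples whose actual output
# did not match the expected output in the docstring.
```

In practice most projects just call `doctest.testmod()` or pass `--doctest-modules` to pytest; the finder/runner pair shown here is what those entry points use underneath.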
When documentation and tests collide
As Python projects grew larger, many developers started to notice a fundamental tension:
Good documentation and good tests are not the same thing.
Documentation examples aim to be:
- short
- readable
- focused on teaching
- free of noise
Tests, on the other hand, often need:
- assertions
- setup and teardown
- fixtures
- mocking
- decorators
- debugging tools
- complex checks
doctest forced both concerns into the same format.
For example, documentation examples had to include expected output:
```py
>>> my_function(42)
{'value': 42, 'status': 'ok'}
```
Even for simple cases, setup and teardown quickly add noise:
```py
# setup
>>> import tempfile, os
>>> path = tempfile.mkstemp()[1]
>>> with open(path, "w") as f:
...     _ = f.write("hello")

# example
>>> open(path).read()
'hello'

# teardown
>>> os.remove(path)
```
This worked for simple cases but became painful in real-world scenarios. Early adopters such as Zope 3, which relied heavily on doctest, ran into these limits as their codebase grew:
- output matching is brittle, so a small formatting change can break a test
- setup code clutters examples
- debugging failures means comparing strings instead of inspecting state
Over time, many projects moved their real testing to frameworks like pytest, while documentation examples remained simple snippets that were not executed automatically.
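The temporary-file example above makes the appeal concrete. Rewritten as a plain pytest test (a sketch, not taken from any particular project), the `tmp_path` fixture supplies setup and teardown, so the body keeps only the interesting part:

```python
# pytest injects tmp_path, a per-test temporary directory (a pathlib.Path),
# and removes it afterwards -- no manual setup or teardown lines needed.
def test_read_back(tmp_path):
    path = tmp_path / "greeting.txt"
    path.write_text("hello")
    assert path.read_text() == "hello"
```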
The result was a familiar problem: documentation drift.
Examples that used to work quietly break. Nobody notices until a user tries them, copies the code, and hits an error.
At that point, the documentation has already lost credibility.
A modern approach to executable documentation
The runnable blocks feature in doc-builder takes a different approach.
Instead of embedding tests inside documentation syntax, as doctest does, it treats documentation snippets as normal Python code that happens to live in Markdown.
You can think of it as turning Markdown into a thin test container, rather than embedding tests into a custom documentation format.
A runnable block looks like a normal code example in Markdown:
```py runnable:quickstart
from transformers import pipeline
pipe = pipeline("sentiment-analysis")
result = pipe("I love runnable docs!")
if not result:  # doc-builder: hide
    raise ValueError("pipeline returned no result")
print(result[0]["label"])
assert result[0]["score"] > 0.5 # doc-builder: ignore-bare-assert
```
When documentation is rendered, it appears exactly like a regular code snippet.
During testing, runnable blocks are executed as normal Python code.
With hf-doc-builder installed, pytest automatically discovers them inside Markdown files. Each block becomes a standard test item.
Alternatively, you can expose a page through a regular Python test module using DocIntegrationTest.
In both cases, the Markdown stays the source of truth. Only the execution entry point changes.
For example, pytest can run documentation directly:
```bash
pytest -q docs/source/en/
```
Or you can expose one page through a regular Python test file:
```py
from pathlib import Path

from doc_builder.testing import DocIntegrationTest


class MyPageDocIntegrationTest(DocIntegrationTest):
    doc_path = Path(__file__).resolve().parents[2] / "docs" / "source" / "en" / "my_page.md"
```
Each runnable block becomes a standard pytest test item. That means documentation examples can participate in normal project test infrastructure, instead of relying on a special-purpose mini-language like doctest.
In transformers, that flexibility is useful. The same runnable example can be exercised from a documentation-oriented suite or exposed through another test flow. The example itself can be written once in Markdown and reused across different test entry points, even if some projects choose to duplicate or adapt it in specific cases.
With the pytest plugin, failures appear like normal test failures.
Not every documentation example fits naturally in one large code fence. Tutorials often introduce setup in one snippet and build on it in the next. Continuation blocks let authors split those steps across multiple visible examples while sharing the same execution context during tests.
```py runnable:test_basic
processor = AutoProcessor.from_pretrained("suno/bark")
inputs = processor("Hello, my dog is cute", voice_preset=voice_preset)
```
```py runnable:test_basic:2
inputs = processor("Amazing! I can speak English too.")
```
In that pattern, runnable:test_basic:2 is grouped with runnable:test_basic, so the later snippet can reuse the earlier setup without forcing the documentation to collapse everything into one long block.
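Conceptually, grouping works like executing the snippets of a group in order inside one shared namespace. The sketch below illustrates that idea only; it is not doc-builder's actual implementation, and the snippet contents are made up:

```python
# Two snippets from the same runnable group, in document order.
snippets = [
    'text = "Hello, my dog is cute"',  # first block: setup
    'longer = text + " too"',          # continuation block: reuses text
]

# One namespace shared across the whole group, so later
# snippets can see names defined by earlier ones.
namespace = {}
for code in snippets:
    exec(code, namespace)
```

If each snippet were executed in a fresh namespace instead, the continuation block would fail with a NameError, which is exactly why grouped blocks must share state.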
Keeping documentation clean
One of the key design goals was not polluting documentation with test mechanics.
In the source markdown, authors can keep test-only directives next to the example:
```py runnable:test_basic
# pytest-decorator: transformers.testing_utils.slow
from transformers import pipeline
pipe = pipeline("sentiment-analysis")
result = pipe("I love this!")
if not result:  # doc-builder: hide
    raise ValueError("pipeline returned no result")
print(result[0]["label"])
assert result[0]["score"] > 0.5 # doc-builder: ignore-bare-assert
```
But the rendered documentation stays clean:
```py
from transformers import pipeline
pipe = pipeline("sentiment-analysis")
result = pipe("I love this!")
print(result[0]["label"])
assert result[0]["score"] > 0.5
```
The runnable annotation is removed from the fence, # pytest-decorator: lines disappear, # doc-builder: hide lines stay executable without being shown, and the # doc-builder: ignore-bare-assert comment is stripped while the assertion itself remains visible.
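The rendering-side cleanup can be pictured as a small filter over the snippet's lines. The helper below is a hypothetical sketch, not doc-builder's actual code, and it assumes a hide marker also hides the marked statement's indented body:

```python
def strip_directives(source: str) -> str:
    """Remove test-only directives from a snippet before rendering (sketch)."""
    rendered = []
    hide_indent = None  # indentation of the last line marked hidden
    for line in source.splitlines():
        indent = len(line) - len(line.lstrip())
        if hide_indent is not None:
            if line.strip() and indent > hide_indent:
                continue  # still inside the hidden statement's body
            hide_indent = None
        if line.lstrip().startswith("# pytest-decorator:"):
            continue  # test-only decorator directive: drop entirely
        if "# doc-builder: hide" in line:
            hide_indent = indent  # hide this line and its indented body
            continue
        # Keep the assertion itself but drop the ignore-bare-assert comment.
        line = line.split("# doc-builder: ignore-bare-assert")[0].rstrip()
        rendered.append(line)
    return "\n".join(rendered)
```

Running this over the annotated snippet above would yield the clean version shown in the rendered documentation.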
This lets documentation tests integrate naturally with existing test infrastructure without turning the test suite layout into part of the authoring format.
Documentation tests that behave like real tests
Unlike traditional doctests, runnable documentation blocks execute as normal Python code, not interpreter transcripts.
That means:
- failures produce standard pytest tracebacks
- debugging works normally
- complex assertions are possible
- pytest decorators and skips work naturally
This is an evolution of the same idea that made doctest useful in the first place: examples should execute. The difference is that they now fit naturally into pytest and modern project workflows.
Why this matters for large projects
Large projects like transformers have hundreds of documentation pages and thousands of code examples.
Without automated checks, even small API changes can quietly break examples. Runnable documentation helps keep docs and code evolving together, so examples stay trustworthy at the scale where manual review stops being realistic.
Bringing executable documentation back
The goal behind this feature is the same one Python had when doctest was introduced more than two decades ago:
Documentation examples should actually work.
But instead of forcing documentation and testing into the same format, doc-builder separates the concerns:
- documentation stays readable
- tests stay powerful
- examples remain executable
- the source of truth stays in Markdown even when execution happens in different suites
The result is documentation that users can trust, because every example they see is continuously tested.
When documentation examples are continuously tested, they stop being aspirational and become something users can rely on.
And that's the point: examples should not just illustrate how code is supposed to work. They should prove that it does.
The first place we’re using this feature is in the GLM-ASR documentation. You can see a real example in the Transformers repository: the markdown file and the rendered doc.

