From doctest to runnable Markdown
This idea goes back more than twenty years and shaped how Python documentation evolved.
Recently, while working on the doc-builder project at Hugging Face, we added support for runnable code blocks in Markdown documentation. The goal is simple: documentation examples should never silently break as libraries evolve.
More importantly, the example itself should remain a primary source of truth: the same Markdown snippet does not need to be rewritten separately for documentation and testing, although projects can still duplicate examples when it is more practical.
This feature keeps the original spirit of executable documentation, but updates it for pytest and modern documentation workflows.
To see why that matters, it helps to look at the history of doctest.
The origin of doctest
The first widely used tool for executable documentation in Python was doctest, written by Tim Peters and added to the Python standard library in Python 2.1 (2001).
The idea was elegant. Documentation examples often looked like interactive interpreter sessions:
```py
>>> add(2, 3)
5
```
doctest could parse these examples, run the code, and verify that the output matched the expected result. Documentation examples suddenly doubled as regression tests.
For Python, this was a natural fit. The language encouraged interactive exploration, and documentation frequently used interpreter-style examples. doctest simply automated the verification of those examples.
For small projects and simple APIs, this approach worked extremely well.
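The verification loop doctest performs can be reproduced with the standard library alone. The sketch below collects and runs the interpreter-style example embedded in a function's docstring (the `add` function is illustrative, not from any particular project):

```python
import doctest

def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 3)
    5
    """
    return a + b

# Collect the interpreter-style examples in add's docstring and run them.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner()
for test in finder.find(add, "add", globs={"add": add}):
    runner.run(test)

# runner.failures counts examples whose actual output
# did not match the expected output in the docstring.
```

In practice most projects just call `doctest.testmod()` or pass `--doctest-modules` to pytest; the finder/runner pair shown here is what those entry points use underneath.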
When documentation and tests collide
As Python projects grew larger, many developers started to notice a fundamental tension:
Good documentation and good tests are not the same thing.
Documentation examples aim to be:
- short
- readable
- focused on teaching
- free of noise
Tests, on the other hand, often need:
- assertions
- setup and teardown
- fixtures
- mocking
- decorators
- debugging tools
- complex checks
doctest forced both concerns into the same format.
For example, documentation examples had to include expected output:
```py
>>> my_function(42)
{'value': 42, 'status': 'ok'}
```
Even for simple cases, setup and teardown quickly add noise:
```py
# setup
>>> import tempfile, os
>>> path = tempfile.mkstemp()[1]
>>> with open(path, "w") as f:
...     _ = f.write("hello")

# example
>>> open(path).read()
'hello'

# teardown
>>> os.remove(path)
```
This worked for simple cases but became painful in real-world scenarios. Early adopters such as Zope 3, which relied heavily on doctest, ran into these limits as their codebase grew:
- output matching is brittle, so a small formatting change can break a test
- setup code clutters examples
- debugging failures means comparing strings instead of inspecting state
Over time, many projects moved their real testing to frameworks like pytest, while documentation examples remained simple snippets that were not executed automatically.
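The temporary-file example above makes the appeal concrete. Rewritten as a plain pytest test (a sketch, not taken from any particular project), the `tmp_path` fixture supplies setup and teardown, so the body keeps only the interesting part:

```python
# pytest injects tmp_path, a per-test temporary directory (a pathlib.Path),
# and removes it afterwards -- no manual setup or teardown lines needed.
def test_read_back(tmp_path):
    path = tmp_path / "greeting.txt"
    path.write_text("hello")
    assert path.read_text() == "hello"
```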
The result was a familiar problem: documentation drift.
Examples that used to work quietly break. Nobody notices until a user tries them, copies the code, and hits an error.
At that point, the documentation has already lost credibility.
A modern approach to executable documentation
The runnable blocks feature in doc-builder takes a different approach.
Instead of embedding tests inside documentation syntax, as doctest does, it treats documentation snippets as normal Python code that happens to live in Markdown.
You can think of it as turning Markdown into a thin test container, rather than embedding tests into a custom documentation format.
A runnable block looks like a normal code example in Markdown:
```py runnable:quickstart
from transformers import pipeline
pipe = pipeline("sentiment-analysis")
result = pipe("I love runnable docs!")
if not result:  # doc-builder: hide
    raise ValueError("pipeline returned no result")
print(result[0]["label"])
assert result[0]["score"] > 0.5 # doc-builder: ignore-bare-assert
```
When documentation is rendered, it appears exactly like a regular code snippet.
During testing, runnable blocks are executed as normal Python code.
With hf-doc-builder installed, pytest automatically discovers them inside Markdown files. Each block becomes a standard test item.
Alternatively, you can expose a page through a regular Python test module using DocIntegrationTest.
In both cases, the Markdown stays the source of truth. Only the execution entry point changes.
For example, pytest can run documentation directly:
```bash
pytest -q docs/source/en/
```
Or you can expose one page through a regular Python test file:
```py
from pathlib import Path

from doc_builder.testing import DocIntegrationTest


class MyPageDocIntegrationTest(DocIntegrationTest):
    doc_path = Path(__file__).resolve().parents[2] / "docs" / "source" / "en" / "my_page.md"
```
Each runnable block becomes a standard pytest test item. That means documentation examples can participate in normal project test infrastructure, instead of relying on a special-purpose mini-language like doctest.
In transformers, that flexibility is useful. The same runnable example can be exercised from a documentation-oriented suite or exposed through another test flow. The example itself can be written once in Markdown and reused across different test entry points, even if some projects choose to duplicate or adapt it in specific cases.
With the pytest plugin, failures appear like normal test failures.
Not every documentation example fits naturally in one large code fence. Tutorials often introduce setup in one snippet and build on it in the next. Continuation blocks let authors split those steps across multiple visible examples while sharing the same execution context during tests.
```py runnable:test_basic
processor = AutoProcessor.from_pretrained("suno/bark")
inputs = processor("Hello, my dog is cute", voice_preset=voice_preset)
```
```py runnable:test_basic:2
inputs = processor("Amazing! I can speak English too.")
```
In that pattern, runnable:test_basic:2 is grouped with runnable:test_basic, so the later snippet can reuse the earlier setup without forcing the documentation to collapse everything into one long block.
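Conceptually, grouping works like executing the snippets of a group in order inside one shared namespace. The sketch below illustrates that idea only; it is not doc-builder's actual implementation, and the snippet contents are made up:

```python
# Two snippets from the same runnable group, in document order.
snippets = [
    'text = "Hello, my dog is cute"',  # first block: setup
    'longer = text + " too"',          # continuation block: reuses text
]

# One namespace shared across the whole group, so later
# snippets can see names defined by earlier ones.
namespace = {}
for code in snippets:
    exec(code, namespace)
```

If each snippet were executed in a fresh namespace instead, the continuation block would fail with a NameError, which is exactly why grouped blocks must share state.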
Keeping documentation clean
One of the key design goals was not polluting documentation with test mechanics.
In the source markdown, authors can keep test-only directives next to the example:
```py runnable:test_basic
# pytest-decorator: transformers.testing_utils.slow
from transformers import pipeline
pipe = pipeline("sentiment-analysis")
result = pipe("I love this!")
if not result:  # doc-builder: hide
    raise ValueError("pipeline returned no result")
print(result[0]["label"])
assert result[0]["score"] > 0.5 # doc-builder: ignore-bare-assert
```
But the rendered documentation stays clean:
```py
from transformers import pipeline
pipe = pipeline("sentiment-analysis")
result = pipe("I love this!")
print(result[0]["label"])
assert result[0]["score"] > 0.5
```
The runnable annotation is removed from the fence, # pytest-decorator: lines disappear, # doc-builder: hide lines stay executable without being shown, and the # doc-builder: ignore-bare-assert comment is stripped while the assertion itself remains visible.
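The rendering-side cleanup can be pictured as a small filter over the snippet's lines. The helper below is a hypothetical sketch, not doc-builder's actual code, and it assumes a hide marker also hides the marked statement's indented body:

```python
def strip_directives(source: str) -> str:
    """Remove test-only directives from a snippet before rendering (sketch)."""
    rendered = []
    hide_indent = None  # indentation of the last line marked hidden
    for line in source.splitlines():
        indent = len(line) - len(line.lstrip())
        if hide_indent is not None:
            if line.strip() and indent > hide_indent:
                continue  # still inside the hidden statement's body
            hide_indent = None
        if line.lstrip().startswith("# pytest-decorator:"):
            continue  # test-only decorator directive: drop entirely
        if "# doc-builder: hide" in line:
            hide_indent = indent  # hide this line and its indented body
            continue
        # Keep the assertion itself but drop the ignore-bare-assert comment.
        line = line.split("# doc-builder: ignore-bare-assert")[0].rstrip()
        rendered.append(line)
    return "\n".join(rendered)
```

Running this over the annotated snippet above would yield the clean version shown in the rendered documentation.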
This lets documentation tests integrate naturally with existing test infrastructure without turning the test suite layout into part of the authoring format.
Documentation tests that behave like real tests
Unlike traditional doctests, runnable documentation blocks execute as normal Python code, not interpreter transcripts.
That means:
- failures produce standard pytest tracebacks
- debugging works normally
- complex assertions are possible
- pytest decorators and skips work naturally
This is an evolution of the same idea that made doctest useful in the first place: examples should execute. The difference is that they now fit naturally into pytest and modern project workflows.
Why this matters for large projects
Large projects like transformers have hundreds of documentation pages and thousands of code examples.
Without automated checks, even small API changes can quietly break examples. Runnable documentation helps keep docs and code evolving together, so examples stay trustworthy at the scale where manual review stops being realistic.
Bringing executable documentation back
The goal behind this feature is the same one Python had when doctest was introduced more than two decades ago:
Documentation examples should actually work.
But instead of forcing documentation and testing into the same format, doc-builder separates the concerns:
- documentation stays readable
- tests stay powerful
- examples remain executable
- the source of truth stays in Markdown even when execution happens in different suites
The result is documentation that users can trust, because every example they see is continuously tested.
When documentation examples are continuously tested, they stop being aspirational and become something users can rely on.
And that's the point: examples should not just illustrate how code is supposed to work. They should prove that it does.
The first place we’re using this feature is in the GLM-ASR documentation. You can see a real example in the Transformers repository: the markdown file and the rendered doc.

