| --- |
| language: en |
| license: mit |
| tags: |
| - sdf |
| - extraction |
| - smollm3 |
| - gguf |
| - structured-data |
| - web-content |
| base_model: HuggingFaceTB/SmolLM3-3B |
| pipeline_tag: text-generation |
| --- |
| |
| # SDF Extract |
|
|
| Structured data extractor for the [SDF Protocol](https://sdfprotocol.org). Fine-tuned from SmolLM3-3B using QLoRA. |
|
|
| ## Purpose |
|
|
| Extracts structured semantic data from web content: entities, claims, relationships, summaries, and type-specific fields. The content type predicted by [sdf-classify](https://huggingface.co/pranab2050/sdf-classify) is passed in as input, so extraction is conditioned on that type. |
|
|
| ## Training |
|
|
| - **Base model**: HuggingFaceTB/SmolLM3-3B |
| - **Method**: QLoRA (rank 32, alpha 64, dropout 0.05) |
| - **Training data**: 2,335 extracted web documents |
| - **Accuracy**: 90% exact extraction match across all field types |
|
|
| ## Files |
|
|
| | File | Size | Description | |
| |------|------|-------------| |
| | `sdf-extract-SmolLM3-3B-Q4_K_M.gguf` | 1.8 GB | Quantized (Q4_K_M), recommended for deployment | |
| | `sdf-extract-SmolLM3-3B-f16.gguf` | 5.8 GB | Unquantized 16-bit weights (f16) | |
| | `Modelfile` | - | Ollama import configuration | |
|
|
| ## Usage with Ollama |
|
|
| ```bash |
| # Download the Q4_K_M GGUF file and the Modelfile, then: |
| ollama create sdf-extract -f Modelfile |
| ``` |
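
Once the model has been created, it can be called from the command line with `ollama run`. The sketch below is illustrative only: the `Type:` / `Content:` prompt layout, the `article` label, and the sample text are assumptions, not the documented SDF input format. The actual template is whatever the Modelfile and the SDF specification define.

```bash
# Hypothetical invocation sketch. The prompt layout is an assumption;
# check the Modelfile for the template the model was trained with.
MODEL="sdf-extract"              # name given to `ollama create` above
CONTENT_TYPE="article"           # assumed label, as produced by sdf-classify
PAGE_TEXT="Acme Corp announced a new battery plant in Ohio on Monday."

# Combine the classifier output and the page text into one prompt.
PROMPT="Type: ${CONTENT_TYPE}
Content: ${PAGE_TEXT}"

# Run the model only if the ollama CLI is available on this machine.
if command -v ollama >/dev/null 2>&1; then
  ollama run "$MODEL" "$PROMPT"
else
  echo "ollama not found on PATH; install it before running this example" >&2
fi
```

Passing the type on its own line keeps the classifier output separate from the page text, which is one simple way to condition extraction on the content type.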
|
|
| ## Part of SDF Protocol |
|
|
| - **Protocol**: [sdfprotocol.org](https://sdfprotocol.org) |
| - **Specification**: [github.com/sdfprotocol/sdf](https://github.com/sdfprotocol/sdf) |
| - **Whitepaper**: [DOI 10.5281/zenodo.18559223](https://doi.org/10.5281/zenodo.18559223) |
| - **Classifier model**: [pranab2050/sdf-classify](https://huggingface.co/pranab2050/sdf-classify) |
|
|
| ## Citation |
|
|
| ```bibtex |
| @article{sarkar2026sdf, |
| title={Convert Once, Consume Many: SDF for Cacheable, Typed Semantic Extraction from Web Pages}, |
| author={Sarkar, Pranab}, |
| year={2026}, |
| doi={10.5281/zenodo.18559223}, |
| publisher={Zenodo} |
| } |
| ``` |
|
|