---
license: mit
base_model:
- microsoft/deberta-v3-small
datasets:
- tgupj/tiny-router-data
---
# tiny-router
`tiny-router` is a compact, experimental multi-head routing classifier for short, domain-neutral messages with optional interaction context. It predicts four separate signals that downstream systems or agents can use for update handling, action routing, memory policy, and prioritization.
## What it predicts
```
relation_to_previous: new | follow_up | correction | confirmation | cancellation | closure
actionability: none | review | act
retention: ephemeral | useful | remember
urgency: low | medium | high
```
At inference time the model returns a label for each of the four heads, a calibrated confidence per head, and an aggregate `overall_confidence`.
## Intended use
- Route short user messages into lightweight automation tiers.
- Detect whether a message updates prior context or starts something new.
- Decide whether action is required, review is safer, or no action is needed.
- Separate disposable details from short-term useful context and longer-term memory candidates.
- Prioritize items by urgency.

Good use cases:
- routing message-like requests in assistants or productivity tools
- triaging follow-ups, corrections, confirmations, and closures
- conservative automation with review fallback

Not good use cases:
- fully autonomous high-stakes action without guardrails
- domains that need expert reasoning or regulated decisions
## Training data
This checkpoint was trained on the synthetic dataset splits in:
- `data/synthetic/train.jsonl`
- `data/synthetic/validation.jsonl`
- `data/synthetic/test.jsonl`

The data follows a structured JSONL schema with:
- `current_text`
- optional `interaction.previous_text`
- optional `interaction.previous_action`
- optional `interaction.previous_outcome`
- optional `interaction.recency_seconds`
- four label heads under `labels`
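
A single training record (one JSON object per line) then looks like the following. The message and interaction fields are taken from the inference example later in this card; the label values shown are illustrative, not copied from the dataset:

```json
{
  "current_text": "Actually next Monday",
  "interaction": {
    "previous_text": "Set a reminder for Friday",
    "previous_action": "created_reminder",
    "previous_outcome": "success",
    "recency_seconds": 45
  },
  "labels": {
    "relation_to_previous": "correction",
    "actionability": "act",
    "retention": "useful",
    "urgency": "medium"
  }
}
```
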
## Model details
- Base encoder: `microsoft/deberta-v3-small`
- Architecture: encoder-only multitask classifier
- Pooling: learned attention pooling
- Structured features:
- canonicalized `previous_action` embedding
- `previous_outcome` embedding
- learned projection of `log1p(recency_seconds)`
- Head structure:
- dependency-aware multitask heads
- later heads condition on learned summaries of earlier head predictions
- Calibration:
- post-hoc per-head temperature scaling fit on validation logits
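
The calibration code itself is not shown in this card. A minimal stdlib sketch of per-head temperature scaling — dividing logits by a scalar `T` chosen to minimize negative log-likelihood on validation logits — might look like this (the repo's actual implementation may differ, e.g. by optimizing `T` with gradient descent instead of a grid):

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature; T > 1 flattens overconfident distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def fit_temperature(val_logits, val_labels, grid=None):
    """Grid-search the temperature minimizing NLL on held-out validation logits."""
    if grid is None:
        grid = [0.5 + 0.05 * i for i in range(51)]  # 0.5 .. 3.0
    best_t, best_nll = 1.0, float("inf")
    for t in grid:
        nll = 0.0
        for logits, label in zip(val_logits, val_labels):
            probs = softmax(logits, t)
            nll -= math.log(max(probs[label], 1e-12))
        if nll < best_nll:
            best_t, best_nll = t, nll
    return best_t
```

At inference, each head's logits are divided by that head's fitted temperature before the softmax, which changes confidences but never the argmax label.
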

This checkpoint was trained with:
- `batch_size = 32`
- `epochs = 20`
- `max_length = 128`
- `encoder_lr = 2e-5`
- `head_lr = 1e-4`
- `dropout = 0.1`
- `pooling_type = attention`
- `use_head_dependencies = true`
## Current results
Held-out test results from `artifacts/tiny-router/eval.json`:
- `macro_average_f1 = 0.7848`
- `exact_match = 0.4570`
- `automation_safe_accuracy = 0.6230`
- `automation_safe_coverage = 0.5430`
- `ECE = 0.3440`

Per-head macro F1:
- `relation_to_previous = 0.8415`
- `actionability = 0.7982`
- `retention = 0.7809`
- `urgency = 0.7187`

Ablations:
- `current_text_only = 0.7058`
- `current_plus_previous_text = 0.7478`
- `full_interaction = 0.7848`

Interpretation:
- interaction context helps
- actionability and urgency are usable but still imperfect
- high-confidence automation is possible only with conservative thresholds
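
The last point can be made concrete with a confidence gate on the `actionability` head: automate only when the model is both predicting `act` and confident, and otherwise fall back to human review. The threshold values below are illustrative, not tuned on this checkpoint:

```python
def route(prediction, act_threshold=0.9, none_threshold=0.8):
    """Map one prediction (shaped like the example inference output below)
    to an automation tier. Thresholds are illustrative, not tuned."""
    head = prediction["actionability"]
    label, conf = head["label"], head["confidence"]
    if label == "act" and conf >= act_threshold:
        return "automate"
    if label == "none" and conf >= none_threshold:
        return "ignore"
    return "review"  # conservative fallback for `review` or low confidence
```

With the reported `automation_safe_accuracy` of 0.6230 at 0.5430 coverage, thresholds like these trade coverage for safety; tune them on your own traffic.
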
## Limitations
- The benchmark is task-specific and internal to this repo.
- The dataset is synthetic, so distribution shift to real product traffic is likely.
- Label quality on subtle class boundaries (for example `review` vs. `act`, or `useful` vs. `remember`) strongly affects downstream behavior.
- Confidence calibration is improved but not strong enough to justify broad unattended automation.
## Example inference
```json
{
"relation_to_previous": { "label": "correction", "confidence": 0.94 },
"actionability": { "label": "act", "confidence": 0.97 },
"retention": { "label": "useful", "confidence": 0.76 },
"urgency": { "label": "medium", "confidence": 0.81 },
"overall_confidence": 0.87
}
```
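
This card does not specify how `overall_confidence` is derived; in the example above it happens to equal the arithmetic mean of the four head confidences. A consumer can recompute that mean as a sanity check on the output shape (this is not necessarily the model's own aggregation rule):

```python
def mean_head_confidence(prediction):
    """Average the per-head confidences, skipping the scalar
    `overall_confidence` field. Consumer-side sanity check only."""
    heads = [v["confidence"] for k, v in prediction.items()
             if k != "overall_confidence"]
    return sum(heads) / len(heads)
```
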
## How to load
This repo uses a custom checkpoint format; load it with this project's own helpers:
```python
from tiny_router.io import load_checkpoint
from tiny_router.runtime import get_device
device = get_device(requested_device="cpu")
model, tokenizer, config = load_checkpoint("artifacts/tiny-router", device=device)
```
Or run inference with:
```bash
uv run python predict.py \
--model-dir artifacts/tiny-router \
--input-json '{"current_text":"Actually next Monday","interaction":{"previous_text":"Set a reminder for Friday","previous_action":"created_reminder","previous_outcome":"success","recency_seconds":45}}' \
--pretty
``` |