# Apollo: Oracle Model

## Project Status

**Phase:** Hyperparameter Optimization & Dataset Preparation.

### Recent Updates (Jan 2026)
- Hyperparameter Tuning: Analyzed the token trade distribution to determine optimal model parameters.
  - Max Sequence Length: Set to 8192. This covers >2 hours of high-frequency trading activity for high-volume tokens (verified against HWVY...) and the full lifecycle for 99% of tokens.
  - Prediction Horizons: Set to 60s, 3m, 5m, 10m, 30m, 1h, 2h.
    - Min Horizon (60s): Chosen to accommodate ~20s inference latency while still capturing the "meat" of aggressive breakout movers.
    - Max Horizon (2h): Covers the timeframe in which 99% of tokens hit their All-Time High.
- Infrastructure:
  - Updated `train.sh` to use these new hyperparameters.
  - Updated `scripts/cache_dataset.py` to ensure cached datasets are labeled with these horizons.
  - Verified `DataFetcher` retrieves full trade histories (no hidden limits).
## Configuration Summary

| Parameter | Value | Rationale |
|---|---|---|
| Max Seq Len | 8192 | Captures >2h of intense pump activity or a full rug lifecycle. |
| Horizons (s) | 60, 180, 300, 600, 1800, 3600, 7200 | From "Scalp/Breakout" (1m) to "Runner/ATH" (2h). |
| Inference Latency | ~20s | Dictates the 60s minimum horizon. |
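For reference, the table above can be expressed as a small config sketch with the rationale encoded as sanity checks. The constant names are hypothetical; the actual names used by `train.sh` and the scripts may differ.

```python
# Hypothetical config mirroring the summary table; real key names may differ.
MAX_SEQ_LEN = 8192
HORIZONS_S = [60, 180, 300, 600, 1800, 3600, 7200]  # 60s, 3m, 5m, 10m, 30m, 1h, 2h
INFERENCE_LATENCY_S = 20

# Sanity checks implied by the rationale column:
assert min(HORIZONS_S) > INFERENCE_LATENCY_S  # 60s min horizon absorbs ~20s latency
assert max(HORIZONS_S) == 2 * 3600            # 2h max horizon (99% of tokens ATH by then)
```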
## Usage

### 1. Cache Dataset

Pre-process data into `.pt` files with the correct labels:

```bash
./pre_cache.sh
```

### 2. Train Model

Launch training with the updated hyperparameters:

```bash
./train.sh
```
## TODO: Future Enhancements

### Multi-Task Quality Prediction Head
Add a secondary head (Head B) that predicts token quality percentiles alongside price returns:
- Fees Percentile — Predicted future fees relative to class median
- Volume Percentile — Predicted future volume relative to class median
- Holders Percentile — Predicted future holder count relative to class median
**Rationale:** The `analyze_distribution.py` script currently uses hard thresholds on future metrics to classify tokens as "Manipulated". This head would let the model learn to predict those quality metrics from current features, enabling scam detection at inference time without access to future data.
Approach Options:
- Single composite quality score (simpler)
- Three separate percentile predictions (more interpretable)
- Three binary classifications (fees_ok, volume_ok, holders_ok)
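As a sketch of how the Head B training targets could be built (the `percentile_labels` helper is hypothetical, not an existing script): percentile-rank each token's future metric within its class, then derive the binary variant by thresholding at the median.

```python
from bisect import bisect_left

def percentile_labels(values: list[float]) -> list[float]:
    """Percentile rank (0..1) of each token's future metric within its class.

    Hypothetical label-building helper for the proposed Head B; the binary
    variant (e.g. fees_ok) falls out by thresholding at 0.5 (the median).
    """
    ranked = sorted(values)
    n = len(values)
    return [bisect_left(ranked, v) / n for v in values]

# Example: future fees for four tokens in one class.
fees = [10.0, 50.0, 5.0, 200.0]
pct = percentile_labels(fees)       # e.g. 200.0 -> 0.75
fees_ok = [p >= 0.5 for p in pct]   # binary classification variant
```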
### Data Sampling (Context Optimization)

Replace hardcoded H/B/H limits with a dynamic sampling strategy that maximizes the model's context window usage.

**The Problem.** Currently, the system triggers H/B/H logic based on a fixed 30k trade count and uses hardcoded limits (10k early, 15k recent). This mismatch with the model's `max_seq_len` (e.g., 8192) leads to inefficient data usage: either valuable data is truncated arbitrarily, or too little is fed when more could fit.

**The Solution: Dynamic Context Filling.** Implementation moves to `data_loader.py` (since the cache contains the full history).

**Algorithm:**
1. Input: full sorted list of events (Trades, Chart Segments, etc.) up to `T_cutoff`.
2. Check: if `len(events) <= max_seq_len`, use ALL events.
3. Split: if `len(events) > max_seq_len`:
   - Reserve space for special tokens (start/end/pad).
   - Calculate budget: `budget = max_seq_len - reserve` (e.g., 8100).
   - Dynamic split: Head (early) = first `budget / 2` events; Tail (recent) = last `budget / 2` events.
   - Construct: `[HEAD] ... [GAP_TOKEN] ... [TAIL]`.

**Implementation Changes** — `[MODIFY] data_loader.py`:
- Remove constants: delete `HBH_EARLY_EVENT_LIMIT` and `HBH_RECENT_EVENT_LIMIT`.
- Update `_generate_dataset_item`: accept `max_seq_len` and implement the split logic defined above before returning `event_sequence`.
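A minimal sketch of this split over a flat Python event list (the `reserve` default and the `fill_context` helper name are illustrative assumptions; the real special tokens live in the tokenizer):

```python
def fill_context(events: list, max_seq_len: int, reserve: int = 92) -> list:
    """Sketch of the dynamic context-filling split described above.

    `reserve` models the space kept for start/end/pad special tokens; the
    "[GAP_TOKEN]" string stands in for the real gap special token.
    """
    if len(events) <= max_seq_len:
        return list(events)              # everything fits: use ALL events
    budget = max_seq_len - reserve       # e.g. 8192 - 92 = 8100
    half = budget // 2
    # Head = earliest events, Tail = most recent; one gap marker in between.
    return events[:half] + ["[GAP_TOKEN]"] + events[-half:]

seq = fill_context(list(range(20_000)), max_seq_len=8192)
# len(seq) == 4050 + 1 + 4050 == 8101, comfortably under max_seq_len
```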
**In plain terms:** First, check whether the final event list exceeds the available context. If it does, separate out the non-aggregatable events that must always be kept at full fidelity (e.g., a burn, a deployer trade, etc.). From the context remaining after those important events, work out how many snapshots will fit (chart segments, holders snapshots, chain stats, etc.). Whatever budget is left after the snapshots and the important non-aggregatable events goes to the high-definition (H) segments; in the middle (Blurry) segment we keep only the snapshots.

This works because ~90% of the context is consumed by trades and transfers, so they are the only event types that need compressing to reclaim context.

No new tokens are required: the vocabulary already has special tokens for this, 'MIDDLE' and 'RECENT', which mark the switch into the blurry middle segment and the switch back to high definition.
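The prioritized allocation described above can be sketched as follows. The event kinds, set names, and `build_sequence` helper are illustrative assumptions; only the 'MIDDLE'/'RECENT' special tokens come from the existing vocabulary.

```python
# Sketch of prioritized context filling (hypothetical helper; event kinds
# are examples — 'MIDDLE'/'RECENT' are the existing special tokens).
IMPORTANT = {"burn", "deployer_trade"}           # always kept at full fidelity
SNAPSHOTS = {"chart_segment", "holders_snapshot", "chain_stats"}
COMPRESSIBLE = {"trade", "transfer"}             # ~90% of context lives here

def build_sequence(events, max_seq_len):
    """events: time-sorted (timestamp, kind) tuples.

    Keeps all important/snapshot events, spends the leftover budget on
    high-definition head/tail trade segments, and marks the compressed
    middle with the MIDDLE and RECENT special tokens.
    """
    if len(events) <= max_seq_len:
        return list(events)                      # everything fits
    trade_idx = [i for i, (_, k) in enumerate(events) if k in COMPRESSIBLE]
    n_keep = len(events) - len(trade_idx)        # important + snapshot events
    budget = max(max_seq_len - n_keep - 2, 0)    # reserve 2 slots for markers
    half = budget // 2
    head = set(trade_idx[:half])
    tail = set(trade_idx[len(trade_idx) - half:]) if half else set()
    seq, in_middle = [], False
    for i, (t, kind) in enumerate(events):
        if kind in COMPRESSIBLE and i not in head and i not in tail:
            if not in_middle:                    # entering the blurry middle
                seq.append((t, "MIDDLE"))
                in_middle = True
            continue                             # drop middle trades/transfers
        if in_middle and kind in COMPRESSIBLE:   # first tail trade: back to HD
            seq.append((t, "RECENT"))
            in_middle = False
        seq.append((t, kind))                    # snapshots pass through as-is
    return seq
```

Snapshots and important events are never dropped, so the blurry middle retains its coarse market picture while the high-definition head and tail carry the raw trade flow.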