Hugging Face
Asaf Delmedigo (asafd60)
0 followers · 15 following
delmedigo88
AI & ML interests: None yet
Recent Activity
reacted to mrs83's post with 🔥 about 9 hours ago:
In 2017, my RNNs were babbling. Today, they are hallucinating beautifully. 10 years ago, getting an LSTM to output coherent English was a struggle. 10 years later, after a "cure" based on FineWeb-EDU and a custom synthetic mix for causal conversation, the results are fascinating.

We trained this on ~10B tokens on a single AMD GPU (ROCm). It is not a Transformer: Echo-DSRN (400M) is a novel recurrent architecture inspired by Hymba, RWKV, and xLSTM, designed to challenge the "Attention is All You Need" monopoly on the Edge. The ambitious goal is to build a small instruct model with RAG and tool-use capabilities (https://huggingface.co/ethicalabs/Kurtis-EON1).

The Benchmarks (Size: 400M)
For a model this size (trained on <10B tokens), the specialized performance is surprising:
*SciQ*: 73.8% (This rivals billion-parameter models in pure fact retrieval.)
*PIQA*: 62.3% (Solid physical intuition for a sub-1B model.)

The Reality Check: HellaSwag (29.3%) and Winogrande (50.2%) show the limits of 400M parameters and 10B training tokens. We are hitting the "Reasoning Wall", which confirms that we need to scale to (hopefully) unlock deeper common sense.

As you can see in the visualization (to be released soon on HF), the FineWeb-EDU bias is strong: the model is convinced it is in a classroom ("In this course, we explore..."). The Instruct Model is not ready yet, and we are currently using curriculum learning to test model plasticity.

Source code and weights are not being released yet. This is not a fork or a fine-tune: the base model is built in-house at https://www.ethicalabs.ai/, with novel components that do not exist in current open libraries.

Call for Collaboration: I am looking for peer reviewers interested in recurrent/hybrid architectures. If you want to explore what lies beyond Transformers, let's connect!

Training diary: https://huggingface.co/ethicalabs/Kurtis-EON1
liked a Space 17 days ago: merterbak/DeepSeek-OCR-Demo
liked a Space 4 months ago: khang119966/DeepSeek-OCR-DEMO
Organizations: None yet
asafd60's models (48)
asafd60/qwen-Curriculum • Image-Text-to-Text • 8B • Updated Feb 9, 2025 • 1
asafd60/Qwen2-VL-plain-ocr • Image-Text-to-Text • 8B • Updated Feb 4, 2025 • 2
asafd60/Qwen2.5-VL-heb-general • Image-Text-to-Text • 8B • Updated Jan 30, 2025 • 1
asafd60/Qwen2_5-vl-json • Image-Text-to-Text • 8B • Updated Jan 29, 2025
asafd60/Qwen2.5-VL-DocVQA-Heb-100 • Image-Text-to-Text • 8B • Updated Jan 29, 2025
asafd60/Qwen2.5-VL-fin_vqa • Image-Text-to-Text • 8B • Updated Jan 29, 2025
asafd60/HebQwen-json-2025-meta • Image-Text-to-Text • 8B • Updated Jan 15, 2025
asafd60/HebQwen-json-2025 • Image-Text-to-Text • 8B • Updated Jan 15, 2025
asafd60/HebQwen-2025 • Image-Text-to-Text • 8B • Updated Jan 15, 2025
asafd60/qwentext • Image-Text-to-Text • 8B • Updated Jan 14, 2025 • 3
asafd60/Heb-Qwen-VL-7B-Instruct_New • 8B • Updated Dec 17, 2024
asafd60/Heb-Qwen2-VL-Instruct-LoRA-half-precision • Image-Text-to-Text • 8B • Updated Sep 11, 2024
asafd60/HebQwen_LoRA_half_precision • Updated Sep 11, 2024
asafd60/Heb-Qwen-VL-7B-Instruct • Updated Sep 9, 2024
asafd60/HebQwen • Updated Sep 9, 2024
asafd60/LiLT_Synth_Large_2.0 • 0.3B • Updated Aug 24, 2024
asafd60/LiLT_Large_CBS_QA • 0.3B • Updated Aug 24, 2024
asafd60/LiLT_Synth_Large • 0.3B • Updated Aug 23, 2024
asafd60/LiLT_Large • 0.3B • Updated Aug 23, 2024
asafd60/LiltHeb_synth_man_6.0 • 0.3B • Updated Aug 22, 2024
asafd60/LayouXLM_synth_1.0_validsplit • 0.3B • Updated Aug 22, 2024
asafd60/LiltHeb_synth_1.0_validsplit • 0.3B • Updated Aug 22, 2024
asafd60/LiltHeb_man_1.0_validsplit • 0.3B • Updated Aug 22, 2024
asafd60/LiltHeb_synth_man_5.0_validsplit • Updated Aug 22, 2024
asafd60/LiltHeb_synth_man_4.0 • 0.3B • Updated Aug 22, 2024
asafd60/LiltHeb_synth_man_3.0 • 0.3B • Updated Aug 22, 2024 • 1
asafd60/LiltHeb_synth_man_2.0 • 0.3B • Updated Aug 22, 2024
asafd60/LiltHeb_synth_man_1.0 • 0.3B • Updated Aug 22, 2024
asafd60/LiltHeb_synth_3.0_finetunedonheb • 0.3B • Updated Aug 22, 2024
asafd60/LiltHeb_synth_3.0 • 0.3B • Updated Aug 21, 2024