Regarding the video, at first I thought it was a joke because they looked like tokenized words haha
The 10% speed and VRAM usage improvements sound absolutely revolutionary. It would really be a massive breakthrough if you pull it off.
Also, I commented on your post on Twitter, but I'll say it here too: this would work wonders for speech-to-text and text-to-speech, since it also has baked-in IPA phonemes. You should definitely consider exploring that angle, because those spaces desperately need improvement.