The new runs were not successful.

But we got real improvements. While the model still says unnecessary things, it now hallucinates much less and gets some things right.

The run was on an L40S GPU. We don't know why we picked it, to be honest.

But we found out that the data generator is flawed.

Normally, LaaLM-exp-v1 was trained on scenarios where later commands actually reflected the effects of earlier commands.

So in the data generator, if the generated data did `mkdir hello`, a later `ls` would show `hello`. This was done using a Linux simulator.

But LaaLM-v2's data generator generates each command at random, without caring about anything that came before.

So instead it may do `cd hello`, but then does `touch hi` and goes on generating randomly. There is no persistence, which is literally what LaaLM-v2 is for.
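To make the difference concrete, here is a minimal sketch of the exp-v1 approach: keep a simulator in the loop so a later `ls` really shows what an earlier `mkdir` created. The class and function names (`TinyLinuxSim`, `generate_session`) and the tiny command set are our own illustrative choices, not the actual LaaLM code.

```python
import random

class TinyLinuxSim:
    """Minimal stand-in for a Linux simulator: tracks just enough
    state (directories and files) that later commands reflect the
    effects of earlier ones."""

    def __init__(self):
        self.dirs = set()
        self.files = set()

    def run(self, cmd):
        parts = cmd.split()
        if parts[0] == "mkdir":
            self.dirs.add(parts[1])
            return ""
        if parts[0] == "touch":
            self.files.add(parts[1])
            return ""
        if parts[0] == "ls":
            return "\n".join(sorted(self.dirs | self.files))
        return ""

def generate_session(n_commands, seed=0):
    """Sample a shell session where every output comes from the
    simulator, so the transcript is internally consistent."""
    rng = random.Random(seed)
    sim = TinyLinuxSim()
    names = ["hello", "hi", "docs", "notes"]
    session = []
    for _ in range(n_commands):
        cmd = rng.choice([f"mkdir {rng.choice(names)}",
                          f"touch {rng.choice(names)}",
                          "ls"])
        session.append((cmd, sim.run(cmd)))
    return session
```

A stateless v2-style generator would instead sample both the command and its output independently, which is exactly the persistence bug described above.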
We will first try using LaaLM-exp-v1's training dataset generator here to see if our theory is correct.

We will give an update when the results are out.

#### 4 March 2026 Update

But we thought in the meantime that Transformers are too heavy for what LaaLM does.

So we are going to use an LSTM-based architecture.

LSTMs are not designed for big models, but LaaLM is very simple, so an LSTM is a better fit for it.
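For reference, the per-token recurrence an LSTM runs is small, which is why it suits a simple model. Below is a single-unit, scalar-weight sketch of one LSTM step — illustrative only, not the LaaLM-v2 code, and a real model would use vector-valued gates.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    """One step of a one-unit LSTM cell (scalars for clarity).
    w maps each gate name to (input weight, recurrent weight, bias):
    'i' = input gate, 'f' = forget gate, 'o' = output gate,
    'g' = candidate cell update."""
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2])
    c = f * c_prev + i * g   # cell state carries long-range memory
    h = o * math.tanh(c)     # hidden state is this step's output
    return h, c
```

Unlike a Transformer, which attends over the whole context at every token, this recurrence does constant work per token, trading parallelism for simplicity.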
But we can assure you that LaaLM-v2 is close to release.

#### 6 March 2026 Update

We have updated the codebase, but we have a problem.

Speed.

Speed is crucial to us because we need rapid experimentation. If we take too much time, the cloud bill will explode.

But that is also the problem.

We can't find an accelerator that fits exactly what LaaLM needs.

We have been using an L40S, but we got fed up with how slow and ineffective it is without FP8.

So we will use another accelerator. But there isn't really an accelerator whose price-to-performance fits us.

For the data generator, we have a weird problem.

Technically, v2's data generator is actually better than exp-v1's.

But for some reason, there's a leak somewhere we can't find that causes the model to just cheat.

Maybe it's our indicator tokens, or something subtler, but LaaLM-v2 has definitely got us fed up with the whole LaaLM franchise.

We also have sad news for LaaLM today.

LaaLM-v2 will be the last model of the LaaLM series.

Our reason is that we have better projects to spend our compute on than a bash predictor that any other model can beat.

LaaLM has definitely been a fun experience for us, but we can't justify spending precious compute on something this experimental and not very useful.

Maybe future models will still come, but we recommend you stop expecting new models after LaaLM-v2.
---

## How to use it