dtagames 6 hours ago

If this is supposed to be a simple or approachable (or even correct) explanation of LLMs, I think it misses the mark, especially in the last paragraph, where the author seems to conflate the transformer's training-time work of setting values in the model with the later, predictive step of returning tokens when prompted.

Folks who say an LLM cannot "introspect on itself" are correct, because the model's "learning" process is just a series of assignments and adjustments to the model's data (its weights). In other words, it's predictive soup all the way down.
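
To make that concrete, here's a toy sketch in plain Python: a character-level bigram counter, not a transformer, with every name in it invented for illustration. The only point is the split described above: "training" just assigns and adjusts numbers in a table, and "generation" just reads that frozen table to predict the next token.

    import random
    from collections import defaultdict

    def train(text):
        # "Learning" is nothing but adjustments to model data
        # (here, bigram counts standing in for weights).
        counts = defaultdict(lambda: defaultdict(int))
        for a, b in zip(text, text[1:]):
            counts[a][b] += 1
        return counts

    def generate(counts, prompt, n=30):
        # Inference: no learning happens here; we only sample from
        # the frozen table, one predicted character at a time.
        out = list(prompt)
        for _ in range(n):
            followers = counts.get(out[-1])
            if not followers:
                break
            chars, weights = zip(*followers.items())
            out.append(random.choices(chars, weights=weights)[0])
        return "".join(out)

    model = train("the model adjusts numbers and then the model predicts tokens ")
    print(generate(model, "the "))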

I'm biased because I wrote it, but I think this is a better article [0]. I wrote it specifically because most explanations are awful, and on that point I agree with this author.

[0] Something From Nothing: A Painless Approach to Understanding AI -- https://medium.com/gitconnected/something-from-nothing-d755f...