The Math You Need to Start Understanding LLMs
Once you peel back the hype and mysticism, large language models (LLMs) are a fascinating application of statistical models, effectively what you get when you dial a basic auto-complete model up to eleven. Analyzing a mind-boggling amount of text and producing meaningful auto-completion results involves quite a bit of math, and a recent three-part article series by [Giles] goes through the basics of inference: the prediction step performed with an already-trained model. Text is encoded in the LLM's vector space as token IDs, each token being a text fragment that has some probability of following another, much as cats have some probability of being found on desks, as in the above photo by [Giles]. During inference the model produces a vector of scores, one per token ID, from which a sentence can be pieced together over successive steps. These so-called logits are detailed in the first article in the series, while the second article focuses on vocabulary space and embeddings, as well as the matrix operations involved.
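To make the logits idea concrete, here's a toy sketch (not code from the article series) of a single inference step: a made-up five-word vocabulary, a made-up vector of logits standing in for a real model's output, a softmax to turn those scores into probabilities, and a greedy pick of the next token. Real LLMs do the same thing across tens of thousands of tokens, usually with fancier sampling than plain argmax.

```python
import numpy as np

# Toy vocabulary and pretend model output -- one raw score
# ("logit") per token ID. A real model computes these from
# the tokens seen so far.
vocab = ["the", "cat", "sat", "on", "desk"]
logits = np.array([1.2, 3.1, 0.4, 0.7, 2.5])

# Softmax turns logits into a probability distribution.
# Subtracting the max first is the standard trick for
# numerical stability; it doesn't change the result.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Greedy decoding: take the single most probable next token.
next_id = int(np.argmax(probs))
print(vocab[next_id])  # picks "cat", the highest-scoring token
```

In practice the chosen token is appended to the input and the whole step repeats, which is how a sentence gets pieced together one token at a time.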