This program is a dependency-free implementation of GPT-2. It loads
the weight matrix and BPE file out of the original TensorFlow files,
tokenizes the input with a simple byte-pair encoder,
implements a basic linear algebra package with matrix math operations,
defines the transformer architecture, performs transformer inference,
and un-tokenizes the output with the BPE decoder.
All in ~3000 bytes of C.
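To make the tokenization step concrete, here is a minimal sketch of greedy byte-pair merging. It is not the code from this program: the `Merge` struct, `apply_merge`, and `bpe_encode` are hypothetical names, tokens are assumed to be plain integer ids, and the merge rules are assumed to be stored in priority order. (The real GPT-2 tokenizer also pre-splits text into words before merging, which this sketch skips.)

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical merge rule: the adjacent pair (left, right) becomes `merged`.
   Rules are assumed to be stored in priority order, index 0 first. */
typedef struct { int left, right, merged; } Merge;

/* Replace every adjacent (left, right) pair in tokens[0..n) with `merged`,
   scanning left to right and compacting in place. Returns the new length. */
static size_t apply_merge(int *tokens, size_t n, Merge m) {
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n && tokens[i] == m.left && tokens[i + 1] == m.right) {
            tokens[out++] = m.merged;
            i++; /* skip the right half of the merged pair */
        } else {
            tokens[out++] = tokens[i];
        }
    }
    return out;
}

/* Greedy BPE: repeatedly find the highest-priority rule that matches
   somewhere in the sequence and apply it, until no rule matches.
   Returns the final number of tokens. */
size_t bpe_encode(int *tokens, size_t n, const Merge *merges, size_t n_merges) {
    assert(tokens != NULL || n == 0);
    for (;;) {
        size_t best = n_merges; /* n_merges means "no rule found yet" */
        for (size_t r = 0; r < n_merges && best == n_merges; r++) {
            for (size_t i = 0; i + 1 < n; i++) {
                if (tokens[i] == merges[r].left &&
                    tokens[i + 1] == merges[r].right) {
                    best = r;
                    break;
                }
            }
        }
        if (best == n_merges) return n; /* nothing left to merge */
        n = apply_merge(tokens, n, merges[best]);
    }
}
```

Every applied rule shortens the sequence by at least one token, so the loop always terminates. Decoding is the easy direction: each token id maps back to a fixed byte string, and the strings are concatenated.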
It’s efficient enough that GPT-2 Small takes a few
seconds per reply on any modern machine. To achieve this I’ve implemented
KV caching and an efficient matrix multiplication algorithm,
with optional OMP parallelism.
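As a rough illustration of what a cache-friendly matrix multiply with optional OpenMP can look like, here is a minimal sketch. It is not necessarily the algorithm this program uses: `matmul_t` is a hypothetical name, and the sketch assumes the second operand is stored already transposed so that both inner-loop reads walk contiguous memory.

```c
#include <assert.h>
#include <stddef.h>

/* Computes C = A * B, where A is m x k (row-major), C is m x n (row-major),
   and `bt` holds B transposed, i.e. an n x k row-major array.
   Storing B transposed is an assumption of this sketch: it turns each
   output element into a dot product of two contiguous rows. */
void matmul_t(const float *a, const float *bt, float *c, int m, int n, int k) {
    assert(m >= 0 && n >= 0 && k >= 0);
    /* The pragma only takes effect when compiled with OpenMP support
       (e.g. -fopenmp); otherwise the loop simply runs serially. */
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (int i = 0; i < m; i++) {
        const float *arow = a + (size_t)i * k;
        for (int j = 0; j < n; j++) {
            const float *brow = bt + (size_t)j * k;
            float sum = 0.0f;
            for (int p = 0; p < k; p++) {
                sum += arow[p] * brow[p];
            }
            c[(size_t)i * n + j] = sum;
        }
    }
}
```

Each row of the output is independent of the others, which is why the outer loop can be split across threads without any locking.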
You can then use this to create something like ChatGPT, so long
as you don’t care about the quality of the output. (It’s actually
pretty terrible output, objectively speaking… But it does run.)
There are a few quirks (especially with handling UTF-8 characters), and
running the XL size model at long context length can require ~100GB of RAM.
But if you’re just typing ASCII with GPT-2 Small, it should run
just about anywhere.
I’ve uploaded the code to GitHub, so feel free to try it out there.
This program is made up of the following main blocks (hover over each to see the corresponding code):
Basic matrix math library (700 bytes)
Fast matrix multiplication (300 bytes)
Neural network layers (300 bytes)
Transformer model (600 bytes)
Byte pair encoding (400 bytes)
I/O (200 bytes)
Weight loading (300 bytes)
Byte pair encoding loading (300 bytes)
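For readers who want a feel for the first block before diving into the golfed source, here is a small, readable sketch of a matrix type and two basic operations. The names `Mat`, `mat_new`, `mat_add`, `mat_transpose`, and `mat_free` are hypothetical, and the flat row-major layout is an assumption of this sketch rather than a description of the program's own library.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical matrix type: a flat row-major float buffer plus its shape. */
typedef struct { float *data; int rows, cols; } Mat;

/* Allocate a zero-filled rows x cols matrix. */
Mat mat_new(int rows, int cols) {
    assert(rows > 0 && cols > 0);
    Mat m = { calloc((size_t)rows * (size_t)cols, sizeof(float)), rows, cols };
    assert(m.data != NULL);
    return m;
}

/* Element-wise a += b for two matrices of the same shape. */
void mat_add(Mat a, Mat b) {
    assert(a.rows == b.rows && a.cols == b.cols);
    for (int i = 0; i < a.rows * a.cols; i++) {
        a.data[i] += b.data[i];
    }
}

/* Return a newly allocated transpose of a. */
Mat mat_transpose(Mat a) {
    Mat t = mat_new(a.cols, a.rows);
    for (int i = 0; i < a.rows; i++) {
        for (int j = 0; j < a.cols; j++) {
            t.data[j * a.rows + i] = a.data[i * a.cols + j];
        }
    }
    return t;
}

/* Release the buffer owned by m. */
void mat_free(Mat m) {
    free(m.data);
}
```

Passing `Mat` by value copies only the pointer and the two dimensions, so `mat_add` still modifies the caller's buffer in place.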
#include
#include
#include
#include
int U,C,K,c,d,S,zz;char*bpe;typedef struct{float*i;int j,k;} A;void*E,*n;A*f;FILE*fp;
#define N(i,j)for(int i=0; i