From tokens to embeddings
2025-02-06
To get token embeddings, text is first split into tokens (e.g., via BPE), and each token is assigned a unique ID from a fixed vocabulary. Conceptually, each ID is converted to a one-hot vector and multiplied by an embedding matrix (learned during training); since that product just selects one row of the matrix, frameworks implement it as a row lookup. The result is a dense, fixed-size embedding for each token.

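A minimal sketch of the ID-to-embedding step, using NumPy with a toy three-word vocabulary and a randomly initialized matrix standing in for a learned one (the vocabulary, dimensions, and values are illustrative assumptions, not from any particular model):

```python
import numpy as np

# Toy vocabulary and embedding matrix (hypothetical; a real one is learned during training).
vocab = {"the": 0, "cat": 1, "sat": 2}
vocab_size, embed_dim = len(vocab), 4

rng = np.random.default_rng(0)
E = rng.normal(size=(vocab_size, embed_dim))   # embedding matrix: (vocab_size, embed_dim)

# Tokenized text as IDs.
token_ids = [vocab[t] for t in ["the", "cat", "sat"]]

# One-hot route: each ID becomes a one-hot row vector, then multiplies E.
one_hot = np.eye(vocab_size)[token_ids]        # (num_tokens, vocab_size)
emb_via_matmul = one_hot @ E                   # (num_tokens, embed_dim)

# Equivalent row lookup, which is what embedding layers actually do.
emb_via_lookup = E[token_ids]

assert np.allclose(emb_via_matmul, emb_via_lookup)
print(emb_via_lookup.shape)                    # (3, 4): one dense vector per token
```

The assertion holds because multiplying by a one-hot vector zeroes out every row of the matrix except the one at the token's index, so both routes produce the same dense vectors.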