Embeddings
An embedding is a list of numbers that captures what data means.
Any piece of data—text, images, audio, code—can be converted into an embedding. The key property: similar meanings produce nearby vectors, so semantic similarity can be measured as a distance between them.
| Comparison | Distance |
|---|---|
| “Dogs love fetch” vs “puppies enjoy catching balls” | 0.08 (close) |
| “Dogs love fetch” vs “the stock market collapsed” | 0.94 (far) |
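Distances like the ones in the table can be computed with cosine distance. A minimal sketch, using toy 3-dimensional vectors with made-up values (real embedding models emit hundreds or thousands of dimensions):

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity: 0 means identical direction, larger means less similar.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

# Toy embeddings (assumed values, for illustration only)
dogs_fetch    = [0.9, 0.1, 0.0]
puppies_balls = [0.85, 0.15, 0.05]
stock_market  = [0.0, 0.2, 0.95]

print(cosine_distance(dogs_fetch, puppies_balls))  # small: similar meaning
print(cosine_distance(dogs_fetch, stock_market))   # large: unrelated meaning
```

The exact numbers depend on the model and metric; what matters is the ordering—related meanings land closer together than unrelated ones.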
How SOMA Uses Embeddings
SOMA uses embeddings in two ways.
Data embeddings: When a submitter runs a model on their data, the model produces an embedding — a vector representing that data’s content. The distance between this embedding and a target determines how competitive the submission is. Closest wins.
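The closest-wins rule can be sketched as a minimum-distance selection. This is a sketch under assumptions—Euclidean distance, 2-dimensional vectors, and hypothetical submitter names; the metric and selection details SOMA actually uses are not specified here:

```python
import math

# Hypothetical target embedding and submissions (assumed values)
target = [0.2, 0.8]
submissions = {
    "alice": [0.3, 0.7],
    "bob":   [0.9, 0.1],
    "carol": [0.25, 0.75],
}

# Closest wins: the submission whose embedding is nearest the target.
winner = min(submissions, key=lambda s: math.dist(submissions[s], target))
print(winner)  # carol's embedding is nearest to the target
```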
Model embeddings: Each registered model also has an embedding representing its specialization — what domains it understands best. The network uses these model embeddings to assign models to targets via stake-weighted KNN. A model specializing in code gets assigned to code-related targets.
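One plausible reading of stake-weighted KNN, sketched below: keep the k models whose embeddings are nearest the target, then score each neighbor by stake relative to its distance. The model names, stakes, embeddings, and the `stake / distance` scoring rule are all assumptions for illustration; SOMA's actual assignment rule may differ:

```python
import math

# Hypothetical registry: each model has a specialization embedding and a stake.
models = {
    "code_model":  {"emb": [0.9, 0.1], "stake": 50},
    "prose_model": {"emb": [0.1, 0.9], "stake": 200},
    "mixed_model": {"emb": [0.6, 0.4], "stake": 100},
}

def assign(target_emb, k=2):
    # Keep the k models nearest the target embedding...
    nearest = sorted(models, key=lambda m: math.dist(models[m]["emb"], target_emb))[:k]
    # ...then weight each neighbor by stake / distance (assumed scoring rule),
    # so both proximity and stake influence the assignment.
    def score(m):
        d = math.dist(models[m]["emb"], target_emb)
        return models[m]["stake"] / (d + 1e-9)
    return max(nearest, key=score)

code_target = [0.95, 0.05]
print(assign(code_target))  # the code-specialized model wins the code target
```

With these toy values, `code_model` wins the code-like target despite its smaller stake, because it is far closer to it—matching the intuition that a model specializing in code gets code-related targets.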
Embeddings turn the subjective question “is this data valuable?” into an objective distance calculation.