Embeddings

An embedding is a list of numbers that captures what data means.

Any piece of data—text, images, audio, code—can be converted into an embedding. The key property: similar meanings produce similar numbers, measurable as distance.

| Comparison | Distance |
| --- | --- |
| "Dogs love fetch" vs "puppies enjoy catching balls" | 0.08 (close) |
| "Dogs love fetch" vs "the stock market collapsed" | 0.94 (far) |
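Distances like the ones above are typically computed with a metric such as cosine distance. Here is a minimal sketch using toy 3-dimensional vectors (real embedding models produce hundreds or thousands of dimensions, and the example numbers are made up for illustration):

```python
import math

def cosine_distance(a, b):
    """Cosine distance: near 0 for similar directions, larger for dissimilar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

# Toy embeddings: similar meanings point in similar directions.
dogs_fetch    = [0.90, 0.10, 0.00]
puppies_balls = [0.85, 0.15, 0.05]
stock_market  = [0.05, 0.10, 0.95]

print(cosine_distance(dogs_fetch, puppies_balls))  # small (close)
print(cosine_distance(dogs_fetch, stock_market))   # large (far)
```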

SOMA uses embeddings in two ways.

Data embeddings: When a submitter runs a model on their data, the model produces an embedding — a vector representing that data’s content. The distance between this embedding and a target determines how competitive the submission is. Closest wins.
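The "closest wins" rule can be sketched as a simple comparison. The submitter names, embeddings, and the use of Euclidean distance here are illustrative assumptions, not SOMA's actual data or metric:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical submissions: (submitter, embedding their model produced).
target = [0.2, 0.8, 0.1]
submissions = [
    ("alice", [0.90, 0.10, 0.30]),
    ("bob",   [0.25, 0.75, 0.05]),
    ("carol", [0.50, 0.50, 0.50]),
]

# The submission whose embedding is closest to the target wins.
winner = min(submissions, key=lambda s: euclidean(s[1], target))
print(winner[0])  # bob
```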

Model embeddings: Each registered model also has an embedding representing its specialization — what domains it understands best. The network uses these model embeddings to assign models to targets via stake-weighted KNN. A model specializing in code gets assigned to code-related targets.
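One plausible reading of stake-weighted KNN is sketched below: rank models by distance to the target, with stake pulling a model's effective distance down, then keep the k nearest. The model names, embeddings, stakes, and the specific weighting (distance divided by stake) are all assumptions for illustration, not SOMA's actual assignment rule:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical registry: model name -> (specialization embedding, stake).
models = {
    "code-model":    ([0.9, 0.1],  50.0),
    "prose-model":   ([0.1, 0.9], 200.0),
    "general-model": ([0.5, 0.5], 100.0),
}

def assign(target, k=2):
    # Rank by stake-weighted distance: dividing by stake means higher
    # stake makes a model "closer", so it wins assignments to more targets.
    ranked = sorted(
        models.items(),
        key=lambda item: euclidean(item[1][0], target) / item[1][1],
    )
    return [name for name, _ in ranked[:k]]

code_target = [0.95, 0.05]
print(assign(code_target))  # the code specialist ranks first
```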

Embeddings turn the subjective question “is this data valuable?” into an objective distance calculation.