
Targets

Targets are how the network benchmarks itself. Each target is a point in embedding space representing a domain (text, images, audio, code, or anything expressible as bytes). Validators randomly generate new targets each epoch. The network assigns models to each target, and data submitters race to hit them.

How well the assigned models perform reveals where the network is strong and where it needs to improve. When a target is hit, a new one spawns. The embedding space is vast, so the network is always being tested on domains it hasn’t mastered.

An embedding is a list of numbers that captures what data means.

Any piece of data (text, images, audio, code) can be converted into an embedding. The key property is that similar meanings produce nearby embeddings, so the distance between two embeddings measures how closely related their contents are.

| Comparison | Distance |
|---|---|
| “Dogs love fetch” vs. “puppies enjoy catching balls” | 0.08 (close) |
| “Dogs love fetch” vs. “the stock market collapsed” | 0.94 (far) |
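The sketch below shows how a distance like the ones in this table is computed. It assumes cosine distance (1 minus cosine similarity); the metric SOMA actually uses is not stated on this page. The three-dimensional vectors are made-up toy values rather than real model outputs, so the numbers will not match the table; only the close/far pattern carries over.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity: 0.0 for vectors pointing the same way, larger for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical toy "embeddings"; real embeddings have hundreds or thousands of dimensions.
dogs_fetch = [0.9, 0.1, 0.0]
puppies_balls = [0.8, 0.2, 0.1]
stock_market = [0.0, 0.1, 0.95]

print(round(cosine_distance(dogs_fetch, puppies_balls), 3))  # small: related meanings
print(round(cosine_distance(dogs_fetch, stock_market), 3))   # large: unrelated meanings
```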

SOMA uses embeddings in two ways.

Model embeddings: Each registered model has an embedding representing its specialization, the domains it understands best. The network uses these model embeddings to assign models to targets. See Your model’s embedding for how to compute and position yours.

Data embeddings: When a submitter runs a model on their data, the model produces an embedding, a vector representing that data’s content. The distance between this embedding and a target determines whether the submission is valid; it must fall within a threshold around the target.

At the start of each epoch, the network generates random targets across embedding space. As targets are hit during the epoch, new ones spawn immediately, so there are always open targets to pursue. The goal is coverage: spreading targets across domains so the network acquires diverse data.
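A minimal sketch of this spawn-and-replace cycle follows. It assumes targets are unit vectors sampled uniformly on the hypersphere (by normalizing a Gaussian sample), and it uses a fixed local seed for illustration; how validators actually sample targets and share randomness is not described here. `TargetPool`, `random_target`, and `mark_hit` are hypothetical names.

```python
import math
import random


def random_target(dim: int, rng: random.Random) -> list[float]:
    """Sample a point uniformly on the unit hypersphere (assumption: targets are unit vectors)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


class TargetPool:
    """Hypothetical pool of open targets: seeded at epoch start, refilled whenever one is hit."""

    def __init__(self, num_targets: int, dim: int, seed: int) -> None:
        self.dim = dim
        self.rng = random.Random(seed)
        self.targets = [random_target(dim, self.rng) for _ in range(num_targets)]

    def mark_hit(self, index: int) -> None:
        # Replace the hit target immediately so the number of open targets never drops.
        self.targets[index] = random_target(self.dim, self.rng)


pool = TargetPool(num_targets=4, dim=8, seed=42)
before = pool.targets[0]
pool.mark_hit(0)
print(len(pool.targets), pool.targets[0] != before)
```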

Each target is assigned k models via stake-weighted k-nearest neighbors over model embeddings. For each model, the network computes a score relative to the target:

score = distance² / voting_power

where voting_power is a model’s stake relative to total stake. The k models with the lowest scores are assigned. Models that are both close in embedding space and highly staked get priority. The target stores its assigned model_ids on the network.
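The assignment rule above can be sketched directly from the formula. This sketch assumes squared Euclidean distance between embeddings and skips models with zero stake (their voting power would be zero); neither detail is specified on this page, and `Model` and `assign_models` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Model:
    model_id: str
    embedding: list[float]
    stake: float


def squared_distance(a: list[float], b: list[float]) -> float:
    # Assumption: squared Euclidean distance between embeddings.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_models(target: list[float], models: list[Model], k: int) -> list[str]:
    """Return the model_ids of the k models with the lowest distance² / voting_power."""
    total_stake = sum(m.stake for m in models)
    scored = []
    for m in models:
        if m.stake <= 0:
            continue  # assumption: unstaked models have no voting power and are never assigned
        voting_power = m.stake / total_stake  # model stake relative to total stake
        score = squared_distance(m.embedding, target) / voting_power
        scored.append((score, m.model_id))
    scored.sort()  # lowest score first; ties broken by model_id
    return [model_id for _, model_id in scored[:k]]


models = [
    Model("near_low_stake", [0.1, 0.0], stake=10),
    Model("far_high_stake", [1.0, 0.0], stake=80),
    Model("mid_low_stake", [0.5, 0.0], stake=10),
]
print(assign_models([0.0, 0.0], models, k=2))
```

Here the scores are 0.1, 1.25, and 2.5 respectively, so with k = 2 the heavily staked but distant model is assigned ahead of the closer, lightly staked one.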

To submit to a target, a submitter must run their data through each of the target's assigned models to determine which one achieves the lowest loss.
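On the submitter's side, this step amounts to an argmin over the assigned models. In the sketch below, `loss_fn` is a hypothetical stand-in for actually running a model on the data; the loss values are invented for illustration.

```python
from typing import Callable


def best_model(data: bytes, model_ids: list[str],
               loss_fn: Callable[[str, bytes], float]) -> tuple[str, float]:
    """Run the data through every assigned model and return (model_id, loss) for the lowest loss."""
    losses = {model_id: loss_fn(model_id, data) for model_id in model_ids}
    winner = min(losses, key=losses.get)
    return winner, losses[winner]


# Hypothetical stand-in for inference; real losses come from running each assigned model.
fake_losses = {"model_a": 2.31, "model_b": 1.87, "model_c": 2.05}
print(best_model(b"example data", list(fake_losses), lambda model_id, data: fake_losses[model_id]))
# prints ('model_b', 1.87)
```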

Each target has a radius: the maximum distance in embedding space at which a submission still counts. A submission is valid only if its data embedding lands within this radius of the target, and the first valid submission wins.

The radius adjusts automatically from epoch to epoch, based on a rolling average of the number of targets hit per epoch. This is analogous to difficulty adjustment in proof-of-work networks.
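The exact adjustment rule is not given here, so the following is only an illustration of the idea: a hypothetical controller that keeps a rolling window of hits per epoch and tightens the radius by a fixed fraction when the average exceeds a desired hit rate, or widens it when the average falls short. `RadiusController`, `desired_hits`, `window`, and `step` are all invented for this sketch.

```python
from collections import deque


class RadiusController:
    """Hypothetical difficulty controller driven by a rolling average of targets hit per epoch."""

    def __init__(self, radius: float, desired_hits: float, window: int = 5, step: float = 0.1) -> None:
        self.radius = radius
        self.desired_hits = desired_hits
        self.history = deque(maxlen=window)  # only the last `window` epochs count
        self.step = step

    def end_epoch(self, hits: int) -> float:
        self.history.append(hits)
        avg = sum(self.history) / len(self.history)
        if avg > self.desired_hits:
            self.radius *= 1 - self.step  # targets hit too often: tighten the radius
        elif avg < self.desired_hits:
            self.radius *= 1 + self.step  # targets hit too rarely: widen the radius
        return self.radius


controller = RadiusController(radius=1.0, desired_hits=10, window=3)
for hits in [20, 20, 0, 0]:
    print(hits, round(controller.end_epoch(hits), 4))
```

Because the decision uses the rolling average rather than the latest epoch alone, the first quiet epoch (0 hits) still tightens the radius; only when the window's average drops below the desired rate does it widen again.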