Model Strategies

Your model competes on weights and embedding placement. This guide covers strategies for improving both: learning from competitors, distillation, and positioning your embedding.

Source: training.py

Every data submission is publicly accessible via its data URL. Every revealed model’s weights are downloadable. Use both to improve your model.

from soma_sdk import SomaClient
client = await SomaClient(chain="testnet")
model_bytes = await client.fetch_model(model_id="0xABC...")

Load into your framework:

from soma_models.v1.torch import Model
from soma_models.v1.configs import ModelConfig
competitor = Model.load_bytes(
    model_bytes,
    ModelConfig(dropout_rate=0.0),
)

Fetch data from recently filled targets. This is data that scored well against at least one model:

targets = await client.get_targets(status="filled", limit=50)

training_data = []
for target in targets:
    data = await client.fetch_submission_data(target.id)
    training_data.append(data)

Training on this data biases your model toward domains the network is actively exploring.

The most direct way to compete is to learn from what’s already working. There are several approaches, from simple to sophisticated.

Initialize your model from a strong competitor’s checkpoint instead of random weights, then continue training with your own data:

# Load competitor as starting point
competitor = Model.load_bytes(model_bytes, ModelConfig(dropout_rate=DROPOUT_RATE))
competitor = competitor.to(device)
competitor.train()

# Train as usual — the model starts from a better position
sig_reg = SIGReg(SIGRegConfig()).to(device)
optimizer = torch.optim.Adam(competitor.parameters(), lr=LEARNING_RATE)

for ids, tgts in make_batches(MICRO_BATCH_SIZE):
    token_ids = torch.tensor(ids, device=device)
    targets = torch.tensor(tgts, device=device)
    loss, embedding = compute_loss(competitor, sig_reg, token_ids, targets)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

This is the simplest approach — you skip the cold-start phase and begin training from a proven set of weights.

Run both your model and a competitor forward on the same batches. Combine the standard SIGReg loss with a distillation term that pulls your model’s representations toward the competitor’s:

teacher = Model.load_bytes(model_bytes, ModelConfig(dropout_rate=0.0)).to(device)
teacher.eval()

student = Model(ModelConfig(dropout_rate=DROPOUT_RATE)).to(device)
student.train()

sig_reg = SIGReg(SIGRegConfig()).to(device)
optimizer = torch.optim.Adam(student.parameters(), lr=LEARNING_RATE)
alpha = 0.5  # balance between task loss and distillation loss

for ids, tgts in make_batches(MICRO_BATCH_SIZE):
    token_ids = torch.tensor(ids, device=device)
    targets = torch.tensor(tgts, device=device)

    # Student's standard loss
    task_loss, student_embed = compute_loss(student, sig_reg, token_ids, targets)

    # Teacher's embedding (no grad)
    with torch.no_grad():
        _, teacher_embed = compute_loss(teacher, sig_reg, token_ids, targets)

    # Distillation: pull student embedding toward teacher
    distill_loss = torch.nn.functional.mse_loss(student_embed, teacher_embed)

    loss = alpha * task_loss + (1 - alpha) * distill_loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

Merge your weights with a competitor’s using linear interpolation:

beta = 0.3  # how much of the competitor to mix in

for p_mine, p_theirs in zip(my_model.parameters(), competitor.parameters()):
    p_mine.data.lerp_(p_theirs.data, beta)
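For reference, `lerp_(other, beta)` computes `(1 - beta) * mine + beta * theirs` in place. A toy check on stand-in tensors (not real model weights) makes the mixing ratio concrete:

```python
import torch

# Stand-in parameter tensors; in practice these are model weights
p_mine = torch.tensor([1.0, 2.0, 3.0])
p_theirs = torch.tensor([5.0, 6.0, 7.0])

beta = 0.3
merged = p_mine.lerp(p_theirs, beta)  # (1 - beta) * p_mine + beta * p_theirs
# merged == [2.2, 3.2, 4.2]: 70% your weights, 30% the competitor's
```

Merging works best between models with related training histories; weights from very different initializations may not interpolate meaningfully.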

Evaluate your model on filled targets’ data to find domains where you underperform, then focus training on those gaps:

targets = await client.get_targets(status="filled", limit=50)

for target in targets:
    data = await client.fetch_submission_data(target.id)
    # Score your model vs the winning model on this data
    # High loss = domain gap worth training on
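One way to turn those per-target scores into a training priority list (a sketch; the loss values here are hypothetical placeholders for whatever your evaluation loop produces):

```python
# Hypothetical per-target results: (target_id, my model's loss, winning model's loss)
results = [
    ("0xA1", 2.9, 2.1),
    ("0xB2", 1.4, 1.3),
    ("0xC3", 3.5, 1.8),
]

# Rank by how far behind the winner we are: the biggest gaps mark the
# domains most worth adding to the training mix.
gaps = sorted(results, key=lambda r: r[1] - r[2], reverse=True)
priority_order = [target_id for target_id, _, _ in gaps]
```

Here `"0xC3"` ranks first (a gap of 1.7 nats), so its domain is the strongest candidate for focused training.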

The embedding you register determines which targets your model competes for. The KNN router assigns each target to the nearest model embeddings, so your position in embedding space controls both your target volume and your competition level.

Models clustered together share targets and compete purely on weight quality. Models in sparse regions get more targets with less competition. But you can’t bluff your position — if your embedding is far from your actual strength, you’ll receive targets but lose them (high loss). The dominant strategy is to specialize honestly: find an underserved region, train on data from that domain, and register an embedding that reflects where your model actually performs well.

Compute your initial embedding from a representative sample of your training data:

  1. Sample 256 random sequences of full context length (1024 bytes) from your training corpus
  2. For each sequence, forward pass through your model and mean-pool the final layer output across byte positions → one 2048-dim vector per sequence
  3. Average all 256 vectors, then L2-normalize
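The steps above can be sketched as follows. This is a minimal sketch: the model forward pass is stood in for by any function returning per-position final-layer outputs of shape `(seq_len, 2048)`, and the toy usage shrinks the sample sizes (real use is 256 sequences of 1024 bytes):

```python
import torch

def compute_initial_embedding(model_forward, sequences):
    """Mean-pool each sequence's final-layer output, average, L2-normalize."""
    pooled = []
    for seq in sequences:                     # each seq: bytes from the training corpus
        hidden = model_forward(seq)           # (seq_len, 2048) final-layer output
        pooled.append(hidden.mean(dim=0))     # mean-pool across byte positions
    embedding = torch.stack(pooled).mean(dim=0)  # average the per-sequence vectors
    return embedding / embedding.norm()          # L2-normalize

# Toy usage with a stand-in forward pass and small sample sizes
fake_forward = lambda seq: torch.randn(len(seq), 2048)
sequences = [bytes(64) for _ in range(8)]  # real use: 256 sequences of 1024 bytes
embedding = compute_initial_embedding(fake_forward, sequences)
```

The result is a single unit-length 2048-dim vector summarizing where your training data sits in representation space.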

After your first 100 competitive wins (or 7 days since last update, whichever comes first), recompute:

  1. Forward pass on all won data from the period, mean-pool each sequence’s final-layer output across positions
  2. Average all resulting vectors, L2-normalize
  3. Re-register via commit_model

Repeat on the same cadence. Each update pulls your embedding toward where your model is actually winning — reinforcing your specialization.

Query the registry to find sparse regions of embedding space where few models compete:

state = await client.get_latest_system_state()

for model in state.model_registry:
    print(model.id, model.embedding)

Position your model in an underserved region, then focus training data on the corresponding domains. This reduces direct competition and increases your share of targets in that region.
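A sketch of one way to quantify sparsity from the registry dump above: compute each registered embedding's distance to its nearest neighbor; models whose nearest neighbor is far away sit in underserved territory. (The toy 2-dim embeddings are illustrative; real embeddings are 2048-dim.)

```python
import numpy as np

def nearest_neighbor_distances(embeddings):
    """For each embedding, the distance to its closest other embedding."""
    X = np.asarray(embeddings)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    np.fill_diagonal(d, np.inf)  # ignore self-distance
    return d.min(axis=1)

# Toy registry: two clustered models and one isolated one
embeddings = [
    [1.0, 0.0],
    [0.99, 0.01],
    [0.0, 1.0],  # isolated: a sparse region
]
nn_dist = nearest_neighbor_distances(embeddings)
sparsest = int(nn_dist.argmax())  # index of the most isolated model
```

Regions near high-`nn_dist` models (or far from every registered embedding) are candidates for honest specialization.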

The current V1 architecture has a context length of 1024 bytes. Documents are chunked into independent sequences — the model doesn’t see cross-sequence context. This means shorter, self-contained passages (functions, docstrings, paragraphs) are more effective than long documents.
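Given that the model never sees cross-sequence context, a simple pre-chunking pass helps keep each training example self-contained. A sketch, assuming paragraph breaks (`\n\n`) are reasonable passage boundaries for your corpus (the truncation is byte-level and may clip a multi-byte UTF-8 character at the boundary):

```python
CONTEXT_LEN = 1024  # V1 context length in bytes

def to_sequences(document: str):
    """Split a document on paragraph boundaries, keeping each passage
    within a single context window."""
    passages = [p.strip() for p in document.split("\n\n") if p.strip()]
    return [p.encode("utf-8")[:CONTEXT_LEN] for p in passages]

doc = "def add(a, b):\n    return a + b\n\nAdds two numbers."
seqs = to_sequences(doc)  # two self-contained passages
```

Splitting on natural boundaries (functions, docstrings, paragraphs) beats slicing long documents at arbitrary 1024-byte offsets, since an arbitrary slice can strand a passage's context outside the window.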

Use the same datasets for training as you would for data submission: