Core Design

Great care went into making SOMA truly open.

  • Anyone can contribute intelligence
  • Anyone can check safety
  • Anyone can download the weights

"We believe AI should be an extension of individual human wills and, in the spirit of liberty, as broadly and evenly distributed as possible." – OpenAI, 2015

Training AI requires extreme bandwidth. Moving data is one of the primary bottlenecks for training. This is why so much investment has gone into building large, tightly interconnected GPU clusters: to minimize the distance data needs to travel.

The internet is many orders of magnitude slower. This makes traditional training over the internet impractical. Various techniques try to reduce the bandwidth required for training, each with its own caveats.

Ultimately training over the internet requires a fundamentally different approach.

AI models are trained using loss scores. A loss score defines how well an AI model is doing on some task. During training, the model improves its score. This is similar in concept to a competition.
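As an illustration of loss scores (a toy linear model, not SOMA's actual training setup), the sketch below defines a mean-squared-error loss and a training loop that drives it down:

```python
# Minimal sketch: a loss score measures how well a model does on a task.
# Hypothetical toy model y = w * x, trained by gradient descent on MSE.

def mse_loss(w, data):
    """Mean squared error of the model y = w * x over (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # true relation: y = 2x

w = 0.0
for _ in range(100):
    # Gradient of the MSE with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= 0.05 * grad  # gradient descent step

# Training drives the loss score down: a lower loss means a better model.
assert mse_loss(w, data) < mse_loss(0.0, data)
```

Comparing two models on the same loss and data is what turns training into something like a competition: whichever model scores lower is doing better on the task.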

Just as AI labs use benchmarks to compare the performance of their models relative to other labs, SOMA uses a carefully designed competition to check model performance.

Evaluating models on benchmarks consumes only a small fraction of the data used during training.

The competition is sparse and randomly distributed: rather than evaluating every model on all data, it samples model performance on small, randomly chosen subsets.

Sparse competition can be highly informative while requiring very little bandwidth.

Because performance can be checked on a tiny random sample, verification is cheap even though training is expensive: the same asymmetry behind proof of work.
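The cheapness of sparse verification can be sketched numerically. In this illustrative setup (dataset size, accuracy, and sample size are assumptions, not SOMA parameters), a small random sample estimates a model's score while moving only a small fraction of the data:

```python
import random

random.seed(0)

# Hypothetical setup: a model's true quality is its accuracy over a large
# dataset. Shipping the whole dataset to verify it would be expensive;
# a small random sample gives a good estimate at a fraction of the bandwidth.

dataset = [1] * 9000 + [0] * 1000  # 1 = model answers correctly (90% accurate)
random.shuffle(dataset)

full_score = sum(dataset) / len(dataset)   # exact, needs all 10,000 items
sample = random.sample(dataset, 200)       # sparse check: 2% of the data
sampled_score = sum(sample) / len(sample)  # cheap estimate

# The sampled estimate lands close to the full evaluation.
assert abs(sampled_score - full_score) < 0.1
```

Training, by contrast, has to touch every item many times over, which is exactly the train/verify asymmetry the text draws from proof of work.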

Competition between many small models does not guarantee a strong frontier model. The competition must be carefully designed to encourage both model and data parallelism.

Data parallelism: Instead of one model training on all the data, you make several identical copies of the model and give each copy a different piece of the data to train on at the same time.

Model parallelism: The model is so huge it won't fit on one GPU, so you cut the model into pieces and put each piece on a different GPU; they all work together to train one giant model.
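The data-parallel case can be illustrated with a toy sketch (model, shard sizes, and values are assumptions): each replica computes a gradient on its own shard, and with equal-sized shards, averaging the per-replica gradients recovers the gradient over the full dataset:

```python
# Hypothetical sketch of data parallelism with a toy model y = w * x:
# replicas train on disjoint shards and combine gradients by averaging.

def grad(w, shard):
    """Gradient of MSE for y = w * x over one shard of (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [data[:2], data[2:]]  # one shard per model replica

w = 0.5
full_grad = grad(w, data)                       # one model, all the data
replica_grads = [grad(w, s) for s in shards]    # each replica, its shard
avg_grad = sum(replica_grads) / len(replica_grads)

# Equal-sized shards: the averaged gradient matches the full-data gradient.
assert abs(avg_grad - full_grad) < 1e-9
```

Real systems exchange these gradients over fast interconnects every step, which is precisely the bandwidth cost that makes naive data parallelism over the internet impractical.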

There must be a competitive advantage to both specializing and unifying with the rest of the network.

SOMA solves this coordination problem with intelligent routing.

Instead of forcing every participant to compete in the same domain or broadcast updates globally, SOMA treats the global model as a sparse, dynamically routed mixture-of-models.

Participants self-report an embedding — a compact vector representation that captures their specialization.

When new data is submitted, a lightweight router selects which models should compete based on the data.
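A minimal sketch of such a router, assuming cosine similarity over self-reported embeddings and top-k selection (the names, vectors, and k below are illustrative, not SOMA's actual mechanism):

```python
import math

# Hypothetical sketch of intelligent routing: each participant self-reports
# an embedding; a lightweight router scores incoming data against those
# embeddings and activates only the top-k most relevant models.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Self-reported specializations (names and vectors are made up).
participants = {
    "code-specialist": [0.9, 0.1, 0.0],
    "math-specialist": [0.1, 0.9, 0.1],
    "prose-specialist": [0.0, 0.1, 0.9],
}

def route(data_embedding, k=2):
    """Select the k participants whose embeddings best match the data."""
    scored = sorted(participants.items(),
                    key=lambda kv: cosine(data_embedding, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:k]]

# A math-heavy data item activates only a sparse subset of models.
print(route([0.2, 0.9, 0.1], k=2))  # prints ['math-specialist', 'code-specialist']
```

Because only the selected models compete on each data item, the rest of the network stays idle for that item and no global broadcast is needed.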

This creates strong incentives:

  • Models specialize deeply in niches to win more routing probability and rewards.
  • Yet they still unify into the global frontier model — strong specialists get pulled in whenever relevant data appears, composing a far more capable whole.

The routing itself evolves as part of the broader competition: better-performing models earn more stake over time, naturally shifting influence toward effective specialists without any central coordinator.
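A stake-shift dynamic like the one described might be sketched as follows (the stake amounts and reward value are purely illustrative; SOMA's actual reward rule is not specified here):

```python
# Hypothetical sketch: competition winners earn stake, and routing
# probability follows stake, so influence drifts toward effective
# specialists with no central coordinator.

stakes = {"model-a": 1.0, "model-b": 1.0}

def routing_probabilities(stakes):
    """Normalize stakes into a routing distribution."""
    total = sum(stakes.values())
    return {name: s / total for name, s in stakes.items()}

# Suppose model-a wins several competition rounds (reward is made up).
for _ in range(5):
    stakes["model-a"] += 0.5

probs = routing_probabilities(stakes)
assert probs["model-a"] > probs["model-b"]
```

The point of the sketch is the feedback loop: winning shifts stake, stake shifts routing, and routing decides who gets to compete next.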

This design delivers several critical properties:

  • Bandwidth efficiency — Only a sparse subset of models is activated per data item, minimizing communication during competition.
  • Specialization + unification — Nodes gain a competitive edge by becoming the best-in-class expert in a niche (data parallelism via specialization), but they also benefit enormously from being routed into the global mixture (model parallelism via composition).
  • Progressive scaling — The network starts with small models and coarse routing, then gradually composes larger, more capable mixtures as participation and data grow.

In many ways, this is akin to our own biological brains, which have dedicated areas for routing signals and areas that are highly specialized.