# Continuous Training

Every model on SOMA shares the same [architecture](https://docs.soma.org/concepts/models/#architecture). What differs is the weights. Train good weights, publish them via [commit-reveal](https://docs.soma.org/concepts/models/#publishing-weights), and earn commission every time your model has the lowest loss for a data submission.

The lifecycle is a loop: **train** → **commit** → epoch advances → **reveal** → **train more** → repeat. This guide covers deploying that loop on testnet so it runs continuously. If you haven't run through the localnet cycle yet, start with the [Quickstart](https://docs.soma.org/getting-started/quickstart/#step-2-train-on-localnet).

> Source: [`training.py`](https://github.com/soma-org/quickstart/blob/main/src/quickstart/training.py)

**Prerequisites:**

- Quickstart repo cloned, dependencies installed, secrets configured ([Quickstart: Clone and configure](https://docs.soma.org/getting-started/quickstart/#clone-and-configure))
- Funded testnet account ([Installation & Setup](https://docs.soma.org/getting-started/install/))

## Kick off the first round

For PyTorch (the default framework):

```
uv run modal run src/quickstart/training.py --steps-per-round 500
```

For Flax:

```
uv run modal run src/quickstart/training.py --steps-per-round 500 --framework flax
```

Either command trains for 500 steps on an H100, then commits to the network. The `train_and_commit` function runs training followed by commit in a single GPU container:

```python
@app.function(
    image=gpu_image,
    gpu="H100",
    timeout=86400,
    volumes={MODEL_DIR: volume},
    secrets=[modal.Secret.from_name("soma-secrets")],
)
async def train_and_commit(
    steps: int = 500,
    localnet: bool = False,
    framework: str = "torch",
):
    """Train for N steps then commit. GPU is released after commit."""
    await do_training(steps, framework=framework)

    state = await do_commit(localnet=localnet)
    state["framework"] = framework
    save_training_state(state, MODEL_DIR)
    await volume.commit.aio()
    return state
```

The `main` entrypoint calls this remotely:

```python
@app.local_entrypoint()
def main(steps_per_round: int = 500, framework: str = "torch"):
    train_and_commit.remote(
        steps=steps_per_round, localnet=False, framework=framework,
    )
```

After this completes, your model is committed to the network and waiting for the next epoch to reveal.
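Conceptually, commit-reveal behaves like a hash commitment: at commit time the network stores a binding digest, and at reveal time anyone can verify that what was revealed matches what was committed. Here is a minimal sketch assuming a SHA-256 hash over the encrypted weights plus a nonce; the actual fields SOMA binds into the commitment may differ:

```python
import hashlib
import secrets

def make_commitment(encrypted_weights: bytes, nonce: bytes) -> str:
    """Hypothetical commitment: SHA-256 over ciphertext plus a random nonce.
    The real scheme is defined by the SOMA protocol, not this sketch."""
    return hashlib.sha256(encrypted_weights + nonce).hexdigest()

def verify_reveal(commitment: str, encrypted_weights: bytes, nonce: bytes) -> bool:
    """At reveal time, recompute the digest and check it matches the commit."""
    return make_commitment(encrypted_weights, nonce) == commitment

blob = b"serialized, encrypted model weights"
nonce = secrets.token_bytes(32)
commitment = make_commitment(blob, nonce)
assert verify_reveal(commitment, blob, nonce)          # honest reveal passes
assert not verify_reveal(commitment, b"tampered", nonce)  # tampering fails
```

The binding property is what lets you train and commit before the epoch closes without disclosing your weights to competitors.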

## Deploy for continuous training

```
uv run modal deploy src/quickstart/training.py
```

Deploying activates the `reveal` cron job, which runs every 6 hours:

```python
@app.function(
    image=cpu_image,
    schedule=modal.Cron("0 */6 * * *"),  # every 6h — epochs are 24h
    timeout=600,
    volumes={MODEL_DIR: volume},
    secrets=[modal.Secret.from_name("soma-secrets")],
)
async def reveal(
    localnet: bool = False,
    auto_continue: bool = True,
    steps_per_round: int = 500,
):
    """Reveal the model if the epoch has advanced. Optionally spawn next round."""
    result = await do_reveal(localnet=localnet)

    if result is not None and auto_continue:
        fw = result.get("framework", "torch")
        train_and_commit.spawn(
            steps=steps_per_round,
            localnet=localnet,
            framework=fw,
        )
```

Each invocation checks whether the epoch has advanced past the commit, reveals if it has, and then spawns the next training round, so the cycle repeats automatically.

**Caution:** The reveal must happen in the epoch immediately after the commit. If you miss the window, the commit expires and you must start over. The 6-hour cron gives you multiple chances per 24-hour epoch.
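The windowing rule can be sketched as a small decision function. `reveal_action` is a hypothetical helper for illustration, not part of `training.py`:

```python
def reveal_action(committed_epoch: int, current_epoch: int) -> str:
    """Decide what one cron invocation should do, given the epoch the
    commit landed in and the current epoch. Sketch only -- the real
    logic lives inside do_reveal()."""
    if current_epoch <= committed_epoch:
        return "wait"      # epoch has not advanced yet; try again next run
    if current_epoch == committed_epoch + 1:
        return "reveal"    # inside the one-epoch reveal window
    return "expired"       # missed the window; the commit is lost

assert reveal_action(10, 10) == "wait"
assert reveal_action(10, 11) == "reveal"
assert reveal_action(10, 12) == "expired"
```

With 24-hour epochs and a 6-hour cron, roughly four invocations land inside each reveal window, so a single failed run does not cost you the commit.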

## Training details

### Data pipeline

`make_batches()` streams [The Stack v2](https://huggingface.co/datasets/bigcode/the-stack-v2-dedup) and tokenizes each source file into fixed-length byte sequences:

```python
def make_batches(batch_size: int):
    from soma_models.v1.configs import V1_MAX_SEQ_LEN
    from soma_models.v1.tokenizer import tokenize

    ds = load_dataset(
        "bigcode/the-stack-v2-dedup",
        split="train",
        streaming=True,
        token=os.environ.get("HF_TOKEN"),
    )
    ds = ds.shuffle(buffer_size=SHUFFLE_BUFFER)

    buffer_ids, buffer_targets = [], []
    for row in ds:
        sequences = tokenize(
            data=row["content"].encode("utf-8"), max_seq_len=V1_MAX_SEQ_LEN
        )
        for seq in sequences:
            buffer_ids.append(seq.token_ids)
            buffer_targets.append(seq.targets)
            if len(buffer_ids) == batch_size:
                yield buffer_ids, buffer_targets
                buffer_ids, buffer_targets = [], []
```

The vocabulary is byte-level: the 256 raw byte values plus PAD (256) and EOS (257). Each sequence is `V1_MAX_SEQ_LEN = 1024` tokens. See [Models: Architecture](https://docs.soma.org/concepts/models/#architecture) for the full spec.
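A minimal sketch of what a byte-level `tokenize()` could look like. The EOS placement, right-padding, and chunking below are assumptions based on the description above, not the actual `soma_models.v1.tokenizer` implementation:

```python
# Assumed special-token ids and sequence length from the description above.
PAD_ID, EOS_ID, MAX_SEQ_LEN = 256, 257, 1024

def tokenize_bytes(data: bytes, max_seq_len: int = MAX_SEQ_LEN) -> list[list[int]]:
    """Map each byte to its own id (0-255), append EOS, then split into
    fixed-length chunks, right-padding the last one with PAD."""
    ids = list(data) + [EOS_ID]
    chunks = []
    for i in range(0, len(ids), max_seq_len):
        chunk = ids[i : i + max_seq_len]
        chunk += [PAD_ID] * (max_seq_len - len(chunk))
        chunks.append(chunk)
    return chunks

seqs = tokenize_bytes(b"print('hi')", max_seq_len=16)
assert len(seqs) == 1 and len(seqs[0]) == 16
assert seqs[0][:4] == [112, 114, 105, 110]  # bytes of "prin"
```

Byte-level tokenization keeps the pipeline language-agnostic: any source file in The Stack v2 maps to the same fixed vocabulary with no trained tokenizer.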

### Training loop

The PyTorch path resumes from the latest checkpoint, trains with gradient accumulation (64 micro-batches, effective batch size = 128), and saves artifacts for the CPU commit step:

```python
async def _do_training_torch(steps, model_dir, vol, grad_accum_steps, log_every):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    ckpt_path, ckpt_step = find_latest_checkpoint(model_dir, CHECKPOINT_PREFIX)
    if ckpt_path:
        model = Model.load(ckpt_path, ModelConfig(dropout_rate=DROPOUT_RATE))
        model = model.to(device)
        start_step = ckpt_step
    else:
        model = Model(ModelConfig(dropout_rate=DROPOUT_RATE)).to(device)
        start_step = 0

    model.train()
    sig_reg = SIGReg(SIGRegConfig()).to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

    batches = make_batches(MICRO_BATCH_SIZE)

    for i in range(steps):
        optimizer.zero_grad()
        accum_loss = 0.0

        for _micro in range(grad_accum_steps):
            ids, tgts = next(batches)
            token_ids = torch.tensor(ids, device=device)
            targets = torch.tensor(tgts, device=device)
            loss, embedding = compute_loss(model, sig_reg, token_ids, targets)
            (loss / grad_accum_steps).backward()
            accum_loss += loss.item()

        optimizer.step()

    final_step = start_step + steps

    # Save checkpoint + artifacts for CPU commit
    # (embedding_list bookkeeping elided; see training.py for the full loop)
    model.save(f"{model_dir}/{CHECKPOINT_PREFIX}-{final_step}.safetensors")
    weights_bytes = model.save_bytes()
    save_training_artifacts(model_dir, final_step, embedding_list, weights_bytes)
```

The Flax path uses a JIT-compiled micro-step with manual gradient accumulation:

```python
async def _do_training_flax(steps, model_dir, vol, grad_accum_steps, log_every):
    rngs = nnx.Rngs(0)
    model = Model(ModelConfig(dropout_rate=DROPOUT_RATE), rngs)
    model.train()
    sig_reg = SIGReg(SIGRegConfig(), rngs)
    optimizer = nnx.Optimizer(
        model, optax.adam(learning_rate=LEARNING_RATE), wrt=nnx.Param
    )

    @nnx.jit
    def micro_step(model, sig_reg, token_ids, targets):
        def loss_fn(model, sig_reg):
            return compute_loss(model, sig_reg, token_ids, targets)
        (loss, embedding), grads = nnx.value_and_grad(loss_fn, has_aux=True)(
            model, sig_reg
        )
        return loss, embedding, grads

    batches = make_batches(MICRO_BATCH_SIZE)

    for i in range(steps):
        accum_loss = jnp.zeros(())
        accum_grads = None

        for micro in range(grad_accum_steps):
            ids, tgts = next(batches)
            loss, embedding, grads = micro_step(
                model, sig_reg, jnp.array(ids), jnp.array(tgts)
            )
            accum_loss = accum_loss + loss
            accum_grads = grads if accum_grads is None else jax.tree.map(
                jnp.add, accum_grads, grads
            )

        accum_grads = jax.tree.map(lambda g: g / grad_accum_steps, accum_grads)
        optimizer.update(model, accum_grads)

    final_step = steps

    # Save checkpoint + artifacts for CPU commit
    # (embedding_list bookkeeping elided; see training.py for the full loop)
    model.save(f"{model_dir}/{CHECKPOINT_PREFIX}-{final_step}.safetensors")
    weights_bytes = model.save_bytes()
    save_training_artifacts(model_dir, final_step, embedding_list, weights_bytes)
```

`save_training_artifacts()` saves the embedding and serialized weights to disk so the commit step can run on CPU without loading the model framework.

### Commit and reveal

After training, `do_commit()` encrypts the weights with AES-256-CTR, uploads to your bucket, and commits to the network. On the first commit, it calls `create_model()` with a 10% commission rate (1000 basis points). `do_reveal()` waits for the epoch to advance, then reveals the decryption key and embedding. See the [Quickstart walkthrough](https://docs.soma.org/getting-started/quickstart/#how-it-works-1) for the full code.
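The encryption step can be sketched with the `cryptography` package. `encrypt_weights` and `decrypt_weights` are hypothetical helpers, and the real `do_commit()` may handle key generation and nonce layout differently:

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_weights(weights: bytes) -> tuple[bytes, bytes, bytes]:
    """AES-256-CTR sketch: a fresh 256-bit key encrypts the weights;
    the key is withheld until reveal time."""
    key = os.urandom(32)    # revealed only after the epoch advances
    nonce = os.urandom(16)  # CTR initial counter block
    enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    return enc.update(weights) + enc.finalize(), key, nonce

def decrypt_weights(ciphertext: bytes, key: bytes, nonce: bytes) -> bytes:
    """CTR mode is symmetric: the same keystream decrypts."""
    dec = Cipher(algorithms.AES(key), modes.CTR(nonce)).decryptor()
    return dec.update(ciphertext) + dec.finalize()

blob = b"serialized model weights"
ct, key, nonce = encrypt_weights(blob)
assert ct != blob
assert decrypt_weights(ct, key, nonce) == blob
```

Because the ciphertext is public in your bucket from commit time, anyone can fetch it early, but nobody can read the weights until the key is revealed on-chain.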

## Your model's embedding

The embedding you register determines which [targets](https://docs.soma.org/concepts/targets/#model-assignment) your model competes for. It's how the KNN router finds you. See [Model Strategies: Your model's embedding](https://docs.soma.org/guides/model-strategies/#your-models-embedding) for how to compute, update, and strategically position your embedding.
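To build intuition for the routing, here is a toy nearest-neighbor lookup over registered embeddings. The real router's distance metric and index structure are defined by the protocol, so treat this as illustrative only:

```python
import math

def nearest_model(target_emb: list[float], model_embs: dict[str, list[float]]) -> str:
    """Toy KNN routing sketch: pick the model whose registered embedding
    is closest (by cosine similarity) to the target embedding."""
    def cos(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    return max(model_embs, key=lambda name: cos(target_emb, model_embs[name]))

models = {"model-a": [1.0, 0.0], "model-b": [0.0, 1.0]}
assert nearest_model([0.9, 0.1], models) == "model-a"
```

The takeaway: where you place your embedding determines which targets' submissions you are even eligible to compete on, which is why positioning it is a strategy question, not just a training artifact.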

## Next steps

[Model Strategies](https://docs.soma.org/guides/model-strategies/)

[Data Strategies](https://docs.soma.org/guides/data-strategies/)

[Claim Rewards](https://docs.soma.org/guides/claim-rewards/)