Early access — all features included, free during beta

Stop guessing.
Pre-flight your training.

For PyTorch and HuggingFace teams spending $1k+/month on GPUs. Get a VRAM fit check, a parallelism strategy, and a cost estimate in one two-minute workflow.

No code changes · Works offline · Estimates, not promises · Air-gapped compatible

HOW IT WORKS

Three commands. Zero code changes.

1

Install

$ pip install alloc

One pip install. No Docker, no sidecar, no config files required.

2

Scan

$ alloc ghost train.py

Ghost Scan estimates VRAM, checks GPU feasibility, and surfaces optimization opportunities. No GPUs needed.

3

Run

$ alloc run python train.py

Wraps your training for a quick calibration. Captures real GPU metrics in ~60 seconds.

CI FOR TRAINING

Add a pre-flight check to every PR.

.github/workflows/alloc.yml
name: Training Pre-flight
on: [pull_request]

jobs:
  alloc-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install alloc
      - run: alloc ghost train.py --json
      - run: |
          alloc run python train.py \
            --upload

Catch OOMs before they burn GPU hours

Ghost Scan estimates VRAM from the model graph. No GPUs required.

Flag performance regressions across commits

Track throughput, step time, and GPU utilization over time.

Estimate cost impact before merging

Know what a config change costs before it hits production.

Block over-budget merges

Set budget caps per project. Alloc fails the check if the run exceeds it.

Air-gapped compatible

All profiling runs locally. Upload to the dashboard is optional.

WHAT ALLOC DOES

An intelligence layer for training infrastructure.

Pre-flight forecast

Estimate VRAM, runtime, and cost range before launch. Know what hardware fits before the job hits the queue.
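Alloc's internal model isn't public, but the kind of number a pre-flight forecast produces can be approximated with standard memory accounting for mixed-precision training with Adam. The byte counts and the 20% activation fudge factor below are assumptions for illustration, not Alloc's actual method:

```python
def estimate_train_vram_gb(n_params,
                           bytes_weights=2,      # fp16 weights
                           bytes_grads=2,        # fp16 gradients
                           bytes_optimizer=12,   # fp32 master weights + 2 Adam moments
                           activation_overhead=0.2):
    """Back-of-envelope peak VRAM for mixed-precision Adam training.

    2 + 2 + 12 = 16 bytes/param is the classic rule of thumb for model
    state. Activations vary wildly with batch size and checkpointing,
    so they are modeled here as a flat fudge factor (an assumption).
    """
    per_param = bytes_weights + bytes_grads + bytes_optimizer
    state_bytes = n_params * per_param
    total = state_bytes * (1 + activation_overhead)
    return total / 2**30  # GiB

# A 7B-parameter model carries ~104 GiB of model state before activations:
print(round(7e9 * 16 / 2**30))             # → 104
print(round(estimate_train_vram_gb(7e9)))  # → 125
```

Even this crude arithmetic shows why a 7B full fine-tune will not fit on a single 80 GB card without sharding or offload, which is exactly the class of OOM a scan-time check can catch.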

Right-sizing

Pick the smallest hardware that meets your SLA. Stop over-provisioning H100s when an A100 or L40S would do the job at half the cost.
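The right-sizing logic reduces to a small search: project runtime on each candidate GPU, drop anything that misses the deadline, and take the cheapest survivor. A minimal sketch; the hourly prices and relative throughputs below are made-up placeholders, not measurements or vendor quotes:

```python
# Illustrative placeholders only — not real prices or benchmarks.
GPUS = {
    # name:  ($/hr, relative throughput vs. the fastest card)
    "H100": (4.00, 1.00),
    "A100": (2.00, 0.55),
    "L40S": (1.00, 0.30),
}

def cheapest_within_sla(baseline_hours, max_hours):
    """Pick the lowest-cost GPU whose projected runtime meets the SLA.

    baseline_hours: estimated runtime on the fastest GPU (throughput 1.0).
    Returns (name, cost, hours) or None if nothing fits the deadline.
    """
    best = None
    for name, (price, speed) in GPUS.items():
        hours = baseline_hours / speed
        if hours <= max_hours:
            cost = hours * price
            if best is None or cost < best[1]:
                best = (name, cost, hours)
    return best

# A 10 h H100 job with a 24 h deadline: the A100 finishes in ~18 h
# for ~$36 vs. $40 on the H100, while the L40S misses the deadline.
print(cheapest_within_sla(10, 24))
```

Tighten the deadline and the search flips back to the H100; loosen it and the L40S becomes viable, which is why the SLA has to be an input rather than an afterthought.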

Straggler + bottleneck diagnosis

Find data pipeline stalls, communication overhead, and low GPU utilization. Get actionable suggestions, not just charts.

DIFFERENTIATION

Why not just use PyTorch Profiler?

Profilers (torch.profiler, Nsight, etc.)

  • Flame graphs and kernel traces
  • Requires code changes to instrument
  • Gives you data, not answers
  • You still have to interpret the output
  • No GPU selection or cost estimation

Alloc

  • Actionable suggestions: "use FSDP on 2xA100"
  • Zero code changes — wraps your existing command
  • Tells you what to change, not just what happened
  • Searches over GPU types, strategies, and configs for you
  • Cost + runtime estimates before you spend a dollar

Profilers are microscopes. Alloc is a flight plan.

PRODUCT TIERS

Pre-flight predictability, not guesswork.

Analyze your entire training pipeline — data loading, GPU utilization, communication overhead, and parallelism strategy — before the job hits the queue. Reduce expensive failures and "Pending forever" jobs.

Ghost Scan

Free

VRAM forecast, feasibility check, and minimum GPU recommendation. Runs locally in seconds. No GPUs burned.

  • Peak VRAM + activation estimate
  • DDP/FSDP strategy feasibility
  • Avoid OOMs before they happen
Confidence
~80%

Alloc Probe

Free

Run 10-50 steps on real hardware to measure actual utilization. Pair with code diagnosis for full coverage.

  • Real GPU utilization + peak VRAM
  • Step timing (p50/p90) + throughput
  • 29 code diagnosis rules (data loading, precision, distribution)
  • Actionable patches via alloc diagnose --diff
Confidence
~85%

Fiscal Guard

Enterprise

Org-level budget caps, team spending limits, and cost tracking across all training runs. Know what you're spending.

  • Org → team → user budget hierarchy
  • Budget enforcement on upload (warn or block)
  • Cost-per-run tracking and right-sizing proposals
Visibility
Full

THE DIFFERENCE

Before vs. after.

Before Alloc

  • "Let me just try 4xA100s and see what happens"
  • OOM at step 89,000 of 90,000. Entire run wasted.
  • 20 ablation runs to find the right config. 15 were DOA.
  • H100s at 18% utilization because the DataLoader is the bottleneck
  • Rogue jobs running for days. Nobody knows who launched them.

With Alloc

  • Estimate VRAM, cost, and runtime before you hit enter
  • OOM caught at scan time, not after hours of provisioning
  • Catch dead-on-arrival runs early. Only launch what fits.
  • Actionable bottleneck diagnosis: "set num_workers=8, not GPU-bound"
  • Budget guardrails catch runaway jobs before the invoice does
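The `num_workers` suggestion above rests on a simple signal: if the GPU spends a large fraction of each step idle, waiting on the DataLoader, the job is input-bound, not GPU-bound. A hedged sketch of that heuristic (the 30% threshold and the function itself are illustrative assumptions, not Alloc's actual rule):

```python
def diagnose_step(data_wait_s, compute_s, threshold=0.3):
    """Classify a training step as input-bound or GPU-bound.

    If more than `threshold` of the step is spent waiting on data,
    the pipeline (not the GPU) is the bottleneck, and raising
    num_workers or adding prefetching usually helps.
    """
    step = data_wait_s + compute_s
    wait_frac = data_wait_s / step
    if wait_frac > threshold:
        return f"input-bound: {wait_frac:.0%} of step spent waiting on data"
    return f"GPU-bound: {wait_frac:.0%} data wait"

print(diagnose_step(data_wait_s=0.45, compute_s=0.15))
# → input-bound: 75% of step spent waiting on data
print(diagnose_step(data_wait_s=0.02, compute_s=0.40))
# → GPU-bound: 5% data wait
```

An H100 at 18% utilization is almost always the first case: the fix is in the input pipeline, not a bigger GPU.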

INTEGRATIONS

Fits your stack.

Alloc plugs into the tools you already use. No rip-and-replace.

  • PyTorch (Framework)
  • HuggingFace (Framework)
  • Lightning (Framework)
  • Ray (Soon)
  • Slurm (Soon)
  • K8s (Soon)
  • AWS (Soon)
  • GitHub (Soon)
  • W&B (Soon)
  • Slack (Soon)

GET STARTED

Your training pipeline has a missing stage.

Install in ten seconds. Add a pre-flight check to every run. Ghost Scan is free, runs locally, and catches failures before they cost you.