
The lineup

Three models. One principle: yours.

Our models are open-weight. You download the files. They run on your hardware. No API calls, no cloud dependency.

flash-1-mini

Bilingual legal citation, compliance, edge & personal use

Free — forever

Tier 1 (laptop)

Specs

Parameters: 4 billion
Context length: 8K tokens
Recommended quantization: Q4_K_M (~2.7 GB)
Minimum hardware: Any laptop with 4+ GB RAM
License: Apache 2.0 (open weights, commercial use OK)

Capabilities

  • Bilingual English / French
  • Citation-grounded responses
  • Multiple GGUF quantization levels
  • Tuned for Canadian legal citation accuracy
  • Vision-capable — reads images and text
  • Strong multi-part instruction-following
  • Runs on Pi-class to laptop hardware

Compatibility

  • Ollama
  • LM Studio
  • llama.cpp
  • Any GGUF-compliant runtime

flash-1

Business daily driver, RAG, function calling

Free — forever

Tier 1–2 (laptop to workstation)

Specs

Parameters: 9 billion
Context length: 16K tokens
Recommended quantization: Q4_K_M (~5.5 GB)
Minimum hardware: 8 GB RAM or entry GPU
License: Free (open weights, commercial use OK)

Capabilities

  • Bilingual English / French
  • Citation-grounded responses
  • Multiple GGUF quantization levels
  • Balanced reasoning and speed
  • Function calling and document Q&A (RAG)
  • Quantizations from Q3_K_M to fp16

Compatibility

  • Ollama
  • LM Studio
  • llama.cpp
  • Any GGUF-compliant runtime
Download links open with launch

flash-1-pro

Enterprise, defense, complex reasoning, multi-user

Free — forever

Tier 2–4 (workstation to colo)

Specs

Parameters: 27 billion
Context length: 32K tokens
Recommended quantization: Q4_K_M (~16 GB)
Minimum hardware: 24+ GB VRAM or 32+ GB RAM
License: Free (open weights, commercial use OK)

Capabilities

  • Bilingual English / French
  • Citation-grounded responses
  • Multiple GGUF quantization levels
  • Strongest reasoning and instruction-following
  • Function calling and RAG for agentic workloads
  • Optimized for vLLM multi-user deployments

Compatibility

  • Ollama
  • LM Studio
  • llama.cpp
  • Any GGUF-compliant runtime
  • vLLM for multi-user inference
Download links open with launch
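
To see why Q4_K_M is the recommended quantization, it can help to run the arithmetic on the Specs listed above. The sketch below is a rough illustration only: it assumes 1 GB means 10^9 bytes, treats the listed file sizes as exact even though they are marked approximate, and counts weights only for the fp16 comparison. The function names are ours, not part of any SimpleDirect tooling.

```python
# Figures copied from the Specs above: (parameters in billions, Q4_K_M size in GB).
# Assumption: 1 GB = 10**9 bytes and "billion" = 10**9 parameters.
MODELS = {
    "flash-1-mini": (4, 2.7),
    "flash-1": (9, 5.5),
    "flash-1-pro": (27, 16.0),
}


def implied_bits_per_weight(params_billions: float, size_gb: float) -> float:
    """Average bits stored per parameter for a file of the given size."""
    return size_gb * 8 / params_billions


def fp16_size_gb(params_billions: float) -> float:
    """Unquantized fp16 footprint: 2 bytes per parameter, weights only."""
    return params_billions * 2


for name, (params, q4_size) in MODELS.items():
    bits = implied_bits_per_weight(params, q4_size)
    print(
        f"{name}: ~{bits:.1f} bits/weight at Q4_K_M, "
        f"~{fp16_size_gb(params):.0f} GB at fp16 vs ~{q4_size} GB quantized"
    )
```

Under these assumptions, each quantized file is roughly a third of its fp16 size, which is what lets the listed minimum hardware hold the model.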

Where should I run this?

See the ownership spectrum

Coming next

flash-1 (9B), CBLRE v2, and a free showcase API arrive September 30, 2026. flash-1-pro (27B) and the open Canadian legal corpus follow on March 31, 2027. Specialist variants are still exploratory; we will decide on them as opportunities arise.

Which model should you use?

Three questions. Two minutes. We don't need your email.

Solo user or a team?

Solo → flash-1-mini or flash-1
Team → flash-1 or flash-1-pro

Do you handle regulated data (legal, health, financial)?

Yes → flash-1 minimum, flash-1-pro recommended
No → flash-1-mini works for most cases

Air-gapped or multi-user deployment?

Yes → flash-1-pro
No → flash-1-mini or flash-1 is enough
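
The three questions above can be sketched as a small helper. The page does not say how to combine the answers, so this sketch makes one assumption: each answer sets a minimum model, and the largest of those minimums wins. The function and parameter names are hypothetical, chosen for illustration.

```python
# Models ordered from smallest to largest, as in the lineup.
LINEUP = ["flash-1-mini", "flash-1", "flash-1-pro"]


def recommend(team: bool, regulated_data: bool, air_gapped_or_multi_user: bool) -> str:
    """Return the smallest model that satisfies every answer.

    Assumption (not stated on the page): each answer sets a minimum model,
    and the largest of those minimums wins.
    """
    minimum = 0                      # solo, unregulated: flash-1-mini is enough
    if team:
        minimum = max(minimum, 1)    # team: flash-1 or flash-1-pro
    if regulated_data:
        minimum = max(minimum, 1)    # regulated: flash-1 minimum, flash-1-pro recommended
    if air_gapped_or_multi_user:
        minimum = max(minimum, 2)    # air-gapped or multi-user: flash-1-pro
    return LINEUP[minimum]


print(recommend(team=False, regulated_data=False, air_gapped_or_multi_user=False))
print(recommend(team=True, regulated_data=True, air_gapped_or_multi_user=False))
```

Because the helper returns a minimum, a team handling regulated data gets flash-1 here even though flash-1-pro is the recommended choice; treat the result as a floor, not a ceiling.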

Licensing

Free to download

Every model is free to download and run — personal, professional, or commercial. flash-1-mini ships under the Apache 2.0 license.

Yours to run, anywhere

No per-seat fees, no subscription, no usage metering. Once you've downloaded the weights, run them on as many machines as you like — laptop, servers, or your own cloud.

Enterprise & regulated industries

Need on-prem deployment, procurement paperwork, or sovereign hosting? We offer paid deployment support for regulated industries and defense — the models themselves stay free.

Questions you might be having

If you have one we missed, ask us directly.

What does "open-weight" actually mean?

You get the actual model files. You can inspect them, run them, and fine-tune them. You never have to go through a hosted version of our models. The weights live on your machine.

What is GGUF? Do I need anything special to run it?

GGUF is a file format for AI models. Ollama, LM Studio, llama.cpp, and any other compliant runtime can load it. If you have a modern Mac, Windows PC, or Linux machine with the recommended RAM, you already have what you need. Download the file, point your runtime at it, done.
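
As a rough illustration of "point your runtime at it", the sketch below builds a command line for llama.cpp's `llama-cli` binary and runs it if the binary and the file are present. The GGUF file name is hypothetical (use whatever your download is actually called), the 8192-token context is our reading of "8K tokens", and flag behavior can vary between llama.cpp versions.

```python
import os
import shutil
import subprocess


def build_llama_cli_command(model_path: str, prompt: str, context_tokens: int = 8192) -> list[str]:
    """Build an argument list for llama.cpp's `llama-cli` binary."""
    return [
        "llama-cli",
        "-m", model_path,            # path to the downloaded GGUF file
        "-c", str(context_tokens),   # context window; 8192 assumed for "8K tokens"
        "-p", prompt,                # prompt text
    ]


if __name__ == "__main__":
    # Hypothetical file name; substitute the name of the file you downloaded.
    model_file = "flash-1-mini-Q4_K_M.gguf"
    command = build_llama_cli_command(model_file, "Explain what GGUF is in one sentence.")
    if shutil.which(command[0]) is None:
        print("llama-cli not found on PATH; install llama.cpp first.")
    elif not os.path.isfile(model_file):
        print(f"Model file not found: {model_file}")
    else:
        subprocess.run(command, check=True)
```

Ollama and LM Studio wrap the same idea behind their own interfaces, so the equivalent step there is importing or selecting the downloaded file.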

Are these fine-tuned, and on what base?

flash-1-mini is fine-tuned from an open, Apache-2.0-licensed Qwen base model (not Llama), and we attribute that base as the license requires (see the NOTICE file). To be candid about what that means: the gains are real and measured against that exact base, and you still own what you download. Apache 2.0 is irrevocable: it lets you run, modify, and redistribute the model commercially, and nobody can pull the license out from under you. We fine-tune on Canadian infrastructure.

Can I commercialize the outputs?

Yes. Outputs are yours. Use them in work product, client deliverables, internal tools, products you sell. The model license covers running the model. It does not claim ownership of what you generate.

What about benchmarks?

Benchmarks publish with each model launch. flash-1-mini's are live now — see the model page for the full table, including the honest tradeoffs. We don't pre-announce numbers or cherry-pick; the regressions sit right next to the gains. Real numbers, on the actual model you can download.

How do model updates work?

You get a notification when a new version is available. You decide whether to download it. The version you already have continues to work forever — we cannot break or revoke a model you already downloaded.

What if SimpleDirect goes out of business?

Your models keep working. They are files on your hardware. We cannot revoke them, even by ceasing to exist. This is the entire point of open-weight ownership.

Does this work without internet?

Yes. Once downloaded, the model runs entirely on your hardware. Airplane mode is fine. So is an air-gapped network.

What operating systems are supported?

GGUF runtimes work on macOS, Windows, and Linux. If you want to try the models without a local install, a free browser-based showcase API arrives September 30, 2026.

Is my data ever sent to you?

When you run a SimpleDirect model on your own hardware, no: your prompts never reach us. The one optional service we run is the free showcase API; the transparency page at /data explains exactly what it stores and what it doesn't. Self-deployed, your data never leaves your network.

Get notified when models ship

flash-1-mini is available now. flash-1 arrives September 30, 2026. flash-1-pro follows March 31, 2027. One email per launch.

Join the waitlist