The lineup

Three models. One principle: yours.

Our models are open-weight. You download the files. They run on your hardware. No API calls, no cloud dependency.

flash-1-mini

Personal AI, mobile, edge, your first private AI

Free — forever

Tier 1 (laptop)

Specs

Parameters: 4 billion
Context length: 8K tokens
Recommended quantization: Q4_K_M (~2.7 GB)
Minimum hardware: Any laptop with 4+ GB RAM
License: Open weights, no employee-count limit

Capabilities

  • Bilingual English / French
  • Citation-grounded responses
  • Function calling
  • RAG-optimized
  • GGUF quantizations from Q2_K to fp16
  • Optimized for low-memory devices
  • Sub-second responses on Apple Silicon

Compatibility

  • Ollama
  • LM Studio
  • llama.cpp
  • Any GGUF-compliant runtime
Download links go live at launch; a usage sketch follows below.
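
To give a concrete picture of what "runs on your hardware" means, here is a minimal Python sketch that queries a model served by a local Ollama instance over its REST API. The model name flash-1-mini is an assumption here; use whatever name the downloaded model is registered under. The request never leaves localhost.

    import json
    import urllib.request

    # Ask a locally running Ollama instance for a completion.
    # "flash-1-mini" is an assumed model name -- substitute whatever
    # name you registered the downloaded model under.
    payload = json.dumps({
        "model": "flash-1-mini",
        "prompt": "In one sentence, what is an open-weight model?",
        "stream": False,
    }).encode("utf-8")

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",  # Ollama's default local port
        data=payload,
        headers={"Content-Type": "application/json"},
    )

    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])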

flash-1

Business daily driver, RAG, function calling

$99 one-time

Tier 1–2 (laptop to workstation)

Specs

Parameters: 9 billion
Context length: 16K tokens
Recommended quantization: Q4_K_M (~5.5 GB)
Minimum hardware: 8 GB RAM or entry GPU
License: Free for orgs under 50 employees

Capabilities

  • Bilingual English / French
  • Citation-grounded responses
  • Function calling
  • RAG-optimized
  • GGUF quantizations from Q3_K_M to fp16
  • Balanced reasoning and speed
  • Tuned for document Q&A and tool use

Compatibility

  • Ollama
  • LM Studio
  • llama.cpp
  • Any GGUF-compliant runtime
Download links go live at launch.

flash-1-pro

Enterprise, defense, complex reasoning, multi-user

$499 one-time

Tier 2–4 (workstation to colo)

Specs

Parameters: 27 billion
Context length: 32K tokens
Recommended quantization: Q4_K_M (~16 GB)
Minimum hardware: 24+ GB VRAM or 32+ GB RAM
License: Commercial license required for orgs over 50

Capabilities

  • Bilingual English / French
  • Citation-grounded responses
  • Function calling
  • RAG-optimized
  • GGUF quantizations from Q3_K_M to fp16
  • Strongest reasoning and instruction-following
  • Optimized for vLLM multi-user deployments

Compatibility

  • Ollama
  • LM Studio
  • llama.cpp
  • Any GGUF-compliant runtime
  • vLLM for multi-user inference
Download links go live at launch; a multi-user serving sketch follows below.
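
For the multi-user case, vLLM exposes an OpenAI-compatible HTTP API, so any standard OpenAI client can talk to a self-hosted server. A minimal sketch, assuming a vLLM server on its default port and a hypothetical model name flash-1-pro:

    from openai import OpenAI  # pip install openai

    # vLLM serves an OpenAI-compatible API, so the standard client works
    # against a self-hosted server. The base URL and model name here are
    # assumptions for illustration; no real key is needed unless the
    # server configures one.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

    resp = client.chat.completions.create(
        model="flash-1-pro",
        messages=[
            {"role": "user",
             "content": "Summarize the main obligations in a standard NDA."},
        ],
    )
    print(resp.choices[0].message.content)

Because the API surface is the standard OpenAI one, existing tooling can point at the local server with a one-line base-URL change.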

Which model should you use?

Three questions. Two minutes. We don't need your email.

Solo user or a team?

Solo → flash-1-mini or flash-1
Team → flash-1 or flash-1-pro

Do you handle regulated data (legal, health, financial)?

Yes → flash-1 minimum, flash-1-pro recommended
No → flash-1-mini works for most cases

Air-gapped or multi-user deployment?

Yes → flash-1-pro
No → flash-1-mini or flash-1 is enough

Licensing

Free tier

flash-1-mini is free forever — personal, professional, commercial, doesn't matter. Download it and use it.

Small organizations (under 50 employees)

flash-1 and flash-1-pro are one-time purchases. No per-seat fees. No subscription.

Enterprise (50+ employees, regulated industries, defense)

Commercial licensing available with deployment support. See the Enterprise page or contact us.

Questions you might have

If you have one we missed, ask us directly.

What does "open-weight" actually mean?

You get the actual model files. You can inspect them, run them, and fine-tune them. There is no hosted version you have to go through us to use; the weights live on your machine.

What is GGUF? Do I need anything special to run it?

GGUF is a file format for packaging model weights for local inference. It runs on Ollama, LM Studio, llama.cpp, and any compliant runtime. If you have a modern Mac, Windows, or Linux machine with the recommended RAM, you already have what you need. Download the file, point your runtime at it, and you're done.
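
As an illustration, a GGUF file can also be loaded directly from Python with the llama-cpp-python bindings, with no server in between. A minimal sketch, assuming a hypothetical file name for the quantization you downloaded:

    from llama_cpp import Llama  # pip install llama-cpp-python

    # Load a downloaded GGUF file directly; no server required.
    # The file name is hypothetical -- point at whichever quantization
    # you actually downloaded. n_ctx matches flash-1-mini's 8K window.
    llm = Llama(model_path="./flash-1-mini-Q4_K_M.gguf", n_ctx=8192)

    out = llm(
        "Q: What does the Q4_K_M suffix in a GGUF file name indicate? A:",
        max_tokens=128,
        stop=["Q:"],
    )
    print(out["choices"][0]["text"])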

Are these fine-tuned Llama models?

No. We trained them from scratch on Canadian compute. We did this so we own the entire stack and so you do too — no upstream license to worry about, no upstream owner who can change the rules.

Can I commercialize the outputs?

Yes. Outputs are yours. Use them in work product, client deliverables, internal tools, products you sell. The model license covers running the model. It does not claim ownership of what you generate.

What about benchmarks?

Benchmarks are published with each model launch. flash-1-mini's benchmarks land with the model at the end of May 2026. We will not pre-announce numbers; benchmark gaming is part of what is wrong with the industry. Real numbers, on the actual model you can download.

How do model updates work?

You get a notification when a new version is available. You decide whether to download it. The version you already have continues to work forever — we cannot break or revoke a model you already downloaded.

What if SimpleDirect goes out of business?

Your models keep working. They are files on your hardware. We cannot revoke them, even by ceasing to exist. This is the entire point of open-weight ownership.

Does this work without internet?

Yes. Once downloaded, the model runs entirely on your hardware. Airplane mode is fine. So is an air-gapped network.

What operating systems are supported?

GGUF runtimes work on macOS, Windows, and Linux. The September 2026 desktop app ships on macOS and Windows at launch, with Linux to follow.

Is my data ever sent to you?

No. We do not operate inference servers. We could not collect your prompts even if we wanted to. The desktop app, when it ships, sends no telemetry. See the privacy policy for full details.

Get notified when models ship

flash-1-mini lands end of May 2026. flash-1 in July. flash-1-pro in September. One email per launch.

Join the waitlist