Llama 3.1 in your pocket
Evrmind Labs compressed Llama 3.1 8B down to 4.2 GB — small enough to run entirely on your phone, with no internet connection, no cloud API, and no data leaving your device.
Open source. Runs offline. No cloud required.
Llama 3.1 8B · Compressed · On-device
The same model powering frontier AI apps — compressed to 4.2 GB and running entirely on your phone with no internet connection.
Powerful AI shouldn't need a data centre.
Running LLMs today means sending your conversations to the cloud, buying a GPU, or accepting a toy-sized model. We squeezed Llama 3.1 8B down to 4.2 GB so it fits on your phone — and stays there.
The problems we're solving
Cloud dependency
Every AI call leaks your data to a server and racks up API costs
Bloated model sizes
State-of-the-art LLMs require gigabytes of VRAM and expensive GPUs
No internet, no AI
Existing AI apps break the moment you lose your connection
Latency & privacy
Cloud round-trips mean slow responses and data you can't control
Llama 3.1 8B — compressed to fit a phone
We took Meta's Llama 3.1 8B and compressed it down to 4.2 GB using aggressive quantisation techniques. The result is a frontier-class model that runs entirely on consumer phone hardware — no internet, no GPU, no subscription.
- Llama 3.1 8B compressed from ~16 GB to 4.2 GB via aggressive quantisation
- Runs fully on-device — CPU inference on any modern Android phone or iPhone
- Open weights, open compression pipeline — fork and fine-tune freely
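The ~16 GB to 4.2 GB figure follows from simple bit-width arithmetic. A minimal sketch, assuming an fp16 baseline and an average of roughly 4.2 bits per parameter after quantisation (the exact mix of bit widths here is an illustrative assumption, not Evrmind's published recipe):

```python
# Back-of-envelope size arithmetic for the compression described above.
# Assumptions (not from Evrmind's published pipeline): an fp16 baseline
# and an average of ~4.2 bits per parameter after quantisation.

PARAMS = 8_030_000_000  # approximate parameter count of Llama 3.1 8B

def weight_gb(bits_per_param: float) -> float:
    """Weight footprint in GB for a given average bit width."""
    return PARAMS * bits_per_param / 8 / 1e9

print(f"fp16 baseline: {weight_gb(16):.1f} GB")   # ~16 GB, as stated
print(f"quantised:     {weight_gb(4.2):.1f} GB")  # ~4.2 GB target
```

The point of the arithmetic: at 16 bits per weight an 8B-parameter model cannot fit in phone RAM, while at ~4 bits it can, which is why quantisation rather than architecture changes is the lever here.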
Size vs. Capability
Phone · CPU inference · no network
Evrmind Device · Gen 1
Physical AI hardware — no internet required
Physical devices — built around the model
We're not just building software. We're designing purpose-built hardware that ships with Evrmind pre-loaded — optimised for local inference, without ever needing to connect to the internet.
- Designed for always-on AI with no cellular or Wi-Fi required
- Purpose-built hardware optimised for transformer inference at milliwatt power
- Open firmware, open hardware schematics — Apache 2.0
Integrate in minutes — on any platform
Evrmind ships with SDKs for Python, Android, and iOS. Drop Llama 3.1 8B into your app with three lines of code — it runs entirely on the user's device, offline, with no API keys or subscriptions.
- pip install evrmind — Llama 3.1 8B running locally in three lines of Python
- Android and iOS SDKs for mobile apps with fully on-device inference
- Rust and C bindings for embedded and bare-metal environments
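A hypothetical sketch of the three-line Python flow described above. The `evrmind` package name and its `load()`/`chat()` calls are illustrative assumptions, not the published SDK interface, so the sketch checks for the package before using it:

```python
# Sketch of the on-device flow described above. The `evrmind` package
# name and its load()/chat() API are assumptions for illustration;
# consult the real SDK documentation for the published interface.
import importlib.util

def run_local_chat(prompt: str) -> str:
    """Run a prompt fully on-device if the (hypothetical) SDK is installed."""
    if importlib.util.find_spec("evrmind") is None:
        return "evrmind SDK not installed: run `pip install evrmind` first"
    import evrmind                        # hypothetical package
    model = evrmind.load("llama-3.1-8b")  # loads the 4.2 GB weights locally
    return model.chat(prompt)             # no network call, no API key

print(run_local_chat("Summarise: on-device LLMs keep data private."))
```

Because inference is local, the same three calls work identically with airplane mode on, which is the property the surrounding copy is claiming.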
Built in public, with the community
Every model, every circuit schematic, every training script is public. We develop Evrmind in the open because AI at the edge should be owned by everyone, not a handful of corporations.
- Contribute to model architecture, training data, or hardware designs
- Star and fork on GitHub — issues and PRs welcome
- Join the research mailing list for early previews and RFCs
Llama 3.1 8B — on your phone, no internet needed.
Get early access to Evrmind — the compressed model that puts a frontier LLM on any phone, fully offline. Whether you're a developer, researcher, or just believe AI should be private by default, we'd love to hear from you.
Open source. No subscription. No cloud. No data leaves your device.