Llama 3.1 in your pocket
Evrmind Labs compressed Llama 3.1 8B down to 4.2 GB — small enough to run entirely on your phone, with no internet connection, no cloud API, and no data leaving your device.
Open source. Runs offline. No cloud required.
Llama 3.1 8B · Compressed · On-device
The same model powering frontier AI apps — compressed to 4.2 GB and running entirely on your phone with no internet connection.
Powerful AI shouldn't need a data centre.
Running LLMs today means sending your conversations to the cloud, buying a GPU, or accepting a toy-sized model. We squeezed Llama 3.1 8B down to 4.2 GB so it fits on your phone — and stays there.
The problems we're solving
Cloud dependency
Every AI call leaks your data to a server and racks up API costs
Bloated model sizes
State-of-the-art LLMs require gigabytes of VRAM and expensive GPUs
No internet, no AI
Existing AI apps break the moment you lose your connection
Latency & privacy
Cloud round-trips mean slow responses and data you can't control
Llama 3.1 8B — compressed to fit a phone
We took Meta's Llama 3.1 8B and compressed it down to 4.2 GB using aggressive quantisation techniques. The result is a frontier-class model that runs entirely on consumer phone hardware — no internet, no GPU, no subscription.
- Llama 3.1 8B compressed from ~16 GB to 4.2 GB via aggressive quantisation
- Runs fully on-device — CPU inference on any modern Android phone or iPhone
- Open weights, open compression pipeline — fork and fine-tune freely
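The ~16 GB to 4.2 GB figure follows from simple bit-width arithmetic. A minimal sketch, assuming an fp16 baseline and an average of roughly 4.2 bits per parameter after quantisation (the exact mix of bit widths here is an illustrative assumption, not Evrmind's published recipe):

```python
# Back-of-envelope size arithmetic for the compression described above.
# Assumptions (not from Evrmind's published pipeline): an fp16 baseline
# and an average of ~4.2 bits per parameter after quantisation.

PARAMS = 8_030_000_000  # approximate parameter count of Llama 3.1 8B

def weight_gb(bits_per_param: float) -> float:
    """Weight footprint in GB for a given average bit width."""
    return PARAMS * bits_per_param / 8 / 1e9

print(f"fp16 baseline: {weight_gb(16):.1f} GB")   # ~16 GB, as stated
print(f"quantised:     {weight_gb(4.2):.1f} GB")  # ~4.2 GB target
```

The point of the arithmetic: at 16 bits per weight an 8B-parameter model cannot fit in phone RAM, while at ~4 bits it can, which is why quantisation rather than architecture changes is the lever here.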
Size vs. Capability
Phone · CPU inference · no network
Evrmind Device · Gen 1
Physical AI hardware — no internet required
Physical devices — built around the model
We're not just building software. We're designing purpose-built hardware that ships with Evrmind pre-loaded — optimised for local inference, without ever needing to connect to the internet.
- Designed for always-on AI with no cellular or Wi-Fi required
- Purpose-built hardware optimised for transformer inference at milliwatt power
- Open firmware, open hardware schematics — Apache 2.0
Integrate in minutes — on any platform
Evrmind ships with SDKs for Python, Android, and iOS. Drop Llama 3.1 8B into your app with three lines of code — it runs entirely on the user's device, offline, with no API keys or subscriptions.
- pip install evrmind — Llama 3.1 8B running locally in three lines of Python
- Android and iOS SDKs for mobile apps with fully on-device inference
- Rust and C bindings for embedded and bare-metal environments
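A hypothetical sketch of the three-line Python flow described above. The `evrmind` package name and its `load()`/`chat()` calls are illustrative assumptions, not the published SDK interface, so the sketch checks for the package before using it:

```python
# Sketch of the on-device flow described above. The `evrmind` package
# name and its load()/chat() API are assumptions for illustration;
# consult the real SDK documentation for the published interface.
import importlib.util

def run_local_chat(prompt: str) -> str:
    """Run a prompt fully on-device if the (hypothetical) SDK is installed."""
    if importlib.util.find_spec("evrmind") is None:
        return "evrmind SDK not installed: run `pip install evrmind` first"
    import evrmind                        # hypothetical package
    model = evrmind.load("llama-3.1-8b")  # loads the 4.2 GB weights locally
    return model.chat(prompt)             # no network call, no API key

print(run_local_chat("Summarise: on-device LLMs keep data private."))
```

Because inference is local, the same three calls work identically with airplane mode on, which is the property the surrounding copy is claiming.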
Built in public, with the community
Every model, every circuit schematic, every training script is public. We develop Evrmind in the open because AI at the edge should be owned by everyone, not a handful of corporations.
- Contribute to model architecture, training data, or hardware designs
- Star and fork on GitHub — issues and PRs welcome
- Join the research mailing list for early previews and RFCs
Llama 3.1 8B — on your phone, no internet needed.
Get early access to Evrmind — the compressed model that puts a frontier LLM on any phone, fully offline. Whether you're a developer, researcher, or just believe AI should be private by default, we'd love to hear from you.
Open source. No subscription. No cloud. No data leaves your device.