neuralboot
Installable local LLM server

Own your model. Audit the code.
Nothing leaves your machine.

Install Trapetum as a background service on your own GPU machine. You get a private ChatGPT-style chat, an OpenAI-compatible API, admin controls and a usage dashboard, all on localhost. The model runs on your hardware, and the compression engine is source-available on GitHub, so your security team can read every line.

Why control is the security model

For sensitive or regulated data, a local, source-available LLM is the only setup where you can actually prove what happens to your data. Control and auditability are not features here; they are the point.

  • Your data never leaves. Prompts and outputs stay on your machine. No third-party API sees them, no cloud, no logging you do not control. The opposite of sending your data to a hosted model.
  • The compression engine is source available. Read every line on GitHub. Verify there is no telemetry, no exfiltration, no backdoor. A closed binary you cannot audit is a risk you cannot measure.
  • You own the whole chain. Build the binary from source, compress the models yourself, run fully air-gapped if you need to. Supply-chain trust by construction.
  • It passes a security review. For health, legal, finance, defense or GDPR-regulated data, "the model runs on our hardware and we audited the code" is the answer that clears the room.

What you get

One small service on port 8088. Everything below runs locally, served from your own machine.

Private chat

A ChatGPT-style interface to talk to your compressed models. Pick a model, add more, all on localhost.

OpenAI-compatible API

Serves /v1/chat/completions and /v1/models, so any OpenAI client can point at your own server. Swagger docs are at /docs.
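Here is a minimal sketch of listing installed models with the Python standard library only. It assumes /v1/models returns the usual OpenAI list shape, `{"object": "list", "data": [{"id": ...}]}`, and that the route accepts a Bearer token. `models_request` and `model_ids` are hypothetical helper names, not part of Trapetum.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8088"  # default port from this page


def models_request(token: str) -> urllib.request.Request:
    """Build (but do not send) a GET /v1/models request with a Bearer token."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/models",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def model_ids(body: bytes) -> list[str]:
    """Extract model IDs, assuming the OpenAI list format (hypothetical helper)."""
    payload = json.loads(body)
    return [entry["id"] for entry in payload.get("data", [])]


# Offline check against a sample body in the assumed format.
sample = b'{"object": "list", "data": [{"id": "qwen25-7b", "object": "model"}]}'
print(model_ids(sample))  # ['qwen25-7b']
```

Against a running server, `urllib.request.urlopen(models_request("trp_your_token")).read()` returns the body to pass to `model_ids`.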

API tokens

Generate and revoke Bearer tokens from the admin console. Lock the API to your apps only.
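One way to keep a token out of your source code is to read it from an environment variable and turn it into the Bearer header, sketched here in Python. The variable name `TRAPETUM_TOKEN` and the `auth_headers` helper are hypothetical; use whatever name your deployment defines.

```python
import os


def auth_headers(env_var: str = "TRAPETUM_TOKEN") -> dict[str, str]:
    """Return an Authorization header built from a token in the environment.

    TRAPETUM_TOKEN is a hypothetical variable name, not defined by Trapetum.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fail loudly instead of sending an empty Bearer token.
        raise RuntimeError(f"set {env_var} to a token created in the admin console")
    return {"Authorization": f"Bearer {token}"}
```

When you revoke a token in the admin console and issue a new one, only the environment variable has to change.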

Admin settings

Port, network binding, CORS, rate limits, default model and prompt logging (on or off), all admin-only behind a password.
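Network binding decides who can reach port 8088. This sanity check assumes the binding setting takes an IP address or `localhost`, and it reports whether a value keeps the API reachable from this machine only. `is_loopback_only` is a hypothetical helper, not a Trapetum setting.

```python
import ipaddress


def is_loopback_only(bind_host: str) -> bool:
    """True if binding to bind_host keeps the server reachable from this machine only."""
    host = bind_host.strip()
    if host.lower() == "localhost":
        return True  # assumes localhost resolves to a loopback address
    try:
        # Covers 127.0.0.0/8 and ::1.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a LAN host name): treat it as exposed.
        return False


for host in ["127.0.0.1", "::1", "localhost", "0.0.0.0", "192.168.1.20"]:
    print(host, is_loopback_only(host))
```

Binding to 0.0.0.0 listens on every interface, so other machines on your network can reach the API. If you do that, lock it down with API tokens.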

Usage dashboard

Real-time graphs of tokens per model, requests, compression rate, and the energy and CO2 saved versus fp16.
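Your apps can keep their own per-model token counts too. This minimal sketch assumes each parsed chat completion response carries a `model` field and the OpenAI-style `usage` object with `total_tokens`. This page does not say whether Trapetum fills these in, and `tally_tokens` is a hypothetical helper.

```python
from collections import Counter


def tally_tokens(responses: list[dict]) -> Counter:
    """Sum usage.total_tokens per model across parsed chat completion responses."""
    totals: Counter = Counter()
    for resp in responses:
        usage = resp.get("usage") or {}
        # Responses without a usage object count as 0 tokens.
        totals[resp.get("model", "unknown")] += usage.get("total_tokens", 0)
    return totals


responses = [
    {"model": "qwen25-7b", "usage": {"total_tokens": 42}},
    {"model": "qwen25-7b", "usage": {"total_tokens": 8}},
    {"model": "other-model"},  # hypothetical model with no usage field
]
print(tally_tokens(responses))
```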

Energy and CO2

Shows live grid carbon intensity at your location. Compressed 4-bit decoding uses about 2.1x less energy than fp16, roughly 48% of the fp16 figure.
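Here is a worked sketch of how a 2.1x ratio turns into CO2 saved. It assumes the saving is the energy fp16 would have used (2.1 times the measured 4-bit energy) minus the 4-bit energy, multiplied by grid carbon intensity in gCO2/kWh. In other words, saved = (2.1 - 1) x E_4bit x I. This is an illustration, not the dashboard's actual formula, and `co2_saved_grams` is a hypothetical name.

```python
FP16_ENERGY_RATIO = 2.1  # fp16 decode energy relative to 4-bit (figure from this page)


def co2_saved_grams(energy_4bit_kwh: float, grid_g_per_kwh: float,
                    ratio: float = FP16_ENERGY_RATIO) -> float:
    """Grams of CO2 avoided versus doing the same decoding in fp16 (assumed model)."""
    energy_fp16_kwh = energy_4bit_kwh * ratio       # projected fp16 energy
    saved_kwh = energy_fp16_kwh - energy_4bit_kwh   # = (ratio - 1) * energy_4bit
    return saved_kwh * grid_g_per_kwh


# 0.5 kWh of 4-bit decoding on a 300 gCO2/kWh grid:
print(round(co2_saved_grams(0.5, 300), 1))  # 165.0
```

The fp16 side is a projection rather than a measurement, because the same requests never actually run in fp16.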

API access

A drop-in OpenAI-compatible API. Create a token in the admin console, then call your own machine.

curl http://localhost:8088/v1/chat/completions \
  -H "Authorization: Bearer trp_your_token" \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen25-7b","messages":[{"role":"user","content":"Hello"}]}'

Endpoint                   What it does
GET  /                     Chat interface
GET  /docs                 Swagger API documentation
POST /v1/chat/completions  Chat completion (OpenAI-compatible)
GET  /v1/models            List installed models
GET  /admin                Admin settings (password protected)
GET  /admin/dashboard      Usage and CO2 dashboard (admin only)
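Here is the curl call above as a minimal Python sketch, using only the standard library. It assumes responses follow the OpenAI chat completion shape, with the reply in `choices[0].message.content`. `build_chat_request`, `reply_text` and `chat` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8088"


def build_chat_request(token: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/chat/completions request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def reply_text(body: bytes) -> str:
    """Pull the assistant message out of an OpenAI-style response body."""
    return json.loads(body)["choices"][0]["message"]["content"]


def chat(token: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request to the local server and return the reply text."""
    req = build_chat_request(token, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return reply_text(resp.read())
```

With the service running, `print(chat("trp_your_token", "qwen25-7b", "Hello"))` sends the same request as the curl example. Building and parsing are kept apart from sending, so both can be checked without a server.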

How to install

Requires an NVIDIA GPU with the CUDA runtime. The installer sets up a background service that starts on boot and serves the web UI on http://localhost:8088. During install you set an admin password that locks the settings.
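Before running the installer, you can check that the machine sees an NVIDIA GPU. This sketch assumes `nvidia-smi`, which ships with the NVIDIA driver, is on your PATH. It confirms that a GPU and driver are present, not that the CUDA runtime version matches what Trapetum needs. `nvidia_gpus` is a hypothetical helper.

```python
import shutil
import subprocess


def nvidia_gpus() -> list[str]:
    """Return GPU names reported by nvidia-smi, or [] if none are visible."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []  # driver tools not on PATH
    try:
        out = subprocess.run(
            [exe, "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=10,
        ).stdout
    except (subprocess.SubprocessError, OSError):
        return []  # nvidia-smi failed, timed out or could not start
    return [line.strip() for line in out.splitlines() if line.strip()]


print(nvidia_gpus())
```

An empty list means no GPU was found or the driver tools are missing.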

🐧 Linux (systemd)

tar xzf trapetum-linux.tar.gz
sudo ./trapetum-linux/install-linux.sh
# you are prompted for an admin password
# manage the service:
#   systemctl status trapetum
#   sudo systemctl restart trapetum

🪟 Windows (service)

powershell -ExecutionPolicy Bypass `
  -File install-windows.ps1
# run from an elevated PowerShell; you are prompted
# for an admin password. Manage the service in services.msc

After install, open http://localhost:8088, click the + to add a model, and the gear to open the admin settings. Models are compressed ahead of time, so the install stays light.
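To script the post-install check, poll the service until it answers. This minimal sketch assumes GET / (the chat interface) returns HTTP 200 without a token once the service is up. `wait_until_up` is a hypothetical helper, and its `opener` parameter exists only so it can be tested without a server.

```python
import time
import urllib.request


def wait_until_up(url: str = "http://localhost:8088/", timeout_s: float = 60.0,
                  interval_s: float = 2.0, opener=urllib.request.urlopen) -> bool:
    """Poll url until it answers HTTP 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError and connection errors all subclass OSError:
            # the service is not ready yet, so try again.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, calling `wait_until_up()` right after `install-linux.sh` returns True once the service is serving on port 8088.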