Mumpix

Agents + Jetson

A self-learning browser and hardware automation agent that runs on Jetson Orin. Reads any DOM, controls any input device, accumulates causal knowledge across sessions, and re-plans when the world changes.

Single HTML file · Dark by default · Zero-config API · Mumpix integrated

MumpixAgent  ← zero-config developer API
   │
   ├── ActionPipeline
   │      pre-snap → execute → post-snap → record → drift
   ├── AchieveEngine
   │      plan → execute sequence → replan on failure → promote gene
   ├── DriftDetector
   │      failure streaks → stale suppression → semantic fallback
   │
   └── mumpix-core  ← shared singleton (all layers, one memory file)
          ├── WorldModel   causal triples
          ├── DNA          goal → proven action sequence
          ├── Turbo        observation window → LLM prompt
          └── MumpixDB     evidence ladder LOW → PROVEN

Three services, one memory file:
  :7780 browser   | CDP read + xdotool write
  :7781 hardware  | evdev read + uinput write + WASM handlers
  :7779 meta      | stats, planner, genes, outcomes

Getting Started

This stack is designed so the zero-config path works first, and the deep internals stay available when you need to drop down a layer. The examples below are the three common entry points: direct agent usage, full shared-stack usage, and booting all services together.

System requirements

Jetson Orin on JetPack 6+, or any Linux machine with X11, Chrome, and xdotool. Browser control is HID-based and page state is read over CDP, so Chrome must already be running with a DevTools port open.

Architecture

CDP is read-only. xdotool is write-only. Mumpix sits above both, learning from the structural state transitions and action outcomes without ever sending automation events through the browser.

Persistence

Everything shares one mumpix-memory.json. WorldModel triples, genes, and DB evidence persist every 30 seconds and on clean process exit.
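
The save cadence described above (a periodic flush plus a final flush on clean exit) can be sketched with Node built-ins. The helper name `createPersister`, its options, and the write-then-rename step are illustrative assumptions, not the Mumpix implementation.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: periodically writes getState() to `file` as JSON.
function createPersister({ file, getState, intervalMs = 30000 }) {
  const flush = () => {
    // Write to a temp file first, then rename, so an interrupted write
    // never leaves a truncated memory file behind.
    const tmp = `${file}.tmp`;
    fs.writeFileSync(tmp, JSON.stringify(getState()));
    fs.renameSync(tmp, file);
  };
  const timer = setInterval(flush, intervalMs);
  timer.unref(); // do not keep the process alive just for saving
  process.once('exit', flush); // 'exit' handlers must be synchronous
  return {
    flush,
    stop() {
      clearInterval(timer);
      process.removeListener('exit', flush);
    },
  };
}

// Usage with a stand-in state object and a throwaway file.
const demoFile = path.join(os.tmpdir(), `mumpix-memory-demo-${process.pid}.json`);
const state = { triples: [], genes: [] };
const persister = createPersister({ file: demoFile, getState: () => state });
state.triples.push(['auth', 'click::text:Sign in', 'authenticated']);
persister.flush();
persister.stop();
```

A synchronous write is used because Node does not wait for asynchronous work inside an `exit` handler.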

Quick Start · zero-config path
const { createAgent } = require('./src/agent');

// CommonJS has no top-level await, so the calls run inside an async function.
async function main() {
  const agent = await createAgent({}, { verbose: true });
  await agent.connect();

  await agent.goto('https://github.com/login');
  await agent.fill('#login_field', 'operator@example.com');
  await agent.fill('#password', '••••••••');
  await agent.submit();

  const result = await agent.achieve({ logged_in: true, domain: 'github.com' });
  console.log(result);
  // → { ok: true, achieved: true, steps: 4, confidence: 0.87 }
}

main().catch(console.error);

Quick Start · full stack with createAgent
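const { wm, dna, db, turbo } = require('./mumpix-core');
const { createAgent } = require('./src/agent');

// Pass the shared singletons explicitly, then inspect them through the agent.
async function main() {
  const agent = await createAgent({ wm, dna, db, turbo }, { verbose: true });
  console.log(agent.core.wm.stats());
  console.log(agent.drift.stats());
}

main().catch(console.error);
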
Quick Start · boot all services
# one-time helper build for virtual hardware devices
npm run build-helper

# start Chrome with a debug port
google-chrome --remote-debugging-port=9222 &

# boot browser API + hardware API + meta API
node launcher.js

Environment variables

Variable      Default             Description
MUMPIX_SAVE   mumpix-memory.json  Shared persistence file for the Mumpix core.
PORT_BROWSER  7780                Browser API port for CDP read and xdotool write.
PORT_HW       7781                Hardware bridge API port.
PORT_META     7779                Meta API port for stats, planner, and DNA inspection.
CDP_PORT      9222                Chrome remote debugging port.
ACTION_DELAY  200                 Default post-action delay inside the agent.
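
As a rough sketch of how these variables could be read with their defaults, the loader below copies the default values from the table. The function name `loadConfig` and its validation rules are assumptions made for illustration; the launcher may parse its environment differently.

```javascript
// Defaults copied from the table above; the loader itself is hypothetical.
const DEFAULTS = {
  MUMPIX_SAVE: 'mumpix-memory.json',
  PORT_BROWSER: 7780,
  PORT_HW: 7781,
  PORT_META: 7779,
  CDP_PORT: 9222,
  ACTION_DELAY: 200,
};

function loadConfig(env = process.env) {
  const config = {};
  for (const [name, fallback] of Object.entries(DEFAULTS)) {
    const raw = env[name];
    if (raw === undefined || raw === '') {
      config[name] = fallback;
    } else if (typeof fallback === 'number') {
      // Numeric settings must parse as non-negative integers.
      const parsed = Number(raw);
      if (!Number.isInteger(parsed) || parsed < 0) {
        throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
      }
      config[name] = parsed;
    } else {
      config[name] = raw;
    }
  }
  return config;
}

console.log(loadConfig({ PORT_META: '8000' }));
```

Failing fast on a malformed port keeps a typo in the environment from surfacing later as a confusing bind error.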

Core API

The top-level developer interface is the Mumpix agent wrapper. It exposes the clean zero-config path, but it never hides the internals. You can reach the shared Mumpix layers, the raw CDP reader, the HID writer, or the direct pipeline whenever you need them.

Constructor

const { createAgent } = require('./src/agent');

const agent = await createAgent({}, { verbose: true });
await agent.connect();

Option               Type     Default  Description
port                 number   9222     Chrome DevTools Protocol port.
verbose              boolean  false    Logs pipeline activity, drift events, and warnings.
actionDelay          number   200      Default delay after actions before post-snapshot.
pipeline.settleMs    number   380      Per-action settle time for DOM/network changes.
achieve.maxAttempts  number   3        Maximum replans before achieve() gives up.

Deep access always available · Zero-config by default

Instance properties

Property          Description
agent.core.wm     Shared WorldModel singleton with all learned causal triples.
agent.core.dna    DNA gene library for proven action sequences and block rules.
agent.core.db     MumpixDB structured truth and evidence ladder.
agent.core.turbo  Turbo working-memory window for prompt injection.
agent.drift       Drift detector tracking stale actions and semantic fallbacks.
agent.cdp         Raw CDP browser reader.
agent.hid         Raw xdotool HID writer.
agent.pipeline    Direct access to ActionPipeline for custom actions.

async agent.goto(url)

Async · Returns outcome

Navigates by focusing Chrome, hitting the address bar, typing the URL, and pressing Enter. No browser-side navigation APIs are used, so there is no automation event path inside the page.

Param  Type    Description
url    string  Full URL including protocol.

Route verification · Structural snapshots before and after
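
The keystroke path described above maps naturally onto a short list of xdotool invocations. The sketch below builds that list as plain argument arrays and runs it through `execFileSync`. The function names, the `chrome` window-class pattern, and the typing delay are assumptions; the agent's HID writer may issue different commands.

```javascript
const { execFileSync } = require('child_process');

// Builds the xdotool argv lists for an address-bar navigation.
// The window class pattern and typing delay are assumptions.
function gotoCommands(url, { windowClass = 'chrome', typeDelayMs = 12 } = {}) {
  if (!/^[a-z][a-z0-9+.-]*:\/\//i.test(url)) {
    throw new Error(`Expected a full URL including protocol, got "${url}"`);
  }
  return [
    ['search', '--onlyvisible', '--class', windowClass, 'windowactivate'], // focus Chrome
    ['key', 'ctrl+l'],                                                     // address bar
    ['type', '--delay', String(typeDelayMs), url],                         // type the URL
    ['key', 'Return'],                                                     // navigate
  ];
}

// Runs each step through the xdotool binary (needs an X11 session).
function runGoto(url, options) {
  for (const args of gotoCommands(url, options)) {
    execFileSync('xdotool', args, { stdio: 'inherit' });
  }
}

console.log(gotoCommands('https://github.com/login'));
```

Keeping the command list separate from the runner makes the sequence easy to inspect or log without an X server; `runGoto` is only needed on a machine with a display.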

async agent.click(target)

Async · Auto-recovery

Uses three resolution strategies in order: selector lookup via CDP coordinates, text or aria-label matching from live page state, then drift alternatives based on prior non-stale triples.

Param   Type    Description
target  string  CSS selector or a text:… target.

Resolution order

Selector → text/aria-label → drift alternatives. The first successful resolution wins.
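
The "first successful resolution wins" rule is a plain fallback chain. The sketch below uses synchronous stand-ins for the three strategies so it can run on its own; the page data, the drift map, and every function name are hypothetical, and the real lookups would be asynchronous CDP and drift queries.

```javascript
// Each strategy returns screen coordinates or null. All three bodies are
// hypothetical stand-ins for CDP lookups and drift queries.
function resolveTarget(target, strategies) {
  for (const { name, resolve } of strategies) {
    const point = resolve(target);
    if (point) return { via: name, ...point };
  }
  return null;
}

const page = {
  selectors: { '#login': { x: 120, y: 340 } },
  labels: { 'Sign in': { x: 400, y: 520 } },
};
// Known-good replacement for an action whose original target went stale.
const driftAlternatives = { 'text:Log in': 'text:Sign in' };

const byText = (t) => (t.startsWith('text:') ? page.labels[t.slice(5)] || null : null);

const strategies = [
  { name: 'selector', resolve: (t) => page.selectors[t] || null },
  { name: 'text', resolve: byText },
  {
    name: 'drift',
    resolve: (t) => (driftAlternatives[t] ? byText(driftAlternatives[t]) : null),
  },
];

console.log(resolveTarget('text:Log in', strategies));
```

Returning the name of the winning strategy alongside the coordinates lets the caller record which path succeeded.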

async agent.fill(selector, value)

Async · Allow no-change

Clicks the field, selects existing text, and types or pastes the new value. Falls back from CSS selectors to label, placeholder, aria-label, id, or name matching when needed.

Param     Type    Description
selector  string  CSS selector or input descriptor.
value     string  Text to type.

async agent.submit(selector?)

Async · Button or Return

Looks for a submit-capable element first, including button text such as Sign in, Log in, Submit, Continue, or Next. Falls back to a real Return keypress.

async agent.observe()

Async · Structural hashes

Takes a structural snapshot of the current page and returns state information stable across text rewrites. This is the browser-facing state read used to feed the WorldModel and DB.

Observe shape
{
  stateKey: "github.com::0ko33er::1fgr2ox",
  structHash: "1fgr2ox",
  routeHash: "0ko33er",
  meta: { domain: "github.com", url_path: "/login", title: "Sign in to GitHub" }
}

async agent.key(combo)

Async · Raw HID

Sends raw key combinations using xdotool syntax. This is useful for tab management, escape paths, or browser shortcuts.

Key examples
await agent.key('ctrl+t');
await agent.key('Escape');
await agent.key('ctrl+shift+j');

achieve(goal)

achieve() is the goal-directed entry point. It checks whether the goal is already satisfied, tries to replay any matching DNA plan, falls back to WorldModel planning, executes each step through the pipeline, re-plans on failure, and promotes successful sequences back into DNA as plan::... actions.

Goal field    Type     Description
logged_in     boolean  Satisfied when auth indicators appear in the structural state.
domain        string   Target hostname used for planning and gene lookup.
url_contains  string   Expected route fragment after success.
page_type     string   Target semantic page type such as authenticated.

Priority  Strategy                                                          Source tag   Confidence
1         Replay a matching DNA goal plan.                                  dna_plan     High once promoted.
2         Plan from the current structural state via WorldModel.plan().     world_model  Based on triple confidence.
3         Return current context for external reasoning if no plan exists.  none         0
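
The priority order above, combined with replanning on failure, can be sketched as a small loop. Everything here is a simplified stand-in: the `dna.lookup`, `wm.plan`, and `execute` interfaces are hypothetical, the calls are synchronous for brevity, and replaying a gene only on the first attempt is an assumption rather than documented behavior.

```javascript
// Hypothetical interfaces: dna.lookup(goal) -> steps | null,
// wm.plan(state, goal) -> steps | null, execute(step) -> new state | null.
function isSatisfied(state, goal) {
  return Object.entries(goal).every(([key, value]) => state[key] === value);
}

function achieve(goal, { observe, dna, wm, execute, maxAttempts = 3 }) {
  let state = observe();
  if (isSatisfied(state, goal)) return { ok: true, achieved: true, steps: 0, source: 'already' };

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Priority 1: replay a stored gene (assumed: first attempt only).
    const replay = attempt === 1 ? dna.lookup(goal) : null;
    // Priority 2: plan from the current state.
    const steps = replay || wm.plan(state, goal);
    const source = replay ? 'dna_plan' : 'world_model';
    // Priority 3: nothing to run, hand the context back to the caller.
    if (!steps || steps.length === 0) {
      return { ok: false, achieved: false, source: 'none', context: state };
    }
    let failed = false;
    for (const step of steps) {
      const next = execute(step);
      if (!next) { failed = true; break; }
      state = next;
    }
    if (!failed && isSatisfied(state, goal)) {
      return { ok: true, achieved: true, steps: steps.length, source, attempts: attempt };
    }
    // Otherwise loop and replan from wherever the failed attempt ended.
  }
  return { ok: false, achieved: false, source: 'exhausted', context: state };
}

// Demo: the stored gene is stale, so attempt 2 replans via the world model.
let current = { domain: 'github.com', logged_in: false };
const result = achieve({ logged_in: true }, {
  observe: () => current,
  dna: { lookup: () => ['click::#old-login'] },
  wm: { plan: () => ['click::text:Sign in'] },
  execute: (step) => {
    if (step === 'click::#old-login') return null; // selector no longer exists
    current = { ...current, logged_in: true };
    return current;
  },
});
console.log(result);
```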

Promotion path

When a sequence reaches the goal, the engine inserts a DNA gene keyed by the goal descriptor and stores the plan as a plan::label|||label… action for instant replay on the next episode.
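
Assuming the stored action is simply the `plan::` prefix followed by step labels joined with `|||`, encoding and decoding could look like the sketch below. The helper names and the example step labels are illustrative; the DNA layer may store additional metadata alongside the string.

```javascript
const PLAN_PREFIX = 'plan::';
const STEP_SEPARATOR = '|||';

// Joins step labels into a single replayable action string.
function encodePlan(labels) {
  for (const label of labels) {
    if (label.includes(STEP_SEPARATOR)) {
      throw new Error(`Step label must not contain "${STEP_SEPARATOR}": ${label}`);
    }
  }
  return PLAN_PREFIX + labels.join(STEP_SEPARATOR);
}

// Splits a promoted plan back into its steps; returns null for plain actions.
function decodePlan(action) {
  if (!action.startsWith(PLAN_PREFIX)) return null;
  return action.slice(PLAN_PREFIX.length).split(STEP_SEPARATOR);
}

const steps = ['fill::#login_field', 'fill::#password', 'submit'];
const gene = encodePlan(steps);
console.log(gene);
console.log(decodePlan(gene));
```

Rejecting labels that already contain the separator keeps the round trip lossless.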

Hardware

The hardware side mirrors the browser philosophy: read the real device stream, write through real kernel-facing virtual devices, and let Mumpix learn from the resulting state transitions. This keeps gamepads, MIDI controllers, and custom bridges inside the same memory substrate as browser actions.

EvdevWatcher

Discovers and opens /dev/input/event* readers, emits normalized input events, and lets you observe raw physical devices without modifying them.

Read from physical devices
const { EvdevWatcher } = require('./evdev');

const watcher = new EvdevWatcher().start();
const controller = watcher.open('/dev/input/event3');
controller.on('event', ev => console.log(ev));

UinputDevice

Creates virtual keyboard, mouse, or gamepad devices so the system can emit kernel-visible input instead of faking higher-level events.

Template             Use
keyboardConfig(name)  Text entry and desktop shortcuts.
mouseConfig(name)     Relative pointer and button events.
gamepadConfig(name)   Absolute axes and button controls.

DeviceBridge and WASM handlers

Bridges a real reader to a virtual device with an optional handler in between. The handler can be built-in or a custom WebAssembly module that parses and transforms device events.

Bridge real → virtual
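const { createRemapper } = require('./wasm-loader');
const DeviceBridge = require('./device-bridge');

// `controller` is the reader opened with EvdevWatcher above; `pad` is assumed
// to be a virtual gamepad created through UinputDevice (creation not shown).
// The map sends BTN_TRIGGER/BTN_THUMB (288/289) to BTN_SOUTH/BTN_EAST (304/305).
const remap = createRemapper({ 288: 304, 289: 305 });
const bridge = new DeviceBridge({ source: controller, target: pad, handler: remap });
bridge.start();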

Handler              Input                   Output
midi                 Raw MIDI byte buffer    Note/channel/velocity object
ds4                  DualShock 4 HID report  Stick/trigger/button state object
createRemapper(map)  evdev event             Remapped evdev event
./handler.wasm       Any binary buffer       Custom JSON payload
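
To make the "evdev event in, remapped evdev event out" row concrete, here is a minimal stand-in for a remapping handler. It is not the `createRemapper` implementation: the name `sketchRemapper` and the `{ type, code, value }` event shape (mirroring the kernel's `input_event` fields) are assumptions.

```javascript
// Event type constant from linux/input-event-codes.h.
const EV_KEY = 0x01;

// Stand-in for a remapping handler: rewrites key/button codes found in
// `map` and passes every other event through unchanged.
function sketchRemapper(map) {
  return function handle(ev) {
    if (ev.type !== EV_KEY) return ev;
    const mapped = map[ev.code];
    return mapped === undefined ? ev : { ...ev, code: mapped };
  };
}

// 288/289 are BTN_TRIGGER/BTN_THUMB; 304/305 are BTN_SOUTH/BTN_EAST.
const remap = sketchRemapper({ 288: 304, 289: 305 });

console.log(remap({ type: EV_KEY, code: 288, value: 1 }));
console.log(remap({ type: 0x03, code: 0, value: 127 })); // EV_ABS passes through
```

Returning a new object instead of mutating the input keeps the source event intact for any other listener on the same reader.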

Memory Layers

These are the learning and planning layers that make the agent cumulative instead of stateless. The browser agent, hardware bridge, and any future domains all feed the same singletons through mumpix-core.js.

WorldModel

The causal memory store. It records state-action-effect triples and can plan forward from an observed structural state to a goal predicate.

WorldModel API examples
const { wm } = require('./mumpix-core');

wm.observe(
  { domain: 'github.com', page_type: 'auth', logged_in: false },
  'click::text:Sign in',
  { domain: 'github.com', page_type: 'authenticated', logged_in: true }
);

wm.effects({ domain: 'github.com' }, 'click::text:Sign in');
wm.candidates({ domain: 'github.com' }, { logged_in: true });
wm.plan({ domain: 'github.com', logged_in: false }, { logged_in: true });
wm.snapshot();

Planner

The planner is the forward-chain part of the WorldModel. It uses candidate transitions, triple confidence, and distance-to-goal to build a short action sequence.

Priority  Strategy                  Source
1         DNA replay                Stored goal plans
2         Greedy best-first search  WorldModel triples
3         Context handoff           Turbo + DB for LLM reasoning

REST plan example
curl -X POST localhost:7779/wm/plan \
  -H 'Content-Type: application/json' \
  -d '{"state":{"domain":"github.com"},"goal":{"logged_in":true}}'
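
The greedy best-first strategy named in the planner table can be illustrated with a small standalone search. The triple shape `{ pre, action, post, confidence }`, the distance measure (count of unmet goal fields), and the tie-break on the product of confidences are assumptions chosen for the sketch, not the WorldModel's internal representation.

```javascript
// Triple shape {pre, action, post, confidence} is an assumption.
const matches = (state, pattern) =>
  Object.entries(pattern).every(([k, v]) => state[k] === v);

// Distance to goal: number of goal fields the state does not yet satisfy.
const distance = (state, goal) =>
  Object.entries(goal).filter(([k, v]) => state[k] !== v).length;

function plan(triples, start, goal, maxDepth = 6) {
  // Frontier entries carry the state reached and the actions taken so far.
  const frontier = [{ state: start, path: [], score: 1 }];
  const seen = new Set();

  while (frontier.length > 0) {
    // Greedy best-first: expand the entry closest to the goal,
    // breaking ties by the product of triple confidences.
    frontier.sort((a, b) =>
      distance(a.state, goal) - distance(b.state, goal) || b.score - a.score);
    const node = frontier.shift();
    if (distance(node.state, goal) === 0) return node;

    const key = JSON.stringify(Object.entries(node.state).sort());
    if (seen.has(key) || node.path.length >= maxDepth) continue;
    seen.add(key);

    for (const t of triples) {
      if (!matches(node.state, t.pre)) continue;
      frontier.push({
        state: { ...node.state, ...t.post },
        path: [...node.path, t.action],
        score: node.score * t.confidence,
      });
    }
  }
  return null; // no known route; the caller falls back to context handoff
}

const triples = [
  { pre: { page_type: 'home' }, action: 'click::text:Sign in',
    post: { page_type: 'auth' }, confidence: 0.9 },
  { pre: { page_type: 'auth' }, action: 'submit',
    post: { page_type: 'authenticated', logged_in: true }, confidence: 0.8 },
  { pre: { page_type: 'home' }, action: 'click::text:Pricing',
    post: { page_type: 'pricing' }, confidence: 0.95 },
];
const route = plan(
  triples,
  { domain: 'github.com', page_type: 'home', logged_in: false },
  { logged_in: true }
);
console.log(route.path, route.score);
```

In this demo the search first tries the higher-confidence Pricing branch, finds it is a dead end, and then returns the two-step sign-in route.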

DriftDetector

Tracks failure streaks for action keys, suppresses stale paths, and surfaces semantically similar alternatives using Jaccard similarity over action tokens.
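
Jaccard similarity is the size of the token intersection divided by the size of the token union. The sketch below applies it to action keys; the tokenizer (lowercase, split on non-alphanumerics) and the `minScore` cutoff are assumptions, since the source does not say how actions are tokenized.

```javascript
// Tokenizer is an assumption: lowercase, split on non-alphanumerics.
function tokens(action) {
  return new Set(action.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
}

// |A ∩ B| / |A ∪ B| over the two token sets.
function jaccard(a, b) {
  const ta = tokens(a);
  const tb = tokens(b);
  if (ta.size === 0 && tb.size === 0) return 1;
  let shared = 0;
  for (const t of ta) if (tb.has(t)) shared++;
  return shared / (ta.size + tb.size - shared);
}

// Rank known actions by similarity to the stale one.
function alternatives(stale, known, minScore = 0.3) {
  return known
    .map((action) => ({ action, score: jaccard(stale, action) }))
    .filter((c) => c.action !== stale && c.score >= minScore)
    .sort((x, y) => y.score - x.score);
}

console.log(alternatives('click::text:Sign in', [
  'click::text:Sign in with SSO',
  'click::text:Log in',
  'fill::#password',
]));
```

Here `click::text:Log in` shares three of five distinct tokens with the stale action, so it survives the cutoff, while `fill::#password` shares none and is dropped.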

Constant           Value  Description
STALE_THRESHOLD    3      Consecutive failures before suppression.
RECOVER_THRESHOLD  2      Consecutive successes before reactivation.
REVALIDATE_AGE     24h    Age before the triple is queued for re-probing.
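
The two streak thresholds describe a small state machine per action key. The class below is a minimal sketch of that behavior using the values from the table; the class name and entry shape are hypothetical, and age-based revalidation is left out.

```javascript
const STALE_THRESHOLD = 3;   // consecutive failures before suppression
const RECOVER_THRESHOLD = 2; // consecutive successes before reactivation

class StreakTracker {
  constructor() {
    this.entries = new Map(); // action key -> { failures, successes, stale }
  }

  // Records one outcome and returns whether the key is now stale.
  record(key, ok) {
    const e = this.entries.get(key) || { failures: 0, successes: 0, stale: false };
    if (ok) {
      e.successes += 1;
      e.failures = 0; // a success breaks the failure streak
      if (e.stale && e.successes >= RECOVER_THRESHOLD) e.stale = false;
    } else {
      e.failures += 1;
      e.successes = 0; // a failure breaks the success streak
      if (e.failures >= STALE_THRESHOLD) e.stale = true;
    }
    this.entries.set(key, e);
    return e.stale;
  }

  isStale(key) {
    const e = this.entries.get(key);
    return e ? e.stale : false;
  }
}

const tracker = new StreakTracker();
const key = 'click::#login';
[false, false, false].forEach((ok) => tracker.record(key, ok));
console.log(tracker.isStale(key)); // suppressed after three straight failures
tracker.record(key, true);
tracker.record(key, true);
console.log(tracker.isStale(key)); // reactivated after two straight successes
```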

ActionPipeline

This is the direct access point for custom actions that still need the full observation and learning harness. It is how the agent keeps drift, WorldModel, Turbo, DB, and Mumpix synchronized on every action.

Direct pipeline access
await agent.pipeline.run(
  async () => {
    await agent.hid.focusChrome();
    await agent.hid.key('ctrl+l');
  },
  'key',
  'ctrl+l'
);
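
The stage order from the architecture diagram (pre-snap → execute → post-snap → record → drift) can be sketched as a single async wrapper. Only the ordering comes from this document; the `hooks` object, the `type::target` key format, and treating a thrown error as a failed action are assumptions.

```javascript
// Hypothetical hooks: snapshot() returns a state key, record() stores the
// triple, drift() receives the success flag.
async function runPipeline(action, type, target, hooks) {
  const key = `${type}::${target}`;
  const before = await hooks.snapshot();             // pre-snap
  let error = null;
  try {
    await action();                                  // execute
  } catch (err) {
    error = err;
  }
  // Let the page settle before reading state again.
  await new Promise((resolve) => setTimeout(resolve, hooks.settleMs ?? 380));
  const after = await hooks.snapshot();              // post-snap
  const ok = error === null;
  hooks.record({ before, action: key, after, ok });  // record
  hooks.drift(key, ok);                              // drift
  return { ok, key, changed: before !== after, error };
}

// Demo with in-memory stand-ins for the browser and memory layers.
const log = [];
let pageState = 'github.com::login';
const hooks = {
  settleMs: 0,
  snapshot: async () => pageState,
  record: (triple) => log.push(triple),
  drift: (key, ok) => log.push({ key, ok }),
};

const done = runPipeline(
  async () => { pageState = 'github.com::home'; },
  'key',
  'ctrl+l',
  hooks
);
done.then((outcome) => console.log(outcome, log));
```

Catching the error instead of rethrowing it means a failed action still produces a recorded triple and a drift update.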

REST APIs

Three ports make up the running system: the browser service on 7780, the hardware service on 7781, and the meta service on 7779. All three share the same Mumpix core and memory file.

Browser API · :7780

GET   /browser/snapshot    Full page state: URL, title, text, links, forms, buttons, alerts.
GET   /browser/rect?sel=…  Element screen coordinates for HID targeting.
POST  /act/goto            HID navigate via address bar keystrokes.
POST  /act/click           Selector → CDP coords → xdotool click.
POST  /act/fill            Selector → coords → click → type.
POST  /act/submit          Submit via focused field or Return key.

Hardware API · :7781

GET     /devices             List physical evdev devices.
POST    /devices/open        Open a real device reader.
GET     /devices/:id/stream  SSE stream of input events.
POST    /virtual             Create a virtual keyboard, mouse, or gamepad.
POST    /bridge              Connect real → virtual with an optional handler.
DELETE  /bridge/:id          Stop a bridge.

Meta API · :7779

GET   /stats       Combined Mumpix, WorldModel, DB, Turbo, and DNA stats.
GET   /wm/triples  Recent causal triples.
POST  /wm/plan     Plan a route from current state to a goal predicate.
GET   /dna/genes   Inspect promoted genes and their current state.
POST  /dna/decay   Run dormancy selection manually.
POST  /save        Flush persistence immediately.

Model Catalog API

Developers can inspect the shipped Jetson model catalog through the shared router surface, then use the control API to switch the active model when they administer that device.

GET   /api/router/v1/models          Visible models plus active model state.
GET   /api/router/v1/models/catalog  Curated shipped model list with speed, VRAM, and context metadata.
GET   /control-api/models/status     Current model, service state, and live router visibility.
POST  /control-api/models/load       Load a shipped model by slug, display name, or full HF id.
POST  /control-api/models/eject      Eject the active model and stop the runtime cleanly.

Shipped Jetson models

Profile       Model            Speed        VRAM     Context
RECOMMENDED   llama32-3b       28–38 tok/s  ~2.0 GB  128K
FASTEST       llama32-1b       55–70 tok/s  ~0.9 GB  128K
HIGH QUALITY  llama31-8b       12–18 tok/s  ~4.5 GB  128K
CODING        qwen25-coder-3b  26–34 tok/s  ~2.0 GB   32K
COMPACT       phi35-mini       25–32 tok/s  ~2.2 GB  128K
COMPACT       gemma2-2b        32–42 tok/s  ~1.5 GB    8K
NEAR LIMIT    mistral-7b       13–19 tok/s  ~4.1 GB   32K

Current scope

The developer API supports listing, loading, ejecting, and checking model state. Downloading brand-new models stays separate for now so the device contract remains predictable.

Internals

These pieces are what make the stack robust across copy changes, retries, and long-running sessions. They are worth understanding because they explain why the agent can keep working when the surface text changes but the structure stays the same.

state-hash

Hash         Captures                                              Breaks on                   Used for
structHash   Form shapes, element roles, aria-labels, input types  Actual structural redesign  Primary WorldModel state key
routeHash    Hostname + normalized path                            Route rename                Route-level fallback
contentHash  Visible text and alerts                               Any text rewrite            Exact change detection
pageKey      domain::routeHash::structHash                         Route or structure change   Canonical state identity

Why it matters

If GitHub changes button copy from “Sign in” to “Log in,” the state key survives because the form structure and element roles are unchanged.
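
A small experiment shows why the key survives a copy change. The sketch below hashes structure and visible text separately; the snapshot shape, the helper names, and the use of truncated SHA-1 are assumptions standing in for whatever hashing the state-hash module actually uses.

```javascript
const crypto = require('crypto');

// Short, deterministic digest of any JSON-serializable value.
const shortHash = (value) =>
  crypto.createHash('sha1').update(JSON.stringify(value)).digest('hex').slice(0, 7);

// Snapshot shape is hypothetical: elements carry role, input type,
// aria-label, and visible text.
function hashes(snapshot) {
  const { hostname, pathname } = new URL(snapshot.url);
  // Structure ignores visible text entirely.
  const structure = snapshot.elements.map((el) =>
    [el.role, el.inputType || null, el.ariaLabel || null]);
  const content = snapshot.elements.map((el) => el.text || '');
  const routeHash = shortHash([hostname, pathname.replace(/\/+$/, '') || '/']);
  const structHash = shortHash(structure);
  return {
    routeHash,
    structHash,
    contentHash: shortHash(content),
    pageKey: `${hostname}::${routeHash}::${structHash}`,
  };
}

const before = {
  url: 'https://github.com/login',
  elements: [
    { role: 'textbox', inputType: 'text', ariaLabel: 'Username' },
    { role: 'textbox', inputType: 'password', ariaLabel: 'Password' },
    { role: 'button', text: 'Sign in' },
  ],
};
// Same page after a copy change on the button.
const after = {
  ...before,
  elements: before.elements.map((el) =>
    (el.role === 'button' ? { ...el, text: 'Log in' } : el)),
};

const a = hashes(before);
const b = hashes(after);
console.log(a.pageKey === b.pageKey, a.contentHash === b.contentHash);
```

The page key is identical for both snapshots because only the button text differs, while the content hash changes and can still flag the rewrite.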

mumpix-core

Within a single Node process, require() caches each module by its resolved filename, so mumpix-core behaves as a singleton: every service started in that process sees the same wm, dna, db, and turbo instances. That is what lets browser, hardware, and bridge events all accumulate into one live memory substrate.

Singleton import
const { wm, dna, db, turbo } = require('./mumpix-core');
console.log(wm.stats());

Evidence tiers

Tier    Multiplier  Meaning
LOW     ×1          Observed once or twice; still noisy.
MEDIUM  ×2          Repeated evidence but not yet resilient under variation.
HIGH    ×3          Reliable across multiple episodes and structural states.
PROVEN  ×4          Promoted into DNA and ready for direct replay.
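
One plausible reading of the multiplier column is that it scales a candidate's raw confidence when several actions compete. The sketch below ranks candidates on that assumption; the `weight` and `rank` helpers and the candidate shape are illustrative, and MumpixDB may apply the multipliers differently.

```javascript
const TIER_MULTIPLIER = Object.freeze({ LOW: 1, MEDIUM: 2, HIGH: 3, PROVEN: 4 });

// Assumption: the multiplier scales a candidate's raw confidence for ranking.
function weight(candidate) {
  const multiplier = TIER_MULTIPLIER[candidate.tier];
  if (multiplier === undefined) throw new Error(`Unknown tier: ${candidate.tier}`);
  return candidate.confidence * multiplier;
}

// Sorts a copy of the list, strongest evidence first.
function rank(candidates) {
  return [...candidates].sort((a, b) => weight(b) - weight(a));
}

const ranked = rank([
  { action: 'click::text:Sign in', tier: 'LOW', confidence: 0.9 },
  { action: 'click::#login', tier: 'HIGH', confidence: 0.5 },
  { action: 'key::Return', tier: 'MEDIUM', confidence: 0.6 },
]);
console.log(ranked.map((c) => c.action));
```

Under this weighting a well-evidenced action with moderate confidence outranks a confident but barely observed one.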

Decay and purge

Weak genes are suppressed with DISABLED first, not deleted. They can come back when the environment changes. Hard deletion is a separate purge() path for truly exhausted genes.
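
The two-stage lifecycle (disable first, delete only through a separate purge) can be sketched as follows. The gene shape, the fitness threshold, and the strike counter are all hypothetical; only the DISABLED-before-delete ordering and the separate purge() path come from the text above.

```javascript
class GeneLibrary {
  constructor() {
    this.genes = new Map(); // id -> { fitness, status, strikes }
  }

  add(id, fitness) {
    this.genes.set(id, { fitness, status: 'ACTIVE', strikes: 0 });
  }

  // Soft pass: weak genes are disabled, never removed.
  decay(minFitness = 0.2) {
    for (const gene of this.genes.values()) {
      if (gene.fitness < minFitness) {
        gene.status = 'DISABLED';
        gene.strikes += 1;
      }
    }
  }

  // A disabled gene returns when fresh evidence lifts its fitness.
  reinforce(id, fitness, minFitness = 0.2) {
    const gene = this.genes.get(id);
    if (!gene) return false;
    gene.fitness = fitness;
    if (fitness >= minFitness) {
      gene.status = 'ACTIVE';
      gene.strikes = 0;
    }
    return gene.status === 'ACTIVE';
  }

  // Hard pass: delete only genes that stayed disabled across many decays.
  purge(maxStrikes = 3) {
    let removed = 0;
    for (const [id, gene] of this.genes) {
      if (gene.status === 'DISABLED' && gene.strikes >= maxStrikes) {
        this.genes.delete(id);
        removed += 1;
      }
    }
    return removed;
  }
}

const lib = new GeneLibrary();
lib.add('login::github.com', 0.1);
lib.add('search::github.com', 0.05);
lib.decay();                             // both become DISABLED
lib.reinforce('login::github.com', 0.6); // the environment changed back
lib.decay();
lib.decay();                             // the search gene reaches 3 strikes
console.log(lib.purge(), [...lib.genes.keys()]);
```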