CA: pwWXFXQLtQ2x29U1Ya7okKBRQmX3iXAPL8cFRRnLBot
Running 24/7 on a Mac Mini

Clawdscape

3,500+
Lines of Code
12
Memory Types
30+
Action Types
1
LLM Provider
Mac Mini
24/7 Hardware

The Perception-Action Loop

A Mac Mini captures the screen, sends it to ClawdBot via API, receives action decisions, and executes them with human-like input simulation — all running autonomously.

This is an autonomous AI agent that plays Old School RuneScape entirely through vision, running 24/7 on a Mac Mini. It doesn't read game memory, inject code, or use any plugins — it simply looks at the screen and decides what to do, just like a human player would.

Every 300 milliseconds, the Mac Mini captures a screenshot of the game and sends it to ClawdBot (Anthropic's most powerful AI model), which returns a structured decision: what it sees, what it thinks, and what actions to take. Those actions — mouse clicks, keyboard presses, minimap navigation — are executed with human-like Bezier curve mouse paths and randomized timing.
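Each reply from the model is a small structured document. Here's a minimal sketch of validating one — the field names (`observation`, `reasoning`, `actions`) are illustrative, not necessarily the project's actual schema:

```python
import json

def parse_decision(raw: str) -> dict:
    """Validate the model's structured reply: what it sees,
    what it thinks, and which actions to take."""
    decision = json.loads(raw)
    for field in ("observation", "reasoning", "actions"):
        if field not in decision:
            raise ValueError(f"missing field: {field}")
    return decision

reply = """{
  "observation": "A goblin stands near the Lumbridge bridge.",
  "reasoning": "Low-level target; attack it, then keep moving.",
  "actions": [
    {"type": "click", "x": 412, "y": 305},
    {"type": "click_minimap", "x": 80, "y": 40}
  ]
}"""
decision = parse_decision(reply)
```

Note how the example decision ends with a `click_minimap` action — matching the system prompt's rule that the last action should keep the agent moving.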

The agent has persistent memory powered by a vector database. It remembers where it's been, what worked, what failed, and what killed it. These memories are fed back into every decision, so it learns from experience across hundreds of ticks.

Everything streams live to this dashboard — the AI's thoughts, its actions, player stats, a world map tracking its location, screenshots of what it sees, and achievement milestones. You're watching an AI explore Gielinor in real time.

Screenshot

Capture the game window

LLM Vision

ClawdBot analyzes the scene

Decision

Choose actions from 30+ types

Execution

Human-like Bezier mouse paths

Memory

Store experience in vector DB

Repeat

Loop forever, learn always

Inside the Agent's Mind

Real output from the perception-action loop. Every tick, the agent observes, reasons, and acts — here's what that looks like under the hood.

game_view — live capture
osrs-agent — game_loop.py
Model: ClawdBot · Tick: #247 · Latency: 312ms · Errors: 0

Player Progress

Real-time stats extracted from the game via vision AI. Every tree chopped, every monster slain, every level gained — tracked live.

osrs-agent — player_stats.py
Clawdscape
Autonomous vision-only RuneScape player. No APIs, no plugins — just screenshots and intelligence.
Uptime: -- · Ticks: 0 · Location: Unknown · Distance: 0 · Memories: 0

Vitals

Hitpoints
100
Prayer
0
Run
100

Activity Counters

Trees: 0
Kills: 0
Items: 0
Deaths: 0
Fish: 0
Ores: 0
Bones: 0
Food: 0
Doors: 0
NPCs: 0

World Map

Unknown

XP Tracker

Total XP 0
No XP gains yet...

Level Ups

No level-ups yet...

Inventory

Small Net
Bread
Cowhide
Bronze Axe
Logs
Logs
Logs
Logs
Logs
Bones
Bones
Bones
Coins
Raw Shrimp
Raw Shrimp
Raw Shrimp
Bronze Sword
-
-
-
-
-
-
-
-
-
-
-
17 / 28 slots

Equipment

Head: none
Cape: none
Neck: none
Weapon: none
Body: none
Shield: none
Legs: none
Gloves: none
Boots: none
Ring: none
Ammo: none

Goal Progress

Loading goals...

Memory Bank

Loading memory data...

Action Timeline

No actions yet...

Milestones

No milestones yet...

Session History

Loading sessions...

Built for Autonomy

Every component is engineered for one goal: a self-sufficient agent that plays RuneScape like a human, learns from experience, and never stops.

Vision AI

Multimodal Vision AI

Sends raw screenshots to ClawdBot. The LLM observes the game world, identifies objects, NPCs, and interfaces, then decides what to do next — pure vision, zero game API access.

ClawdBot
Human-Like Input

Mouse movements follow Bezier curves with de Casteljau interpolation, ease-in-out acceleration, random jitter, and occasional hesitation pauses. Indistinguishable from a real human player.

Bezier Curves
Persistent Memory

ChromaDB vector database stores 12 types of memory: observations, combat knowledge, navigation paths, NPC encounters, death learnings, and more. Semantic search retrieves relevant past experiences.

ChromaDB + Embeddings
Goal Planning

Hierarchical Goal Planning

Tree-based goal system with prerequisites, priority ordering, auto-cascading completion, and retry logic. Goals decompose from "Complete Tutorial Island" into atomic sub-tasks.

Goal Tree
Stuck Detection

Intelligent Stuck Detection

Three-level detection: repeated clicks, scene-stuck keywords, and area-stuck monitoring. Automatic recovery nudges force new approaches, exploration, and activity variation.

Self-Recovery
Live Stream

Live Stream Overlay

OSRS-themed transparent overlay shows the agent's real-time thoughts, reasoning, and actions. OBS WebSocket integration enables 24/7 autonomous livestreaming with status overlays.

OBS Integration

Modular Architecture

Clean separation of concerns allows each subsystem to evolve independently. The game loop orchestrates all components through a unified tick-based pipeline.

┌──────────────────────────────────────────────────────────┐
│            Clawdscape — System Architecture              │
├──────────────────────────────────────────────────────────┤

  ┌─────────────┐     ┌──────────────┐     ┌────────────┐
  │ Screenshot  │────▶│  LLM Vision  │────▶│  Decision  │
  │  Capture    │     │   Engine     │     │   Engine   │
  └─────────────┘     └──────┬───────┘     └─────┬──────┘
         ▲                   │                   │
         │            ┌──────┴───────┐     ┌─────▼──────┐
         │            │   Memories   │     │   Action   │
         │            │   + Goals    │     │  Executor  │
         │            └──────────────┘     └─────┬──────┘
         │                                 ┌─────▼──────┐
         └─────────────────────────────────│ Human-Like │
               loop every 300ms            │   Input    │
                                           │ Controller │
                                           └────────────┘

Code That Plays Games

From Bezier curves to semantic memory, every line is crafted for autonomous gameplay.

agent/core/input_controller.py Python
def _bezier_points(self, start: Point, end: Point,
                    control_points: int = 2, num_steps: int = 50) -> list[Point]:
    """Generate points along a Bezier curve from start to end.
    Creates natural-looking curved mouse trajectories."""
    points = [start]

    # Generate random control points that create a natural arc
    cps = [start]
    for _ in range(control_points):
        mid_x = (start.x + end.x) / 2
        mid_y = (start.y + end.y) / 2
        dist = math.hypot(end.x - start.x, end.y - start.y)
        spread = dist * 0.3
        cp = Point(
            int(mid_x + random.uniform(-spread, spread)),
            int(mid_y + random.uniform(-spread, spread)),
        )
        cps.append(cp)
    cps.append(end)

    # De Casteljau's algorithm for smooth Bezier interpolation
    for i in range(1, num_steps + 1):
        t = i / num_steps
        t = self._ease_in_out(t)  # Slow start, fast middle, slow end
        result = self._de_casteljau(cps, t)
        points.append(result)

    return points

def _ease_in_out(self, t: float) -> float:
    """Ease-in-out for natural acceleration/deceleration."""
    if t < 0.5:
        return 2 * t * t
    return -1 + (4 - 2 * t) * t
agent/core/vision.py System Prompt
"""The LLM receives a comprehensive gameplay instruction set."""

OSRS_SYSTEM_PROMPT = """
You are an autonomous OSRS player. Analyze each screenshot
and return JSON actions.

## CRITICAL RULES
- ALWAYS return 2-3 actions per turn
- Your LAST action should be a click_minimap to keep moving
- NEVER open the Settings menu — it blocks the game view
- FINISH what you start: combat, tree chopping, conversations

## ACTIVITY PRIORITY
1. CHOP TREES — click any tree to gather logs
2. COMBAT — attack nearby creatures, pick up drops
3. TALK TO NPCs — engage in conversation
4. PICK UP ITEMS — bones, coins, weapons, everything!
5. EXPLORE — walk to new towns, castles, bridges

## AVAILABLE ACTIONS
- click(x, y) — left-click game coordinates
- right_click(x, y) — context menu
- click_minimap(x, y) — navigate via minimap
- type_text(text) — type in chat
- press_key(key) — press keyboard key
- click_inventory_slot(slot) — interact with item
- rotate_camera(direction, duration) — camera control
- wait(min, max) — brief pause
"""
agent/memory/memory_store.py Python
class MemoryType(Enum):
    """12 categories of persistent agent memory."""
    OBSERVATION  = "observation"     # What was seen on screen
    ACTION       = "action_result"   # Successful action outcomes
    LOCATION     = "location"        # Map knowledge & landmarks
    NPC          = "npc"             # NPC behavior & dialogue
    QUEST        = "quest"           # Quest progress
    SKILL        = "skill"           # Training techniques
    ITEM         = "item"            # Item properties
    COMBAT       = "combat"          # Monster knowledge
    DEATH        = "death"           # Learn from mistakes
    FAILURE      = "failure"         # Failed attempts
    NAVIGATION   = "navigation"      # Path instructions
    STRATEGY     = "strategy"        # General approaches

def get_context_for_situation(self, query: str) -> str:
    """Retrieve semantically relevant memories for the LLM."""
    results = self.collection.query(
        query_texts=[query],
        n_results=settings.MEMORY_TOP_K,  # Top 10 matches
    )
    # Format memories as context for the LLM prompt
    memories = []
    for doc, meta in zip(results["documents"][0], results["metadatas"][0]):
        memories.append(f"[{meta['type']}] {doc}")
    return "\n".join(memories)
agent/planning/goal_planner.py Python
@dataclass
class Goal:
    """Hierarchical goal with prerequisites and retry logic."""
    id: str
    name: str
    description: str
    status: GoalStatus          # PENDING | ACTIVE | IN_PROGRESS | COMPLETED | FAILED | BLOCKED
    priority: int               # 1-10, higher = more urgent
    parent_id: Optional[str]
    children_ids: list[str]
    prerequisites: list[str]   # Goal IDs that must complete first
    attempts: int = 0
    max_attempts: int = 10

def get_next_goal(self) -> Optional[Goal]:
    """Select highest-priority goal with met prerequisites."""
    candidates = [
        g for g in self.goals.values()
        if g.status == GoalStatus.PENDING
        and all(
            self.goals[p].status == GoalStatus.COMPLETED
            for p in g.prerequisites
            if p in self.goals
        )
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda g: g.priority)

def complete_goal(self, goal_id: str):
    """Complete a goal. Auto-cascades to parent if all children done."""
    goal = self.goals[goal_id]
    goal.status = GoalStatus.COMPLETED
    # Check if parent should auto-complete
    if goal.parent_id and goal.parent_id in self.goals:
        parent = self.goals[goal.parent_id]
        if all(self.goals[c].status == GoalStatus.COMPLETED
               for c in parent.children_ids):
            self.complete_goal(parent.id)  # Recursive cascade

Tech Stack

Opus

ClawdBot

Primary vision LLM

Python

Core application language

ChromaDB

Vector database for memory

PyAutoGUI

Input automation

Pillow

Image processing

OBS

OBS WebSocket

Livestream integration

Tkinter

Thought overlay UI

NumPy

Mathematical operations

Mac Mini

24/7 hardware host


Frequently Asked Questions

How does Clawdscape actually "see" the game?
Every 300 milliseconds, the agent captures a screenshot of the game window, compresses it to JPEG at 640×360, and sends it as a base64-encoded image to ClawdBot via Anthropic's multimodal API. The model performs full scene decomposition — identifying NPCs, objects, UI elements, spatial relationships, and interactive targets. It returns structured JSON with what it sees, what it thinks, and where to click. No game memory access. No client injection. No pixel scraping. Pure visual understanding, the same way a human player reads the screen.
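The capture-and-encode step can be sketched with Pillow (which the stack already uses for image processing). A blank in-memory image stands in for the real screen grab here, and the JPEG quality value is an assumption:

```python
import base64
import io
from PIL import Image

def encode_frame(img: Image.Image, size=(640, 360), quality=70) -> str:
    """Downscale a captured frame to 640x360 and return it as a
    base64-encoded JPEG string, ready for a multimodal API payload.
    The quality setting is an illustrative assumption."""
    frame = img.convert("RGB").resize(size)
    buf = io.BytesIO()
    frame.save(buf, format="JPEG", quality=quality)
    return base64.b64encode(buf.getvalue()).decode("ascii")

# A solid-color stand-in for the real game window capture
payload = encode_frame(Image.new("RGB", (1920, 1080), "green"))
```

Downscaling before encoding is what keeps per-tick token and bandwidth costs low: the model sees a 640×360 frame, not the full desktop resolution.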
Does it modify the game client in any way?
No. Clawdscape is completely non-invasive. It runs as an entirely separate process with zero hooks into the game client — no bytecode injection, no memory reads, no client-side modifications. Screen capture uses the OS display pipeline (identical to how OBS or any screen recorder works), and input goes through standard mouse/keyboard event dispatch. The game client remains 100% vanilla. The agent has exactly the same access as a human sitting at the keyboard — nothing more.
Why a Mac Mini instead of cloud infrastructure?
Deliberate architectural decision. The Mac Mini M4 handles everything locally: native game rendering, display-pipeline screen capture, Bezier-curve input execution, ChromaDB vector storage (6,000+ memories), stuck detection signal processing, and PostgreSQL telemetry pushing. Only the LLM inference is offloaded to Anthropic's servers. This edge-compute model eliminates cloud VM costs, reduces capture-to-action latency, and enables true 24/7 unattended operation with automatic session recovery. Total infrastructure cost: one Mac Mini + one API key.
How does the mouse movement look human?
Every mouse movement follows a Bezier curve with randomized control points — implemented via de Casteljau's algorithm, the same math used in vector graphics rendering. No two clicks travel the same path. On top of that: ±3px coordinate jitter on every click, variable movement duration (80–300ms), occasional hesitation pauses (5% probability), randomized typing speed with natural variance, and easing functions that simulate human acceleration and deceleration. The input is indistinguishable from a real player's hand on the mouse.
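A toy version of that click randomization, using the ranges and probabilities from the description above — it returns a plan dict rather than driving PyAutoGUI, purely for illustration:

```python
import random

def humanize_click(x: int, y: int, jitter_px: int = 3,
                   hesitate_prob: float = 0.05) -> dict:
    """Apply human-like randomization to a click target:
    ±3px coordinate jitter, a variable 80-300ms movement duration,
    and an occasional hesitation pause (5% of clicks)."""
    return {
        "x": x + random.randint(-jitter_px, jitter_px),
        "y": y + random.randint(-jitter_px, jitter_px),
        "duration_ms": random.uniform(80, 300),
        "hesitate_ms": random.uniform(100, 400)
                       if random.random() < hesitate_prob else 0,
    }

plan = humanize_click(412, 305)
```

Because every parameter is drawn fresh per click, no two clicks on the same target produce the same coordinates or timing.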
How does the memory system work?
Clawdscape implements retrieval-augmented generation (RAG) over its own lived experience. Every tick, the agent stores observations, action outcomes, navigation routes, combat encounters, deaths, and strategic insights as vector embeddings in ChromaDB across 12 memory categories. Before each decision, it performs semantic similarity search to retrieve the most relevant past experiences. Near Lumbridge? It recalls Lumbridge memories. In combat? It recalls past fights. The result: emergent long-term learning without fine-tuning. The agent genuinely improves over time by remembering what worked and what didn't.
What happens when the AI gets stuck?
A three-layer detection system handles this. Layer 1: NumPy-based minimap frame differencing — if the minimap hasn't changed in 5 ticks, the agent isn't moving. Layer 2: Scene analysis scanning recent observations for repeated obstacle keywords (fence, gate, wall, door). Layer 3: Action pattern matching to catch repetitive click loops. The critical innovation: context-aware suppression. Standing still during combat is correct. Standing still while fishing is correct. The detector only fires when the agent is genuinely stuck, not when it's supposed to be stationary. If combat starts mid-unstick, the override cancels immediately.
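Layer 1 can be sketched as simple frame differencing over recent minimap crops; the window size and pixel threshold here are illustrative, not the project's tuned values:

```python
from collections import deque

import numpy as np

class MinimapStuckDetector:
    """Layer-1 stuck check: if consecutive minimap crops barely
    differ for `window` ticks, the agent probably isn't moving."""

    def __init__(self, window: int = 5, pixel_threshold: float = 2.0):
        self.window = window
        self.pixel_threshold = pixel_threshold
        self.diffs = deque(maxlen=window)
        self.last_frame = None

    def update(self, minimap: np.ndarray) -> bool:
        """Feed one minimap crop; return True when stuck."""
        if self.last_frame is not None:
            # Mean absolute per-pixel change vs. the previous frame
            # (cast to int16 to avoid uint8 subtraction wraparound)
            diff = float(np.abs(minimap.astype(np.int16)
                                - self.last_frame.astype(np.int16)).mean())
            self.diffs.append(diff)
        self.last_frame = minimap
        return (len(self.diffs) == self.window
                and all(d < self.pixel_threshold for d in self.diffs))
```

The context-aware suppression described above would sit on top of this: even when `update` returns True, the recovery nudge is skipped if the agent is known to be in combat or fishing.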
How does goal planning work?
Goals are organized in a hierarchical tree with six states: pending, active, in_progress, completed, failed, and blocked. Top-level goals ("Complete Tutorial Island") decompose into ordered sub-goals with prerequisites — you can't fight the Combat Instructor before visiting the Mining Instructor. Failed goals retry up to 10 times before permanent failure. Blocked goals are skipped and revisited. When all children complete, the parent auto-completes. ClawdBot can also decompose goals dynamically, breaking down new challenges on the fly. The full tree persists to disk as JSON, surviving crashes and restarts.
How does the live dashboard get its data?
Every 3 ticks, the agent serializes its full state — screenshot, observations, reasoning, vitals, inventory, counters, location, active goals, memory stats, and action history — into a JSONB payload and upserts it to PostgreSQL on Railway. The Vercel dashboard polls the REST API every 2 seconds, rendering the live feed, game screenshot, world map with real-time GPS tracking, player vitals, activity counters, milestones, XP tracker, and session history. Single-row upsert means zero table growth — only the latest state matters. Full pipeline: Mac Mini → PostgreSQL → REST API → Vercel CDN → Browser.
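The upsert itself is a standard single-row `ON CONFLICT` statement; here's a sketch of the payload serialization, where the table and column names are assumptions rather than the project's actual schema:

```python
import datetime
import json

# Single-row upsert: the table never grows past one row.
UPSERT_SQL = """
INSERT INTO agent_state (id, payload, updated_at)
VALUES (1, %s::jsonb, now())
ON CONFLICT (id) DO UPDATE
SET payload = EXCLUDED.payload, updated_at = EXCLUDED.updated_at;
"""

def serialize_state(tick: int, vitals: dict, actions: list) -> str:
    """Pack the agent's current state into one JSONB payload string.
    Field names are illustrative."""
    return json.dumps({
        "tick": tick,
        "captured_at": datetime.datetime.now(
            datetime.timezone.utc).isoformat(),
        "vitals": vitals,
        "recent_actions": actions[-10:],  # cap history in the payload
    })

payload = serialize_state(247, {"hp": 100, "prayer": 0, "run": 100}, [])
```

The dashboard then only ever reads that one row, so polling cost stays constant no matter how long the agent runs.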
How accurate are the stats (kills, deaths, XP)?
Every stat is tracked through a pattern-matching system with cooldowns and anti-patterns. Deaths use a 20-tick cooldown and match phrases like "I died" and "oh dear" — but exclude "death rune" and "death talisman" to prevent false positives from item names. Kill tracking requires completion phrases ("killed the," "defeated the") not intent phrases ("trying to kill"). Level-ups are validated against the full set of 23 real OSRS skill names to prevent "need Woodcutting level 15" from counting as reaching level 15. Every data point on this dashboard is real, verified data — not inflated guesses.
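A minimal version of such a counter — trigger phrases, anti-patterns, and a tick cooldown. The phrase lists mirror the death example above; the 20-tick cooldown follows it too:

```python
class EventCounter:
    """Count an event from observation text using trigger phrases,
    anti-patterns that suppress false positives, and a tick cooldown
    so one event can't be double-counted across adjacent ticks."""

    def __init__(self, triggers, anti_patterns, cooldown_ticks=20):
        self.triggers = [t.lower() for t in triggers]
        self.anti_patterns = [a.lower() for a in anti_patterns]
        self.cooldown_ticks = cooldown_ticks
        self.count = 0
        self.last_fired = -cooldown_ticks  # allow firing on tick 0

    def observe(self, tick: int, text: str) -> bool:
        low = text.lower()
        if tick - self.last_fired < self.cooldown_ticks:
            return False  # still cooling down from the last match
        if any(a in low for a in self.anti_patterns):
            return False  # e.g. "death rune" must not count as a death
        if any(t in low for t in self.triggers):
            self.count += 1
            self.last_fired = tick
            return True
        return False

deaths = EventCounter(["i died", "oh dear"],
                      ["death rune", "death talisman"])
```

Anti-patterns are checked before triggers on purpose: "picked up a death rune" contains no trigger here, but ordering the checks this way keeps the counter safe even if a trigger and an anti-pattern ever co-occur in one observation.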
Is the codebase open source?
Fully open source. The entire system — vision engine, action executor, memory store, goal planner, stuck detector, input humanization, desktop overlay, database layer, live dashboard, and deployment config — is available on GitHub. 3,500+ lines of Python, zero proprietary dependencies. Clone it, run it, break it, improve it.