## TL;DR
Most developers treat an LLM like a glorified database query: a synchronous request-response cycle handled within a single Next.js or Node server. This is architectural suicide. When you’re running complex agentic workflows that might take 30 seconds to “think” and another 10 to validate, you cannot block your UI thread. In my lab, I’ve pioneered a Split-Brain Architecture. The “Brain” (Intelligence) lives in specialized Python environments across a distributed hardware cluster, while the “Body” (Interface) is a lean, mean Astro machine that prioritizes speed and SEO.
## The Architecture
My lab is distributed. I don’t believe in putting all my compute in one basket. A single deployment going down shouldn’t take the lab’s intelligence offline; the architecture should survive any individual failure.
```mermaid
graph TD
    subgraph "The Body (Astro 4)"
        UI[Web Interface] -->|Fetch| API[FastAPI Gateway]
    end
    subgraph "The Nervous System (Redis)"
        API <--> QUEUE[Task Queue]
    end
    subgraph "The Brain (Python 3.12)"
        QUEUE <--> P1[Mac Mini - Cloud Models]
        QUEUE <--> P2[Pi Cluster - Local Fallback]
        QUEUE <--> P3[Workstation - GPU Heavy]
    end
```
| Layer | Technology | Primary Role |
|---|---|---|
| Body | Astro + Tailwind v4 | UI delivery, SEO, and static documentation. |
| Brain | Python + LangGraph | Long-running reasoning cycles and model orchestration. |
| Nervous System | FastAPI + Redis | Asynchronous state management and event routing. |
| Compute | Together AI / Ollama | Inference engines (Cloud and Local). |
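The “Nervous System” row is where the decoupling actually happens: the gateway pushes work onto a Redis queue and the compute nodes pull from it. Here is a minimal sketch of that worker contract. The queue payload shape, the `QueuedTask` name, and the injected callables are my illustrative assumptions, not the lab’s actual code; each node in the diagram would run this loop with a Redis `BRPOP` as `pop` and a `SET` as `set_status`.

```python
import json
from dataclasses import dataclass

@dataclass
class QueuedTask:
    instruction: str
    session_id: str

def encode_task(task: QueuedTask) -> str:
    # Serialized form the gateway would LPUSH onto the Redis list.
    return json.dumps({"instruction": task.instruction,
                       "session_id": task.session_id})

def decode_task(raw: str) -> QueuedTask:
    d = json.loads(raw)
    return QueuedTask(instruction=d["instruction"], session_id=d["session_id"])

def drain_one(pop, handle, set_status) -> bool:
    """One worker iteration: pop a task, mark it, hand it to the model layer.

    pop/handle/set_status are injected so the same loop can run against a
    real Redis client (BRPOP / SET) or in-memory stand-ins during tests.
    Returns False when the queue is empty.
    """
    raw = pop()
    if raw is None:
        return False
    task = decode_task(raw)
    set_status(task.session_id, "processing")
    handle(task)
    return True
```

Injecting the Redis calls instead of hard-coding them keeps each compute node (Mac mini, Pi cluster, workstation) free to wire in its own client while sharing the loop.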
## The Build
Implementation starts with decoupling. The “Brain” should never care about CSS, and the “Body” should never care about temperature or top-p sampling values.
### 1. The Brain: A Stateless Logic Engine
I use FastAPI to expose the agents. This allows the Astro “Body” to trigger thoughts without managing the underlying Python dependencies.
```python
# brain/main.py
from fastapi import FastAPI, BackgroundTasks
from pydantic import BaseModel
import redis

app = FastAPI()
r = redis.Redis(host="localhost", port=6379, db=0)

class Task(BaseModel):
    instruction: str
    session_id: str

@app.post("/think")
async def run_thought_cycle(task: Task, background_tasks: BackgroundTasks):
    # Tell the Body we are 'Thinking'
    r.set(f"status:{task.session_id}", "processing")
    # Run the expensive AI logic in the background and return immediately
    background_tasks.add_task(expensive_reasoning, task.instruction, task.session_id)
    return {"status": "accepted", "session_id": task.session_id}

def expensive_reasoning(prompt: str, sid: str):
    from gekro_client import GekroLLMClient
    client = GekroLLMClient()
    result = client.chat([{"role": "user", "content": prompt}])
    # Write the result BEFORE flipping the status, so a poller can
    # never observe "completed" while the result key is still missing.
    r.set(f"result:{sid}", result)
    r.set(f"status:{sid}", "completed")
```
### 2. The Body: Astro Request Pattern
In Astro, I fetch the initial state during SSR, but use a small “Island” (Preact or SolidJS) to poll the status if a thought cycle is active. This keeps the initial load instant.
```astro
---
// apps/web/src/pages/lab.astro
import Layout from '../layouts/Layout.astro'; // layout path assumed
import LabStatus from '../components/LabStatus.tsx';
const initialStatus = await fetch('http://brain-gateway/status').then(res => res.json());
---
<Layout title="Lab Controls">
  <h1>System Orchestration</h1>
  <!-- The 'Island' that handles the live updates -->
  <LabStatus client:load initialData={initialStatus} />
</Layout>
```
### WSL2 Note
When bridging these layers on a Windows machine, I run Redis and the FastAPI “Brain” inside WSL2 but use the Windows-native Astro dev server for the “Body.” This allows me to use the Windows Chrome debugger for UI work while the heavy Linux-optimized Python code runs in its natural environment.
## The Tradeoffs
The biggest challenge isn’t the code; it’s State Synchronization. If the Brain completes a task but the Body doesn’t poll for the update, the user sees a stale UI. I spent three weeks chasing a bug where an agent had finished summarizing a 4k log file, but the Redis key hadn’t propagated correctly, leading to “Infinite Thinking” loops in the browser.
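One pattern that closes this class of race: publish the result and the status flip in a single Redis MULTI/EXEC transaction, so a poller can never observe `completed` while the result key is absent. A hedged sketch; the helper name and the one-hour TTL are my choices, not the lab’s code.

```python
def complete_task(r, sid: str, result: str, ttl: int = 3600) -> None:
    """Atomically publish result + status via a MULTI/EXEC pipeline.

    `r` is a redis.Redis client (or anything exposing .pipeline()).
    The TTL expires stale keys so abandoned sessions don't accumulate.
    """
    pipe = r.pipeline(transaction=True)
    pipe.set(f"result:{sid}", result, ex=ttl)       # result lands first...
    pipe.set(f"status:{sid}", "completed", ex=ttl)  # ...status flips with it
    pipe.execute()                                  # both apply, or neither
```

Because redis-py queues the commands client-side and sends them inside MULTI/EXEC on `execute()`, other clients see either the pre-completion state or both keys together, never the half-written middle.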
The complexity of a distributed system is its own form of debt. If you’re building a simple app, don’t do this. But if you’re building a lab that needs to survive a 2 AM cloud blackout, you need the resilience that only a split-brain architecture provides.
## Where This Goes
This setup is moving toward Physical Feedback. I’m currently wiring the “Brain” outputs to a set of Hue lights in my DFW office. If the lab detects a critical failure on a remote server, the room literally turns red. Architecture isn’t just about software; it’s about the environment where the software works.
Written by Rohit Burani, AI engineer building local-first systems, self-hosted infrastructure, and autonomous tools from the lab.