Agentic AI Platform

Storymaker

An agentic AI platform for authoring illustrated books —
from concept to print-ready PDF, fully orchestrated.

6 AI agents in production
4-model evaluation harness
Full-stack · Python · FastAPI · Postgres
02
PRODUCT OVERVIEW

What Was Built

Storymaker is a full-stack, multi-agent book creation platform. A user enters a story concept; the system orchestrates AI agents across scene authoring, illustration generation, multi-model accuracy grading, PDF assembly, and promo page publishing — entirely autonomously.

The architecture, product logic, and agentic pipeline were designed end-to-end. Zero lines of code were written by hand.

03
ARCHITECTURE

End-to-End Agentic Pipeline

From concept to print-ready PDF — fully orchestrated

01
📝
Scene Authoring
Prompt file parsed into Scene rows via sync_scenes_from_document
02
🎨
Character Reference
One-shot char sheet image; idempotent S3 key; reused across all scenes
03
🖼️
Illustration Agent
asyncio.gather, up to 5 concurrent; multimodal prompt + char ref; Gemini 2.5 Flash Image
04
⚖️
Accuracy Jury
4-model parallel LLM-as-judge; inter-model confidence scoring; partial-failure safe
05
📚
PDF Assembly
PyMuPDF block-by-block composition; 1024×1024pt pages; font embedding
06
🌐
Promo Publish
Auto-generated scroll page: cover, synopsis, scenes, parent notes, Q&A
Python 3.11 + FastAPI · PostgreSQL + SQLAlchemy 2.x · OpenRouter · Gemini 2.5 Flash Image · GPT-4o-mini · Claude 3.5 Haiku · Mistral Large · PyMuPDF · S3 / MinIO · asyncio · Render · Next.js · Docker
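One way to read the "idempotent S3 key" in step 02: derive the object key from the inputs that define the character sheet, so a rerun with the same inputs finds the existing image instead of paying for a new one. This is a minimal sketch under assumptions. The key layout, the function names `char_sheet_key` and `ensure_char_sheet`, and the dict standing in for the bucket are all hypothetical, not taken from Storymaker.

```python
import hashlib
from typing import Callable


def char_sheet_key(storyline_id: int, character_prompt: str, style: str) -> str:
    """Derive a deterministic object key from the inputs that define the sheet.

    The path layout is an assumption made for illustration.
    """
    # Normalise whitespace and case so cosmetic edits do not trigger a regeneration.
    canonical = " ".join(character_prompt.split()) + "|" + style.strip().lower()
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"storylines/{storyline_id}/char_sheet/{digest}.png"


def ensure_char_sheet(store: dict, key: str, generate: Callable[[], bytes]) -> bytes:
    """Generate only when the key is absent; `store` stands in for an S3 bucket."""
    if key not in store:
        store[key] = generate()  # the expensive image call happens at most once per key
    return store[key]
```

Because the key is a pure function of the prompt and style, every scene can look up the same reference image without coordinating with the others.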
04
AGENT: IMAGE GENERATION

Scene Editor & Illustration Agent

Prompt-driven illustration with character continuity

Each scene carries a structured image description with injected variables (character traits, settings, visual style). The illustration agent calls gemini-2.5-flash-image via OpenRouter with a multimodal character reference sheet, keeping characters visually consistent across every scene in the book (20 in the example shown).

Semaphore-bounded asyncio.gather runs up to 5 scene generations in parallel. If OpenRouter rejects a reference-image request, a single retry fires with text-only context — partial failures never abort the batch.
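The concurrency pattern described above can be sketched as follows. This is an illustration under assumptions, not the platform's code: `ReferenceRejected`, `SceneResult`, and the `generate(scene_id, use_reference)` callable are hypothetical names, and the real OpenRouter call is left to the caller.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Iterable, List, Optional


class ReferenceRejected(Exception):
    """Hypothetical error: the provider refused a request carrying a reference image."""


@dataclass
class SceneResult:
    scene_id: int
    image: Optional[bytes] = None
    error: Optional[str] = None
    used_reference: bool = True


# Hypothetical generator signature: (scene_id, use_reference) -> image bytes.
GenerateFn = Callable[[int, bool], Awaitable[bytes]]


async def illustrate_scene(scene_id: int, generate: GenerateFn,
                           sem: asyncio.Semaphore) -> SceneResult:
    async with sem:  # the semaphore caps how many generations are in flight
        try:
            return SceneResult(scene_id, image=await generate(scene_id, True))
        except ReferenceRejected:
            # Single retry with text-only context.
            try:
                image = await generate(scene_id, False)
                return SceneResult(scene_id, image=image, used_reference=False)
            except Exception as exc:
                return SceneResult(scene_id, error=repr(exc), used_reference=False)
        except Exception as exc:
            # Any other failure becomes a per-scene error, never a batch abort.
            return SceneResult(scene_id, error=repr(exc))


async def illustrate_batch(scene_ids: Iterable[int], generate: GenerateFn,
                           limit: int = 5) -> List[SceneResult]:
    sem = asyncio.Semaphore(limit)
    tasks = (illustrate_scene(s, generate, sem) for s in scene_ids)
    return list(await asyncio.gather(*tasks))
```

Errors are captured inside each task, so `asyncio.gather` always returns one result per scene in input order, and failed scenes can be regenerated individually.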

Scene editor
Scene 1 of 20 — 'Everywhere I Look'. Left: structured prompt + page text. Center: live illustration. Right: print text panel.
05
AGENT: GENERALIZATION

Multi-Domain Book Generation

Same pipeline, radically different outputs

The system is fully domain-agnostic. The same agentic pipeline that generates a children's emotional literacy book produces a Pixar-style historical non-fiction illustrated narrative — no code changes, only a different prompt file and character sheet.

The framework adapts to domain through configuration, not re-engineering — a core property of well-designed agentic systems.

Gandhi march scene
Scene 15 — Gandhi's Salt March (30-scene book). Different domain, identical pipeline architecture.
06
AGENT: EVALUATION HARNESS

Multi-Model LLM-as-Judge

Automated accuracy verification across 4 models

A parallel asyncio jury grades every scene: accuracy score (0–100), editorial dimensions (age appropriateness, political risk, reading level, mood, style), and inter-model agreement.

The confidence score blends the mean grade with inter-model spread, so a high average that the models disagree on is flagged as possible specification gaming. Adversarial probe injection tests the jury's robustness.

Partial model failures surface as null grades — the jury never aborts. Models: GPT-4o-mini, Gemini 2.0 Flash, Claude 3.5 Haiku, Mistral Large.
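The deck does not give the blending formula, so the following is only one plausible shape for it: discount the mean grade by how much the jurors disagree, and skip null grades rather than failing. The function name `jury_confidence`, the `max_spread` parameter, and the multiplicative blend are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, Optional


def jury_confidence(grades: Dict[str, Optional[float]],
                    max_spread: float = 50.0) -> Optional[dict]:
    """Blend the mean grade (0-100) with inter-model agreement.

    `grades` maps model name -> score, with None for a model that failed.
    Returns None when no model produced a grade. The formula is an assumed
    sketch, not Storymaker's actual scoring rule.
    """
    valid = [g for g in grades.values() if g is not None]
    if not valid:
        return None
    avg = mean(valid)
    # Population standard deviation as the spread; a single juror has no spread.
    spread = pstdev(valid) if len(valid) > 1 else 0.0
    # 50 is the largest possible standard deviation for scores bounded to 0-100.
    agreement = 1.0 - min(spread / max_spread, 1.0)
    return {
        "mean": avg,
        "spread": spread,
        "graded_by": len(valid),
        "confidence": round(avg * agreement, 1),
    }
```

Under this sketch, three jurors agreeing on 90 with one failed model still yield a confidence of 90, while a 100-versus-0 split collapses to 0 despite a mean of 50. Reporting `graded_by` alongside the score keeps a single-survivor jury from looking as trustworthy as a full one.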

Accuracy dashboard
Accuracy dashboard: clean text 97/100 vs adversarial probe 52/100. Per-model deltas, editorial dimension scoring shown.
07
PLATFORM

Library & Storyline Orchestration

Multi-book, multi-user, concurrent

Each book is an independent storyline with its own scene graph, prompt file, character reference, and assembly block sequence. Multiple books can be in active generation simultaneously, each using an isolated SessionLocal() to prevent ORM conflicts across concurrent commits.
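The session-isolation idea can be shown with a small stand-in. The sketch below uses the standard library's `sqlite3` in place of the SQLAlchemy `SessionLocal()` factory so that it runs without dependencies. The `scenes` table, `make_session_factory`, `save_scene`, and `generate_books` are hypothetical names. The point it illustrates is the same: each concurrent unit of work opens, commits, and closes its own session rather than sharing one.

```python
import asyncio
import sqlite3
from typing import Callable, Dict


def make_session_factory(db_path: str) -> Callable[[], sqlite3.Connection]:
    """Return a zero-argument factory; each call opens a fresh connection."""
    def session_local() -> sqlite3.Connection:
        return sqlite3.connect(db_path, timeout=10)
    return session_local


def save_scene(session_local: Callable[[], sqlite3.Connection],
               book_id: int, scene_no: int) -> None:
    session = session_local()  # opened inside the task, never shared across tasks
    try:
        session.execute(
            "INSERT INTO scenes (book_id, scene_no) VALUES (?, ?)",
            (book_id, scene_no),
        )
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()


async def generate_books(session_local: Callable[[], sqlite3.Connection],
                         books: Dict[int, int]) -> None:
    """Run every scene write concurrently; `books` maps book_id -> scene count."""
    jobs = [
        asyncio.to_thread(save_scene, session_local, book_id, n)
        for book_id, count in books.items()
        for n in range(1, count + 1)
    ]
    await asyncio.gather(*jobs)
```

With one session per task, a rollback in one book's write cannot discard another book's pending changes, which is the conflict a shared session would invite.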

Per-book actions — Editor, Assemble, Accuracy, Promo.

Library view
Library: multiple concurrent books across different genres and audiences. Right: new storyline creation flow.
08
AGENT: OUTPUT PUBLISHING

Auto-Generated Promotional Page

From assembled book to market-ready shareable page — no design tool, no manual layout.

Promo cover
Cover hero — title, tagline, book cover image
Promo scene
Interior scene preview — illustration + verse extract
Promo parent note
Parent note + Q&A — auto-generated from book content
09
OBSERVABILITY

AI Cost Attribution by Call Type

Full observability into per-operation model spend

Every AI call is logged to an ai_calls ledger: model, call_kind, cost_usd, latency, user, and storyline linkage. This granularity enables ongoing pricing model validation, per-user profitability analysis, and model benchmarking across call types.
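A ledger row and the per-dimension rollup might look like the sketch below. The field list follows the one above, but the exact column names (`latency_ms`, `user_id`, `storyline_id`), the `AICall` dataclass, and the `spend_by` helper are assumptions, and a real deployment would presumably aggregate in SQL rather than in Python.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class AICall:
    """One row of a hypothetical ai_calls ledger."""
    model: str
    call_kind: str
    cost_usd: float
    latency_ms: int
    user_id: int
    storyline_id: int


def spend_by(calls: Iterable[AICall], field: str) -> Dict[str, dict]:
    """Aggregate call count, cost, and mean latency by any ledger field."""
    totals = defaultdict(lambda: {"calls": 0, "cost_usd": 0.0, "latency_ms": 0})
    for call in calls:
        bucket = totals[getattr(call, field)]
        bucket["calls"] += 1
        bucket["cost_usd"] += call.cost_usd
        bucket["latency_ms"] += call.latency_ms
    return {
        key: {
            "calls": t["calls"],
            "cost_usd": round(t["cost_usd"], 6),
            "avg_latency_ms": t["latency_ms"] / t["calls"],
        }
        for key, t in totals.items()
    }
```

Calling `spend_by(calls, "call_kind")` and `spend_by(calls, "model")` on the same rows gives two breakdowns of the kind the usage dashboard shows, and `spend_by(calls, "user_id")` feeds per-user margin.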

Knowing the cost and performance of each agent operation — not just the total — is what allows a system to be optimised, not just monitored.

AI usage dashboard
AI usage & spend: breakdown by call_kind and by model. OpenRouter as unified routing layer across 7 models.
10
PLATFORM INTELLIGENCE

Operational & Financial Tracking

Full visibility into revenue, AI spend, credit activity, and live data queries — built in.

Earnings dashboard
Revenue vs API cost · per-user margin · lifetime view
Credit activity
Credit ledger activity · 7-day step chart · all users combined
AI usage
AI usage & spend · by call type · by model · 30-day window
Admin insights
Admin insights · natural language queries · read-only live DB
11

Core AI Engineering Skills — Demonstrated

Not described. Built, shipped, and running.

Skill
Multi-agent orchestration across domains
6 discrete agents coordinated as a linear pipeline with fan-out parallelism. Demonstrated across children's literacy, biography, and historical narrative — same framework, different domain.
Skill
LLM-as-judge & robustness evaluation
4-model parallel grading jury with adversarial probe injection. Confidence scoring detects specification gaming via inter-model spread. Partial failures never abort the jury.
Skill
Agentic harness design
Communication architectures for agents: semaphore back-pressure, isolated ORM sessions per concurrent task, idempotent object keys for expensive operations, retry logic with graceful degradation.
Skill
Vertical-specific product design
Purpose-built for a specific workflow, not a general tool with a chatbot bolted on. The entire pipeline exists to serve one vertical, illustrated book creation, in depth.
Skill
Full-stack data pipeline & infrastructure
FastAPI + PostgreSQL + S3 + OpenRouter + Render. Ledger-pattern cost and credit accounting. Per-call AI attribution. Async parallelism with semaphore control. Production on day one.
Skill
0→1 under uncertainty
Sole architect of a production platform — concept, system design, agentic pipeline, monetisation, and observability layer — shipped to paying users. Zero lines of code written by hand.
12
THE BUILDER

Joel Horowitz

20 years of quant finance.
The last 2 in agentic AI.

PhD in Mathematics (ULB). Started as an exotic derivatives trader, became a quant strategist, then a full-stack quant engineer. The through-line: building systems that let non-engineers work at the level of engineers.

At Freestone Grove (L/S equity), currently building LLM-powered platforms for data scientists: sandboxed execution environments, Pydantic-enforced configs, hallucination-resistant by design — production agentic safety patterns.

Goldman Sachs · London & NY · 9 yrs
BlueMountain Capital · 5 yrs
Squarepoint Capital · 2 yrs
Hidden Road Partners
Freestone Grove
Domain
Quant Finance
Volatility trading, prime brokerage, risk infrastructure, margin systems, commodities. Structured exotics across equity, FX, and rates.
Engineering
Full-Stack Systems Builder
Custom DSLs, risk calculation engines, ML frameworks, billing infrastructure, real-time dashboards. Python · FastAPI · SQLAlchemy · Snowflake · AWS.
Agentic AI
Production Agentic Patterns
LLM execution sandboxes with pre-promotion validation. Multi-agent orchestration pipelines. LLM-as-judge evaluation harnesses.
Foundation
PhD · Mathematics
Differential geometry, Université Libre de Bruxelles. IMO & IPhO Belgian team, Moscow & Helsinki 1992.

0→1.

Concept · Architecture · Agentic pipeline · Production · Revenue

The product was defined, the system designed, the build directed, and the result shipped to early users, without a single line of code written by hand.