Agentic AI Platform

Storymaker

An agentic AI platform for authoring illustrated books —
from concept to print-ready PDF, fully orchestrated.

6 AI agents in production · 4-model evaluation harness · Full-stack · Python · FastAPI · Postgres

PRODUCT OVERVIEW

What Was Built

Storymaker is a full-stack, multi-agent book creation platform. A user enters a story concept; the system orchestrates AI agents across scene authoring, illustration generation, multi-model accuracy grading, PDF assembly, and promo page publishing — entirely autonomously.

The architecture, product logic, and agentic pipeline were designed end-to-end. Zero lines of code were written by hand.

ARCHITECTURE

End-to-End Agentic Pipeline

From concept to print-ready PDF — fully orchestrated

01 · 📝 Scene Authoring
Prompt file parsed into Scene rows via sync_scenes_from_document

02 · 🎨 Character Reference
One-shot character sheet image; idempotent S3 key; reused across all scenes

03 · 🖼️ Illustration Agent
asyncio.gather with up to 5 concurrent scenes; multimodal prompt + character reference; Gemini 2.5 Flash Image

04 · ⚖️ Accuracy Jury
4-model parallel LLM-as-judge; inter-model confidence scoring; partial-failure safe

05 · 📚 PDF Assembly
PyMuPDF block-by-block composition; 1024×1024pt pages; font embedding

06 · 🌐 Promo Publish
Auto-generated scroll page: cover, synopsis, scenes, parent notes, Q&A

Python 3.11 + FastAPI · PostgreSQL + SQLAlchemy 2.x · OpenRouter · Gemini 2.5 Flash Image · GPT-4o-mini · Claude 3.5 Haiku · Mistral Large · PyMuPDF · S3 / MinIO · asyncio · Render · Next.js · Docker
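The idempotent S3 key behind the character reference step could be derived along the following lines. This is a minimal sketch, not the platform's actual scheme: the function name `char_ref_key`, the `storylines/<id>/char-ref/` prefix, and the choice to hash the character prompt are all assumptions made for illustration.

```python
import hashlib


def char_ref_key(storyline_id: int, character_prompt: str) -> str:
    """Build a deterministic object key for a character reference sheet.

    Hypothetical scheme: the same storyline and prompt always map to the
    same key, so a repeated run can look the object up and skip the
    expensive image generation instead of creating a duplicate.
    """
    # Normalise whitespace so trivial formatting edits do not change the key.
    normalised = " ".join(character_prompt.split())
    digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()[:16]
    return f"storylines/{storyline_id}/char-ref/{digest}.png"
```

Because the key depends only on its inputs, a retry after a crash or timeout resolves to the same object rather than a second copy.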

AGENT: IMAGE GENERATION

Scene Editor & Illustration Agent

Prompt-driven illustration with character continuity

Each scene carries a structured image description with injected variables (character traits, settings, visual style). The illustration agent calls gemini-2.5-flash-image via OpenRouter with a multimodal character reference sheet, ensuring visual consistency across all 20 scenes.

Semaphore-bounded asyncio.gather runs up to 5 scene generations in parallel. If OpenRouter rejects a reference-image request, a single retry fires with text-only context — partial failures never abort the batch.
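The concurrency and fallback behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the production code: `call_image_model` is a stand-in that performs no network I/O, `ReferenceRejected` is a hypothetical error type, and the simulated failures exist only to exercise both paths.

```python
import asyncio
from typing import Dict, List, Optional

MAX_CONCURRENT = 5  # concurrency cap taken from the text above


class ReferenceRejected(Exception):
    """Hypothetical error for a rejected reference-image request."""


async def call_image_model(scene_id: int, use_reference: bool) -> str:
    """Stand-in for the real OpenRouter call (assumption, no network I/O)."""
    await asyncio.sleep(0)
    if use_reference and scene_id % 2 == 0:
        raise ReferenceRejected(scene_id)  # simulated rejection
    if scene_id == 7:
        raise RuntimeError("model unavailable")  # simulated hard failure
    mode = "ref" if use_reference else "text-only"
    return f"scene-{scene_id}:{mode}"


async def generate_scene(scene_id: int, sem: asyncio.Semaphore) -> str:
    async with sem:  # at most MAX_CONCURRENT generations in flight
        try:
            return await call_image_model(scene_id, use_reference=True)
        except ReferenceRejected:
            # Single retry with text-only context, as described above.
            return await call_image_model(scene_id, use_reference=False)


async def generate_batch(scene_ids: List[int]) -> Dict[int, Optional[str]]:
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    results = await asyncio.gather(
        *(generate_scene(s, sem) for s in scene_ids),
        return_exceptions=True,  # a failed scene must not abort the batch
    )
    # Failed scenes are reported as None next to the successful ones.
    return {
        s: (None if isinstance(r, BaseException) else r)
        for s, r in zip(scene_ids, results)
    }
```

Calling `asyncio.run(generate_batch([1, 2, 7, 8]))` exercises all three outcomes: a direct success, a text-only retry, and a failure that is recorded without cancelling the other scenes.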

Scene 1 of 20 — 'Everywhere I Look'. Left: structured prompt + page text. Center: live illustration. Right: print text panel.

AGENT: GENERALIZATION

Multi-Domain Book Generation

Same pipeline, radically different outputs

The system is fully domain-agnostic. The same agentic pipeline that generates a children's emotional literacy book produces a Pixar-style historical non-fiction illustrated narrative — no code changes, only a different prompt file and character sheet.

The framework adapts to domain through configuration, not re-engineering — a core property of well-designed agentic systems.

Scene 15 — Gandhi's Salt March (30-scene book). Different domain, identical pipeline architecture.

AGENT: EVALUATION HARNESS

Multi-Model LLM-as-Judge

Automated accuracy verification across 4 models

A parallel asyncio jury grades every scene: accuracy score (0–100), editorial dimensions (age appropriateness, political risk, reading level, mood, style), and inter-model agreement.

Confidence score blends mean grade with inter-model spread to detect specification gaming. Adversarial probe injection tests robustness.

Partial model failures surface as null grades — the jury never aborts. Models: GPT-4o-mini, Gemini 2.0 Flash, Claude 3.5 Haiku, Mistral Large.
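One way to blend the mean grade with inter-model spread, while tolerating null grades, is sketched below. The exact formula is not stated here, so everything in this block is an assumption: the name `confidence_score`, the use of population standard deviation as the spread measure, and the linear penalty with `spread_weight`.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence


def confidence_score(
    grades: Sequence[Optional[float]], spread_weight: float = 1.0
) -> Optional[float]:
    """Blend the mean grade with inter-model spread (illustrative formula).

    Grades are on the 0-100 scale; None marks a model that returned no
    grade. Strong disagreement between models lowers the score.
    """
    valid = [g for g in grades if g is not None]
    if not valid:
        return None  # no model answered, so there is nothing to score
    spread = pstdev(valid) if len(valid) > 1 else 0.0
    score = mean(valid) - spread_weight * spread
    return max(0.0, min(100.0, score))  # clamp to the grading scale
```

Under this formula a jury that agrees on a high grade keeps a high confidence score, while a jury whose mean is propped up by one or two outlier models is pulled down, which is the signal a reviewer would want to inspect.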

Accuracy dashboard: clean text 97/100 vs adversarial probe 52/100. Per-model deltas, editorial dimension scoring shown.

PLATFORM

Library & Storyline Orchestration

Multi-book, multi-user, concurrent

Each book is an independent storyline with its own scene graph, prompt file, character reference, and assembly block sequence. Multiple books can be in active generation simultaneously, each using an isolated SessionLocal() to prevent ORM conflicts across concurrent commits.

Per-book actions — Editor, Assemble, Accuracy, Promo.

Library: multiple concurrent books across different genres and audiences. Right: new storyline creation flow.

AGENT: OUTPUT PUBLISHING

Auto-Generated Promotional Page

From assembled book to market-ready shareable page — no design tool, no manual layout.

OBSERVABILITY

AI Cost Attribution by Call Type

Full observability into per-operation model spend

Every AI call is logged to an ai_calls ledger: model, call_kind, cost_usd, latency, user, and storyline linkage. This granularity enables ongoing pricing model validation, per-user profitability analysis, and model benchmarking across call types.
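A ledger row and a per-column spend rollup might look like the sketch below. Only `model`, `call_kind`, and `cost_usd` are named in the text; the remaining field names (`latency_ms`, `user_id`, `storyline_id`), the `AICall` and `spend_by` names, and the in-memory aggregation (the real ledger presumably lives in Postgres) are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class AICall:
    """One row of the ai_calls ledger (field names partly assumed)."""

    model: str
    call_kind: str
    cost_usd: float
    latency_ms: int    # assumed unit for the latency column
    user_id: int       # assumed name for the user linkage
    storyline_id: int  # assumed name for the storyline linkage


def spend_by(calls: Iterable[AICall], field: str) -> Dict[str, float]:
    """Total cost_usd grouped by a ledger column such as call_kind or model."""
    totals: Dict[str, float] = defaultdict(float)
    for call in calls:
        totals[str(getattr(call, field))] += call.cost_usd
    return dict(totals)
```

The same function serves the by-call-type and by-model views: `spend_by(calls, "call_kind")` and `spend_by(calls, "model")` differ only in the grouping column.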

Knowing the cost and performance of each agent operation — not just the total — is what allows a system to be optimised, not just monitored.

AI usage & spend: breakdown by call_kind and by model. OpenRouter as unified routing layer across 7 models.

PLATFORM INTELLIGENCE

Operational & Financial Tracking

Full visibility into revenue, AI spend, credit activity, and live data queries — built in.

Revenue vs API cost · per-user margin · lifetime view

Credit ledger activity · 7-day step chart · all users combined

AI usage & spend · by call type · by model · 30-day window

Admin insights · natural language queries · read-only live DB

Core AI Engineering Skills — Demonstrated

Not described. Built, shipped, and running.

Skill

Multi-agent orchestration across domains

6 discrete agents coordinated as a linear pipeline with fan-out parallelism. Demonstrated across children's literacy, biography, and historical narrative — same framework, different domain.

Skill

LLM-as-judge & robustness evaluation

4-model parallel grading jury with adversarial probe injection. Confidence scoring detects specification gaming via inter-model spread. Partial failures never abort the jury.

Skill

Agentic harness design

Communication architectures for agents: semaphore back-pressure, isolated ORM sessions per concurrent task, idempotent object keys for expensive operations, retry logic with graceful degradation.

Skill

Vertical-specific product design

Purpose-built for a specific workflow, not a general tool with a chatbot bolted on. The entire pipeline exists to serve one workflow, illustrated book authoring, deeply.

Skill

Full-stack data pipeline & infrastructure

FastAPI + PostgreSQL + S3 + OpenRouter + Render. Ledger-pattern cost and credit accounting. Per-call AI attribution. Async parallelism with semaphore control. Production on day one.

Skill

0→1 under uncertainty

Sole architect of a production platform — concept, system design, agentic pipeline, monetisation, and observability layer — shipped to paying users. Zero lines of code written by hand.

THE BUILDER

Joel Horowitz

20 years of quant finance.
The last 2 in agentic AI.

PhD in Mathematics (ULB). Started as an exotic derivatives trader, became a quant strategist, then a full-stack quant engineer. The through-line: building systems that let non-engineers work at the level of engineers.

Currently at Freestone Grove (L/S equity), building LLM-powered platforms for data scientists: sandboxed execution environments, Pydantic-enforced configs, and hallucination resistance by design. In short, production agentic safety patterns.

Goldman Sachs · London & NY · 9 yrs
BlueMountain Capital · 5 yrs
Squarepoint Capital · 2 yrs
Hidden Road Partners
Freestone Grove

Domain

Quant Finance

Volatility trading, prime brokerage, risk infrastructure, margin systems, commodities. Structured exotics across equity, FX, and rates.

Engineering

Full-Stack Systems Builder

Custom DSLs, risk calculation engines, ML frameworks, billing infrastructure, real-time dashboards. Python · FastAPI · SQLAlchemy · Snowflake · AWS.

Agentic AI

Production Agentic Patterns

LLM execution sandboxes with pre-promotion validation. Multi-agent orchestration pipelines. LLM-as-judge evaluation harnesses.

Foundation

PhD · Mathematics

Differential geometry, Université Libre de Bruxelles. IMO & IPhO Belgian team, Moscow & Helsinki 1992.

0→1.

Concept · Architecture · Agentic pipeline · Production · Revenue

The product was defined, the system designed, the build directed, and the result shipped to early users, without a single line of code written by hand.