
Sprint 3. An end-to-end account of how AI was integrated into Wally as a paywall generation engine: what we discovered, what was broken, what we built to fix it, and the framework that now governs every paywall the system produces.
Before writing any rules or standards, we had to find out whether AI could produce store-compliant paywalls at all. Not theoretically, but hands-on, with the real schema and the real renderer. The risk of skipping this step was writing a framework around capabilities the model didn't have.
A user prompt flows into app/api/generate/route.ts, which calls a language model (Groq / Llama-3.3-70B) with a structured system prompt. The model must return a single JSON object matching the PaywallPayload type defined in lib/types.ts. That output passes through two validation layers, AJV schema validation and the policy checks in lib/policyChecks.ts, before it can be used.
The critical finding: the model treats the legal block as optional. In unconstrained runs, it was omitted in roughly 60% of outputs. When it did appear, it lacked privacy_url, terms_url, and the show_restore field, the three elements Apple and Google review teams check first. Block ordering was also unreliable: legal appeared mid-screen, CTA appeared after legal, benefits appeared after pricing. Schema validation passed all of these because AJV only checks types and required fields; it has no concept of block order or compliance semantics.
AI generation is viable, but only with a structural contract enforced at two levels: a system prompt that prescribes exact block order and compliance requirements, and a policy check layer that catches what schema validation cannot. The capability exists; the constraints are what make it safe to deploy.
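As an illustration of the kind of semantic check that schema validation alone missed, here is a minimal sketch of a legal-block check. Only the field names privacy_url, terms_url, and show_restore come from the findings above; the `AnyBlock` type, the `checkLegalBlock` function, the message wording, and the requirement that URLs be absolute http(s) links are assumptions made for the example, not the actual contents of lib/policyChecks.ts.

```typescript
// Hypothetical loose block shape; the real PaywallPayload type lives in lib/types.ts.
type AnyBlock = { type: string; [field: string]: unknown };

// Assumption: a usable legal link is an absolute http(s) URL.
const HTTP_URL = /^https?:\/\/\S+$/i;

// Returns a list of human-readable violations; an empty list means the check passed.
function checkLegalBlock(blocks: AnyBlock[]): string[] {
  const legal = blocks.find((b) => b.type === "legal");
  if (!legal) return ["legal block is missing"];

  const violations: string[] = [];
  for (const field of ["privacy_url", "terms_url"]) {
    const value = legal[field];
    if (typeof value !== "string" || !HTTP_URL.test(value)) {
      violations.push(`legal.${field} must be an absolute http(s) URL`);
    }
  }
  // show_restore must be stated explicitly rather than left undefined.
  if (typeof legal["show_restore"] !== "boolean") {
    violations.push("legal.show_restore must be set to true or false");
  }
  return violations;
}
```

A payload with no legal block yields one violation, and a legal block that is present but empty yields three, one per missing field.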
A gap is a mismatch between what the framework requires and what the codebase enforces. Some gaps are silent: the schema accepted a timer block, but no component existed to render it, so the block simply vanished at runtime. Others are compliance gaps: the legal block had no structured fields for Privacy Policy or Terms of Use URLs, so those links existed only in free-text copy that AI could omit without triggering any check.
G1 is the most instructive gap. Timer blocks were fully defined in the schema, generatable by AI, and would pass AJV validation, but at render time PaywallRenderer.tsx hit the default: return null branch in its switch statement and silently dropped the block. No error, no warning. The paywall would look complete in preview and broken in production. The fix was both the component and a discipline: every block type in the schema must have a renderer, and new block types must be added to both simultaneously.
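One way to make that discipline mechanical is an exhaustiveness check, so that a block type without a renderer fails at compile time instead of falling through to return null. The sketch below is not the code in PaywallRenderer.tsx: the `Block` union, its fields other than deadline_utc, the `assertNever` helper, and the string output (used in place of React components to keep the example self-contained) are all assumptions.

```typescript
// Hypothetical subset of block types, modelled as a discriminated union.
type Block =
  | { type: "hero"; title: string }
  | { type: "cta"; label: string }
  | { type: "timer"; deadline_utc: string };

// Accepting `never` means this call only type-checks when every case is handled.
function assertNever(value: never): never {
  throw new Error(`No renderer for block: ${JSON.stringify(value)}`);
}

// Returns markup as a string; a real renderer would return a component.
function renderBlock(block: Block): string {
  switch (block.type) {
    case "hero":
      return `<h1>${block.title}</h1>`;
    case "cta":
      return `<button>${block.label}</button>`;
    case "timer":
      return `<time datetime="${block.deadline_utc}"></time>`;
    default:
      // Adding a new member to Block without a case above is a compile error here,
      // and an unknown type arriving at runtime throws instead of vanishing.
      return assertNever(block);
  }
}
```

With this pattern, adding a block type to the union without a matching case stops the build, and unexpected data at runtime raises an error rather than rendering nothing.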
Every paywall, whether AI-generated or hand-authored, must pass both layers before it can be saved or deployed. Layer 1 is structural: AJV compiles schemas/paywall.schema.json at startup and checks types, required fields, enum values, URI formats, and numeric minimums. Layer 2 is semantic: lib/policyChecks.ts runs 15+ checks that require understanding the meaning of blocks, not just their shape.
The contrast check uses the real WCAG 2.1 relative luminance formula, not an approximation. Each hex colour channel is scaled to the 0–1 range and converted to linear light via the sRGB gamma function (a channel value ≤ 0.03928 is divided by 12.92; above that, the power curve ((c + 0.055) / 1.055)^2.4 applies). Luminance is then calculated as 0.2126R + 0.7152G + 0.0722B. The contrast ratio between the lighter and darker colour, (L_lighter + 0.05) / (L_darker + 0.05), is compared against 4.5:1. Any payload whose CTA button fails this check is rejected with the exact ratio in the error message so the author knows how far off it is.
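The calculation described above can be sketched as follows. The constants and formulas are those of the WCAG 2.1 definition; the function names, the 6-digit hex input format, and the wording of the error message are assumptions for illustration and may differ from lib/policyChecks.ts.

```typescript
// Convert one 8-bit sRGB channel (0-255) to linear light, per WCAG 2.1.
function channelToLinear(value8bit: number): number {
  const c = value8bit / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a colour given as "#rrggbb" or "rrggbb".
function relativeLuminance(hex: string): number {
  const cleaned = hex.trim().replace(/^#/, "");
  if (!/^[0-9a-f]{6}$/i.test(cleaned)) {
    throw new Error(`Expected a 6-digit hex colour, got "${hex}"`);
  }
  const n = parseInt(cleaned, 16);
  const r = channelToLinear((n >> 16) & 0xff);
  const g = channelToLinear((n >> 8) & 0xff);
  const b = channelToLinear(n & 0xff);
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// Hypothetical check: returns null on success or a message containing the exact ratio.
function checkCtaContrast(textHex: string, buttonHex: string, minimum = 4.5): string | null {
  const ratio = contrastRatio(textHex, buttonHex);
  return ratio >= minimum
    ? null
    : `CTA contrast ${ratio.toFixed(2)}:1 is below the required ${minimum}:1`;
}
```

Black on white gives the maximum ratio of 21:1, while identical foreground and background colours give 1:1 and fail the check.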
Without a single source of truth, Product, Design, and Engineering each had different mental models of what a valid paywall looked like. Designers would add elements that had no schema field. Engineers would merge paywalls that passed AJV but would fail App Store review. AI would generate structurally plausible but non-compliant output. The framework resolves all three problems with one shared contract.
Maps Apple App Store (14 rules) and Google Play (10 rules) subscription requirements directly to schema fields and policyChecks.ts checks. Every compliance rule has a code (A1–A14, G1–G10) that is referenced in policyChecks.ts violation messages so engineers know which policy a failure violates.
Defines the 5 mandatory zones (hero → benefits → pricing → cta → legal) and the 2 optional zones (social_proof, timer). Documents 14 experimentation dimensions: which ones can vary between variants (copy, badge, urgency, price framing, social proof) and which ones cannot (zone structure, legal fields, product_refs, tracking events, TTL).
Traces every block type to its renderer component. Documents the gap registry (G1–G10) and the resolution status of each. Defines the implementation patterns for sticky CTA (position: sticky), floating badge (absolute top:-14px), and timer (deadline_utc preferred, stops at zero).
The single source of truth. Combines T1–T3 into a unified reference and adds the production-ready 26-rule AI system prompt. Rules cover: block structure, compliance, copy constraints, experiment limits, tracking requirements, and output format. This is the prompt that should live in app/api/generate/route.ts.
Every paywall that uses the list or cards layout must contain exactly these zones in this order: hero → benefits → pricing → cta → legal. policyChecks.ts verifies all five are present and correctly sequenced on every save.
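A presence-and-order check of this kind might look like the sketch below. The zone names are the five mandatory zones defined by the framework; the function name, the violation wording, and the decision to treat a repeated mandatory zone as a violation are assumptions, not a description of what policyChecks.ts actually does.

```typescript
// The five mandatory zones, in the required order.
const MANDATORY_ZONES = ["hero", "benefits", "pricing", "cta", "legal"] as const;

// Takes the block types in screen order and returns a list of violations.
// Optional zones (for example social_proof or timer) are ignored by this check.
function checkZoneOrder(blockTypes: string[]): string[] {
  const mandatory = new Set<string>(MANDATORY_ZONES);
  const seen = blockTypes.filter((t) => mandatory.has(t));
  const violations: string[] = [];

  for (const zone of MANDATORY_ZONES) {
    const count = seen.filter((t) => t === zone).length;
    if (count === 0) violations.push(`missing mandatory zone "${zone}"`);
    // Assumption: each mandatory zone may appear only once.
    if (count > 1) violations.push(`mandatory zone "${zone}" appears ${count} times`);
  }

  // Only compare the sequence once every zone is present exactly once.
  const expected = MANDATORY_ZONES.join(" > ");
  if (violations.length === 0 && seen.join(" > ") !== expected) {
    violations.push(`zones out of order: got ${seen.join(" > ")}, expected ${expected}`);
  }
  return violations;
}
```

For example, a payload ordered hero, social_proof, benefits, pricing, cta, legal passes, while one that places legal before cta produces a single out-of-order violation.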
The AI generation endpoint (app/api/generate/route.ts) is currently disabled: it returns HTTP 503 with a clear message. This is a deliberate decision: the 26-rule system prompt from T4 has not yet been deployed to replace the original prompt. Enabling generation before updating the prompt would produce paywalls that bypass the structural constraints the framework defines. The endpoint stays off until the T4 prompt is live.