The Architecture of Not Waiting · Piero Bozzolo, the serverless guy

Lumcast is a personal side project I built to explore what a fully serverless, AI-powered audio pipeline looks like end to end.

The idea is simple: you open the iOS app, describe a topic, pick a duration and style, and the app generates a complete podcast episode — script written by a Claude model, narrated by Amazon Polly’s neural TTS, chaptered, and ready to play or download.

No recording, no editing, no microphone required. The iOS application is underway; if you want to join the TestFlight beta, write me an email and I’ll be happy to add you to the test team :)

The stack is AWS serverless throughout: API Gateway, DynamoDB, Step Functions, Bedrock, Polly, S3, and a SwiftUI iOS client. It’s not a startup, not a product team, just one developer trying to figure out how far you can push managed services before you hit their limits. Some of those limits turned out to be interesting.

This post is about one of the first architectural decisions I made, and probably the one with the most downstream consequences.

There is a category of serverless mistake that only reveals itself under real traffic: putting a Lambda function in the path where a user is waiting for a response.

Lambda cold starts are real. They range from around 100ms for a lightweight Python function to over a second for heavier runtimes, large deployment packages, or, historically, VPC-attached functions. For a backend that does nothing but validate a request and write a record, that latency is pure waste: you’re paying in user experience for work the runtime is doing on your behalf, not work your code is doing. And when your system also needs to invoke an LLM, run TTS synthesis, and concatenate audio files (work that takes 30 to 90 seconds), the cold-start problem in the synchronous path becomes almost insulting.

Lumcast’s answer is blunt: no Lambda function is ever invoked while a user is waiting for a generation request to be accepted. The entire generation pipeline runs asynchronously. The user gets a 202 Accepted and a job ID, and everything else happens in the background. That single decision shapes the entire architecture.

The Ingestion Path

When a client posts a podcast generation request, this is what happens:

sequenceDiagram
    participant iOS
    participant APIGW as API Gateway
    participant DDB as DynamoDB
    iOS->>APIGW: POST /podcasts/generate (JWT)
    APIGW->>DDB: PutItem (direct integration)
    DDB-->>APIGW: 200 OK
    APIGW-->>iOS: 202 Accepted { jobId, pollUrl }

No Lambda. API Gateway uses a service integration — a first-class feature that lets you call AWS APIs directly from a mapping template without invoking any compute. The $context.requestId from the gateway becomes the podcast ID. The request body is mapped to a DynamoDB Item format in a VTL template. The whole round trip — from iOS sending the request to receiving a 202 — involves zero Lambda cold starts and zero Lambda invocations. It costs a fraction of a cent in API Gateway + DynamoDB write capacity, deterministically, every time.
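To make the mapping concrete, here is a minimal Python sketch of the PutItem payload a request template along these lines would hand to DynamoDB. It illustrates the shape only and is not Lumcast's actual template. The table name comes from the system map further down, while the attribute names (`podcastId`, `topic`, `durationMinutes`, `style`) and the `pollUrl` format are assumptions made for the example.

```python
import json

TABLE_NAME = "lumcast-podcasts"  # table name as shown in the system map


def build_put_item(request_id: str, body: dict) -> dict:
    """Build the PutItem input a mapping template would send to DynamoDB.

    Attribute names are hypothetical. Only `status: "pending"` and the use of
    the gateway request ID as the podcast ID come from the post.
    """
    return {
        "TableName": TABLE_NAME,
        "Item": {
            "podcastId": {"S": request_id},
            "status": {"S": "pending"},
            "topic": {"S": body["topic"]},
            # DynamoDB's wire format carries numbers as strings under "N".
            "durationMinutes": {"N": str(body["durationMinutes"])},
            "style": {"S": body["style"]},
        },
    }


def build_accepted_response(request_id: str) -> dict:
    """Shape of the 202 body. The pollUrl format is an assumption."""
    return {"jobId": request_id, "pollUrl": f"/podcasts/{request_id}/status"}


request = build_put_item(
    "c6af9ac6-7b61-11e6-9a41-93e8deadbeef",  # stands in for $context.requestId
    {"topic": "History of the transistor", "durationMinutes": 10, "style": "conversational"},
)
print(json.dumps(request, indent=2))
```

In the real system this translation lives in VTL inside API Gateway; the Python version is only a readable stand-in for what ends up on the wire.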

The DynamoDB item lands with status: "pending". That status field is the trigger for everything that follows.

The System Map

It helps to see the full picture before diving into any single layer:

graph TD
    A[iOS Client] -->|POST /generate JWT| B[API Gateway]
    B -->|direct PutItem| C[DynamoDB\nlumcast-podcasts]
    C -->|NEW_IMAGE stream| D[EventBridge Pipe\nfilter: INSERT + pending]
    D -->|enqueue| E[SQS]
    E -->|batch=1| F[Lambda\nsqs-consumer]
    F -->|StartExecution| G[Step Functions\nGeneration Pipeline]
    G -->|10 states| H[Audio + Transcript\nin S3]
    G -->|final state| I[Lambda\nsfn-callback]
    I -->|UpdateItem| C
    I -->|UpdateItem| J[DynamoDB\nlumcast-jobs]

The synchronous path is the first two hops: iOS → API Gateway → DynamoDB. Everything else is asynchronous. The client polls GET /podcasts/{id}/status (served by a Lambda that reads the jobs table) until the job reaches ready or failed.
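The "INSERT + pending" filter on the pipe is worth spelling out, because it is what keeps later status updates on the same table from re-triggering the pipeline. Below is a sketch of what such a filter could look like in EventBridge's event-pattern syntax, plus a tiny matcher to show its effect. The matcher is a simplified, exact-match-only stand-in for the real pattern engine, and the sample records are invented.

```python
# Filter pattern in event-pattern syntax: each leaf is a list of accepted values.
# The attribute name `status` comes from the post; everything else is illustrative.
PIPE_FILTER = {
    "eventName": ["INSERT"],
    "dynamodb": {"NewImage": {"status": {"S": ["pending"]}}},
}


def matches(pattern: dict, event: dict) -> bool:
    """Exact-match subset of event-pattern matching (no prefix or numeric operators)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: descend into the matching nested object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


insert_pending = {
    "eventName": "INSERT",
    "dynamodb": {"NewImage": {"podcastId": {"S": "abc"}, "status": {"S": "pending"}}},
}
status_update = {
    "eventName": "MODIFY",
    "dynamodb": {"NewImage": {"podcastId": {"S": "abc"}, "status": {"S": "ready"}}},
}

print(matches(PIPE_FILTER, insert_pending))  # True
print(matches(PIPE_FILTER, status_update))   # False
```

The second record is the kind of event the callback's UpdateItem would produce on the podcasts table; with this filter it never reaches SQS.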

Why Direct Integration Instead of a Lambda?

The obvious implementation would be: API Gateway → Lambda → DynamoDB. That’s what most tutorials show. Lumcast skips the Lambda for three reasons.

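Latency. Even a warm Lambda adds an extra hop and invocation overhead. With a direct integration, API Gateway calls DynamoDB itself, with no function runtime in between, so there is no cold start to show up in the tail. P99 latency drops measurably.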

Cost. For a write-and-return endpoint, the Lambda does almost nothing: deserialise the event, call put_item, return a response. That’s not worth the Lambda invocation cost at scale.

Failure surface. A direct integration has no application code to crash, no dependency to fail to import, no memory limit to exceed. Its failure modes are narrower and better understood.

The trade-off is real: VTL mapping templates are not fun to write or debug. The request template must manually escape user input, construct DynamoDB’s wire format ({"S": "value"}), and handle optional fields. There is no first-class unit-testing story for a VTL template. But for an endpoint this simple, it is the right trade.
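The escaping requirement is easier to appreciate with a concrete failure. The sketch below mimics a template that splices user input into the request body as a raw string, then the same template with the value escaped first. It is Python standing in for VTL, where the equivalent job is done by a helper such as `$util.escapeJavaScript`; the `topic` attribute and the sample inputs are assumptions.

```python
import json


def naive_template(topic: str) -> str:
    """Splice user input into the body with no escaping."""
    return '{"Item": {"topic": {"S": "' + topic + '"}}}'


def escaped_template(topic: str) -> str:
    """Same template, with the value escaped as a JSON string first."""
    # json.dumps adds the surrounding quotes and escapes quotes, backslashes, newlines.
    return '{"Item": {"topic": {"S": ' + json.dumps(topic) + "}}}"


def is_valid_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


quoted = 'Why "serverless" still has servers'
print(is_valid_json(naive_template(quoted)))    # False: the quotes break the body
print(is_valid_json(escaped_template(quoted)))  # True

# Worse than a broken body: input that closes the string and injects an attribute.
hostile = 'x"}, "status": {"S": "ready'
injected = json.loads(naive_template(hostile))
print(sorted(injected["Item"]))                 # ['status', 'topic']
safe = json.loads(escaped_template(hostile))
print(sorted(safe["Item"]))                     # ['topic']
```

The second case is the one that matters for this design: with status driving the whole pipeline, an unescaped field would let a client write its own status.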

On authentication: API Gateway validates the Cognito JWT before the integration fires. The authorizer checks the aud claim, which is only present in ID tokens — not access tokens. This is a subtle but important distinction: the iOS client must request an ID token for API calls, not an access token.
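When a request is rejected and it is unclear which token the client sent, it can help to look at the claims directly. This is a small debugging sketch, not part of Lumcast: it decodes the payload of a JWT without verifying the signature and checks for the claims that distinguish Cognito's two token types (`aud` and `token_use: "id"` on ID tokens, `client_id` and `token_use: "access"` on access tokens). The demo tokens are unsigned fakes with a made-up client ID.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying the signature."""
    payload = token.split(".")[1]
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def is_id_token(claims: dict) -> bool:
    """True when the claims look like a Cognito ID token."""
    return claims.get("token_use") == "id" and "aud" in claims


def _b64url(obj: dict) -> str:
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def fake_token(claims: dict) -> str:
    """Build an unsigned token for the demo. Real tokens come from Cognito."""
    return ".".join([_b64url({"alg": "none"}), _b64url(claims), "signature"])


id_token = fake_token({"token_use": "id", "aud": "example-app-client-id"})
access_token = fake_token({"token_use": "access", "client_id": "example-app-client-id"})

print(is_id_token(jwt_claims(id_token)))      # True
print(is_id_token(jwt_claims(access_token)))  # False
```

This is for inspection only. Signature and expiry validation stay with the API Gateway authorizer.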

The Status Polling Contract

The client doesn’t know when the job finishes. It polls. The polling contract is simple:

Status	Meaning	Client action
`pending`	Queued, Step Functions execution not yet started	Keep polling
`processing`	Step Functions execution running	Keep polling
`ready`	Audio + transcript available	Fetch presigned URLs
`failed`	Pipeline error	Show retry option
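A client-side loop for this contract could look roughly like the following. The real client is SwiftUI; this is a Python sketch of the logic only. The poll interval, the timeout, and the `audioUrl` field name are assumptions, and the HTTP call is injected as a function so the example runs without a network.

```python
import time
from typing import Callable

TERMINAL_STATES = {"ready", "failed"}


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval_s: float = 3.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until a terminal state is reached or the timeout elapses."""
    waited = 0.0
    while True:
        job = fetch_status()
        if job["status"] in TERMINAL_STATES:
            return job
        if waited + interval_s > timeout_s:
            raise TimeoutError(f"job still {job['status']} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s


# Demo with canned responses instead of real calls to GET /podcasts/{id}/status.
responses = iter([
    {"status": "pending"},
    {"status": "processing"},
    {"status": "ready", "audioUrl": "https://example.com/audio.mp3"},
])
result = poll_until_done(lambda: next(responses), sleep=lambda _: None)
print(result["status"])  # ready
```

Both `ready` and `failed` end the loop; what the client does next (fetch the URLs or offer a retry) is the table's third column.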

Presigned S3 URLs (1-hour expiry) are generated fresh on each GET /status call. The URLs are not stored in DynamoDB: that would tie a short-lived, expiring address to a long-lived record and leave stale links in the table. Instead the Lambda reads the S3 keys from DynamoDB and generates URLs on demand.
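As a rough sketch of that on-demand signing, the function below turns a jobs-table item into a status response and only signs URLs once the job is ready. `generate_presigned_url` is the real boto3 S3 client method; the attribute names (`audioKey`, `transcriptKey`), the response field names, and the bucket name are assumptions. A stub client is used here so the example runs without AWS credentials.

```python
from typing import Any

URL_TTL_SECONDS = 3600  # the 1-hour expiry mentioned above


def build_status_response(item: dict, s3_client: Any, bucket: str) -> dict:
    """Build the GET /status body, signing S3 URLs only when the job is ready."""
    response = {"status": item["status"]}
    if item["status"] == "ready":
        for field, key_attr in (("audioUrl", "audioKey"), ("transcriptUrl", "transcriptKey")):
            response[field] = s3_client.generate_presigned_url(
                "get_object",
                Params={"Bucket": bucket, "Key": item[key_attr]},
                ExpiresIn=URL_TTL_SECONDS,
            )
    return response


class StubS3:
    """Stand-in for boto3.client("s3"); it returns a fake URL instead of signing."""

    def generate_presigned_url(self, ClientMethod, Params=None, ExpiresIn=3600):
        return f"https://{Params['Bucket']}.example/{Params['Key']}?expires={ExpiresIn}"


ready_item = {"status": "ready", "audioKey": "abc/audio.mp3", "transcriptKey": "abc/transcript.json"}
print(build_status_response(ready_item, StubS3(), "example-bucket"))
print(build_status_response({"status": "processing"}, StubS3(), "example-bucket"))
```

In a deployed handler the stub would be replaced by `boto3.client("s3")`, with the table only ever holding the keys.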

Apply This

1. Direct service integrations for simple write endpoints. API Gateway supports DynamoDB, SQS, SNS, EventBridge, and others without Lambda. If your Lambda does nothing but receive a payload and call one AWS API, replace it. The pitfall: VTL templates have poor tooling. Keep them minimal and test with API Gateway’s built-in test console.

2. Separate the synchronous path from the asynchronous path explicitly. Define a hard rule: the synchronous path only accepts, validates, and acknowledges. It never calls downstream services, never waits for I/O, never computes. Enforce this boundary architecturally — a DynamoDB write can’t accidentally become a blocking Bedrock call.

3. Use $context.requestId as a generated ID. For resources that need a globally unique ID, API Gateway’s request ID is UUID-shaped, free, and already available in the response. No UUID library required in a Lambda.

4. Bake status transitions into the data model from day one. pending → processing → ready | failed is not an afterthought. Every downstream component in the pipeline reads or writes this field. If you define it late, you retrofit every subscriber. Define your state machine on the data before you write any business logic.

5. Accept that the first 202 is a promise, not a result. This is the fundamental UX bet of async architectures. The client must handle a polling loop (or WebSockets, or push notifications). Build the polling contract — status enum, poll interval, terminal states — before building the pipeline. Post 2 covers what triggers the pipeline once that DynamoDB record lands.
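For point 4, one way to pin the state machine down before any business logic exists is to write the allowed transitions as data. The sketch below follows a literal reading of pending → processing → ready | failed, treating `ready` and `failed` as terminal; whether `pending` may go straight to `failed` is a design decision this example does not settle. The function names are made up for illustration.

```python
# Allowed status transitions, read literally from: pending -> processing -> ready | failed
ALLOWED_TRANSITIONS = {
    "pending": {"processing"},
    "processing": {"ready", "failed"},
    "ready": set(),   # terminal
    "failed": set(),  # terminal
}


def can_transition(current: str, new: str) -> bool:
    """True if moving from `current` to `new` is allowed by the table above."""
    return new in ALLOWED_TRANSITIONS.get(current, set())


def apply_transition(item: dict, new_status: str) -> dict:
    """Return a copy of the item with the new status, or raise on an illegal move."""
    if not can_transition(item["status"], new_status):
        raise ValueError(f"illegal transition: {item['status']} -> {new_status}")
    return {**item, "status": new_status}


job = {"podcastId": "abc", "status": "pending"}
job = apply_transition(job, "processing")
job = apply_transition(job, "ready")
print(job["status"])                        # ready
print(can_transition("ready", "pending"))   # False
```

On the DynamoDB side, the same guard could be enforced at write time with a ConditionExpression on UpdateItem that checks the current status, so an out-of-order writer fails instead of silently overwriting.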