Off-by-none: Issue #363

April 28, 2026

Serverless Isn't Stateless Anymore 💾

In our previous issue, Claude got a major upgrade, AWS made AI costs more visible, and Cloudflare went all-in on agents. This week, serverless becomes less stateless, OpenAI drops two major model upgrades, and Claude goes after creatives. Plus, we've got plenty of content from the cloud, serverless, and AI communities.

News & Announcements

Maybe you noticed that AWS is turning serverless into something a lot more… stateful. Lambda can now mount S3 as a file system with S3 Files, which is a pretty big shift in how you think about data access in functions. Pair that with the Lambda Durable Execution SDK for Java going GA and durable functions expanding to 16 more regions, and it’s clear AWS is moving Lambda toward long-running, stateful workflows without giving up the "serverless" model.

On the agent side, AWS continues its work to remove developer friction. The latest Amazon Bedrock AgentCore updates promise you can get a working agent running in minutes, with new capabilities around orchestration, tooling, and faster setup. That’s backed by additional AgentCore feature releases and infrastructure improvements like Gateway + Identity support for VPC egress, which handles one of the more annoying real-world constraints when connecting agents to private systems.

AWS and Anthropic also continue to get closer. There’s an expanded partnership for massive new compute capacity, and you can now run Claude Cowork directly in Amazon Bedrock. I still think this is a great bet by AWS to own the integration point for the AI model ecosystem.

After last week's Opus 4.7 announcement, you knew it wouldn't be long before OpenAI responded. GPT-5.5 is here with all the expected benchmark wins and a 1M token context window, which is starting to feel less like a flex and more like table stakes. They also dropped ChatGPT Images 2.0, which is scary good. Alongside that, we got workspace agents in ChatGPT, more signs of the next phase of the Microsoft partnership, and a fresh set of “principles” to remind us everything is under control. 😳

Anthropic isn't slowing down either. They just announced Claude for Creative Work, which includes new plugins and integrations with partners like Blender, Autodesk, Adobe, Ableton, and Splice. These are tools that let Claude work directly alongside the software creative professionals are using every day. Their strategy is absolutely 🔥. They’re also rolling out built-in memory for Claude managed agents, now in public beta. Memory is quickly becoming the differentiator, and everyone is racing to make it feel less like a hack and more like infrastructure.

Tutorials

DSQL SQL Dialect: How Amazon Aurora DSQL differs from single-instance PostgreSQL by Rob Petersen
Your AWS Cognito Emails Are Going to Spam — Here Is How to Fix It Step by Step by Tanseer
DynamoDB vs RDS at 10K, 100K, and 1M RPS: a pre-deployment simulation comparison by Abhishek Gupta
Serverless applications on AWS with Lambda using Java 25, API Gateway and Aurora DSQL - Part 6 Using GraalVM Native Image by Vadym Kazulkin
Best practices and architecture patterns for cross-account sharing in Oracle Database@AWS by Yamuna Palasamudram
Cost-effective multilingual audio transcription at scale with Parakeet-TDT and AWS Batch by Gleb Geinke
Build Strands Agents with SageMaker AI models and MLflow by Dheeraj Hegde
Securing Private Video Content with CloudFront Signed URLs and Serverless on AWS by Lee Gilmore
Building an agent harness by Heeki Park
OpenAI API deployment checklist by OpenAI

Reads

Anthropic Opus 4.6 vs 4.7 - Which is better? A code quality experiment.
An AWS AI Hero tests Claude Opus 4.6 against 4.7 using the same Tetris implementation requirements across 13 code quality dimensions. Some folks are calling 4.6 a step back, but 4.7 seems to be finding its footing. I’ve been pretty happy with it so far. Feels like a reminder that model progress isn’t always a straight line, but the trajectory still points up.

Serverless FinOps: Why Lambda Cost Models Break Every Assumption You Learned from VMs
Riya Mittal explains how Lambda's three-dimensional pricing (invocations, duration, memory) creates a fundamentally different cost model than VMs. Keeping cost top of mind is table stakes now. But optimizing for cost alone misses the bigger picture. Scale, performance, and operational overhead all show up eventually. The real game is balancing all three without painting yourself into a corner.

Building agents that reach production systems with MCP
Nice breakdown of three different ways to wire systems into MCP servers. More importantly, it’s another example of patterns starting to solidify. Still early, still messy, but the industry is slowly converging on what “good” looks like.

Cold Starts Are Dead
Cold starts aren’t what they used to be. There are still edge cases, but for most workloads, they’re manageable or negligible. Eric Johnson covers how platform improvements and better patterns minimize them, resulting in them rarely showing up where it actually matters.

Speeding up agentic workflows with WebSockets in the Responses API
OpenAI explains how they reduced agentic workflow latency by 40% using WebSockets instead of repeated HTTP requests. The technical approach maintains persistent connections and caches conversation state, eliminating redundant processing of conversation history while exposing the full speed of their faster GPT-5.3-Codex-Spark model.

Reducing Token Burn Rate With A Well-Designed Architecture
Teri Radichel walks through building a Lambda troubleshooting system that separates deterministic data gathering from AI analysis. The approach avoids burning tokens on repetitive queries by using traditional code to collect logs and configuration, only invoking AI for interpretation. Stop wasting tokens on repetitive work and only pay for actual insight.

I Run Qwen 3.6 on Two GPUs Because Renting AI Is Boring
Tyler Folkman explains that locally hosted models might not match the top-tier APIs, but they have one big advantage. They don’t go down. While Anthropic and others keep having “moments” (like as I'm writing this), running your own stack looks a lot less boring.

The Escalation Trap
Aaron Sempf walks through three failure modes of human escalation in AI systems: over-escalation creating bottlenecks, selective escalation missing new edge cases, and avoiding escalation entirely. The piece argues for moving escalation decisions to a separate governance layer that evaluates authority boundaries before execution. This is a hard problem.

How I Use Claude Cowork To Write With AI In My Voice
Ran Isenberg walks through his Claude Cowork configuration for generating content that sounds human. But trying too hard to “not sound like AI” can backfire. A lot of the things people avoid, like short sentences and clarity, are just good writing. The goal shouldn't be to hide AI; it should be to help you articulate your thoughts and ideas clearly.

Podcasts, Videos, and more

Serverless CrAIc Ep 83 Psychological Safety in the AI Era (No One Talks About This)
Serverless CrAIc explores how rapid AI adoption challenges team dynamics, mentorship capacity, and organizational culture. Keeping up used to be hard, but now it’s relentless. Fast doesn’t guarantee success, but it does help with learning. And that’s the frustrating part. Watching others move quickly and wondering what they’ve figured out that you haven’t. Also, no, we’re probably not all losing our jobs tomorrow. But it’s not crazy to wonder what the people in charge think.

AWS Lambda durable functions: Best Practices, AI patterns, and Futures | Serverless Office Hours
Michael Gasch and Eric Johnson join Julian Wood to explore the latest in AWS Lambda durable functions, from Java SDK GA, S3 File support, to what's coming next.

How Anthropic’s product team moves faster than anyone else | Cat Wu (Head of Product, Claude Code)
Lenny's interview with Cat Wu explores how Anthropic builds products in days rather than months, and why their employees build custom internal tools instead of buying SaaS. Lots of great insights in here, including emerging PM skills in AI and the shift toward managing AI agent fleets rather than doing tasks yourself.

New from AWS

Developer Tools

Create minimal reproductions for AWS SDK JavaScript v3 with create-aws-sdk-repro by John Lwin
AWS released create-aws-sdk-repro, a CLI tool that generates boilerplate for AWS SDK for JavaScript v3 projects. It handles service selection, environment setup (Node.js, Browser, or React Native), and creates projects with proper imports and credentials configuration already in place.

Final Thoughts 🤔

It's getting a lot easier to build systems that don't forget.

Serverless isn't stateless anymore. Agents are getting access to real tools, real workflows, and persistent memory. And the infrastructure is finally starting to reflect that shift, with better primitives for state, orchestration, and long-running execution.

But the tradeoffs are changing. We're layering memory into systems that were designed to be ephemeral. Giving agents persistence across sessions. Letting them interact with tools and data in ways that blur the line between request and workflow. The hard part isn't adding memory. It's deciding what gets promoted from a single session into something durable, what stays scoped to one agent versus shared across many, and what should be forgotten on purpose.

That's where things get complicated. State introduces responsibility. Memory introduces risk. Every piece of context an agent carries forward is something you now have to govern. Who can read it, when it expires, how it's surfaced back into a prompt, and what happens when it's wrong. The more capable these systems become, the more those decisions start to look like product decisions, not implementation details.

At the same time, the direction is forming. Serverless platforms are adding stateful primitives. Agent frameworks are focusing on orchestration instead of just prompts. Memory is becoming a first-class concept instead of a bolted-on feature. Even model providers are starting to expose more control over how context is stored, retrieved, and applied.

It's not just about generating better responses anymore. It's about building systems that can carry context forward. Systems that can act, adapt, and remember without breaking the guarantees we still rely on. The shift is real, and it's accelerating.

Because once systems start remembering, everything else has to change with them.

See you next week,
Jeremy

I hope you enjoyed this newsletter. We're always looking for ideas and feedback to make it better and more inclusive, so please feel free to reach out to me via Bluesky, LinkedIn, X, or email.

Previous Issue

Issue #362 • April 21, 2026

Mo’ Models, Mo’ Problems ⚠️

This Week's Top Links

We share a lot of links each week. Check out the Most Popular links from this week's issue as chosen by our email subscribers.

This Week's Sponsor

Check out all of our amazing sponsors and find out how you can help spread the #serverless word by sponsoring an issue.

About the Author

Jeremy is the founder of Ampt, a Cloud & AI consultant, and an AWS Serverless Hero that has a soft spot for helping people solve problems using the cloud. You can find him ranting about serverless, cloud, and AI on Bluesky, LinkedIn, X, and at conferences around the world.

Nominate a Serverless Star

Off-by-none is committed to celebrating the diversity of the serverless community and recognizing the people who make it awesome. If you know of someone doing amazing things with serverless, please nominate them to be a Serverless Star ⭐️!