Off-by-none Serverless Newsletter

Issue #371: Could your SaaS be replaced by a Markdown File? 📝

2026-07-07T12:00:00Z

In our previous issue, Anthropic launched Claude Sonnet 5, CloudFormation got much faster, and OpenAI started making Jalapeños. In this issue, Claude Cowork breaks free of your laptop, MiniMax lands on Bedrock, and we weigh what taste and judgment are still worth. Plus, we have lots of awesome content from the cloud, serverless, and AI communities.

News & Announcements

Most of the interesting news from this week is around agent plumbing, and AWS still seems to be doing a lot of that work. The most interesting is structured memory filtering with metadata in AgentCore Memory, which layers attribute-based filtering on top of namespace isolation. Memory scope and attributes are two different things, and it's nice to see that AWS is getting this one right. AgentCore also bumped its default runtime quotas, now up to 200 agent interactions and 25 new sessions per second, with US East and West carrying 5,000 concurrent sessions. More headroom is good, just be sure to use it wisely.
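To make the scope-versus-attributes distinction concrete, here is a minimal in-memory sketch. It is not the AgentCore Memory API: `MemoryRecord`, `query_memory`, and the namespace strings are hypothetical names invented for illustration. The only idea it shows is that the namespace decides which records are visible at all, and attributes only narrow the results inside that scope.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryRecord:
    # Hypothetical record shape, not the real service's schema.
    namespace: str                                   # scope: who the memory belongs to
    text: str
    attributes: dict = field(default_factory=dict)   # metadata describing the memory


def query_memory(records, namespace, **required):
    """Return records in one namespace whose attributes match every filter."""
    # Step 1: namespace isolation. Records outside the scope are never considered.
    scoped = [r for r in records if r.namespace == namespace]
    # Step 2: attribute filtering narrows the results inside that scope.
    return [
        r for r in scoped
        if all(r.attributes.get(key) == value for key, value in required.items())
    ]


records = [
    MemoryRecord("tenant-a/user-1", "Prefers email summaries", {"topic": "prefs"}),
    MemoryRecord("tenant-a/user-1", "Open ticket about billing", {"topic": "support"}),
    MemoryRecord("tenant-b/user-9", "Prefers SMS alerts", {"topic": "prefs"}),
]

matches = query_memory(records, "tenant-a/user-1", topic="prefs")
print([r.text for r in matches])
```

Both tenants have a record tagged `topic="prefs"`, but the second tenant's record never shows up in the first tenant's query, no matter which attributes are requested.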

On the model side, MiniMax models are now on Amazon Bedrock across 14 regions. I like the MiniMax models, and with tool-calling, implicit prompt caching, and a $0.30 price tag per million input tokens, they could be a nice drop-in for your agentic workflows. AWS also shipped an open source MCP server for the Registry of Open Data, giving AI assistants access to more than 1,100 public datasets for natural-language discovery. Useful for anyone who wants to ground their research against satellite imagery, climate, genomics data, etc. without knowing the catalog cold.

For the folks who actually keep things running in the cloud, ECS picked up real-time deployment observability in the console with a live timeline, circuit breaker monitoring, and failed-task diagnostics at no extra charge, which is exactly the kind of thing you appreciate at 2 a.m. while watching a rollout stall. And Cognito moved API rate limits to self-service, so you can raise them from the console up to your account maximum instead of opening a Service Quotas ticket and waiting.

Outside of AWS, Claude Cowork landed on web and mobile with background execution and scheduled runs, so a task can prep your morning briefing at 6 a.m. and ping your phone when it needs a decision before continuing. That was a big theme at the AI Engineer World's Fair last week: workflows moving off your laptop so you're no longer limited by compute (just money). Together AI raised $800M at an $8.3B valuation to grow its AI-optimized public cloud. They claim their ATLAS technology speeds up some inference workloads by as much as 400%.

Vercel will now run any Dockerfile on its Fluid compute platform, so Go, Rails, or Spring Boot deploy with the same preview-and-scale workflow as everything else. And EventCatalog v4 reframes itself from event docs to a broader architecture catalog, adding a Systems resource type and an agent that keeps your docs synced to the codebase.

Tutorials

Teaching models to forget: Selective unlearning with Amazon Nova by Qian Hu
Build a serverless image editing agent with Amazon Bedrock AgentCore harness by Salman Ahmed
Automatically redact PII in images with Amazon Nova by Caroline Des Rochers
Data masking in Amazon RDS for Oracle by Jobin Joseph
Choosing a Claude model and effort level in Claude Code by Lydia Hallie
I Built PR Preview Environments With AWS Lambda MicroVMs and Cut Staging Costs by 78% by Jatin Mehrotra

Reads

Ten years of micro-frontend decisions, condensed into a skill by Luca Mezzalira
Luca took a decade of hard-won micro-frontend lessons and packed them into a skill that keeps AI agents from wrecking your boundaries. If you've ever watched a cross-boundary import or a shared global sneak into a codebase, this is the guardrail you wish you'd had.

What America has meant to me by Shawn "swyx" Wang
Swyx walks through his three "runs" in America across almost twenty years, from finance to engineering to founding Latent Space and the AI Engineer movement.

Claude Sonnet 5 Launch Analysis: What Changed, What Matters, and What to Validate by Guille Ojeda
Guille breaks down what actually changed in Sonnet 5, including the adaptive thinking defaults, effort controls, and a tokenizer shift that bumps your token counts by about 30%. The real takeaway is to validate against your own workloads instead of trusting the benchmarks.

In defense of AI mandates by Charity Majors
Charity argues that top-down AI mandates give managers the cover they need to actually spend time and budget on adoption. Her point is that companies have to decide whether AI is existential or optional, then fund that decision like they mean it.

A Field Guide to Claude Fable: Finding Your Unknowns by Thariq Shihipar
Thariq lays out a framework for categorizing what you don't know and applying the right pattern at each stage of an AI coding workflow. The blind spot passes and pre-implementation planning are the parts you should steal first.

I replaced my GitHub runners with Lambda MicroVMs, and maybe you should too by Luc van Donkersgoed
Luc swaps GitHub-hosted runners for Lambda MicroVMs and is honest about the tradeoffs, which land at minimal cost savings but a few minutes faster per run. The real cost is the operational overhead of owning your runner infrastructure, and he doesn't pretend otherwise.

What if building an AI chatbot was as easy as snapping LEGO bricks together? - AWS Blocks by Luis Fernando de León Ramírez
Luis shows off AWS Blocks, the new AWS open-source framework that turns TypeScript into AWS serverless services with a snap-together feel. The walkthrough covers local dev, sandbox testing, IAM, and shipping a Bedrock Nova Lite chatbot with real code.

Lambda MicroVMs vs AgentCore Runtime: When to Use Each for Production Agents by Gerardo Arroyo
Gerardo lays out when to reach for Lambda MicroVMs versus AgentCore Runtime for production agents, and where the two actually complement each other. The framing around coding agents that need secure execution sandboxes is the useful bit.

15 things I learned at AI Engineer World’s Fair 2026 by Dave Thackeray
Dave's fifteen takeaways cover the mismatch between probabilistic models and deterministic infra, context cost, and patterns like semantic routing and post-generation veto systems. Since I was there too, I can tell you his list holds up.

The AI Chatbot Era Is Ending. Teams Are Optimizing the Wrong Layer. by Tyler Folkman
Tyler Folkman argues that teams should stop optimizing prompts and start designing delegation frameworks for AI agents. He presents a five-layer Delegation Stack based on hundreds of agent sessions, backed by recent data from OpenAI and Anthropic showing enterprise shifts toward agent-based workflows.

Podcasts, Videos, and more

Sonnet 5 review: I ran 64 generations to find out if it's worth it
Claire ran 64 generations pitting Sonnet 5 against Sonnet 4.6, Opus 4.8, GPT-5.5, and Gemini 3 Pro with her own eval framework. She scored PRD quality, prototype generation, agentic completion, and agent personality, and the results surprised her.

AI Engineer - YouTube
The AI Engineer channel is where the World's Fair sessions land once they're posted, so subscribe if you couldn't make it in person. Right now, all the livestreams of the keynotes and main stage track are posted, so check those out when you get some time.

Thoughts from Social

“Half of the companies here at @aiDotEngineer could be a markdown file!” ~ @theo 🌶️🌶️🌶️ pic.twitter.com/kxaTCZUd2G
— Jeremy Daly (@jeremy_daly) July 2, 2026

Developer Tools

Simplify model selection in Amazon Bedrock with the open source Model Profiler by Maria Oliva Calero
Maria walks through the Bedrock Model Profiler, an open-source tool that pulls 120+ foundation models into one searchable view. The serverless pipeline stitches together five AWS APIs and two public sources so you can filter on pricing, region, quotas, and lifecycle without tab-hopping.

Final Thoughts 🤔

Theo's comment that half the companies at the AI Engineer World's Fair could be a markdown file landed hard for some. Because he's not entirely wrong. Point a capable model at a well-structured markdown file, wrap it in a skill and hand it to Claude, and a lot of what those startups demoed just falls out the other end. Instruction-following has gotten good enough that the markdown file basically becomes the product.

The catch is that a markdown file captures what you want done, not the judgment behind how it's done. Romain Huet from OpenAI made this point in his keynote: engineering has always been about solving problems by combining the latest science "with design, with taste, with judgment, and most of all, imagination" to make something people can actually use. That judgment is what a lot of SaaS companies are actually selling. They've spent years sequencing business processes and getting them battle-tested across thousands of customers, and the good startups are staffed by domain experts who already worked out the right way to handle a problem so you don't have to. Replace that with a markdown file and you're trading years of accumulated judgment for something you wrote in an afternoon.

Then there's the bill. Running a skill once is cheap and kind of magical. Running it thousands of times a day against a frontier model is a line item nobody budgeted for. The workflows that actually scale are mostly deterministic, with the model reserved for the handful of points where a real decision has to be made: read this data, decide whether to proceed, escalate to a human, or route somewhere else. That's where an LLM earns its cost. Wrapping the whole process in one because you can is how you light money on fire.
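Here is a rough sketch of that shape, assuming a made-up refund workflow. Everything in it is hypothetical: `process_refund`, the thresholds, and `fake_model`, which stands in for a real LLM call so the example runs offline. The point is only that plain code handles the predictable cases and the model gets called at the one step that needs judgment.

```python
from typing import Callable

# The only answers we accept from the model at the decision point.
ALLOWED_MODEL_ANSWERS = {"proceed", "escalate", "route"}


def process_refund(order: dict, decide: Callable[[dict], str]) -> str:
    """Handle a refund request, calling the model only when judgment is needed."""
    # Deterministic checks: plain code, no tokens spent.
    if order["amount"] <= 0:
        return "rejected"
    if order["days_since_purchase"] <= 30 and order["amount"] < 50:
        return "proceed"

    # Decision point: the only place the (expensive) model is consulted.
    choice = decide(order)
    # Don't trust free-form output; hand anything unexpected to a human.
    return choice if choice in ALLOWED_MODEL_ANSWERS else "escalate"


def fake_model(order: dict) -> str:
    # Stand-in for a real LLM call (an assumption for this sketch).
    return "escalate" if order["amount"] >= 500 else "proceed"


calls = []


def counting_model(order: dict) -> str:
    # Wrapper that records which orders actually reached the model.
    calls.append(order["id"])
    return fake_model(order)


orders = [
    {"id": 1, "amount": 20, "days_since_purchase": 5},
    {"id": 2, "amount": 120, "days_since_purchase": 10},
    {"id": 3, "amount": 900, "days_since_purchase": 45},
    {"id": 4, "amount": 0, "days_since_purchase": 1},
]

results = {o["id"]: process_refund(o, counting_model) for o in orders}
print(results)
print(f"model calls: {len(calls)} of {len(orders)} orders")
```

In this toy run only two of the four orders ever reach the model. At thousands of runs a day, that ratio is the difference between a rounding error and a line item.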

And almost all of it shows up in the last 10%. AI might get you 90% of the way to a working product in a weekend, and that first 90% is impressive. But the last 10% is the edge cases, the industry quirks, and the hard-won defaults that come from domain experts and a team of PMs and engineers talking to real customers. You'll build the features you use every day and feel great about it, right up until you hit the edge cases and nuances that really matter. That gap is brutal, and no markdown file is going to close it for you.

So could your SaaS be replaced by a markdown file? For a thin wrapper, sure, and probably very soon. For anything that encodes real judgment about a hard problem, the markdown file gets you a convincing demo and a bill for the 10% you didn't build. Know which one you're building.

See you next week,
Jeremy

I hope you enjoyed this newsletter. We're always looking for ideas and feedback to make it better and more inclusive, so please feel free to reach out to me via Bluesky, LinkedIn, X, or email.

Issue #370: Live from AI Engineer World's Fair 🎡

2026-06-30T12:00:00Z

In our previous issue, AWS Summit NYC unsurprisingly went heavy on AI agents, Lambda picked up MicroVMs for isolated sandboxes, and AWS Blocks brought IfC back into the conversation. This week, Anthropic launches Claude Sonnet 5, CloudFormation gets much faster, and OpenAI starts making Jalapeños. Plus, we have plenty of excellent cloud, serverless, and AI content from the community.

News & Announcements

I'm hanging out at the AI Engineer World's Fair this week in San Francisco, and the vibe here is amazing. We'll get to that in a minute. The big headline is Claude Sonnet 5. Anthropic shipped it with a real jump in coding and agentic performance at Sonnet pricing, with introductory rates of $2 and $10 per million tokens through August 2026. It's already on AWS via Bedrock and the Claude Platform, and Aamna Najmi's walkthrough has the SDK and Converse examples.

Anthropic paired that with platform news: Claude is GA in Microsoft Foundry (Opus 4.8 and Haiku 4.5, Azure- or Anthropic-hosted), a self-hosted Claude apps gateway that adds SSO and centralized policy to Claude Code on Bedrock and GCP, and agent identity, which finally gives Claude Tag its own workspace account instead of borrowing a human's.

On AWS, CloudFormation got faster with Express mode speeding up stack operations up to 4x by returning once a configuration is applied (Channy's writeup covers the tradeoffs), and pre-deployment validation now catches quota, Config, and ECR issues before provisioning. Faster loops and fewer failed deploys is what the codegen era needs out of IaC. ElastiCache added Valkey 9.1 with a new I/O threading model and commands like HGETDEL (more details here).

Also note that AWS is moving several services to maintenance mode on July 30, including Bedrock Agents (now Bedrock Agents Classic), Kendra, and Q Business. Existing customers can stay, but if you just haven't gotten around to trying out myApplications on the AWS Console yet, you're out of luck. 😉 Glad to see that AWS continues to trim some fat, but as I've said before, they should have taken the whole leg at once instead of just cutting out the rot. That's a horrible visual, but an apt analogy IMO.

Outside AWS, Oracle opened up MySQL governance with AWS and Google Cloud on the steering committee, OpenAI and Broadcom unveiled Jalapeño, a from-scratch LLM inference chip already running GPT-5.3-Codex-Spark, and OpenAI previewed GPT-5.6 in three variants with a phased rollout. I know some people with access and now I'm super jealous.

Tutorials

Getting started with loops by Anthropic
Build generative UI for AI agents on Amazon Bedrock AgentCore with the AG-UI protocol by Ryan Razkenari
Fine-tune Amazon Nova models for accurate email data extraction by Le Vy
Pair Nova 2 Lite with Claude for cost-optimized document processing by Sanghwa Na
How Loka Built a Natural, Low-Latency Voice Agent with Amazon Nova 2 Sonic by Bojan Jakimovski
User authentication and session management with Amazon Aurora DSQL by Chaitanya chary Chatlapally
AWS Amplify + Lambda MicroVMs = A Serverless Linux Desktop! by Kanahiro Iguchi
I made an AWS Lambda MicroVM publicly accessible for $0/month (here's the full setup) by Alexey Vidanov

Reads

Lessons learned from scaling to 1 million Lambda functions by Ben Freiberg
A million Lambda functions across thousands of accounts is the kind of scale that breaks tooling nobody expects to break, and CloudFormation StackSets is right at the top of that list. Read it for the observability cost lessons, which is where most teams get caught off guard.

PSA: That probably doesn't need to be SaaS | Ready, Set, Cloud!
Allen's point about builders shipping products now instead of writing up what they learned is one I've felt myself lately. The tradeoff he names, easier building at the cost of shared knowledge, is real and has stuck with me.

Make AI Boring Again by Charity Majors
Charity's case for learning AI so you understand how it fails, rather than opting out, is the right instinct for engineers. She doesn't wave away the real problems around training data, labor, and energy, which is what makes the argument land.

AWS Fargate vs Lambda: When Does Lambda Stop Being Cheaper? by Matt S
The useful reframe here is that Lambda's breakeven is driven by execution duration rather than request volume, which is backwards from how most people reason about it. A 200ms API staying cheaper up to 6-8M calls a month is a handy number to keep in your back pocket.

Why We Built Our Own CRM for Under $5 using AWS Kiro by Lee Gilmore
Lee provides a solid writeup advocating build versus buy, with a clever DynamoDB-to-Aurora DSQL sync using change data capture. Just remember the few dollars a month doesn't include the time you'll spend maintaining it, which is the part that'll bite you. But I'd probably build this myself too. 😂

Why I still approve my memory by hand by Javier Villanueva
The argument that human-in-the-loop curation beats automated validation for a single-user knowledge base is the pattern I recommend. Automated approval mostly gives you correlated bias dressed up as confirmation, which is a trap you don't want at this scale.

A return to two-pizza culture by Dr. Werner Vogels
Werner tying two-pizza teams to AI agents is a sharp framing, and the Quick Desktop story of an overnight prototype reshaping how the team planned is the example that sells it. The part I'd watch is what happens to documentation once prototypes get this cheap.

Impressions from visiting OpenAI, Anthropic, & Cursor by Gergely Orosz
Gergely's four trends are a good pulse check, especially cloud agents going mainstream and engineers optimizing their code for agent efficiency. The cost-reduction pressure he describes is what I'd keep an eye on, since it shapes what these tools become next.

What you need to know about Lambda MicroVMs by Yan Cui
Yan's framing is the clearest I've seen: MicroVMs sit closer to EC2 than Lambda, since you're running persistent VMs and managing the fleet yourself. If you came in expecting request/response Lambda ergonomics, read this first to reset your expectations.

I Tried AWS Blocks on a Real Amplify Gen2 Project — Local DynamoDB, No AWS Account, 1-Second Loops by Kohei Aoki
A hands-on look at AWS Blocks with simulated local DynamoDB and one-second feedback loops instead of cloud deploy cycles. The fast feedback is a nice side-benefit, but the real win is not having to bifurcate your business logic into IaC.
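To see why duration drives the Fargate-versus-Lambda breakeven mentioned above, here is a back-of-the-envelope sketch. The rates are placeholder defaults and the $30 always-on container bill is a made-up figure, so check the current pricing pages before relying on any of it. It also treats the container as a flat monthly cost and ignores scaling, free tier, and data transfer, so it won't reproduce the article's exact numbers.

```python
def lambda_monthly_cost(requests, duration_s, memory_gb,
                        price_per_gb_s=0.0000166667,   # assumed rate, verify before use
                        price_per_million_req=0.20):   # assumed rate, verify before use
    """Estimate a monthly Lambda bill from request count, duration, and memory."""
    compute = requests * duration_s * memory_gb * price_per_gb_s
    request_fee = requests / 1_000_000 * price_per_million_req
    return compute + request_fee


def breakeven_requests(fixed_monthly_cost, duration_s, memory_gb,
                       price_per_gb_s=0.0000166667,
                       price_per_million_req=0.20):
    """Requests per month at which Lambda costs the same as a flat container bill."""
    per_request = (duration_s * memory_gb * price_per_gb_s
                   + price_per_million_req / 1_000_000)
    return fixed_monthly_cost / per_request


FIXED_CONTAINER_BILL = 30.0  # hypothetical always-on container cost per month

for duration in (0.2, 0.5, 1.0):
    n = breakeven_requests(FIXED_CONTAINER_BILL, duration, memory_gb=1)
    print(f"{duration:.1f}s per request -> breakeven near {n / 1_000_000:.1f}M requests/month")
```

The flat bill is the same in every row. The breakeven drops as each request runs longer, because the per-request cost is mostly duration times memory.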

Podcasts, Videos, and more

Introducing AWS Lambda MicroVMs | Serverless Office Hours
The Serverless Office Hours crew demos MicroVMs live, which is the fastest way to see snapshot launches and suspend/resume in action. If you want the mechanics behind the sandbox-per-session pattern, start here before the docs.

GLM 5.2: why I’m replacing Opus in Claude Code with this new model
Claire's walkthrough of dropping GLM 5.2 into Claude Code is a useful look at what open-weight actually buys you on cost and vendor independence. She's clear about where it fell short too, which keeps it from being a hype piece.

Final Thoughts 🤔

If 2025 was the year of agents, 2026 is the year of loops and software factories. That was the throughline at the AI Engineer World's Fair today, where more than 7,000 engineers gathered in San Francisco to compare notes and trade war stories. Shawn "swyx" Wang set the stage with a talk about Loopcraft, tracing how the loops keep compounding until you reach the highest one: engineers learning from each other. Peter Steinberger, the creator of OpenClaw, and probably several steps ahead of most, put the operational edge on it: keeping ten terminals open to babysit your agents is already the old way. An agent manager that lets you drop into a session when you need to take control is what comes next.

The energy around that vision is hard to miss, and the pace backs it up. Romain Huet and Alexander Embiricos from OpenAI mentioned they're shipping new models roughly every six weeks now, a cadence that would have sounded absurd a year ago. And it isn't only the frontier labs. Zixuan Li from Z.ai dialed in to show off GLM 5.2, an open-weight model built for long-horizon tasks that's pretty close to Opus 4.8 and GPT-5.5 on the benchmarks. Claire Vo's video on swapping Opus for GLM 5.2 in Claude Code says it holds up in real work too, and it has me tempted to throw an RTX 5090 in an Ubuntu box and run my own local AI lab. Capability is getting faster, cheaper, and more portable by the month.

But the cracks are starting to show, and they're in the same place they always are: software maintenance. The software-factory pitch was great for greenfield, and one-shotting a brand new app is exactly where these current models shine. Dexter Horthy hammered on this in his harness talk: maintaining all the AI slop we're generating starts to break down after only a few months, and there's still a stubborn list of problems the agents can't solve without human intervention. The models are getting great at producing something from nothing, yet still struggle the moment you point them at a large, living codebase, including the ones they generated from scratch.

That's the gap I keep coming back to. Faster models, open weights, even a local lab of my own, none of it touches the part where the code you wrote three months ago now needs maintenance, new features, and security/performance upgrades. Loops are fantastic at the start of a project and shaky in the messy middle, where most useful software lives. If 2026 really is the year of software factories, the interesting work is less about building faster and more about whether the loop can survive contact with a real-world software lifecycle, because that's the part nobody has solved yet.

See you next week,
Jeremy

Issue #369: Infrastructure FROM Code is Back! 🧑‍💻

2026-06-23T12:00:00Z

In our previous issue, the US government ordered Anthropic to pull Fable 5 and Mythos 5, AWS WAF started charging AI bots for content, and Bedrock added Grok, Gemma, and a pair of GPTs. This week, AWS Summit NYC unsurprisingly goes heavy on AI agents, Lambda picks up MicroVMs for isolated sandboxes, and AWS Blocks leans into IfC. Plus, we've got lots of amazing cloud, serverless, and AI content from the community.

News & Announcements

Infrastructure FROM Code is back, baby! AWS announced AWS Blocks, an open-source TypeScript framework for composing application backends without wrestling with the underlying infrastructure tooling, and if the concept sounds familiar, it should. It's very close to the ideas we built into Ampt, and the thinking behind it is still as powerful as ever. You write application code, Blocks infers the services it needs, runs locally with built-in auth, and deploys to production AWS with no code changes. Seeing AWS adopt and lean into IfC development this directly is amazing to watch. It's early and still in preview, but it looks very promising and could be a really nice companion for lots of different projects.

Most of what follows came out of the AWS Summit in New York, and the recap is the fastest way to see the full slate in one place. The headliners are worth taking one at a time.

The one I keep coming back to is Lambda MicroVMs, a new compute primitive built on Firecracker that gives you VM-level isolation with near-instant launch and resume. You get stateful sessions that persist memory and disk for up to eight hours, full lifecycle control to launch, suspend, resume, and terminate, and support for HTTP/2, gRPC, and WebSockets. The AWS blog walks through the mechanics: you supply a Dockerfile and a code artifact, Lambda builds a Firecracker snapshot with your app already initialized, and each user session gets its own environment with up to 16 vCPUs and 32 GB of memory. The pricing isn't great (you still pay for wall time 🤦), but this is one of the most interesting things to happen to Lambda in a while.

Amazon Bedrock Managed Knowledge Base is now generally available, a fully managed take on RAG with six native connectors (S3, SharePoint, Confluence, Google Drive, OneDrive, and a web crawler), managed vector storage, hybrid search, and an agentic retriever that can break multi-hop queries into step-by-step plans. The launch post has the details, including the Smart Parsing pass for content optimization. If you've been hand-rolling RAG plumbing, this collapses a lot of it into a managed service.

Bedrock AgentCore also had a big week. The AgentCore harness is now GA, a config-based path to deploying agents with managed runtime, memory strategies, multi-model support across Bedrock, OpenAI, and Gemini, and Step Functions integration, with an export path to custom code when you outgrow it (AWS frames it as going from idea to production agent in two API calls). Web Search shipped as a GA feature in US East (N. Virginia), giving agents real-time web access through Amazon's own index without bolting on an external provider. There are two solid reads on it from Channy and the ML blog, just watch out for that $7-per-1,000-queries pricing. 😬 Guardrails in policy went GA for evaluating agent actions and blocking things like prompt injection, with policies written in natural language or code. And AgentCore Memory added cross-account access, which sounds dull until you're trying to share memory in a multi-tenant setup. If you want a bundled view, the broader knowledge and continuous learning post ties the memory, web search, and paid-content pieces together.

Adjacent to that, Bedrock Guardrails picked up a new API aimed at agentic workflows. It runs in detect-only mode and returns numeric severity and confidence scores, so you set your own thresholds for blocking, retrying, or just logging at each step. Plus it hooks into agent frameworks through lifecycle hooks without making you stand up guardrail resources first. Sandeep Singh's walkthrough of the InvokeGuardrailChecks API is a useful guide if you want to dig deeper.
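If you're wondering what "set your own thresholds" might look like in practice, here is a minimal sketch. It doesn't call the real API, and `Finding`, the 0-to-1 score scale, and the default cutoffs are all assumptions for illustration. The real response fields and ranges may differ, so treat this as the pattern rather than the integration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    # Hypothetical shape; the real API response fields may differ.
    category: str
    severity: float    # assumed scale: 0.0 (benign) to 1.0 (severe)
    confidence: float  # assumed scale: 0.0 to 1.0


def choose_action(findings, block_at=0.8, retry_at=0.5, min_confidence=0.6):
    """Map detect-only findings to an action using thresholds the caller owns."""
    # Ignore findings the detector itself is unsure about.
    trusted = [f for f in findings if f.confidence >= min_confidence]
    # Act on the most severe trusted finding; no findings means severity 0.
    worst = max((f.severity for f in trusted), default=0.0)
    if worst >= block_at:
        return "block"
    if worst >= retry_at:
        return "retry"
    return "log"


print(choose_action([Finding("prompt_injection", 0.9, 0.95)]))
print(choose_action([Finding("prompt_injection", 0.9, 0.30)]))
print(choose_action([Finding("toxicity", 0.6, 0.70)]))
```

Because the check only detects and scores, the same findings can block a tool call at one step and just get logged at another by passing different thresholds.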

On the storage and data side, S3 Vectors got two upgrades. It now returns up to 10,000 results per query instead of 100, with pagination so you can start processing the first page right away, and it cut query charges by up to 80% on large indexes (10M+ vectors), automatically across regions. S3 also added annotations, up to 1 GB of mutable metadata per object that surfaces automatically as queryable Iceberg tables, built for agents that need to understand data context without a human in the loop. That same theme runs through AWS Context, which maps enterprise data relationships into knowledge graphs agents can query, extending the same technology already powering QuickSight.

For the boring-but-useful column, Amazon ECS added faster service auto scaling with 20-second high-resolution metrics across Fargate and EC2. Channy's breakdown shows scale-out dropping from over six minutes to under 90 seconds, and you can swap awkward step-scaling policies for target tracking. If you've been overprovisioning to cover slow reactions, this is your chance to right-size.

Two more preview tools from AWS push on the generated code angle. AWS DevOps Agent added release management, which runs readiness reviews, validates infrastructure against Well-Architected practices, and generates and runs tests in isolated environments before production (the blog has the workflow). And AWS Transform shipped continuous modernization, autonomously scanning repos to find and prioritize tech debt and opening remediation PRs, with GitHub, GitLab, and Bitbucket support (the AWS post covers end-of-life dependency detection and the Security Agent tie-in). Both are pointed squarely at the flood of AI-generated code that still needs reviewing and maintaining.

Outside of AWS, Anthropic shipped a batch of updates. Claude Design now stays on brand with design-system imports from GitHub or design files, bidirectional sync with Code, and direct canvas editing. Claude Code picked up artifacts in beta for Team and Enterprise, generating shareable visual pages like incident timelines and PR walkthroughs from your codebase and conversation context. And Claude rolled out centrally managed authorization for MCP connectors using the Enterprise-Managed Authorization extension, so admins can shorten access-token lifetimes and a deprovisioned user's connector access expires fast instead of lingering. Elsewhere, Cloudflare introduced temporary accounts for AI agents that let an agent deploy a Worker with wrangler deploy --temporary and no signup, live for 60 minutes and claimable afterward, and Vercel Functions can now run up to 30 minutes for Pro and Enterprise teams on Node.js and Python, aimed at LLM reasoning, AI streaming, and document processing.

Tutorials

Improve query performance with EXPLAIN plans in Amazon Aurora DSQL by Prema Iyer
Agentic Coding Hooks: Deterministic AI Guardrails by Ran Isenberg
Shared infrastructure, isolated tenants: Pool model multi-tenancy with Amazon Bedrock AgentCore by Ashley Chen
Migrating AWS Lambda from Node.js 20 to 22 — every breaking change by ntoledo319
DigitalOcean Presents Hybrid Inference Pattern for AI Workloads | Let's Data Science

Reads

I built an event-driven order system with both ECS and Lambda. Here's why. by Suleiman Abdulkadir
Nice walkthrough of mixing ECS and Lambda instead of forcing everything into one compute model. The saga pattern with EventBridge is probably how I'd build it too, though fifteen services for an order system is a lot of surface area to operate.

IaC Isn't Dying. AI Makes it More Important - DevOps.com by Jonah Kowall
The argument that IaC becomes your system of record for non-deterministic AI output is exactly right. If agents are generating infrastructure, you need something deterministic to reconcile against, and as of right now, that's still some form of IaC.

Why I Ripped AgentCore and Strands Out of Production by Anderson Carvalho
This is the anti-framework story I keep seeing lately: the agent SDKs pile on abstraction you don't need until you suddenly do. Swapping back to Lambda-per-customer with direct Bedrock calls won't fit everyone, but matching complexity to the actual workload is the right instinct.

Podcasts, Videos, and more

What's new in Strands Agents | Serverless Office Hours
Julian Wood and the team run through the Strands updates, and Evals 1.0 is the one worth your time. Pre-production testing is the agent gap nobody has filled well yet.

The Great AI Reality Check Has Begun
The core point holds: generating code was never the hard part of software engineering, and 2025 drove that home. The "Doorman Fallacy" applies perfectly here, because the gap between code generation and shipping real systems gets really wide without the right people guiding it.

MIT Just Revealed the AI Bubble's Fatal Flaw
The title oversells it, but the breakdown of who actually has the compute and data to compete is a useful gut check. Worth a watch if you want a clearer read on the economics underneath all the model announcements. AI will remain incredibly useful, but I don't think there's a moat that will sustain these valuations.

Security

HazyBeacon Abuses AWS Lambda Function URLs for Stealthy Command-and-Control Operations
HazyBeacon uses stolen IAM credentials to stand up Lambda Function URLs as command-and-control channels that blend right into trusted AWS traffic. The takeaway is about identity governance and egress monitoring, since the technique leans on credential theft rather than any flaw in Lambda itself.

Events

June 29 - July 2, 2026 - AI Engineer World's Fair 2026: San Francisco 🗣️ (I'll be there!)

Final Thoughts 🤔

For as long as we've been shipping to the cloud, there's been a wall between the code that does the work and the code that describes where it runs. You write a function, then you go write the CloudFormation, the Terraform, the CDK stack, or the SAM template that tells the cloud how to host it. Two artifacts, two mental models, kept in sync by hand and by hope. Infrastructure as Code was a real step forward because it made that second artifact deterministic and reviewable, and I'm not here to argue against determinism. You want a system of record that says exactly what's running and why.

What AWS Blocks gets right is that the determinism doesn't have to live in a separate file. It can sit right next to the code that uses it. That's the same instinct behind the annotations Wing was doing, and the approaches others like Encore and Nitric have taken, where you declare what you need inside your application code and let the framework work out the provisioning. Watching AWS lean into that idea, smartly using TypeScript's integrated type safety, with a clean local-to-production story, is a good sign for where this is heading.

Blocks doesn't go as far as I'd like, and that gap is exactly the part I've spent the last five-plus years on at Ampt. Mapping code to generated infrastructure is the easy half. The harder and more valuable half is making the infrastructure itself adaptable, a living thing that responds as the code and its usage patterns change, rather than a fixed target you have to keep reconciling and reshaping by hand. Blocks lines up closely with what we call Productized Patterns at Ampt, and that adaptability is the direction I keep wanting more of.

This matters more now than it did two years ago, because rapid code generation is the new normal. When code gets written this fast and a lot of it is throwaway, the old contract is backwards. Asking freshly generated code to also provision the infrastructure to run itself puts the burden in the wrong place. Flip it around: let the code be produced, and let the infrastructure figure out the best way to run it. That's a far better loop for testing and prototyping, and you can take the next step to harden it without locking yourself into something as rigid as a hand-maintained IaC stack.

The conversation being back on the table, with AWS in it, is the real story here. Blocks is early and it's still in preview, but the principle underneath it is the one worth betting on: code you can throw away cheaply, and infrastructure that adapts to keep up. If the codegen tools are going to keep producing at this pace, that's the model that lets you move fast without leaving a pile of brittle stacks behind you.

See you next week,
Jeremy

I hope you enjoyed this newsletter. We're always looking for ideas and feedback to make it better and more inclusive, so please feel free to reach out to me via Bluesky, LinkedIn, X, or email.

Issue #368: Claude Fable 5 is currently unavailable 🚫

2026-06-16T12:00:00Z

Claude Fable 5 is currently unavailable 🚫

In our previous issue, Anthropic shipped two major models, DynamoDB got "extended" to run locally on Postgres, and Aurora DSQL added JSONB support. In this issue, the US government orders Anthropic to pull Fable 5 and Mythos 5, AWS WAF starts charging AI bots for content, and Bedrock adds Grok, Gemma, and a pair of GPTs. Plus, we've got lots of great content from the cloud, serverless, and AI communities.

News & Announcements

The biggest story from last week is a takedown, not a product launch. Anthropic published a statement responding to a US government directive to suspend access to Fable 5 and Mythos 5 over "national security concerns." Anthropic walks through its defense-in-depth approach and argues the jailbreak vulnerabilities that triggered the order are about the same as the ones in models that are still happily serving traffic to North Korea (I may have made that last part up). Two weeks ago Fable 5 was the first generally available Mythos-class model on AWS. Now it's gone. Whatever you think of the merits, watching a government switch off a frontier model overnight is a preview of a world many of us haven't planned for. The residency assumptions baked into your architecture diagrams may be softer than you think.

Speaking of Bedrock, the model menu keeps growing. Grok 4.3 from xAI is now available on Amazon Bedrock with configurable reasoning effort levels, running on Mantle, the new inference engine AWS built for price performance. Google DeepMind's Gemma 4 family landed too, three open-weight variants with reasoning, multimodal understanding across text, image, video, and audio, native function calling, 35+ languages, and 256K-token context windows (the AWS ML blog has the deeper writeup covering the bedrock-mantle endpoint and OpenAI-compatible APIs). And OpenAI's GPT-5.4 and GPT-5.5 are now in US East (N. Virginia), both with 272K-token context and Responses API streaming, with GPT-5.5 aimed at coding and research and GPT-5.4 at production reasoning. Three model families through one endpoint is great until the bill shows up, which is why the cost attribution work AWS has been doing will eventually pay off.

The item I keep coming back to is the one that turns your bot traffic into a revenue line. AWS WAF announced AI traffic monetization using the x402 protocol for machine-to-machine payments, letting publishers set differentiated pricing for AI bots and collect stablecoin payouts through Coinbase. The AWS blog has the mechanics: WAF returns an HTTP 402 with a machine-readable JSON price manifest, works with CloudFront distributions, and settles through Coinbase's x402 Facilitator. It's not happening in a vacuum, either. Visa invested in Replit to power agentic payments for developers, including work on Visa's Trusted Agent Protocol, so the plumbing for agents that pay for things is getting built on multiple fronts.
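To make the 402 handshake a little more concrete, here is a minimal sketch of the decision an agent might make when it hits a priced response. The manifest fields (`resource`, `price`, `currency`), the per-request budget, and the `decide` helper are all hypothetical. The announcement only says the manifest is machine-readable JSON, so the real schema belongs to x402 and AWS WAF, not to this example, and no payment is actually made here.

```python
import json
from dataclasses import dataclass


@dataclass
class Decision:
    action: str  # "use", "pay", or "skip"
    reason: str


def decide(status: int, body: str, budget: float) -> Decision:
    """Decide what an agent does with a response (illustrative only)."""
    if status != 402:
        return Decision("use", "no payment requested")
    try:
        manifest = json.loads(body)
        price = float(manifest["price"])  # hypothetical field name
    except (ValueError, KeyError, TypeError):
        return Decision("skip", "unreadable price manifest")
    if price <= budget:
        return Decision("pay", f"price {price:.4f} is within budget")
    return Decision("skip", f"price {price:.4f} exceeds budget")


# Hypothetical manifest; the real field names may differ.
manifest = json.dumps({"resource": "/articles/42", "price": "0.002", "currency": "USDC"})
print(decide(402, manifest, budget=0.01).action)   # pay
print(decide(402, manifest, budget=0.001).action)  # skip
```

The point of the sketch is that the agent side is a budget check rather than a checkout flow, which is why the pricing you publish matters more than the payment plumbing.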

Agent platforms keep on maturing as well. Claude Managed Agents added scheduled deployments and environment vaults, with Rakuten and Notion already running recurring spreadsheet analysis and report generation, plus Browserbase and KERNEL integrations for browser work. OpenAI is acquiring Ona to give Codex persistent cloud execution, so agents can grind on a task for hours or days inside a customer-controlled environment. OpenAI also struck a deal to let OCI customers reach its models and Codex through Oracle Universal Credits, wiring AI spend into existing enterprise purchasing. And Amazon OpenSearch Service launched MCP Apps for agentic observability, letting agents dig into logs, traces, metrics, and alerts for root cause analysis from inside Claude Desktop or VS Code.

On the data and ops side, Amazon Bedrock AgentCore Memory now supports strictly consistent metadata for long-term memory, so you can attach values from your application that pass through without LLM inference. That gives you department-scoped retrieval, compliance boundaries, and multi-tenant memory where each tenant gets processed on its own, which is the kind of thing that sounds boring until you try to build memory for more than one customer. And Amazon CloudWatch added cross-account metrics centralization through AWS Organizations, replicating metrics from many accounts and regions into one destination account for unified monitoring and governance.

A few more worth your attention. AWS and Snowflake released a joint Custom Lens for the Well-Architected Framework, folding both platforms' best practices into one review across seven pillars, so you can stop juggling two separate sets of guidance. AWS CLI v1 is entering maintenance mode in July 2026, with botocore and s3transfer vendored directly into the codebase, which means if you're running CLI v1 and boto3 side by side, they'll each carry their own copies from here on out. And Kiro shipped a $100/mo Pro Max tier with more credits and access to all premium models. The jump from $40 to $200 was definitely a bit much for your average user, so dropping a tier right in the middle is a smart read of who actually churns.

Finally, I shipped a new Prisma 7 adapter in the Data API Client v2.4, so now you can point Prisma, Knex, Drizzle, or Kysely at the RDS Data API for Provisioned or Serverless Aurora clusters without a connection pool or VPC.

Tutorials

AI Agent Failure Detection and Root Cause Analysis with Strands Evals by Po-Shin Chen
Evaluate AI agents systematically with Agent-EvalKit by Ishan Singh
How Samsung achieved real-time pricing with AWS Lambda Response Streaming by Vijay Naik
Serverless applications on AWS with Lambda using Java 25, API Gateway and Aurora DSQL - Lambda performance optimization approaches by Vadym Kazulkin
Run Your Email Agent on Serverless by Qasim Muhammad
The Death of /tmp: S3 Mounting for Lambda is a Game-Changer by Yogesh Gupta
Cut Your AWS Fargate Bill by 40% — 10 Waste Patterns I Fixed in Production by Chirag Mehta
MCP Apps: Because Your Users Deserve More Than a Wall of Text by Maciej Sodkiewicz

Reads

How frontier teams are reinventing AI-native development
Swami details three approaches AWS used to test AI-native workflows, including pathfinder initiatives and structured sprints, and lays out five practices for teams restructuring around autonomous agents. If you're still treating AI as a fancier autocomplete, this is a nudge to think bigger about how the work itself changes.

The Review Bottleneck: Rethinking Software and Infrastructure Design for the Agent Era
A look at how coding agents moved the delivery bottleneck from writing code to reviewing and coordinating it. The proposed fixes, bounded contexts, contract-driven development, and pushing review upstream to intent instead of output, line up with what a lot of teams are feeling right now but haven't named yet.

AI demands more engineering discipline. Not less.
Charity Majors makes the case for what she calls Phoenix Architectures, where code becomes a materialized view you can regenerate once it goes stale. She draws the line from immutable infrastructure to treating AI-generated code as disposable, with validation moving to production. Classic Charity, and definitely worth your time.

The evolution of agentic surfaces: building with Claude Managed Agents
Anthropic introduces Claude Managed Agents as a set of composable APIs for production agents, handling orchestration, session management, credential isolation, and observability so teams can spend their time on context management instead of babysitting execution harnesses. Pairs well with the scheduling and vaults news above.

Takeaways from AWS Generative AI Lens
Amit Kayal breaks down the AWS Generative AI Lens with a focus on controlled AI-assisted workflows versus fully autonomous agents, walking through when AI should classify, when it should recommend, and when it should actually execute. The data governance and multi-tenant sections are the parts I'd read twice.

Lambda in a VPC Is Fine
Michael Walmsley walks through the evolution of Lambda VPC networking, from the painful 2016 days of on-demand ENI creation to today's Hyperplane implementation. If you're still repeating the old "never put Lambda in a VPC" advice, this explains why it stopped being true years ago.

Why AWS scrapped OpenSearch's architecture to chase agent workloads
Frederic Lardinois of The New Stack covers AWS's near-complete rebuild of OpenSearch Serverless, with separated storage and compute that scales to zero when idle and auto-scales 20x faster than before. It's built for the burst-and-idle usage that agent workloads generate, with log analytics arriving in June and agent memory features in H2 2026.

New from AWS

Security

AWS Destroyed the Value Proposition for Bedrock by Chris Farris
Chris digs into the part of the Fable 5 and Mythos 5 launch nobody put in the headline: the only allowed retention mode for these models on Bedrock is provider_data_share. Using them means your prompts and outputs leave the AWS boundary, land with Anthropic for 30 days, and become subject to human review. That breaks the neutral-broker guarantee that sent regulated and European shops to Bedrock in the first place. He walks through the compliance fallout and the SCP you should deploy today to deny anything other than none. Read this before you point a workload at either model, assuming they get turned back on.

From Socials

Just spent the last two weeks reworking my local Agent Hub system to use @opencode as the harness with qwen, gemma4, and mistral local models. Then I get this at 7:01pm. 😑 pic.twitter.com/T9as70aqdQ
— Jeremy Daly (@jeremy_daly) June 16, 2026

I'm not sure whether to be excited by this message, or if I should prepare for another rug pull. Either way, it forced me down an interesting multi-harness orchestration path.

Final Thoughts 🤔

HTTP 402 had been sitting in the spec since the early 90s with a note that said "reserved for future use." For three decades it was the status code nobody got to use, a placeholder for a payment layer that never seemed to materialize on the web. Then about a year ago, Coinbase introduced "x402: An open standard for internet-native payments." Wait, did the crypto bros get it right? 😬 (fyi, I'm still a hard no on that)

AWS WAF now returns a 402 with a machine-readable price manifest when an AI bot asks for your content. The bot's agent reads the manifest, pays in stablecoin through Coinbase's x402 facilitator, and gets the content. No human in the loop and no checkout page. At the same time, Visa is putting money into Replit to build agentic payments and pushing its Trusted Agent Protocol, so the same machinery is getting assembled by the incumbents who actually move money for a living. When a 30-year-old dead status code and a Visa investment point in the same direction, that's usually a signal worth paying attention to.

What's happening here is a shift in how we treat bots. For most of the web's history, automated traffic was something you blocked, rate-limited, or grudgingly tolerated. The robots.txt era assumed crawlers were either friendly enough to respect a text file or hostile enough to fight. Now there's a third option: charge them. If an agent wants your content badly enough to pay for it, you can let it, and you can put a number on exactly how much that access is worth.

I'm not sure this scales, and there are real reasons for skepticism. Stablecoin payouts assume a settlement story most finance teams haven't signed off on. Differentiated pricing for bots assumes agents will agree to pay instead of routing around you, and the whole thing has a chicken-and-egg problem where it only matters once enough agents speak the protocol and enough publishers demand payment. None of that is solved. But the direction is clear, and for the first time the economics of serving an AI bot aren't automatically negative.

There's a question worth thinking about if you run content or an API. "Block all bots" is no longer the only defensive move available to you. The more interesting question is which agents you'd actually want to charge, which ones you'd serve for free because they send value back, and what your content is worth to a machine that has a budget and no patience for a paywall modal. That's a pricing exercise, not a security one, and most of us have never had to think about it. We probably should start.

See you next week,
Jeremy

Issue #367: What did I miss? 🎓

2026-06-09T12:00:00Z

What did I miss? 🎓

I took a couple of weeks off, so we're playing catch-up. My youngest daughter graduated from high school last week, and between that, the after-prom party she threw at my house, and her graduation party (also at my house), there wasn't a lot of time left for keeping up with serverless, AI, and cloud. So this one covers about three weeks of news, and it's a long one. Apologies in advance.

In this issue, Anthropic ships two major models, DynamoDB gets "extended" to run locally on Postgres, and Aurora DSQL adds JSONB support. Plus, we've got plenty of awesome content from the cloud, serverless, and AI communities.

News & Announcements

Let's start with the money, because it's the reason for everything else. Anthropic raised a boatload of money, $65 billion in a Series H at a $965 billion post-money valuation. That kind of capital buys a lot of compute, and the spending showed up almost immediately in the product line.

First came Claude Opus 4.8, which introduced dynamic workflows in Claude Code as a research preview, better coding and browser-automation numbers, and effort control settings, all at the same price as Opus 4.7. Then, before anyone had a chance to settle in, Anthropic announced Claude Fable 5 and Claude Mythos 5, the first generation of Mythos-class models built for autonomous, professional work. Fable 5 is the one you can actually use, and Mythos 5 remains the locked-down sibling. If you want a second opinion before you commit, Claire Vo's review of Fable 5 puts it through three real-world scenarios and is honest about where it falls down.

AWS, predictably, did not want to be left out. Claude Opus 4.8 landed on AWS through Bedrock and Claude Platform, and then Fable 5 showed up as the first generally available Mythos-class model on AWS too, with a longer writeup on the AWS blog covering the built-in safeguards for autonomous operation. Anthropic wasn't the only model vendor getting the Bedrock treatment, either. OpenAI's GPT-5.5, GPT-5.4, and Codex are now generally available on Bedrock with pay-per-token pricing matching OpenAI's direct rates, inference staying inside your chosen region, and the usual KMS, VPC, and CloudTrail story for compliance.

To make all of this easier to work with, Bedrock also shipped a redesigned console optimized for the OpenAI- and Anthropic-compatible APIs (there's a hands-on writeup on the AWS blog) built around the bedrock-mantle endpoint, with project-based organization, side-by-side comparisons, and prefilled code snippets. They rounded it out with request-level usage attribution so you can tag individual inference calls by team or environment, CloudWatch metrics for the mantle endpoint, and expanded Service Quotas support. The cost attribution piece is the one I'd pay attention to. Once you've got three model families running through one endpoint, knowing which team is spending what stops being optional.

The agent side of Bedrock kept pace. AgentCore Runtime added interactive shells via a new InvokeAgentRuntimeCommandShell API, giving you WebSocket terminal access into a running agent's microVM to inspect files, run commands, or debug state without losing session context. AgentCore Identity now lets you bring your own secrets through AWS Secrets Manager, and Step Functions added an AgentCore-powered agentic reasoning step so you can drop a reasoning task into a state machine without bolting on extra infrastructure. The AWS MCP Server picked up cross-account and cross-role access too, so a coding agent can finally hop between accounts and roles in a single session instead of stopping, swapping credentials, and starting over. Anyone who's managed agents across more than one account knows exactly how annoying that loop was.

The most interesting database news of the bunch didn't get a flashy launch event. AWS released ExtendDB 0.1, an open source adapter that implements the DynamoDB API on top of pluggable storage backends, with PostgreSQL as the first reference implementation. That means you can write code against DynamoDB programming patterns and run it locally, in CI, or on-prem against Postgres. I've been wanting something like this for years. DynamoDB Local has always been a reasonable stand-in, but a pluggable adapter that lets you point real DynamoDB access patterns at a Postgres backend opens up a lot of testing and migration scenarios that used to be a pain. It's 0.1, so temper your expectations, but the direction is genuinely useful.

Aurora DSQL stayed busy, picking up JSONB support with compression on by default, so you can store semi-structured config and API parameters next to your relational data and let DSQL compress the larger payloads for you. Over in search, the next generation of Amazon OpenSearch Serverless went GA, and the headline feature is scale-to-zero. There's a proper deep-dive on the AWS blog that leans into the agentic AI angle with instant resource creation and Vercel and Kiro integrations, and OpenSearch Serverless also added Agentic Search on top. Scale-to-zero is the big one for me. Vector and search backends that scale to zero change the math on a whole category of side projects and low-traffic workloads that previously couldn't justify the always-on cost.

A small but welcome bit of housekeeping: AWS is standardizing retry behavior across all SDKs and tools. The change splits backoff into two strategies, a fast 50ms for transient errors and a slower 1000ms for throttling, which is a more sensible default than treating every failure the same way. It becomes the default in November 2026, but you can opt in today with AWS_NEW_RETRIES_2026=true. If you've ever hand-tuned retry configs to stop hammering a throttled service, this is the kind of quiet fix that saves you from rediscovering the same lesson on the next project.
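As a rough illustration of why two base delays are a better default than one, here is a small sketch of exponential backoff keyed on the error type. Only the 50ms and 1000ms base values come from the announcement. The doubling per attempt, the 20-second cap, the full-jitter choice, and the function names are assumptions made for the example, and the SDKs' actual algorithm may differ.

```python
import random

BASE_DELAY_MS = {"transient": 50, "throttling": 1000}  # base values from the announcement
MAX_DELAY_MS = 20_000  # assumed cap, not from the announcement


def backoff_ceiling_ms(kind: str, attempt: int) -> int:
    """Upper bound on the wait before retry number `attempt` (1-based)."""
    base = BASE_DELAY_MS[kind]
    return min(MAX_DELAY_MS, base * 2 ** (attempt - 1))


def backoff_delay_ms(kind: str, attempt: int, rng: random.Random) -> float:
    """Pick a random delay between 0 and the ceiling (full jitter, assumed)."""
    return rng.uniform(0, backoff_ceiling_ms(kind, attempt))


for attempt in (1, 2, 3):
    print(attempt, backoff_ceiling_ms("transient", attempt), backoff_ceiling_ms("throttling", attempt))
# 1 50 1000
# 2 100 2000
# 3 200 4000
```

A transient blip gets retried almost immediately, while a throttled call backs off a full order of magnitude further, which is the behavior most of us were hand-tuning toward anyway.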

There was plenty more from AWS over the past few weeks. FinOps Agent went into preview, answering cost questions and surfacing optimization opportunities out of Cost Optimization Hub and Compute Optimizer. Cognito added multi-Region replication as an add-on for Essentials and Plus tier user pools, syncing identities to a standby Region so you can redirect traffic during a regional disruption. And AWS named four new Heroes for May 2026, with serverless and AI/ML leaders from Italy, Canada, and Argentina. Congratulations to all of them. The community is better for the work you do.

One last thing from me. I pushed an update to data-api-client, my DocumentClient-style wrapper for the Amazon Aurora Serverless Data API. If you're working with the Data API and want the familiar parameter-mapping ergonomics instead of the raw request format, give it a look.

Tutorials

Building type-safe applications with Drizzle ORM in Aurora DSQL by Dipen Patel
Pagination patterns in Amazon Aurora DSQL by Sandhya Khanderia
It's safe to close your laptop now: Hosting coding agents on Amazon Bedrock AgentCore by Evandro Franco
Break the context window barrier with Amazon Bedrock AgentCore by Yuan Tian
Building multi-tenant agents with Amazon Bedrock AgentCore by Dhawalkumar Patel
Best practices for Amazon DynamoDB Global Tables – Part 1: Operational readiness by Lee Hannigan
Best practices for Amazon DynamoDB Global Tables – Part 2: Failover strategies by Lee Hannigan
Best practices for Amazon DynamoDB Global Tables – Part 3: Validating regional resilience with AWS Fault Injection Service by Lee Hannigan
SMS Delivery Receipts on AWS Lambda by Gunnar Grosch
Your agent is repeating itself by Allen Helton
On-Demand Archives on S3 by Jérémie Rodon
AWS SAM WebSocket & Lambda Durable Functions: Canary Deploy by Darryl Ruggles
Serverless applications on AWS with Lambda using Java 25, API Gateway and DynamoDB - Part 7 Lambda performance optimization approaches by Vadym Kazulkin
AWS Lambda Managed Instances with Java 25 and AWS SAM – Part 7 Implement scheduled scaling by Vadym Kazulkin
S3 Files Killed My Least Favorite Lambda Pattern by Mwanza Simi
Closing the loop from code generation to sandboxed code execution by Heeki Park
Triggering Lambda Durable Functions from SQS by Pubudu Jayawardana

Reads

Vector Storage Costs: S3, OpenSearch, pgvector, Pinecone by Darryl Ruggles
Darryl built a full cost model and benchmark harness comparing S3 Vectors, OpenSearch Serverless NextGen, Aurora pgvector, and Pinecone, including how the May 2026 scale-to-zero launch shifts the comparison. There's a calculator to find the crossover point for your own workload shape, which is exactly the kind of thing you want before you pick a vector store and regret it later.

AI Changed How We Build. Our Tools Didn't. by Ran Isenberg
Ran walks through the gap between AI-driven development and the tooling we still use to manage it. IDEs, GitHub, Jira, and sprint planning were all built for a world where humans wrote the code, and they haven't caught up to one where agents write and engineers mostly review. He's got a companion piece on adapting the engineer's job that gets into burnout risk and rising token costs too.

AI enthusiasts are in a race against time, AI skeptics are in a race against entropy by Charity Majors
Charity uses Fin's productivity gains as a case study and lands on a point that's easy to lose in the hype: the wins came from engineering discipline and fast feedback loops, not from AI being magic. If you're trying to bridge the gap between the true believers and the people rolling their eyes on your team, this is a good framework.

Running an AI-native engineering org
Anthropic's engineering team shares how their process changed once every commit became Claude-assisted, including the move from six-month roadmaps to far more fluid planning. The bit about going past "who changed this" to "what information do I actually need" is the part worth sitting with.

Lessons from building Claude Code: How we use skills
The Claude Code team breaks down nine skill types they use internally, from library reference to verification to scaffolding, plus the practices that make a skill actually work. If you're building anything with skills, this is required reading.

Using Claude Code: The unreasonable effectiveness of HTML by Thariq Shihipar
Anthropic makes the case that HTML beats Markdown for AI output because of its density and interactivity, with examples spanning richer docs, code reviews, and throwaway custom editors. I said it last issue and I'll say it again: I'm sold on the HTML move.

Your Agent Loops are Hungrier Than You Think by Michael Walmsley
Michael lays out why agentic loops burn tokens quadratically: every turn replays the full conversation history, so turn 20 is paying for turns 1 through 19. He backs it with real token counts from actual scenarios, and if you've been surprised by an agent bill, this explains where it went.
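The arithmetic behind that is easy to check for yourself. The sketch below assumes every turn adds the same number of tokens and that the full history is resent as input on each turn, which is a simplification (no caching, no summarization) and not Michael's actual model. The function name and the 1,000-token figure are made up for the example.

```python
def total_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Sum the input tokens billed when every turn replays all earlier turns."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn  # the new turn joins the history
        total += history            # the model reads the whole history again
    return total


print(total_input_tokens(20, 1000))  # 210000
print(total_input_tokens(40, 1000))  # 820000
```

Twenty turns of 1,000 tokens each is only 20,000 tokens of new content, but the loop pays for 210,000 tokens of input, and doubling the number of turns roughly quadruples the total.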

The Claude Cowork product guide
Anthropic's guide to Claude Cowork, their desktop knowledge-work agent, covers local file access, Slack and Google Drive integration, when to reach for it over other Claude tools, and seven worked examples. A useful orientation if you're trying to figure out where Cowork fits.

Codex is becoming a productivity tool for everyone
OpenAI shared usage data putting Codex at 5 million weekly active users, with knowledge workers growing three times faster than developers. The use cases have spread well past code into reports, spreadsheets, presentations, and analysis. The line between "coding tool" and "work tool" keeps getting blurrier.

AI Memory Systems Explained: From Retrieval to Durable, Context-Aware Agents by Jeremy Daly
This is mine. It's a deep architectural walkthrough of how to move from basic RAG to a production-grade memory system, covering five memory types (policy, preference, fact, episodic, trace), how their storage patterns differ, hybrid retrieval, and why you need a memory manager controlling what gets stored and retrieved while keeping governance and privacy intact. If memory has been the fuzzy part of your agent design, this should sharpen it up.

Podcasts, Videos, and more

Building TypeScript agents with Strands | Serverless Office Hours
Erik walks through the Strands Agents TypeScript SDK for building agents on AWS, including agents that run in Node.js and the browser, connecting multiple model providers, and orchestrating multi-agent workflows.

Building with Claude: Lessons from real projects | Serverless Office Hours
Ran Isenberg joins Julian Wood to talk through practical Claude Code workflows in serverless development: custom skills, configuration strategies, and context management. Worth watching if you're still figuring out how these tools fit your process.

AI-assisted development in practice | Serverless Office Hours
Darryl Ruggles builds a full serverless blogging platform with AI coding tools and is honest about what works (MCP servers for Terraform and AWS docs), what breaks, and how to keep security and best practices intact when you let AI write your infrastructure.

Serverless Craic Ep86 AI and Software Development - the Real Problem
The Serverless Edge crew makes the case that AI amplifies both good and bad engineering practices, with a discussion that wanders through platform engineering, cognitive load, and socio-technical systems.

Serverless CrAIc Ep85 Why Team Topologies Matters More Than Ever in the AI Era
The crew asks whether AI agents count as team members and what that does to cognitive load, working through how organizational frameworks bend when code generation speeds up but human collaboration stays the bottleneck.

AWS Bites #154: S3 Files
Eoin and Luciano dig into S3 Files, explaining why S3 was never really a file system (no atomic renames, expensive listings, immutable objects) and how this service bridges the gap, with benchmark data and a frank look at the 60-second write-back delay and eventual consistency.

Claude Fable 5 review: what the new Mythos model gets right (and very wrong)
Claire Vo reviews Anthropic's first generally available Mythos-class model and the launches around it, including Managed Agents and safety classifiers, testing it on product specs and multi-agent orchestration. A grounded look at a model that's getting a lot of breathless coverage everywhere else.

A rational conversation on where AI is actually going | Benedict Evans
Benedict Evans argues foundation models won't hold lasting pricing power and that value moves up the stack, with distribution becoming the real moat now that software is cheap to build. A nice counterweight to the model-vendor news in this issue.

The AI paradox: More automation, more humans, more work | Dan Shipper
Dan Shipper draws on running Every to argue that work is moving inside AI agents, that SaaS is thriving rather than dying because agents drive more usage, and that roles like PM are getting more leverage from AI tooling.

New from AWS

Developer Tools

DynamoSQL™ — ANSI SQL for Amazon DynamoDB
DynamoSQL is a SQL query engine for DynamoDB with JOINs, CTEs, aggregations, and subqueries, no pipelines or ETL required. It's in beta with early access through AWS Marketplace and offers MCP integration for AI applications.

I Built pretext-pdf: Serverless PDFs Without Chromium by Himanshu Jain
Himanshu built pretext-pdf, a Node.js library that generates PDFs from JSON without Chromium, aimed at structured documents like invoices and reports with 40-100ms generation times. If you've ever wrestled a headless Chromium into a Lambda just to make a PDF, this is a lighter path.

Introducing Open-Source Skills for AWS SDK Best Practices by David Yaffe
AWS released open-source skills for their Agent Toolkit to improve how AI coding agents generate SDK code, currently for Swift, JavaScript v3, and Python (Boto3), targeting the common mistakes like wrong API names, bad parameter types, and missed paginators.

Final Thoughts 🤔

That $65 billion raise is the smoke rising from the Anthropic and OpenAI IPO talk, with their valuations looking shakier than the headlines suggest once you do the math on token economics. Burning compute to win benchmarks is one thing. Making the unit economics work when customers actually use the product is another, and that's where the recent billing changes come in. Anthropic pulling claude -p out of what your Max subscription covers and the GitHub Copilot billing changes are already having a real effect on how people use these tools. The tokenmaxing that let everyone ship slop faster is getting expensive, and maybe that's (kind of) a good thing.

It forces discipline, which is the thread running through several pieces in this issue. Charity Majors makes the case that the AI productivity wins came from engineering discipline and tight feedback loops, not magic. Ran Isenberg points out that our tools were built for humans writing code and are straining under agents doing it. Both are circling the same idea: the teams that come out ahead won't be the ones with the most tokens, they'll be the ones with the most discipline. If that discipline doesn't show up, we're all in trouble.

The model you use this year will be obsolete by next. The patterns you build around storage, cost, and testing will outlast all of them.

See you next week,
Jeremy

Issue #366: The Flat-Rate Honeymoon is Over 📈

2026-05-19T12:00:00Z

The Flat-Rate Honeymoon is Over 📈

In our previous issue, Claude Platform set up shop on AWS, ElastiCache learned to do full-text and hybrid search, and Ampt rolled out Node.js 24 as the default runtime. This week, Anthropic brings subscription clarity to Claude, Codex goes mobile, and Amazon DSQL gets CDC on DPUs. Plus, we've got plenty of awesome content from cloud, serverless, and AI communities.

News & Announcements

Anthropic announced this past week that paid Claude plans will get a dedicated monthly credit for programmatic usage starting June 15. Pro gets $20 in monthly credits, Max 20x gets $200, and anything past the allocation rolls onto API rates. If you've been running SDK loops, claude -p jobs, or GitHub Actions agents on your subscription, the honeymoon is over. Theo Browne took it about as well as you'd expect. His reaction video is appropriately titled "I'm done," and his X post promised to donate $10 to open source for every screenshot of a cancelled Claude Code plan that was shared. He's not wrong to be frustrated. The rules have been ambiguous for quite some time, and this announcement does provide some clarity, just not what most of us were hoping for. The bigger lesson here is about AI platform risk. Cancelling Claude Code doesn't fix the real problem, and if your business depends on one vendor's pricing staying frozen forever, your subscription isn't the thing that needs changing.
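For a rough sense of what the new allocation means for a scripted workload, here's a minimal sketch of the overage math. The credit amounts are the ones quoted above. The function name, the plan keys, and the assumption that your programmatic usage is already expressed in dollars at API rates are mine, not Anthropic's.

```python
# Monthly programmatic credits quoted in this issue, in US dollars.
MONTHLY_CREDIT_USD = {"pro": 20.0, "max_20x": 200.0}


def overage_usd(plan: str, programmatic_usage_usd: float) -> float:
    """Return the part of a month's programmatic usage that spills past the credit.

    Assumes usage has already been priced at API rates (an assumption for
    illustration; check your own billing data for the real figure).
    """
    credit = MONTHLY_CREDIT_USD[plan]
    return max(0.0, programmatic_usage_usd - credit)
```

Under those assumptions, a Pro plan running $35 of scripted usage would see $15 land on API billing, while a Max 20x plan running $150 would stay inside its credit.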

Even with all the developer backlash, Anthropic seems undeterred. They continue to ship more and more enterprise plumbing. Claude Managed Agents now support self-hosted sandboxes and MCP tunnels, so your agents' tools can run inside your own infrastructure while orchestration stays on Anthropic's platform. Cloudflare, Daytona, Modal, and Vercel are all in the launch lineup, with Cloudflare getting its own first-class slot including Browser Run for automation and quick-start templates. Anthropic also pushed two vertical packages: Claude for the legal industry and Claude for Small Business. Legal feels like the obvious play. High-value industries with messy document workflows are where managed agents make a lot of sense, so long as it doesn't keep hallucinating case law.

Elsewhere in the agent space, OpenAI brought Codex to the ChatGPT mobile app with Remote SSH now GA, programmatic access tokens, and HIPAA compliance for healthcare. Coding from your phone still sounds like a stretch, but the strategy is right: meet developers wherever they happen to be. Also, Temporal added Workflow Streams and Standalone Activities to its durable execution platform, both aimed squarely at the AI-in-production folks. Durability and debuggability are the two things most agentic systems are starving for, so the direction makes sense.

AWS had a busy week too. Amazon Aurora DSQL now supports change data capture in preview, streaming database changes to Kinesis Data Streams for event-driven apps and real-time analytics. I think the Distributed Processing Units plus Kinesis pricing is going to trip a few people up, but I've been wrong before. Amazon Bedrock launched Advanced Prompt Optimization (with a deeper writeup on the AWS blog), automating prompt comparison across up to 5 models with custom evaluation metrics, Lambda-based scoring, LLM-as-a-Judge rubrics, and multimodal inputs including images and PDFs.

On the Lambda side, AWS added scheduled scaling for functions on Lambda Managed Instances via EventBridge Scheduler, useful for adjusting capacity ahead of expected traffic. ARC Region switch also now automates Lambda event source mapping execution during failovers across Kinesis, DynamoDB Streams, MSK, and SQS, with cross-account support. That last one is very cool. CloudFront got two updates: Passthrough Mode for mTLS that forwards certificates to origins without edge validation, and configurable usage allowances on the Premium flat-rate plan, ranging from 500 million to 6 billion requests and 50 TB to 600 TB per month. And EventBridge Scheduler added 619 new SDK API actions across 13 services, bringing the total coverage to over 270 AWS services.

Finally, on the security side, Wiz's Runtime Sensor for Google Cloud Run is now GA, with 2000+ detection rules and AI-driven investigation through their Blue Agent. Serverless container monitoring built for serverless containers. Who would have thought?

Tutorials

Zero-downtime DynamoDB construct migration: from Table to TableV2 with cdk orphan by Lee Hannigan
Getting started with Change Data Capture in Amazon Aurora DSQL by Vijay Karumajji
Dynamic Looping Comes to AWS SAM by Eric Johnson
Best practices for computer and browser use with Claude by Lucas Gonzalez and Luca Weihs
AtMostOncePerRetry vs AtLeastOncePerRetry Semantics in Lambda Durable Function Step by Rishi
Build custom code-based evaluators in Amazon Bedrock AgentCore by Bharathi Srinivasan
Layered Configuration in Claude Code by Michael Walmsley
Per-tenant DynamoDB isolation with the Token Vending Machine pattern by Monica Colangelo
Lambda Durable Functions, When You Don't Need Step Functions by Lewis Sawe
Live Canary Deployments with AWS SAM, the New WebSocket API Resource, and Lambda Durable Functions by Darryl Ruggles

Reads

The founder's playbook: Building an AI-native startup
Anthropic walks through Idea, MVP, Launch, and Scale for AI-native startups, with real founder stories woven in throughout. Playbooks aren't the whole answer, but if you're staring at a blank canvas, this is a pretty good starting point.

How Claude Code works in large codebases: Best practices and where to start
A complete walkthrough of Claude Code's extension points (CLAUDE.md files, LSP integrations, MCP servers, subagents) and how they shape behavior in enterprise codebases. If you're getting mediocre results from Claude Code once you push past your toy project, this will probably help you fill some gaps.

Project Glasswing: what Mythos showed us by Grant Bourzikas
Cloudflare tested Anthropic's Mythos Preview model on a number of their repos and got firsthand knowledge of why off-the-shelf coding agents fall short. The multi-stage architecture they built is a useful reference for anyone doing serious agentic work outside of the basic coding use case.

Opinion | The Generation That Grew Up With A.I. Hates It by Michelle Goldberg
Only 18% of Gen Z is hopeful about AI, and 47% of voters under 30 rate it as mostly bad. As someone with two daughters in that demographic, I can't say I'm surprised. They've watched the technology arrive with lots of promises but not a lot of upside for them.

Local agents scare me by Allen Helton
Allen walks through four attack vectors for local AI agents (shared userland, network adjacency, poisoned context, persistent state) and makes the case that traditional IAM controls don't fit. Definitely worth reading before you give any agent unrestricted shell access on your machine.

Is AWS Lambda Tenant Isolation Mode Enough for SaaS? by Ran Isenberg
Ran breaks down what Lambda's tenant isolation actually solves and what it doesn't. The compute side is handled, but data access control is still on you, which has always been the hard part of multi-tenancy.

10 Practical Serverless Architecture Lessons from AWS Summit London 2026 by Siddarth Patil
A grab-bag of serverless patterns from AWS Summit London: Lambda boundaries, async with EventBridge and SQS, cold starts, cost management, and applying the same patterns to GenAI workloads. Most of it is table stakes if you've been doing this a while, but the GenAI section is worth a skim.

Cross-Domain Governance by Aaron Sempf
Aaron explains how autonomous systems should behave when they cross organizational boundaries, proposing monotonic reduction: authority can only be restricted, never amplified, as you move outward. It's a tidy way to think about a problem most agent platforms haven't even acknowledged yet. I always feel smarter after reading his stuff.
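One way to picture monotonic reduction is as set intersection: each hop across a boundary can keep or drop permissions, but it can never add one. This is only a sketch of the idea as summarized above, not Aaron's implementation, and the permission strings and agent names are hypothetical.

```python
def delegate(parent_authority: frozenset[str], requested: frozenset[str]) -> frozenset[str]:
    """Grant only what the parent already holds; a request can never add authority."""
    return parent_authority & requested


# Hypothetical permissions for an agent inside the organization.
org_agent = frozenset({"read:tickets", "write:tickets", "read:billing"})

# Each hop outward asks for something the previous hop does not hold.
partner_agent = delegate(org_agent, frozenset({"read:tickets", "delete:tickets"}))
vendor_agent = delegate(partner_agent, frozenset({"read:tickets", "read:billing"}))
```

The partner asks for a delete permission the organization never had, and the vendor asks for billing access the partner never received. Both requests are silently narrowed, so authority only shrinks as it moves outward.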

Podcasts, Videos, and more

Building Apps with AI + MCP Servers | Serverless Office Hours
Brian Zambrano joins Darko Mesaroš to build a serverless application from prompts using Kiro and MCP servers. A solid walkthrough from natural language to deployed AWS infrastructure.

How I AI: HTML is the new Markdown: How Anthropic engineers are building with Claude Code
Claire Vo interviews Thariq Shihipar from Anthropic's Claude Code team on the shift from Markdown to HTML for AI output, plus patterns like living design systems and micro-apps. I'm all for the HTML move. Markdown files have gotten easier and easier to gloss over, and giving the model a proper display layer makes the output feel a lot less disposable.

Serverless CrAIc Ep84 AI-Generated Code Is a Liability: Technical Debt & Engineering Excellence
The Serverless Craic crew digs into the velocity versus debt tradeoff in AI-generated code, including the awkward truth that more tests and more code don’t always mean better quality. The discussion around engineering excellence as a counterweight to AI-driven throughput is where this one gets especially heady. They argue that production code still has to be maintained by humans eventually. Let's hope.

New from AWS

Final Thoughts 🤔

There's a technical nuance to this Claude Code change that's worth pulling apart. Running claude -p isn't the same as hitting the API directly. Claude Code ships with caching, tool-use optimizations, context management, and prompt structuring that make the interactive product feel as useful as it does. When you wire claude -p into a script, you get those same optimizations applied to your automated workflows. Hit the raw API yourself and you're rebuilding all of that from scratch, usually badly, burning extra tokens on every loop, and probably wrecking the economics in the process.

That's why this change stings more than a normal pricing adjustment. I understand metering for large-scale programmatic use, but the bigger shift is that now the cheap, optimized path is effectively reserved for interactive use. If you want automation, you either pay API rates or stay inside one of Claude's tightly controlled (typically not great) interfaces.

That's the part that bothers me. You're paying for more than just access to a raw model. You're paying for the orchestration layer around it: the caching, context handling, tool execution, prompt shaping, and all the little optimizations that make Claude Code actually useful day to day. Whether those requests originate from a human typing into a terminal or a script running in the background seems mostly immaterial.

And that's the bigger question this raises for the industry. Are these systems ultimately meant to become programmable infrastructure, or are they meant to remain interactive products with a human sitting in front of them? Because the economics matter. Automation only works when the cost structure makes sense. If the optimized path is reserved for interactive use while automated use is pushed onto significantly more expensive APIs, then we're implicitly putting limits on how far these tools can evolve beyond "copilot" workflows.

That's worth thinking about before we build entire engineering organizations around them.

See you next week,
Jeremy


Issue #365: Valkey 9 Unlocks Hybrid Search on ElastiCache 🔍

2026-05-12T12:00:00Z

Valkey 9 Unlocks Hybrid Search on ElastiCache 🔍

In our previous issue, Amazon Bedrock crossed the final frontier of hosted frontier models, AI agents started buying domain names, and Amazon Q Developer got a one-way ticket to the AWS graveyard. This week, Claude Platform sets up shop on AWS, ElastiCache learns to do full-text and hybrid search, and Ampt rolls out Node.js 24 as the default runtime. Plus, we've got plenty of awesome cloud, serverless, and AI content from the community.

News & Announcements

Anthropic and AWS got even closer this week. Anthropic introduced the Claude Platform on AWS, which sits alongside Claude on Bedrock as a second, distinct way to use Claude inside your AWS account. The split is worth understanding: Claude Platform is Anthropic-operated with data processed outside AWS, while Claude on Bedrock keeps data inside the AWS boundary. Claude Platform on AWS is now generally available across 18 regions with direct access to Anthropic's APIs, console, Managed Agents, web search, and prompt caching, all billed through AWS Marketplace. AWS has its own post on the launch explaining the IAM and Marketplace plumbing. The short version: enterprises that want full Anthropic-native features without leaving their AWS account just got a much cleaner deployment path.

AgentCore also had a heck of a week. AgentCore Runtime now supports bring-your-own file system from S3 and EFS, letting you mount durable storage directly at agent runtime paths instead of bolting on file access through tools. AgentCore Memory now supports metadata for long-term memory with up to ten indexed keys that can be set manually or inferred by an LLM, making retrieval over long-term memory actually targetable instead of a vector similarity guessing game. And in the "what could possibly go wrong" category, Bedrock AgentCore Payments launched in preview (read the official announcement blog), built with Coinbase and Stripe and using the x402 protocol to let agents pay for APIs, MCP servers, and web content in stablecoins. So agents now have file systems, memory with metadata, and a wallet. 🔥
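To show why metadata makes retrieval targetable, here's a small in-memory sketch that filters on exact metadata matches before ranking by vector similarity. It does not use the AgentCore API. The class, the function names, and the toy embeddings are all hypothetical, and the only detail taken from the announcement is the ten-key limit.

```python
import math
from dataclasses import dataclass, field

MAX_INDEXED_KEYS = 10  # "up to ten indexed keys", as quoted above


@dataclass
class MemoryRecord:
    text: str
    embedding: list[float]
    metadata: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if len(self.metadata) > MAX_INDEXED_KEYS:
            raise ValueError(f"at most {MAX_INDEXED_KEYS} metadata keys allowed")


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def retrieve(records, query_embedding, filters, top_k=3):
    """Keep records whose metadata matches every filter, then rank by similarity."""
    candidates = [
        r for r in records
        if all(r.metadata.get(key) == value for key, value in filters.items())
    ]
    candidates.sort(key=lambda r: cosine(r.embedding, query_embedding), reverse=True)
    return candidates[:top_k]
```

Without the filter step, a near-duplicate memory from another team or tenant can outrank the one you wanted purely on similarity. With it, similarity only has to break ties among records that are already in scope.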

On the agent tooling side, AWS announced the Agent Toolkit for AWS, a managed suite of pre-validated skills for AI coding agents covering application development, data analytics, and AgentCore, with IAM guardrails baked in. Also, the AWS MCP Server is generally available, now with IAM context keys, a sandboxed Python execution tool, and better token efficiency. AWS is trying really hard to be the default platform for AI coding agents. Giving devs an opinionated, authenticated entry point seems like the smart play, but AWS doesn't have the same head start they did with serverless.

It was a big week for ElastiCache as well. Valkey turned two, with Docker pulls up 17x year over year and adoption across the major clouds, which is a pretty good trajectory given that it started as a Redis fork barely 24 months ago. They also announced the release of Valkey 9.0 for Amazon ElastiCache, which brings built-in search, hash field expiration, and multi-database support in cluster mode. The headline features got their own announcements: ElastiCache now supports real-time full-text, exact-match, and numeric range search, hybrid search combining vector similarity and full-text, and real-time aggregations, all at microsecond latency and across all regions at no extra cost. Chaitanya Nuthalapati has a walkthrough of building search and recommendation engines on top of it with full code, and there's a separate post on the aggregations specifically. ElastiCache is turning into a serious AI workload backend, but it might also be the serverless full-text search service we've been waiting for.
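If "hybrid search" is fuzzy, one common way to merge a vector ranking with a full-text ranking is reciprocal rank fusion. To be clear, this is a generic technique I'm swapping in for illustration. I'm not claiming it's how ElastiCache or Valkey combines results, and the document IDs below are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) in every list it appears in, so
    documents that rank well in more than one list float to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by score descending, breaking ties by ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a", "b", "c"]  # hypothetical vector-similarity results
text_hits = ["b", "d", "a"]    # hypothetical full-text results
merged = reciprocal_rank_fusion([vector_hits, text_hits])
```

Here "b" ends up first because it ranks near the top of both lists, even though "a" won the vector search outright.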

For AWS SAM users, two nice quality-of-life updates: SAM now natively supports WebSocket APIs for API Gateway, auto-generating routes, integrations, and IAM permissions from your template, and SAM CLI 1.159.0 added BuildKit support for Lambda container images, bringing multi-stage builds, better caching, cross-architecture builds, and Docker secrets to the workflow. It seems like these updates should have shipped years ago, but I'm glad to see them land.

In other Anthropic news, Claude Code got agent view, a centralized UI for managing multiple coding sessions in parallel without juggling terminal tabs. If you've been doing this manually with tmux and worktrees, this is going to save you some major pain. Anthropic also rolled out Claude integrations across Excel, PowerPoint, Word, and Outlook, with Excel, PowerPoint, and Word now GA and Outlook in public beta. Context follows you across apps, and enterprises get OpenTelemetry logging and Analytics API access for governance. And Claude Managed Agents picked up "dreaming," outcomes, and multiagent orchestration, with outcomes being a rubric-based eval system showing up to 10-point improvements on hard tasks. Netflix and Wisedocs are already shipping with it.

Ampt now supports Node.js 24 as the default runtime, bringing Web Streams, URLPattern, iterator helpers, and a pile of features that used to require third-party npm packages.

Finally, Cloudflare is laying off over 1,100 employees, which they're framing as a reorganization for the AI era rather than cost-cutting. The severance package is genuinely good (full base pay through end of 2026 and accelerated equity vesting), but the framing is doing a lot of work. "Reorganization for the AI era" is becoming the corporate euphemism of the decade.

Tutorials

Writing middlewares for Rust Lambda functions by Luciano Mammino
Choosing between single or multiple organizations in AWS Organizations by John White
Amazon Aurora DSQL connections: Drivers, strings, and best practices by Rob Petersen
Query billion-scale vectors with SQL: Integrating Amazon S3 Vectors and Aurora PostgreSQL by Shayon Sanyal
How I Locked Down a Static Site with Lambda@Edge and Cognito (No Backend Required) by Roberto Belotti
Migrating data from an Amazon Aurora snapshot into Amazon Aurora DSQL by Dan Blaner

Reads

Notes from Code with Claude 2026 by Chris Ebert
Chris pulls together the announcements that mattered from Code with Claude 2026: the SpaceX compute deal, Multiagent Orchestration, and Dreaming inside Managed Agents. The context window observations are the most useful part for anyone actually shipping agents right now.

AWS Lambda Is Dead. The $0.20 Was Never the Price
The author migrated 47 Lambda functions to Cloudflare Workers and dropped their monthly bill from $8,362 to $1,790, with most of the savings coming from the orchestration tax (API Gateway, CloudWatch, NAT, egress) rather than Lambda itself. He's right that the bundle is where the real money goes, and the August 2025 INIT billing change is worth knowing about. But the workloads he's describing (HTTP APIs, webhooks, auth, edge functions waiting on a database) were never the shape Lambda was built for. Lambda's actual sweet spot is async event-driven work that needs to fan out to thousands of concurrent executions for seconds at a time, not synchronous request/response paths burning wall clock waiting on Postgres. High-volume systems need to be designed for the runtime you're putting them on. Putting a sync API behind API Gateway and a NAT'd Lambda and then complaining about the bundle is a design problem dressed up as a pricing problem. Workers is a better fit for that workload, and he should use it. Just don't declare the tool dead because it was the wrong one for the job.

Rethinking Distributed Systems for Serverless Performance and Reliability by Aaron Davidson, Roland Fäustlin, and Zach Williams
Databricks walks through how their serverless Spark platform works, including Spark Connect that decouples apps from clusters, a Serverless Gateway that does the routing, and an autoscaler that earns its name. Using serverless to take 4-5 hour jobs down to 20 minutes is the kind of number that makes the architectural decisions worth reading about.

Podcasts, Videos, and more

How serverless experts build with AI today | Serverless Office Hours
Mark Sailes joins Julian Wood to share how serverless experts built Study from Experts, a focused video learning platform for AWS professionals.

Beyond the Basics: Production Serverless Patterns for Extreme Scale • Janak Agarwal • GOTO 2025
Janak digs into Lambda patterns that actually hold up under load, with two grounded examples: rapid scale-out for spiky traffic and real-time financial analytics built on Step Functions Distributed Map. This is the kind of content that should be louder than the "Lambda is dead" takes, because it shows what the architecture is genuinely good at.

Spec-driven development: The AI engineering workflow at Notion | Ryan Nystrom
Claire Vo interviews Ryan Nystrom about how Notion engineers use their internal Boxy system to @mention Codex from comments and get full PRs with screenshots in 20 minutes. The conversation covers practical workflows including configuring subagents, MCP integrations, and the shift toward spec-first development where AI handles implementation.

New from AWS

Final Thoughts 🤔

Look at what AWS shipped this week and squint a little: Claude Platform on AWS, the Agent Toolkit, a generally available AWS MCP Server, and an AgentCore that now has durable file systems, metadata for long-term memory, and payments on stablecoin rails. AWS is staking out the substrate layer for the agentic era, and the feature list isn't random.

The bet is straightforward. If agents need compute, identity, storage, memory, payment, and an authenticated way to call services, AWS already has four of those and is shipping the other two as fast as they can write their press releases. The pitch to enterprises is: your agents already run on AWS, your data already lives on AWS, your IAM already governs everything, so why would you run the agent loop anywhere else?

It's a credible play. But the serverless comparison I mentioned earlier is the one worth thinking about. AWS had a multi-year head start with Lambda, and the platform shape was so unfamiliar that competitors took years to even define the category. Agents don't have that property. Cloudflare, Vercel, Modal, Fly, and a dozen smaller platforms are already shipping agent primitives. The Anthropic-AWS deal is notable, but Anthropic will sell its service to anyone willing to buy. Model providers are commodity inputs now. The differentiation has to come from somewhere else.

The substrate fight will be won on governance, observability, and cost controls, not raw capability. Every platform is going to give agents file systems and wallets and OS-level actions. The platform that wins is the one where, when an agent does something dumb or expensive at 3 a.m., you can see exactly what happened, who authorized it, what it cost, and how to stop it from happening again. AWS has decades of muscle memory on that exact problem, which is their edge.

If you're building on any of these primitives, the planning question is no longer "can the agent do this." It's "when this agent does something I didn't expect, what's my blast radius and how fast can I close it." Build for that and the rest takes care of itself.
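As a sketch of what building for blast radius could look like, here's a tiny spend guard that records who authorized each agent action and refuses anything that would push past a cap. Every name in it is hypothetical, and a real system would lean on the platform's own audit and budget controls instead of an in-process list.

```python
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when an action would push spend past the configured limit."""


@dataclass
class SpendGuard:
    limit_usd: float
    spent_usd: float = 0.0
    audit_log: list[dict] = field(default_factory=list)

    def charge(self, agent: str, authorized_by: str, action: str, cost_usd: float) -> None:
        """Log the attempt first, then either record the spend or refuse it."""
        allowed = self.spent_usd + cost_usd <= self.limit_usd
        self.audit_log.append({
            "agent": agent,
            "authorized_by": authorized_by,
            "action": action,
            "cost_usd": cost_usd,
            "allowed": allowed,
        })
        if not allowed:
            raise BudgetExceeded(f"{action} by {agent} would exceed ${self.limit_usd:.2f}")
        self.spent_usd += cost_usd
```

Logging before the check is deliberate: the refused attempt is often the entry you most want to see the next morning.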

See you next week,
Jeremy


Issue #364: Agents With Credit Cards 🛒

2026-05-05T12:00:00Z

Agents With Credit Cards 🛒

In our previous issue, serverless became less stateless, OpenAI dropped two major model upgrades, and Claude went after creatives. This week, Amazon Bedrock crosses the final frontier of hosted frontier models, AI agents can now buy domain names for side projects they'll never finish, and Amazon Q Developer gets a one-way ticket to the AWS graveyard. Plus, we've got lots of amazing cloud, serverless, and AI content from the community.

News & Announcements

In AWS news, Amazon Aurora DSQL now supports the JSON data type with compression, which is a great addition that pushes DSQL closer to Postgres-style storage semantics. Over on the edge, Amazon CloudFront announced WebSocket support for VPC origins, letting you keep origins secured inside the VPC while still allowing WebSocket traffic through. CloudFront also now supports invalidation by cache tag, which is a really big win. If you wanted to invalidate groups of files before, you had to specify all the URL patterns yourself and keep track of them. Tag-based invalidation lets you flush a logical batch of files without nuking the entire cache, which is way cheaper and more efficient.
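Here's a toy model of why tag-based invalidation beats tracking URL patterns by hand. It's a plain in-memory cache, not the CloudFront API, and the class, paths, and tag names are all hypothetical.

```python
from collections import defaultdict


class TaggedCache:
    """Toy in-memory cache that indexes cached paths by tag."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}
        self._tags_by_path: dict[str, set[str]] = {}
        self._paths_by_tag: defaultdict[str, set[str]] = defaultdict(set)

    def put(self, path: str, body: bytes, tags: set[str]) -> None:
        self._drop(path)  # clear any stale tag links before re-caching
        self._objects[path] = body
        self._tags_by_path[path] = set(tags)
        for tag in tags:
            self._paths_by_tag[tag].add(path)

    def get(self, path: str):
        return self._objects.get(path)

    def invalidate_tag(self, tag: str) -> int:
        """Evict every object carrying `tag`; return how many were evicted."""
        paths = list(self._paths_by_tag.get(tag, ()))
        for path in paths:
            self._drop(path)
        return len(paths)

    def _drop(self, path: str) -> None:
        self._objects.pop(path, None)
        for tag in self._tags_by_path.pop(path, set()):
            self._paths_by_tag[tag].discard(path)
```

Tag the HTML page and the JSON payload for one product with the same tag, and a single `invalidate_tag` call flushes both while everything else stays cached. The caller never has to remember which URLs belong together.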

The agent autonomy story keeps getting bigger (and scarier). AWS announced that Amazon WorkSpaces now gives AI agents their own desktop in preview. If you still have your inventory managed with Microsoft Access on Windows 95, then this might be for you. We're slowly starting to treat AI agents as independent, autonomous things with increasingly permissive sandboxes. That has real upside, but also real downside risk. Pair that with OS Level Actions in Amazon Bedrock AgentCore Browser, which lets agents interact with native popups and dialogs that previously blocked browser automation, and the sandbox metaphor gets thinner every minute. Cloudflare is on the same trajectory: agents can now create Cloudflare accounts, buy domains, and deploy, which is impressive, but it means an agent that can stand up infrastructure is also an agent that can run up your cloud bills.

Inside Bedrock AgentCore itself there was a steady stream of updates. AgentCore Optimization is now in preview, allowing agents to improve production performance by analyzing their own traces. AgentCore Identity now supports On-Behalf-Of token exchange, letting an agent log in as a delegated human user, which is again powerful and a little terrifying. And AgentCore Runtime now supports Node.js for direct code deployment, so you can ship Node agents as ZIP uploads with bundled node_modules instead of needing a container. Also, Bedrock now offers OpenAI models, Codex, and Managed Agents in limited preview, which means Bedrock now hosts effectively every major frontier model.

On the compute and tooling side, AWS Lambda added support for Ruby 4.0. AWS is also leaning hard into Amazon Quick. You can now generate dashboards from natural language prompts, and Quick is available as a desktop application for macOS and Windows in preview. Meanwhile, Amazon Q Developer got an end-of-support announcement, which we all knew was coming. Q Developer was a waypoint along AWS's agentic coding journey, not the destination. And the Serverless ICYMI Q1 2026 roundup is worth a look. It covers lots of interesting stuff, including durable function updates, larger Lambda, SQS, and EventBridge payloads, DynamoDB cross-account replication, and a bunch of AgentCore infrastructure work.

In Anthropic news, Claude Security is now in public beta, which scans codebases for vulnerabilities by inspecting how components interact rather than pattern-matching against a CVE list. They've already tested it with hundreds of organizations over the past two months, and the approach is impressive. Also, the Claude API skill is now available in CodeRabbit, JetBrains, Resolve AI, and Warp, bundling production-ready knowledge of API patterns, prompt caching rules, and per-model configuration directly into those tools and keeping it current as you work.

Finally, on the Cloudflare side, they introduced Dynamic Workflows, which combines durable execution with dynamic Workers so the platform can route workflow instances to different tenant code without pre-deployed targets. It's another interesting AI-agent primitive, especially for things like per-tenant CI/CD pipelines.

Tutorials

Inbox & Outbox patterns for reliable event processing by Yan Cui
Organizing Agents’ memory at scale: Namespace design patterns in AgentCore Memory by Noor Randhawa
Run custom MCP proxies serverless on Amazon Bedrock AgentCore Runtime by Nizar Kheir
Before You Rebuild Your RAG Stack: Why Your Answers are Weak | Serverless Guru by Cyril Bandolo
S3 Files: Simplified AWS Lambda Processing with Terraform by Darryl Ruggles
Replacing Puppeteer on AWS Lambda for Screenshots by Mike Griffiths
How I Used Amazon Quick to Run a Full Security Audit on My SaaS — and Fixed 11 Vulnerabilities in One Session by Asad Marcus
I Injected Three Faults. The Agent Found All of Them. by Romar Cablao
Building agentic AI for Amazon RDS for SQL Server with Strands and AgentCore by Sudhir Amin
It's All About That Memory - Using Long and Short Term Memory with Agents by Darryl Ruggles

Reads

Lessons from building Claude Code: Prompt caching is everything
The Claude Code team treats prompt cache hit rate as an SRE metric with SEV alerts, because caching's prefix-match rule makes obvious optimizations backfire: switching to Haiku mid-session for an easy question costs more than letting Opus answer it. The post covers the patterns that follow, including modeling Plan Mode as tools, deferring MCP schemas via stubs, and cache-safe forking for compaction.
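The Haiku-versus-Opus point is easier to see with numbers, so here's a rough sketch of prefix-match cost accounting. Everything numeric in it is made up for illustration: the per-token prices, the 10% cached-read rate, and the token counts are assumptions, not Anthropic's pricing, and it ignores cache-write costs and output tokens entirely.

```python
def shared_prefix_len(cached: list[int], prompt: list[int]) -> int:
    """Count leading tokens that match exactly; the first mismatch ends the hit."""
    n = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break
        n += 1
    return n


def input_cost(prompt, cached, price_per_token, cached_read_discount=0.1):
    """Cached prefix tokens are billed at a discount; the rest at full price."""
    hit = shared_prefix_len(cached, prompt)
    miss = len(prompt) - hit
    return hit * price_per_token * cached_read_discount + miss * price_per_token


# A long session already cached on the big model, plus one short new question.
session = list(range(100_000))
prompt = session + [-1] * 100

# Hypothetical prices: the big model costs 5 units per token, the small one 1.
big_model = input_cost(prompt, cached=session, price_per_token=5.0)
# Assumption: the small model has no cache for this session, so it reads it cold.
small_model = input_cost(prompt, cached=[], price_per_token=1.0)
```

With these made-up prices, the cheaper model ends up costing roughly twice as much for the turn, because it pays full price for the whole session while the big model pays the discounted rate on everything it has already seen. The same accounting explains why editing anything early in the prompt is expensive: every token after the first mismatch becomes a miss.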

The Reinvention Problem
Hans Schabert and Aaron Sempf ran the same prescribed agent procedure hundreds of times and watched it splinter into dozens of execution paths, with the most common one accounting for barely a quarter of runs. Their argument: stuffing a workflow into a system prompt hands the model a reference manual when what governance actually requires is an order, and no amount of better prompting or larger context will close that gap.

Interrupting agents with human-in-the-loop feedback
Heeki Park catalogs four ways to wedge human approval into an agent before it issues a refund or revokes access: model-moderated inline functions in AgentCore harness, Strands BeforeToolCallEvent hooks, in-tool ctx.interrupt() calls, and MCP server elicitations. Each comes with code samples and a clear "when to use" rubric depending on whether tool names are known upfront and who owns the tool code.
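Stripped of any framework, the common shape behind all four options is a gate that sits between the model's decision and the tool call. The sketch below is framework-agnostic and hypothetical. It doesn't use the Strands, AgentCore, or MCP APIs named in the post, and the tool names simply echo the refund and access examples.

```python
from typing import Any, Callable

# Hypothetical tool names that should never run without a human saying yes.
SENSITIVE_TOOLS = {"issue_refund", "revoke_access"}


def run_tool(
    name: str,
    tool: Callable[..., Any],
    args: dict[str, Any],
    approve: Callable[[str, dict[str, Any]], bool],
) -> dict[str, Any]:
    """Run `tool`, asking `approve` first when the tool is marked sensitive."""
    if name in SENSITIVE_TOOLS and not approve(name, args):
        return {"status": "rejected", "tool": name}
    return {"status": "ok", "tool": name, "result": tool(**args)}
```

In practice `approve` would block on a Slack message, a ticket, or a console prompt. The important property is that a rejected call never reaches the tool function at all.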

Podcasts, Videos, and more

Automating AWS Lambda runtime upgrades | Serverless Office Hours
Dan Fox and Brian Krygsman join Julian Wood to explore how AWS Transform custom can take the pain out of Lambda runtime migrations. They walk through how the AI agent manages code changes, dependency updates, and validation when migrating from deprecated to modern runtimes.

Serverless & OpenTelemetry ❤️ Better Together
James Eastham shows you how to escape the pain of clicking through endless CloudWatch log groups and trying to piece together X-Ray by learning how to instrument your .NET serverless apps with OpenTelemetry.

New from AWS

Final Thoughts 🤔

Agents now get their own Windows desktops. They can buy domains, spin up Cloudflare accounts, deploy infrastructure, dismiss native OS dialogs, and impersonate users via delegated tokens. A year ago we were arguing about whether agents should be allowed to run shell commands. Now AWS is handing them WorkSpaces and Cloudflare is handing them credit cards. The sandbox keeps getting roomier, and the blast radius keeps growing with it.

I'm not against any of this. The capability story is genuinely exciting, and most of these primitives are things real production systems need. But we're shipping the autonomy faster than the controls. On-Behalf-Of token exchange in AgentCore Identity is a great example: powerful for legitimate delegation, also a fantastic way to lose the audit trail if you're not careful about how you scope it. Same story with agents that can stand up cloud accounts. Great until one of them runs a runaway loop on your billing.

The Bedrock news is the other shoe dropping. Adding OpenAI models, Codex, and Managed Agents in preview means Bedrock is now the universal hosting layer for frontier models. That's a real shift. Model choice is becoming an AWS configuration setting rather than a vendor commitment, which is good for builders and very interesting for the rest of the market.

The pattern across all of this is clear: the platforms are racing to give agents more rope, and the governance, observability, and cost-control story is still catching up. If you're building on these primitives, that gap is where you live now. Plan for it.

See you next week,
Jeremy


Issue #363: Serverless Isn't Stateless Anymore 💾

2026-04-28T12:00:00Z

Serverless Isn't Stateless Anymore 💾

In our previous issue, Claude got a major upgrade, AWS made AI costs more visible, and Cloudflare went all-in on agents. This week, serverless becomes less stateless, OpenAI drops two major model upgrades, and Claude goes after creatives. Plus, we've got plenty of content from the cloud, serverless, and AI communities.

News & Announcements

Maybe you noticed that AWS is turning serverless into something a lot more… stateful. Lambda can now mount S3 as a file system with S3 Files, which is a pretty big shift in how you think about data access in functions. Pair that with the Lambda Durable Execution SDK for Java going GA and durable functions expanding to 16 more regions, and it’s clear AWS is moving Lambda toward long-running, stateful workflows without giving up the "serverless" model.

On the agent side, AWS continues its work to remove developer friction. The latest Amazon Bedrock AgentCore updates promise you can get a working agent running in minutes, with new capabilities around orchestration, tooling, and faster setup. That’s backed by additional AgentCore feature releases and infrastructure improvements like Gateway + Identity support for VPC egress, which handles one of the more annoying real-world constraints when connecting agents to private systems.

AWS and Anthropic also continue to get closer. There’s an expanded partnership for massive new compute capacity, and you can now run Claude Cowork directly in Amazon Bedrock. I still think this is a great bet by AWS to own the integration point for the AI model ecosystem.

After last week's Opus 4.7 announcement, you knew it wouldn't be long before OpenAI responded. GPT-5.5 is here with all the expected benchmark wins and a 1M token context window, which is starting to feel less like a flex and more like table stakes. They also dropped ChatGPT Images 2.0, which is scary good. Alongside that, we got workspace agents in ChatGPT, more signs of the next phase of the Microsoft partnership, and a fresh set of “principles” to remind us everything is under control. 😳

Anthropic isn't slowing down either. They just announced Claude for Creative Work, which includes new plugins and integrations with partners like Blender, Autodesk, Adobe, Ableton, and Splice. These are tools that let Claude work directly alongside the software creative professionals are using every day. Their strategy is absolutely 🔥. They’re also rolling out built-in memory for Claude managed agents, now in public beta. Memory is quickly becoming the differentiator, and everyone is racing to make it feel less like a hack and more like infrastructure.

Tutorials

DSQL SQL Dialect: How Amazon Aurora DSQL differs from single-instance PostgreSQL by Rob Petersen
Your AWS Cognito Emails Are Going to Spam — Here Is How to Fix It Step by Step by Tanseer
DynamoDB vs RDS at 10K, 100K, and 1M RPS: a pre-deployment simulation comparison by Abhishek Gupta
Serverless applications on AWS with Lambda using Java 25, API Gateway and Aurora DSQL - Part 6 Using GraalVM Native Image by Vadym Kazulkin
Best practices and architecture patterns for cross-account sharing in Oracle Database@AWS by Yamuna Palasamudram
Cost-effective multilingual audio transcription at scale with Parakeet-TDT and AWS Batch by Gleb Geinke
Build Strands Agents with SageMaker AI models and MLflow by Dheeraj Hegde
Securing Private Video Content with CloudFront Signed URLs and Serverless on AWS by Lee Gilmore
Building an agent harness by Heeki Park
OpenAI API deployment checklist by OpenAI

Reads

Anthropic Opus 4.6 vs 4.7 - Which is better? A code quality experiment.
An AWS AI Hero tests Claude Opus 4.6 against 4.7 using the same Tetris implementation requirements across 13 code quality dimensions. Some folks are calling 4.7 a step back, but it seems to be finding its footing. I’ve been pretty happy with it so far. Feels like a reminder that model progress isn’t always a straight line, but the trajectory still points up.

Serverless FinOps: Why Lambda Cost Models Break Every Assumption You Learned from VMs
Riya Mittal explains how Lambda's three-dimensional pricing (invocations, duration, memory) creates a fundamentally different cost model than VMs. Keeping cost top of mind is table stakes now. But optimizing for cost alone misses the bigger picture. Scale, performance, and operational overhead all show up eventually. The real game is balancing all three without painting yourself into a corner.
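
To make the three dimensions concrete, here is a rough sketch of the arithmetic. The two price constants are assumptions (the commonly published x86 list prices for us-east-1, ignoring the free tier), and the `Workload` fields are names made up for illustration, so check the current Lambda pricing page before trusting the numbers.

```python
from dataclasses import dataclass

# Assumed list prices (x86, us-east-1, free tier ignored). Verify against
# the current AWS Lambda pricing page before relying on them.
PRICE_PER_MILLION_REQUESTS = 0.20
PRICE_PER_GB_SECOND = 0.0000166667


@dataclass
class Workload:
    """Hypothetical description of one function's monthly traffic."""
    invocations_per_month: int
    avg_duration_ms: float
    memory_mb: int


def monthly_lambda_cost(w: Workload) -> float:
    """Estimate monthly cost from invocations, duration, and memory."""
    # Dimension 1: a flat charge per request.
    request_cost = w.invocations_per_month / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    # Dimensions 2 and 3: duration and memory multiply into GB-seconds.
    gb_seconds = (
        w.invocations_per_month
        * (w.avg_duration_ms / 1000)
        * (w.memory_mb / 1024)
    )
    compute_cost = gb_seconds * PRICE_PER_GB_SECOND
    return round(request_cost + compute_cost, 2)


if __name__ == "__main__":
    # 10M invocations a month at 100 ms and 512 MB.
    print(monthly_lambda_cost(Workload(10_000_000, 100, 512)))
```

Under those assumed prices, doubling memory doubles the compute portion while the request charge stays flat, which is the part that doesn't map cleanly onto VM-style thinking.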

Building agents that reach production systems with MCP
Nice breakdown of three different ways to wire systems into MCP servers. More importantly, it’s another example of patterns starting to solidify. Still early, still messy, but the industry is slowly converging on what “good” looks like.

Cold Starts Are Dead
Cold starts aren’t what they used to be. There are still edge cases, but for most workloads, they’re manageable or negligible. Eric Johnson covers how platform improvements and better patterns minimize them, so they rarely show up where it actually matters.

Speeding up agentic workflows with WebSockets in the Responses API
OpenAI explains how they reduced agentic workflow latency by 40% using WebSockets instead of repeated HTTP requests. The technical approach maintains persistent connections and caches conversation state, eliminating redundant processing of conversation history while exposing the full speed of their faster GPT-5.3-Codex-Spark model.

Reducing Token Burn Rate With A Well-Designed Architecture
Teri Radichel walks through building a Lambda troubleshooting system that separates deterministic data gathering from AI analysis. The approach avoids burning tokens on repetitive queries by using traditional code to collect logs and configuration, only invoking AI for interpretation. Stop wasting tokens on repetitive work and only pay for actual insight.
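
The split described above can be sketched in a few lines. This is not Teri's implementation, just a minimal illustration of the idea: `gather_facts` and `troubleshoot` are hypothetical names, the config keys are assumed to look like Lambda's `Timeout` and `MemorySize` settings, and `interpret` stands in for whatever model call you would actually make.

```python
import json
from typing import Callable


def gather_facts(log_lines: list[str], config: dict[str, object]) -> dict[str, object]:
    """Deterministic step: plain code filters logs and config, no tokens spent."""
    errors = [
        line for line in log_lines
        if "ERROR" in line or "Task timed out" in line
    ]
    return {
        "error_count": len(errors),
        "sample_errors": errors[:5],  # cap what gets sent to the model
        "timeout_seconds": config.get("Timeout"),
        "memory_mb": config.get("MemorySize"),
    }


def troubleshoot(
    log_lines: list[str],
    config: dict[str, object],
    interpret: Callable[[str], str],
) -> str:
    """Only invoke the (hypothetical) model callable when there is something to explain."""
    facts = gather_facts(log_lines, config)
    if facts["error_count"] == 0:
        return "No errors found; model not invoked."
    prompt = (
        "Explain the likely cause of these Lambda failures:\n"
        + json.dumps(facts, indent=2)
    )
    return interpret(prompt)
```

The model only ever sees a small, pre-filtered summary, and it is skipped entirely when the deterministic pass finds nothing worth interpreting.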

I Run Qwen 3.6 on Two GPUs Because Renting AI Is Boring
Tyler Folkman explains that locally hosted models might not match the top-tier APIs, but they have one big advantage: they don’t go down. While Anthropic and others keep having “moments” (including one as I'm writing this), running your own stack looks a lot less boring.

The Escalation Trap
Aaron Sempf walks through three failure modes of human escalation in AI systems: over-escalation creating bottlenecks, selective escalation missing new edge cases, and avoiding escalation entirely. The piece argues for moving escalation decisions to a separate governance layer that evaluates authority boundaries before execution. This is a hard problem.

How I Use Claude Cowork To Write With AI In My Voice
Ran Isenberg walks through his Claude Cowork configuration for generating content that sounds human. But trying too hard to “not sound like AI” can backfire. A lot of the things people avoid, like short sentences and clarity, are just good writing. The goal shouldn't be to hide AI; it should be to help you articulate your thoughts and ideas clearly.

Podcasts, Videos, and more

Serverless CrAIc Ep 83 Psychological Safety in the AI Era (No One Talks About This)
Serverless CrAIc explores how rapid AI adoption challenges team dynamics, mentorship capacity, and organizational culture. Keeping up used to be hard, but now it’s relentless. Fast doesn’t guarantee success, but it does help with learning. And that’s the frustrating part. Watching others move quickly and wondering what they’ve figured out that you haven’t. Also, no, we’re probably not all losing our jobs tomorrow. But it’s not crazy to wonder what the people in charge think.

AWS Lambda durable functions: Best Practices, AI patterns, and Futures | Serverless Office Hours
Michael Gasch and Eric Johnson join Julian Wood to explore the latest in AWS Lambda durable functions, from the Java SDK GA and S3 Files support to what's coming next.

How Anthropic’s product team moves faster than anyone else | Cat Wu (Head of Product, Claude Code)
Lenny's interview with Cat Wu explores how Anthropic builds products in days rather than months, and why their employees build custom internal tools instead of buying SaaS. Lots of great insights in here, including emerging PM skills in AI and the shift toward managing AI agent fleets rather than doing tasks yourself.

New from AWS

Developer Tools

Create minimal reproductions for AWS SDK JavaScript v3 with create-aws-sdk-repro by John Lwin
AWS released create-aws-sdk-repro, a CLI tool that generates boilerplate for AWS SDK for JavaScript v3 projects. It handles service selection, environment setup (Node.js, Browser, or React Native), and creates projects with proper imports and credentials configuration already in place.

Final Thoughts 🤔

It's getting a lot easier to build systems that don't forget.

Serverless isn't stateless anymore. Agents are getting access to real tools, real workflows, and persistent memory. And the infrastructure is finally starting to reflect that shift, with better primitives for state, orchestration, and long-running execution.

But the tradeoffs are changing. We're layering memory into systems that were designed to be ephemeral. Giving agents persistence across sessions. Letting them interact with tools and data in ways that blur the line between request and workflow. The hard part isn't adding memory. It's deciding what gets promoted from a single session into something durable, what stays scoped to one agent versus shared across many, and what should be forgotten on purpose.

That's where things get complicated. State introduces responsibility. Memory introduces risk. Every piece of context an agent carries forward is something you now have to govern. Who can read it, when it expires, how it's surfaced back into a prompt, and what happens when it's wrong. The more capable these systems become, the more those decisions start to look like product decisions, not implementation details.
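
As a rough sketch of what "who can read it" and "when it expires" might look like in code: every name here (`MemoryRecord`, `readable`, `recall`) is hypothetical and not tied to any particular agent framework or memory service. The assumption is simply that each remembered item carries its own scope and expiry.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class MemoryRecord:
    """One remembered item plus the governance data that travels with it."""
    content: str
    owner_agent: str
    shared_with: frozenset[str]  # other agents allowed to read it
    expires_at: datetime         # forgotten on purpose after this time


def readable(record: MemoryRecord, agent: str, now: datetime) -> bool:
    """Expired memory is never surfaced; otherwise check the reader's scope."""
    if now >= record.expires_at:
        return False
    return agent == record.owner_agent or agent in record.shared_with


def recall(records: list[MemoryRecord], agent: str, now: datetime) -> list[str]:
    """Return only the content this agent may put back into a prompt."""
    return [r.content for r in records if readable(r, agent, now)]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    note = MemoryRecord(
        "customer prefers email",
        "support-agent",
        frozenset(),
        now + timedelta(days=30),
    )
    print(recall([note], "support-agent", now))
```

Handling memory that turns out to be wrong is a harder problem and is deliberately left out of this sketch.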

At the same time, the direction is forming. Serverless platforms are adding stateful primitives. Agent frameworks are focusing on orchestration instead of just prompts. Memory is becoming a first-class concept instead of a bolted-on feature. Even model providers are starting to expose more control over how context is stored, retrieved, and applied.

It's not just about generating better responses anymore. It's about building systems that can carry context forward. Systems that can act, adapt, and remember without breaking the guarantees we still rely on. The shift is real, and it's accelerating.

Because once systems start remembering, everything else has to change with them.

See you next week,
Jeremy


Issue #362: Mo’ Models, Mo’ Problems ⚠️

2026-04-21T12:00:00Z

Mo’ Models, Mo’ Problems ⚠️

In our previous issue, AI started breaking things faster than we can defend them, AWS launched an agent registry, and S3 kinda became a filesystem. This week, Claude gets a major upgrade, AWS makes AI costs more visible, and Cloudflare goes all-in on agents. Plus, we've got some amazing cloud, serverless, and AI content from the community.

News & Announcements

Anthropic announced Claude Opus 4.7 this past week as their latest push towards world domination. Early signals point to serious gains in software engineering, especially for long-running tasks, plus stronger vision support. AWS wasted no time rolling it out in Bedrock.

AWS also introduced granular cost attribution for Amazon Bedrock, which is a big step toward actually understanding AI spend. Cost control and observability for LLMs are still pretty messy, and being able to map usage down to IAM users and roles starts to make that problem a lot more tractable.

Amazon Aurora Serverless is getting up to 30% better performance with smarter scaling, while still keeping the scale-to-zero promise. There’s a deeper dive from the team here if you want the details. I like this direction.

AWS also announced general availability of AWS Interconnect, kicking things off with Google Cloud. Dedicated bandwidth between clouds is becoming a thing, with Azure and Oracle Cloud Infrastructure expected to follow later this year. Let the homogeneity begin.

Anthropic introduced routines in Claude Code, which basically turns repeatable development workflows into something you can automate. Feels like another positive step toward making agents more useful in day-to-day dev work. They also highlighted what people are building in their ecosystem with their latest hackathon winners. No fluff, they're all practical AI solutions that address real pain points. 🤷

It was Agents Week over at Cloudflare last week, and they shipped a lot. The full rundown of launches is here, but there were a few standouts: AI Search as a core primitive for agents, Flagship to bring feature flags into the agent era, Agent Memory, and a new email service for agents in public beta.

Not on my 2026 Bingo card, but Apple announced that Tim Cook is stepping into the Executive Chairman role, with John Ternus taking over as CEO. Big shift for one of the most stable leadership runs in tech. I'm sure it has nothing to do with Apple Intelligence. 😬

And in case you missed it, the recent Vercel hack highlights a growing pattern in cybersecurity. Third-party AI tooling accessing internal systems is introducing a whole new threat model. One that most teams aren’t even aware of, never mind prepared for.

If your incident response still involves five tabs, three tools, and someone asking “who’s on point?”, it might be time to rethink things. incident.io is an all-in-one platform that runs natively in Slack and Teams, so you can declare, manage, and resolve incidents without leaving the conversation. It handles the busywork too, auto-assigning roles, kicking off workflows, and even surfacing insights from past incidents so you don’t keep fixing the same problem twice. Definitely worth a deeper look if you want faster response times without adding more process: incident.io. Sponsored

Tutorials

Lambda Managed Instances: A Working Demo and the Math Behind It by Eric Johnson
Transform retail with AWS generative AI services by Bhavya Chugh
The Hidden Cost of AWS Lambda SnapStart for Python, and How I Fixed It with Durable Functions by Jaya Ganesh
Best practices for using Claude Opus 4.7 with Claude Code
Power video semantic search with Amazon Nova Multimodal Embeddings by Amit Kalawat
Serverless applications on AWS with Lambda using Java 25, API Gateway and DynamoDB - Part 6 Using GraalVM Native Image by Vadym Kazulkin
Accelerate database migration to Amazon Aurora DSQL with Kiro and Amazon Bedrock AgentCore by Noorul Mahajabeen Mustafa
Using AWS Lambda Extensions to Run Post-Response Telemetry Flush by Melvin Philips
Serverless applications on AWS with Lambda using Java 25, API Gateway and Aurora DSQL - Part 5 SnapStart + full priming by Vadym Kazulkin

Reads

Navigating the generative AI journey: The Path-to-Value framework from AWS
AWS tries to put some structure around the chaos with a “Path-to-Value” framework. It’s less of a step-by-step guide and more of a reminder that AI adoption is messy, multidimensional, and mostly about tradeoffs between value, risk, and organizational reality.

Moving past bots vs. humans
The bot vs human model is breaking down fast. Cloudflare is leaning into intent over identity, which feels like the right direction as agents start acting more like users and users start looking more like bots.

The AI engineering stack we built internally — on the platform we ship
Always interesting when a company dogfoods its own stack at scale. Cloudflare’s setup is a good look at what a modern AI platform actually needs when you’re pushing billions of tokens and not just running demos.

Multi-Agent AI in Production | Taskade Engineering (2026)
Three years into multi-agent systems and the same problems keep showing up. Memory, coordination, and agents getting stuck in loops. Good practical patterns here, especially if you’ve already hit these walls.

Why AWS Certified GenAI Developer stands apart from other AWS certs
Anwaar Hussain points out that this cert is less about knowing AI and more about wiring it into real systems. Which is probably the right shift, because building with AI is quickly becoming more of an architecture problem than a modeling one.

Lessons I learned building a memory-aware agent with Amazon Bedrock AgentCore Runtime
Memory is still the hardest part of agent design. Amit Kayal gives us a solid walkthrough of scoping, lifecycle, and not blowing up your prompts while trying to make agents feel stateful.

Learnings from conducting ~1,000 interviews at Amazon
Steve Huynh shares a good reminder that hiring is its own system with its own signals. If you don’t understand what a company actually optimizes for, you’re probably optimizing for the wrong thing.

Podcasts, Videos, and more

Serverless Apache Airflow | Serverless Office Hours
Airflow, but make it serverless. John Jackson and Kamen Sharlandjiev break down when MWAA actually makes sense versus just reaching for Step Functions, especially once you factor in cost, scaling, and how much orchestration complexity you really need.

Building, Managing & Governing APIs on AWS
APIs aren’t just for humans anymore. Giedrius Praspaliauskas covers the full lifecycle on AWS, but the interesting part is how API strategies are evolving to support agents, not just apps. Same primitives, very different consumers.

New from AWS

Developer Tools

brognilucas/sls-testing by Lucas Brogni
Typed, composable testing utilities for AWS Lambda from Lucas Brogni that provide event builders and Jest matchers for Lambda functions.

pujaaan/simple-cdk by pujaaan
A thin runtime over CDK that scans your folders, runs adapters in a deterministic three-phase pipeline (discover → register → wire), and emits real CDK constructs.

ToolSimulator: scalable tool testing for AI agents by Darren Wang
An LLM-powered tool simulation framework within Strands Evals to thoroughly and safely test AI agents that rely on external tools, at scale.

Final Thoughts 🤔

It’s getting a lot easier to build powerful systems.

Models are getting better at real work. Agents are starting to handle meaningful workflows. And the infrastructure around all of this is finally catching up, from cost visibility to orchestration to deployment patterns.

But the gaps are still there.

We’re wiring these capabilities into systems that were never designed for autonomous behavior. Giving tools access to internal systems. Letting agents make decisions across boundaries that used to be tightly controlled. And in some cases, we’re doing it faster than we understand the implications.

That’s where things start to break.

The Vercel incident isn’t an outlier. It’s a preview. A glimpse into what happens when powerful models meet loosely defined boundaries and third-party integrations. The tooling is evolving quickly, but the assumptions behind our systems haven’t fully caught up yet.

At the same time, you can see the industry starting to respond.

Better cost attribution. More structured agent workflows. Dedicated primitives for memory, search, and control. Even multicloud connectivity is starting to blur the lines between platforms. It’s not just about building faster anymore, it’s about building systems that can actually support what we’re asking them to do.

Still early. Still messy. But the pattern is emerging. More power, more abstraction, and more responsibility to get the boundaries right.

Because “Mo’ Models, Mo’ Problems” isn’t really a joke. It’s just the beginning.

See you next week,
Jeremy
