June 30, 2026
In our previous issue, AWS Summit NYC unsurprisingly went heavy on AI agents, Lambda picked up MicroVMs for isolated sandboxes, and AWS Blocks brought IfC back into the conversation. This week, Anthropic launches Claude Sonnet 5, CloudFormation gets much faster, and OpenAI starts making Jalapeños. Plus, we have plenty of excellent cloud, serverless, and AI content from the community.
I'm hanging out at the AI Engineer World's Fair this week in San Francisco, and the vibe here is amazing. We'll get to that in a minute. The big headline is Claude Sonnet 5. Anthropic shipped it with a real jump in coding and agentic performance at Sonnet pricing, with introductory rates of $2 and $10 per million tokens through August 2026. It's already on AWS via Bedrock and the Claude Platform, and Aamna Najmi's walkthrough has the SDK and Converse examples.
Anthropic paired that with platform news: Claude is GA in Microsoft Foundry (Opus 4.8 and Haiku 4.5, Azure- or Anthropic-hosted); a self-hosted Claude apps gateway adds SSO and centralized policy to Claude Code on Bedrock and GCP; and agent identity finally gives Claude Tag its own workspace account instead of borrowing a human's.
On AWS, CloudFormation got faster with Express mode, which speeds up stack operations by up to 4x by returning once a configuration is applied (Channy's writeup covers the tradeoffs), and pre-deployment validation now catches quota, Config, and ECR issues before provisioning. Faster loops and fewer failed deploys are what the codegen era needs out of IaC. ElastiCache added Valkey 9.1 with a new I/O threading model and commands like HGETDEL (more details here).
Also note that AWS is moving several services to maintenance mode on July 30, including Bedrock Agents (now Bedrock Agents Classic), Kendra, and Q Business. Existing customers can stay, but if you just haven't gotten around to trying out myApplications on the AWS Console yet, you're out of luck. 😉 Glad to see that AWS continues to trim some fat, but as I've said before, they should have taken the whole leg at once instead of just cutting out the rot. That's a horrible visual, but an apt analogy IMO.
Outside AWS, Oracle opened up MySQL governance with AWS and Google Cloud on the steering committee, OpenAI and Broadcom unveiled Jalapeño, a from-scratch LLM inference chip already running GPT-5.3-Codex-Spark, and OpenAI previewed GPT-5.6 in three variants with a phased rollout. I know some people with access and now I'm super jealous.
Lessons learned from scaling to 1 million Lambda functions by Ben Freiberg
A million Lambda functions across thousands of accounts is the kind of scale that breaks tooling nobody expects to break, and CloudFormation StackSets is right at the top of that list. Read it for the observability cost lessons, which is where most teams get caught off guard.
PSA: That probably doesn't need to be SaaS | Ready, Set, Cloud! by
Allen's point about builders shipping products now instead of writing up what they learned is one I've felt myself lately. The tradeoff he names, easier building at the cost of shared knowledge, is real and has stuck with me.
Make AI Boring Again by Charity Majors
Charity's case for learning AI so you understand how it fails, rather than opting out, is the right instinct for engineers. She doesn't wave away the real problems around training data, labor, and energy, which is what makes the argument land.
AWS Fargate vs Lambda: When Does Lambda Stop Being Cheaper? by Matt S
The useful reframe here is that Lambda's breakeven is driven by execution duration rather than request volume, which is backwards from how most people reason about it. A 200ms API staying cheaper up to 6-8M calls a month is a handy number to keep in your back pocket.
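To make the duration point concrete, here is a rough back-of-the-envelope sketch. The prices are illustrative assumptions (approximate on-demand x86 rates, not quoted from Matt's article), the function names are hypothetical, and it assumes a single always-on Fargate task can absorb the traffic. The article's 6-8M figure depends on its own sizing choices, so treat this as a way to plug in your own numbers rather than a reproduction of that result.

```python
# Illustrative pricing assumptions in USD; check current AWS pricing for your region.
LAMBDA_PRICE_PER_REQUEST = 0.20 / 1_000_000  # assumed per-request charge
LAMBDA_PRICE_PER_GB_SECOND = 0.0000166667    # assumed x86 compute charge
FARGATE_PRICE_PER_VCPU_HOUR = 0.04048        # assumed on-demand vCPU rate
FARGATE_PRICE_PER_GB_HOUR = 0.004445         # assumed on-demand memory rate
HOURS_PER_MONTH = 730


def lambda_cost_per_call(duration_ms: float, memory_gb: float) -> float:
    """Cost of one invocation: request charge plus GB-seconds consumed."""
    gb_seconds = (duration_ms / 1000) * memory_gb
    return LAMBDA_PRICE_PER_REQUEST + gb_seconds * LAMBDA_PRICE_PER_GB_SECOND


def fargate_monthly_cost(vcpu: float, memory_gb: float) -> float:
    """Cost of one always-on Fargate task for a month."""
    hourly = vcpu * FARGATE_PRICE_PER_VCPU_HOUR + memory_gb * FARGATE_PRICE_PER_GB_HOUR
    return hourly * HOURS_PER_MONTH


def breakeven_calls_per_month(
    duration_ms: float,
    lambda_memory_gb: float,
    fargate_vcpu: float,
    fargate_memory_gb: float,
) -> float:
    """Monthly call volume at which Lambda spend equals the Fargate task."""
    return fargate_monthly_cost(fargate_vcpu, fargate_memory_gb) / lambda_cost_per_call(
        duration_ms, lambda_memory_gb
    )


if __name__ == "__main__":
    # Same Fargate task, same Lambda memory; only the duration changes.
    for ms in (100, 200, 400, 800):
        calls = breakeven_calls_per_month(ms, 1.0, 0.5, 1.0)
        print(f"{ms} ms -> breakeven at about {calls / 1_000_000:.1f}M calls/month")
```

Because the per-request charge is small next to the compute charge, doubling the duration roughly halves the breakeven volume, which is why duration matters more than raw request count.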
Why We Built Our Own CRM for Under $5 using AWS Kiro by Lee Gilmore
Lee provides a solid writeup making the case for build over buy, with a clever DynamoDB-to-Aurora DSQL sync using change data capture. Just remember the few dollars a month doesn't include the time you'll spend maintaining it, which is the part that'll bite you. But I'd probably build this myself too. 😂
Why I still approve my memory by hand by Javier Villanueva
The argument that human-in-the-loop curation beats automated validation for a single-user knowledge base is the pattern I recommend. Automated approval mostly gives you correlated bias dressed up as confirmation, which is a trap you don't want at this scale.
A return to two-pizza culture by Dr. Werner Vogels
Werner tying two-pizza teams to AI agents is a sharp framing, and the Quick Desktop story of an overnight prototype reshaping how the team planned is the example that sells it. The part I'd watch is what happens to documentation once prototypes get this cheap.
Impressions from visiting OpenAI, Anthropic, & Cursor by Gergely Orosz
Gergely's four trends are a good pulse check, especially cloud agents going mainstream and engineers optimizing their code for agent efficiency. The cost-reduction pressure he describes is what I'd keep an eye on, since it shapes what these tools become next.
What you need to know about Lambda MicroVMs by Yan Cui
Yan's framing is the clearest I've seen: MicroVMs sit closer to EC2 than Lambda, since you're running persistent VMs and managing the fleet yourself. If you came in expecting request/response Lambda ergonomics, read this first to reset your expectations.
I Tried AWS Blocks on a Real Amplify Gen2 Project — Local DynamoDB, No AWS Account, 1-Second Loops by Kohei Aoki
A hands-on look at AWS Blocks with simulated local DynamoDB and one-second feedback loops instead of cloud deploy cycles. The fast feedback is a nice side-benefit, but the real win is not having to bifurcate your business logic into IaC.
Introducing AWS Lambda MicroVMs | Serverless Office Hours
The Serverless Office Hours crew demos MicroVMs live, which is the fastest way to see snapshot launches and suspend/resume in action. If you want the mechanics behind the sandbox-per-session pattern, start here before the docs.
GLM 5.2: why I’m replacing Opus in Claude Code with this new model
Claire's walkthrough of dropping GLM 5.2 into Claude Code is a useful look at what open-weight actually buys you on cost and vendor independence. She's clear about where it fell short too, which keeps it from being a hype piece.
If 2025 was the year of agents, 2026 is the year of loops and software factories. That was the throughline at the AI Engineer World's Fair today, where more than 7,000 engineers gathered in San Francisco to compare notes and trade war stories. Shawn "swyx" Wang set the stage with a talk about Loopcraft, tracing how the loops keep compounding until you reach the highest one: engineers learning from each other. Peter Steinberger, the creator of Open Claw and probably several steps ahead of most, put the operational edge on it: keeping ten terminals open to babysit your agents is already the old way. An agent manager that lets you drop into a session when you need to take control is what comes next.

The energy around that vision is hard to miss, and the pace backs it up. Romain Huet and Alexander Embiricos from OpenAI mentioned they're shipping new models roughly every six weeks now, a cadence that would have sounded absurd a year ago. And it isn't only the frontier labs. Zixuan Li from Z.ai dialed in to show off GLM 5.2, an open-weight model built for long-horizon tasks that's pretty close to Opus 4.8 and GPT 5.5 on the benchmarks. Claire Vo's video on swapping Opus for GLM 5.2 in Claude Code says it holds up in real work too, and it has me tempted to throw an RTX 5090 in an Ubuntu box and run my own local AI lab. Capability is getting faster, cheaper, and more portable by the month.
But the cracks are starting to show, and they're in the same place they always are: software maintenance. The software-factory pitch was great for greenfield, and one-shotting a brand new app is exactly where these current models shine. Dexter Horthy hammered on this in his harness talk: maintaining all the AI slop we're generating starts to break down after only a few months, and there's still a stubborn list of problems the agents can't solve without human intervention. The models are getting great at producing something from nothing, yet they still struggle the moment you point them at a large, living codebase, including the ones they generated from scratch.
That's the gap I keep coming back to. Faster models, open weights, even a local lab of my own: none of it touches the part where the code you wrote three months ago now needs maintenance, new features, and security and performance upgrades. Loops are fantastic at the start of a project and shaky in the messy middle, where most useful software lives. If 2026 really is the year of software factories, the interesting work is less about building faster and more about whether the loop can survive contact with a real-world software lifecycle, because that's the part nobody has solved yet.
See you next week,
Jeremy
I hope you enjoyed this newsletter. We're always looking for ideas and feedback to make it better and more inclusive, so please feel free to reach out to me via Bluesky, LinkedIn, X, or email.
Jeremy is the founder of Ampt, a Cloud & AI consultant, and an AWS Serverless Hero who has a soft spot for helping people solve problems using the cloud. You can find him ranting about serverless, cloud, and AI on Bluesky, LinkedIn, X, and at conferences around the world.
Off-by-none is committed to celebrating the diversity of the serverless community and recognizing the people who make it awesome. If you know of someone doing amazing things with serverless, please nominate them to be a Serverless Star ⭐️!