Off-by-none: Issue #368

June 16, 2026

Claude Fable 5 is currently unavailable 🚫

In our previous issue, Anthropic shipped two major models, DynamoDB got "extended" to run locally on Postgres, and Aurora DSQL added JSONB support. In this issue, the US government orders Anthropic to pull Fable 5 and Mythos 5, AWS WAF starts charging AI bots for content, and Bedrock adds Grok, Gemma, and a pair of GPTs. Plus, we've got lots of great content from the cloud, serverless, and AI communities.

News & Announcements

The biggest story from last week is a takedown, not a product launch. Anthropic published a statement responding to a US government directive to suspend access to Fable 5 and Mythos 5 over "national security concerns." Anthropic walks through its defense-in-depth approach and argues the jailbreak vulnerabilities that triggered the order are about the same as the ones in models that are still happily serving traffic to North Korea (I may have made that last part up). Two weeks ago Fable 5 was the first generally available Mythos-class model on AWS. Now it's gone. Whatever you think of the merits, watching a government switch off a frontier model overnight is a preview of a world many of us haven't planned for. The residency assumptions baked into your architecture diagrams may be softer than you think.

Speaking of Bedrock, the model menu keeps growing. Grok 4.3 from xAI is now available on Amazon Bedrock with configurable reasoning effort levels, running on Mantle, the new inference engine AWS built for price performance. Google DeepMind's Gemma 4 family landed too: three open-weight variants with reasoning, multimodal understanding across text, image, video, and audio, native function calling, 35+ languages, and 256K-token context windows (the AWS ML blog has the deeper writeup covering the bedrock-mantle endpoint and OpenAI-compatible APIs). And OpenAI's GPT-5.4 and GPT-5.5 are now in US East (N. Virginia), both with 272K-token context and Responses API streaming, with GPT-5.5 aimed at coding and research and GPT-5.4 at production reasoning. Three model families through one endpoint is great until the bill shows up, which is why the cost attribution work AWS has been doing will eventually pay off.

The item I keep coming back to is the one that turns your bot traffic into a revenue line. AWS WAF announced AI traffic monetization using the x402 protocol for machine-to-machine payments, letting publishers set differentiated pricing for AI bots and collect stablecoin payouts through Coinbase. The AWS blog has the mechanics: WAF returns an HTTP 402 with a machine-readable JSON price manifest, works with CloudFront distributions, and settles through Coinbase's x402 Facilitator. It's not happening in a vacuum, either. Visa invested in Replit to power agentic payments for developers, including work on Visa's Trusted Agent Protocol, so the plumbing for agents that pay for things is getting built on multiple fronts.
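To make "differentiated pricing" concrete, here is a minimal publisher-side sketch of the idea: classify the bot, look up a price, and either serve the content or answer with a 402 and a JSON price manifest. Everything in it is an assumption for illustration. The bot categories, the prices, the manifest field names, and the respond function are hypothetical, and this is not the AWS WAF configuration or the actual x402 manifest schema.

```python
import json
from dataclasses import dataclass

# Hypothetical prices per request, in millionths of a US dollar.
# Integers avoid floating-point rounding when dealing with money.
PRICE_MICRO_USD = {
    "search_crawler": 0,      # sends traffic back, so serve it for free
    "ai_assistant": 2_000,    # $0.002 per request
    "ai_training": 10_000,    # $0.01 per request
}
DEFAULT_PRICE_MICRO_USD = 5_000  # bots we can't classify


@dataclass
class Response:
    status: int
    headers: dict
    body: str


def respond(bot_category: str, path: str, has_paid: bool) -> Response:
    """Serve the content, or return a 402 with a machine-readable price manifest."""
    price = PRICE_MICRO_USD.get(bot_category, DEFAULT_PRICE_MICRO_USD)
    if price == 0 or has_paid:
        return Response(200, {"Content-Type": "text/html"}, f"<content of {path}>")
    # Hypothetical manifest shape; the real x402 format may differ.
    manifest = {
        "resource": path,
        "category": bot_category,
        "price_micro_usd": price,
    }
    return Response(402, {"Content-Type": "application/json"}, json.dumps(manifest))
```

Calling respond("ai_training", "/post", has_paid=False) yields a 402 whose body carries the price, while a search crawler gets the page for free. In the real service, the classification and settlement checks happen inside WAF and the facilitator, not in your application code.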

Agent platforms keep on maturing as well. Claude Managed Agents added scheduled deployments and environment vaults, with Rakuten and Notion already running recurring spreadsheet analysis and report generation, plus Browserbase and KERNEL integrations for browser work. OpenAI is acquiring Ona to give Codex persistent cloud execution, so agents can grind on a task for hours or days inside a customer-controlled environment. OpenAI also struck a deal to let OCI customers reach its models and Codex through Oracle Universal Credits, wiring AI spend into existing enterprise purchasing. And Amazon OpenSearch Service launched MCP Apps for agentic observability, letting agents dig into logs, traces, metrics, and alerts for root cause analysis from inside Claude Desktop or VS Code.

On the data and ops side, Amazon Bedrock AgentCore Memory now supports strictly consistent metadata for long-term memory, so you can attach values from your application that pass through without LLM inference. That gives you department-scoped retrieval, compliance boundaries, and multi-tenant memory where each tenant gets processed on its own, which is the kind of thing that sounds boring until you try to build memory for more than one customer. And Amazon CloudWatch added cross-account metrics centralization through AWS Organizations, replicating metrics from many accounts and regions into one destination account for unified monitoring and governance.
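The reason application-supplied metadata matters for multi-tenant memory can be shown with a toy in-memory store. This is a sketch of the general pattern only, not the AgentCore Memory API: MemoryRecord, retrieve, and the tenant and department keys are hypothetical names. The point is that the scoping filter is an exact match on values your application set, so it never depends on what a model inferred from the text.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryRecord:
    text: str  # the remembered content, possibly summarized by an LLM
    metadata: dict = field(default_factory=dict)  # set by the application, passed through unchanged


def retrieve(records: list, query: str, **required: str) -> list:
    """Filter by exact metadata match first, then do a naive keyword match."""
    scoped = [
        r for r in records
        if all(r.metadata.get(key) == value for key, value in required.items())
    ]
    return [r.text for r in scoped if query.lower() in r.text.lower()]


records = [
    MemoryRecord("Prefers refunds as store credit", {"tenant": "acme", "department": "support"}),
    MemoryRecord("Refund policy changed in May", {"tenant": "globex", "department": "support"}),
    MemoryRecord("Refund approvals need a manager", {"tenant": "acme", "department": "finance"}),
]

# Only acme's support memories are even considered for the query.
acme_support = retrieve(records, "refund", tenant="acme", department="support")
```

A real system would swap the keyword match for semantic search, but the ordering is the part worth keeping: scope by metadata before ranking, so one tenant's memories can't leak into another tenant's results.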

A few more worth your attention. AWS and Snowflake released a joint Custom Lens for the Well-Architected Framework, folding both platforms' best practices into one review across seven pillars, so you can stop juggling two separate sets of guidance. AWS CLI v1 is entering maintenance mode in July 2026, with botocore and s3transfer vendored directly into the codebase, which means if you're running CLI v1 and boto3 side by side, they'll each carry their own copies from here on out. And Kiro shipped a $100/mo Pro Max tier with more credits and access to all premium models. The jump from $40 to $200 was definitely a bit much for your average user, so dropping a tier right in the middle is a smart read of who actually churns.

Finally, I shipped a new Prisma 7 adapter in the Data API Client v2.4, so now you can point Prisma, Knex, Drizzle, or Kysely at the RDS Data API for Provisioned or Serverless Aurora clusters without a connection pool or VPC.

Tutorials

AI Agent Failure Detection and Root Cause Analysis with Strands Evals by Po-Shin Chen
Evaluate AI agents systematically with Agent-EvalKit by Ishan Singh
How Samsung achieved real-time pricing with AWS Lambda Response Streaming by Vijay Naik
Serverless applications on AWS with Lambda using Java 25, API Gateway and Aurora DSQL - Lambda performance optimization approaches by Vadym Kazulkin
Run Your Email Agent on Serverless by Qasim Muhammad
The Death of /tmp: S3 Mounting for Lambda is a Game-Changer by Yogesh Gupta
Cut Your AWS Fargate Bill by 40% — 10 Waste Patterns I Fixed in Production by Chirag Mehta
MCP Apps: Because Your Users Deserve More Than a Wall of Text by Maciej Sodkiewicz

Reads

How frontier teams are reinventing AI-native development
Swami details three approaches AWS used to test AI-native workflows, including pathfinder initiatives and structured sprints, and lays out five practices for teams restructuring around autonomous agents. If you're still treating AI as a fancier autocomplete, this is a nudge to think bigger about how the work itself changes.

The Review Bottleneck: Rethinking Software and Infrastructure Design for the Agent Era
A look at how coding agents moved the delivery bottleneck from writing code to reviewing and coordinating it. The proposed fixes, bounded contexts, contract-driven development, and pushing review upstream to intent instead of output, line up with what a lot of teams are feeling right now but haven't named yet.

AI demands more engineering discipline. Not less.
Charity Majors makes the case for what she calls Phoenix Architectures, where code becomes a materialized view you can regenerate once it goes stale. She draws the line from immutable infrastructure to treating AI-generated code as disposable, with validation moving to production. Classic Charity, and definitely worth your time.

The evolution of agentic surfaces: building with Claude Managed Agents
Anthropic introduces Claude Managed Agents as a set of composable APIs for production agents, handling orchestration, session management, credential isolation, and observability so teams can spend their time on context management instead of babysitting execution harnesses. Pairs well with the scheduling and vaults news above.

Takeaways from AWS Generative AI Lens
Amit Kayal breaks down the AWS Generative AI Lens with a focus on controlled AI-assisted workflows versus fully autonomous agents, walking through when AI should classify, when it should recommend, and when it should actually execute. The data governance and multi-tenant sections are the parts I'd read twice.

Lambda in a VPC Is Fine
Michael Walmsley walks through the evolution of Lambda VPC networking, from the painful 2016 days of on-demand ENI creation to today's Hyperplane implementation. If you're still repeating the old "never put Lambda in a VPC" advice, this explains why it stopped being true years ago.

Why AWS scrapped OpenSearch's architecture to chase agent workloads
Frederic Lardinois of The New Stack covers AWS's near-complete rebuild of OpenSearch Serverless, with separated storage and compute that scales to zero when idle and auto-scales 20x faster than before. It's built for the burst-and-idle usage that agent workloads generate, with log analytics arriving in June and agent memory features in H2 2026.

New from AWS

Security

AWS Destroyed the Value Proposition for Bedrock by Chris Farris
Chris digs into the part of the Fable 5 and Mythos 5 launch nobody put in the headline: the only allowed retention mode for these models on Bedrock is provider_data_share. Using them means your prompts and outputs leave the AWS boundary, land with Anthropic for 30 days, and become subject to human review. That breaks the neutral-broker guarantee that sent regulated and European shops to Bedrock in the first place. He walks through the compliance fallout and the SCP you should deploy today to deny anything other than none. Read this before you point a workload at either model, assuming they get turned back on.

From Socials

Just spent the last two weeks reworking my local Agent Hub system to use @opencode as the harness with qwen, gemma4, and mistral local models. Then I get this at 7:01pm. 😑 pic.twitter.com/T9as70aqdQ
— Jeremy Daly (@jeremy_daly) June 16, 2026

I'm not sure whether to be excited by this message, or if I should prepare for another rug pull. Either way, it forced me down an interesting multi-harness orchestration path.

Final Thoughts 🤔

HTTP 402 had been sitting in the spec since the early 90s with a note that said "reserved for future use." For three decades it was the status code nobody got to use, a placeholder for a payment layer that never seemed to materialize. Then about a year ago, Coinbase introduced "x402: An open standard for internet-native payments." Wait, did the crypto bros get it right? 😬 (fyi, I'm still a hard no on that)

AWS WAF now returns a 402 with a machine-readable price manifest when an AI bot asks for your content. The bot's agent reads the manifest, pays in stablecoin through Coinbase's x402 facilitator, and gets the content. No human in the loop and no checkout page. At the same time, Visa is putting money into Replit to build agentic payments and pushing its Trusted Agent Protocol, so the same machinery is getting assembled by the incumbents who actually move money for a living. When a 30-year-old dead status code and a Visa investment point in the same direction, that's usually a signal worth paying attention to.
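The agent side of that exchange can be sketched in a few lines: request, get a 402, read the price, check it against a budget, pay, and retry with proof of payment. This is a rough illustration under stated assumptions, not the x402 client protocol. The fetch and pay callables, the price_micro_usd field, and the idea of passing a proof token on the retry are all hypothetical stand-ins for whatever the real facilitator and manifest define.

```python
import json
from typing import Callable, Optional, Tuple

# fetch(url, payment_proof) -> (status, body); proof is None on the first attempt.
Fetch = Callable[[str, Optional[str]], Tuple[int, str]]
# pay(manifest) -> proof-of-payment token (hypothetical).
Pay = Callable[[dict], str]


def fetch_with_budget(url: str, fetch: Fetch, pay: Pay, budget_micro_usd: int) -> Tuple[int, str]:
    """Fetch a URL, paying for it only if the quoted price fits the budget."""
    status, body = fetch(url, None)
    if status != 402:
        return status, body  # free content, or an unrelated error

    manifest = json.loads(body)
    price = int(manifest["price_micro_usd"])  # hypothetical manifest field
    if price > budget_micro_usd:
        return status, body  # too expensive: walk away without paying

    proof = pay(manifest)
    return fetch(url, proof)  # retry once with proof of payment
```

The budget check is where "no human in the loop" gets interesting. The agent decides on its own whether your content is worth the quoted price, which is exactly why the number you put in the manifest matters.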

What's happening here is a shift in how we treat bots. For most of the web's history, automated traffic was something you blocked, rate-limited, or grudgingly tolerated. The robots.txt era assumed crawlers were either friendly enough to respect a text file or hostile enough to fight. Now there's a third option: charge them. If an agent wants your content badly enough to pay for it, you can let it, and you can put a number on exactly how much that access is worth.

I'm not sure this scales, and there are real reasons for skepticism. Stablecoin payouts assume a settlement story most finance teams haven't signed off on. Differentiated pricing for bots assumes agents will agree to pay instead of routing around you, and the whole thing has a chicken-and-egg problem where it only matters once enough agents speak the protocol and enough publishers demand payment. None of that is solved. But the direction is clear, and for the first time the economics of serving an AI bot aren't automatically negative.

There's a question worth thinking about if you run content or an API. "Block all bots" is no longer the only defensive move available to you. The more interesting question is which agents you'd actually want to charge, which ones you'd serve for free because they send value back, and what your content is worth to a machine that has a budget and no patience for a paywall modal. That's a pricing exercise, not a security one, and most of us have never had to think about it. We probably should start.

See you next week,
Jeremy

I hope you enjoyed this newsletter. We're always looking for ideas and feedback to make it better and more inclusive, so please feel free to reach out to me via Bluesky, LinkedIn, X, or email.

Previous Issue

Issue #367 • June 9, 2026

What did I miss? 🎓

This Week's Top Links

We share a lot of links each week. Check out the Most Popular links from this week's issue as chosen by our email subscribers.

This Week's Sponsor

Check out all of our amazing sponsors and find out how you can help spread the #serverless word by sponsoring an issue.

About the Author

Jeremy is the founder of Ampt, a Cloud & AI consultant, and an AWS Serverless Hero who has a soft spot for helping people solve problems using the cloud. You can find him ranting about serverless, cloud, and AI on Bluesky, LinkedIn, X, and at conferences around the world.

Nominate a Serverless Star

Off-by-none is committed to celebrating the diversity of the serverless community and recognizing the people who make it awesome. If you know of someone doing amazing things with serverless, please nominate them to be a Serverless Star ⭐️!