• Open

    Build a protein research copilot with Amazon Bedrock AgentCore
    This post shows you how to build a conversational protein research assistant that combines three capabilities: natural language query parsing to extract structured search parameters, vector similarity search over protein embeddings using a specialized language model, and AI-generated scientific summaries of search results.  ( 116 min )
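The vector similarity search step mentioned above can be illustrated with a minimal sketch. It assumes embeddings are plain lists of floats and ranks them by cosine similarity; the names `cosine_similarity`, `top_k`, and the `protein_*` keys are hypothetical and are not taken from the post, which uses a specialized protein language model and a managed vector store.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat as "not similar"
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the k (name, score) pairs most similar to the query vector."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (illustrative values only).
index = {
    "protein_a": [1.0, 0.0, 0.0],
    "protein_b": [0.9, 0.1, 0.0],
    "protein_c": [0.0, 1.0, 0.0],
}
results = top_k([1.0, 0.0, 0.0], index, k=2)
```

A production system would delegate this ranking to an approximate nearest-neighbor index rather than scanning every vector.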
    Shared infrastructure, isolated tenants: Pool model multi-tenancy with Amazon Bedrock AgentCore
    In this post, you will learn patterns for implementing production-ready multi-tenant systems using Amazon Bedrock AgentCore. You will see these patterns demonstrated through healthcare AI agents that serve multiple clinics and hospitals.  ( 116 min )

  • Open

    Building pay-per-intelligence for AI agents: How Ampersend uses Amazon Bedrock AgentCore Payments
    In this post, you will learn how Ampersend built a pay-per-intelligence routing layer on top of Amazon Bedrock AgentCore Payments. AI agents autonomously route tasks to the most effective model, pay per request, and operate within spending budgets. You will also see how the two-hop payment pattern works end-to-end and how to get started with your own implementation.  ( 111 min )
    Embed the world: Multimodal AI for searchable aerial imagery at scale
    In this post, we walk through the problem space, our architecture on Amazon Bedrock and Amazon OpenSearch Serverless, the evaluation methodology we built on OpenStreetMap ground truth, four experiments that compared embedding models, fusion strategies, captioning, and search methods, and the practical guidance you can apply when building a similar system. You’ll learn which design choices move the needle for geospatial semantic search, including why Amazon Nova Multimodal Embeddings delivered the highest F1 scores across both benchmark queries in our evaluation. The work described here evolved into Vexcel Intelligence, a searchable imagery product.  ( 123 min )
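As a rough illustration of the F1 metric referenced above, the sketch below scores one search query against a ground-truth set. It assumes results and ground truth are sets of item identifiers; the function name and the tile IDs are hypothetical, and the post's actual evaluation against OpenStreetMap data may be structured differently.

```python
def f1_score(retrieved, relevant):
    """Harmonic mean of precision and recall for one query."""
    true_positives = len(retrieved & relevant)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(retrieved)
    recall = true_positives / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Hypothetical tile IDs: what the search returned vs. what the ground truth labels.
retrieved = {"tile_1", "tile_2", "tile_3", "tile_4"}
relevant = {"tile_2", "tile_3", "tile_5"}
score = f1_score(retrieved, relevant)
```

Here precision is 2/4 and recall is 2/3, so the score works out to 4/7.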
    Running ComfyUI workflows on Amazon SageMaker AI processing jobs
    In this post, we walk you through how to deploy ComfyUI workflows on Amazon SageMaker AI processing jobs to generate hundreds of high-quality images in a single batch. You learn how to set up the infrastructure using AWS Cloud Development Kit (AWS CDK), configure GPU-accelerated processing, and automate image generation at scale. You can then adapt this solution to your own ComfyUI workflows and scale your creative pipeline.  ( 114 min )
  • Open

    SRE Weekly Issue #522
    View on sreweekly.com A message from our sponsor, Bronto: What would an AI SRE choose for their observability stack? We asked AWS DevOps Agent to run a live test comparing Bronto, Grafana Loki, and Elasticsearch against the same OpenTelemetry dataset. Bronto scored highest (9.4/10) and was the only tool that didn’t return silent failures. Curious why? […]  ( 4 min )

  • Open

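    Insight Is Not a Sudden Flash of Inspiration
    Insight is not a sudden flash of inspiration; it is slowly ground out. We often assume that "insight" (悟) is a moment of sudden enlightenment, reserved for geniuses. But look back at history and at the present, and you find a harsh truth: real insight usually comes with painful self-negation and is a long, meticulous process of "polishing." It is not endless addition but adding and then subtracting; not endless acquisition but peeling away layer by layer, discarding the false and keeping the true. 01. Zigong infers the rest from one instance In the Analects, Zigong confidently asked Confucius: "To be poor without flattering and rich without arrogance: that is a fine standard, isn't it?" Confucius poured cold water on him: "That is only a passing grade. Better to be poor yet joyful, rich yet fond of ritual propriety." An ordinary person might have felt he had lost face. But Zigong immediately thought of the line in the Book of Songs: "as if cut, as if filed, as if carved, as if polished." He had his insight: moral cultivation is not a static state. Like working jade, it requires continual cutting and grinding, even cutting away parts that were originally your own, before you become a finished vessel. Confucius praised him highly as someone who, "told what has gone before, knows what is to come," able to infer the rest from a single instance. The insight here is not inspiration from nowhere; it comes from turning external knowledge into one's own wisdom through intense observation and thought. Without the earlier accumulation and reflection, there would have been no moment of clarity. 02. Qin Guan's word-craft: "mountains brushed with faint cloud" The Northern Song lyricist Qin Guan wrote the immortal line "山抹微云" ("mountains brushed with faint cloud"). Why has the single word "brushed" (抹) endured for a thousand years? Because it is not simple description but the crystallization of the poet's deep observation of and reflection on nature. Ordinary people look at clouds and see only that they drift; Qin Guan perceived the personified, emotionally charged interplay between mountain and cloud. The word is the result of repeated weighing and of discarding mediocre alternatives. Here, insight shows up as extreme sensitivity to detail and precise control of expression. 03. Whimsical's decluttering Today I read an article (https://whimsical.com/blog/choosing-depth-over-breadth). Whimsical started in 2017 with the vision of building an all-in-one team collaboration hub, and first found success with focused tools such as flowcharts, wireframes, and mind maps. They were once fixated on building that "all-in-one collaboration hub," but recently launched broad features such as Projects and Posts failed to gain the expected traction. The widening product scope spread resources thin and threatened overall product quality, and the product team realized that users value depth in the core tools more than a grand unified suite. Rather than dig in, they examined the product the way a craftsman inspects a flawed piece, analyzing data and listening to users, and completed a deep self-examination. In the end they made a hard decision: cut the Projects and Tasks features and refocus on the core product where they are strongest, the whiteboard experience (Boards). Is this not "as if cut, as if filed"? Cut away surplus desire, grind off restless breadth, and keep only the most essential value. This kind of insight is the courage to admit a mistake, and also a step up from "wanting more" to "going deep." Whether it is Zigong's self-cultivation, Qin Guan's word-craft, or Whimsical's product iteration, they all point to the same truth: insight is not found deep in the mountains but is honed on real work. In this noisy age, may we all have the patience to "carve and polish," and, in continually reshaping ourselves, meet a clearer-sighted self.  ( 1 min )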

  • Open

    Introducing Web Search on Amazon Bedrock AgentCore
    Web Search on Amazon Bedrock AgentCore is now generally available. In this post, we walk through what makes Web Search on Amazon Bedrock AgentCore different, why it matters, and how to wire it in with a few lines of code.  ( 113 min )
    Accelerate campaign workflow with insights from Adobe Marketing Agent for Amazon Quick
    This post shows how to enable Adobe Marketing Agent for Amazon Quick using a Model Context Protocol (MCP). We walk you through how to configure the integration, authenticate using your Adobe credentials, and get the latest insights in Amazon Quick. The sample workflow returns audience rankings, loyalty segment summaries, journey usage, and conflict recommendations.  ( 115 min )

  • Open

    Monitor and debug generative AI inference with SageMaker detailed metrics and Insights dashboard on CloudWatch
    Amazon SageMaker AI provides fully managed real-time inference hosting for machine learning models. You deploy a model to a SageMaker endpoint backed by one or more compute instances, and SageMaker handles provisioning and scaling. SageMaker supports multiple endpoint architectures. This post focuses on the two most relevant to generative AI workloads with detailed observability: Single-model endpoints (SME) and Inference component (IC) endpoints.  ( 115 min )
    Amazon Bedrock AgentCore harness is now generally available: Go from idea to production-grade agent in minutes
    Today, Amazon Bedrock AgentCore harness is generally available. Two API calls (CreateHarness to define an agent, and InvokeHarness to run it), and you have an agent running in seconds. The agent runs in its own isolated environment with a filesystem and shell, so it can read files, run commands, and write code safely. It remembers users and conversations across sessions, picks up skills you point it at (including the AWS-curated catalog), browses the web, calls your tools through gateway or MCP, and switches model providers mid-session without losing context. Every step streams back to you in real time and is automatically traced to Amazon CloudWatch. You don’t need to write orchestration code or build a container, unless you want to.  ( 120 min )

  • Open

    Amazon SageMaker AI Async Inference now supports inline request payloads
    Today, we’re announcing inline payload support for Amazon SageMaker AI Async Inference. Customers can now send inference payloads directly in the request body of the InvokeEndpointAsync API, removing the need to upload input data to Amazon Simple Storage Service (Amazon S3) before each invocation.  ( 110 min )
    Get back hours every day with autonomous agents in Amazon Quick
    Today, Quick gets even more powerful: new autonomous agents that work continuously on your behalf, an activity feed that helps you prioritize your most important work, and the ability to find insights across every data source your business runs on from a single question.  ( 110 min )
    Context intelligence for your data and AI agents at scale
    Agents are only as intelligent as the context they can reason over. Today, that context is scattered across data lakes, data warehouses, lakehouses, databases, and streams, and in institutional knowledge that has never been written down. You want to trust the decisions made by your AI agents, but that can't happen until agents have context. Imagine what becomes possible when we give agents a safe way to access the context they need to deliver trusted decisions. This is why at the AWS Summit New York City, we’re announcing a series of innovations that deliver intelligence for your data and AI agents at scale.  ( 111 min )
    New in Amazon Bedrock AgentCore: Build agents with broader knowledge and continuous learning
    Today we're introducing new capabilities on Amazon Bedrock AgentCore, the platform to build, connect, and optimize agents. In this post, we cover how these capabilities close each gap: connecting agents to organizational, web, and paid knowledge; helping teams find and fix what's going wrong in production; and enforcing controls that scale as agents grow more capable. Together, they help you build more capable agents faster, govern them with controls that scale, and improve them continuously.  ( 115 min )

  • Open

    Safeguard your agentic AI applications with the Amazon Bedrock Guardrails InvokeGuardrailChecks API
    Today, we’re announcing a new API with Amazon Bedrock Guardrails. With this API, you can apply individual safeguards, also referred to as safety checks, at any point in your agentic AI applications without creating guardrail resources. In this post, we walk through how the InvokeGuardrailChecks API works and how to use it to build safe, multi-turn agentic AI applications.  ( 115 min )
    Introducing container caching in Amazon SageMaker AI for faster model scaling
    Today, we’re excited to announce container image caching for Amazon SageMaker AI inference, the next major advancement in our faster scaling optimization journey. This improves end-to-end latency by up to 2x for generative AI models during scale-out events.  ( 111 min )
    Parallelize speculative decoding with P-EAGLE on Amazon SageMaker AI
    This post walks you through how to use P-EAGLE directly within Amazon SageMaker AI. It will demonstrate how to select a compatible model from the SageMaker JumpStart catalog, configure the parallel drafting specifications, and deploy a highly optimized real-time SageMaker AI endpoint to accelerate your generative AI applications.  ( 115 min )

  • Open

    Introducing Gemma 4 models on Amazon Bedrock
    Today, we are announcing the availability of the Gemma 4 family on Amazon Bedrock. Built by Google DeepMind and released under the Apache 2.0 license, Gemma 4 is a family of open-weight models designed with a focus on intelligence-per-parameter across a broad range of deployment scenarios. The family includes three instruction-tuned variants: Gemma 4 31B, Gemma 4 26B-A4B, and Gemma 4 E2B. These cover dense and mixture-of-experts (MoE) architectures, where only a fraction of the model’s parameters activate per request. The variants offer built-in reasoning, native function calling, and multimodal input across text and image.  ( 120 min )
    AI Agent Failure Detection and Root Cause Analysis with Strands Evals
    In this post, we walk you through calling the detector functions to diagnose real agent failures. You learn how to interpret their structured output: categorized failures with confidence scores, causal chains linking root causes to downstream symptoms, and fix recommendations specifying whether a change belongs in your system prompt or tool definitions. You also learn how to integrate detection into your evaluation pipeline for automated diagnosis on every test run.  ( 114 min )
    Build context-rich research agents with Deep Agents and Bedrock AgentCore
    In this post, you'll build a competitive research agent that demonstrates this pattern end to end. This walkthrough targets developers building multi-step AI workflows who need isolated execution environments for their agents. In Part 2 of the notebook, you can deploy this same agent to Bedrock AgentCore Runtime using the AgentCore CLI, so it runs as a managed, session-isolated service.  ( 113 min )
  • Open

    SRE Weekly Issue #521
    View on sreweekly.com A message from our sponsor, Bronto: Stuck with slow queries and scattered logs? What if you could easily retain all of your telemetry data in one place for a full year without sky-high bills? Now with Bronto, it’s possible. Connect the dots faster across TBs of always hot, full fidelity data. Try […]  ( 4 min )

  • Open

    Building Supercharger: How Rocket Close optimized title operations with agentic AI
    In this post, we explore how Rocket Close built a solution using Strands Agents, large language models (LLMs), Amazon Bedrock, Amazon Bedrock Knowledge Bases, and Model Context Protocol (MCP) tools. We cover solution features, the rationale for the technology stack, lessons learned, and the business impact at Rocket Close.  ( 113 min )
    Build a meeting prep and follow-up assistant with Amazon Quick and Cisco Webex MCP servers
    This post shows how to build a custom meeting prep and follow-up assistant using Amazon Quick and Cisco Webex MCP servers. From a single prompt, the agent finds an upcoming Webex meeting, reviews prior meeting summaries and transcripts, and pulls related Vidcast highlights and transcript context. It then searches Webex message threads for unresolved follow-ups and creates a concise prep brief. After the meeting, the same assistant can summarize the discussion and identify action items. It can also find related Vidcast updates and draft a follow-up message for the right Webex space.  ( 116 min )
    From PDFs to insights: Architecting an intelligent document processing pipeline with AWS generative AI services
    This post outlines the development of a cost-effective and scalable intelligent document processing pipeline on AWS, powered by Amazon Bedrock and its features. Amazon Bedrock Data Automation (BDA) is a managed service within Amazon Bedrock that automates the extraction of insights from documents. We demonstrate how BDA extracts and analyzes document content, while Strands Agents hosted on Amazon Bedrock AgentCore Runtime coordinate specialized processing tasks, and Amazon Bedrock Knowledge Bases enable contextual understanding across multiple documents. By combining these capabilities within a unified architecture, organizations can transform their document processing workflows with minimal development effort.  ( 115 min )
    Built from the inside out: How AWS Professional Services became a frontier team first
    AWS Professional Services (AWS ProServe) compressed engagement timelines from months to days, not by adding artificial intelligence (AI) tools to an existing process, but by fundamentally rebuilding how we deliver from the inside out. In this post, we share how AWS ProServe became a frontier team, the practices that enabled it, and what your engineering organization can take from our experience.  ( 111 min )

  • Open

    Extract Data with On-demand and Batch Pipelines Dynamically
    This post demonstrates an intelligent document processing pipeline that offers both on-demand and batch inference options on Amazon Bedrock, giving you flexibility over document processing time and cost.  ( 114 min )
    Evaluate AI agents systematically with Agent-EvalKit
    Agent-EvalKit is an open-source toolkit (Apache 2.0) that makes this evaluation infrastructure available by integrating with AI coding assistants, including Claude Code, Kiro CLI, and Kilo Code. This post walks through how Agent-EvalKit works across its six evaluation phases, using a travel research agent built with the Strands Agents SDK and Amazon Bedrock as a running example.  ( 115 min )
    Spot trends faster, sort smarter: Unlocking Sparklines and Custom Sort in Amazon Quick
    Today, we’re excited to announce two new capabilities that make Quick Sight dashboards even more expressive and business-aligned: sparklines and custom sort for controls. In this post, we walk through both features, what they are, when to use them, and how to configure them, with real-world scenarios that bring them together in a practical, decision-ready dashboard.  ( 118 min )
    Optimize blueprint extraction accuracy in Amazon Bedrock Data Automation
    Blueprint instruction optimization is a BDA feature that automatically refines your extraction instructions to address this challenge directly. You provide three to ten example documents with expected values, and BDA refines your blueprint instructions to improve accuracy in minutes, not weeks. No separate model fine-tuning is required. By the end of this post, you can optimize your blueprints to improve accuracy, run the optimization workflow through the Amazon Bedrock console or the API, and apply best practices for selecting examples and ground truth.  ( 116 min )
    How frontier teams are reinventing AI-native development
    Frontier teams are not just using AI to code faster. They’re redesigning how software gets built. The result is 4.5x productivity gains, in some cases more than 10x.  ( 112 min )

  • Open

    Stop hand-tuning kernels: How Neuron Agentic Development accelerates AWS Trainium optimizations
    Today, we’re announcing the Neuron Agentic Development capabilities: a collection of AI agents and skills that make this possible for developers building on AWS Trainium and AWS Inferentia. In this post, we explain how the Neuron Agentic Development capabilities accelerate the kernel development workflow.  ( 114 min )
    Build an AI-Powered Equipment Repair Assistant Using Amazon Bedrock AgentCore
    In this post, you build an AI-powered equipment repair assistant using Amazon Bedrock AgentCore that helps farmers and field technicians diagnose equipment problems, identify required parts, and access manufacturer-approved repair procedures through natural language. The solution uses AgentCore Runtime with the Strands Agents SDK, Amazon Nova 2 Lite as the foundation model, Amazon Bedrock Knowledge Base for retrieval-augmented generation (RAG), and AgentCore Memory for conversation persistence.  ( 115 min )

  • Open

    Scale Robot Reinforcement Learning with NVIDIA Isaac Lab on Amazon SageMaker AI
    In this post, we show how to train robot policies for the Unitree H1 humanoid with NVIDIA Isaac Lab on Amazon SageMaker AI across two compute options: Amazon SageMaker HyperPod and Amazon SageMaker Training Jobs.  ( 122 min )
    Hands-free first notice of loss: Using Strands Agents and Amazon Bedrock AgentCore Browser Tool for intelligent claims intake
    In this post, we demonstrate how a hands-free FNOL intake system combines agents built with the Strands Agents SDK for domain reasoning with Amazon Bedrock AgentCore Browser Tool for live portal interaction. This approach preserves human expertise while removing repetitive screen work.  ( 121 min )
    Build an agentic incident triage assistant with Amazon Quick and New Relic
    This post shows engineering teams how to apply that principle to one of the most time-sensitive workflows in engineering: incident triage. You will build a custom incident triage assistant agent using Amazon Quick that orchestrates a response with the New Relic Model Context Protocol (MCP) Server and Asana through native integrations. From a single prompt, the Amazon Quick agent investigates the incident, assembles a root cause analysis (RCA) brief with evidence links, and creates a tracked Asana task ready for handoff.  ( 113 min )

  • Open

    Unlocking AI flexibility in Europe: A guide to cross-region inference for EU data processing and model access
    With access to the latest generative AI models and high-performance accelerated compute in high global demand, AWS customers need tools to take advantage of model availability and capacity across multiple AWS Regions, while still meeting their security and privacy requirements. Cross-Region inference (CRIS) on Amazon Bedrock meets these needs by automatically routing requests across multiple […]  ( 114 min )
    It’s safe to close your laptop now: Hosting coding agents on Amazon Bedrock AgentCore
    Amazon Bedrock AgentCore Runtime gives each agent session its own isolated microVM with a persistent workspace, secure tool access through Gateway, and built-in observability—so you can run Claude Code, Codex, Kiro, and Cursor in parallel without sharing secrets, ports, or filesystems. Close the lid, go to dinner, and pick up where you left off tomorrow.  ( 122 min )
    Better decisions at scale: How mathematical optimization delivers where intuition fails
    In this post, we introduce mathematical optimization, explain how it fits within the broader AI landscape, and showcase real-world success stories where the Innovation Center has partnered with customers to deliver concrete results.  ( 111 min )
    End-to-end encrypted ML inference with Amazon SageMaker AI and FHE
    This blog has previously discussed FHE for ML inference in the post Enable fully homomorphic encryption with Amazon SageMaker endpoints for secure, real-time inferencing, but this post goes a little further. That previous post showed how to implement FHE-based inference 'from scratch' by hand-crafting a linear-regression algorithm using a low-level library called SEAL. Instead, this post shows a much more flexible and higher-level approach based on concrete-ml, a high-level library built specifically for FHE-based inference. It supports several common types of models 'out of the box' and is even API compatible with the well-known ML library scikit-learn.  ( 119 min )
    Amazon Quick ARNs: Cross-account migration and namespace permissions
    In this post, we cover the structure of Amazon Quick ARNs and provide a practical mental model for working with them. By the end, you can look at an ARN and immediately understand what it means for your migration strategy, diagnose permission issues faster, and design multi-tenant architectures with confidence.  ( 115 min )
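One way to build the mental model described above is to split an ARN into its standard fields. The sketch below relies only on the general AWS ARN layout (`arn:partition:service:region:account-id:resource`); the `quicksight` service prefix, the sample account IDs, the `dashboard/sales-overview` resource, and the helper names are assumptions made for illustration, not details taken from the post.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account_id: str
    resource_type: str
    resource_id: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN into fields; the resource part is split once on '/'."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    _, partition, service, region, account_id, resource = parts
    resource_type, _, resource_id = resource.partition("/")
    return Arn(partition, service, region, account_id, resource_type, resource_id)


def retarget(arn: str, account_id: str) -> str:
    """Rewrite an ARN so it points at the same resource path in another account."""
    p = parse_arn(arn)
    resource = p.resource_type + (f"/{p.resource_id}" if p.resource_id else "")
    return f"arn:{p.partition}:{p.service}:{p.region}:{account_id}:{resource}"


# Hypothetical source ARN and target account for a cross-account migration.
source = "arn:aws:quicksight:us-east-1:111122223333:dashboard/sales-overview"
parsed = parse_arn(source)
migrated = retarget(source, "444455556666")
```

Rewriting the account field this way only produces the target identifier; the resource itself still has to be created and permissioned in the destination account.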
    Evaluate your Amazon Nova Sonic voice agent at scale, no microphone required
    In this post, we walk you through the Nova Sonic Test Harness, an open source framework that we built to solve both problems. It serves as a rapid iteration tool for tuning system prompts and tool configurations (run a conversation, see results, adjust, repeat) and as a comprehensive evaluation framework for validating voice agent quality at scale. It runs complete multi-turn conversations with Amazon Nova Sonic automatically, evaluates them using LLM-as-judge techniques, and can even detect cases where the model’s audio output doesn’t match its text output (audio hallucinations). No microphone required.  ( 114 min )
  • Open

    SRE Weekly Issue #520
    View on sreweekly.com A message from our sponsor, BigPanda: Your team solved this incident last month. Why is it back? Because you fixed the symptom, not the cause. BigPanda surfaces the pattern behind repeat incidents and tells you what to fix so the next on-call doesn’t fight the same P1. Prevent incidents proactively AI Agents […]  ( 4 min )

  • Open

    Miell’s Law and Token Budgets
    TL;DR Conway’s Law tells us that organisations create systems that mirror their communication systems. Jamie Dobson coins ‘Miell’s Law’ in a post about the work of our mutual friend (and his colleague) Ian Miell in his forthcoming book ‘Follow the Money’: Organisations that design systems are constrained to produce systems that reflect the financial structures […]  ( 14 min )
    May 2026
    Pupdate It was Milo’s 5th birthday on the 12th, which meant a post about how he’s getting on. Sadly Max has also needed to visit the vets, with (we think) back ache, which might be the dreaded Intervertebral Disc Disease (IVDD). He’s been on reduced activity, so shorter walks, but thankfully seems pretty much back […]  ( 15 min )

  • Open

    NVIDIA Nemotron 3 Ultra now available on Amazon SageMaker JumpStart
    Deploy NVIDIA Nemotron 3 Ultra on Amazon SageMaker JumpStart. Get 5x faster inference and 30% lower cost for agentic AI workloads with this frontier reasoning model.  ( 109 min )

  • Open

    Automate model quota request and operational issue triage on Amazon Bedrock
    In this post, we introduce Amazon Bedrock Ops Alert, a three-layer automated monitoring solution that proactively detects operational issues, dynamically adjusts alarm thresholds, classifies alarms by category, automatically creates context-aware support cases, helps prevent duplicate cases when an unresolved case of the same alarm category is already active, and delivers contextualized notifications to AI SRE teams. We walk through the solution architecture and how you can deploy it in your own environment.  ( 120 min )
    How to build self-driving AI operations on Amazon Bedrock at scale
    In this post, we introduce Amazon Bedrock Ops Alert, a three-layer automated monitoring solution that proactively detects operational issues, dynamically adjusts alarm thresholds, classifies alarms by category, automatically creates context-aware support cases, helps prevent duplicate cases when an unresolved case of the same alarm category is already active, and delivers contextualized notifications to AI SRE teams. We walk through the solution architecture and how you can deploy it in your own environment.  ( 120 min )
    Fundamental’s Large Tabular Model NEXUS is now available on Amazon SageMaker JumpStart
    In this post, we show you how to get started with NEXUS on Amazon SageMaker JumpStart, walk through the deployment process, and demonstrate how to run predictions against your enterprise datasets.  ( 111 min )
    Reducing container cold start times using SOCI index on DLAMI and DLC
    In this post, we look at how to use SOCI on publicly available Deep Learning AMIs and Containers, when to use the various SOCI modes provided by the tool, and how to quickly and efficiently use this tool in your workloads today.  ( 112 min )
    Improve your agent’s tool-calling accuracy with SFT and DPO on Amazon SageMaker AI
    In this post, you learn how to use Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) together to improve the tool-calling accuracy of a small language model (SLM). The example uses Amazon SageMaker AI training jobs, so you can focus on training code instead of managing your own training infrastructure. You also learn how to evaluate tool-calling accuracy and compare a base model to several fine-tuned variants, so you can make data-driven decisions about model quality.  ( 117 min )
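For readers new to DPO, the sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities. It is a plain-Python illustration of the published objective, not the training code from the post; the function names, the example log-probabilities, and the `beta` value are assumptions.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities.

    The policy is rewarded for raising the chosen response's log-probability
    relative to the frozen reference model more than it raises the rejected one.
    """
    chosen_gain = policy_chosen - ref_chosen
    rejected_gain = policy_rejected - ref_rejected
    return -log_sigmoid(beta * (chosen_gain - rejected_gain))


# Policy identical to the reference: no preference margin yet.
neutral = dpo_loss(-5.0, -5.0, -5.0, -5.0)
# Policy now favors the chosen tool call and disfavors the rejected one.
improved = dpo_loss(-4.0, -6.0, -5.0, -5.0)
```

With no margin the loss equals log 2, and it falls as the policy separates the chosen response from the rejected one. In practice a library would compute this over batches of token-level log-probabilities.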

  • Open

    The art and science of hyperparameter optimization on Amazon Nova Forge
    Fine-tuning for domain-specific tasks means improving performance in one area without degrading the model’s general capabilities, and getting that balance right is harder than it looks. This post walks through how to navigate that balance, from selecting the right customization strategy for your data and task, to configuring the training parameters that most influence outcomes, like learning rate, batch size, and checkpointing. We also cover the common mistakes that lead to wasted training runs and how to catch them early, so you can avoid burning through compute on avoidable failures.  ( 122 min )
    Object detection with Amazon Nova 2 Lite
    In this post, we'll walk through implementing object detection with Amazon Nova 2 Lite. You'll learn how to deploy an object detection application using Amazon Bedrock, AWS Lambda, and Amazon API Gateway. You'll also learn how to craft effective prompts, process structured JSON output, and visualize results. We explore practical applications across manufacturing, agriculture, and logistics.  ( 112 min )
    How Baz improved its AI Agent Code Review accuracy using Amazon Bedrock AgentCore
    This post walks through how Baz built their Spec Review agent using Amazon Bedrock and Amazon Bedrock AgentCore. We'll cover the architecture decisions, implementation details, and the business outcomes they achieved by leveraging these AWS services to automate their code review process.  ( 110 min )
    Building a secure auth code flow setup using AgentCore Gateway with MCP clients
    This post demonstrates how to implement Open Authorization (OAuth) Code flow as an inbound authorization mechanism for MCP servers hosted on Amazon Bedrock AgentCore Gateway. By the end of this guide, you will have a production-ready setup where each AI assistant request is authenticated with a valid user identity token issued from your organization’s identity provider.  ( 115 min )

  • Open

    Reference your own AWS Secrets Manager secrets in Amazon Bedrock AgentCore Identity
    Today, we’re excited to announce the ability to reference a secret in AWS Secrets Manager for AgentCore Identity, so you can reference your own preconfigured secret from Secrets Manager and retain full control over how it is managed. With this ability, you can extend your organization’s existing secrets governance processes to AgentCore. You can provide an existing, preconfigured AWS Secrets Manager secret to use with your credential provider resources. You retain full control over its encryption configuration, rotation, replication, tags, and resource policies, just as you would manage other secrets in Secrets Manager. You can also choose a secret from another AWS account within the same AWS Region, though cross-Region secret sharing isn’t supported. This also supports secrets brought in through AWS Secrets Manager external connectors, enabling integration with third-party secret managers.  ( 111 min )
    Transforming rare cancer research with Amazon Quick: Integrating biomedical databases for breakthrough discoveries
    In this post, we walk through how to use Amazon Quick Research to integrate biomedical data sources for rare cancer research. The walkthrough uses pediatric sarcoma as the research domain and draws on publicly available datasets from PubMed and other open biomedical repositories. It covers the end-to-end workflow: defining a research objective, configuring data sources, reviewing the AI-generated research plan, running the investigation, and iterating on results using the revision and versioning system.  ( 113 min )
    OpenAI models and Codex on Amazon Bedrock are now generally available
    GPT-5.5, GPT-5.4, and Codex are now generally available on Amazon Bedrock. Deploy them in production applications and agents today, on Bedrock’s high performance inference engine.  ( 109 min )
    Extending MCP support for Amazon Bedrock AgentCore Gateway
    While deploying Model Context Protocol (MCP) servers in production, enterprises need fine-grained access control across servers, observability into which teams use which tools, security guarantees against data exfiltration, and centralized credential management, all at scale. Amazon Bedrock AgentCore Gateway sits between MCP servers and the clients that consume them, centralizing credential management, observability, and secure […]  ( 116 min )
    Secure AI agents with Policy and Lambda interceptors in Amazon Bedrock AgentCore gateway
    In this post, we use a lakehouse data agent to demonstrate how you can use Policy for deterministic access control and Lambda interceptors for dynamic validation. We then show how to combine Lambda interceptors and Policy to implement a geography-based access control which requires both dynamic validation and deterministic access control.  ( 119 min )
    Enable safe agentic payments with built-in guardrails using Amazon Bedrock AgentCore payments
    In this post, we examine several key risks that surface when designing an agentic payment system and show how to address them with the capabilities of AgentCore payments.  ( 114 min )
    AgentOps: Operationalize agentic AI at scale with Amazon Bedrock AgentCore
    When you build agentic AI solutions, you face unique operational challenges. Agents make unpredictable decisions, costs spiral unexpectedly, and debugging non-deterministic failures seems impossible. Agentic AI applications don't just execute predetermined workflows. They reason, adapt, and make autonomous decisions, and DevOps practices need to be adapted. That's where AgentOps comes in, the operational discipline for deploying, managing, and continuously improving AI agents in production.  ( 123 min )
    Accelerate LLM model loading and increase context windows with GPUDirect on Amazon FSx for Lustre and TurboQuant
    If you’re iterating on deploying large language models (LLMs) on AWS GPU instances, you’ve probably noticed the larger the model to be loaded into GPU High Bandwidth Memory (HBM), the longer the painful wait until the GPUs are ready for inference. As models grow to hundreds of billions of parameters and GPU environments grow ever […]  ( 119 min )
    Amazon Quick integration with time-series databases for market intelligence using MCP
    In this post, we walk through a practical implementation using KDB-X MCP server integration with Amazon Quick, demonstrating how traders and analysts can ask questions using conversational language and receive actionable insights from datasets. You can apply this same integration pattern across various domains, from financial market analysis to IoT sensor monitoring to DevOps performance dashboards, where you need to simplify access to time series insights.  ( 115 min )
  • Open

    SRE Weekly Issue #519
    View on sreweekly.com A message from our sponsor, BigPanda: What if you could predict which changes will cause incidents? BigPanda analyzes every change, including ones marked safe, to surface the real risk and impact before deployment. Next time, routine changes don’t become your next P1. See BigPanda for SREs The Problem with AI-Generated Post-Incident Reviews […]  ( 4 min )

  • Open

    Comprehensive observability for Amazon SageMaker AI LLM inference: From GPU utilization to LLM quality
    This post demonstrates a comprehensive observability solution using Amazon Managed Grafana dashboards that provides a holistic view of both quality and quantity for LLMs served on Amazon SageMaker AI endpoints with inference components.  ( 113 min )

  • Open

    Training Azerbaijani language models on Amazon SageMaker AI
    Azercell Telecom LLC, Azerbaijan's leading telecommunications provider, wanted to build an Azerbaijani large language model (LLM) on Amazon SageMaker AI for telecom use cases and a customer-facing chatbot. The challenge: adapting foundation models (FMs) to a morphologically rich language with limited training data and no existing blueprint for efficient LLM training in Azerbaijani. In a six-week collaboration, Azercell worked with the AWS Generative AI Innovation Center to establish a production-ready framework on Amazon SageMaker AI.  ( 115 min )
    Build a custom portal with embedded Amazon SageMaker AI MLflow Apps
    In this post, you learn how to build a custom portal with embedded SageMaker AI MLflow Apps UI. You walk through the architecture pattern behind a React front end paired with a Flask reverse proxy that handles AWS Signature Version 4 (SigV4) authentication, deploy the entire stack through the AWS Cloud Development Kit (AWS CDK), validate the deployment, and review security considerations and cleanup procedures.  ( 114 min )
    Streamline external access to Amazon SageMaker MLflow using a REST API proxy
    In this post, we demonstrate how to build a secure Flask-based MLflow proxy service that provides HTTPS access to Amazon SageMaker MLflow without requiring the MLflow SDK. This solution is for organizations undergoing cloud transformation who want to preserve their existing ML workflows while adopting cloud-native services.  ( 112 min )
    Evaluating Deep Agents using LangSmith on AWS
    This post combines learnings from LangChain’s work on evaluating deep agents and Anthropic’s guide to demystifying evals for AI agents into a practical guide. In this post, you will learn how to: 1) apply five evaluation patterns for deep agents, 2) build offline evaluations using pytest and LangSmith, and 3) configure online monitoring for production. The walkthrough uses a text-to-SQL deep agent with Amazon Bedrock for the full development to production lifecycle.  ( 119 min )
    Build a test suite that grows with your agent with dataset management in Amazon Bedrock AgentCore
    Agent evaluation is most powerful when you combine fast-moving online signals with stable offline baselines. To understand whether your agent is truly improving over time, you need a fixed benchmark alongside your changing real-world traffic. Managing test cases for evaluation baselines as a dataset in Amazon Bedrock AgentCore brings the discipline of versioned test fixtures […]  ( 116 min )
    Claude Opus 4.8 is now available on AWS
    This post covers Opus 4.8's improvements and practical guidance for AI engineers integrating the model into agentic systems and production inference workloads on Amazon Bedrock.  ( 109 min )
    Automate AML alert triage with Amazon Quick and Snowflake Cortex AI
    This post demonstrates that integration in action by automating one of the most labor-intensive workflows in financial services: anti-money laundering (AML) alert triage. You will build a triage workflow using Amazon Quick Flows and Snowflake Cortex, connected through the Amazon Quick Model Context Protocol (MCP) integration. In our testing environment, automated workflows built using Amazon Quick reduced alert investigation time from 30-90 minutes to under 5 minutes. Actual results may vary based on alert complexity and data volume.  ( 119 min )

  • Open

    Process financial documents using Amazon Bedrock Data Automation
    In this post, we explore how Amazon Bedrock Data Automation can accurately extract information from four common types of financial documents: bank statements, W-2 forms, 1099-B tax forms, and vendor contracts. We highlight the complexity in the documents, detail the custom extraction created in Amazon Bedrock Data Automation, and describe the outcomes of the extraction process.  ( 112 min )
    Building AI agents for business support using Amazon Bedrock AgentCore
    In this post, we share how the AWS Generative AI Innovation Center (GenAIIC) collaborated with Works Human Intelligence (WHI) to build two AI agents using Amazon Bedrock AgentCore. We discuss the challenges encountered and the solutions that reduced costs by up to 97% while improving operational efficiency.  ( 111 min )
    From data overload to actionable insights: How Verizon Connect scaled agentic AI to 100,000 users
    In this post, we show you how Verizon Connect built and scaled an agentic AI solution to transform overwhelming fleet data into clear, actionable insights for 100,000 users daily. We walk you through the architectural decisions, implementation challenges, and measurable results that can guide your own data-to-insights transformation.  ( 114 min )
    How AWS SMGS uses an AI-powered conversational assistant to transform business management with Amazon Bedrock AgentCore
    In this post, we share how we built NarrateAI using Amazon Bedrock AgentCore to deliver business intelligence at scale for the AWS SMGS (Sales, Marketing and Global Services) organization. You will learn about: the two-layer architecture that separates batch processing from real-time interaction, the specialized AI agents that power intelligent routing and validation, key engineering patterns for production deployment, and how to build similar solutions with AWS services.  ( 115 min )
    Powering agentic AI sales strategy with Amazon Bedrock AgentCore
    As agent adoption scaled, we saw a common pattern emerge across enterprises, including our own sales organization: specialized agents deliver value, but without orchestration, users carry the cognitive load of choosing between them. At AWS Sales, this meant more than 20 domain-specific agents deployed across the global organization, with representatives context-switching between systems instead of […]  ( 118 min )

  • Open

    Technical deep dive: AgentCore payments and innovation in agentic commerce
    Amazon Bedrock AgentCore payments is now available in preview. It provides instant payments to paid external services with no manual billing setup per provider, stablecoin support that makes sub-cent microtransactions economically viable, and configurable spending guardrails that give you fine-grained control over agent budgets and transaction limits. In this post, we walk you through a technical deep dive of AgentCore payments.  ( 118 min )
    Build highly scalable serverless LangGraph multi-agent systems in AWS with Amazon Bedrock AgentCore
    In this post, we provide a solution to build highly scalable, serverless multi-agent generative AI systems on AWS using LangGraph Agents as orchestrators integrated with Amazon Bedrock AgentCore Memory and Amazon Bedrock AgentCore Observability.  ( 111 min )
    Build high-performance generative AI systems with Strands Agents, NVIDIA NIM, and Amazon Bedrock AgentCore
    In this post, you'll learn how to build a multi-agent campaign review system that demonstrates parallel reasoning, context persistence, and traceable execution paths. The integrated architecture combines NVIDIA NIM for GPU-accelerated inference, Amazon Bedrock AgentCore for managed runtime, shared memory, and built-in observability, and Strands Agents for serverless multi-agent orchestration. This approach supports performance, scalability, and operational insight in production environments. While the example focuses on marketing content review, the same pattern applies to digital assistants, review automation, and retrieval-augmented generation pipelines.  ( 111 min )
    AgentWatch: Proactive AWS monitoring with ambient agents
    In this post, we demonstrate the capabilities of AgentWatch through practical implementation. You will see how the solution performs infrastructure checks every 15 minutes, summarizing CloudWatch metrics, logs, and alarms across multiple AWS accounts. The agent delivers actionable reports directly to Slack and responds to natural language queries about your infrastructure state. Throughout, we explore three human-in-the-loop patterns that maintain appropriate oversight while maximizing automation.  ( 115 min )
    From idea to AI app: Creating intelligent research assistants with Strands
    Building an AI app shouldn’t require a PhD in machine learning (ML) or months of wrestling with complex architectures. Yet that’s exactly what happens when you try to orchestrate multiple API calls, manage conversation state, and create agents that can reason on their own. I’ve seen straightforward AI ideas balloon into sprawling projects that demand […]  ( 114 min )
    Build an enterprise observability solution for Amazon Quick
    When hundreds to thousands of users are onboarded to an enterprise AI platform, business leaders and platform owners need visibility into who is using the platform, whether users are satisfied with the answers they receive, and which capabilities are driving the most engagement. Without a centralized observability solution, this data is scattered across multiple AWS […]  ( 111 min )
    Transforming professional work: How Amazon Quick turns document creation from hours into minutes
    In this post, we explore how the Amazon Quick document and visualization creation capabilities work, what you can build with them, and how professionals across roles are using them to reclaim hours of their workweek. From technical execution to strategic judgment: Most professional roles carry an unspoken assumption that a significant portion of your time […]  ( 113 min )
2026-06-23T17:50:44.999Z osmosfeed 1.15.1