• Open

    Securing Amazon Bedrock cross-Region inference: Geographic and global
    In this post, we explore the security considerations and best practices for implementing Amazon Bedrock cross-Region inference profiles. Whether you're building a generative AI application or need to meet specific regional compliance requirements, this guide will help you understand the secure architecture of Amazon Bedrock CRIS and how to properly configure your implementation.  ( 116 min )

  • Open

    How Omada Health scaled patient care by fine-tuning Llama models on Amazon SageMaker AI
    This post is co-written with Sunaina Kavi, AI/ML Product Manager at Omada Health. Omada Health, a longtime innovator in virtual healthcare delivery, launched a new nutrition experience in 2025, featuring OmadaSpark, an AI agent trained with robust clinical input that delivers real-time motivational interviewing and nutrition education. It was built on AWS. OmadaSpark was designed […]  ( 110 min )
  • Open

    SRE Weekly Issue #505
    View on sreweekly.com A message from our sponsor, Hopp: Paging at 2am? 🚨 Make incident triage feel like you’re at the same keyboard with Hopp: crisp, readable screen-sharing; no more “can you zoom in?”; click + type together; bring the incident bridge into one session. Start pair programming: https://www.gethopp.app/?via=sreweekly 2013-09-17 Outage Postmortem An incident write-up […]  ( 4 min )
  • Open

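    The Meaning of Life
    AI cannot replace our search for the meaning of life. Everyone loves a shortcut. When AI arrived, people marveled: a painting in thirty seconds, an essay in five minutes, even code generated with one click. It seemed like a windfall from the sky, saving effort and saving thought. But if we hand our whole lives over to shortcuts, do we not become tourists who only “buy souvenirs”? We glance at the sights, snap a few photos, flip through them at home, and feel empty inside. The Southern Song poet Yang Wanli is said to have written more than twenty thousand poems in his lifetime, of which more than 4,200 survive, yet the lines we truly know by heart are only a few: “The tiny lotus has just shown its pointed tip, and already a dragonfly is perched upon it” and “Lotus leaves stretch to the sky in endless green; lotus blossoms in the sun glow an uncommon red.” Yet he is said to have founded a school of his own: his style, known as the “Chengzhai style,” is unadorned and effortless, yet full of genuine charm. Life is the same. In writing, painting, or any work, if you seek only the result, you lose the flavor that comes from steeping in the process. It is like brewing tea: you add the leaves only once the water boils and simmer them slowly over a low flame, and only then does the fragrance rise. One-click brewing is drinkable too, but it lacks the leisure of “waiting.” AI's talent is speed, speed enough to make one uneasy. But the value of the human heart lies precisely in slowness. Writing a poem, weighing each word, and then a sudden flash of inspiration: can a machine stand in for the joy of that moment? Painting a picture, the brush moving freely, the scent of ink in the air, hand and heart in step: can an algorithm give that exhilaration? Process is a whetstone. It teaches patience and focus, the taste of failure, and how hard-won success is. If you want only the result, you lose the chance to temper your character. It is like climbing a mountain: take the cable car straight to the summit and the view is in front of you, but the memory of sweat and a pounding heart is missing. The real scenery is often found halfway up the slope, while you grumble at yourself for “seeking out hardship for no reason.” Meaning, too, lies in the process. People work for more than food and shelter. Belonging, self-esteem, self-actualization, and the search for meaning are all found there. Just as I am piecing together this casual essay now, not merely to publish it but to connect with other people. So AI can give results, but it cannot give stories. Stories need people to tell them, and the process needs people to walk it. Shortcuts are tempting, but a life made entirely of shortcuts is an empty shell. Real value lies in taking one step at a time, in the slow simmer, in the genuine charm of “Chengzhai.” The result is the fruit; the process is the flower. Only as flowers bloom and fade do we have the four seasons.  ( 1 min )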

  • Open

    Crossmodal search with Amazon Nova Multimodal Embeddings
    In this post, we explore how Amazon Nova Multimodal Embeddings addresses the challenges of crossmodal search through a practical ecommerce use case. We examine the technical limitations of traditional approaches and demonstrate how Amazon Nova Multimodal Embeddings enables retrieval across text, images, and other modalities. You learn how to implement a crossmodal search system by generating embeddings, handling queries, and measuring performance. We provide working code examples and share how to add these capabilities to your applications.  ( 113 min )

  • Open

    Accelerating LLM inference with post-training weight and activation using AWQ and GPTQ on Amazon SageMaker AI
    Quantized models can be seamlessly deployed on Amazon SageMaker AI using a few lines of code. In this post, we explore why quantization matters—how it enables lower-cost inference, supports deployment on resource-constrained hardware, and reduces both the financial and environmental impact of modern LLMs, while preserving most of their original performance. We also take a deep dive into the principles behind PTQ and demonstrate how to quantize the model of your choice and deploy it on Amazon SageMaker.  ( 125 min )
    How Beekeeper optimized user personalization with Amazon Bedrock
    Beekeeper’s automated leaderboard approach and human feedback loop system for dynamic LLM and prompt pair selection addresses the key challenges organizations face in navigating the rapidly evolving landscape of language models.  ( 114 min )
    Sentiment Analysis with Text and Audio Using AWS Generative AI Services: Approaches, Challenges, and Solutions
    This post, developed through a strategic scientific partnership between AWS and the Instituto de Ciência e Tecnologia Itaú (ICTi), P&D hub maintained by Itaú Unibanco, the largest private bank in Latin America, explores the technical aspects of sentiment analysis for both text and audio. We present experiments comparing multiple machine learning (ML) models and services, discuss the trade-offs and pitfalls of each approach, and highlight how AWS services can be orchestrated to build robust, end-to-end solutions. We also offer insights into potential future directions, including more advanced prompt engineering for large language models (LLMs) and expanding the scope of audio-based analysis to capture emotional cues that text data alone might miss.  ( 114 min )
    Architecting TrueLook’s AI-powered construction safety system on Amazon SageMaker AI
    This post provides a detailed architectural overview of how TrueLook built its AI-powered safety monitoring system using SageMaker AI, highlighting key technical decisions, pipeline design patterns, and MLOps best practices. You will gain valuable insights into designing scalable computer vision solutions on AWS, particularly around model training workflows, automated pipeline creation, and production deployment strategies for real-time inference.  ( 114 min )

  • Open

    Scaling medical content review at Flo Health using Amazon Bedrock (Part 1)
    This two-part series explores Flo Health's journey with generative AI for medical content verification. Part 1 examines our proof of concept (PoC), including the initial solution, capabilities, and early results. Part 2 focuses on scaling challenges and real-world implementation. Each article stands alone while collectively showing how AI transforms medical content management at scale.  ( 114 min )
    Detect and redact personally identifiable information using Amazon Bedrock Data Automation and Guardrails
    This post shows an automated PII detection and redaction solution using Amazon Bedrock Data Automation and Amazon Bedrock Guardrails through a use case of processing text and image content in high volumes of incoming emails and attachments. The solution features a complete email processing workflow with a React-based user interface for authorized personnel to more securely manage and review redacted email communications and attachments. We walk through the step-by-step solution implementation procedures used to deploy this solution. Finally, we discuss the solution benefits, including operational efficiency, scalability, security and compliance, and adaptability.  ( 116 min )
    Speed meets scale: Load testing SageMaker AI endpoints with Observe.AI’s testing tool
    Observe.AI developed the One Load Audit Framework (OLAF), which integrates with SageMaker to identify bottlenecks and performance issues in ML services, offering latency and throughput measurements under both static and dynamic data loads. In this blog post, you will learn how to use the OLAF utility to test and validate your SageMaker endpoint.  ( 113 min )

  • Open

    Milo cancer diary part 22 – remission again
    CHOP #4 has worked, and Milo’s scan today shows that he’s in remission again (before even getting his Epirubicin). This cycle of chemo seemed to go better than previous protocols, until we got to the planned Epirubicin last week, and his neutrophils were too low. So we were back at North Downs Specialist Referrals (NDSR) […]  ( 14 min )
  • Open

    SRE Weekly Issue #504
    View on sreweekly.com Finding the grain of sand in a heap of Salt Salt is Cloudflare’s configuration management tool. How do you find the root cause of a configuration management failure when you have a peak of hundreds of changes in 15 minutes on thousands of servers? The result of this has been a reduction […]  ( 4 min )

  • Open

    December 2025
    Pupdate It’s been quite dry over the Christmas break, which has encouraged some longer than usual walks that the boys have enjoyed. After a scan at the start of the month Milo has now almost completed the first cycle of his 4th modified ‘CHOP’ chemotherapy protocol. As before, low neutrophils mean we’re a little behind […]  ( 16 min )

  • Open

    Migrate MLflow tracking servers to Amazon SageMaker AI with serverless MLflow
    This post shows you how to migrate your self-managed MLflow tracking server to an MLflow App, a serverless tracking server on SageMaker AI that automatically scales resources based on demand while removing server patching and storage management tasks at no cost. Learn how to use the MLflow Export Import tool to transfer your experiments, runs, models, and other MLflow resources, including instructions to validate your migration's success.  ( 111 min )
    Build an AI-powered website assistant with Amazon Bedrock
    This post demonstrates how to solve this challenge by building an AI-powered website assistant using Amazon Bedrock and Amazon Bedrock Knowledge Bases.  ( 110 min )
  • Open

    Silent PC GPU upgrade
    TL;DR Nvidia have ended Linux support for my ‘Pascal’ GTX 1050 Ti GPU. I’ve been able to fit an RTX 5050 card in its place, though the process was problematic due to driver issues. And I’m still concerned that it can only be limited to 110W when my passive cooling is rated up to 75W. […]  ( 16 min )
  • Open

    SRE Weekly Issue #503
    View on sreweekly.com The Abstraction Debt in Infrastructure as Code Abstraction is meant to encapsulate complexity, but when done poorly, it creates opacity—a lack of visibility into what’s actually happening under the hood.   RoseSecurity Fun with incident data and statistical process control This article uses publicly available incident data and an open source tool to […]  ( 4 min )

  • Open

    Programmatically creating an IDP solution with Amazon Bedrock Data Automation
    In this post, we explore how to programmatically create an IDP solution that uses Strands SDK, Amazon Bedrock AgentCore, Amazon Bedrock Knowledge Base, and Bedrock Data Automation (BDA). This solution is provided through a Jupyter notebook that enables users to upload multi-modal business documents and extract insights using BDA as a parser to retrieve relevant chunks and augment a prompt to a foundational model (FM).  ( 108 min )
    AI agent-driven browser automation for enterprise workflow management
    Enterprise organizations increasingly rely on web-based applications for critical business processes, yet many workflows remain manually intensive, creating operational inefficiencies and compliance risks. Despite significant technology investments, knowledge workers routinely navigate between eight to twelve different web applications during standard workflows, constantly switching contexts and manually transferring information between systems. Data entry and validation tasks […]  ( 109 min )
    Agentic QA automation using Amazon Bedrock AgentCore Browser and Amazon Nova Act
    In this post, we explore how agentic QA automation addresses these challenges and walk through a practical example using Amazon Bedrock AgentCore Browser and Amazon Nova Act to automate testing for a sample retail application.  ( 109 min )
    Optimizing LLM inference on Amazon SageMaker AI with BentoML’s LLM-Optimizer
    In this post, we demonstrate how to optimize large language model (LLM) inference on Amazon SageMaker AI using BentoML's LLM-Optimizer to systematically identify the best serving configurations for your workload.  ( 116 min )

  • Open

    Exploring the zero operator access design of Mantle
    In this post, we explore how Mantle, Amazon's next-generation inference engine for Amazon Bedrock, implements a zero operator access (ZOA) design that eliminates any technical means for AWS operators to access customer data.  ( 107 min )
    AWS AI League: Model customization and agentic showdown
    In this post, we explore the new AWS AI League challenges and how they are transforming how organizations approach AI development. The grand finale at AWS re:Invent 2025 was an exciting showcase of their ingenuity and skills.  ( 109 min )
    Accelerate Enterprise AI Development using Weights & Biases and Amazon Bedrock AgentCore
    In this post, we demonstrate how to use Foundation Models (FMs) from Amazon Bedrock and the newly launched Amazon Bedrock AgentCore alongside W&B Weave to help build, evaluate, and monitor enterprise AI solutions. We cover the complete development lifecycle from tracking individual FM calls to monitoring complex agent workflows in production.  ( 111 min )
    How dLocal automated compliance reviews using Amazon Quick Automate
    In this post, we share how dLocal worked closely with the AWS team to help shape the product roadmap, reinforce its role as an industry innovator, and set new benchmarks for operational excellence in the global fintech landscape.  ( 110 min )
    Advancing ADHD diagnosis: How Qbtech built a mobile AI assessment Model Using Amazon SageMaker AI
    In this post, we explore how Qbtech streamlined their machine learning (ML) workflow using Amazon SageMaker AI, a fully managed service to build, train and deploy ML models, and AWS Glue, a serverless service that makes data integration simpler, faster, and more cost effective. This new solution reduced their feature engineering time from weeks to hours, while maintaining the high clinical standards required by healthcare providers.  ( 112 min )
    Accelerating your marketing ideation with generative AI – Part 1: From idea to generation with the Amazon Nova foundation models
    In this post, the first of a series of three, we focus on how you can use Amazon Nova to streamline, simplify, and accelerate marketing campaign creation through generative AI. We show how Bancolombia, one of Colombia’s largest banks, is experimenting with the Amazon Nova models to generate visuals for their marketing campaigns.  ( 115 min )
    Introducing Visa Intelligent Commerce on AWS: Enabling agentic commerce with Amazon Bedrock AgentCore
    In this post, we explore how AWS and Visa are partnering to enable agentic commerce through Visa Intelligent Commerce using Amazon Bedrock AgentCore. We demonstrate how autonomous AI agents can transform fragmented shopping and travel experiences into seamless, end-to-end workflows—from discovery and comparison to secure payment authorization—all driven by natural language.  ( 115 min )

  • Open

    Move Beyond Chain-of-Thought with Chain-of-Draft on Amazon Bedrock
    This post explores Chain-of-Draft (CoD), an innovative prompting technique introduced in a Zoom AI Research paper Chain of Draft: Thinking Faster by Writing Less, that revolutionizes how models approach reasoning tasks. While Chain-of-Thought (CoT) prompting has been the go-to method for enhancing model reasoning, CoD offers a more efficient alternative that mirrors human problem-solving patterns—using concise, high-signal thinking steps rather than verbose explanations.  ( 114 min )
    Deploy Mistral AI’s Voxtral on Amazon SageMaker AI
    In this post, we demonstrate hosting Voxtral models on Amazon SageMaker AI endpoints using vLLM and the Bring Your Own Container (BYOC) approach. vLLM is a high-performance library for serving large language models (LLMs) that features paged attention for improved memory management and tensor parallelism for distributing models across multiple GPUs.  ( 112 min )
    Enhance document analytics with Strands AI Agents for the GenAI IDP Accelerator
    To address the need for businesses to quickly analyze information and unlock actionable insights, we are announcing Analytics Agent, a new feature that is seamlessly integrated into the GenAI IDP Accelerator. With this feature, users can perform advanced searches and complex analyses using natural language queries without SQL or data analysis expertise. In this post, we discuss how non-technical users can use this tool to analyze and understand the documents they have processed at scale with natural language.  ( 113 min )
    Build a multimodal generative AI assistant for root cause diagnosis in predictive maintenance using Amazon Bedrock
    In this post, we demonstrate how to implement a predictive maintenance solution using Foundation Models (FMs) on Amazon Bedrock, with a case study of Amazon's manufacturing equipment within their fulfillment centers. The solution is highly adaptable and can be customized for other industries, including oil and gas, logistics, manufacturing, and healthcare.  ( 121 min )
  • Open

    SRE Weekly Issue #502
    View on sreweekly.com Eliminating Cold Starts 2: shard and conquer Cloudflare reduced their cold-start rate for Workers requests through sharding and consistent hashing, with an interesting solution for load shedding.   Harris Hancock — Cloudflare Monitoring & Observability: Using Logs, Metrics, Traces, and Alerts to Understand System Failures I appreciate the way this article also shares […]  ( 4 min )

  • Open

    Introducing SOCI indexing for Amazon SageMaker Studio: Faster container startup times for AI/ML workloads
    Today, we are excited to introduce a new feature for SageMaker Studio: SOCI (Seekable Open Container Initiative) indexing. SOCI supports lazy loading of container images, where only the necessary parts of an image are downloaded initially rather than the entire container.  ( 112 min )

  • Open

    Build and deploy scalable AI agents with NVIDIA NeMo, Amazon Bedrock AgentCore, and Strands Agents
    This post demonstrates how to use the powerful combination of Strands Agents, Amazon Bedrock AgentCore, and NVIDIA NeMo Agent Toolkit to build, evaluate, optimize, and deploy AI agents on Amazon Web Services (AWS) from initial development through production deployment.  ( 117 min )
    Bi-directional streaming for real-time agent interactions now available in Amazon Bedrock AgentCore Runtime
    In this post, you will learn about bi-directional streaming on AgentCore Runtime and the prerequisites to create a WebSocket implementation. You will also learn how to use Strands Agents to implement a bi-directional streaming solution for voice agents.  ( 110 min )

  • Open

    Tracking and managing assets used in AI development with Amazon SageMaker AI
    In this post, we'll explore the new capabilities and core concepts that help organizations track and manage model development and deployment lifecycles. We will show you how the features are configured to train models with automatic end-to-end lineage, from dataset upload and versioning to model fine-tuning, evaluation, and seamless endpoint deployment.  ( 108 min )
    Track machine learning experiments with MLflow on Amazon SageMaker using Snowflake integration
    In this post, we demonstrate how to integrate Amazon SageMaker managed MLflow as a central repository to log these experiments and provide a unified system for monitoring their progress.  ( 108 min )

  • Open

    Governance by design: The essential guide for successful AI scaling
    Picture this: Your enterprise has just deployed its first generative AI application. The initial results are promising, but as you plan to scale across departments, critical questions emerge. How will you enforce consistent security, prevent model bias, and maintain control as AI applications multiply?  ( 109 min )
    How Tata Power CoE built a scalable AI-powered solar panel inspection solution with Amazon SageMaker AI and Amazon Bedrock
    In this post, we explore how Tata Power CoE and Oneture Technologies use AWS services to automate the inspection process end-to-end.  ( 112 min )
    Unlocking video understanding with TwelveLabs Marengo on Amazon Bedrock
    In this post, we'll show how the TwelveLabs Marengo embedding model, available on Amazon Bedrock, enhances video understanding through multimodal AI. We'll build a video semantic search and analysis solution using embeddings from the Marengo model with Amazon OpenSearch Serverless as the vector database, for semantic search capabilities that go beyond simple metadata matching to deliver intelligent content discovery.  ( 111 min )
2026-01-14T18:30:45.359Z osmosfeed 1.15.1