
    Programmatically creating an IDP solution with Amazon Bedrock Data Automation
    In this post, we explore how to programmatically create an IDP solution that uses the Strands SDK, Amazon Bedrock AgentCore, Amazon Bedrock Knowledge Bases, and Amazon Bedrock Data Automation (BDA). This solution is provided through a Jupyter notebook that enables users to upload multi-modal business documents and extract insights using BDA as a parser to retrieve relevant chunks and augment a prompt to a foundation model (FM).  ( 108 min )
    AI agent-driven browser automation for enterprise workflow management
    Enterprise organizations increasingly rely on web-based applications for critical business processes, yet many workflows remain manually intensive, creating operational inefficiencies and compliance risks. Despite significant technology investments, knowledge workers routinely navigate between eight to twelve different web applications during standard workflows, constantly switching contexts and manually transferring information between systems. Data entry and validation tasks […]  ( 109 min )
    Agentic QA automation using Amazon Bedrock AgentCore Browser and Amazon Nova Act
    In this post, we explore how agentic QA automation addresses these challenges and walk through a practical example using Amazon Bedrock AgentCore Browser and Amazon Nova Act to automate testing for a sample retail application.  ( 109 min )
    Optimizing LLM inference on Amazon SageMaker AI with BentoML’s LLM-Optimizer
    In this post, we demonstrate how to optimize large language model (LLM) inference on Amazon SageMaker AI using BentoML's LLM-Optimizer to systematically identify the best serving configurations for your workload.  ( 116 min )


    Exploring the zero operator access design of Mantle
    In this post, we explore how Mantle, Amazon's next-generation inference engine for Amazon Bedrock, implements a zero operator access (ZOA) design that eliminates any technical means for AWS operators to access customer data.  ( 107 min )
    AWS AI League: Model customization and agentic showdown
    In this post, we explore the new AWS AI League challenges and how they are transforming the way organizations approach AI development. The grand finale at AWS re:Invent 2025 was an exciting showcase of the participants' ingenuity and skills.  ( 109 min )
    Accelerate Enterprise AI Development using Weights & Biases and Amazon Bedrock AgentCore
    In this post, we demonstrate how to use Foundation Models (FMs) from Amazon Bedrock and the newly launched Amazon Bedrock AgentCore alongside W&B Weave to help build, evaluate, and monitor enterprise AI solutions. We cover the complete development lifecycle from tracking individual FM calls to monitoring complex agent workflows in production.  ( 111 min )
    How dLocal automated compliance reviews using Amazon Quick Automate
    In this post, we share how dLocal worked closely with the AWS team to help shape the product roadmap, reinforce its role as an industry innovator, and set new benchmarks for operational excellence in the global fintech landscape.  ( 110 min )
    Advancing ADHD diagnosis: How Qbtech built a mobile AI assessment model using Amazon SageMaker AI
    In this post, we explore how Qbtech streamlined their machine learning (ML) workflow using Amazon SageMaker AI, a fully managed service to build, train, and deploy ML models, and AWS Glue, a serverless service that makes data integration simpler, faster, and more cost-effective. This new solution reduced their feature engineering time from weeks to hours, while maintaining the high clinical standards required by healthcare providers.  ( 112 min )
    Accelerating your marketing ideation with generative AI – Part 1: From idea to generation with the Amazon Nova foundation models
    In this post, the first of a series of three, we focus on how you can use Amazon Nova to streamline, simplify, and accelerate marketing campaign creation through generative AI. We show how Bancolombia, one of Colombia’s largest banks, is experimenting with the Amazon Nova models to generate visuals for their marketing campaigns.  ( 115 min )
    Introducing Visa Intelligent Commerce on AWS: Enabling agentic commerce with Amazon Bedrock AgentCore
    In this post, we explore how AWS and Visa are partnering to enable agentic commerce through Visa Intelligent Commerce using Amazon Bedrock AgentCore. We demonstrate how autonomous AI agents can transform fragmented shopping and travel experiences into seamless, end-to-end workflows—from discovery and comparison to secure payment authorization—all driven by natural language.  ( 115 min )


    Move Beyond Chain-of-Thought with Chain-of-Draft on Amazon Bedrock
    This post explores Chain-of-Draft (CoD), an innovative prompting technique introduced in the Zoom AI Research paper Chain of Draft: Thinking Faster by Writing Less, which rethinks how models approach reasoning tasks. While Chain-of-Thought (CoT) prompting has been the go-to method for enhancing model reasoning, CoD offers a more efficient alternative that mirrors human problem-solving patterns—using concise, high-signal thinking steps rather than verbose explanations.  ( 114 min )
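As a rough illustration of the Chain-of-Draft idea described above, a minimal prompt-construction sketch; the instruction wording, the five-word cap, and the `####` answer marker here are hypothetical stand-ins, not taken verbatim from the paper:

```python
# Hypothetical Chain-of-Draft style prompt builder: cap each reasoning
# step at a few words, in contrast to verbose Chain-of-Thought traces.
COD_INSTRUCTION = (
    "Think step by step, but keep each step to five words or fewer. "
    "Write the final answer after '####'."
)

def build_cod_prompt(question: str) -> str:
    """Prepend the concise-draft instruction to a user question."""
    return f"{COD_INSTRUCTION}\n\nQuestion: {question}"

prompt = build_cod_prompt(
    "A bat and a ball cost $1.10 in total; the bat costs $1.00 more "
    "than the ball. How much does the ball cost?"
)
```

The resulting string can be sent unchanged to any chat model; only the instruction text differs from a standard CoT prompt, which is what makes the technique cheap to adopt.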
    Deploy Mistral AI’s Voxtral on Amazon SageMaker AI
    In this post, we demonstrate hosting Voxtral models on Amazon SageMaker AI endpoints using vLLM and the Bring Your Own Container (BYOC) approach. vLLM is a high-performance library for serving large language models (LLMs) that features paged attention for improved memory management and tensor parallelism for distributing models across multiple GPUs.  ( 112 min )
    Enhance document analytics with Strands AI Agents for the GenAI IDP Accelerator
    To address the need for businesses to quickly analyze information and unlock actionable insights, we are announcing Analytics Agent, a new feature that is seamlessly integrated into the GenAI IDP Accelerator. With this feature, users can perform advanced searches and complex analyses using natural language queries without SQL or data analysis expertise. In this post, we discuss how non-technical users can use this tool to analyze and understand the documents they have processed at scale with natural language.  ( 113 min )
    Build a multimodal generative AI assistant for root cause diagnosis in predictive maintenance using Amazon Bedrock
    In this post, we demonstrate how to implement a predictive maintenance solution using Foundation Models (FMs) on Amazon Bedrock, with a case study of Amazon's manufacturing equipment within their fulfillment centers. The solution is highly adaptable and can be customized for other industries, including oil and gas, logistics, manufacturing, and healthcare.  ( 121 min )

    SRE Weekly Issue #502
    View on sreweekly.com Eliminating Cold Starts 2: shard and conquer Cloudflare reduced their cold-start rate for Workers requests through sharding and consistent hashing, with an interesting solution for load shedding.   Harris Hancock — Cloudflare Monitoring & Observability: Using Logs, Metrics, Traces, and Alerts to Understand System Failures I appreciate the way this article also shares […]  ( 4 min )
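For readers unfamiliar with the consistent hashing mentioned in the cold-start article, a minimal generic hash-ring sketch; the node names and virtual-node count are illustrative, and this is not Cloudflare's implementation:

```python
# Minimal consistent-hash ring: keys map to the first node clockwise on
# a hash ring, so adding or removing a node only remaps that node's arc.
import bisect
import hashlib

def _h(value: str) -> int:
    return int(hashlib.sha256(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes=64):
        # vnodes: virtual nodes per physical node, smoothing the key spread
        self._ring = sorted((_h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._keys = [h for h, _ in self._ring]

    def node_for(self, key: str) -> str:
        idx = bisect.bisect(self._keys, _h(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["server-a", "server-b", "server-c"])
owner = ring.node_for("worker-script-123")  # deterministic placement
```

Because only the arc owned by a changed node remaps, adding a fourth server moves roughly a quarter of the keys instead of reshuffling all of them, which is why the technique suits sharding warm workers across machines.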


    Introducing SOCI indexing for Amazon SageMaker Studio: Faster container startup times for AI/ML workloads
    Today, we are excited to introduce a new feature for SageMaker Studio: SOCI (Seekable Open Container Initiative) indexing. SOCI supports lazy loading of container images, where only the necessary parts of an image are downloaded initially rather than the entire container.  ( 112 min )


    Build and deploy scalable AI agents with NVIDIA NeMo, Amazon Bedrock AgentCore, and Strands Agents
    This post demonstrates how to use the powerful combination of Strands Agents, Amazon Bedrock AgentCore, and NVIDIA NeMo Agent Toolkit to build, evaluate, optimize, and deploy AI agents on Amazon Web Services (AWS) from initial development through production deployment.  ( 117 min )
    Bi-directional streaming for real-time agent interactions now available in Amazon Bedrock AgentCore Runtime
    In this post, you will learn about bi-directional streaming on AgentCore Runtime and the prerequisites to create a WebSocket implementation. You will also learn how to use Strands Agents to implement a bi-directional streaming solution for voice agents.  ( 110 min )


    Tracking and managing assets used in AI development with Amazon SageMaker AI
    In this post, we'll explore the new capabilities and core concepts that help organizations track and manage model development and deployment lifecycles. We will show you how the features are configured to train models with automatic end-to-end lineage, from dataset upload and versioning to model fine-tuning, evaluation, and seamless endpoint deployment.  ( 108 min )
    Track machine learning experiments with MLflow on Amazon SageMaker using Snowflake integration
    In this post, we demonstrate how to integrate Amazon SageMaker managed MLflow as a central repository to log these experiments and provide a unified system for monitoring their progress.  ( 108 min )


    Governance by design: The essential guide for successful AI scaling
    Picture this: Your enterprise has just deployed its first generative AI application. The initial results are promising, but as you plan to scale across departments, critical questions emerge. How will you enforce consistent security, prevent model bias, and maintain control as AI applications multiply?  ( 109 min )
    How Tata Power CoE built a scalable AI-powered solar panel inspection solution with Amazon SageMaker AI and Amazon Bedrock
    In this post, we explore how Tata Power CoE and Oneture Technologies use AWS services to automate the inspection process end-to-end.  ( 112 min )
    Unlocking video understanding with TwelveLabs Marengo on Amazon Bedrock
    In this post, we'll show how the TwelveLabs Marengo embedding model, available on Amazon Bedrock, enhances video understanding through multimodal AI. We'll build a video semantic search and analysis solution using embeddings from the Marengo model with Amazon OpenSearch Serverless as the vector database, for semantic search capabilities that go beyond simple metadata matching to deliver intelligent content discovery.  ( 111 min )


    Checkpointless training on Amazon SageMaker HyperPod: Production-scale training with faster fault recovery
    In this post, we introduce checkpointless training on Amazon SageMaker HyperPod, a paradigm shift in model training that reduces the need for traditional checkpointing by enabling peer-to-peer state recovery. Results from production-scale validation show an 80–93% reduction in recovery time (from 15–30 minutes or more to under 2 minutes) and up to 95% training goodput on clusters with thousands of AI accelerators.  ( 117 min )
    Adaptive infrastructure for foundation model training with elastic training on SageMaker HyperPod
    Amazon SageMaker HyperPod now supports elastic training, enabling your machine learning (ML) workloads to automatically scale based on resource availability. In this post, we demonstrate how elastic training helps you maximize GPU utilization, reduce costs, and accelerate model development through dynamic resource adaptation, while maintaining training quality and minimizing manual intervention.  ( 114 min )
    Customize agent workflows with advanced orchestration techniques using Strands Agents
    In this post, we explore two powerful orchestration patterns implemented with Strands Agents. Using a common set of travel planning tools, we demonstrate how different orchestration strategies can solve the same problem through distinct reasoning approaches.  ( 119 min )
    Operationalize generative AI workloads and scale to hundreds of use cases with Amazon Bedrock – Part 1: GenAIOps
    In this first part of our two-part series, you'll learn how to evolve your existing DevOps architecture for generative AI workloads and implement GenAIOps practices. We'll showcase practical implementation strategies for different generative AI adoption levels, focusing on consuming foundation models.  ( 122 min )
    Applying data loading best practices for ML training with Amazon S3 clients
    In this post, we present practical techniques and recommendations for optimizing throughput in ML training workloads that read data directly from Amazon S3 general purpose buckets.  ( 116 min )

    SRE Weekly Issue #501
    View on sreweekly.com A message from our sponsor, Depot: “Waiting for a runner” but the runner is online? Depot debugs three cases where symptoms misled engineers. Workflow permissions, Azure authentication, and Dependabot’s security context all caused failures that looked like infrastructure problems. AI and the ironies of automation – Part 1 A thoughtful evaluation of […]  ( 4 min )


    Building a voice-driven AWS assistant with Amazon Nova Sonic
    In this post, we explore how to build a sophisticated voice-powered AWS operations assistant using Amazon Nova Sonic for speech processing and Strands Agents for multi-agent orchestration. This solution demonstrates how natural language voice interactions can transform cloud operations, making AWS services more accessible and operations more efficient.  ( 109 min )


    How Harmonic Security improved their data-leakage detection system with low-latency fine-tuned models using Amazon SageMaker, Amazon Bedrock, and Amazon Nova Pro
    This post walks through how Harmonic Security used Amazon SageMaker AI, Amazon Bedrock, and Amazon Nova Pro to fine-tune a ModernBERT model, achieving low-latency, accurate, and scalable data leakage detection.  ( 115 min )
    How Swisscom builds enterprise agentic AI for customer support and sales using Amazon Bedrock AgentCore
    In this post, we'll show how Swisscom implemented Amazon Bedrock AgentCore to build and scale their enterprise AI agents for customer support and sales operations. As an early adopter of Amazon Bedrock in the AWS Europe Region (Zurich), Swisscom leads in enterprise AI implementation with their Chatbot Builder system and various AI initiatives. Their successful deployments include Conversational AI powered by Rasa and fine-tuned LLMs on Amazon SageMaker, and the Swisscom myAI assistant, built to meet Swiss data protection standards.  ( 111 min )
    Scaling MLflow for enterprise AI: What’s New in SageMaker AI with MLflow
    Today we’re announcing Amazon SageMaker AI with MLflow, now including a serverless capability that dynamically manages infrastructure provisioning, scaling, and operations for artificial intelligence and machine learning (AI/ML) development tasks. In this post, we explore how these new capabilities help you run large MLflow workloads—from generative AI agents to large language model (LLM) experimentation—with improved performance, automation, and security using SageMaker AI with MLflow.  ( 108 min )
    Amazon Bedrock AgentCore Observability with Langfuse
    In this post, we explain how to integrate Langfuse observability with Amazon Bedrock AgentCore to gain deep visibility into an AI agent's performance, debug issues faster, and optimize costs. We walk through a complete implementation using Strands agents deployed on AgentCore Runtime followed by step-by-step code examples.  ( 112 min )


    Implement automated smoke testing using Amazon Nova Act headless mode
    This post shows how to implement automated smoke testing using Amazon Nova Act headless mode in CI/CD pipelines. We use SauceDemo, a sample ecommerce application, as our target for demonstration. We demonstrate setting up Amazon Nova Act for headless browser automation in CI/CD environments and creating smoke tests that validate key user workflows. We then show how to implement parallel execution to maximize testing efficiency, configure GitLab CI/CD for automatic test execution on every deployment, and apply best practices for maintainable and scalable test automation.  ( 116 min )


    Real-world reasoning: How Amazon Nova Lite 2.0 handles complex customer support scenarios
    This post evaluates the reasoning capabilities of our latest offering in the Nova family, Amazon Nova Lite 2.0, using practical scenarios that test these critical dimensions. We compare its performance against other models in the Nova family—Lite 1.0, Micro, Pro 1.0, and Premier—to elucidate how the latest version advances reasoning quality and consistency.  ( 116 min )
    Create AI-powered chat assistants for your enterprise with Amazon Quick Suite
    In this post, we show how to build chat agents in Amazon Quick Suite. We walk through a three-layer framework—identity, instructions, and knowledge—that transforms Quick Suite chat agents into intelligent enterprise AI assistants. In our example, we demonstrate how our chat agent guides feature discovery, uses enterprise data to inform recommendations, and tailors solutions based on potential impact and your team’s adoption readiness.  ( 116 min )


    How AWS delivers generative AI to the public sector in weeks, not years
    Experts at the Generative AI Innovation Center share several strategies to help organizations excel with generative AI.  ( 110 min )
    S&P Global Data integration expands Amazon Quick Research capabilities
    Today, we are pleased to announce a new integration between Amazon Quick Research and S&P Global. This integration brings S&P Global Energy news, research, and insights, as well as S&P Global Market Intelligence data, to Quick Research customers in one deep research agent. In this post, we explore S&P Global’s data sets and the solution architecture of the integration with Quick Research.  ( 109 min )
    Streamline AI agent tool interactions: Connect API Gateway to AgentCore Gateway with MCP
    AgentCore Gateway now supports API Gateway. As organizations explore the possibilities of agentic applications, they continue to navigate the challenges of using enterprise data as context in invocation requests to large language models (LLMs) in a manner that is secure and aligned with enterprise policies. This post covers these new capabilities and shows how to implement them.  ( 111 min )
    Create an intelligent insurance underwriter agent powered by Amazon Nova 2 Lite and Amazon Quick Suite
    In this post, we demonstrate how to build an intelligent insurance underwriting agent that addresses three critical challenges: unifying siloed data across CRM systems and databases, providing explainable and auditable AI decisions for regulatory compliance, and enabling automated fraud detection with consistent underwriting rules. The solution combines Amazon Nova 2 Lite for transparent risk assessment, Amazon Bedrock AgentCore for managed MCP server infrastructure, and Amazon Quick Suite for natural language interactions—delivering a production-ready system that underwriters can deploy in under 30 minutes.  ( 111 min )

    SRE Weekly Issue #500
    View on sreweekly.com A message from our sponsor, Depot: Stop hunting through GitHub Actions logs. Depot now offers powerful CI log search across all your repositories and workflows. With smart filtering by timeframe, runner type, and keywords, you’ll have all the information at your fingertips to debug faster. Wow, five hundred issues! I sent the […]  ( 4 min )


    Milo cancer diary part 21 – CHOP #4
    Milo’s had a fantastic long remission – it’s been almost nine months since his last chemo. Long enough that we started hoping for a miracle, and that he might not relapse again. But… the good folk at North Downs Specialist Referrals (NDSR) were right to be concerned about his last scan, and get him back […]  ( 13 min )


    November 2025
    Pupdate It’s been pretty cold and wet, so the boys are needing to wear their coats outside. Milo had a scan at the start of the month. Initially things were looking good, and the plan was to stretch out the next visit to three months time :) But then the technician noticed some lymph node […]  ( 14 min )

    SRE Weekly Issue #499
    View on sreweekly.com Fix-mas Countdown The folks at Uptime Labs and Advanced Capacity Labs have announced an advent calendar for this December. Note: In order to take part, you’ll need to provide an email address to subscribe. I gave that some serious thought before including this here, but ultimately, I have a lot of trust […]  ( 4 min )


    Clocking on at the Outrage Factory
    TL;DR Our online discourse is the victim of industrial scale pollution, and the incentives are being aligned in the wrong direction. Rather than polluters being penalised there’s now an entire industry that’s paid to pollute. Filter Failure at the Outrage Factory is no longer just the work of ‘amateur’ fringe trolls and state sponsored propaganda; […]  ( 14 min )


    How Myriad Genetics achieved fast, accurate, and cost-efficient document processing using the AWS open-source Generative AI Intelligent Document Processing Accelerator
    In this post, we explore how Myriad Genetics partnered with the AWS Generative AI Innovation Center to transform their healthcare document processing pipeline using Amazon Bedrock and Amazon Nova foundation models, achieving 98% classification accuracy while reducing costs by 77% and processing time by 80%. We detail the technical implementation using AWS's open-source GenAI Intelligent Document Processing Accelerator, the optimization strategies for document classification and key information extraction, and the measurable business impact on Myriad's prior authorization workflows.  ( 115 min )
    How CBRE powers unified property management search and digital assistant using Amazon Bedrock
    In this post, CBRE and AWS demonstrate how they transformed property management by building a unified search and digital assistant using Amazon Bedrock, enabling professionals to access millions of documents and multiple databases through natural language queries. The solution combines Amazon Nova Pro for SQL generation and Claude Haiku for document interactions, achieving a 67% reduction in processing time while maintaining enterprise-grade security across more than eight million documents.  ( 118 min )
    Managed Tiered KV Cache and Intelligent Routing for Amazon SageMaker HyperPod
    In this post, we introduce Managed Tiered KV Cache and Intelligent Routing for Amazon SageMaker HyperPod, new capabilities that can reduce time to first token by up to 40% and lower compute costs by up to 25% for long context prompts and multi-turn conversations. These features automatically manage distributed KV caching infrastructure and intelligent request routing, making it easier to deploy production-scale LLM inference workloads with enterprise-grade performance while significantly reducing operational overhead.  ( 113 min )


    Apply fine-grained access control with Bedrock AgentCore Gateway interceptors
    We are launching a new feature: gateway interceptors for Amazon Bedrock AgentCore Gateway. This powerful new capability provides fine-grained security, dynamic access control, and flexible schema management.  ( 119 min )
    How Condé Nast accelerated contract processing and rights analysis with Amazon Bedrock
    In this post, we explore how Condé Nast used Amazon Bedrock and Anthropic’s Claude to accelerate their contract processing and rights analysis workstreams. The company’s extensive portfolio, spanning multiple brands and geographies, required managing an increasingly complex web of contracts, rights, and licensing agreements.  ( 113 min )
    Building AI-Powered Voice Applications: Amazon Nova Sonic Telephony Integration Guide
    Available through the Amazon Bedrock bidirectional streaming API, Amazon Nova Sonic can connect to your business data and external tools and can be integrated directly with telephony systems. This post will introduce sample implementations for the most common telephony scenarios.  ( 112 min )
    University of California Los Angeles delivers an immersive theater experience with AWS generative AI services
    In this post, we will walk through the performance constraints and design choices made by the OARC and REMAP teams at UCLA, including how AWS serverless infrastructure, AWS Managed Services, and generative AI services supported the rapid design and deployment of our solution. We will also describe our use of Amazon SageMaker AI and how it can be used reliably in immersive live experiences.  ( 114 min )
    Optimizing Mobileye’s REM™ with AWS Graviton: A focus on ML inference and Triton integration
    This post is written by Chaim Rand, Principal Engineer, Pini Reisman, Software Senior Principal Engineer, and Eliyah Weinberg, Performance and Technology Innovation Engineer, at Mobileye. The Mobileye team would like to thank Sunita Nadampalli and Guy Almog from AWS for their contributions to this solution and this post. Mobileye is driving the global evolution toward […]  ( 112 min )
    Evaluate models with the Amazon Nova evaluation container using Amazon SageMaker AI
    This blog post introduces the new Amazon Nova model evaluation features in Amazon SageMaker AI. This release adds custom metrics support, LLM-based preference testing, log probability capture, metadata analysis, and multi-node scaling for large evaluations.  ( 120 min )
    Beyond the technology: Workforce changes for AI
    In this post, we explore three essential strategies for successfully integrating AI into your organization: addressing organizational debt before it compounds, embracing distributed decision-making through the "octopus organization" model, and redefining management roles to align with AI-powered workflows. Organizations must invest in both technology and workforce preparation, focusing on streamlining processes, empowering teams with autonomous decision-making within defined parameters, and evolving each management layer from traditional oversight to mentorship, quality assurance, and strategic vision-setting.  ( 107 min )
    Enhanced performance for Amazon Bedrock Custom Model Import
    You can now achieve significant performance improvements when using Amazon Bedrock Custom Model Import, with reduced end-to-end latency, faster time-to-first-token, and improved throughput through advanced PyTorch compilation and CUDA graph optimizations. With Amazon Bedrock Custom Model Import, you can bring your own foundation models to Amazon Bedrock for deployment and inference at scale. In this post, we introduce how to use the improvements in Amazon Bedrock Custom Model Import.  ( 114 min )
    Amazon SageMaker AI introduces EAGLE based adaptive speculative decoding to accelerate generative AI inference
    Amazon SageMaker AI now supports EAGLE-based adaptive speculative decoding, a technique that accelerates large language model inference by up to 2.5x while maintaining output quality. In this post, we explain how to use EAGLE 2 and EAGLE 3 speculative decoding in Amazon SageMaker AI, covering the solution architecture, optimization workflows using your own datasets or SageMaker's built-in data, and benchmark results demonstrating significant improvements in throughput and latency.  ( 112 min )


    Train custom computer vision defect detection model using Amazon SageMaker
    In this post, we demonstrate how to migrate computer vision workloads from Amazon Lookout for Vision to Amazon SageMaker AI by training custom defect detection models using pre-trained models available on AWS Marketplace. We provide step-by-step guidance on labeling datasets with SageMaker Ground Truth, training models with flexible hyperparameter configurations, and deploying them for real-time or batch inference—giving you greater control and flexibility for automated quality inspection use cases.  ( 116 min )
    Practical implementation considerations to close the AI value gap
    The AWS Customer Success Center of Excellence (CS COE) helps customers get tangible value from their AWS investments. We've seen a pattern: customers who build AI strategies that address people, process, and technology together succeed more often. In this post, we share practical considerations that can help close the AI value gap.  ( 111 min )
    Introducing bidirectional streaming for real-time inference on Amazon SageMaker AI
    We're introducing bidirectional streaming for Amazon SageMaker AI Inference, which transforms inference from a transactional exchange into a continuous conversation. This post shows you how to build and deploy a container with bidirectional streaming capability to a SageMaker AI endpoint. We also demonstrate how you can bring your own container or use our partner Deepgram's pre-built models and containers on SageMaker AI to enable the bidirectional streaming feature for real-time inference.  ( 114 min )
    Warner Bros. Discovery achieves 60% cost savings and faster ML inference with AWS Graviton
    Warner Bros. Discovery (WBD) is a leading global media and entertainment company that creates and distributes the world’s most differentiated and complete portfolio of content and brands across television, film and streaming. In this post, we describe the scale of our offerings, artificial intelligence (AI)/machine learning (ML) inference infrastructure requirements for our real time recommender systems, and how we used AWS Graviton-based Amazon SageMaker AI instances for our ML inference workloads and achieved 60% cost savings and 7% to 60% latency improvements across different models.  ( 110 min )
    Physical AI in practice: Technical foundations that fuel human-machine interactions
    In this post, we explore the complete development lifecycle of physical AI—from data collection and model training to edge deployment—and examine how these intelligent systems learn to understand, reason, and interact with the physical world through continuous feedback loops. We illustrate this workflow through Diligent Robotics' Moxi, a mobile manipulation robot that has completed over 1.2 million deliveries in hospitals, saving nearly 600,000 hours for clinical staff while transforming healthcare logistics and returning valuable time to patient care.  ( 111 min )
    HyperPod now supports Multi-Instance GPU to maximize GPU utilization for generative AI tasks
    In this post, we explore how Amazon SageMaker HyperPod now supports NVIDIA Multi-Instance GPU (MIG) technology, enabling you to partition powerful GPUs into multiple isolated instances for running concurrent workloads like inference, research, and interactive development. By maximizing GPU utilization and reducing wasted resources, MIG helps organizations optimize costs while maintaining performance isolation and predictable quality of service across diverse machine learning tasks.  ( 128 min )
2025-12-24T18:27:21.887Z osmosfeed 1.15.1