    AWS and NVIDIA deepen strategic collaboration to accelerate AI from pilot to production
    Today at NVIDIA GTC 2026, AWS and NVIDIA announced an expanded collaboration with new technology integrations to support growing AI compute demand and help you build and run AI solutions that are production-ready.  ( 109 min )
    Agentic AI in the Enterprise Part 2: Guidance by Persona
    This is Part II of a two-part series from the AWS Generative AI Innovation Center. In Part II, we speak directly to the leaders who must turn that shared foundation into action. Each role carries a distinct set of responsibilities, risks, and leverage points. Whether you own a P&L, run enterprise architecture, lead security, govern data, or manage compliance, this section is written in the language of your job—because that's where agentic AI either succeeds or quietly dies.  ( 113 min )
    Introducing Disaggregated Inference on AWS powered by llm-d
    In this blog post, we introduce the concepts behind next-generation inference capabilities, including disaggregated serving, intelligent request scheduling, and expert parallelism. We discuss their benefits and walk through how you can implement them on Amazon SageMaker HyperPod EKS to achieve significant improvements in inference performance, resource utilization, and operational efficiency.  ( 115 min )
    How Workhuman built multi-tenant self-service reporting using Amazon Quick Sight embedded dashboards
    This post explores how Workhuman transformed their analytics delivery model and the key lessons learned from their implementation. We go through their architecture approach, implementation strategy, and the business outcomes they achieved—providing you with a practical blueprint for adding embedded analytics to your own software as a service (SaaS) applications.  ( 114 min )
    Build an offline feature store using Amazon SageMaker Unified Studio and SageMaker Catalog
    This blog post provides step-by-step guidance on implementing an offline feature store using SageMaker Catalog within a SageMaker Unified Studio domain. By adopting a publish-subscribe pattern, data producers can use this solution to publish curated, versioned feature tables—while data consumers can securely discover, subscribe to, and reuse them for model development.  ( 116 min )

    P-EAGLE: Faster LLM inference with Parallel Speculative Decoding in vLLM
    In this post, we explain how P-EAGLE works, how we integrated it into vLLM starting from v0.16.0 (PR#32887), and how to serve it with our pre-trained checkpoints.  ( 113 min )

    Improve operational visibility for inference workloads on Amazon Bedrock with new CloudWatch metrics for TTFT and Estimated Quota Consumption
    Today, we’re announcing two new Amazon CloudWatch metrics for Amazon Bedrock, TimeToFirstToken and EstimatedTPMQuotaUsage. In this post, we cover how these work and how to set alarms, establish baselines, and proactively manage capacity using them.  ( 112 min )
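    The alarm setup described above can be sketched with CloudWatch's `put_metric_alarm`. The metric name and namespace follow the announcement; the `ModelId` dimension, the p90 statistic, and the 1500 ms threshold are assumptions to adapt to your workload:

```python
# Sketch: a p90 alarm on the new TimeToFirstToken metric for one Bedrock
# model. The "ModelId" dimension and the threshold are assumptions, not
# prescriptions from the announcement.

def ttft_alarm_params(model_id: str, threshold_ms: float) -> dict:
    """Build the kwargs for cloudwatch.put_metric_alarm()."""
    return {
        "AlarmName": f"bedrock-ttft-p90-{model_id}",
        "Namespace": "AWS/Bedrock",
        "MetricName": "TimeToFirstToken",
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        "ExtendedStatistic": "p90",          # watch tail latency, not the average
        "Period": 300,                       # evaluate over 5-minute windows
        "EvaluationPeriods": 3,              # require 3 consecutive breaches
        "Threshold": threshold_ms,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # idle periods should not alarm
    }

# To create the alarm (requires AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**ttft_alarm_params("my-model-id", 1500))
params = ttft_alarm_params("my-model-id", 1500)
```

    Using p90 rather than the average keeps the alarm focused on the tail latency users actually feel.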
    Secure AI agents with Policy in Amazon Bedrock AgentCore
    In this post, you will understand how Policy in Amazon Bedrock AgentCore creates a deterministic enforcement layer that operates independently of the agent's own reasoning. You will learn how to turn natural language descriptions of your business rules into Cedar policies, then use those policies to enforce fine-grained, identity-aware controls so that agents only access the tools and data that their users are authorized to use. You will also see how to apply Policy through AgentCore Gateway, intercepting and evaluating every agent-to-tool request at runtime.  ( 115 min )
    Multimodal embeddings at scale: AI data lake for media and entertainment workloads
    This post shows you how to build a scalable multimodal video search system that enables natural language search across large video datasets using Amazon Nova models and Amazon OpenSearch Service. You will learn how to move beyond manual tagging and keyword-based searches to enable semantic search that captures the full richness of video content.  ( 114 min )
    Fine-tuning NVIDIA Nemotron Speech ASR on Amazon EC2 for domain adaptation
    In this post, we explore how to fine-tune Parakeet TDT 0.6B V2, a leaderboard-topping NVIDIA Nemotron Speech automatic speech recognition (ASR) model, using synthetic speech data to achieve superior transcription results for specialised applications. We walk through an end-to-end workflow that combines AWS infrastructure with popular open-source frameworks.  ( 120 min )

    Operationalizing Agentic AI Part 1: A Stakeholder’s Guide
    The AWS Generative AI Innovation Center has helped 1,000+ customers move AI into production, delivering millions in documented productivity gains. In this post, we share guidance for leaders across the C-suite: CTOs, CISOs, CDOs, and Chief Data Science/AI officers, as well as business owners and compliance leads.  ( 110 min )

    Accelerate custom LLM deployment: Fine-tune with Oumi and deploy to Amazon Bedrock
    In this post, we show how to fine-tune a Llama model using Oumi on Amazon EC2 (with the option to create synthetic data using Oumi), store artifacts in Amazon S3, and deploy to Amazon Bedrock using Custom Model Import for managed inference.  ( 110 min )

    Run NVIDIA Nemotron 3 Nano as a fully managed serverless model on Amazon Bedrock
    We are excited to announce that NVIDIA’s Nemotron 3 Nano is now available as a fully managed and serverless model in Amazon Bedrock. This follows our earlier announcement at AWS re:Invent supporting NVIDIA Nemotron 2 Nano 9B and NVIDIA Nemotron 2 Nano VL 12B models. This post explores the technical characteristics of the NVIDIA Nemotron 3 Nano model and discusses potential application use cases. Additionally, it provides technical guidance to help you get started using this model for your generative AI applications within the Amazon Bedrock environment.  ( 111 min )
    Access Anthropic Claude models in India on Amazon Bedrock with Global cross-Region inference
    In this post, you will discover how to use Amazon Bedrock's Global cross-Region Inference for Claude models in India. We will guide you through the capabilities of each Claude model variant and how to get started with a code example to help you start building generative AI applications immediately.  ( 115 min )
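    As a rough sketch of what such a code example might look like, here is the shape of a Converse API call routed through a global inference profile. The profile ID below is a placeholder, not a documented identifier; check the Bedrock console for the exact one available in your account:

```python
# Sketch: invoking a Claude model via a Global cross-Region inference
# profile with the Bedrock Converse API. GLOBAL_PROFILE_ID is a
# placeholder (an assumption).

GLOBAL_PROFILE_ID = "global.anthropic.claude-sonnet-4-5-example-v1:0"  # placeholder

def build_converse_request(prompt: str, max_tokens: int = 512) -> dict:
    """Assemble kwargs for bedrock_runtime.converse(); Bedrock handles routing."""
    return {
        "modelId": GLOBAL_PROFILE_ID,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": 0.2},
    }

# To invoke (requires AWS credentials and model access):
#   import boto3
#   client = boto3.client("bedrock-runtime", region_name="ap-south-1")
#   reply = client.converse(**build_converse_request("Hello from Mumbai"))
#   print(reply["output"]["message"]["content"][0]["text"])
req = build_converse_request("Hello")
```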

    The importance of rule design
    Seeing the importance of rule design through one election.
    Rules for the contested election: 1. Method: a contested election with 6 candidates for 5 seats (a margin of 1). 2. Valid-ballot criteria: a ballot is valid if it approves 5 or fewer people (counting both "approve" marks and write-ins); a ballot approving more than 5 is spoiled. 3. Marking conventions, using the space above each candidate's name: approve, draw a circle (○); oppose, draw a cross (×); abstain, leave the space blank. To vote for someone not on the list, write that person's name in the designated write-in field and draw ○ above it; write-ins count toward the total, so the combined number of approvals (candidates plus write-ins) must not exceed 5, or the ballot is void. 4. Notes: marks must be clear and legible; ballots must not be altered, since an altered or illegible ballot may be ruled invalid, as may any ballot not filled out according to the rules.
    In plain terms (choose 5 of 6): you may approve at most 5 people. Circle a name to approve it, cross it to oppose, leave it blank to abstain. Circle more than 5 names and your ballot is void. You may approve just 1 person, or 4, or 5, or oppose or abstain on everyone.
    A real election: 41 voters, 40 identical ballots. Recently I took part in a small election: 41 voters choosing 5 of 6 candidates. Honestly, I really knew only 2 of the 6; the rest were strangers to me. Studying the ballot, the names seemed to be listed in no particular order, with no obvious bias (I later learned they were sorted strictly by character stroke count, fairness taken to its extreme). Faced with unfamiliar names, how do you choose? In that moment nobody around me was whispering or deliberating; an unconscious instinct took over: since I can't tell them apart, I'll just go down the list and pick the first five. I felt my choice was "objective", even carried off with a touch of "random" nonchalance. Then the count came in, and I froze. Of the 41 voters, 40 had made exactly the same choice I had. Only one person voted differently. Forty people, without talking to each other, without collusion, without any hint, had uniformly ticked the first five names on the list. There was no backroom dealing; the process was open and transparent. Yet the result felt like some silent magic, and it left me deeply shaken.
    Why such a startling "silent consensus"? I asked Qwen, and the AI offered an answer: however rigorous the ballot design, the symmetry of information in voters' hands matters. When information is asymmetric, most people fall back on the inertia of mental shortcuts. Sorting candidates by stroke count was meant to embody fairness and rule out human interference, but when voters know little about the candidates, this "absolutely neutral" ordering subconsciously becomes a "recommended order". The primacy effect leads the brain to assume that whoever is listed first is perhaps more senior and more reliable, so "tick the first five" becomes the decision path with the lowest cognitive cost.  ( 2 min )
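    To see how unlikely that unanimity would be if voters chose at random (a simplifying assumption; the post argues they clearly did not), a quick back-of-the-envelope calculation:

```python
from math import comb

# There are only comb(6, 5) = 6 possible "approve exactly 5 of 6" ballots.
n_ballots = comb(6, 5)

# If the other 40 voters each picked one of those 6 ballots uniformly at
# random (the simplifying assumption), the chance that all 40 match the
# narrator's ballot is (1/6)**40: vanishingly small, which is why the
# primacy-effect explanation is far more plausible than coincidence.
p_all_match = (1 / n_ballots) ** 40
```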

    Drive organizational growth with Amazon Lex multi-developer CI/CD pipeline
    In this post, we walk through a multi-developer CI/CD pipeline for Amazon Lex that enables isolated development environments, automated testing, and streamlined deployments. We show you how to set up the solution and share real-world results from teams using this approach.  ( 111 min )
    Building custom model provider for Strands Agents with LLMs hosted on SageMaker AI endpoints
    This post demonstrates how to build custom model parsers for Strands agents when working with LLMs hosted on SageMaker that don't natively support the Bedrock Messages API format. We'll walk through deploying Llama 3.1 with SGLang on SageMaker using awslabs/ml-container-creator, then implementing a custom parser to integrate it with Strands agents.  ( 110 min )

    Embed Amazon Quick Suite chat agents in enterprise applications
    Organizations find it challenging to implement secure embedded chat in their applications, and doing so can require weeks of development to build authentication, token validation, domain security, and global distribution infrastructure. In this post, we show you how to solve this with a one-click deployment solution that embeds chat agents using the Quick Suite Embedding SDK in enterprise portals.  ( 109 min )
    Unlock powerful call center analytics with Amazon Nova foundation models
    In this post, we discuss how Amazon Nova demonstrates capabilities in conversational analytics, call classification, and other use cases often relevant to contact center solutions. We examine these capabilities for both single-call and multi-call analytics use cases.  ( 111 min )
    How Ricoh built a scalable intelligent document processing solution on AWS
    This post explores how Ricoh built a standardized, multi-tenant solution for automated document classification and extraction using the AWS GenAI IDP Accelerator as a foundation, transforming their document processing from a custom-engineering bottleneck into a scalable, repeatable service.  ( 118 min )

    Building a scalable virtual try-on solution using Amazon Nova on AWS: part 1
    In this post, we explore the virtual try-on capability now available in Amazon Nova Canvas, including sample code to get started quickly and tips to help get the best outputs.  ( 112 min )
    How Lendi revamped the refinance journey for its customers using agentic AI in 16 weeks using Amazon Bedrock
    This post details how Lendi Group built their AI-powered Home Loan Guardian using Amazon Bedrock, the challenges they faced, the architecture they implemented, and the significant business outcomes they’ve achieved. Their journey offers valuable insights for organizations that want to use generative AI to transform customer experiences while maintaining the human touch that builds trust and loyalty.  ( 113 min )
    How Tines enhances security analysis with Amazon Quick Suite
    In this post, we show you how to connect Quick Suite with Tines to securely retrieve, analyze, and visualize enterprise data from any security or IT system. We walk through an example that uses an MCP server in Tines to retrieve data from various tools, such as AWS CloudTrail, Okta, and VirusTotal, to remediate security events using Quick Suite.  ( 110 min )

    Building specialized AI without sacrificing intelligence: Nova Forge data mixing in action
    In this post, we share results from the AWS China Applied Science team's comprehensive evaluation of Nova Forge using a challenging Voice of Customer (VOC) classification task, benchmarked against open-source models.  ( 111 min )
    Build a serverless conversational AI agent using Claude with LangGraph and managed MLflow on Amazon SageMaker AI
    This post explores how to build an intelligent conversational agent using Amazon Bedrock, LangGraph, and managed MLflow on Amazon SageMaker AI.  ( 114 min )
    Build safe generative AI applications like a Pro: Best Practices with Amazon Bedrock Guardrails
    In this post, we will show you how to configure Amazon Bedrock Guardrails for efficient performance, implement best practices to protect your applications, and monitor your deployment effectively to maintain the right balance between safety and user experience.  ( 114 min )
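    One practice in this area is screening text with the standalone ApplyGuardrail API before it ever reaches the model. The sketch below builds that request; the guardrail ID and version are placeholders (assumptions), not values from the post:

```python
# Sketch: checking user input against a Bedrock guardrail independently of
# model invocation, via the bedrock-runtime ApplyGuardrail API. The
# guardrail ID and version are placeholders.

def build_apply_guardrail_request(text: str,
                                  guardrail_id: str = "gr-EXAMPLE",
                                  version: str = "1") -> dict:
    """Assemble kwargs for bedrock_runtime.apply_guardrail()."""
    return {
        "guardrailIdentifier": guardrail_id,  # placeholder ID
        "guardrailVersion": version,
        "source": "INPUT",                    # use "OUTPUT" to screen model responses
        "content": [{"text": {"text": text}}],
    }

# To apply (requires AWS credentials):
#   import boto3
#   resp = boto3.client("bedrock-runtime").apply_guardrail(
#       **build_apply_guardrail_request("user text here"))
#   blocked = resp["action"] == "GUARDRAIL_INTERVENED"
req = build_apply_guardrail_request("How do I reset my password?")
```

    Screening input and output separately lets you tune each direction's policies without re-invoking the model.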

    February 2026
    Pupdate It’s been a pretty dank February, so the coats have mostly stayed on for walks. But the boys have been enjoying their usual doggy mischief. Milo is now halfway through his 4th chemo protocol, and the second half has previously been easier as the pace slows down to vet visits every two weeks. […]  ( 16 min )

    Publishing apt and yum/dnf repos on GitHub Pages
    TL;DR GitHub Pages is a practical way to host a low volume repo for apt and yum/dnf. The relevant metadata can be generated using GitHub Actions, and the process can be triggered by a release from the source repo. Background In my last post I wrote about creating .deb and .rpm packages (for our Dart […]  ( 14 min )

    Learnings from COBOL modernization in the real world
    Delivering successful COBOL modernization requires a solution that can reverse engineer deterministically, produce validated and traceable specs, and help those specs flow into any AI-powered coding assistant for forward engineering. A successful modernization requires both reverse engineering and forward engineering. Learn more about COBOL modernization in this post.  ( 109 min )
    Reinforcement fine-tuning for Amazon Nova: Teaching AI through feedback
    In this post, we explore reinforcement fine-tuning (RFT) for Amazon Nova models, which can be a powerful customization technique that learns through evaluation rather than imitation. We'll cover how RFT works, when to use it versus supervised fine-tuning, real-world applications from code generation to customer service, and implementation options ranging from fully managed Amazon Bedrock to multi-turn agentic workflows with Nova Forge. You'll also learn practical guidance on data preparation, reward function design, and best practices for achieving optimal results.  ( 118 min )
    Large model inference container – latest capabilities and performance enhancements
    AWS recently released significant updates to the Large Model Inference (LMI) container, delivering comprehensive performance improvements, expanded model support, and streamlined deployment capabilities for customers hosting LLMs on AWS. These releases focus on reducing operational complexity while delivering measurable performance gains across popular model architectures.  ( 111 min )

    Efficiently serve dozens of fine-tuned models with vLLM on Amazon SageMaker AI and Amazon Bedrock
    In this post, we explain how we implemented multi-LoRA inference for Mixture of Experts (MoE) models in vLLM, describe the kernel-level optimizations we performed, and show you how you can benefit from this work. We use GPT-OSS 20B as our primary example throughout this post.  ( 114 min )
    Building intelligent event agents using Amazon Bedrock AgentCore and Amazon Bedrock Knowledge Bases
    This post demonstrates how to quickly deploy a production-ready event assistant using the components of Amazon Bedrock AgentCore. We'll build an intelligent companion that remembers attendee preferences and builds personalized experiences over time, while Amazon Bedrock AgentCore handles the heavy lifting of production deployment: Amazon Bedrock AgentCore Memory for maintaining both conversation context and long-term preferences without custom storage solutions, Amazon Bedrock AgentCore Identity for secure multi-IDP authentication, and Amazon Bedrock AgentCore Runtime for serverless scaling and session isolation. We will also use Amazon Bedrock Knowledge Bases for managed RAG and event data retrieval.  ( 112 min )

    Build an intelligent photo search using Amazon Rekognition, Amazon Neptune, and Amazon Bedrock
    In this post, we show you how to build a comprehensive photo search system using the AWS Cloud Development Kit (AWS CDK) that integrates Amazon Rekognition for face and object detection, Amazon Neptune for relationship mapping, and Amazon Bedrock for AI-powered captioning.  ( 112 min )
    Train CodeFu-7B with veRL and Ray on Amazon SageMaker Training jobs
    In this post, we demonstrate how to train CodeFu-7B, a specialized 7-billion-parameter model for competitive programming, using Group Relative Policy Optimization (GRPO) with veRL within a distributed Ray cluster managed by SageMaker training jobs. veRL is a flexible and efficient training library for large language models (LLMs) that enables straightforward extension of diverse RL algorithms and seamless integration with existing LLM infrastructure. We walk through the complete implementation, covering data preparation, distributed training setup, and comprehensive observability, showcasing how this unified approach delivers both computational scale and a solid developer experience for sophisticated RL training workloads.  ( 118 min )
    Generate structured output from LLMs with Dottxt Outlines in AWS
    This post explores the implementation of Dottxt’s Outlines framework as a practical approach to implementing structured outputs using AWS Marketplace in Amazon SageMaker.  ( 114 min )
    Global cross-Region inference for latest Anthropic Claude Opus, Sonnet and Haiku models on Amazon Bedrock in Thailand, Malaysia, Singapore, Indonesia, and Taiwan
    In this post, we are excited to announce the availability of Global CRIS for customers in Thailand, Malaysia, Singapore, Indonesia, and Taiwan. We walk through the technical implementation steps, cover quota management best practices to maximize the value of your AI inference deployments, and share guidance for production deployments.  ( 116 min )
    Introducing Amazon Bedrock global cross-Region inference for Anthropic’s Claude models in the Middle East Regions (UAE and Bahrain)
    We’re excited to announce the availability of Anthropic’s Claude Opus 4.6, Claude Sonnet 4.6, Claude Opus 4.5, Claude Sonnet 4.5, and Claude Haiku 4.5 through Amazon Bedrock global cross-Region inference for customers operating in the Middle East. In this post, we guide you through the capabilities of each Anthropic Claude model variant, the key advantages of global cross-Region inference including improved resilience, real-world use cases you can implement, and a code example to help you start building generative AI applications immediately.  ( 110 min )

    Packaging Dart binaries as .deb and .rpm etc.
    TL;DR nFPM makes it very easy to put your binaries into a Debian .deb or RedHat Package Manager .rpm file. Background We’ve been using full stack Dart and Flutter at Atsign since the dawn of the company in 2019, so when NoPorts came along we released the binaries in tarballs (or zip files) from GitHub […]  ( 14 min )

    Scaling data annotation using vision-language models to power physical AI systems
    In this post, we examine how Bedrock Robotics tackles this challenge. By joining the AWS Physical AI Fellowship, the startup partnered with the AWS Generative AI Innovation Center to apply vision-language models that analyze construction video footage, extract operational details, and generate labeled training datasets at scale, to improve data preparation for autonomous construction equipment.  ( 109 min )
    How Sonrai uses Amazon SageMaker AI to accelerate precision medicine trials
    In this post, we explore how Sonrai, a life sciences AI company, partnered with AWS to build a robust MLOps framework using Amazon SageMaker AI that addresses these challenges while maintaining the traceability and reproducibility required in regulated environments.  ( 111 min )
    Accelerating AI model production at Hexagon with Amazon SageMaker HyperPod
    In this blog post, we demonstrate how Hexagon collaborated with Amazon Web Services to scale their AI model production by pretraining state-of-the-art segmentation models, using the model training infrastructure of Amazon SageMaker HyperPod.  ( 110 min )
    Agentic AI with multi-model framework using Hugging Face smolagents on AWS
    Hugging Face smolagents is an open source Python library designed to make it straightforward to build and run agents using a few lines of code. We will show you how to build an agentic AI solution by integrating Hugging Face smolagents with Amazon Web Services (AWS) managed services. You'll learn how to deploy a healthcare AI agent that demonstrates multi-model deployment options, vector-enhanced knowledge retrieval, and clinical decision support capabilities.  ( 116 min )

    Amazon SageMaker AI in 2025, a year in review part 1: Flexible Training Plans and improvements to price performance for inference workloads
    In 2025, Amazon SageMaker AI saw dramatic improvements to core infrastructure offerings along four dimensions: capacity, price performance, observability, and usability. In this series of posts, we discuss these various improvements and their benefits. In Part 1, we discuss capacity improvements with the launch of Flexible Training Plans. We also describe improvements to price performance for inference workloads. In Part 2, we discuss enhancements made to observability, model customization, and model hosting.  ( 113 min )
    Amazon SageMaker AI in 2025, a year in review part 2: Improved observability and enhanced features for SageMaker AI model customization and hosting
    In 2025, Amazon SageMaker AI made several improvements designed to help you train, tune, and host generative AI workloads. In Part 1 of this series, we discussed Flexible Training Plans and price performance improvements made to inference components. In this post, we discuss enhancements made to observability, model customization, and model hosting. These improvements facilitate a whole new class of customer use cases to be hosted on SageMaker AI.  ( 111 min )
    Integrate external tools with Amazon Quick Agents using Model Context Protocol (MCP)
    In this post, you’ll use a six-step checklist to build a new MCP server, or to validate and adjust an existing MCP server, for Amazon Quick integration. The Amazon Quick User Guide describes the MCP client behavior and constraints. This is a how-to guide covering the detailed implementation required for third-party (3P) partners to integrate with Amazon Quick via MCP.  ( 112 min )

    Build AI workflows on Amazon EKS with Union.ai and Flyte
    In this post, we explain how you can use the Flyte Python SDK to orchestrate and scale AI/ML workflows. We explore how the Union.ai 2.0 system enables deployment of Flyte on Amazon Elastic Kubernetes Service (Amazon EKS), integrating seamlessly with AWS services like Amazon Simple Storage Service (Amazon S3), Amazon Aurora, AWS Identity and Access Management (IAM), and Amazon CloudWatch. We explore the solution through an AI workflow example, using the new Amazon S3 Vectors service.  ( 115 min )
    Amazon Quick Suite now supports key pair authentication to Snowflake data source
    In this blog post, we will guide you through establishing data source connectivity between Amazon Quick Sight and Snowflake through secure key pair authentication.  ( 110 min )

    Build unified intelligence with Amazon Bedrock AgentCore
    In this post, we demonstrate how to build unified intelligence systems using Amazon Bedrock AgentCore through our real-world implementation of the Customer Agent and Knowledge Engine (CAKE).  ( 115 min )
    Evaluating AI agents: Real-world lessons from building agentic systems at Amazon
    In this post, we present a comprehensive evaluation framework for Amazon agentic AI systems that addresses the complexity of agentic AI applications at Amazon through two core components: a generic evaluation workflow that standardizes assessment procedures across diverse agent implementations, and an agent evaluation library that provides systematic measurements and metrics in Amazon Bedrock AgentCore Evaluations, along with Amazon use case-specific evaluation approaches and metrics.  ( 116 min )

    Supercharge regulated workloads with Claude Code and Amazon Bedrock
    The release of Anthropic Claude Sonnet 4.5 in the AWS GovCloud (US) Region introduces a straightforward on-ramp for AI-assisted development for workloads with regulatory compliance requirements. In this post, we explore how to combine Claude Sonnet 4.5 on Amazon Bedrock in AWS GovCloud (US) with Claude Code, an agentic coding assistant released by Anthropic. This […]  ( 110 min )
2026-03-17T14:54:03.006Z osmosfeed 1.15.1