
AWS Introduces Amazon S3 Annotations
Quick Take
AWS has launched Amazon S3 Annotations, allowing teams to attach up to 1 GB of rich, mutable metadata to S3 objects, significantly enhancing the metadata model. This feature enables independent updates and querying across datasets, addressing limitations of existing metadata systems and improving workflow possibilities for AI and analytics tools.
Key Points
- S3 Annotations supports up to 1000 mutable annotations per object, with a combined capacity of 1 GB.
- Annotations can be queried using Amazon Athena, Redshift, or Iceberg-compatible engines.
- User-defined metadata is limited to 2 KB, while annotations provide much richer context.
- S3 Annotations are billed at standard S3 rates, regardless of object storage tier.
- Community feedback has been overwhelmingly positive, highlighting the flexibility of annotations.
AWS recently announced Amazon S3 Annotations, a feature that lets teams attach rich, searchable context such as summaries, classifications, compliance data, or AI-generated insights directly to S3 objects. Annotations can be updated independently of the object and queried across datasets, reducing the need for separate metadata systems.
Written in JSON, XML, or YAML, annotations provide AI agents and analytics tools with the context they need to find and use objects in S3. S3 already supported tags and both system and user-defined metadata at the object level, but Daniel Abib, senior specialist solutions architect at AWS, explains:
"While these capabilities work well for their intended purposes, they have limitations when you need to attach much richer context without building and maintaining separate metadata systems. Annotations address these needs by providing metadata capabilities at a fundamentally different scale and flexibility, offering mutable, queryable context per object compared to 10 immutable tags or 2 KB of headers."
S3 Annotations significantly expand S3's metadata model by allowing up to 1000 mutable annotations per object, with a combined capacity of 1 GB, compared to 2 KB for user-defined metadata and just 10 tags per object. The extended flexibility makes S3 Annotations suitable for storing rich, structured business context rather than simple attributes or lifecycle metadata. In the article "Context intelligence for your data and AI agents at scale," Mai-Lan Tomsen Bukovec, technology VP at AWS, adds:
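The limits quoted above can be turned into a small pre-flight check. The following is a minimal sketch, not an AWS API: the function name `check_annotation_limits` and the constants are hypothetical, and it assumes that "1 GB" means 10**9 bytes, which the announcement as quoted here does not specify.

```python
# Limits as reported in the article. The byte value is an assumption:
# the article says "1 GB" without stating whether it means 10**9 or 2**30 bytes.
MAX_ANNOTATIONS_PER_OBJECT = 1000
MAX_TOTAL_ANNOTATION_BYTES = 10**9


def check_annotation_limits(
    annotations: dict[str, str],
    max_count: int = MAX_ANNOTATIONS_PER_OBJECT,
    max_total_bytes: int = MAX_TOTAL_ANNOTATION_BYTES,
) -> list[str]:
    """Return human-readable limit violations for one object's annotations.

    `annotations` maps a hypothetical annotation name to its serialized
    body (for example a JSON, XML, or YAML string). An empty list means
    the set fits within both limits.
    """
    problems = []

    if len(annotations) > max_count:
        problems.append(
            f"{len(annotations)} annotations exceeds the limit of {max_count}"
        )

    # Size is measured on the UTF-8 encoding of each body.
    total_bytes = sum(len(body.encode("utf-8")) for body in annotations.values())
    if total_bytes > max_total_bytes:
        problems.append(
            f"{total_bytes} bytes exceeds the combined limit of {max_total_bytes}"
        )

    return problems
```

Calling `check_annotation_limits({"summary": '{"text": "Q3 report"}'})` returns an empty list, while a set of 1001 annotations returns one violation. How the service itself measures annotation size is not described in the article, so the byte count here is only an approximation.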
"Annotations become queryable through S3 Metadata. When you enable annotation tables on a bucket, every annotation flows automatically into a fully managed Iceberg table. You can query across all your objects with Amazon Athena, Amazon Redshift or any Iceberg-compatible engine, and agents can discover annotations in natural language through the S3 Tables MCP server."
The ability to edit object metadata in place has been a long-standing request from the community. The reaction on Reddit has been overwhelmingly positive, with user ReturnOfNogginboink writing:
"The really important part here is that annotations can be modified. Unlike object metadata, which requires you to read the full object out of s3, and rewrite it to S3 with new metadata. This is the big deal here. This is going to unlock all kinds of new workflow possibilities."
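For contrast, changing user-defined metadata on an existing object means rewriting the object, commonly by copying it onto itself. The sketch below uses the existing `copy_object` call from boto3 with `MetadataDirective="REPLACE"`; the wrapper name `replace_user_metadata` is hypothetical, and the client is passed in so the function has no hidden dependencies. It says nothing about the new annotations API, which is not shown here.

```python
def replace_user_metadata(s3_client, bucket: str, key: str, new_metadata: dict):
    """Rewrite an object in place so that it carries new user-defined metadata.

    `s3_client` is expected to be a boto3 S3 client, for example
    boto3.client("s3"). The copy runs server-side, but S3 still writes
    a new copy of the object rather than editing metadata in place.
    """
    return s3_client.copy_object(
        Bucket=bucket,
        Key=key,
        # Source and destination are the same object.
        CopySource={"Bucket": bucket, "Key": key},
        # REPLACE discards the existing user-defined metadata and
        # stores only what is passed in Metadata.
        Metadata=new_metadata,
        MetadataDirective="REPLACE",
    )
```

A single `copy_object` call is limited to objects of up to 5 GB; larger objects need a multipart copy. Because the directive replaces metadata wholesale, any existing keys that should survive have to be included in `new_metadata`.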
S3 gets an object store in the object store, and now you can bolt a full gigabyte of "context" onto each object, because the four existing metadata mechanisms weren't confusing enough. Every announcement these days whispers from behind you "agentic workflows," which is Seattle for "your AI will generate the data, then pay Athena to read it back." Vertical integration is great, but now it's on your own bill.
S3 Annotations are stored and billed at S3 Standard rates regardless of the underlying object's storage tier. Annotations are also replicated when objects are copied, with each annotation copy counted and billed as a separate PUT request. S3 Annotations are generally available in all regions.
About the Author
Renato Losio
— Originally published at infoq.com

