AWS Introduces Multimodal Retrieval for Manufacturing Documentation
- AWS released a manufacturing retrieval system using Amazon Nova Multimodal Embeddings and Amazon S3 Vectors
- The system maps text, images, and documents into a shared 1024-dimension vector space
- Researchers evaluated the pipeline against OCR-based methods using 26 queries and an LLM-as-a-judge framework
AWS released a technical guide on May 11, 2026, detailing a multimodal retrieval system designed for aerospace and heavy manufacturing documents. The system leverages Amazon Nova Multimodal Embeddings to index technical files, including CAD drawings, inspection photographs, and thermal plots, which traditional text-only OCR pipelines often misinterpret or strip of necessary spatial context.
The solution maps text, images, and document pages into a shared 1024-dimension vector space, enabling direct cosine similarity calculations between different data types. The developers tested the system on a dataset comprising 15 standalone technical images and five multi-page PDFs, evaluating performance across 26 specific manufacturing queries. The model supports a DOCUMENT_IMAGE processing mode specifically for pages containing mixed content like charts and tables.
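Because all modalities land in the same 1024-dimension space, comparing a text query to an image is just a cosine similarity between two vectors. The sketch below illustrates that calculation with random toy vectors; in the actual pipeline, the embeddings would come from Amazon Nova Multimodal Embeddings via the Bedrock API (not shown here).

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for 1024-dimension embeddings; in practice these would be
# produced by Amazon Nova Multimodal Embeddings for a text query and an image.
rng = np.random.default_rng(0)
text_query_vec = rng.standard_normal(1024)
image_vec = rng.standard_normal(1024)

score = cosine_similarity(text_query_vec, image_vec)
# Cosine similarity is always in [-1, 1]; higher means more semantically similar.
assert -1.0 <= score <= 1.0
```

In a shared embedding space, ranking images, document pages, and text passages against one query reduces to sorting these scores, which is what a vector store such as Amazon S3 Vectors does at scale.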
The project compared this multimodal pipeline against an OCR-based baseline to measure retrieval and generation quality. For each query, the system retrieved the top five results. Amazon Nova 2 Lite generated answers based on the retrieved context, while Anthropic Claude Sonnet 4.5 served as an LLM-as-a-judge to score the accuracy of generated answers on a scale of 1-5 against ground truth. Retrieval metrics included Recall@K, Mean Reciprocal Rank (MRR), and NDCG@K.
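The retrieval metrics named above have standard definitions. The following sketch implements binary-relevance versions of Recall@K, MRR, and NDCG@K for a single query's top-5 results; the document IDs are hypothetical, and a full evaluation would average these values over all 26 queries.

```python
import math

def recall_at_k(retrieved: list, relevant: set, k: int) -> float:
    """Fraction of relevant documents that appear in the top-k results."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)

def mrr(retrieved: list, relevant: set) -> float:
    """Reciprocal rank of the first relevant result (0 if none retrieved)."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(retrieved: list, relevant: set, k: int) -> float:
    """Binary-relevance NDCG: discounted gain of the ranking over the ideal."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(retrieved[:k], start=1)
              if doc in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0

# Hypothetical top-5 retrieval for one query, with two ground-truth hits.
retrieved = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d2", "d5"}
print(recall_at_k(retrieved, relevant, 5))  # 1.0 (both hits in the top 5)
print(mrr(retrieved, relevant))             # 0.5 (first hit at rank 2)
print(round(ndcg_at_k(retrieved, relevant, 5), 3))
```

MRR rewards placing the first relevant result early, while NDCG@K additionally credits placing every relevant result high in the list, which is why evaluations typically report both.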