AWS Guides Intelligent Document Processing with Bedrock
- AWS released a guide for building intelligent document processing pipelines using Amazon Bedrock Data Automation.
- The architecture automates extraction from files of up to 3,000 pages, covering text, tables, and visual elements.
- Specialized agents and RAG enable semantic search and cross-document validation for complex financial or legal records.
AWS has published a technical guide for building intelligent document processing pipelines with Amazon Bedrock and its Data Automation (BDA) service. The architecture targets complex, multimodal documents such as insurance claims, invoices, and medical records, which traditional optical character recognition (OCR) systems often misinterpret because they extract characters without understanding context or the relationships between fields. BDA is the primary engine of the workflow: a unified API that handles diverse file formats, including PDFs and scanned documents, and supports files of up to 3,000 pages and 500 MB per API request. The service automates document classification, logical splitting, and content extraction, reducing manual intervention and error-prone sorting.
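Because the 3,000-page and 500 MB limits apply per API request, a pipeline might check incoming files before submitting them. The sketch below is a minimal, hypothetical pre-flight check: `DocumentInfo`, `preflight`, and the limit constants are illustrative names, not part of any AWS SDK, and the code assumes the page count and file size are already known (for example, from upstream metadata). Reading "500 MB" as 500 * 1024 * 1024 bytes is also an assumption.

```python
from dataclasses import dataclass

# Per-request limits quoted in the guide. Interpreting "MB" as 1024 * 1024
# bytes is an assumption; confirm against current BDA quotas.
MAX_PAGES = 3_000
MAX_BYTES = 500 * 1024 * 1024


@dataclass
class DocumentInfo:
    """Hypothetical metadata gathered before submission."""
    key: str
    page_count: int
    size_bytes: int


def preflight(doc: DocumentInfo) -> list[str]:
    """Return reasons the document exceeds the per-request limits; empty if it fits."""
    problems = []
    if doc.page_count > MAX_PAGES:
        problems.append(f"{doc.key}: {doc.page_count} pages exceeds {MAX_PAGES}")
    if doc.size_bytes > MAX_BYTES:
        problems.append(f"{doc.key}: {doc.size_bytes} bytes exceeds {MAX_BYTES}")
    return problems


if __name__ == "__main__":
    ok = DocumentInfo("claims/claim-001.pdf", page_count=42, size_bytes=3_000_000)
    too_long = DocumentInfo("records/archive.pdf", page_count=3_200, size_bytes=90_000_000)
    print(preflight(ok))        # []
    print(preflight(too_long))  # one problem: the page count
```

Rejecting oversize files at this stage lets the workflow report a clear reason up front instead of submitting a request that exceeds the stated limits.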
The processing architecture has four integrated layers: input processing, extraction and storage, intelligence, and agentic coordination. In the input layer, Amazon S3 serves as the landing zone and triggers AWS Step Functions to orchestrate the workflow. BDA splits each document and matches its sections to preconfigured blueprints, which define the extraction logic for specific document types. A single project can hold up to 40 document blueprints, so one pipeline can apply type-specific extraction logic across many document categories. The extraction layer uses BDA to produce structured JSON output that includes text in reading order, recognized table structure, and visual analysis of charts, graphs, and diagrams. Each visual element is described by a generated caption and bounding-box coordinates, which let downstream systems reference specific visual data points.
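To illustrate how downstream code might consume reading-order text, tables, and figure captions with bounding boxes, here is a minimal Python sketch. The JSON layout in `SAMPLE_OUTPUT` is a hypothetical simplification invented for this example; BDA's actual output schema uses its own field names and nesting, so treat the parsing functions as a pattern rather than working integration code.

```python
import json
from dataclasses import dataclass

# HYPOTHETICAL, simplified output shape for illustration only; the real BDA
# JSON schema differs and should be taken from the service documentation.
SAMPLE_OUTPUT = """
{
  "document_class": "property_report",
  "elements": [
    {"type": "TEXT", "reading_order": 2, "text": "Occupancy held steady at 94%."},
    {"type": "TABLE", "reading_order": 1, "rows": [["Unit", "Rent"], ["101", "2400"]]},
    {"type": "TEXT", "reading_order": 0, "text": "Q3 property summary"},
    {"type": "FIGURE", "reading_order": 3,
     "caption": "Projected NOI, 2025-2029",
     "bbox": {"page": 4, "left": 0.10, "top": 0.20, "width": 0.50, "height": 0.30}}
  ]
}
"""


@dataclass
class FigureRef:
    page: int
    caption: str
    box: tuple[float, float, float, float]  # (left, top, width, height), page-relative


def text_in_reading_order(doc: dict) -> list[str]:
    """Return TEXT element contents sorted by their reading_order field."""
    texts = [e for e in doc["elements"] if e["type"] == "TEXT"]
    return [e["text"] for e in sorted(texts, key=lambda e: e["reading_order"])]


def figure_refs(doc: dict) -> list[FigureRef]:
    """Collect figure captions with bounding boxes so answers can cite them."""
    refs = []
    for e in doc["elements"]:
        if e["type"] != "FIGURE":
            continue
        b = e["bbox"]
        refs.append(FigureRef(b["page"], e["caption"],
                              (b["left"], b["top"], b["width"], b["height"])))
    return refs


def tables_as_dicts(doc: dict) -> list[list[dict]]:
    """Use each TABLE's first row as headers and map the remaining rows to dicts."""
    out = []
    for e in doc["elements"]:
        if e["type"] == "TABLE":
            header, *body = e["rows"]
            out.append([dict(zip(header, row)) for row in body])
    return out


if __name__ == "__main__":
    doc = json.loads(SAMPLE_OUTPUT)
    print(text_in_reading_order(doc))  # ['Q3 property summary', 'Occupancy held steady at 94%.']
    print(figure_refs(doc))
    print(tables_as_dicts(doc))
```

Keeping the bounding box next to each caption means an answer derived from a chart can point back to the page region it came from.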
The intelligence layer integrates Amazon Bedrock Knowledge Bases with Amazon OpenSearch Serverless to enable semantic search and Retrieval Augmented Generation (RAG), a technique that retrieves external data to ground LLM responses. This lets organizations run complex queries that span multiple documents. Finally, the agentic coordination layer uses Strands Agents hosted on Amazon Bedrock AgentCore Runtime to manage specialized processing tasks. In a commercial real estate scenario, for instance, specialized agents can extract Net Operating Income (NOI) projections and capitalization rate trends from embedded financial charts, and coordinator agents then validate the extracted data against real-time information from external APIs. This end-to-end, event-driven approach lets investment firms and similar organizations move from raw, unstructured documents to actionable insights through natural language queries, shortening analysis workflows.
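The coordinator's cross-check can be sketched in plain Python. Everything here is hypothetical: `Extracted`, `fetch_market_cap_rate`, and `validate` are illustrative names, the external API is replaced by a stub that returns a fixed value, and no Strands Agents or AgentCore APIs are used. The only domain rule assumed is the standard direct-capitalization relation, implied value = NOI / cap rate.

```python
from dataclasses import dataclass


@dataclass
class Extracted:
    """Figures a specialized extraction agent might return (hypothetical shape)."""
    property_id: str
    noi: float       # projected annual Net Operating Income, in dollars
    cap_rate: float  # capitalization rate as a fraction, e.g. 0.065 for 6.5%


def fetch_market_cap_rate(property_id: str) -> float:
    """Stub standing in for a real-time external API; always returns 6.0%."""
    return 0.060


def validate(ex: Extracted, tolerance: float = 0.005) -> dict:
    """Compare the extracted cap rate with the external reference and derive
    the implied property value (NOI / cap rate) from each source."""
    if ex.cap_rate <= 0:
        raise ValueError("cap rate must be positive")
    market = fetch_market_cap_rate(ex.property_id)
    gap = abs(ex.cap_rate - market)
    return {
        "property_id": ex.property_id,
        "implied_value_from_document": ex.noi / ex.cap_rate,
        "implied_value_from_market": ex.noi / market,
        "cap_rate_gap": gap,
        "flagged": gap > tolerance,  # True when the sources disagree beyond tolerance
    }


if __name__ == "__main__":
    report = validate(Extracted("bldg-17", noi=1_200_000, cap_rate=0.07))
    print(report["flagged"], round(report["implied_value_from_market"]))  # True 20000000
```

The 0.005 tolerance (50 basis points) is an arbitrary example value; a flagged result is one the coordinator could hold back for review rather than pass through as an insight.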