AWS Combines Nova 2 Lite and Claude for Document Processing
- •AWS deploys a two-model pipeline using Nova 2 Lite and Claude Sonnet 4.6 for yearbook digitization.
- •The system achieved 93.3 percent confidence on 3,122 associations while costing two-thirds less than single-model alternatives.
- •Fixed per-image pricing and adaptive thinking provide predictable, scalable costs for processing hundreds of thousands of pages.
Amazon Web Services (AWS) has introduced a cost-optimized, two-model pipeline on Amazon Bedrock designed to digitize scanned yearbook pages by pairing Amazon Nova 2 Lite with Anthropic’s Claude Sonnet 4.6. This architecture separates tasks to improve efficiency: Nova 2 Lite performs native multimodal extraction in a single API call, identifying photo locations, visible names, and page metadata. Claude Sonnet 4.6 then utilizes its adaptive thinking capabilities to conduct spatial reasoning, mapping names to faces based on the page layout. This split approach significantly lowers expenses by avoiding redundant processing.
Testing across 336 scanned yearbook pages demonstrated high reliability. The pipeline generated 3,122 name-to-face associations, with 93.3 percent achieving confidence scores of 0.95 or higher, and only 0.3 percent falling below 0.90. The process costs approximately $0.033 per page, roughly two-thirds cheaper than relying on a single vision-language model. Nova 2 Lite’s fixed per-image pricing—independent of resolution or file size—adds predictability for high-volume tasks. Claude's adaptive thinking further optimizes the process by automatically adjusting internal reasoning depth according to the complexity of the page layout, whether it involves simple portrait grids or more intricate group photos.
Developers can implement this solution using the provided AWS Samples repository on GitHub, which includes Jupyter notebooks and source code. The workflow requires Python 3.10 or later and the boto3 SDK. Beyond the base pipeline, further cost reductions are possible. Batch inference on Amazon Bedrock can offer 50 percent discounts for overnight workloads, while prompt caching can decrease cached prompt token costs by up to 90 percent. Additionally, developers can manage reasoning expenses by setting a budgetTokens cap on Claude. By separating detection and reasoning stages, the pipeline provides a modular architecture that allows independent tuning or upgrading of individual components as more advanced models become available.