AWS and Vexcel Launch Searchable Multimodal Aerial Imagery

• AWS and Vexcel launched an AI-queryable aerial imagery system using Amazon Bedrock and OpenSearch Serverless.
• The system uses multimodal embeddings to search across seven distinct geospatial views without per-feature training.
• Amazon Nova Multimodal Embeddings outperformed other models in F1 scores for Chicago-based swimming pool and road benchmarks.

The AWS Generative AI Innovation Center and Vexcel have developed a multimodal search architecture that transforms high-resolution aerial imagery into a searchable knowledge base using natural language. Vexcel operates a global aerial imagery program, collecting data across 45+ countries through orthomosaic, oblique, and elevation-based perspectives. The project aims to eliminate the need for manual, tile-by-tile inspection or the training of bespoke computer vision models for specific features. The resulting system, Vexcel Intelligence, uses Amazon Bedrock and Amazon OpenSearch Serverless to ingest, embed, and query geospatial data at scale.

Geospatial search presents unique challenges compared to standard consumer photography, as each map tile contains seven distinct views: an orthophoto, four oblique angles, and two elevation models (Digital Surface Model and Digital Terrain Model). An embedding model relying on a single view provides incomplete information, making fusion strategies critical. To address the lack of labeled aerial datasets, the team built an evaluation framework using OpenStreetMap ground truth via the Overpass API, enabling repeatable benchmarks across roughly 100 configurations.
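
To make the idea of a fusion strategy concrete, here is a minimal sketch of one simple option: a weighted mean of L2-normalized per-view embeddings. The article does not say which fusion strategies the team benchmarked, so the view labels, the `fuse_views` helper, and the weighting scheme are all assumptions for illustration, not the Vexcel Intelligence implementation.

```python
import math

# Hypothetical labels for the seven views per tile described in the text.
VIEWS = ["ortho", "oblique_n", "oblique_e", "oblique_s", "oblique_w", "dsm", "dtm"]


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def fuse_views(view_embeddings, weights=None):
    """Fuse per-view embeddings into one tile vector (assumed strategy).

    Each view vector is L2-normalized, combined as a weighted mean, and the
    result is normalized again so cosine similarity behaves consistently.
    """
    if not view_embeddings:
        raise ValueError("need at least one view embedding")
    weights = weights or {}
    dim = len(next(iter(view_embeddings.values())))
    fused = [0.0] * dim
    total_weight = 0.0
    for view, vec in view_embeddings.items():
        if len(vec) != dim:
            raise ValueError(f"view {view!r} has a different dimension")
        w = weights.get(view, 1.0)  # views default to equal weight
        for i, x in enumerate(l2_normalize(vec)):
            fused[i] += w * x
        total_weight += w
    if total_weight == 0.0:
        raise ValueError("weights sum to zero")
    return l2_normalize([x / total_weight for x in fused])
```

A caller could, for example, pass `weights={"ortho": 2.0}` to emphasize the orthophoto. Whether that helps is exactly the kind of question a repeatable benchmark across many configurations is meant to answer.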

The five-stage pipeline covers area-of-interest selection, automated imagery ingestion, model-based embedding and captioning, semantic search, and evaluation. During research, the team compared models including Amazon Nova Multimodal Embeddings, Amazon Titan Multimodal Embeddings G1, and Cohere Embed v4. Amazon Nova Multimodal Embeddings achieved the highest F1 scores across benchmark queries for swimming pools and road networks. The system also uses LLMs to generate unified textual descriptions across all seven views; these descriptions are indexed alongside the embeddings to improve retrieval accuracy. Extracted metadata supports filtered k-nearest neighbor (k-NN) search, in which keyword tags narrow the candidate set before vector similarity matching.
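
The tag-then-vector filtering described above can be sketched as an OpenSearch k-NN query body with an embedded filter clause. This is an illustration under stated assumptions: the field names `embedding` and `tags` and the helper `build_filtered_knn_query` are hypothetical, and the article does not describe the actual index mapping or query shape used in the system.

```python
def build_filtered_knn_query(query_vector, k, tags, vector_field="embedding",
                             tag_field="tags"):
    """Build an OpenSearch k-NN query body that filters on keyword tags.

    The filter sits inside the knn clause, so only documents carrying every
    requested tag are considered as nearest-neighbor candidates.
    Field names are placeholders, not taken from the source article.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    tag_filter = {
        "bool": {
            "must": [{"term": {tag_field: tag}} for tag in tags]
        }
    }
    return {
        "size": k,
        "query": {
            "knn": {
                vector_field: {
                    "vector": list(query_vector),
                    "k": k,
                    "filter": tag_filter,
                }
            }
        },
    }
```

The resulting dictionary would be passed as the request body of a search call, with the query vector produced by the same embedding model that was used at indexing time.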

System performance depends on careful calibration of the parameter K, the number of nearest-neighbor results retrieved per query. A large K for sparse features such as swimming pools can introduce noise and reduce precision, while a small K for abundant features such as roads limits recall. The evaluation framework supports both tile-based and entity-based metrics, which assess whether the system correctly identifies relevant locations and how many individual features it recovers. This modular architecture allows developers to swap embedding models and fusion strategies without changing the underlying codebase, which speeds up optimization and testing of geospatial search systems.
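
The effect of K on precision and recall can be illustrated with a small tile-based metric calculation. This is a minimal sketch using the standard definitions of precision, recall, and F1. The function name `tile_metrics_at_k` and the tile IDs are made up for illustration, and the article does not specify how the team's framework computes its scores.

```python
def tile_metrics_at_k(ranked_tile_ids, relevant_tile_ids, k):
    """Return (precision, recall, F1) for the top-k retrieved tiles.

    ranked_tile_ids: tile IDs ordered by similarity, best first.
    relevant_tile_ids: ground-truth tiles that contain the feature.
    """
    retrieved = set(ranked_tile_ids[:k])
    relevant = set(relevant_tile_ids)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: a sparse feature present in only two of six ranked tiles.
ranked = ["t1", "t2", "t3", "t4", "t5", "t6"]
relevant = {"t1", "t2"}
small_k = tile_metrics_at_k(ranked, relevant, k=2)
large_k = tile_metrics_at_k(ranked, relevant, k=6)
```

In this toy case, raising K from 2 to 6 leaves recall at 1 while precision falls to one third, which is the noise effect described for sparse features. For an abundant feature the situation reverses: a small K caps recall because most relevant tiles never enter the result set.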
