US States Struggle to Measure AI Value
- Most US states now operate active AI pilot programs
- Fragmented adoption hinders enterprise-wide AI scaling across agencies
- Only seven states have established metrics to prove public value
Across the United States, a quiet but widespread transformation is taking place within state bureaucracies. A new landscape assessment from Code for America reveals that nearly every state has moved past the concept phase, launching at least one pilot program to test the waters of artificial intelligence. Yet, despite this high adoption rate, the actual impact of these initiatives remains shrouded in ambiguity. The transition from testing a prototype to realizing measurable improvements in public service is proving to be a significant hurdle for government agencies nationwide.
The core issue highlighted by the report is a profound lack of evaluation mechanisms. While governments are eager to leverage tools like generative AI for operational efficiencies, many are struggling to define what 'success' looks like in a public-sector context. Currently, only seven states are classified as 'established' in their ability to measure the outcomes of their AI deployments. Without these metrics, the shift from experimental sandboxes to enterprise-scale workflows remains fragmented and, in some cases, precarious.
The path to maturity is clearly delineated by the researchers, moving from readiness and piloting to full-scale implementation and impact. Readiness, the report notes, requires more than just access to software; it demands foundational data infrastructure and robust governance frameworks. Many states are currently stuck in the early or developing stages, often because they lack the high-quality data necessary to make AI effective. By contrast, the states currently pulling ahead—such as Maryland, Texas, and Utah—share a common playbook. They have prioritized strong executive leadership, cross-agency data governance, and secure sandbox environments where experimentation can occur without risking public services.
For university students observing this shift, the takeaway is clear: the bottleneck for AI in government is rarely just the technology itself. It is a challenge of systems design and bureaucratic alignment. As agencies continue to deploy these tools, the focus must shift from rapid experimentation to the boring, essential work of data hygiene and outcome measurement. Only when governments can definitively prove that an AI agent or language model is actually saving time or improving lives will these tools move from novelty to necessity.
As this sector matures, we are likely to see a tightening of policy and perhaps more standardized frameworks for public-sector AI. The era of 'AI in everything' is effectively here, but the era of 'AI that provides provable public value' is still very much in its infancy. For those interested in the intersection of public policy and technology, this represents one of the most critical frontiers in the field today.