Benchmarking AI Language Accessibility in Aphasia Therapy
- Researchers developed an ABCD simulation framework to test LLM accessibility in aphasia therapy clinical dialogues.
- The study benchmarked Claude, GPT, and Gemini models using 16 standardized readability metrics across multiple configurations.
- Findings showed Gemini excelled under zero-shot conditions, while few-shot prompting and advanced reasoning generally made clinician language more accessible.
Researchers Gerald C. Imaezue, K. V. Maram, David Ajayi, and others published a study in the Journal of Speech, Language, and Hearing Research on June 24, 2026, evaluating large language models (LLMs) for aphasia therapy. The team used the Agent-Based Conversational Dialogue (ABCD) simulation method, in which LLM-driven clinicians interact with artificial intelligence (AI)-simulated patients who exhibit speech impairments. This preclinical testbed supports the evaluation of multi-turn, spoken therapeutic dialogue without requiring human subjects.

The study benchmarked three model families (Claude, GPT, and Gemini) across varying configurations: zero-shot and few-shot prompting, each combined with standard and advanced reasoning modes. Effectiveness was measured with 16 standardized readability metrics, including the Flesch Reading Ease and Dale-Chall scores, to assess how well each model generates accessible clinician language during Response Elaboration Training.

Results indicated distinct accessibility signatures across architectures. Few-shot prompting and advanced reasoning generally improved response accessibility, while Gemini performed best under zero-shot and standard reasoning conditions. The authors concluded that LLMs differ systematically in how they adapt language to impaired speech, and that the ABCD framework provides a scalable method for benchmarking conversational agents before clinical translation in communication rehabilitation.
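
To illustrate what a readability metric of this kind measures, here is a minimal Python sketch of the Flesch Reading Ease score named above, using the standard published formula (206.835 - 1.015 × words per sentence - 84.6 × syllables per word). This summary does not describe the study's own tooling, so the function names, the vowel-group syllable heuristic, and the sample utterances are all assumptions made for illustration. Dedicated readability libraries count syllables more carefully.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (an assumption, not the study's method):
    count groups of consecutive vowels and drop a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical clinician utterances, invented for this example.
simple = "Tell me more. Who is in the picture?"
dense = (
    "Please elaborate comprehensively regarding the individuals "
    "represented in the photograph."
)

for label, utterance in [("simple", simple), ("dense", dense)]:
    print(label, round(flesch_reading_ease(utterance), 1))
```

Under this heuristic, the short, plainly worded prompt scores higher (easier to read) than the dense one. A benchmark like the one described would apply such scores to clinician turns across each model and prompting configuration.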