Legal AI Performance Dependent on Scaffold Engineering
- A Legal Nodes study shows that model performance depends heavily on the surrounding AI scaffold architecture.
- Claude Opus 4.8, evaluated across three environments, performs significantly differently depending on workflow integration.
- MikeOSS delivers cost savings of roughly 60% per task relative to Cowork and 90% relative to Claude.
A study by the legal consultancy Legal Nodes, run on the Claude Opus 4.8 model, found that legal AI performance depends heavily on the 'scaffold' (the software wrapper surrounding the base model) rather than on the model's inherent capabilities alone. Legal AI expert Nestor Dubnevych stated that the quality of legal output relies on context, workflow logic, prompt refinement, planning, agentic loops, retrieval, and tool calling. The study tested Claude Opus 4.8 in three distinct environments: Claude Chat, Cowork with the Legal Plugin, and MikeOSS. Researchers compiled a dataset of 40 tasks covering data protection and digital operational resilience to compare how the same model performed within each architecture.
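The comparison described above, one fixed model run through several scaffolds on a shared task set, can be sketched roughly as follows. This is a minimal illustration, not the Legal Nodes methodology: `Task`, `evaluate_scaffolds`, `exact_match`, and the toy scaffolds are all hypothetical names, and the exact-match grader is an assumption, since the article does not say how outputs were scored.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Task:
    task_id: str
    prompt: str
    reference: str  # expected answer the grader compares against


# Hypothetical types: a scaffold maps a prompt to an answer while
# calling the same base model underneath; a grader scores one answer.
Scaffold = Callable[[str], str]
Grader = Callable[[str, str], float]


def evaluate_scaffolds(
    tasks: List[Task], scaffolds: Dict[str, Scaffold], grade: Grader
) -> Dict[str, float]:
    """Return the mean score of each scaffold over one shared task set."""
    return {
        name: mean(grade(run(task.prompt), task.reference) for task in tasks)
        for name, run in scaffolds.items()
    }


def exact_match(answer: str, reference: str) -> float:
    """Assumed grader: 1.0 for an exact match, 0.0 otherwise."""
    return 1.0 if answer.strip() == reference.strip() else 0.0


# Toy stand-ins only; these are not the study's tasks or environments.
toy_tasks = [Task("t1", "q1", "a1"), Task("t2", "q2", "a2")]
lookup = {"q1": "a1", "q2": "a2"}
toy_scaffolds: Dict[str, Scaffold] = {
    "bare_chat": lambda prompt: "a1",               # ignores the prompt
    "with_retrieval": lambda prompt: lookup[prompt],  # uses extra context
}

scores = evaluate_scaffolds(toy_tasks, toy_scaffolds, exact_match)
```

Because the task set and the grader are held constant, any gap between the per-scaffold means can be attributed to the scaffold rather than to the model, which is the point the study makes.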
The evaluation aimed to challenge the industry trend of focusing solely on base model benchmarking. According to Legal Nodes, previous leaderboards often showed low scores, leaving it unclear how much performance variance resulted from model quality versus scaffold engineering. By testing the same model in different settings, the researchers confirmed that model-only evaluations provide an incomplete picture of AI utility in legal workflows. For legal teams, refining the structure of company-specific context layers may offer a faster path to performance improvements than waiting for industry-wide fine-tuning results.
Will Chen, the creator of MikeOSS, noted that his platform achieved satisfactory performance in the benchmark. Although its results were slightly lower than those in the Claude and Cowork environments, MikeOSS delivered significant cost savings: approximately 60% per task relative to Cowork and 90% relative to Claude. Chen plans to enhance the platform for privacy and compliance advisory tasks by integrating additional specialized skills. These findings suggest that as token costs rise, the efficiency of scaffold engineering will increasingly influence the legal industry's software selection.
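For readers who want to reproduce this kind of per-task comparison on their own workloads, a relative saving is usually computed as one minus the ratio of costs. The sketch below assumes that definition; the article reports only the percentages, so the function name and the per-task costs are placeholders and not figures from the study.

```python
def relative_saving(cost: float, baseline_cost: float) -> float:
    """Fraction saved per task relative to a baseline: 1 - cost / baseline_cost."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return 1.0 - cost / baseline_cost


# Placeholder per-task costs in arbitrary units, NOT taken from the study.
baseline_cost_per_task = 1.00
candidate_cost_per_task = 0.40

# About 0.6, i.e. a 60% saving relative to the baseline.
saving = relative_saving(candidate_cost_per_task, baseline_cost_per_task)
```

Under this definition, a 90% saving means the cheaper scaffold costs one tenth as much per task as the baseline.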