Standardizing Structured JSON Outputs Across Seven LLM APIs
- CommitBrief standardizes structured JSON findings from seven different LLM API providers using a unified Go-based schema.
- API providers use varying enforcement methods, including forced tool calls, strict JSON schemas, and basic prompt instructions.
- The system implements a retry-once-then-degrade strategy, falling back to Markdown rendering if structured output validation fails twice.
CommitBrief automates code review by normalizing LLM output into one JSON schema, so the rest of the system can handle findings the same way regardless of which model provider produced them. Each finding must fit a Finding struct with fields such as severity, file path, line number, title, and description. The seven API providers enforce structure to different degrees, so the system uses a tiered approach: Anthropic, OpenAI, and Gemini use native mechanisms such as forced tool calls or strict JSON schemas; Ollama guarantees only syntactically valid JSON; and DeepSeek, Mistral, and Cohere rely entirely on prompt instructions to follow the schema.
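A minimal sketch of what the shared schema and the provider tiers could look like in Go. The JSON key names, the Enforcement type, the lowercase provider identifiers, and the EnforcementFor and EncodeFinding helpers are all assumptions made for illustration; only the field list and the grouping of providers come from the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Finding mirrors the fields named in the text. The JSON key names are
// assumptions chosen for illustration.
type Finding struct {
	Severity    string `json:"severity"`
	File        string `json:"file"`
	Line        int    `json:"line"`
	Title       string `json:"title"`
	Description string `json:"description"`
}

// Enforcement describes how strongly a provider can constrain its output.
// This type is hypothetical.
type Enforcement int

const (
	PromptOnly   Enforcement = iota // schema is described in the prompt only
	JSONSyntax                      // valid JSON is guaranteed, its shape is not
	NativeSchema                    // forced tool call or strict JSON schema
)

// enforcementByProvider records the tiers described above. The lowercase
// keys are hypothetical identifiers.
var enforcementByProvider = map[string]Enforcement{
	"anthropic": NativeSchema,
	"openai":    NativeSchema,
	"gemini":    NativeSchema,
	"ollama":    JSONSyntax,
	"deepseek":  PromptOnly,
	"mistral":   PromptOnly,
	"cohere":    PromptOnly,
}

// EnforcementFor returns the tier for a provider. Unknown providers default
// to PromptOnly, which is a conservative assumption.
func EnforcementFor(provider string) Enforcement {
	if e, ok := enforcementByProvider[provider]; ok {
		return e
	}
	return PromptOnly
}

// EncodeFinding serializes a Finding to its JSON wire form.
func EncodeFinding(f Finding) (string, error) {
	b, err := json.Marshal(f)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := EncodeFinding(Finding{
		Severity: "high", File: "main.go", Line: 12,
		Title: "Unchecked error", Description: "The error from Close is ignored.",
	})
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(out)
	fmt.Println(EnforcementFor("ollama") == JSONSyntax)
}
```

Whatever the tier, the output still has to be validated after the fact, since only the native mechanisms constrain the shape of the response and even those say nothing about its content.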
To maintain consistency, every model response passes through a centralized ParseFindings function. This validator checks for valid JSON syntax and also verifies that the output satisfies the constraints of the Finding struct: required fields must be present, and severity must be one of five predefined levels (critical, high, medium, low, or info). This validation layer keeps findings machine-readable and actionable for downstream tools such as card renderers and exit-code gates, whether a high-end commercial model or a local model generated the review.
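The two-stage check described above (syntax first, then field-level rules) might be sketched as follows. The signature of ParseFindings, the `{"findings": [...]}` envelope, the JSON key names, and the exact set of required fields are assumptions; the source only states that required fields and the five severity levels are checked.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Finding uses assumed JSON key names.
type Finding struct {
	Severity    string `json:"severity"`
	File        string `json:"file"`
	Line        int    `json:"line"`
	Title       string `json:"title"`
	Description string `json:"description"`
}

// validSeverities holds the five severity levels listed in the text.
var validSeverities = map[string]bool{
	"critical": true, "high": true, "medium": true, "low": true, "info": true,
}

// envelope is an assumed wrapper shape: {"findings": [...]}. The pointer
// distinguishes a missing key from an empty list.
type envelope struct {
	Findings *[]Finding `json:"findings"`
}

// ParseFindings rejects input that is not valid JSON, lacks the findings
// key, or contains a finding that breaks the field rules.
func ParseFindings(raw []byte) ([]Finding, error) {
	var env envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return nil, fmt.Errorf("invalid JSON: %w", err)
	}
	if env.Findings == nil {
		return nil, fmt.Errorf("missing %q key", "findings")
	}
	for i, f := range *env.Findings {
		if !validSeverities[f.Severity] {
			return nil, fmt.Errorf("finding %d: unknown severity %q", i, f.Severity)
		}
		// Which fields count as required is an assumption.
		if f.File == "" || f.Title == "" || f.Description == "" {
			return nil, fmt.Errorf("finding %d: missing required field", i)
		}
		if f.Line < 0 {
			return nil, fmt.Errorf("finding %d: negative line number %d", i, f.Line)
		}
	}
	return *env.Findings, nil
}

func main() {
	raw := []byte(`{"findings":[{"severity":"urgent","file":"a.go","line":3,"title":"x","description":"y"}]}`)
	if _, err := ParseFindings(raw); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

In this sketch an empty findings list is accepted as a clean review, while syntactically valid JSON with an unrecognized severity is rejected. That second case is the one a syntax-only guarantee cannot catch.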
The system handles malformed or unparseable output with a retry-once-then-degrade strategy. If the initial response is malformed, CommitBrief automatically retries the request; if the second attempt also fails, it falls back to rendering the raw model output as Markdown and issues a single warning. This keeps the pipeline from crashing on bad output, and token usage is summed across both attempts so that cost tracking stays accurate. Structured output mechanisms improve parseability but do not guarantee accuracy, so the system also includes explicit instructions that discourage models from hallucinating file paths or line numbers, and it maintains an evaluation harness that measures model precision against known test datasets.
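The retry-once-then-degrade flow can be sketched as a small loop. Every name here (Review, Response, Usage, Result, parseJSON) is hypothetical, and two behaviors are assumptions rather than facts from the source: transport errors are returned immediately instead of retried, and the Markdown fallback uses the text of the second attempt.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Finding uses assumed JSON key names.
type Finding struct {
	Severity    string `json:"severity"`
	File        string `json:"file"`
	Line        int    `json:"line"`
	Title       string `json:"title"`
	Description string `json:"description"`
}

// Usage, Response, and Result are hypothetical types for this sketch.
type Usage struct{ InputTokens, OutputTokens int }

type Response struct {
	Text  string
	Usage Usage
}

type Result struct {
	Findings []Finding
	Markdown string // raw model text, set only when Degraded is true
	Degraded bool
	Usage    Usage // summed across every attempt made
}

// Review calls the model at most twice. If parsing fails both times it
// warns once and returns the raw text for Markdown rendering.
func Review(call func() (Response, error), parse func(string) ([]Finding, error), warn func(string)) (Result, error) {
	var res Result
	var lastText string
	for attempt := 1; attempt <= 2; attempt++ {
		resp, err := call()
		if err != nil {
			// Assumption: transport errors are not retried here.
			return res, fmt.Errorf("attempt %d: %w", attempt, err)
		}
		res.Usage.InputTokens += resp.Usage.InputTokens
		res.Usage.OutputTokens += resp.Usage.OutputTokens
		findings, perr := parse(resp.Text)
		if perr == nil {
			res.Findings = findings
			return res, nil
		}
		lastText = resp.Text
	}
	warn("structured output failed validation twice; rendering raw output as Markdown")
	res.Markdown = lastText
	res.Degraded = true
	return res, nil
}

// parseJSON is a stand-in for the real validator; it checks syntax only.
func parseJSON(text string) ([]Finding, error) {
	var fs []Finding
	if err := json.Unmarshal([]byte(text), &fs); err != nil {
		return nil, err
	}
	return fs, nil
}

func main() {
	// A fake model that answers badly once, then correctly.
	replies := []string{"not json", `[{"severity":"low","file":"a.go","line":1,"title":"t","description":"d"}]`}
	i := 0
	call := func() (Response, error) {
		r := Response{Text: replies[i], Usage: Usage{InputTokens: 100, OutputTokens: 20}}
		i++
		return r, nil
	}
	res, err := Review(call, parseJSON, func(msg string) { fmt.Println("warning:", msg) })
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(res.Findings), res.Degraded, res.Usage.InputTokens, res.Usage.OutputTokens)
}
```

Accumulating usage before parsing means a failed attempt is still counted, which is the property cost tracking depends on.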