skillscore CLI Automates AI Agent Skill Validation

• Developer Sayed Ali Alkamel released skillscore, a CLI tool that grades AI agent SKILL.md files from 0–100.
• The tool enforces quality standards by checking files against official authoring guides from Anthropic, OpenAI, and Google.
• Designed for CI/CD, the offline-capable tool supports SARIF output to automatically flag issues in pull requests.

Developer Sayed Ali Alkamel has released skillscore, an open-source command-line interface (CLI) tool designed to lint and score AI agent SKILL.md files. The tool evaluates these files against official authoring guides from Anthropic, OpenAI, Google, and Flutter, providing a 0–100 quality score, a letter grade, and a list of actionable findings.

AI agent skills typically consist of a SKILL.md file containing YAML frontmatter and Markdown instructions. Because a skill's description remains in the agent's context window (the amount of information a model can process at once), poorly written skills can waste tokens or cause incorrect agent behavior. The skillscore tool addresses this by enforcing rules across seven categories, including frontmatter validity, description quality, conciseness, and instruction structure.
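To make the file layout concrete, here is a minimal Python sketch of the kind of check such a linter might run. It is not skillscore's implementation (the tool is written in Dart). The function names, the finding messages, the sample skill, and the 200-character description limit are all hypothetical. It also parses only flat `key: value` frontmatter, where a real linter would use a full YAML parser.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md string into (frontmatter fields, Markdown body).

    Simplification: only flat `key: value` lines are understood.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no opening delimiter: everything is body
    fields: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":  # closing delimiter found
            return fields, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}, text  # unterminated frontmatter: treat as absent


def check_skill(text: str, max_description_chars: int = 200) -> list[str]:
    """Return a list of findings; the limit is a hypothetical threshold."""
    fields, body = split_frontmatter(text)
    findings = []
    for required in ("name", "description"):
        if not fields.get(required):
            findings.append(f"frontmatter: missing '{required}'")
    if len(fields.get("description", "")) > max_description_chars:
        findings.append("description: longer than the configured limit")
    if not body.strip():
        findings.append("instructions: Markdown body is empty")
    return findings


# Hypothetical sample skill, invented for illustration only.
SAMPLE = """---
name: example-skill
description: Use when the user asks for an example.
---
# Steps
1. Do the thing.
"""

print(check_skill(SAMPLE))           # no findings for the sample
print(check_skill("# Only a body"))  # both required fields are missing
```

Because the checks are pure functions of the file text, the same input always yields the same findings, which is the property that makes this style of linting suitable for CI.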

The tool operates entirely offline and deterministically, meaning the same input consistently produces the same result. It is designed for CI/CD integration, allowing developers to set score thresholds like --min-score 80 to gate deployments. It supports output formats such as JSON and SARIF 2.1.0 (a standard for reporting static analysis results), which enables GitHub to annotate pull requests with identified issues.
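As an illustration of how a score threshold and SARIF output fit together in a pipeline, the Python sketch below builds a minimal SARIF 2.1.0 log and derives a process exit code from a minimum score. The SARIF field names follow the 2.1.0 standard. The shape of the input findings, the tool name, and the rule ID are assumptions, since skillscore's own report format is not described here.

```python
import json


def to_sarif(findings: list[dict], tool_name: str = "example-linter") -> dict:
    """Wrap findings in a minimal SARIF 2.1.0 log.

    Each finding is a hypothetical dict with keys:
    rule_id, level ("note" | "warning" | "error"), message, path, line.
    """
    return {
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": tool_name}},
            "results": [
                {
                    "ruleId": f["rule_id"],
                    "level": f["level"],
                    "message": {"text": f["message"]},
                    "locations": [{
                        "physicalLocation": {
                            "artifactLocation": {"uri": f["path"]},
                            "region": {"startLine": f["line"]},
                        }
                    }],
                }
                for f in findings
            ],
        }],
    }


def gate_exit_code(score: int, min_score: int) -> int:
    """Return 0 when the score meets the threshold, 1 otherwise."""
    return 0 if score >= min_score else 1


# Hypothetical finding, invented for illustration only.
findings = [{
    "rule_id": "example/missing-boundary",
    "level": "warning",
    "message": "Description does not say when the skill should not be used.",
    "path": "skills/example/SKILL.md",
    "line": 3,
}]

print(json.dumps(to_sarif(findings), indent=2))
print(gate_exit_code(score=90, min_score=80))  # threshold met
print(gate_exit_code(score=79, min_score=80))  # threshold missed
```

A CI job would upload the JSON as a code-scanning result and fail the build when the exit code is non-zero.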

Each of the seven categories carries its own weight. The safety category, for instance, can deduct up to 15 points when bundled scripts lack documentation. In a test run against the Flutter team's public widget-testing skill, the tool returned a score of 90/100. Although the skill scored well, the tool identified missing boundary clauses and anti-patterns and gave specific instructions for remediation. Users can also request an explanation of any finding to see the rationale and the source guide behind the rule.
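One plausible way to read the weighting is that each category has a cap on how many points it can remove from a starting score of 100. The Python sketch below implements that reading as an assumption, not as skillscore's actual formula. Only the 15-point safety cap comes from the description above. The other caps, the default cap, and the letter-grade bands are hypothetical.

```python
# Maximum points each category may deduct. Only "safety" reflects the
# figure quoted above; every other value here is a hypothetical placeholder.
CATEGORY_CAPS = {"safety": 15, "frontmatter": 20, "description": 20}
DEFAULT_CAP = 10  # hypothetical cap for categories not listed


def score(penalties: dict[str, list[int]]) -> int:
    """Start at 100 and subtract each category's penalties, capped per category."""
    deducted = 0
    for category, points in penalties.items():
        cap = CATEGORY_CAPS.get(category, DEFAULT_CAP)
        deducted += min(sum(points), cap)
    return max(0, 100 - deducted)


def letter_grade(value: int) -> str:
    """Map a 0-100 score to a letter using hypothetical 10-point bands."""
    for floor, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if value >= floor:
            return letter
    return "F"


# Two safety findings worth 10 points each are capped at 15 in total;
# the single description finding deducts its full 4 points.
example = {"safety": [10, 10], "description": [4]}
result = score(example)
print(result, letter_grade(result))  # 81 B
```

Capping per category keeps one noisy area from dominating the total, so a skill with many small issues in a single category still receives credit for the categories it handles well.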

The project is currently at v0.1.0 and is written in Dart. It works as both a standalone CLI tool and a library for embedding the scoring logic in other applications. Planned updates include additional vendor-specific targets, an autofix mode for common mechanical errors, and a pre-packaged GitHub Action.
