Agentic Workflows
The Flakiness CLI ships with skills — structured documentation packages that teach AI coding agents how to query and analyze your test data. Once installed, an agent can investigate flaky tests, find regressions, and build FQL queries on its own.
Supported Agents
| Agent | Skill directory |
|---|---|
| Claude Code | `.claude/skills/` |
| Codex | `.codex/skills/` |
| Cursor | `.cursor/skills/` |
Installing Skills
```shell
flakiness skills install --agent <claude|codex|cursor>
```

This installs the `flakiness-investigation` skill, which teaches the agent how to use the Flakiness CLI to query test data. The skill covers the `flakiness list tests` command and all of its options, FQL filter syntax, and common investigation recipes.
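If your team uses more than one of the supported agents, a small shell function can run the install for each of them. This is a minimal sketch: `install_flakiness_skills` is a hypothetical helper, not part of the Flakiness CLI, and it only wraps the `flakiness skills install --agent` command shown above.

```shell
# Hypothetical helper (not part of the Flakiness CLI): install the
# flakiness-investigation skill for the given agents, or for all three
# supported agents when called with no arguments.
install_flakiness_skills() {
  if [ "$#" -eq 0 ]; then
    set -- claude codex cursor
  fi

  local agent
  for agent in "$@"; do
    # Reject names that the --agent flag does not accept.
    case "$agent" in
      claude|codex|cursor) ;;
      *)
        echo "unsupported agent: $agent" >&2
        return 1
        ;;
    esac
    echo "Installing flakiness skill for $agent"
    # Stop at the first failed install instead of continuing silently.
    flakiness skills install --agent "$agent" || return 1
  done
}
```

Call it as `install_flakiness_skills` to install for all three agents, or pass a subset such as `install_flakiness_skills claude cursor`.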
Once installed, you can ask your agent things like:
- “Fix my PR’s failing tests” (uses `--pr` to fetch failures from a specific pull request)
- “Find the most flaky tests in our project”
- “Show me all regressions in the e2e/ directory”
- “Which tests have been failing with timeout errors?”
The agent will translate your request into the appropriate `flakiness list tests` command with the right FQL filters.
Restart your agent after installation to pick up new skills.
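Before restarting, you can confirm that the skill landed in the agent's skill directory from the Supported Agents table. The functions below are hypothetical helpers, not CLI features. They assume you run them from the directory where you ran the install (typically the project root) and that the installed entry's name starts with `flakiness`, which matches the `flakiness-investigation` skill name but is not guaranteed by this page.

```shell
# Hypothetical helper: print the skill directory for an agent,
# using the paths from the Supported Agents table.
flakiness_skill_dir() {
  case "$1" in
    claude) echo ".claude/skills" ;;
    codex)  echo ".codex/skills" ;;
    cursor) echo ".cursor/skills" ;;
    *)      return 1 ;;
  esac
}

# Hypothetical check: succeeds if the agent's skill directory contains an
# entry whose name starts with "flakiness" (ASSUMPTION about the layout).
# Returns 1 if nothing is found and 2 for an unknown agent name.
check_flakiness_skill() {
  local dir
  if ! dir="$(flakiness_skill_dir "$1")"; then
    echo "unknown agent: $1" >&2
    return 2
  fi
  if [ -d "$dir" ] && [ -n "$(find "$dir" -mindepth 1 -maxdepth 1 -name 'flakiness*')" ]; then
    echo "flakiness skill found in $dir; restart $1 to load it"
  else
    echo "no flakiness skill in $dir" >&2
    return 1
  fi
}
```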
Example: Fix PR Tests
The most common agentic workflow is fixing test failures in a pull request. With the skill installed, just ask your agent:

> Fix the failing tests in PR #42 in myorg/myproject

The agent will:

- Run `flakiness list tests --project myorg/myproject --pr 42 --fql 's:regressed'` to find tests that the PR broke (tests that pass on the target branch but fail in the PR)
- Read the reported file paths and error messages
- Make targeted code fixes
- Ignore `failed` tests (pre-existing failures on the target branch) and `flaked` tests (tests that passed on retry)
This works with any supported agent — Claude Code, Codex, or Cursor.
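You can also run the agent's first step yourself, for example to review which tests a PR regressed before handing the task to an agent. A minimal sketch: `pr_regressions` is a hypothetical wrapper, and the command inside it is the one the agent runs in the steps above.

```shell
# Hypothetical wrapper (not part of the Flakiness CLI) around the query the
# agent runs for a pull request.
# Usage: pr_regressions <org/project> <pr-number>
pr_regressions() {
  if [ "$#" -ne 2 ]; then
    echo "usage: pr_regressions <org/project> <pr-number>" >&2
    return 2
  fi

  # PR numbers are plain integers; reject anything else early.
  case "$2" in
    ''|*[!0-9]*)
      echo "PR number must be an integer: $2" >&2
      return 1
      ;;
  esac

  # 's:regressed' keeps only tests the PR broke: passing on the target
  # branch, failing in the PR.
  flakiness list tests --project "$1" --pr "$2" --fql 's:regressed'
}
```

For the example above, that is `pr_regressions myorg/myproject 42`.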
Example: Deflake Cron Job
You can set up a scheduled GitHub Actions workflow that uses Claude Code to automatically investigate and fix flaky tests.

This workflow:

- Runs on a schedule (e.g. every Monday at 9:00 UTC)
- Uses GitHub OIDC to authenticate with Flakiness.io, so no Flakiness.io secret is needed (only the `ANTHROPIC_API_KEY` secret for Claude Code)
- Runs Claude Code in non-interactive mode (`-e`) with the flakiness skill to find flaky tests, investigate root causes, and open PRs with fixes
```yaml
name: Deflake Tests

on:
  schedule:
    - cron: '0 9 * * 1' # Every Monday at 9:00 UTC
  workflow_dispatch: {} # Allow manual trigger

permissions:
  contents: write
  pull-requests: write
  id-token: write

jobs:
  deflake:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install Flakiness CLI
        run: curl -LsSf https://cli.flakiness.io/install.sh | sh

      - name: Install Flakiness skill
        run: flakiness skills install --agent claude

      - name: Install Claude Code
        run: npm i -g @anthropic-ai/claude-code

      - name: Deflake
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          FLAKINESS_PROJECT: myorg/myproject
        run: |
          claude -e "Use the flakiness skill to find the top 5 flakiest tests \
            in our project. For each one, investigate the root cause in our \
            codebase and open a PR with a fix. Use a separate PR per test."
```

How It Works
- `flakiness skills install --agent claude` installs the `flakiness-investigation` skill so Claude Code knows how to use the CLI
- `claude -e "..."` runs Claude Code with a prompt in non-interactive mode
- Claude Code reads the installed skill, runs `flakiness list tests --fql 'flip>0%' --sort flip_rate --sort-dir desc` to find flaky tests, then investigates and fixes each one
- GitHub OIDC handles authentication transparently (this is why the workflow requests the `id-token: write` permission), and the `FLAKINESS_PROJECT` env var tells the CLI which project to query
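To see the list Claude Code starts from, you can run the same query outside CI. This sketch assumes the Flakiness CLI is installed and already able to authenticate on your machine (GitHub OIDC only applies inside Actions, and local login is not covered here); `flakiest_tests` is a hypothetical helper name.

```shell
# Hypothetical helper: list tests with a non-zero flip rate, most flaky first,
# using the same query the skill runs in the workflow above.
# Usage: flakiest_tests <org/project>
flakiest_tests() {
  if [ "$#" -ne 1 ]; then
    echo "usage: flakiest_tests <org/project>" >&2
    return 2
  fi
  # FLAKINESS_PROJECT selects the project, as in the workflow's env block;
  # the prefix assignment is passed only to the flakiness command.
  FLAKINESS_PROJECT="$1" flakiness list tests \
    --fql 'flip>0%' --sort flip_rate --sort-dir desc
}
```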
You can customize the prompt to focus on specific areas:
```shell
# Only investigate regressions in e2e tests
claude -e "Use the flakiness skill to find regressions in e2e/ files and fix them."

# Focus on slow tests
claude -e "Use the flakiness skill to find tests slower than 10s and optimize them."

# Investigate tests failing with a specific error
claude -e "Use the flakiness skill to find tests failing with 'timeout' errors and fix the root cause."
```
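If you keep several of these variants around, a wrapper can hold the shared part of the prompt. A minimal sketch: `ask_flakiness_agent` is a hypothetical helper that prepends the "Use the flakiness skill to" wording from the examples above and passes the result to `claude -e`, the same way the workflow does.

```shell
# Hypothetical wrapper: prefix a task with the skill instruction used in the
# prompts above and run Claude Code non-interactively.
# Usage: ask_flakiness_agent "find regressions in e2e/ files and fix them."
ask_flakiness_agent() {
  if [ "$#" -eq 0 ] || [ -z "$1" ]; then
    echo "usage: ask_flakiness_agent <task>" >&2
    return 2
  fi
  # "$*" joins all arguments with spaces, so quoting the task is optional.
  claude -e "Use the flakiness skill to $*"
}
```

For example, `ask_flakiness_agent "find tests failing with 'timeout' errors and fix the root cause."` reproduces the last prompt above.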