Flip Rate

Flip rate measures how often a test changes its outcome between consecutive commits in a timeline. A “flip” is any transition where a test switches from passing to failing, or from failing to passing.

Flip rate is a well-established metric for quantifying test flakiness. Apple’s research paper “Modeling and Ranking Flaky Tests at Apple” (ICSE-SEIP 2020) uses flip rate as one of two core signals for ranking flaky tests, and reports near-perfect accuracy in identifying them, in line with how humans interpret flakiness.

For a given test in a timeline:

Flip Rate = Flips / (Invocations - 1)

Where:

  • Invocations is the number of commits where the test was executed
  • Flips is the number of times the test’s status changed between consecutive commits

If a test has only 1 invocation, there are no transitions to compare, so its flip rate is reported as 0%. If a test has no invocations, its flip rate is undefined.
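The formula and its edge cases can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: the function name `flip_rate` and the representation of a timeline as a list of booleans (`True` for passed, `False` for failed, ordered by commit) are assumptions made for the example.

```python
from typing import Optional, Sequence


def flip_rate(statuses: Sequence[bool]) -> Optional[float]:
    """Return the flip rate for an ordered sequence of test outcomes.

    `statuses` holds one entry per commit where the test ran
    (True = passed, False = failed), oldest first.
    Returns None when there are no invocations (flip rate undefined).
    """
    invocations = len(statuses)
    if invocations == 0:
        return None  # no invocations: flip rate is not defined
    if invocations == 1:
        return 0.0  # a single run has no transitions to compare

    # A flip is any pair of consecutive results that differ.
    flips = sum(
        1 for previous, current in zip(statuses, statuses[1:])
        if previous != current
    )
    return flips / (invocations - 1)


print(flip_rate([True, False, True, True, True]))  # 2 flips / 4 transitions
```

Returning `None` for an empty timeline keeps "undefined" distinct from a genuine 0% flip rate, so callers are not tempted to treat a never-executed test as stable.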

Unlike simple failure rate, flip rate captures the temporal pattern of test results. Two tests can have the exact same number of passes and failures but very different flip rates depending on the order of results.

Consider two tests, both with 3 passes and 2 failures across 5 commits:

Test A — failures are clustered together:

| Commit 1 | Commit 2 | Commit 3 | Commit 4 | Commit 5 |
| --- | --- | --- | --- | --- |
| ✅ passed | ✅ passed | ✅ passed | ❌ failed | ❌ failed |

Transitions: ✅→✅, ✅→✅, ✅→❌, ❌→❌ — 1 flip out of 4 transitions = 25%

This pattern suggests a real regression rather than flakiness — the test was stable, then something broke it.

Test B — failures are scattered:

| Commit 1 | Commit 2 | Commit 3 | Commit 4 | Commit 5 |
| --- | --- | --- | --- | --- |
| ✅ passed | ❌ failed | ✅ passed | ❌ failed | ✅ passed |

Transitions: ✅→❌, ❌→✅, ✅→❌, ❌→✅ — 4 flips out of 4 transitions = 100%

This pattern is a strong signal of a flaky test — the outcome keeps alternating regardless of code changes.
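The contrast between Test A and Test B can be checked with a short script that computes both metrics side by side. As before, this is only a sketch: the helper names `failure_rate` and `flip_rate` and the boolean encoding of results (`True` = passed, `False` = failed) are assumptions for illustration.

```python
from typing import Sequence


def failure_rate(statuses: Sequence[bool]) -> float:
    """Fraction of invocations that failed (order is ignored)."""
    return sum(1 for passed in statuses if not passed) / len(statuses)


def flip_rate(statuses: Sequence[bool]) -> float:
    """Fraction of consecutive transitions where the outcome changed.

    Assumes at least two invocations, as in the examples above.
    """
    flips = sum(1 for a, b in zip(statuses, statuses[1:]) if a != b)
    return flips / (len(statuses) - 1)


# Same results, different order (True = passed, False = failed).
test_a = [True, True, True, False, False]   # failures clustered
test_b = [True, False, True, False, True]   # failures scattered

for name, history in (("Test A", test_a), ("Test B", test_b)):
    print(
        f"{name}: failure rate {failure_rate(history):.0%}, "
        f"flip rate {flip_rate(history):.0%}"
    )
```

Both tests report the same failure rate, because that metric only counts outcomes. Only the flip rate separates the likely regression from the likely flaky test.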

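| Commit history | Flips | Invocations | Flip Rate |
| --- | --- | --- | --- |
| ✅ ✅ ✅ ✅ ✅ | 0 | 5 | 0% |
| ❌ ❌ ❌ ❌ ❌ | 0 | 5 | 0% |
| ✅ ✅ ✅ ❌ ❌ | 1 | 5 | 25% |
| ✅ ❌ ✅ ✅ ✅ | 2 | 5 | 50% |
| ✅ ❌ ✅ ❌ ✅ | 4 | 5 | 100% |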

A consistently passing or failing test has a flip rate of 0%. A test that alternates every commit has a flip rate of 100%.