Shard Balancing

When a test suite runs as several shards in parallel, the whole run is finished only once the slowest shard finishes. Splitting the suite by test count keeps the shards even in size, but not in time: whichever shard happens to hold the slow tests runs longer than the others.

Shard balancing splits by time instead. Each test’s recorded duration is used to assign tests to shards so that every shard is expected to take about the same wall-clock time. The durations come from a project’s run history, fetched through the Durations API.
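To make the difference concrete, here is a minimal sketch of a time-based split. It assumes a greedy longest-first heuristic, which is one common way to balance by duration and is not necessarily the algorithm the runner uses. The test names and durations are invented, and `balance_by_time` and `shard_times` are hypothetical helpers, not part of the Durations API.

```python
import heapq


def balance_by_time(durations, shard_count):
    """Greedy longest-first split: place each test on the lightest shard so far.

    Illustrative assumption only; the real balancer may use another strategy.
    """
    # Min-heap of (accumulated_seconds, shard_index), so the lightest shard pops first.
    heap = [(0.0, index) for index in range(shard_count)]
    heapq.heapify(heap)
    shards = [[] for _ in range(shard_count)]

    # Longest tests first; ties are broken by name so the result is deterministic.
    ordered = sorted(durations.items(), key=lambda item: (-item[1], item[0]))
    for name, seconds in ordered:
        total, index = heapq.heappop(heap)
        shards[index].append(name)
        heapq.heappush(heap, (total + seconds, index))
    return shards


def shard_times(shards, durations):
    """Expected wall-clock seconds per shard."""
    return [sum(durations[name] for name in shard) for shard in shards]


# Made-up recorded durations, in seconds.
durations = {
    "login": 120.0,
    "checkout": 90.0,
    "search": 30.0,
    "profile": 20.0,
    "about": 10.0,
    "footer": 10.0,
}

# Split by count: three tests per shard, slow tests happen to land together.
by_count = [["login", "checkout", "search"], ["profile", "about", "footer"]]
# Split by time: shards are filled according to recorded duration.
by_time = balance_by_time(durations, 2)

print(shard_times(by_count, durations))  # [240.0, 40.0] -> run takes 240 s
print(shard_times(by_time, durations))   # [140.0, 140.0] -> run takes 140 s
```

Both splits run the same 280 seconds of tests. Only the time-based split brings the slowest shard, and therefore the whole run, down from 240 to 140 seconds.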

Tradeoffs

Generating the split is not instant. Before a shard can start, it fetches the timing data and computes its test list, which takes around 10 seconds. That fixed cost only pays off when shards run for minutes. On short shards the savings are negligible and can be smaller than the cost itself, so reserve balancing for suites whose shards already take minutes rather than seconds.
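The arithmetic behind that advice can be sketched as follows. The 10-second overhead comes from the text above. The shard times are invented, and `balancing_gain` is a hypothetical helper written for this illustration.

```python
SPLIT_OVERHEAD_SECONDS = 10.0  # approximate fixed cost quoted above


def balancing_gain(unbalanced, balanced, overhead=SPLIT_OVERHEAD_SECONDS):
    """Net wall-clock seconds saved by balancing (negative means the run got slower).

    `unbalanced` and `balanced` are per-shard durations in seconds for the same suite.
    A run lasts as long as its slowest shard; balanced shards also pay the overhead.
    """
    before = max(unbalanced)
    after = max(balanced) + overhead
    return before - after


# Long shards: the overhead is small next to the time saved.
print(balancing_gain([600.0, 300.0], [450.0, 450.0]))  # 140.0

# Short shards: the overhead outweighs the time saved.
print(balancing_gain([20.0, 10.0], [15.0, 15.0]))  # -5.0
```

In the second case the perfectly balanced run finishes 5 seconds later than the unbalanced one, because each shard spends 10 seconds computing a split that saves only 5.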

The split is held for 15 days. Every shard in a run is given the same timing snapshot, so all shards agree on one split, and that snapshot is kept for 15 days. After it expires, a restarted shard computes its test list from fresh timing data, so restarting a failed shard more than 15 days after the original run can give it a test list that no longer matches the split its siblings ran under.
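A pipeline that allows late restarts could check the retention window before retrying a single shard. The sketch below is only an illustration. `restart_reuses_split` is a hypothetical helper, and how a restart exactly at the 15-day mark is treated is an assumption (counted here as expired).

```python
from datetime import datetime, timedelta, timezone

SNAPSHOT_RETENTION = timedelta(days=15)  # retention period stated above


def restart_reuses_split(original_run_at, restart_at, retention=SNAPSHOT_RETENTION):
    """Return True if a restarted shard should still see the original snapshot.

    Assumption: the snapshot counts as expired once the full retention period
    has elapsed; the actual boundary behaviour is not specified here.
    """
    return restart_at - original_run_at < retention


original = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)

# Restart three days later: same snapshot, same split as the sibling shards.
print(restart_reuses_split(original, original + timedelta(days=3)))  # True

# Restart sixteen days later: snapshot gone, the split is recomputed.
print(restart_reuses_split(original, original + timedelta(days=16)))  # False
```

When the check returns False, rerunning every shard of the run keeps them on one consistent split, rather than mixing one recomputed shard with results from the old split.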

Supported runners

Balancing happens inside the test runner, which has to fetch the timing data and plan its run before executing. Not every runner can do this. Playwright Test is the first runner to support it, through a dedicated flakiness-playwright-shard binary; its reporter documentation covers the setup.