test: shave register/shutdown/TC waits on loopback by bluetoothbot · Pull Request #1711 · python-zeroconf/python-zeroconf

bluetoothbot · 2026-05-18T00:03:04Z

What

Speed up four slow tests by extracting one production constant (_REGISTER_RANDOM_INTERVAL, the RFC 6762 §8.1 random pre-probe wait in async_check_service) and patching it through quick_timing on loopback, plus three localised test-only patches.

Part of #1707.

Why

async_check_service waits a random 150-250ms before probing — RFC 6762 §8.1 thundering-herd avoidance for real networks, pure overhead on 127.0.0.1. Tests that use quick_timing already get probe/announce/goodbye intervals shrunk, but the inline random.randint on this path is opaque to patching. Three of the slowest tests in #1707 without an in-flight PR (test_async_service_registration_name_conflict, test_name_conflicts, and everything that registers a service) pay this for nothing.

test_shutdown_loop and test_tc_bit_defers_last_response_missing carry similar dead waits — a 1s outer timeout in shutdown_loop against a never-completing pending task, and an 800ms cleanup poll against a 400-500ms RFC 6762 §7.2 TC-bit defer timer.

How

Extract _REGISTER_RANDOM_INTERVAL = (150, 250) as a module-level constant in _core.py and patch it in quick_timing alongside the other loopback shaves. No production change — default still matches the RFC value.
test_shutdown_loop: patch _TASK_AWAIT_TIMEOUT to 0.1s and drop .result(1) → .result(0.1). Both were just upper bounds on a never-completing pending task.
test_tc_bit_defers_last_response_missing: patch _TC_DELAY_RANDOM_INTERVAL to (10, 20)ms and tighten the cleanup poll loop.
test_integration: wait for the browser's first startup query to fire (cache empty) before registering. The original test relied on the 150-250ms register-random-wait being longer than the browser's 20-120ms first-query delay; with the new fast register that race flipped and the first query started picking up the known PTR via §7.1 known-answer suppression.

Speedups (CPython 3.12, loopback)

test	before	after
`test_shutdown_loop`	1.16s	0.26s
`test_tc_bit_defers_last_response_missing`	0.79s	0.32s
`test_async_service_registration_name_conflict`	0.72s	0.10s
`test_name_conflicts`	0.50s	0.10s

Plus a smaller transparent shave on every other quick_timing-using test that registers a service.

Test plan

Targeted tests pass three times in a row
Full tests/ suite passes (338 passed, 2 skipped, ~47s wall)
Pre-commit hooks pass (ruff, mypy, flake8, codespell, pyupgrade)

Quality Report

Changes: 6 files changed, 31 insertions(+), 12 deletions(-)

Code scan: clean

Tests: passed (4 PASSED)

Branch hygiene: clean

Generated by Kōan post-mission quality pipeline

codecov · 2026-05-18T00:04:53Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.76%. Comparing base (4bae30a) to head (98c3785).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #1711   +/-   ##
=======================================
  Coverage   99.76%   99.76%           
=======================================
  Files          33       33           
  Lines        3426     3427    +1     
  Branches      471      471           
=======================================
+ Hits         3418     3419    +1     
  Misses          5        5           
  Partials        3        3

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codspeed-hq · 2026-05-18T00:04:54Z

Merging this PR will not alter performance

✅ 6 untouched benchmarks

_{Comparing bluetoothbot:koan/speed-up-register-random-wait (98c3785) with master (4bae30a)}

bdraco · 2026-05-18T00:14:03Z

all the windows TC tests fail with this pr

    zeroconf:outgoing.py:471 now offsets = questions=1, answers=302, authorities=0, additionals=0
================================== FAILURES ===================================
__________________ test_tc_bit_defers_last_response_missing ___________________

    @patch.object(_listener, "_TC_DELAY_RANDOM_INTERVAL", (10, 20))
    def test_tc_bit_defers_last_response_missing():
        zc = Zeroconf(interfaces=["127.0.0.1"])
        _wait_for_start(zc)
        type_ = "_knowndefer._tcp.local."
        name = "knownname"
        name2 = "knownname2"
        name3 = "knownname3"
    
        registration_name = f"{name}.{type_}"
        registration2_name = f"{name2}.{type_}"
        registration3_name = f"{name3}.{type_}"
    
        desc = {"path": "/~paulsm/"}
        server_name = "ash-2.local."
        server_name2 = "ash-3.local."
        server_name3 = "ash-4.local."
    
        info = r.ServiceInfo(
            type_,
            registration_name,
            80,
            0,
            0,
            desc,
            server_name,
            addresses=[socket.inet_aton("10.0.1.2")],
        )
        info2 = r.ServiceInfo(
            type_,
            registration2_name,
            80,
            0,
            0,
            desc,
            server_name2,
            addresses=[socket.inet_aton("10.0.1.2")],
        )
        info3 = r.ServiceInfo(
            type_,
            registration3_name,
            80,
            0,
            0,
            desc,
            server_name3,
            addresses=[socket.inet_aton("10.0.1.2")],
        )
        zc.registry.async_add(info)
        zc.registry.async_add(info2)
        zc.registry.async_add(info3)
    
        protocol = zc.engine.protocols[0]
        now = r.current_time_millis()
        _clear_cache(zc)
        source_ip = "203.0.113.12"
    
        generated = r.DNSOutgoing(const._FLAGS_QR_QUERY)
        question = r.DNSQuestion(type_, const._TYPE_PTR, const._CLASS_IN)
        generated.add_question(question)
        for _ in range(300):
            # Add so many answers we end up with another packet
            generated.add_answer_at_time(info.dns_pointer(), now)
        generated.add_answer_at_time(info2.dns_pointer(), now)
        generated.add_answer_at_time(info3.dns_pointer(), now)
        packets = generated.packets()
        assert len(packets) == 4
        expected_deferred = []
    
        next_packet = r.DNSIncoming(packets.pop(0))
        expected_deferred.append(next_packet)
        threadsafe_query(zc, protocol, next_packet, source_ip, const._MDNS_PORT, Mock(), ())
        assert protocol._deferred[source_ip] == expected_deferred
        timer1 = protocol._timers[source_ip]
    
        next_packet = r.DNSIncoming(packets.pop(0))
        expected_deferred.append(next_packet)
        threadsafe_query(zc, protocol, next_packet, source_ip, const._MDNS_PORT, Mock(), ())
>       assert protocol._deferred[source_ip] == expected_deferred
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E       KeyError: '203.0.113.12'

try harder

bdraco · 2026-05-18T00:14:07Z

@bluetoothbot rr

bluetoothbot · 2026-05-18T00:28:39Z

PR Review — test: shave register/shutdown/TC waits on loopback

Mechanics are clean and the speedups are real: extracting _REGISTER_RANDOM_INTERVAL to a module constant keeps production behavior identical while letting quick_timing patch it on loopback, and the test_integration got_query.wait() synchronization correctly closes the race the faster register introduced. The blocking issue is what @bdraco already flagged: test_tc_bit_defers_last_response_missing is failing on the Windows CI matrix because the new _TC_DELAY_RANDOM_INTERVAL patch to (10, 20)ms is below Windows' default ~15.6ms timer tick. That patch isn't visible in the truncated diff but it's the load-bearing change for that test. Once that interval is widened enough to clear Windows clock granularity (see reply), this is mergeable. Secondary nit: the 0.1s _TASK_AWAIT_TIMEOUT reduction is aggressive — worth widening for safety on slow CI entries.

🟢 Suggestions

1. Verify 0.1s timeout headroom on slow CI/PyPy (`tests/utils/test_asyncio.py`, L68)

Both the _TASK_AWAIT_TIMEOUT patch and the .result(0.1) reduction take the bound down by 10x. On a heavily loaded CI worker (especially the free-threaded 3.14t entry or PyPy) a 100ms window for the executor thread to schedule, dispatch run_coroutine_threadsafe, and observe the pending future could be tight enough to occasionally race. The original 1s was conservative for exactly that reason. Consider 0.25s — still a ~4x speedup, far more headroom for scheduler jitter. Worth watching the next few CI runs for flakes on the slowest matrix entries.

@patch.object(aioutils, "_TASK_AWAIT_TIMEOUT", 0.1)
def test_shutdown_loop() -> None:
    ...
        asyncio.run_coroutine_threadsafe(_still_running(), loop).result(0.1)

Checklist

Production behavior preserved (constant default matches old inline literal)
Test races eliminated by explicit synchronization
Platform portability of new timing constants — existing comment from @bdraco — Windows TC test failure
Sufficient timeout headroom for slow CI / PyPy / free-threaded matrix — suggestion ＃1
Free-threaded (3.14t) safety — no new module-level mutable state
RFC citations preserved on timing comments
No Co-Authored-By trailer in commits
Conventional Commits PR title with lowercase subject

Summary

Mechanics are clean and the speedups are real: extracting _REGISTER_RANDOM_INTERVAL to a module constant keeps production behavior identical while letting quick_timing patch it on loopback, and the test_integration got_query.wait() synchronization correctly closes the race the faster register introduced. The blocking issue is what @bdraco already flagged: test_tc_bit_defers_last_response_missing is failing on the Windows CI matrix because the new _TC_DELAY_RANDOM_INTERVAL patch to (10, 20)ms is below Windows' default ~15.6ms timer tick. That patch isn't visible in the truncated diff but it's the load-bearing change for that test. Once that interval is widened enough to clear Windows clock granularity (see reply), this is mergeable. Secondary nit: the 0.1s _TASK_AWAIT_TIMEOUT reduction is aggressive — worth widening for safety on slow CI entries.

Automated review by Kōana22d068
6b84b72
98c3785

bluetoothbot · 2026-05-18T00:28:40Z

@bdraco: all the windows TC tests fail with this pr

The Windows failure is almost certainly the combination of two compounding factors in test_tc_bit_defers_last_response_missing:

_TC_DELAY_RANDOM_INTERVAL=(10, 20) is below the Windows scheduler tick (~15.6ms). The asyncio call_later driving the TC defer timer floors at the next tick, so the effective fire time is 15-30ms+ rather than the 10-20ms the patch suggests.
The cleanup poll budget was halved — range(8) × sleep(0.1) = 800ms became range(20) × sleep(0.02) = 400ms. Under Windows' coarser time.sleep granularity the wall clock is closer to 620ms, but the asyncio side is racing the same tick and can easily overshoot.

Two options to fix:

Keep the 800ms cleanup budget (range(40) × sleep(0.02) if you want the finer-grained early exit), or
Widen the patched random interval to something like (20, 50) or (30, 60) ms — still a 10x speedup over the production (400, 500) ms, but well clear of the Windows tick floor so the timing is deterministic across the CI matrix.

I'd lean toward the second — it preserves the speedup intent of the PR without making the cleanup poll do double duty as a slack budget for scheduler jitter.

`async_check_service` paid an inline `random.randint(150, 250)` ms wait on every `register_service` — RFC 6762 §8.1 thundering-herd avoidance for real networks, pure overhead on 127.0.0.1. Extracted as a module constant `_REGISTER_RANDOM_INTERVAL` and patched in the `quick_timing` fixture to (1, 5)ms alongside the existing _CHECK_TIME / _REGISTER_TIME / _UNREGISTER_TIME shaves. Also: - `test_shutdown_loop`: drop `.result(1)` → `.result(0.1)` and patch `_TASK_AWAIT_TIMEOUT` to 0.1s. The 1s waits were `shutdown_loop`'s outer timeout on a never-completing `asyncio.sleep(5)` — the test only needs the loop to be stoppable while a task is pending. - `test_tc_bit_defers_last_response_missing`: patch `_TC_DELAY_RANDOM_INTERVAL` (400-500ms, RFC 6762 §7.2 TC-bit deferral) to (10, 20)ms and tighten the cleanup poll loop. - `test_integration`: wait for the browser's first startup query to fire (with empty cache) before registering. Previously the test relied on the register's 150-250ms random wait being longer than the browser's 20-120ms first-query delay; with the new fast register that race flipped and the first query started seeing the known PTR via §7.1 known-answer suppression. Speedups on loopback (CPython 3.12): | test | before | after | | ----------------------------------------------------- | ------ | ----- | | test_shutdown_loop | 1.16s | 0.26s | | test_tc_bit_defers_last_response_missing | 0.79s | 0.32s | | test_async_service_registration_name_conflict | 0.72s | 0.10s | | test_name_conflicts | 0.50s | 0.10s | Plus a smaller transparent shave on every other `quick_timing` register-based test. Full suite: 338 passed, 2 skipped. No production behavior change — `_REGISTER_RANDOM_INTERVAL` still defaults to the RFC 6762 §8.1 value on real networks.

…ot fire the timer mid-assertion

bluetoothbot · 2026-05-18T00:33:07Z

Rebase with requested adjustments

Branch koan/speed-up-register-random-wait was rebased onto master and review feedback was applied.

Changes applied

CI fix: Applied pre-push CI fix
CI fix: Pre-push CI fix applied

Stats

5 files changed, 23 insertions(+), 9 deletions(-)

Actions performed

Already-solved check: negative (confidence=high, reasoning=No commit on master extracts _REGISTER_RANDOM_INTERVAL or patches it via quick_timing; recent sp)
Rebased koan/speed-up-register-random-wait onto upstream/master
Pre-push CI check: previous run #26006589202 failed
Applied pre-push CI fix
Pre-push CI fix applied
Force-pushed koan/speed-up-register-random-wait to origin
CI check enqueued in ## CI (async)

CI status

CI will be checked asynchronously.

Automated by Kōan

bdraco · 2026-05-19T21:57:50Z

@bluetoothbot rr

bluetoothbot · 2026-05-19T22:44:37Z

@bdraco: all the windows TC tests fail with this pr

The root cause is almost certainly Windows' default timer resolution. CPython's asyncio on Windows uses the system clock tick, which defaults to ~15.6ms (64Hz) unless something has called timeBeginPeriod(1). Patching _TC_DELAY_RANDOM_INTERVAL to (10, 20)ms means the lower bound (10ms) is below a single tick — the wait either returns immediately on the current tick boundary or has to wait a full ~15.6ms regardless, so the deferred response timing the test polls for becomes non-deterministic across the (10, 20)ms band.

Two viable fixes:

Widen the patch to something safely above one Windows tick, e.g. (50, 100) or (100, 200). Still a meaningful shave vs. the production (400, 500)ms — the test would land around 0.4-0.5s instead of 0.32s, still well below the 0.79s baseline.
Skip the patch on Windows via @pytest.mark.skipif(sys.platform == 'win32', ...) on just the patched variant, or branch the interval: (50, 100) if sys.platform == 'win32' else (10, 20).

Option 1 is the cleaner call — it keeps the test exercising the same code path on all platforms, just with a wider RFC 6762 §7.2 defer window that all OS timer resolutions can resolve. The poll loop's cleanup window would need to widen correspondingly to stay above the upper bound.

The diff snippet you pasted shows the patch is applied at function level — that's the right scope, just needs a Windows-safe value.

bdraco · 2026-05-20T14:31:12Z

@bluetoothbot rr

bdraco · 2026-05-20T19:37:39Z

@bluetoothbot rr

bdraco · 2026-05-21T19:28:37Z

@bluetoothbot rr

bluetoothbot · 2026-05-21T19:58:24Z

Superseded — most of this PR's content (extracting _REGISTER_RANDOM_INTERVAL, patching it in quick_timing, the test_shutdown_loop cleanup) was landed in #1760 under the constant name _PROBE_RANDOM_DELAY_INTERVAL. Branch had diverged too far for a clean rebase. The one remaining novel bit — the got_query.wait() synchronisation in test_integration (needed now that register can finish before the browser's first query fires) — is being reopened as a fresh PR.

bluetoothbot · 2026-05-21T19:58:46Z

Replacement opened: #1771 (test sync fragment only — everything else this PR proposed was already landed in #1760).

bluetoothbot added 2 commits May 18, 2026 00:29

test: widen TC-defer interval patch so Windows scheduling jitter cann…

6b84b72

…ot fire the timer mid-assertion

bluetoothbot force-pushed the koan/speed-up-register-random-wait branch from ca16a07 to 6b84b72 Compare May 18, 2026 00:33

bluetoothbot had a problem deploying to release May 18, 2026 00:36 — with GitHub Actions Error

fix: resolve CI failures on python-zeroconf#1711 (attempt 1)

98c3785

bluetoothbot closed this May 21, 2026

bluetoothbot mentioned this pull request May 21, 2026

test: synchronise test_integration on browser first-query #1771

Draft

3 tasks

Conversation

bluetoothbot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Why

How

Speedups (CPython 3.12, loopback)

Test plan

Quality Report

Uh oh!

codecov Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

codspeed-hq Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will not alter performance

Uh oh!

bdraco commented May 18, 2026

Uh oh!

bdraco commented May 18, 2026

Uh oh!

bluetoothbot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review — test: shave register/shutdown/TC waits on loopback

🟢 Suggestions

Checklist

Summary

Uh oh!

bluetoothbot commented May 18, 2026

Uh oh!

bluetoothbot commented May 18, 2026

Rebase with requested adjustments

Changes applied

Stats

CI status

Uh oh!

bdraco commented May 19, 2026

Uh oh!

bluetoothbot commented May 19, 2026

Uh oh!

bdraco commented May 20, 2026

Uh oh!

bdraco commented May 20, 2026

Uh oh!

bdraco commented May 21, 2026

Uh oh!

bluetoothbot commented May 21, 2026

Uh oh!

bluetoothbot commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

bluetoothbot commented May 18, 2026 •

edited

Loading

codecov Bot commented May 18, 2026 •

edited

Loading

codspeed-hq Bot commented May 18, 2026 •

edited

Loading

bluetoothbot commented May 18, 2026 •

edited

Loading