Skip to content

fix: bound TC-deferred queues against spoofed-source flood OOM#1751

Merged
bdraco merged 1 commit into
masterfrom
fix/bound-tc-deferred-queues
May 20, 2026
Merged

fix: bound TC-deferred queues against spoofed-source flood OOM#1751
bdraco merged 1 commit into
masterfrom
fix/bound-tc-deferred-queues

Conversation

@bdraco
Copy link
Copy Markdown
Member

@bdraco bdraco commented May 20, 2026

Summary

AsyncListener.handle_query_or_defer retains every truncated incoming query in self._deferred[addr] and arms a per-addr timer that flushes within ~500ms; neither the per-addr list nor the number of distinct addr keys was capped, so a LAN peer streaming byte-distinct TC-flagged queries; trivially amplified by spoofing source IPs, could grow _deferred and _timers until the process OOMs. This PR adds a per-addr cap, a total-addrs cap, and FIFO eviction so the worst-case memory footprint of the TC-deferral path stays bounded.

Details

  • _MAX_DEFERRED_PER_ADDR = 16 (const.py): bounds the per-addr queue. RFC 6762 §18.5 anticipates only a handful of segments per truncated query, and 16 is well above legitimate need; the existing O(N) for incoming in reversed(deferred) dedup scan now runs over a constant-bounded list, so the per-arrival CPU cost is constant.
  • _MAX_DEFERRED_ADDRS = 512 (const.py): bounds the total distinct addrs with in-flight TC-deferral state. The cap leaves headroom for a legitimate burst (a LAN-wide power-resume / boot storm where many devices announce at once) while bounding worst-case memory at ~72 MB even under a spoofed-source flood (512 x 16 x _MAX_MSG_ABSOLUTE).
  • _evict_oldest_deferred (src/zeroconf/_listener.py): when a new addr would push _deferred past the cap, the oldest insertion-order entry is evicted; its timer is cancelled, its deadline is dropped, and its queue is discarded. Eviction is FIFO (oldest by first-seen, via dict insertion order) rather than LRU so an active flooder cannot pin its slots by re-sending into the same addr.
  • Both constants are cdef'd in src/zeroconf/_listener.pxd so len(self._deferred) >= _MAX_DEFERRED_ADDRS and len(deferred) >= _MAX_DEFERRED_PER_ADDR compile to direct C integer compares; verified via cython -a.

Test plan

  • New tests in tests/test_core.py:
    • test_tc_bit_per_addr_queue_is_bounded: 20 distinct TC-flagged queries from one addr produce a 16-entry queue and retain the first 16 by insertion order.
    • test_tc_bit_total_addrs_is_bounded: 516 distinct source IPs sending TC packets keep _deferred and _timers at exactly 512 entries; uses patch.object(_listener, "_TC_DELAY_RANDOM_INTERVAL", (60_000, 60_001)) so the reassembly timer cannot fire during the test window (otherwise PyPy / slow runners can drain entries between the last enqueue and the assertion).
    • test_tc_bit_eviction_drops_oldest_addr: filling to capacity then adding one more addr drops the original-oldest from both _deferred and _timers; the new addr is present.
  • REQUIRE_CYTHON=1 poetry install --only=main,dev && poetry run pytest tests/ --no-cov --timeout=60: 381 passed, 3 skipped.
  • SKIP_CYTHON=1 poetry run pytest tests/ --no-cov --timeout=60: 381 passed, 3 skipped.
  • poetry run pre-commit run on touched files: ruff, ruff format, mypy, codespell, flake8, cython-lint all green.

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 20, 2026

Merging this PR will not alter performance

✅ 14 untouched benchmarks


Comparing fix/bound-tc-deferred-queues (2c927fd) with master (8c9d6ce)1

Open in CodSpeed

Footnotes

  1. No successful run was found on master (6a83ab8) during the generation of this report, so 8c9d6ce was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.77%. Comparing base (8c9d6ce) to head (2c927fd).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1751   +/-   ##
=======================================
  Coverage   99.77%   99.77%           
=======================================
  Files          33       33           
  Lines        3486     3497   +11     
  Branches      487      489    +2     
=======================================
+ Hits         3478     3489   +11     
  Misses          5        5           
  Partials        3        3           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@bdraco bdraco requested a review from Copilot May 20, 2026 13:45
@bdraco
Copy link
Copy Markdown
Member Author

bdraco commented May 20, 2026

@bluetoothbot review

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the TC-bit (“truncated”) query deferral/reassembly path in AsyncListener.handle_query_or_defer against spoofed-source floods that could otherwise grow _deferred/_timers without bound and lead to OOM.

Changes:

  • Add global and per-address caps for TC-deferred state via new constants (_MAX_DEFERRED_ADDRS, _MAX_DEFERRED_PER_ADDR).
  • Implement eviction of the oldest deferred address when the global cap would be exceeded.
  • Add tests covering the new bounds and eviction behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
tests/test_core.py Adds regression tests for per-addr cap, total-addr cap, and oldest-entry eviction behavior.
src/zeroconf/const.py Introduces _MAX_DEFERRED_PER_ADDR and _MAX_DEFERRED_ADDRS limits for TC deferral.
src/zeroconf/_listener.py Enforces the new caps and adds _evict_oldest_deferred() to bound memory use under TC floods.
src/zeroconf/_listener.pxd Declares the new constants and eviction helper for Cython-compiled fast-path comparisons.
Comments suppressed due to low confidence (1)

tests/test_core.py:906

  • test_tc_bit_eviction_drops_oldest_addr has the same potential timing flakiness as the total-addrs test: filling _MAX_DEFERRED_ADDRS entries can take long enough for the ~400–500ms TC timers to start firing and draining _deferred/_timers, making the capacity/eviction assertions unreliable. Patch zeroconf._listener._TC_DELAY_RANDOM_INTERVAL to a larger value within this test (or otherwise prevent timer callbacks) so eviction is tested deterministically.
    # Fill the dict to exactly capacity with predictable addrs.
    fillers = [f"203.0.113.{i // 256}.{i % 256}" for i in range(const._MAX_DEFERRED_ADDRS)]
    for source_ip in fillers:
        threadsafe_query(zc, protocol, r.DNSIncoming(raw), source_ip, const._MDNS_PORT, Mock(), ())
    assert len(protocol._deferred) == const._MAX_DEFERRED_ADDRS
    oldest = fillers[0]
    assert oldest in protocol._deferred


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/test_core.py Outdated
Comment thread tests/test_core.py
@bdraco bdraco force-pushed the fix/bound-tc-deferred-queues branch from eb031b7 to e2ee4dc Compare May 20, 2026 13:51
@bluetoothbot
Copy link
Copy Markdown
Contributor

bluetoothbot commented May 20, 2026

PR Review — fix: bound TC-deferred queues against spoofed-source flood OOM

Solid DoS hardening. The two-tier cap (per-addr + total addrs) is the right shape for this class of bug, and FIFO eviction keyed off dict insertion order correctly prevents an active flooder from pinning slots by re-using a source IP. Ordering of the checks is sound: the eviction predicate (addr not in self._deferred and len >= cap) only fires for new addrs and guarantees _evict_oldest_deferred is never called on an empty dict; eviction cleans up _deferred, _timers (via the existing _cancel_any_timers_for_addr), and _deferred_deadlines consistently. Placing the per-addr cap check before the O(N) dedup scan is a nice secondary win — the scan is now O(_MAX_DEFERRED_PER_ADDR) = O(16) rather than O(N) in adversarial input.

Memory math checks out: 512 × 16 × _MAX_MSG_ABSOLUTE ≈ 72 MB, matching the PR description (the earlier 256/36MB number from one of Copilot's comments referred to a previous revision and is correctly updated). .pxd declarations match the C-int compare convention already used for _MAX_MSG_ABSOLUTE/_RECENT_PACKETS_MAX, so the hot path stays in cdef land. Tests pin the right invariants (per-addr cap, total-addrs cap, FIFO eviction identity) and correctly suppress timer firing via patch.object(_listener, "_TC_DELAY_RANDOM_INTERVAL", (60_000, 60_001)) to avoid the flakiness Copilot flagged on the earlier revision. Only nit is the docstring length on _evict_oldest_deferred vs. the repo's terse-docstring convention. Merge-ready.


🟢 Suggestions

1. Docstring is longer than the project's style guide recommends (`src/zeroconf/_listener.py`, L312-323)

CLAUDE.md in this repo says docstrings should default to single-line and that rationale/motivation belongs in commit messages and PR bodies, not docstrings. The first line here (Discard the oldest deferred addr's reassembly state.) is the contract; the remaining four lines re-state what the PR description already covers (FIFO-vs-LRU reasoning, attacker behavior).

The FIFO-vs-LRU note is genuinely non-obvious and worth keeping somewhere — consider either trimming the docstring to the one-line summary and moving the FIFO rationale to a short inline comment where it actually matters (next to next(iter(self._deferred))), or leaving the docstring at the single-line contract and dropping the rest entirely since the PR/commit already carries the why.

Not blocking — just inconsistent with how the rest of _listener.py documents methods like _cancel_any_timers_for_addr.

    def _evict_oldest_deferred(self) -> None:
        """Discard the oldest deferred addr's reassembly state.

        Used when ``_MAX_DEFERRED_ADDRS`` would be exceeded; the
        evicted addr's queue and timer are dropped without firing, so
        the bound holds even when an attacker rotates source IPs.
        Eviction is FIFO (oldest by first-seen, via dict insertion
        order) rather than LRU so an active flooder cannot pin its
        slots by re-sending into the same addr.
        """

Checklist

  • Per-addr and total caps are enforced before unbounded growth
  • Eviction cannot be called on an empty _deferred (guarded by len >= cap)
  • Eviction cleans _deferred, _timers, and _deferred_deadlines consistently
  • FIFO (insertion-order) eviction resists pinning by an active flooder
  • .pxd declarations match .py types (unsigned int) and cover the new helper
  • Tests pin observable behavior (sizes, retained-set identity) without inspecting source
  • Tests suppress TC-reassembly timer to remove timing flakiness on slow runners
  • PR title is Conventional Commits + lowercase subject
  • Docstrings follow project terse-single-line convention — suggestion #1
  • RFC citation present for protocol-affecting bound (RFC 6762 §18.5)

Summary

Solid DoS hardening. The two-tier cap (per-addr + total addrs) is the right shape for this class of bug, and FIFO eviction keyed off dict insertion order correctly prevents an active flooder from pinning slots by re-using a source IP. Ordering of the checks is sound: the eviction predicate (addr not in self._deferred and len >= cap) only fires for new addrs and guarantees _evict_oldest_deferred is never called on an empty dict; eviction cleans up _deferred, _timers (via the existing _cancel_any_timers_for_addr), and _deferred_deadlines consistently. Placing the per-addr cap check before the O(N) dedup scan is a nice secondary win — the scan is now O(_MAX_DEFERRED_PER_ADDR) = O(16) rather than O(N) in adversarial input.

Memory math checks out: 512 × 16 × _MAX_MSG_ABSOLUTE ≈ 72 MB, matching the PR description (the earlier 256/36MB number from one of Copilot's comments referred to a previous revision and is correctly updated). .pxd declarations match the C-int compare convention already used for _MAX_MSG_ABSOLUTE/_RECENT_PACKETS_MAX, so the hot path stays in cdef land. Tests pin the right invariants (per-addr cap, total-addrs cap, FIFO eviction identity) and correctly suppress timer firing via patch.object(_listener, "_TC_DELAY_RANDOM_INTERVAL", (60_000, 60_001)) to avoid the flakiness Copilot flagged on the earlier revision. Only nit is the docstring length on _evict_oldest_deferred vs. the repo's terse-docstring convention. Merge-ready.


Automated review by Kōan2c927fd

@bdraco bdraco force-pushed the fix/bound-tc-deferred-queues branch from e2ee4dc to ad6169a Compare May 20, 2026 14:02
@bdraco bdraco requested a review from Copilot May 20, 2026 14:03
@bdraco bdraco force-pushed the fix/bound-tc-deferred-queues branch from ad6169a to 58e15e3 Compare May 20, 2026 14:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread tests/test_core.py Outdated
Comment thread src/zeroconf/const.py
@bdraco
Copy link
Copy Markdown
Member Author

bdraco commented May 20, 2026

@bluetoothbot review

`AsyncListener.handle_query_or_defer` retains every truncated query in
`self._deferred[addr]` and arms a per-addr timer that flushes within
~500ms. Neither the per-addr list nor the number of distinct addr keys
was capped, so a LAN peer streaming byte-distinct TC-flagged queries —
trivially amplified by spoofing source IPs — could grow `_deferred`
and `_timers` until the process OOMs.

Add two caps and a FIFO eviction:

- `_MAX_DEFERRED_PER_ADDR = 16` bounds the per-addr queue. RFC 6762
  §18.5 anticipates only a handful of segments per truncated query.
  The existing O(N) dedup scan now runs over a constant-bounded list.
- `_MAX_DEFERRED_ADDRS = 512` bounds the total distinct addrs with
  in-flight state. The cap leaves headroom for a legitimate burst
  (a LAN-wide power-resume / boot storm where many devices announce
  at once) while bounding worst-case memory at ~72 MB. When a new
  addr would exceed the cap, the oldest insertion-order entry is
  evicted (FIFO, not LRU, so an active flooder cannot pin its slots
  by re-sending into the same addr): its timer is cancelled and its
  queue is discarded so the bound holds even when an attacker rotates
  source IPs faster than the reassembly timer can drain entries.

Both constants are cdef'd in `_listener.pxd` so the bound checks
compile to direct C integer compares.
@bdraco bdraco force-pushed the fix/bound-tc-deferred-queues branch from 58e15e3 to 2c927fd Compare May 20, 2026 14:13
@bdraco bdraco requested a review from Copilot May 20, 2026 14:15
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

@bdraco bdraco marked this pull request as ready for review May 20, 2026 14:20
@bdraco bdraco merged commit b22c8ff into master May 20, 2026
40 checks passed
@bdraco bdraco deleted the fix/bound-tc-deferred-queues branch May 20, 2026 14:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants