fix: bound TC-deferred queues against spoofed-source flood OOM#1751
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1751 +/- ##
=======================================
Coverage 99.77% 99.77%
=======================================
Files 33 33
Lines 3486 3497 +11
Branches 487 489 +2
=======================================
+ Hits 3478 3489 +11
Misses 5 5
Partials 3 3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@bluetoothbot review |
There was a problem hiding this comment.
Pull request overview
This PR hardens the TC-bit (“truncated”) query deferral/reassembly path in AsyncListener.handle_query_or_defer against spoofed-source floods that could otherwise grow _deferred/_timers without bound and lead to OOM.
Changes:
- Add global and per-address caps for TC-deferred state via new constants (
_MAX_DEFERRED_ADDRS,_MAX_DEFERRED_PER_ADDR). - Implement eviction of the oldest deferred address when the global cap would be exceeded.
- Add tests covering the new bounds and eviction behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
tests/test_core.py |
Adds regression tests for per-addr cap, total-addr cap, and oldest-entry eviction behavior. |
src/zeroconf/const.py |
Introduces _MAX_DEFERRED_PER_ADDR and _MAX_DEFERRED_ADDRS limits for TC deferral. |
src/zeroconf/_listener.py |
Enforces the new caps and adds _evict_oldest_deferred() to bound memory use under TC floods. |
src/zeroconf/_listener.pxd |
Declares the new constants and eviction helper for Cython-compiled fast-path comparisons. |
Comments suppressed due to low confidence (1)
tests/test_core.py:906
test_tc_bit_eviction_drops_oldest_addrhas the same potential timing flakiness as the total-addrs test: filling_MAX_DEFERRED_ADDRSentries can take long enough for the ~400–500ms TC timers to start firing and draining_deferred/_timers, making the capacity/eviction assertions unreliable. Patchzeroconf._listener._TC_DELAY_RANDOM_INTERVALto a larger value within this test (or otherwise prevent timer callbacks) so eviction is tested deterministically.
# Fill the dict to exactly capacity with predictable addrs.
fillers = [f"203.0.113.{i // 256}.{i % 256}" for i in range(const._MAX_DEFERRED_ADDRS)]
for source_ip in fillers:
threadsafe_query(zc, protocol, r.DNSIncoming(raw), source_ip, const._MDNS_PORT, Mock(), ())
assert len(protocol._deferred) == const._MAX_DEFERRED_ADDRS
oldest = fillers[0]
assert oldest in protocol._deferred
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
eb031b7 to
e2ee4dc
Compare
PR Review — fix: bound TC-deferred queues against spoofed-source flood OOMSolid DoS hardening. The two-tier cap (per-addr + total addrs) is the right shape for this class of bug, and FIFO eviction keyed off dict insertion order correctly prevents an active flooder from pinning slots by re-using a source IP. Ordering of the checks is sound: the eviction predicate ( Memory math checks out: 512 × 16 × 🟢 Suggestions1. Docstring is longer than the project's style guide recommends (`src/zeroconf/_listener.py`, L312-323)
The FIFO-vs-LRU note is genuinely non-obvious and worth keeping somewhere — consider either trimming the docstring to the one-line summary and moving the FIFO rationale to a short inline comment where it actually matters (next to Not blocking — just inconsistent with how the rest of Checklist
SummarySolid DoS hardening. The two-tier cap (per-addr + total addrs) is the right shape for this class of bug, and FIFO eviction keyed off dict insertion order correctly prevents an active flooder from pinning slots by re-using a source IP. Ordering of the checks is sound: the eviction predicate ( Memory math checks out: 512 × 16 × Automated review by Kōan2c927fd |
e2ee4dc to
ad6169a
Compare
ad6169a to
58e15e3
Compare
|
@bluetoothbot review |
`AsyncListener.handle_query_or_defer` retains every truncated query in `self._deferred[addr]` and arms a per-addr timer that flushes within ~500ms. Neither the per-addr list nor the number of distinct addr keys was capped, so a LAN peer streaming byte-distinct TC-flagged queries — trivially amplified by spoofing source IPs — could grow `_deferred` and `_timers` until the process OOMs. Add two caps and a FIFO eviction: - `_MAX_DEFERRED_PER_ADDR = 16` bounds the per-addr queue. RFC 6762 §18.5 anticipates only a handful of segments per truncated query. The existing O(N) dedup scan now runs over a constant-bounded list. - `_MAX_DEFERRED_ADDRS = 512` bounds the total distinct addrs with in-flight state. The cap leaves headroom for a legitimate burst (a LAN-wide power-resume / boot storm where many devices announce at once) while bounding worst-case memory at ~72 MB. When a new addr would exceed the cap, the oldest insertion-order entry is evicted (FIFO, not LRU, so an active flooder cannot pin its slots by re-sending into the same addr): its timer is cancelled and its queue is discarded so the bound holds even when an attacker rotates source IPs faster than the reassembly timer can drain entries. Both constants are cdef'd in `_listener.pxd` so the bound checks compile to direct C integer compares.
58e15e3 to
2c927fd
Compare
Summary
AsyncListener.handle_query_or_deferretains every truncated incoming query inself._deferred[addr]and arms a per-addr timer that flushes within ~500ms; neither the per-addr list nor the number of distinct addr keys was capped, so a LAN peer streaming byte-distinct TC-flagged queries; trivially amplified by spoofing source IPs, could grow_deferredand_timersuntil the process OOMs. This PR adds a per-addr cap, a total-addrs cap, and FIFO eviction so the worst-case memory footprint of the TC-deferral path stays bounded.Details
_MAX_DEFERRED_PER_ADDR = 16(const.py): bounds the per-addr queue. RFC 6762 §18.5 anticipates only a handful of segments per truncated query, and 16 is well above legitimate need; the existing O(N)for incoming in reversed(deferred)dedup scan now runs over a constant-bounded list, so the per-arrival CPU cost is constant._MAX_DEFERRED_ADDRS = 512(const.py): bounds the total distinct addrs with in-flight TC-deferral state. The cap leaves headroom for a legitimate burst (a LAN-wide power-resume / boot storm where many devices announce at once) while bounding worst-case memory at ~72 MB even under a spoofed-source flood (512 x 16 x_MAX_MSG_ABSOLUTE)._evict_oldest_deferred(src/zeroconf/_listener.py): when a new addr would push_deferredpast the cap, the oldest insertion-order entry is evicted; its timer is cancelled, its deadline is dropped, and its queue is discarded. Eviction is FIFO (oldest by first-seen, via dict insertion order) rather than LRU so an active flooder cannot pin its slots by re-sending into the same addr.src/zeroconf/_listener.pxdsolen(self._deferred) >= _MAX_DEFERRED_ADDRSandlen(deferred) >= _MAX_DEFERRED_PER_ADDRcompile to direct C integer compares; verified viacython -a.Test plan
tests/test_core.py:test_tc_bit_per_addr_queue_is_bounded: 20 distinct TC-flagged queries from one addr produce a 16-entry queue and retain the first 16 by insertion order.test_tc_bit_total_addrs_is_bounded: 516 distinct source IPs sending TC packets keep_deferredand_timersat exactly 512 entries; usespatch.object(_listener, "_TC_DELAY_RANDOM_INTERVAL", (60_000, 60_001))so the reassembly timer cannot fire during the test window (otherwise PyPy / slow runners can drain entries between the last enqueue and the assertion).test_tc_bit_eviction_drops_oldest_addr: filling to capacity then adding one more addr drops the original-oldest from both_deferredand_timers; the new addr is present.REQUIRE_CYTHON=1 poetry install --only=main,dev && poetry run pytest tests/ --no-cov --timeout=60: 381 passed, 3 skipped.SKIP_CYTHON=1 poetry run pytest tests/ --no-cov --timeout=60: 381 passed, 3 skipped.poetry run pre-commit runon touched files: ruff, ruff format, mypy, codespell, flake8, cython-lint all green.