Skip to content

[adapters] log retention duration config for delta connector#6146

Open
swanandx wants to merge 1 commit into
mainfrom
issue6027
Open

[adapters] log retention duration config for delta connector#6146
swanandx wants to merge 1 commit into
mainfrom
issue6027

Conversation

@swanandx
Copy link
Copy Markdown
Member

@swanandx swanandx commented Apr 28, 2026

expose log_retention_duration and enable_expired_log_cleanup for delta sinks so users can use them for configuring the table. Similar to what we do for checkpoint_interval

Fix #6027

Describe Manual Test Plan

Tested under various scenarios and modes

Checklist

  • Unit tests added/updated
  • Integration tests added/updated
  • Documentation updated
  • Changelog updated

Breaking Changes?

Mark if you think the answer is yes for any of these components:

Describe Incompatible Changes

We add new fields to delta output config, shouldn't break existing things

@swanandx swanandx marked this pull request as draft April 28, 2026 07:31
Comment thread crates/adapters/src/integrated/delta_table/output.rs Outdated
Comment thread crates/feldera-types/src/transport/delta_table.rs Outdated
expose log_retention_duration and enable_expired_log_cleanup for delta
sinks so users can use them for configuring the table. Similar to what we
do for checkpoint_interval

Signed-off-by: Swanand Mulay <73115739+swanandx@users.noreply.github.com>
@swanandx swanandx marked this pull request as ready for review May 21, 2026 16:47
@swanandx swanandx requested a review from ryzhyk May 21, 2026 16:47
dt = DeltaTable(loc.uri, storage_options=loc.delta_storage_options())
config = dt.metadata().configuration
assert config.get("delta.logRetentionDuration") == "interval 7 days", config
assert config.get("delta.enableExpiredLogCleanup") == "false", config
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you not checking all 3 properties?

);

// Case 2: both options set — both properties should be reflected in the table metadata.
let table_dir = TempDir::new().unwrap();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the TempDir delete the file after the test is over?

Copy link
Copy Markdown

@mythical-fred mythical-fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, approving. Clean addition that mirrors the existing checkpoint_interval plumbing.

Highlights

  • validate_delta_interval faithfully mirrors delta-rs grammar (lowercase interval, closed unit list, non-negative integer) so misconfigured tables fail fast at config-load time instead of silently falling back at log-cleanup time. The rejection test list (INTERVAL, Interval, 30 days, interval -5 days, etc.) is thorough and the rationale is documented in the function doc comment.
  • The behavior change has tests on both sides: validate_delta_interval_accepts_* / rejects_malformed lock the grammar, test_log_retention_table_properties proves the properties actually land in the Delta metadata for both unset and set cases, and the Python integration test test_log_retention_properties_land_in_metadata covers the live pipeline path with checkpoint_interval=1 to exercise the cleanup hook surface.
  • warn_about_table_property_discrepancies is a nice touch for the existing-table case where delta-rs silently keeps the old config.

Soft non-blocking notes

  1. False-positive risk in the discrepancy warning (output.rs, the new check closure): for enable_expired_log_cleanup the user-supplied value is Some(true) (a no-op vs the delta-rs default), but a freshly created table may not write delta.enableExpiredLogCleanup into the metadata configuration at all, so actual.get(key) returns <unset> and the warning fires spuriously. Consider treating <unset> as "matches default" for this specific key, or only warning when the requested value disagrees with the documented Delta Lake default.
  2. The warning lives behind mode = append against an existing table — that path is the most user-visible reason these properties get silently ignored. The Rust test test_log_retention_table_properties and the Python test only exercise the "fresh table" happy path; an integration test that pre-creates a table with different delta.logRetentionDuration and then runs the pipeline in mode = append would lock the warning behavior and the silent-ignore contract. Not blocking, but a logical follow-up.
  3. Manual Test Plan box says "Tested under various scenarios and modes" — the description boilerplate from #6027 would be more useful if it named the scenarios (fresh truncate, append-existing, append-fresh, invalid-interval). Small ask.
  4. Nit: log_retention_duration is a String because delta-rs wants the raw interval string, but it would be more ergonomic for callers to accept a Rust Duration and serialize it to the delta grammar internally — would also remove the need for validate_delta_interval to mirror upstream. Out of scope for this PR; just noting.
  5. The new fields are added to the existing Default / make_config test helpers in three benches plus the parallel-test fixtures with : None, — good, that catches any future struct literal that forgets them.

PR description still has the Changelog updated checkbox unchecked; please tick it (or add a changelog line) before merge if you have a changelog rotation. Nothing else.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[adapters] log retention config for delta output connector

3 participants