Conversation
mihaibudiu
approved these changes
Apr 28, 2026
expose log_retention_duration and enable_expired_log_cleanup for delta sinks so users can use them for configuring the table. Similar to what we do for checkpoint_interval Signed-off-by: Swanand Mulay <73115739+swanandx@users.noreply.github.com>
mihaibudiu
approved these changes
May 21, 2026
| dt = DeltaTable(loc.uri, storage_options=loc.delta_storage_options()) | ||
| config = dt.metadata().configuration | ||
| assert config.get("delta.logRetentionDuration") == "interval 7 days", config | ||
| assert config.get("delta.enableExpiredLogCleanup") == "false", config |
Contributor
There was a problem hiding this comment.
why are you not checking all 3 properties?
| ); | ||
|
|
||
| // Case 2: both options set — both properties should be reflected in the table metadata. | ||
| let table_dir = TempDir::new().unwrap(); |
Contributor
There was a problem hiding this comment.
Does the TempDir delete the file after the test is over?
mythical-fred
approved these changes
May 21, 2026
mythical-fred
left a comment
There was a problem hiding this comment.
LGTM, approving. Clean addition that mirrors the existing checkpoint_interval plumbing.
Highlights
validate_delta_intervalfaithfully mirrors delta-rs grammar (lowercaseinterval, closed unit list, non-negative integer) so misconfigured tables fail fast at config-load time instead of silently falling back at log-cleanup time. The rejection test list (INTERVAL,Interval,30 days,interval -5 days, etc.) is thorough and the rationale is documented in the function doc comment.- The behavior change has tests on both sides:
validate_delta_interval_accepts_*/rejects_malformedlock the grammar,test_log_retention_table_propertiesproves the properties actually land in the Delta metadata for both unset and set cases, and the Python integration testtest_log_retention_properties_land_in_metadatacovers the live pipeline path withcheckpoint_interval=1to exercise the cleanup hook surface. warn_about_table_property_discrepanciesis a nice touch for the existing-table case where delta-rs silently keeps the old config.
Soft non-blocking notes
- False-positive risk in the discrepancy warning (output.rs, the new
checkclosure): forenable_expired_log_cleanupthe user-supplied value isSome(true)(a no-op vs the delta-rs default), but a freshly created table may not writedelta.enableExpiredLogCleanupinto the metadata configuration at all, soactual.get(key)returns<unset>and the warning fires spuriously. Consider treating<unset>as "matches default" for this specific key, or only warning when the requested value disagrees with the documented Delta Lake default. - The warning lives behind
mode = appendagainst an existing table — that path is the most user-visible reason these properties get silently ignored. The Rust testtest_log_retention_table_propertiesand the Python test only exercise the "fresh table" happy path; an integration test that pre-creates a table with differentdelta.logRetentionDurationand then runs the pipeline inmode = appendwould lock the warning behavior and the silent-ignore contract. Not blocking, but a logical follow-up. - Manual Test Plan box says "Tested under various scenarios and modes" — the description boilerplate from #6027 would be more useful if it named the scenarios (fresh truncate, append-existing, append-fresh, invalid-interval). Small ask.
- Nit:
log_retention_durationis aStringbecause delta-rs wants the raw interval string, but it would be more ergonomic for callers to accept a RustDurationand serialize it to the delta grammar internally — would also remove the need forvalidate_delta_intervalto mirror upstream. Out of scope for this PR; just noting. - The new fields are added to the existing
Default/make_configtest helpers in three benches plus the parallel-test fixtures with: None,— good, that catches any future struct literal that forgets them.
PR description still has the Changelog updated checkbox unchecked; please tick it (or add a changelog line) before merge if you have a changelog rotation. Nothing else.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
expose log_retention_duration and enable_expired_log_cleanup for delta sinks so users can use them for configuring the table. Similar to what we do for checkpoint_interval
Fix #6027
Describe Manual Test Plan
Tested under various scenarios and modes
Checklist
Breaking Changes?
Mark if you think the answer is yes for any of these components:
Describe Incompatible Changes
We add new fields to delta output config, shouldn't break existing things