Skip to content

fix(large-refs): cleanup based on table read#4716

Open
icecrasher321 wants to merge 1 commit into
stagingfrom
fix/cleanup-logs-query
Open

fix(large-refs): cleanup based on table read#4716
icecrasher321 wants to merge 1 commit into
stagingfrom
fix/cleanup-logs-query

Conversation

@icecrasher321
Copy link
Copy Markdown
Collaborator

Summary

Cleanup large refs based on table ref

Type of Change

  • Other: Performance

Testing

Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel
Copy link
Copy Markdown

vercel Bot commented May 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
docs Ready Ready Preview, Comment May 22, 2026 3:27am

Request Review

@cursor
Copy link
Copy Markdown

cursor Bot commented May 22, 2026

PR Summary

Medium Risk
Changes add new DB-backed metadata and update log/payload flows to record and prune large execution values, which can affect data retention/deletion behavior and background cleanup correctness. Risk is moderated by bounded batch limits and added tests, but mistakes could lead to premature deletions or leaked storage objects.

Overview
Introduces a new DB-backed metadata system for durable “large execution values” (execution_large_values, execution_large_value_references, execution_large_value_dependencies) and wires it into execution lifecycle.

Durable large-value writes now register owner + dependency metadata when persisting (storeLargeValue), and will clean up the uploaded object (and fail durable writes) if metadata registration fails.

Log and pause/snapshot persistence now updates reference tables (replaceLargeValueReferencesWithClient) so cleanup no longer needs to scan execution_data JSON. The cleanup worker is refactored to delete execution log files in bulk, separately select and delete unreferenced large values based on the new tables (with grace/limits), prune metadata tombstones, and cap Trigger.dev concurrency for the task.

Also tightens chunkedBatchDelete row-limit behavior by tracking attempted rows, and adds focused tests for the new cleanup flow and metadata module.

Reviewed by Cursor Bugbot for commit 4812504. Configure here.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 4812504. Configure here.

Comment thread apps/sim/lib/execution/payloads/large-value-metadata.ts
Comment thread apps/sim/background/cleanup-logs.ts
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 22, 2026

Greptile Summary

This PR replaces a costly LATERAL unnest full-scan approach for large-ref cleanup with a dedicated DB-driven model: three new tables (execution_large_values, execution_large_value_references, execution_large_value_dependencies) track ownership, live references, and transitive dependency closures, enabling efficient bounded cleanup queries.

  • New large-value-metadata.ts: registers each large value at write time, records reference sets atomically inside existing transactions (execution log finalization, pause/resume), and prunes stale metadata in bounded batches via pruneLargeValueMetadata.
  • cleanup-logs.ts refactor: drops execution-data scanning from the log-deletion batch; replaces it with cleanupLargeExecutionValues reading directly from execution_large_values using a correlated unreferencedLargeValuePredicate, plus a cleanupLargeValueMetadata pass for tombstone/reference GC.
  • batch-delete.ts fix: switches the row-limit tracker to a dedicated attempted counter and caps selectChunk to the remaining budget to avoid over-fetching.

Confidence Score: 3/5

The core logic is well-designed and test coverage is solid, but the storage-delete-before-DB-mark ordering in the cleanup path can leave keys permanently uncleanable if the DB update fails and the storage provider is non-idempotent for missing objects.

The cleanup path deletes files from storage before calling markLargeValuesDeleted. If the DB call throws, those keys keep deleted_at = null and are re-selected on every subsequent run. If the storage provider returns errors for non-existent objects, the circuit-breaker (result.deleted === 0 → break) silently skips them every run, leaving them permanently stuck.

apps/sim/background/cleanup-logs.ts (deleteLargeValueKeys ordering) and packages/db/schema.ts (executionLargeValueReferences source column constraint)

Important Files Changed

Filename Overview
apps/sim/background/cleanup-logs.ts Major refactor replacing LATERAL unnest scan with table-driven cleanup. Has an ordering risk where storage deletion before DB marking can leave keys permanently uncleanable on DB failure.
apps/sim/lib/execution/payloads/large-value-metadata.ts New file implementing owner registration, reference tracking, dependency closure computation, and metadata pruning. Logic is sound with budget-guarded closure and transactional writes.
apps/sim/lib/execution/payloads/store.ts Adds registerPersistedValueOwner and deleteUntrackedPersistedValue to storeLargeValue with correct orphan cleanup and error propagation for requireDurable contexts.
apps/sim/lib/logs/execution/logger.ts Adds replaceLargeValueReferencesWithClient inside finalizeLog transaction, ensuring execution_log references are written atomically with the log update.
apps/sim/lib/workflows/executor/human-in-the-loop-manager.ts Adds paused_snapshot reference tracking inside persistPauseResult and updateSnapshotAfterResume transactions with safe snapshotWorkspaceId guard.
packages/db/migrations/0212_gifted_patriot.sql Creates three new tables with appropriate FKs, partial indexes for cleanup patterns, and composite PKs. Missing EOF newline.
packages/db/schema.ts Adds three Drizzle table definitions matching the migration. source column in executionLargeValueReferences has no DB-level check constraint.
apps/sim/lib/cleanup/batch-delete.ts Fixes totalRowLimit tracking with a dedicated attempted counter, caps selectChunk to remaining budget, and correctly tracks failed rows.

Reviews (1): Last reviewed commit: "fix(large-refs): cleanup based on table ..." | Re-trigger Greptile

Comment thread apps/sim/background/cleanup-logs.ts
Comment thread apps/sim/background/cleanup-logs.ts
Comment thread packages/db/migrations/0212_gifted_patriot.sql
Comment thread packages/db/schema.ts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant