
feat: configure SessionContext and RuntimeEnv via builder#28

Merged
andygrove merged 10 commits into apache:main from andygrove:feat/session-context-config
May 13, 2026

@andygrove
Member

Which issue does this PR close?

Rationale for this change

Previously, new SessionContext() was the only way to construct a context, which left callers with DataFusion's defaults for batch size, partitioning, statistics collection, information_schema, memory limits, and disk-spill location. This PR adds a typed builder so callers can configure those settings at construction time:

SessionContext ctx = SessionContext.builder()
    .batchSize(8192)
    .targetPartitions(4)
    .collectStatistics(true)
    .informationSchema(true)
    .memoryLimit(1L << 30, 0.8)
    .tempDirectory(spillDir)
    .build();

The existing zero-arg constructor is untouched, so current callers continue to work unchanged.

What changes are included in this PR?

  • New SessionContextBuilder Java class with typed setters and input validation (positive sizes, fraction in (0, 1]).
  • New static SessionContext.builder() factory.
  • New project-local proto/session_options.proto describing the wire format.
  • Maven protobuf-maven-plugin split into two named executions so the local proto compiles alongside the upstream-downloaded ones.
  • Rust prost-build wired up via a new native/build.rs to generate matching Rust types.
  • New JNI function createSessionContextWithOptions that decodes the proto, applies the knobs to SessionConfig + RuntimeEnvBuilder, and constructs the context via SessionContext::new_with_config_rt.

Object stores, default catalog/schema, and per-query overrides are intentionally out of scope for this PR.
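
The input validation mentioned in the list above (positive sizes, fraction in (0, 1]) could look roughly like the following. This is a minimal sketch, not the PR's code: the class name ValidatingOptionsSketch, the getters, the rejects helper, and the choice of IllegalArgumentException are all assumptions made for illustration.

```java
/** Hypothetical stand-in for the builder's validation; not the PR's actual class. */
final class ValidatingOptionsSketch {
    private Integer batchSize;       // null means "unset"
    private Long memoryLimitBytes;   // null means "unset"
    private Double memoryFraction;   // null means "unset"

    ValidatingOptionsSketch batchSize(int rows) {
        if (rows <= 0) {
            throw new IllegalArgumentException("batchSize must be positive, got " + rows);
        }
        this.batchSize = rows;
        return this;
    }

    ValidatingOptionsSketch memoryLimit(long maxBytes, double fraction) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("memory limit must be positive, got " + maxBytes);
        }
        // Written as a negated conjunction so that NaN is rejected too.
        if (!(fraction > 0.0 && fraction <= 1.0)) {
            throw new IllegalArgumentException("fraction must be in (0, 1], got " + fraction);
        }
        this.memoryLimitBytes = maxBytes;
        this.memoryFraction = fraction;
        return this;
    }

    Integer batchSize() { return batchSize; }
    Long memoryLimitBytes() { return memoryLimitBytes; }
    Double memoryFraction() { return memoryFraction; }

    /** Returns true if the action fails validation (assumed exception type). */
    static boolean rejects(Runnable action) {
        try {
            action.run();
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ValidatingOptionsSketch ok = new ValidatingOptionsSketch()
            .batchSize(8192)
            .memoryLimit(1L << 30, 0.8);
        System.out.println(ok.batchSize() + " rows, " + ok.memoryLimitBytes() + " bytes");
        System.out.println(rejects(() -> new ValidatingOptionsSketch().batchSize(0)));
        System.out.println(rejects(() -> new ValidatingOptionsSketch().memoryLimit(1024, 1.5)));
    }
}
```

Keeping unset fields as null mirrors the idea that unset options stay absent on the wire, so the native side can fall back to DataFusion's defaults.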

Are these changes tested?

Yes. SessionContextBuilderTest adds:

  • A proto round-trip test that verifies every setter sets the right presence bit and value.
  • A test that unset fields remain absent in the proto.
  • A behavioral test for information_schema: with the option on, metadata SQL succeeds; with it off, the same SQL throws. This is the strongest end-to-end check that an option actually crosses JNI and changes runtime behavior.
  • An all-knobs happy-path test using @TempDir for the spill directory and running SELECT 1 end-to-end.

Existing SessionContextTest continues to pass, confirming the zero-arg path is unchanged.

Are there any user-facing changes?

A new public API: SessionContext.builder() and the SessionContextBuilder class.

@andygrove andygrove merged commit 3698b9a into apache:main May 13, 2026
1 check passed
@andygrove andygrove deleted the feat/session-context-config branch May 13, 2026 16:44
LantaoJin added a commit to LantaoJin/datafusion-java that referenced this pull request May 14, 2026
… / getOption

DataFusion's SessionConfig carries roughly 200 keys split across seven sections (datafusion.execution.*, datafusion.optimizer.*, etc.). The Java SessionContextBuilder introduced in apache#28 covers only six of those keys with named setters, and there is no Java surface to read any config value back at all. Rather than ship ~200 named get/set pairs one at a time, mirror DataFusion's existing string-keyed surface (ConfigOptions::set) as a generic escape hatch on the builder + context.

Makes two additions to session_options.proto: a `repeated ConfigOption options = 7;` field plus the matching ConfigOption message. `repeated` is used instead of `map<string,string>` because protobuf maps decode into a Rust HashMap whose iteration order is randomized -- that would silently break overlapping-key cases like `datafusion.optimizer.enable_dynamic_filter_pushdown` (whose setter has side effects on the per-operator `enable_*_dynamic_filter_pushdown` flags). The Java builder gains setOption(key, value) and setOptions(Map). Java-side storage is a LinkedHashMap so caller insertion order is preserved end to end.
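
The ordering guarantee described above can be illustrated with plain java.util.LinkedHashMap. This is a minimal sketch under assumptions: OptionOrderSketch and toWireOrder are hypothetical names, the "key=value" strings only stand in for the repeated ConfigOption entries, and the use of Objects.requireNonNull for null rejection is a guess at the real behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical model of the builder's option storage; not the PR's actual class. */
final class OptionOrderSketch {
    static final String UMBRELLA = "datafusion.optimizer.enable_dynamic_filter_pushdown";
    static final String TOPK = "datafusion.optimizer.enable_topk_dynamic_filter_pushdown";

    // LinkedHashMap iterates in insertion order, unlike HashMap.
    private final Map<String, String> options = new LinkedHashMap<>();

    OptionOrderSketch setOption(String key, String value) {
        Objects.requireNonNull(key, "key");
        Objects.requireNonNull(value, "value");
        options.put(key, value); // a repeated key overwrites the value in place
        return this;
    }

    /** Flattens the map into an ordered list, standing in for a repeated proto field. */
    List<String> toWireOrder() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : options.entrySet()) {
            out.add(e.getKey() + "=" + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        OptionOrderSketch b = new OptionOrderSketch()
            .setOption(UMBRELLA, "true")
            .setOption(TOPK, "false");
        for (String entry : b.toWireOrder()) {
            System.out.println(entry);
        }
    }
}
```

One detail worth knowing about LinkedHashMap: re-inserting an existing key updates its value but keeps its original position, so a caller who wants a key applied later in the sequence cannot achieve that by setting it a second time.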

On the native side, free-form options are applied via config.options_mut().set(k, v)? before SessionContext construction. Free-form entries are applied after the typed setters, so an explicit setOption call wins over a typed setter for the same knob, and within the entry list the caller's last write wins -- both for same-key duplicates (LinkedHashMap dedups) and for overlapping side-effect keys. The ? form (rather than SessionConfig::set_str's .unwrap()) means unknown keys or unparseable values surface as a RuntimeException with DataFusion's error message instead of panicking the JVM.
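
The precedence rules above can be modeled in a few lines of Java. This is a toy model, not the native implementation: ApplyOrderSketch, build, and the three-key config table are assumptions, the initial values are placeholders, and the umbrella key's side effect is reduced to a single per-operator flag.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of "typed setters first, then free-form options in caller order". */
final class ApplyOrderSketch {
    static final String BATCH = "datafusion.execution.batch_size";
    static final String UMBRELLA = "datafusion.optimizer.enable_dynamic_filter_pushdown";
    static final String TOPK = "datafusion.optimizer.enable_topk_dynamic_filter_pushdown";

    private final Map<String, String> config = new LinkedHashMap<>();

    ApplyOrderSketch() {
        // Placeholder starting values for the toy config table.
        config.put(BATCH, "8192");
        config.put(UMBRELLA, "true");
        config.put(TOPK, "true");
    }

    void set(String key, String value) {
        if (!config.containsKey(key)) {
            // Stands in for the error surfaced instead of a panic.
            throw new RuntimeException("unknown config key: " + key);
        }
        config.put(key, value);
        if (key.equals(UMBRELLA)) {
            config.put(TOPK, value); // toy version of the umbrella flag's side effect
        }
    }

    static Map<String, String> build(Integer typedBatchSize,
                                     List<Map.Entry<String, String>> options) {
        ApplyOrderSketch c = new ApplyOrderSketch();
        if (typedBatchSize != null) {
            c.set(BATCH, typedBatchSize.toString()); // typed setter applied first
        }
        for (Map.Entry<String, String> e : options) {
            c.set(e.getKey(), e.getValue()); // free-form entries applied afterwards, in order
        }
        return c.config;
    }

    public static void main(String[] args) {
        Map<String, String> result = build(1024, List.of(
            Map.entry(BATCH, "4096"),
            Map.entry(UMBRELLA, "true"),
            Map.entry(TOPK, "false")));
        System.out.println(result);
    }
}
```

In this model, swapping the last two entries would leave the topk flag enabled, which is the kind of silent flip a randomized map order could cause.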

Adds SessionContext.getOption(key) on the constructed context (not on the builder, since the value reflects "what DataFusion actually compiled" -- only knowable post-construction). The native side walks ctx.copied_config().options().entries() and returns ConfigEntry.value as a String, or null if the key is known but unset and has no default. Unknown keys throw RuntimeException to mirror setOption's strictness.

datafusion.runtime.* keys (memory limit, temp directory, cache sizes) live on a separate RuntimeEnv config object and have several upstream-shaped round-trip pitfalls that don't apply to the SessionConfig subtree (lazy default tempdir, per-session datafusion-Xxxxxx spill suffixes, OS-specific path separators, integer K/M/G truncation, sub-1KB byte formatting, the unlimited sentinel, multi-statement SET clobbering). Both setOption and getOption reject runtime keys with a clear error pointing at the typed memoryLimit() / tempDirectory() setters; round-trippable runtime support is tracked as a follow-up PR that needs a per-context side-cache.
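
The runtime-key rejection could be as simple as a prefix check shared by setOption and getOption. The sketch below is hypothetical: RuntimeKeyGuardSketch, requireSessionKey, the exception type, and the message wording are assumptions; only the datafusion.runtime. prefix and the pointer to the typed setters come from the description above.

```java
/** Hypothetical guard shared by the set and get paths; not the PR's actual code. */
final class RuntimeKeyGuardSketch {
    static final String RUNTIME_PREFIX = "datafusion.runtime.";

    /** Returns the key unchanged, or throws if it belongs to the runtime subtree. */
    static String requireSessionKey(String key) {
        if (key == null) {
            throw new NullPointerException("key");
        }
        if (key.startsWith(RUNTIME_PREFIX)) {
            // Assumed wording; the point is that the message names the typed setters.
            throw new IllegalArgumentException(
                key + " is a runtime key; use the typed memoryLimit() / tempDirectory() setters instead");
        }
        return key;
    }

    public static void main(String[] args) {
        System.out.println(requireSessionKey("datafusion.optimizer.prefer_hash_join"));
        try {
            requireSessionKey("datafusion.runtime.memory_limit");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```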

Tests cover the proto round-trip with explicit on-the-wire ordering assertions, bulk setOptions, null rejection, override-typed-setter semantics (asserted by reading the value back via getOption), last-write-wins for repeated keys, an unknown-key error path on both set and get, default-fallback on get, a closed-context guard on get, the runtime-key rejection on both set and get with messages that point at the typed setters, and the order-preservation case where setting the umbrella `enable_dynamic_filter_pushdown` flag followed by an explicit `enable_topk_dynamic_filter_pushdown=false` correctly leaves topk disabled (the override winning over the umbrella's side effect).

Common knobs this unlocks include parquet pushdown_filters / bloom_filter_on_read, optimizer prefer_hash_join, execution time_zone, sql_parser dialect, and explain show_statistics -- previously inaccessible from Java without adding a named get/set per key.

Development

Successfully merging this pull request may close these issues.

Add ability to configure SessionContext and RuntimeEnv
