feat: configure SessionContext and RuntimeEnv via builder #28
Merged
This was referenced May 13, 2026
LantaoJin added a commit to LantaoJin/datafusion-java that referenced this pull request May 14, 2026
… / getOption

DataFusion's SessionConfig carries roughly 200 keys split across seven sections (datafusion.execution.*, datafusion.optimizer.*, etc). The Java SessionContextBuilder introduced in apache#28 covers six of them with named setters, and there is no Java surface to read any config value back at all. Rather than ship ~200 named get/set pairs one at a time, mirror DataFusion's existing string-keyed surface (ConfigOptions::set) as a generic escape hatch on the builder + context.

Makes two additions to session_options.proto: a `repeated ConfigOption options = 7;` field plus the matching ConfigOption message. `repeated` is used instead of `map<string,string>` because protobuf maps decode into a Rust HashMap whose iteration order is randomized -- that would silently break overlapping-key cases like `datafusion.optimizer.enable_dynamic_filter_pushdown` (whose setter has side effects on the per-operator `enable_*_dynamic_filter_pushdown` flags).

The Java builder gains setOption(key, value) and setOptions(Map). Java-side storage is a LinkedHashMap so caller insertion order is preserved end to end. On the native side, free-form options are applied via config.options_mut().set(k, v)? before SessionContext construction. Map entries are applied after the typed setters so an explicit setOption call wins over a typed setter for the same knob, and within the entry list the caller's last write wins -- both for same-key duplicates (LinkedHashMap dedups) and for overlapping side-effect keys. The ? form (rather than SessionConfig::set_str's .unwrap()) means unknown keys or unparseable values surface as a RuntimeException with DataFusion's error message instead of panicking the JVM.

Adds SessionContext.getOption(key) on the constructed context (not on the builder, since the value reflects "what DataFusion actually compiled" -- only knowable post-construction). The native side walks ctx.copied_config().options().entries() and returns ConfigEntry.value as a String, or null if the key is known but unset and has no default. Unknown keys throw RuntimeException to mirror setOption's strictness.

datafusion.runtime.* keys (memory limit, temp directory, cache sizes) live on a separate RuntimeEnv config object and have several upstream-shaped round-trip pitfalls that don't apply to the SessionConfig subtree (lazy default tempdir, per-session datafusion-Xxxxxx spill suffixes, OS-specific path separators, integer K/M/G truncation, sub-1KB byte formatting, the unlimited sentinel, multi-statement SET clobbering). Both setOption and getOption reject runtime keys with a clear error pointing at the typed memoryLimit() / tempDirectory() setters; round-trippable runtime support is tracked as a follow-up PR that needs a per-context side-cache.

Tests cover the proto round-trip with explicit on-the-wire ordering assertions, bulk setOptions, null rejection, override-typed-setter semantics (asserted by reading the value back via getOption), last-write-wins for repeated keys, an unknown-key error path on both set and get, default-fallback on get, a closed-context guard on get, the runtime-key rejection on both set and get with messages that point at the typed setters, and the order-preservation case where setting the umbrella `enable_dynamic_filter_pushdown` flag followed by an explicit `enable_topk_dynamic_filter_pushdown=false` correctly leaves topk disabled (the override winning over the umbrella's side effect).

Common knobs this unlocks include parquet pushdown_filters / bloom_filter_on_read, optimizer prefer_hash_join, execution time_zone, sql_parser dialect, and explain show_statistics -- previously inaccessible from Java without adding a named get/set per key.
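The ordering rules in the commit message above can be illustrated with a small toy model. This is only a sketch, not the real binding: the class `OptionOrderSketch` and its `apply` method are hypothetical, the config is modeled as a plain string map, and the umbrella side effect is modeled for the single per-operator flag the commit message names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of "typed setters first, then free-form options in caller order". */
final class OptionOrderSketch {
    static final String UMBRELLA = "datafusion.optimizer.enable_dynamic_filter_pushdown";
    static final String TOPK = "datafusion.optimizer.enable_topk_dynamic_filter_pushdown";
    // Flags the umbrella setter overwrites; only the one named in the text is modeled here.
    static final List<String> PER_OPERATOR = List.of(TOPK);

    /** Starts from the typed values, then applies free-form entries in iteration order. */
    static Map<String, String> apply(Map<String, String> typed, Map<String, String> freeForm) {
        Map<String, String> config = new LinkedHashMap<>(typed);
        for (Map.Entry<String, String> e : freeForm.entrySet()) {
            String key = e.getKey();
            if (key.startsWith("datafusion.runtime.")) {
                // Modeled after the described rejection of runtime keys.
                throw new IllegalArgumentException(
                    "runtime key rejected: " + key + "; use the typed setters instead");
            }
            config.put(key, e.getValue());
            if (key.equals(UMBRELLA)) {
                // Side effect: the umbrella flag overwrites the per-operator flags.
                for (String flag : PER_OPERATOR) {
                    config.put(flag, e.getValue());
                }
            }
        }
        return config;
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put(UMBRELLA, "true");
        options.put(TOPK, "false"); // explicit override applied after the umbrella
        System.out.println(apply(Map.of(), options).get(TOPK)); // prints "false"

        Map<String, String> reversed = new LinkedHashMap<>();
        reversed.put(TOPK, "false");
        reversed.put(UMBRELLA, "true"); // umbrella applied last clobbers topk
        System.out.println(apply(Map.of(), reversed).get(TOPK)); // prints "true"
    }
}
```

The two runs differ only in iteration order, which is why an order-randomizing container on either side of the wire would make the result nondeterministic. One caveat of the toy model: a plain LinkedHashMap keeps a re-put key at its original position, so the position of a repeated key is that of its first insertion; whether the real builder handles this differently is not stated in the commit message.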
This was referenced May 21, 2026
Which issue does this PR close?
Rationale for this change
Previously, new SessionContext() was the only way to construct a context, leaving callers with DataFusion's defaults for batch size, partitioning, statistics, information_schema, memory limits, and disk-spill location. This PR adds a typed builder so callers can configure those knobs at construction time.

The existing zero-arg constructor is untouched, so current callers continue to work unchanged.
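As a rough illustration of what a typed builder with up-front validation looks like, here is a minimal, self-contained sketch. It is not the PR's SessionContextBuilder: the class `BuilderValidationSketch`, the `batchSize` setter, the two-argument `memoryLimit(bytes, fraction)` signature, and the default values are all assumptions made for the example. Only the validation rules (positive sizes, fraction in (0, 1]) come from the description.

```java
/** Hypothetical stand-in for a typed builder; names and signatures are assumptions. */
final class BuilderValidationSketch {
    private int batchSize = 8192;        // placeholder default for the sketch
    private long memoryLimitBytes = -1;  // -1 models "no limit configured"
    private double memoryFraction = 1.0;

    /** Sizes must be strictly positive. */
    BuilderValidationSketch batchSize(int rows) {
        if (rows <= 0) {
            throw new IllegalArgumentException("batchSize must be positive, got " + rows);
        }
        this.batchSize = rows;
        return this;
    }

    /** The limit must be positive and the fraction must lie in (0, 1]. */
    BuilderValidationSketch memoryLimit(long bytes, double fraction) {
        if (bytes <= 0) {
            throw new IllegalArgumentException("memory limit must be positive, got " + bytes);
        }
        // Written as a negated range check so NaN is rejected as well.
        if (!(fraction > 0.0 && fraction <= 1.0)) {
            throw new IllegalArgumentException("fraction must be in (0, 1], got " + fraction);
        }
        this.memoryLimitBytes = bytes;
        this.memoryFraction = fraction;
        return this;
    }

    int batchSize() { return batchSize; }
    long memoryLimitBytes() { return memoryLimitBytes; }
    double memoryFraction() { return memoryFraction; }

    public static void main(String[] args) {
        BuilderValidationSketch b = new BuilderValidationSketch()
            .batchSize(1024)
            .memoryLimit(64L * 1024 * 1024, 0.8);
        System.out.println(b.batchSize() + " rows, " + b.memoryLimitBytes() + " bytes");

        try {
            new BuilderValidationSketch().memoryLimit(1024, 0.0); // 0 is outside (0, 1]
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Validating in the setter rather than at build time means a bad value fails at the call site that supplied it, before anything crosses JNI.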
What changes are included in this PR?
- SessionContextBuilder Java class with typed setters and input validation (positive sizes, fraction in (0, 1]).
- SessionContext.builder() factory.
- proto/session_options.proto describing the wire format.
- protobuf-maven-plugin split into two named executions so the local proto compiles alongside the upstream-downloaded ones.
- prost-build wired up via a new native/build.rs to generate matching Rust types.
- createSessionContextWithOptions that decodes the proto, applies the knobs to SessionConfig + RuntimeEnvBuilder, and constructs the context via SessionContext::new_with_config_rt.

Object stores, default catalog/schema, and per-query overrides are intentionally out of scope for this PR.
Are these changes tested?
Yes. SessionContextBuilderTest adds:
- information_schema on → meta SQL succeeds; off → meta SQL throws. This is the strongest end-to-end check that an option actually crosses JNI and changes runtime behavior.
- @TempDir for the spill directory and running SELECT 1 end-to-end.

Existing SessionContextTest continues to pass, confirming the zero-arg path is unchanged.

Are there any user-facing changes?
A new public API: SessionContext.builder() and the SessionContextBuilder class.