
feat(substrait): add SessionContext.fromSubstrait gated behind opt-in Cargo feature #80

Open
LantaoJin wants to merge 1 commit into apache:main from LantaoJin:feat/substrait-from-bytes


Which issue does this PR close?

Rationale for this change

SessionContext.fromProto(byte[]) accepts only DataFusion's own LogicalPlanNode proto. Substrait — the cross-engine logical-plan standard that DataFusion already supports through the datafusion-substrait crate — has had no Java-side entry point. Embedders that compile plans elsewhere (Calcite via Isthmus, custom planners, federation hubs, integrations with other engines) had to round-trip through SQL to use the Java binding. That round-trip is lossy: source-side optimisations baked into the Substrait plan are discarded, and SQL is not always expressive enough to round-trip cleanly when plans reference extensions or function variants with no surface SQL form.

What changes are included in this PR?

This PR adds a single new entry point that mirrors the existing fromProto shape but consumes Substrait Plan bytes instead. The implementation is small (~50 LOC of JNI plus ~25 LOC on the Java side); the bulk of the diff is the test that round-trips a hand-built Substrait plan through the JNI bridge.

New public Java API on SessionContext:

public DataFrame fromSubstrait(byte[] planBytes);

planBytes is a serialised substrait.proto.Plan. The plan is translated against this context's catalog: any tables it references must already be registered. The returned DataFrame is lazy and composes with the rest of the API.
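As a rough illustration of the call pattern described above, the sketch below reads serialised plan bytes from a stream and hands them to a planner function. `SubstraitPlanLoader` and `planFromStream` are hypothetical names introduced here and are not part of this PR. The planner is passed as a `Function<byte[], T>` so the sketch compiles without the binding on the classpath, on the assumption that real code would pass `ctx::fromSubstrait`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.function.Function;

// Hypothetical helper (not part of this PR): reads serialised Substrait Plan
// bytes and forwards them to a planner such as SessionContext::fromSubstrait.
final class SubstraitPlanLoader {

    private SubstraitPlanLoader() {}

    // `fromSubstrait` is a stand-in for the real entry point. It is a plain
    // Function so this sketch has no dependency on the DataFusion binding.
    static <T> T planFromStream(InputStream planStream, Function<byte[], T> fromSubstrait) {
        final byte[] planBytes;
        try {
            // InputStream.readAllBytes() requires Java 9 or newer.
            planBytes = planStream.readAllBytes();
        } catch (IOException e) {
            // Surface I/O failures unchecked so callers can chain calls freely.
            throw new UncheckedIOException(e);
        }
        // The bytes are passed through untouched; decoding and table
        // resolution are left entirely to the planner.
        return fromSubstrait.apply(planBytes);
    }
}
```

With the binding available, a call might look like `DataFrame df = SubstraitPlanLoader.planFromStream(in, ctx::fromSubstrait);`, assuming every table the plan references was registered on `ctx` beforehand.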

The feature is off by default, so cargo build (and therefore make, make test, and every user who doesn't need Substrait) stays hermetic and needs no new build prerequisites. Substrait support is opt-in:

| Invocation | Substrait support | Build prerequisites |
| --- | --- | --- |
| cargo build (default) | off (stub handler) | none |
| cargo build --features substrait | on | protoc on PATH |
| cargo build --features substrait,protoc | on (vendored protoc) | cmake on PATH |

The Java surface is the same either way: SessionContext.fromSubstrait(...) is always present. If the feature was compiled out, calls throw a clear "datafusion-jni was built without the substrait Cargo feature; rebuild with --features substrait" error in the JVM. SessionContextSubstraitTest detects this case and skips itself via JUnit's Assumptions.assumeFalse(...), so make test stays green either way.
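One way a test could detect the compiled-out case is to match on the error message quoted above. The sketch below is an assumption about how such a check might look, not the code in SessionContextSubstraitTest: `SubstraitFeatureProbe` and its methods are hypothetical names, the exception is assumed to be a `RuntimeException`, and the planner is again passed as a plain `Function` so the block stands alone.

```java
import java.util.function.Function;

// Hypothetical helper (not part of this PR): recognises the "feature compiled
// out" error by the message text quoted in the PR description.
final class SubstraitFeatureProbe {

    // Fragment of the documented error message; matching on a fragment keeps
    // the check tolerant of prefixes added by exception wrappers.
    static final String DISABLED_MARKER = "built without the substrait Cargo feature";

    private SubstraitFeatureProbe() {}

    // Returns true if the error, or any of its causes, carries the marker.
    static boolean isFeatureDisabled(Throwable error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            String message = t.getMessage();
            if (message != null && message.contains(DISABLED_MARKER)) {
                return true;
            }
        }
        return false;
    }

    // Calls the planner once with an empty payload and reports whether it
    // failed with the "feature compiled out" error. Any other outcome,
    // including an unrelated failure, counts as "feature available".
    // Assumption: the binding reports the condition as a RuntimeException.
    static boolean substraitUnavailable(Function<byte[], ?> fromSubstrait) {
        try {
            fromSubstrait.apply(new byte[0]);
            return false;
        } catch (RuntimeException e) {
            return isFeatureDisabled(e);
        }
    }
}
```

In a JUnit 5 test, the result could then feed `Assumptions.assumeFalse(SubstraitFeatureProbe.substraitUnavailable(ctx::fromSubstrait))` so that the test is reported as skipped rather than failed. Matching on message text is brittle if the wording changes, so a dedicated exception type or a capability query would be a sturdier design if the binding offered one.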

This is intentionally different from PR #60's Avro handling, which is always on.

Are these changes tested?

Yes, 7 new tests in SessionContextSubstraitTest.

Are there any user-facing changes?

Yes, purely additive. New public API:

  • SessionContext.fromSubstrait(byte[]) → DataFrame

No API removals, no deprecations, no behavior change for existing callers. The default cargo build does not pull in datafusion-substrait and adds no new build prerequisites; SessionContext.fromSubstrait(...) is present but throws "feature not enabled" at runtime. Users who need Substrait rebuild with --features substrait (and either install protoc or also enable the protoc helper feature). The native binary size is unchanged unless the feature is enabled.

The new test-scope dependency io.substrait:core:0.81.0 is added to the parent POM's dependencyManagement (with version property substrait.java.version) and to core/pom.xml in test scope only; it does not enter the runtime classpath of the published artifact.


Development

Successfully merging this pull request may close these issues.

feat: expose Substrait plan ingestion via SessionContext.fromSubstrait
