Skip to content

feat: typed DataFusionException hierarchy across the JNI boundary#81

Open
LantaoJin wants to merge 1 commit into
apache:mainfrom
LantaoJin:feat/typed-exceptions
Open

feat: typed DataFusionException hierarchy across the JNI boundary#81
LantaoJin wants to merge 1 commit into
apache:mainfrom
LantaoJin:feat/typed-exceptions

Conversation

@LantaoJin
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

Every error from the native side currently surfaces as a single java.lang.RuntimeException. try_unwrap_or_throw in native/src/errors.rs calls env.throw_new("java/lang/RuntimeException", message) for every Err(...) it sees, so a DataFusionError::Plan and a DataFusionError::ResourcesExhausted and a DataFusionError::IoError are indistinguishable on the Java side. Callers that want to retry on transient resource pressure but not on a malformed query plan have only the message string to discriminate on — fragile and brittle to upstream wording changes.

This PR adds a typed Java exception hierarchy and routes each DataFusionError variant onto the matching subclass at the JNI layer, so callers can catch (PlanException) / catch (ResourcesExhaustedException) selectively without scraping messages.

The hierarchy is shallow on purpose. Multiple Rust variants fold into one Java class when they're the same kind of problem from the caller's standpoint (e.g. IoError / ObjectStore / ParquetError / AvroError all surface as IoException). Variants with no clean caller-facing category — Internal, Substrait, Collection, plus any new variant a future DataFusion bump introduces — fall through to the parent DataFusionException. The mapping is the public contract; the underlying variant set is not.

ArrowError gets per-variant treatment because it's a 21-variant grab bag that mixes IO with execution-time failures. Routing the whole ArrowError arm to a single class would put e.g. SELECT 1/0 (ArrowError::DivideByZero) under IoException, which is wrong from a caller's standpoint. A small classify_arrow helper splits the variants: IoError / IpcError go to IoException, SchemaError / ParseError go to PlanException, and the arithmetic / cast / compute variants go to ExecutionException.

Sequenced before upstream #55 ("propagate Java stack traces from JVM upcalls into DataFusionError"), which is complementary: every subclass ships a (String, Throwable) constructor as the seam #55 plugs into for setting the original Java throwable as cause.

What changes are included in this PR?

  • Java side: new public class DataFusionException (extends RuntimeException) plus 6 subclasses — PlanException, ExecutionException, ResourcesExhaustedException, IoException, NotImplementedException, ConfigurationException. Each ships (String) and (String, Throwable) constructors.
  • Native side: errors.rs downcasts the boxed error to DataFusionError, walks the cause chain via upstream's DataFusionError::find_root(), and routes the root variant through a classify() helper to the right Java class. The thrown message preserves the outer error's full chain so wrapping context isn't lost; only the class is picked from the inner variant. JniResult<T> is unchanged (Box<dyn Error + Send + Sync>) — every ? site keeps working as-is. Rust panics surface as the parent class with a panic: prefix.
  • find_root() matters because DataFusion can wrap a ResourcesExhausted / Plan / etc. inside an ArrowError::ExternalError(...) or a Context(...) chain (upstream documents this exact shape). Without find_root, those would land on the outer arm and be misclassified.
  • ArrowError gets a dedicated classify_arrow helper because the enum mixes IO (IoError, IpcError) with execution-time variants (DivideByZero, ArithmeticOverflow, ComputeError, CastError, InvalidArgumentError, MemoryError, External, …) and schema-shaped variants (SchemaError, ParseError). Each lands on the matching Java class instead of all collapsing to IoException.
  • Variants without a clean category (Internal, Substrait, Collection, future variants) fall through to the parent class.
  • native/Cargo.toml adds rlib alongside cdylib to enable Rust unit tests that pin the find_root routing behavior on synthetic wrapper chains; the cdylib artifact the JVM loads is unchanged.

Are these changes tested?

Yes. 10 new Java tests + 7 new Rust unit tests.

Are there any user-facing changes?

No

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: typed DataFusionException hierarchy across the JNI boundary

1 participant