Spark DataSource backed by a DataFusion TableProvider over ADBC

**Is your feature request related to a problem or challenge?**

Spark users want to read data from a DataFusion `TableProvider` through a native Spark `DataSourceV2`. Today there is no first-class path: the options are a bespoke per-operation JNI surface, which adds native code to maintain, or copying data out of process.

**Describe the solution you'd like**

A Spark `DataSourceV2` connector that places the native boundary at a **standard ADBC driver**. Spark talks to the upstream arrow-adbc Java driver manager (`adbc-core` + `adbc-driver-jni`), which loads a native DataFusion ADBC cdylib and returns arrow-java `ArrowReader`s. Their batches are consumed zero-copy as `ArrowColumnVector`s, using the Arrow libraries the cluster already provides. This reuses the upstream ADBC bindings rather than reimplementing them.
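
From the user's side, a read through the proposed connector might look roughly like the sketch below. The format name `adbc-datafusion` and the `target_partitions` option come from the scope list in this issue; the `driver` and `table` option keys and the helper name `read_datafusion_table` are hypothetical placeholders, not the connector's confirmed API.

```python
def read_datafusion_table(spark, driver_path, table, target_partitions=4):
    """Sketch of the intended read path through the proposed connector.

    `spark` is expected to be a SparkSession. Only the format name and
    `target_partitions` are taken from this issue; the other option keys
    are assumptions made for illustration.
    """
    return (
        spark.read.format("adbc-datafusion")
        .option("driver", driver_path)  # hypothetical: path to the ADBC cdylib
        .option("table", table)         # hypothetical: table to scan
        .option("target_partitions", str(target_partitions))
        .load()
    )
```

The result would be an ordinary Spark DataFrame, so `select`, `filter`, and `limit` calls on it are what the connector would have a chance to push down.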

Scope:
- `adbc-datafusion` format registered as a `DataSourceV2`; schema probed on the driver.
- Projection / filter / limit pushdown via Substrait, with a SQL fallback.
- Multi-partition reads (`executePartitioned` / `readPartition`) and a `target_partitions` option.
- Per-executor connection pool to amortize driver/database setup across task slots.
- An example DataFusion ADBC driver cdylib plus end-to-end (PySpark) coverage.
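
To make the SQL fallback for pushdown concrete, here is a minimal sketch of how a pushed-down projection, filter set, and limit could be rendered into a single query when Substrait is not available. `ScanRequest`, `quote_ident`, and `to_fallback_sql` are hypothetical names for illustration, and the sketch assumes the filters have already been translated into SQL predicate strings; it is not the implementation in #111.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ScanRequest:
    """Hypothetical container for what Spark pushed down to the scan."""
    table: str
    columns: Sequence[str] = ()   # empty means "all columns"
    filters: Sequence[str] = ()   # already-rendered SQL predicates
    limit: Optional[int] = None


def quote_ident(name: str) -> str:
    # Double-quote the identifier and escape embedded double quotes.
    return '"' + name.replace('"', '""') + '"'


def to_fallback_sql(req: ScanRequest) -> str:
    """Render the pushed-down scan as one SQL statement."""
    cols = ", ".join(quote_ident(c) for c in req.columns) or "*"
    sql = f"SELECT {cols} FROM {quote_ident(req.table)}"
    if req.filters:
        # Pushed filters are conjunctive, so join them with AND.
        sql += " WHERE " + " AND ".join(f"({f})" for f in req.filters)
    if req.limit is not None:
        sql += f" LIMIT {req.limit}"
    return sql


if __name__ == "__main__":
    print(to_fallback_sql(ScanRequest("events", ["id", "ts"], ["id > 10"], 100)))
```

One design point the sketch glosses over: a limit is only safe to push down when every filter was pushed as well, otherwise Spark would apply the remaining filters to an already truncated result.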

**Describe alternatives you've considered**

A plain-C scan ABI plus a hand-written JNI shim (discussed in #103 / #104). The ADBC approach instead reuses standard, separately reviewed bindings and a stable driver contract.

**Additional context**

Implemented in #111.

Spark DataSource backed by a DataFusion TableProvider over ADBC #112