Add first-class support for Blosc2 CTable (.b2z) tables by FrancescAlted · Pull Request #288 · ironArray/Caterva2

FrancescAlted · 2026-07-01T11:50:08Z

Overview

Adds first-class support for Blosc2 heterogeneous tables (blosc2.CTable, compact single-file .b2z) alongside the existing NDArray/.b2nd support — discover, download, inspect, preview, and slice tables through the REST API, Python client, CLI, and web UI. Design rationale lives in plans/ctable-support.md.

Key changes

REST / server

/api/fetch serves .b2z (whole and sliced) as a Blosc2 cframe via CTable.to_cframe(), mirroring the array workflow. Whole tables are excluded from the raw-file FileResponse short-circuit (a whole .b2z is a zip, not a cframe, so returning it raw broke client decode).
read_metadata() → new CTableMetadata model (nrows/ncols/columns/schema_dict/…); open_b2() returns CTable early; .b2z treated as a native suffix.
Upload/download paths switched to the shared BLOSC2_NATIVE_SUFFIXES constant (incl. .b2z) so tables are stored/served as-is.

Python client (behavior change — see caveats)

Reworked leaf-class hierarchy: File → Dataset → {Array, Table}. blosc2.Operand now lives on Array only; Array/Table are exported. root["x.b2nd"] now returns an Array (was Dataset).
New Table API: nrows, ncols, columns, schema, slice/[...] (→ blosc2.CTable), rows(), head().
_fetch_data dispatches decode on known kind instead of trial-and-except sniffing.
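The reworked hierarchy and the resulting isinstance() behavior can be illustrated with a minimal sketch. The class bodies here are placeholders written for illustration, not the code in client.py; only the names File, Dataset, Array, Table and the `<Array: …>` repr shape come from the description above.

```python
class File:
    """Any leaf served by the server (placeholder body, for illustration only)."""

    def __init__(self, path: str):
        self.path = path

    def __repr__(self) -> str:
        # The repr carries the concrete class name, e.g. <Array: x.b2nd>.
        return f"<{type(self).__name__}: {self.path}>"


class Dataset(File):
    """Shared base for Blosc2-native leaves (arrays and tables)."""


class Array(Dataset):
    """N-dimensional array leaf (.b2nd)."""


class Table(Dataset):
    """Heterogeneous table leaf (.b2z)."""


arr = Array("x.b2nd")
tbl = Table("t.b2z")

print(repr(arr))                 # <Array: x.b2nd>
print(isinstance(tbl, Dataset))  # True: tables are Datasets too
print(isinstance(tbl, Array))    # False: Array and Table are siblings
```

Code that previously used `isinstance(x, Dataset)` to mean "this is an array" would need to test against Array instead.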

CLI

cat2-client info/show support .b2z (table-shaped info; show table.b2z[start:stop] prints rows off the cframe).
Also includes a cleaner error message for invalid cat2-client commands.
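The optional row-slice syntax accepted by `show` (table.b2z[start:stop]) could be parsed along these lines. This is a sketch only: `parse_show_target` and its regular expression are hypothetical names chosen for illustration, not the parser in cli.py.

```python
import re

# Path, optionally followed by [start:stop]; either bound may be omitted.
_SLICE_RE = re.compile(r"^(?P<path>.+?)(?:\[(?P<start>-?\d+)?:(?P<stop>-?\d+)?\])?$")


def parse_show_target(arg: str):
    """Split 'table.b2z[start:stop]' into (path, start, stop).

    Missing bounds come back as None so the caller can tell "not given"
    apart from an explicit 0.
    """
    match = _SLICE_RE.match(arg)
    if match is None:
        raise ValueError(f"cannot parse dataset argument: {arg!r}")

    def to_int(text):
        return int(text) if text else None

    return match["path"], to_int(match["start"]), to_int(match["stop"])


print(parse_show_target("table.b2z[10:20]"))  # ('table.b2z', 10, 20)
print(parse_show_target("table.b2z"))         # ('table.b2z', None, None)
```

Returning None for an omitted bound, rather than 0, keeps an explicit `[0:0]` distinguishable from "no slice given".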

Web UI

CTable preview reuses the existing structured-array visualizer (info_view.html): Display tab + a htmx_path_view branch rendering rows/columns; filter/sort hidden for tables (filterable flag). Fixes a pre-existing Meta-tab crash on non-array metadata.

Tests / docs

New caterva2/tests/test_ctable.py (25 tests): metadata, /api/info, /api/download round-trip, whole+slice fetch, client Table, CLI, web preview, and nested/non-identifier column-name regressions.
Design + task plans under plans/.

⚠️ Caveats for reviewers / merge

Requires a newer blosc2 than the current >=4.6.0 pin: the code uses CTable.to_cframe()/blosc2.ctable_from_cframe() and nested-column dotted-path access, which land in 4.7.x. Bump the blosc2 requirement in pyproject.toml before or with the merge; otherwise environments that resolve to an older blosc2 will install cleanly but fail at runtime.
Behavior change: isinstance(x, cat2.Dataset) now matches tables too (tables are Datasets), and array datasets are Array instances / <Array: …> reprs instead of <Dataset: …>. Dataset stays importable as the shared base.

…(), Python client Table class

…z storage handling

Whole-table /api/fetch was returning the raw .b2z zip instead of a cframe, so table[:] failed client-side. Also introduces Array/Table as proper Dataset subclasses (client.py), dispatches cframe decoding by known kind instead of trial/except, adds Table.nrows/columns/head/rows, and treats .b2z as a native Blosc2 suffix for upload/load_from_url/htmx paths so tables round-trip byte-identical. Adds regression tests.

htmx_path_info/htmx_path_view: render a paged row/column preview for CTable using schema_dict(), with Filter/Sort-by hidden (filterable flag) since they don't apply to tables; also fixes a pre-existing crash in the Meta tab template for CTableMetadata (no cparams).

cli.py: `info` prints table-shaped fields instead of crashing on cparams.get(None); `show` parses the optional row-slice syntax (table.b2z[start:stop]) and prints rows via the Table client class instead of calling the array-oriented fetch().

Adds regression tests for both surfaces, plus tests for nested/non-identifier CTable column names (e.g. "trip.sec" struct leaves), now resolved natively by blosc2's CTableRow.__getitem__.

Follow-up fixes from review of the CTable support work:

- client: bound Table.rows() default to [0:50) instead of the whole table, so table.rows() no longer silently fetches every row of a large table (pass stop=self.nrows for all rows).
- server: fix /api/fetch CTable slice resolution to use `is None` instead of truthiness, so table[0:0] returns an empty result rather than the whole table; also normalize negative indices and clamp start/stop to [0, nrows].
- cli: coerce numpy scalars, bytes, and arrays in `show --json` via a json default, matching the web preview's cell handling.
- server: return a clean htmx error (not an uncaught AssertionError) when a filter/sort is requested on a dataset type that does not support it (e.g. a .b2z).
- server: drop a stray comment token in the CTable fetch branch.
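The slice-resolution fix can be illustrated with a small sketch that contrasts a truthiness check with an `is None` check. Both function names are hypothetical and the logic is a reconstruction from the description above, not the server code.

```python
def resolve_truthy(start, stop, nrows):
    """Buggy variant: an explicit 0 is falsy, so stop=0 becomes nrows."""
    start = start or 0
    stop = stop or nrows
    return start, stop


def resolve_row_slice(start, stop, nrows):
    """Resolve optional start/stop into concrete bounds within [0, nrows]."""
    # Only a missing bound gets a default; an explicit 0 is kept.
    if start is None:
        start = 0
    if stop is None:
        stop = nrows
    # Normalize negative indices relative to the end of the table.
    if start < 0:
        start += nrows
    if stop < 0:
        stop += nrows
    # Clamp both bounds to the valid row range.
    start = min(max(start, 0), nrows)
    stop = min(max(stop, 0), nrows)
    return start, stop


print(resolve_truthy(0, 0, 100))         # (0, 100): whole table, the old bug
print(resolve_row_slice(0, 0, 100))      # (0, 0): empty result
print(resolve_row_slice(-10, None, 100)) # (90, 100): last ten rows
```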

Copilot

Pull request overview

Adds end-to-end support for Blosc2 CTable single-file tables (.b2z) across Caterva2’s server APIs, Python client, CLI, and web UI, aligning table handling with existing NDArray workflows (notably via /api/fetch returning cframes for both whole tables and slices).

Changes:

Server: recognize .b2z as a native Blosc2 suffix; add CTableMetadata; extend /api/fetch to stream table cframes; add web preview rendering for tables and guard filter/sort UI.
Client/CLI: introduce Array/Table first-class client types; decode fetch responses by known kind; add CLI info/show behavior for .b2z.
Tests/docs: add comprehensive CTable tests plus design/task plan documents.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.


| File | Description |
| --- | --- |
| plans/ctable-support.md | Full design record + implementation notes for CTable support. |
| plans/ctable-support-tasks.md | Task checklist and acceptance criteria for the implementation. |
| plans/ctable-support-orig-gpt5.5.md | Archived original plan draft for reference. |
| caterva2/tests/test_notebook_bootstrap.py | Updates bootstrap-cell injection expectations (now two cells). |
| caterva2/tests/test_ctable.py | Adds CTable coverage: metadata, fetch/download, client, CLI, and web preview. |
| caterva2/services/templates/info_view.html | Hides filter/sort controls when `filterable=False` (tables). |
| caterva2/services/templates/includes/info_metadata.html | Adds a CTable metadata branch to prevent Meta-tab crashes. |
| caterva2/services/srv_utils.py | Centralizes Blosc2 suffix constants and adds CTable metadata extraction. |
| caterva2/services/server.py | Implements CTable-aware open/metadata/fetch/upload/web-preview behavior. |
| caterva2/models.py | Adds `CTableMetadata` Pydantic model. |
| caterva2/clients/cli.py | Adds `.b2z` table display logic for `info` and `show` (incl. JSON coercion). |
| caterva2/client.py | Refactors client hierarchy and adds `Table` API + kind-based fetch decode. |
| caterva2/__init__.py | Exports `Array` and `Table` symbols. |


Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

…IVE_SUFFIXES

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

A .b2z may hold a TreeStore (a hierarchy of leaves), not just a CTable. Address leaves by path (tree.b2z/level1/ctable) without unfolding to disk: list descends, info/fetch open the leaf, the web tree expands into leaf rows, and the client dispatches by server-reported kind.
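Addressing a leaf by path (tree.b2z/level1/ctable) implies splitting the request path at the container boundary. A minimal sketch of that split follows; `split_at_container` and `CONTAINER_SUFFIXES` are hypothetical names, and the actual helper in the PR may have a different signature and return shape.

```python
from pathlib import PurePosixPath

# Assumption for this sketch: only .b2z files act as containers.
CONTAINER_SUFFIXES = {".b2z"}


def split_at_container(path: str):
    """Return (container_path, inner_key); inner_key is None without a leaf part."""
    parts = PurePosixPath(path).parts
    for i, part in enumerate(parts):
        if PurePosixPath(part).suffix in CONTAINER_SUFFIXES:
            container = str(PurePosixPath(*parts[: i + 1]))
            inner = "/".join(parts[i + 1 :])
            return container, (inner or None)
    # No container component: the whole path is a regular file or directory.
    return path, None


print(split_at_container("tree.b2z/level1/ctable"))  # ('tree.b2z', 'level1/ctable')
print(split_at_container("dir/tree.b2z"))            # ('dir/tree.b2z', None)
```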

A .b2z may hold a TreeStore (a hierarchy of NDArray/CTable leaves), not just a single CTable. Address leaves by virtual path (tree.b2z/level1/ctable) without unfolding to disk:

- split_container_path()/treestore_leaves() split a request path at the .b2z boundary and enumerate leaves.
- API: list descends, info/fetch open the leaf; leaves inherit the container mtime.
- Web: the tree expands a container into leaf rows; info/view tabs work on leaves.

Unify group-like things behind models.Directory (kind="dir", mtime, size, nfiles): info returns it for a real directory, a TreeStore container (root group), and a virtual group inside one. Group size is summed cheaply from the .b2z zip index (no per-leaf open).

Client: new Group class (browsable/indexable); Root.__getitem__ dispatches on server-reported kind (dir->Group, ctable->Table, shape->Array, else File), reusing already-fetched metadata to avoid a double info round-trip.
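Summing a group's size from the zip index, without opening any leaf, might look like the sketch below. It relies only on the standard zipfile module; the member names, the prefix convention, and the choice of `compress_size` are assumptions for illustration, since the internal layout of a .b2z is not described here.

```python
import io
import zipfile


def group_size(zip_source, prefix: str = "") -> int:
    """Sum member sizes under `prefix` using only the zip central directory."""
    with zipfile.ZipFile(zip_source) as zf:
        return sum(
            info.compress_size
            for info in zf.infolist()
            if info.filename.startswith(prefix)
        )


# Build a small stand-in archive in memory (hypothetical member names).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("level1/a.b2nd", b"x" * 10)
    zf.writestr("level1/b.b2nd", b"x" * 20)
    zf.writestr("level2/c.b2nd", b"x" * 5)

print(group_size(buf, "level1/"))  # 30
print(group_size(buf))             # 35
```

Reading `infolist()` touches only the archive's index, so the cost does not grow with the size of the stored data.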

@public

A .b2z TreeStore now shows as a single mountable row in the datasets list instead of auto-expanding into one row per leaf (which flooded the list). Clicking the plug icon "mounts" it as a virtual root alongside @personal/@shared/@public, with its own checkbox and an unmount control; checking it lists that container's leaves.

Mount state lives client-side in localStorage (key caterva2:mounted), bridged to the server via an htmx:configRequest listener that adds `mounted=` params to the root-list request. No new endpoints, DB, or per-user server state.

- server.py: htmx_root_list accepts `mounted` and filters it through get_rootdir_or_none; htmx_path_list renders TreeStores as single mountable rows and expands mounted containers into leaf rows.
- templates: root_list.html renders mounted roots, path_list.html adds the plug button, home.html holds the mount/unmount JS.
- Rename Directory.kind "dir" -> "group" (models/client/cli).

Review fixes:

- Avoid stored XSS: read paths from data-* attributes at click time instead of interpolating into inline handler JS source.
- Don't 500 the listing on a corrupt/non-TreeStore/stale .b2z (untrusted localStorage input); skip it in both the walk and virtual-root loops.
- Dedup roots in mountRoot so a repeat click can't double-list leaves.
- stat() the container once per mount instead of once per leaf.
- Update test_treestore.py for single-row behavior; add coverage for virtual-root leaf expansion and bogus-container safety.

Extend the .b2z TreeStore virtual-descent/mount feature to plain HDF5 files: a srv_utils.open_container() adapter (_TreeStoreAdapter / _HDF5Adapter) unifies list/info/fetch across both formats, backed by a file-less HDF5Proxy.open_leaf() (in-memory, no .b2nd written to disk). Client Group gains unfold/copy/move/remove/download, since a plain .h5 now dispatches to Group instead of File. Also fix the mounted-root unmount (x) icon: a long root name (typical for .h5 files) grew the row past the sidebar's fixed column, pushing the icon under the neighboring higher-z-index panel and eating the click. The icon now sits in a fixed, absolutely-positioned slot in the row's own gutter, so it stays clickable and its checkbox lines up with the regular-root rows above it.
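The adapter idea (one list/open interface over two container formats) can be sketched with in-memory stand-ins. Everything below is hypothetical: the class names, the `leaves()`/`open_leaf()` methods, and the dict-based backends merely mimic a flat key store and a nested group hierarchy, and do not reproduce _TreeStoreAdapter, _HDF5Adapter, or srv_utils.open_container().

```python
from pathlib import PurePosixPath


class FlatStoreAdapter:
    """Stand-in for a store that is already keyed by full leaf path."""

    def __init__(self, store: dict):
        self._store = store

    def leaves(self):
        return sorted(self._store)

    def open_leaf(self, key: str):
        return self._store[key]


class NestedStoreAdapter:
    """Stand-in for a store organized as nested groups."""

    def __init__(self, root: dict):
        self._root = root

    def leaves(self):
        found = []

        def walk(node, prefix):
            for name, child in node.items():
                path = f"{prefix}/{name}"
                if isinstance(child, dict):
                    walk(child, path)  # descend into a group
                else:
                    found.append(path)  # record a leaf

        walk(self._root, "")
        return sorted(found)

    def open_leaf(self, key: str):
        node = self._root
        for name in key.strip("/").split("/"):
            node = node[name]
        return node


def open_container_sketch(path: str, backend: dict):
    """Pick an adapter by file suffix so callers see a single interface."""
    suffix = PurePosixPath(path).suffix
    if suffix == ".b2z":
        return FlatStoreAdapter(backend)
    if suffix == ".h5":
        return NestedStoreAdapter(backend)
    raise ValueError(f"not a supported container: {path}")


tree = open_container_sketch("tree.b2z", {"/level1/ctable": "T"})
h5 = open_container_sketch("data.h5", {"level1": {"arr": "A"}})
print(tree.leaves(), h5.leaves())  # ['/level1/ctable'] ['/level1/arr']
```

Callers that list or open leaves then need no format-specific branches, which is the point of routing both formats through one entry function.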

htmx_path_list reused the container file's stat().st_size for every leaf inside a mounted .b2z/.h5, so all datasets showed the same size. Add a cheap leaf_size() to both container adapters (schunk cbytes for TreeStore, h5py storage size for HDF5, no full proxy needed) and use it for per-leaf rows instead.

…r 500

- get_filtered_array: accept inner_key param, open container member instead of blosc2.open() on the whole file
- htmx_path_view: replace blanket "no filter/sort on container members" 400 with HDF5-only guard; .b2z members now flow through get_filtered_array
- Fix pre-existing 500 on 0-d container members: arr[()] returns unhashable ndarray, broken by `value in header_sort` in template; convert to scalar
- Tests: structured & 0-d leaves in _make_tree fixture, sort asc/desc tests, 0-d view test, i4-no-fields 400 test

- get_filtered_array: HDF5Proxy branch using .indices()/.sort() (materialized, cache-safe). Filter still blocked (needs LazyExpr plumbing on proxy).
- htmx_path_view: narrow HDF5 guard to filter-only; sort passes through. Set filterable=False for HDF5 members (hide filter box in UI).
- hdf5.py: blosc2.asarray(self.dset) instead of self.dset[:] so ingestion streams chunk-by-chunk from HDF5 for >16 MB datasets, with no intermediate full numpy array.
- Tests: structured HDF5 leaf in fixture, sort asc/desc, filter 400, sort on plain-dtype 400, filterable=False assertion, 0-d scalar view fix.

1. Filter-only crash on .b2z members. Root cause is a blosc2 bug: the where-fastpath re-opens the operand's urlpath, which for a TreeStore leaf is the whole .b2z. Worked around in get_filtered_array by detaching filtered members with an in-memory arr.copy() (cache-bounded, same materialization trade-off the filter path already makes). New tests cover filter-only and filter+sort on members.
2. /api/fetch silently dropping filter on members. fetch_data now routes filter requests through get_filtered_array(..., inner_key=inner_key); HDF5-member filters get a clean 400 (raised from a 2-line guard in get_filtered_array), and ValueErrors map to 400 instead of 500. Tested for both .b2z (filtered rows come back) and .h5 (400).
3. Corrupt-member 500s. Added except (RuntimeError, OSError) to the htmx except chain.
4. open_container None-check divergence. New srv_utils.open_container_member() helper replaces all three copies of the open→get→validate pattern (htmx view, fetch, filtered path). The bogus-.b2z-member case now yields "Cannot open container member" instead of the nonsensical "Invalid filter" message (regression test added).
5. Double dataset ingest. HDF5Proxy now materializes once via a memoized _as_blosc2(); argsort (with indices kept as an alias) and sort share the single conversion.
6. Tiny-chunk inheritance cliff. _as_blosc2() ignores degenerate HDF5 chunks (< 1 MiB) and lets blosc2 pick its own chunking.
7. Redundant HDF5Proxy branch in the server. Deleted; the argsort alias lets HDF5 members flow through the generic NDArray path.
8. 0-d comment misattribution. Reworded to name blosc2.NDArray[()] as the 0-d source.
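Item 5 (materialize once, share between argsort and sort, keep indices as an alias) follows a common memoization pattern. The sketch below uses functools.cached_property on a plain-Python stand-in; `ProxySketch` and its list-based data are hypothetical, and the real HDF5Proxy uses a memoized _as_blosc2() method rather than this property.

```python
from functools import cached_property


class ProxySketch:
    """Stand-in proxy whose expensive conversion must run at most once."""

    def __init__(self, values):
        self._values = values
        self.conversions = 0  # counts how often the conversion actually ran

    @cached_property
    def _materialized(self):
        # Expensive step in a real proxy; here it is just a list copy.
        self.conversions += 1
        return list(self._values)

    def argsort(self):
        data = self._materialized
        return sorted(range(len(data)), key=data.__getitem__)

    # Alias kept so callers of the older name keep working.
    indices = argsort

    def sort(self):
        return sorted(self._materialized)


proxy = ProxySketch([3, 1, 2])
print(proxy.argsort())    # [1, 2, 0]
print(proxy.sort())       # [1, 2, 3]
print(proxy.conversions)  # 1
```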
Two bonus fixes along the way: the "unsupported dataset type" asserts became ValueErrors (they were uncaught 500s from /api/fetch and vanish under python -O), and running the suite exposed 4 latent test bugs from the earlier header-sort session (assertions matching row-label/y cells, and raise_for_status() on an intentional 400) — those tests were curl-verified back then because port 8000 was occupied; they're now fixed and passing under pytest.
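The assert-to-ValueError change matters because `assert` statements are removed when Python runs with -O, whereas an explicit raise always executes and can be mapped to a client error. A minimal sketch, with hypothetical kind names and a hypothetical handler standing in for the endpoint:

```python
# Assumption for this sketch: the set of kinds the handler understands.
SUPPORTED_KINDS = {"ndarray", "ctable"}


def check_kind(kind: str) -> None:
    """Validate with an explicit raise so the check survives `python -O`."""
    if kind not in SUPPORTED_KINDS:
        raise ValueError(f"unsupported dataset type: {kind}")


def handle_fetch(kind: str):
    """Map validation failures to a 400-style result instead of a 500."""
    try:
        check_kind(kind)
    except ValueError as exc:
        return 400, str(exc)
    return 200, "ok"


print(handle_fetch("ndarray"))  # (200, 'ok')
print(handle_fetch("bogus"))    # (400, 'unsupported dataset type: bogus')
```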

Datasets panel: clicking a row highlights it as the keyboard cursor (separate from the teal "loaded" indicator); Up/Down move the cursor and focus its link so Enter loads it, starting from whichever dataset is already active if no cursor has been set yet. Display tab: clicking a data row highlights it; Up/Down move the highlight within the loaded window and page in the adjacent window at the edges, continuing the highlight into it. Reuses Bootstrap's border/table-active utilities, no new CSS beyond suppressing the default focus outline on dataset links in favor of the row border.

FrancescAlted added 6 commits June 30, 2026 21:45

Add CTable .b2z support: metadata model, fetch pipeline via to_cframe…

3f4748c

…(), Python client Table class

Much cleaner error message for cat2-client on invalid commands

ebd52a8

Much cleaner error message for cat2-client on invalid commands

86e850a

FrancescAlted requested a review from Copilot July 1, 2026 11:50

Copilot started reviewing on behalf of FrancescAlted July 1, 2026 11:50 View session

Copilot AI reviewed Jul 1, 2026


Comment thread caterva2/services/server.py

Comment thread caterva2/services/server.py

Comment thread caterva2/client.py Outdated

Copilot started work on behalf of FrancescAlted July 1, 2026 12:02 View session

FrancescAlted and others added 2 commits July 1, 2026 14:02

Potential fix for pull request finding

2adefd8

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Fix hardcoded suffix set in get_abspath() to use srv_utils.BLOSC2_NAT…

6ce34ca

…IVE_SUFFIXES

Copilot started work on behalf of FrancescAlted July 1, 2026 12:03 View session

Potential fix for pull request finding

e3fc9b3

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot finished work on behalf of FrancescAlted July 1, 2026 12:04

FrancescAlted added 14 commits July 1, 2026 14:11

Add test for the table[-1] fix

e410ea3

Add plan for virtual descent for h5 too

af2cce6

[WebUI] CTable sorting now works using blosc2's sort_by()

2be8746

[WebUI] Table sorting works with ascending/descending directions now

004e2b9

[WebUI] Mouse/touch scroll is supported in Datasets and Display panels

454b032

FrancescAlted wants to merge 23 commits into main from new-table

3 participants
