Add binary serialization of the parsed AST (JRuby support, step 2) #2999

Merged
soutaro merged 1 commit into master from claude/practical-mendel-hnqws8-2
Jun 22, 2026
@soutaro soutaro commented Jun 16, 2026

Stacked on #2998 (step 1). Review/merge that first; this PR's diff is against it.

Why

For JRuby, the parser runs inside WebAssembly and the AST has to come back to Ruby without the Ruby C API. This PR adds that bridge: the parser serializes the AST to a compact binary buffer, and pure-Ruby code rebuilds the same RBS::AST objects on the other side.

Both ends are generated from config.yml, right alongside the existing C→Ruby translation (ast_translation.c), so they can't drift apart.

What's here

  • src/serialize.c (rbs_serialize_node) — generated C encoder that walks the C AST into the binary format. Self-contained C, so it compiles into both the extension and the .wasm.
  • lib/rbs/wasm/serialization_schema.rb — generated table describing every node.
  • lib/rbs/wasm/deserializer.rb — pure-Ruby decoder, the counterpart of ast_translation.c. Locations are rebuilt through the public RBS::Location API, so the same decoder works whether Location is C-backed (CRuby) or pure Ruby (JRuby, step 3).
  • docs/wasm_serialization.md — the wire format.
  • The template generator now also emits .rb files (with a Ruby-style header).
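
To make the encoder/schema/decoder split concrete, here is a minimal Ruby sketch of a schema-driven round trip. Everything in it is an assumption for illustration: the tags, the field kinds, the byte layout (one tag byte, little-endian 32-bit lengths), and the names `SCHEMA`, `Node`, `encode`, and `Decoder` are hypothetical and are not the format described in docs/wasm_serialization.md or the generated code. In the PR the encoder is C; it is written in Ruby here only to keep the sketch in one language.

```ruby
# Hypothetical schema table: tag => [node name, [[field name, field kind], ...]].
# Both the encoder and the decoder are driven by this one table.
SCHEMA = {
  1 => [:Alias,    [[:name, :string]]],
  2 => [:Optional, [[:type, :node]]],
  3 => [:Tuple,    [[:types, :node_list]]],
}.freeze

TAG_BY_NAME = SCHEMA.to_h { |tag, (name, _fields)| [name, tag] }.freeze

# Stand-in for an AST node: a kind plus a hash of field values.
Node = Struct.new(:kind, :fields)

# Encoder: writes the tag byte, then each field in schema order.
def encode(node, out = String.new(encoding: Encoding::BINARY))
  tag = TAG_BY_NAME.fetch(node.kind)
  out << [tag].pack("C")
  _name, fields = SCHEMA.fetch(tag)
  fields.each do |field, kind|
    value = node.fields.fetch(field)
    case kind
    when :string
      bytes = value.b                        # binary copy of the string
      out << [bytes.bytesize].pack("V") << bytes
    when :node
      encode(value, out)
    when :node_list
      out << [value.size].pack("V")
      value.each { |child| encode(child, out) }
    end
  end
  out
end

# Decoder: reads the tag, looks up the schema, and reads fields in the same order.
class Decoder
  def initialize(bytes)
    @bytes = bytes.b
    @pos = 0
  end

  def read_node
    tag = read("C", 1)
    name, fields = SCHEMA.fetch(tag)
    Node.new(name, fields.to_h { |field, kind| [field, read_field(kind)] })
  end

  private

  def read_field(kind)
    case kind
    when :string
      len = read("V", 4)
      str = @bytes.byteslice(@pos, len).force_encoding(Encoding::UTF_8)
      @pos += len
      str
    when :node then read_node
    when :node_list then Array.new(read("V", 4)) { read_node }
    end
  end

  # Reads `size` bytes at the cursor and unpacks them with `format`.
  def read(format, size)
    value = @bytes.byteslice(@pos, size).unpack1(format)
    @pos += size
    value
  end
end
```

Because both directions iterate the same table in the same order, adding a field to the table changes the encoder and decoder together, which is the property the PR gets by generating both ends from config.yml.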

How it's validated (on CRuby, no WASM needed)

The extension exposes _parse_signature_to_bytes / _parse_type_to_bytes / _parse_method_type_to_bytes. test/rbs/wasm/serialization_test.rb parses each input twice — once through the normal C→Ruby translation, once through serialize→deserialize — and asserts the two trees are deeply identical, down to locations and string encodings (stricter than RBS ==, which ignores location/comment).

Coverage: the entire bundled RBS corpus (core + stdlib + sig, ~340 files) plus type and method-type batteries. All green locally (4 tests, 389 assertions).
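
As an illustration of what "deeply identical" can mean in practice, the following is a hedged sketch of a structural comparison that is stricter than `==`. It is not the helper used in test/rbs/wasm/serialization_test.rb; `deep_identical?` and `DemoNode` are hypothetical names, and comparing by instance variables is an assumption about how such a check could be written.

```ruby
# Compares two object graphs field by field instead of trusting #==.
def deep_identical?(a, b)
  return false unless a.class == b.class

  case a
  when String
    # Same bytes AND same encoding; String#== alone can ignore the encoding.
    a == b && a.encoding == b.encoding
  when Array
    a.size == b.size && a.zip(b).all? { |x, y| deep_identical?(x, y) }
  when Hash
    a.size == b.size && a.all? { |k, v| b.key?(k) && deep_identical?(v, b[k]) }
  when Symbol, Numeric, NilClass, TrueClass, FalseClass
    a == b
  else
    # Generic objects: same instance variables, each deeply identical.
    ivars = a.instance_variables.sort
    ivars == b.instance_variables.sort &&
      ivars.all? { |iv| deep_identical?(a.instance_variable_get(iv), b.instance_variable_get(iv)) }
  end
end

# Hypothetical node whose #== ignores location, mimicking the behavior
# the PR attributes to RBS ==.
class DemoNode
  attr_reader :name, :location

  def initialize(name, location)
    @name = name
    @location = location
  end

  def ==(other)
    other.is_a?(DemoNode) && name == other.name
  end
end
```

With this sketch, two `DemoNode` values that differ only in `location` compare equal under `==` but not under `deep_identical?`, and the same holds for `"foo"` versus `"foo".b`, which differ only in encoding.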

This de-risks the hardest part of the JRuby work entirely on CRuby, before any WASM/JVM is involved.

Notes

  • Found a latent bug while doing this: rbs_hash_t never updates its length field (unlike rbs_node_list_t, which does in rbs_node_list_append). Nothing currently reads it, so it's harmless today, but the serializer can't trust it — it counts hash entries by walking the list. Worth a separate fix to rbs_hash_set if you'd like.
  • Heads up, unrelated to this PR: a local steep check currently dies with RBS::DuplicatedMethodDefinitionError: Module#ruby2_keywords — it's defined in both core/module.rbs:1701 and steep/patch.rbs:4. That blocks the type-checker from even building its environment here.
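
The rbs_hash_t note above can be modeled in a few lines. This is a Ruby model of the situation, not the C code: `ModelHash`, `HashNode`, and `count_entries` are hypothetical names, and the real rbs_hash_t layout and rbs_hash_set behavior may differ. It only shows why a consumer that cannot trust a stored length falls back to walking the entry list.

```ruby
# One entry in a singly linked list of key/value pairs.
HashNode = Struct.new(:key, :value, :next_node)

# Model of a hash whose `length` field is never updated on insertion.
class ModelHash
  attr_reader :head, :length

  def initialize
    @head = nil
    @tail = nil
    @length = 0
  end

  def set(key, value)
    node = HashNode.new(key, value, nil)
    if @tail
      @tail.next_node = node
    else
      @head = node
    end
    @tail = node
    # @length is deliberately left untouched to model the stale field.
  end
end

# What a serializer can do instead of reading `length`: count by traversal.
def count_entries(hash)
  count = 0
  node = hash.head
  while node
    count += 1
    node = node.next_node
  end
  count
end
```

Counting costs one extra pass over the entries, which is the price of not depending on a field that nothing keeps up to date.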

Next step (#step 3): load rbs_parser.wasm with Chicory on JRuby, feed its output to this deserializer, and branch lib/rbs.rb on RUBY_ENGINE.
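
For the RUBY_ENGINE branch mentioned in the next step, a minimal sketch could look like the following. `RUBY_ENGINE` is a standard Ruby constant ("ruby" on CRuby, "jruby" on JRuby); the `parser_backend` method and the `:wasm` / `:c_extension` symbols are hypothetical and only illustrate the selection, not what lib/rbs.rb will actually contain.

```ruby
# Picks a parser backend from the running engine name.
# The engine is a parameter so the choice can be exercised on any runtime.
def parser_backend(engine = RUBY_ENGINE)
  case engine
  when "jruby"
    :wasm         # assumed: WASM parser plus the pure-Ruby deserializer
  else
    :c_extension  # assumed: existing C extension path
  end
end
```

Calling `parser_backend` with no argument reports the choice for the current runtime; how other engines should be handled is left open here.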

https://claude.ai/code/session_01LTveMt3NLbYHEboXuzAKpA


Generated by Claude Code

@soutaro soutaro force-pushed the claude/practical-mendel-hnqws8-2 branch from 4ce3226 to 2f4e0c4 Compare June 16, 2026 14:43
Base automatically changed from claude/practical-mendel-hnqws8 to master June 22, 2026 02:20
@soutaro soutaro force-pushed the claude/practical-mendel-hnqws8-2 branch from 2f4e0c4 to b73d3a7 Compare June 22, 2026 05:06
@soutaro soutaro added this to the RBS 4.1 milestone Jun 22, 2026
This is the bridge that lets the WebAssembly build hand a parsed AST back to
Ruby without the Ruby C API: the parser serializes the tree to a compact binary
buffer, and pure-Ruby code rebuilds the same RBS::AST objects on the other side.
This is what will let RBS run on JRuby.

Both ends are generated from config.yml, alongside the existing C -> Ruby
translation, so they stay in sync:

- src/serialize.c (rbs_serialize_node): walks the C AST into the binary format.
- lib/rbs/wasm/serialization_schema.rb: the table the decoder follows.
- lib/rbs/wasm/deserializer.rb: pure-Ruby decoder, the counterpart of
  ast_translation.c. Locations go through the public RBS::Location API so it
  works whether Location is C-backed (CRuby) or pure Ruby (JRuby).

The format is documented in docs/wasm_serialization.md.

To validate it on CRuby, the extension exposes `_parse_*_to_bytes`, and
test/rbs/wasm/serialization_test.rb round-trips the whole bundled RBS corpus
(core/stdlib/sig) plus type/method-type batteries, asserting the rebuilt tree is
deeply identical to the direct C -> Ruby translation, down to locations and
string encodings.

Notably, rbs_hash_t does not maintain its `length` field (unlike
rbs_node_list_t), so the serializer counts hash entries by walking the list.

The template generator now also emits Ruby files (with a Ruby-style header).

https://claude.ai/code/session_01LTveMt3NLbYHEboXuzAKpA
@soutaro soutaro force-pushed the claude/practical-mendel-hnqws8-2 branch from b73d3a7 to f8d1b67 Compare June 22, 2026 05:14
@soutaro soutaro merged commit e4464c0 into master Jun 22, 2026
26 checks passed
@soutaro soutaro deleted the claude/practical-mendel-hnqws8-2 branch June 22, 2026 05:18

2 participants