
Fix get_lora_param_count with model-specific per-component table #695

Merged
nealwu merged 3 commits into thinking-machines-lab:main from nealwu:neal/fix-lora-param-count on May 11, 2026

Conversation

@nealwu (Member) commented May 7, 2026

Removes the previous implementation, which was off by 10x or more for many newer models, required network access and HF auth for gated repos, and was often extremely slow.

Now mirrors `create_lora_training_client`'s signature so callers can predict the exact trainable parameter count for any Tinker base model:

```
get_lora_param_count(model_name, lora_rank=32,
                     train_mlp=True, train_attn=True, train_unembed=True)
```

The total is additive across the per-component table entries:

```
total = lora_rank * (mlp * train_mlp +
                     attn * train_attn +
                     unembed * train_unembed)
```
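For illustration, a table-backed implementation might look roughly like the sketch below. The table name `_LORA_PARAMS_PER_RANK_BY_COMPONENT` appears in this PR's tests, but its exact structure, the model name, and the numbers shown here are all placeholders, not real Tinker values:

```
# Sketch only: per-rank LoRA params contributed by each component group.
# The model name and numbers below are placeholders, not real Tinker values.
_LORA_PARAMS_PER_RANK_BY_COMPONENT: dict[str, dict[str, int]] = {
    "example-org/Example-8B": {"mlp": 1_234_567, "attn": 345_678, "unembed": 56_789},
}

def get_lora_param_count(
    model_name: str,
    lora_rank: int = 32,
    train_mlp: bool = True,
    train_attn: bool = True,
    train_unembed: bool = True,
) -> int:
    components = _LORA_PARAMS_PER_RANK_BY_COMPONENT[model_name]
    return lora_rank * (
        components["mlp"] * train_mlp
        + components["attn"] * train_attn
        + components["unembed"] * train_unembed
    )
```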

Removed kwargs: `detailed`, `include_experts`, `shared_expert_outer_loras`.

Tests verify the function reproduces an independently-maintained reference table for all 31 Tinker base models across 7 valid flag combinations (31 × 7 = 217 parametrizations), each checked at 7 ranks.
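As a rough sketch of how those parametrizations could be generated (the import path, the `RANKS` values, and the reference-table shape are assumptions; the combination arithmetic is from the PR):

```
import itertools

import pytest

from tinker import get_lora_param_count  # assumed import path

# 7 valid flag combinations: every (train_mlp, train_attn, train_unembed)
# triple except all-False, i.e. 2**3 - 1 = 7.
FLAG_COMBOS = [c for c in itertools.product([False, True], repeat=3) if any(c)]
RANKS = [1, 2, 4, 8, 16, 32, 64]  # assumed values; the PR only says "7 ranks"

# Stand-in for the PR's independently measured reference table; real entries
# would map (model_name, flag_combo, rank) -> measured trainable-param count.
REFERENCE: dict[tuple, int] = {}

# 31 models x 7 flag combos = 217 parametrizations, each looping over 7 ranks.
@pytest.mark.parametrize("flag_combo", FLAG_COMBOS)
@pytest.mark.parametrize("model_name", sorted({key[0] for key in REFERENCE}))
def test_matches_reference(model_name, flag_combo):
    train_mlp, train_attn, train_unembed = flag_combo
    for rank in RANKS:
        assert get_lora_param_count(
            model_name,
            lora_rank=rank,
            train_mlp=train_mlp,
            train_attn=train_attn,
            train_unembed=train_unembed,
        ) == REFERENCE[(model_name, flag_combo, rank)]
```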

Next up: updating docs to reflect the new signature.

Mirrors `create_lora_training_client`'s signature so callers can predict
the exact trainable parameter count for any Tinker base model:

```
get_lora_param_count(model_name, lora_rank=32,
                     train_mlp=True, train_attn=True, train_unembed=True)
```

Removes the previous implementation, which parsed HuggingFace
safetensors headers, was off by 10x or more for many models, and
required network access and HF auth on gated repos.

```
total = lora_rank * (mlp * train_mlp + attn * train_attn
                     + unembed * train_unembed
                     + mlp_attn_extra * (train_mlp and train_attn))
```

The `mlp_attn_extra` term is NemotronH-specific: SSM/Mamba layers are
LoRA-adapted only when both `train_mlp` and `train_attn` are True.
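In table form, that conditional term might have looked roughly like this (a sketch; the key names and numbers are made up, and note that a later commit on this PR removes this branch entirely):

```
# Sketch: NemotronH-style entry whose SSM/Mamba params were counted only
# when both MLP and attention training are enabled. Numbers are made up.
entry = {"mlp": 1_000, "attn": 2_000, "unembed": 300, "mlp_attn_extra": 400}
lora_rank, train_mlp, train_attn, train_unembed = 32, True, True, False
total = lora_rank * (
    entry["mlp"] * train_mlp
    + entry["attn"] * train_attn
    + entry["unembed"] * train_unembed
    + entry.get("mlp_attn_extra", 0) * (train_mlp and train_attn)
)  # -> 32 * (1000 + 2000 + 0 + 400) = 108_800
```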

Removed kwargs: `detailed`, `include_experts`,
`shared_expert_outer_loras`.

Tests verify the function reproduces an independently-maintained
reference table for all 31 Tinker base models across 7 valid flag
combinations (31 × 7 = 217 parametrizations), each checked at 7 ranks.

Next up: updating docs to reflect the new signature.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nealwu changed the title from "Fix get_lora_param_count with hardcoded per-component table" to "Fix get_lora_param_count with model-specific per-component table" on May 7, 2026
@opherlieber (Contributor)

Maybe add a test to validate that all models in `model_info` support `get_lora_param_count`?

@nealwu (Member, Author) commented May 7, 2026

> Maybe add a test to validate that all models in `model_info` support `get_lora_param_count`?

Hmm, I like this idea! But the problem is `model_info` has a bunch of models that aren't actually on Tinker:

8 models live in `model_info` but not in our params dict — none of them appear in `models.json` (Tinker's served-models list), so they likely exist in `model_info` for renderer/HF-metadata purposes only:

Qwen/Qwen3-0.6B
Qwen/Qwen3-1.7B
Qwen/Qwen3-4B
Qwen/Qwen3-4B-Base
Qwen/Qwen3-14B
Qwen/Qwen3-14B-Base
meta-llama/Llama-3.2-1B-Instruct
meta-llama/Llama-3.2-3B-Instruct
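For reference, a version of the suggested check that works around this could compare against the served-models list instead. Everything below is a sketch: the `models.json` path and shape are assumptions, and `_LORA_PARAMS_PER_RANK_BY_COMPONENT` (the table name from this PR's tests) is assumed to be importable:

```
import json

def test_all_served_models_have_param_entries():
    # Sketch: every model Tinker actually serves should have an entry in the
    # per-component params table. The models.json path and shape are assumed.
    with open("models.json") as f:
        served = set(json.load(f))
    missing = served - set(_LORA_PARAMS_PER_RANK_BY_COMPONENT)
    assert not missing, f"no LoRA param entry for: {sorted(missing)}"
```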

nealwu requested a review from opherlieber on May 8, 2026 at 21:27
For the two Nemotron-H models, derive per-component (mlp / attn /
unembed) param counts by partitioning the all-True adapter's tensors
into groups by sub-layer name, rather than measuring each flag
combination directly. The result is additive across all 7 valid flag
combinations, matching every other model in the table.

This changes the values returned by `get_lora_param_count` for partial
flag combinations on the two Nemotron-H models; the default (all-True)
total is unchanged.

Drops the now-unused `mlp_attn_extra` branch and its corresponding key
in the lookup dict.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
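A rough sketch of that partitioning approach. The tensor-name patterns, the assumption of torch-like tensors (`.numel()`), and which group SSM/Mamba tensors land in are all illustrative guesses, not the actual NemotronH module names:

```
from collections import Counter

def partition_param_counts(named_lora_tensors) -> Counter:
    # Sketch: bucket each LoRA tensor of the all-True adapter into a
    # component group by sub-layer name, then sum params per group.
    groups: Counter = Counter()
    for name, tensor in named_lora_tensors:
        if "mlp" in name or "mamba" in name:  # assumed patterns; Mamba->mlp is a guess
            groups["mlp"] += tensor.numel()
        elif "attn" in name:
            groups["attn"] += tensor.numel()
        elif "lm_head" in name:  # unembedding; the name is an assumption
            groups["unembed"] += tensor.numel()
    return groups
```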
@simonguozirui commented May 9, 2026

LGTM after @nealwu walked me through his measurement methodology (I also ran some measurements to validate), and mirroring the client signature is quite neat. So we need to update this list for every model release? Also, does the `mlp_attn_extra` abstraction hold up for the future (right now it's just for the SSM part of NemotronH)?

Do note that if developers and researchers want to compute the number of params themselves and cross-reference, we could let them know that LoRA counts for MoEs use a shared outer LoRA for experts.

```
@pytest.mark.parametrize("model_name", sorted(_REFERENCE_PARAMS_PER_RANK.keys()))
def test_get_lora_param_count_matches_measurements(
    self, model_name: str, flag_combo: tuple[bool, bool, bool]
) -> None:
```

nit: this test and `_LORA_PARAMS_PER_RANK_BY_COMPONENT` seem to come from the same measurement, so it's mostly testing adding up the components. Is that the goal of the test? That won't cover any actual LoRA count measurement issues.

@nealwu (Member, Author) replied:

The difference is the test doesn't assume addition of components and actually measures each configuration, so if there are any inconsistencies (like we saw here) the test would help pick them up.

@nealwu (Member, Author) commented May 11, 2026

> LGTM after @nealwu walked me through his measurement methodology (I also ran some measurements to validate), and mirroring the client signature is quite neat. So we need to update this list for every model release? Also, does the `mlp_attn_extra` abstraction hold up for the future (right now it's just for the SSM part of NemotronH)?
>
> Do note that if developers and researchers want to compute the number of params themselves and cross-reference, we could let them know that LoRA counts for MoEs use a shared outer LoRA for experts.

(FYI, the `mlp_attn_extra` abstraction is removed now; it was not correct.)

And on "shared outer LoRA for experts": sounds good, I'll update the docs to explain this and have you help double-check it!

nealwu merged commit 76133f8 into thinking-machines-lab:main on May 11, 2026. 6 checks passed.
nealwu deleted the neal/fix-lora-param-count branch on May 11, 2026 at 16:17.