
Fix get_lora_param_count with model-specific per-component table #695

Merged
nealwu merged 3 commits into thinking-machines-lab:main from nealwu:neal/fix-lora-param-count on May 11, 2026

Conversation

@nealwu (Member) commented May 7, 2026

Removes the previous implementation, which was off by 10x or more for many newer models, required network access and HF auth for gated repos, and was often extremely slow.

Now mirrors `create_lora_training_client`'s signature so callers can predict the exact trainable parameter count for any Tinker base model:

```
get_lora_param_count(model_name, lora_rank=32,
                     train_mlp=True, train_attn=True, train_unembed=True)
```

The total is additive across the per-component table entries:

```
total = lora_rank * (mlp * train_mlp +
                     attn * train_attn +
                     unembed * train_unembed)
```
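For illustration, a table-backed implementation might look roughly like the sketch below. The table name `_LORA_PARAMS_PER_RANK_BY_COMPONENT` appears in this PR's tests, but its exact structure, the model name, and the numbers shown here are all placeholders, not real Tinker values:

```
# Sketch only: per-rank LoRA params contributed by each component group.
# The model name and numbers below are placeholders, not real Tinker values.
_LORA_PARAMS_PER_RANK_BY_COMPONENT: dict[str, dict[str, int]] = {
    "example-org/Example-8B": {"mlp": 1_234_567, "attn": 345_678, "unembed": 56_789},
}

def get_lora_param_count(
    model_name: str,
    lora_rank: int = 32,
    train_mlp: bool = True,
    train_attn: bool = True,
    train_unembed: bool = True,
) -> int:
    components = _LORA_PARAMS_PER_RANK_BY_COMPONENT[model_name]
    return lora_rank * (
        components["mlp"] * train_mlp
        + components["attn"] * train_attn
        + components["unembed"] * train_unembed
    )
```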

Removed kwargs: `detailed`, `include_experts`, `shared_expert_outer_loras`.

Tests verify the function reproduces an independently-maintained reference table for all 31 Tinker base models across 7 valid flag combinations (31 × 7 = 217 parametrizations), each checked at 7 ranks.
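As a rough sketch of how those parametrizations could be generated (the import path, the `RANKS` values, and the reference-table shape are assumptions; the combination arithmetic is from the PR):

```
import itertools

import pytest

from tinker import get_lora_param_count  # assumed import path

# 7 valid flag combinations: every (train_mlp, train_attn, train_unembed)
# triple except all-False, i.e. 2**3 - 1 = 7.
FLAG_COMBOS = [c for c in itertools.product([False, True], repeat=3) if any(c)]
RANKS = [1, 2, 4, 8, 16, 32, 64]  # assumed values; the PR only says "7 ranks"

# Stand-in for the PR's independently measured reference table; real entries
# would map (model_name, flag_combo, rank) -> measured trainable-param count.
REFERENCE: dict[tuple, int] = {}

# 31 models x 7 flag combos = 217 parametrizations, each looping over 7 ranks.
@pytest.mark.parametrize("flag_combo", FLAG_COMBOS)
@pytest.mark.parametrize("model_name", sorted({key[0] for key in REFERENCE}))
def test_matches_reference(model_name, flag_combo):
    train_mlp, train_attn, train_unembed = flag_combo
    for rank in RANKS:
        assert get_lora_param_count(
            model_name,
            lora_rank=rank,
            train_mlp=train_mlp,
            train_attn=train_attn,
            train_unembed=train_unembed,
        ) == REFERENCE[(model_name, flag_combo, rank)]
```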

Next up: updating docs to reflect the new signature.

Mirrors `create_lora_training_client`'s signature so callers can predict
the exact trainable parameter count for any Tinker base model:

```
get_lora_param_count(model_name, lora_rank=32,
                     train_mlp=True, train_attn=True, train_unembed=True)
```

Removes the previous implementation, which parsed HuggingFace
safetensors headers, was off by 10x or more for many models, and
required network access and HF auth on gated repos.

```
total = lora_rank * (mlp * train_mlp + attn * train_attn
                     + unembed * train_unembed
                     + mlp_attn_extra * (train_mlp and train_attn))
```

The `mlp_attn_extra` term is NemotronH-specific: SSM/Mamba layers are
LoRA-adapted only when both `train_mlp` and `train_attn` are True.
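In table form, that conditional term might have looked roughly like this (a sketch; the key names and numbers are made up, and note that a later commit on this PR removes this branch entirely):

```
# Sketch: NemotronH-style entry whose SSM/Mamba params were counted only
# when both MLP and attention training are enabled. Numbers are made up.
entry = {"mlp": 1_000, "attn": 2_000, "unembed": 300, "mlp_attn_extra": 400}
lora_rank, train_mlp, train_attn, train_unembed = 32, True, True, False
total = lora_rank * (
    entry["mlp"] * train_mlp
    + entry["attn"] * train_attn
    + entry["unembed"] * train_unembed
    + entry.get("mlp_attn_extra", 0) * (train_mlp and train_attn)
)  # -> 32 * (1000 + 2000 + 0 + 400) = 108_800
```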

Removed kwargs: `detailed`, `include_experts`,
`shared_expert_outer_loras`.

Tests verify the function reproduces an independently-maintained
reference table for all 31 Tinker base models across 7 valid flag
combinations (31 × 7 = 217 parametrizations), each checked at 7 ranks.

Next up: updating docs to reflect the new signature.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nealwu changed the title from "Fix get_lora_param_count with hardcoded per-component table" to "Fix get_lora_param_count with model-specific per-component table" on May 7, 2026
@opherlieber (Contributor)

Maybe add a test to validate that all models in `model_info` support `get_lora_param_count`?

@nealwu (Member, Author) commented May 7, 2026

> Maybe add a test to validate that all models in `model_info` support `get_lora_param_count`?

Hmm, I like this idea! But the problem is `model_info` has a bunch of models that aren't actually on Tinker:

8 models live in `model_info` but not in our params dict — none of them appear in `models.json` (Tinker's served-models list), so they likely exist in `model_info` for renderer/HF-metadata purposes only:

Qwen/Qwen3-0.6B
Qwen/Qwen3-1.7B
Qwen/Qwen3-4B
Qwen/Qwen3-4B-Base
Qwen/Qwen3-14B
Qwen/Qwen3-14B-Base
meta-llama/Llama-3.2-1B-Instruct
meta-llama/Llama-3.2-3B-Instruct
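For reference, a version of the suggested check that works around this could compare against the served-models list instead. Everything below is a sketch: the `models.json` path and shape are assumptions, and `_LORA_PARAMS_PER_RANK_BY_COMPONENT` (the table name from this PR's tests) is assumed to be importable:

```
import json

def test_all_served_models_have_param_entries():
    # Sketch: every model Tinker actually serves should have an entry in the
    # per-component params table. The models.json path and shape are assumed.
    with open("models.json") as f:
        served = set(json.load(f))
    missing = served - set(_LORA_PARAMS_PER_RANK_BY_COMPONENT)
    assert not missing, f"no LoRA param entry for: {sorted(missing)}"
```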

nealwu requested a review from opherlieber on May 8, 2026 at 21:27
For the two Nemotron-H models, derive per-component (mlp / attn /
unembed) param counts by partitioning the all-True adapter's tensors
into groups by sub-layer name, rather than measuring each flag
combination directly. The result is additive across all 7 valid flag
combinations, matching every other model in the table.

This changes the values returned by `get_lora_param_count` for partial
flag combinations on the two Nemotron-H models; the default (all-True)
total is unchanged.

Drops the now-unused `mlp_attn_extra` branch and its corresponding key
in the lookup dict.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
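A rough sketch of that partitioning approach. The tensor-name patterns, the assumption of torch-like tensors (`.numel()`), and which group SSM/Mamba tensors land in are all illustrative guesses, not the actual NemotronH module names:

```
from collections import Counter

def partition_param_counts(named_lora_tensors) -> Counter:
    # Sketch: bucket each LoRA tensor of the all-True adapter into a
    # component group by sub-layer name, then sum params per group.
    groups: Counter = Counter()
    for name, tensor in named_lora_tensors:
        if "mlp" in name or "mamba" in name:  # assumed patterns; Mamba->mlp is a guess
            groups["mlp"] += tensor.numel()
        elif "attn" in name:
            groups["attn"] += tensor.numel()
        elif "lm_head" in name:  # unembedding; the name is an assumption
            groups["unembed"] += tensor.numel()
    return groups
```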
@simonguozirui commented May 9, 2026

LGTM after @nealwu walked me through his measurement methodology (I also ran some measurements to validate), and mirroring the client signature is quite neat. So we need to update this list for every model release? Also, does the `mlp_attn_extra` abstraction hold up for the future (right now it's just for the SSM part of NemotronH)?

Do note that if developers and researchers want to compute the number of params themselves and cross-reference, we could let them know that LoRA counts for MoEs use a shared outer LoRA for experts.

```
@pytest.mark.parametrize("model_name", sorted(_REFERENCE_PARAMS_PER_RANK.keys()))
def test_get_lora_param_count_matches_measurements(
    self, model_name: str, flag_combo: tuple[bool, bool, bool]
) -> None:
```

nit: this test and `_LORA_PARAMS_PER_RANK_BY_COMPONENT` seem to come from the same measurement, so it's mostly testing adding up the components. Is that the goal of the test? That won't cover any actual LoRA count measurement issues.

@nealwu (Member, Author) replied:

The difference is the test doesn't assume addition of components and actually measures each configuration, so if there are any inconsistencies (like we saw here) the test would help pick them up.

@nealwu (Member, Author) commented May 11, 2026

> LGTM after @nealwu walked me through his measurement methodology (I also ran some measurements to validate), and mirroring the client signature is quite neat. So we need to update this list for every model release? Also, does the `mlp_attn_extra` abstraction hold up for the future (right now it's just for the SSM part of NemotronH)?
>
> Do note that if developers and researchers want to compute the number of params themselves and cross-reference, we could let them know that LoRA counts for MoEs use a shared outer LoRA for experts.

(FYI, the `mlp_attn_extra` abstraction is removed now; it was not correct.)

And on "shared outer LoRA for experts": sounds good, I'll update the docs to explain this and have you help double-check it!

nealwu merged commit 76133f8 into thinking-machines-lab:main on May 11, 2026. 6 checks passed.
nealwu deleted the neal/fix-lora-param-count branch on May 11, 2026 at 16:17.