
Add Human-SL KGS-rank ladder (gtp_human<rank>.cfg, 9d→20k) + tunehuman subcommand #1209

Draft
ChinChangYang wants to merge 1 commit into lightvector:master from ChinChangYang:gtp-human-rank-configs


ChinChangYang (Contributor) commented Jun 26, 2026

Summary

Adds a tunehuman subcommand and a set of GTP configs (gtp_human<rank>.cfg) that make
KataGo + the Human-SL net play at a chosen amateur KGS rank, from 9d down to 20k,
where each consecutive rank is exactly 1 KGS rank (1 stone) apart.

tunehuman subcommand

Calibrates humanSLChosenMovePiklLambda (the Human-SL strength dial) to a target winrate by
playing in-process candidate-vs-baseline games at fixed visits:

  • -komi / -cand-color flags realize the KGS 1-rank handicap (weaker rank as Black,
    komi 0.5, no color alternation).
  • Inherits the baseline config's ruleset, so tuning is scored exactly like deployed play.
  • Resumable per-round checkpointing (-resume-file).
  • Unit tests in cpp/tests/testhumansltuner.cpp (run via katago runtests).
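The calibration idea described above can be illustrated with a minimal sketch. This is not the code in cpp/program/humansltuner.cpp: it swaps in a plain bisection, and `simulated_win_rate` is a hypothetical, noise-free stand-in for a batch of candidate-vs-baseline games. It assumes only that the candidate gets weaker (its win rate falls) as λ grows, which matches the direction of the ladder.

```python
import math
from typing import Callable


def simulated_win_rate(lam: float) -> float:
    """Hypothetical stand-in for playing a batch of games at a given lambda.

    Assumption: a larger lambda makes the candidate weaker, so the win rate
    falls. The logistic shape and its constants are made up for illustration.
    """
    return 1.0 / (1.0 + math.exp(8.0 * (lam - 0.3)))


def bisect_lambda(
    win_rate: Callable[[float], float],
    lo: float,
    hi: float,
    target: float = 0.5,
    tol: float = 1e-4,
) -> float:
    """Find lambda in [lo, hi] where a decreasing win_rate crosses target."""
    if not (win_rate(lo) >= target >= win_rate(hi)):
        raise ValueError("target win rate is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if win_rate(mid) > target:
            lo = mid  # candidate still too strong: move toward larger lambda
        else:
            hi = mid  # candidate too weak (or exactly on target)
    return 0.5 * (lo + hi)


calibrated = bisect_lambda(simulated_win_rate, 0.0, 1.5)
```

With real games the win rate is a noisy estimate, so a practical tuner would need enough games per probe (or a sequential stopping rule) before trusting each comparison; the sketch ignores that on purpose.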

The ladder

29 configs, gtp_human{9d…1d,1k…20k}.cfg. The Human-SL net is conditioned on KGS rank, so a
1-rank gap is well-defined: per KGS, 1 rank = 1 stone, realized as an even game where the
weaker side takes Black and the stronger side (White) gets no komi compensation (komi 0.5).
Each rung below the 9d anchor is tuned so that the weaker rank (Black, komi 0.5) wins about
50% of its games against the rung above, with the 95% CI ⊂ [40%, 60%], under Japanese rules.

  • λ rises overall from 0.045 (9d) to 1.223 (20k), though not monotonically: from 2d through
    13k it stays between roughly 0.47 and 0.59. The deep-kyu rungs need near-pure-human play
    because the net's rank profiles compress at the weakest end.
  • docs/HumanSL_Rank_Ladder.md documents the method, ruleset rationale, reproduction commands,
    the full results table (λ / win rate / 95% CI / games per rung), and findings.
  • ladder_step.sh + tune_decide.py + tune_{lambda,maxvisits}.sh are the automated
    sequential root-finder harness that produced the ladder.
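To make the ladder's shape concrete, here is a small sketch that enumerates the 29 rungs and pairs each one with the rung above it. The function names are hypothetical and nothing here comes from the PR's scripts; it only mirrors the naming pattern gtp_human{9d…1d,1k…20k}.cfg and the weaker-rank-as-Black convention stated above.

```python
def ladder_ranks() -> list[str]:
    """Ranks from strongest to weakest: 9d..1d, then 1k..20k."""
    dans = [f"{n}d" for n in range(9, 0, -1)]
    kyus = [f"{n}k" for n in range(1, 21)]
    return dans + kyus


def config_name(rank: str) -> str:
    """Config file name following the gtp_human<rank>.cfg pattern."""
    return f"gtp_human{rank}.cfg"


def pairings(ranks: list[str]) -> list[tuple[str, str]]:
    """(weaker rank as Black, stronger rank as White) for each adjacent pair."""
    return list(zip(ranks[1:], ranks[:-1]))


ranks = ladder_ranks()
configs = [config_name(r) for r in ranks]
matches = pairings(ranks)
```

The 9d anchor has no rung above it, so 29 ranks give 28 handicap pairings; the anchor is calibrated separately, as the Results section describes.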

Results

Experimental calibration results: every tuned rung (8d through 20k) has a 95% Wilson CI of win rate ⊂ [40%, 60%]; the 9d anchor's interval, [39.4%, 58.7%], dips just below the lower bound.
The 9d row is even-game parity vs the modern rank_9d profile; every other row is the
komi-0.5 handicap win rate (weaker rank as Black) vs the rung above it.

| Rank | λ (humanSLChosenMovePiklLambda) | 95% CI of win rate |
|---|---|---|
| 9d (anchor) | 0.04500 | [39.4%, 58.7%] |
| 8d | 0.08680 | [40.5%, 53.6%] |
| 7d | 0.12670 | [40.6%, 56.7%] |
| 6d | 0.19830 | [44.1%, 59.8%] |
| 5d | 0.28064 | [43.6%, 58.8%] |
| 4d | 0.37300 | [43.9%, 56.1%] |
| 3d | 0.45556 | [43.1%, 59.7%] |
| 2d | 0.51330 | [41.9%, 58.1%] |
| 1d | 0.50930 | [42.5%, 55.7%] |
| 1k | 0.48988 | [42.5%, 58.9%] |
| 2k | 0.46755 | [40.8%, 55.7%] |
| 3k | 0.49173 | [41.5%, 58.5%] |
| 4k | 0.47130 | [40.5%, 55.8%] |
| 5k | 0.50720 | [43.6%, 58.9%] |
| 6k | 0.48925 | [42.0%, 59.6%] |
| 7k | 0.53370 | [41.8%, 60.0%] |
| 8k | 0.50640 | [40.2%, 58.1%] |
| 9k | 0.53880 | [41.3%, 54.9%] |
| 10k | 0.59036 | [42.0%, 58.0%] |
| 11k | 0.56458 | [40.5%, 55.8%] |
| 12k | 0.54297 | [42.2%, 59.3%] |
| 13k | 0.58977 | [42.1%, 59.4%] |
| 14k | 0.61625 | [41.3%, 58.7%] |
| 15k | 0.61839 | [40.2%, 58.1%] |
| 16k | 0.67050 | [42.1%, 57.9%] |
| 17k | 0.74130 | [40.9%, 55.7%] |
| 18k | 0.78210 | [40.4%, 52.2%] |
| 19k | 0.89820 | [41.0%, 59.0%] |
| 20k | 1.22270 | [40.6%, 59.4%] |
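For readers who want to check an interval like the ones in the table, the sketch below computes a Wilson score interval and applies the [40%, 60%] acceptance test. The formula is the standard Wilson interval; the function names and the example game counts are hypothetical, since per-rung game counts are listed only in docs/HumanSL_Rank_Ladder.md and not in this description.

```python
import math


def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial win rate (z = 1.96 for ~95%)."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    z2 = z * z
    denom = 1.0 + z2 / games
    center = (p + z2 / (2.0 * games)) / denom
    half = z * math.sqrt(p * (1.0 - p) / games + z2 / (4.0 * games * games)) / denom
    return center - half, center + half


def rung_accepted(wins: int, games: int, lo: float = 0.40, hi: float = 0.60) -> bool:
    """True when the whole 95% Wilson interval lies inside [lo, hi]."""
    low, high = wilson_interval(wins, games)
    return lo <= low and high <= hi


# Hypothetical counts, not taken from the PR's results.
low, high = wilson_interval(50, 100)
```

As a rough guide, a 50-50 split over 100 games gives an interval of about [40.4%, 59.6%], so on the order of 100 games is the minimum at which any rung could pass this criterion.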

Notes

  • Draft — opening for discussion/review. Configs use humanSLProfile = preaz_<rank>
    (pre-AlphaZero KGS-rank profiles) at 400 visits. The subcommand is backend-agnostic.
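As an illustration of what one rung's settings amount to, the sketch below renders a config fragment from a rank and its tuned λ. The λ values are copied from the results table; `humanSLProfile` and `humanSLChosenMovePiklLambda` are the keys named in this PR, while using `maxVisits` to express "400 visits" is an assumption, and the actual gtp_human<rank>.cfg files presumably contain many more settings than these three lines.

```python
# Tuned lambdas for a few rungs, copied from the results table.
LADDER_LAMBDA = {
    "9d": 0.04500,
    "5k": 0.50720,
    "20k": 1.22270,
}


def render_fragment(rank: str, visits: int = 400) -> str:
    """Render an illustrative (incomplete) config fragment for one rung.

    Assumption: maxVisits is the key that fixes the visit count.
    """
    lam = LADDER_LAMBDA[rank]
    return "\n".join(
        [
            f"humanSLProfile = preaz_{rank}",
            f"humanSLChosenMovePiklLambda = {lam:.5f}",
            f"maxVisits = {visits}",
        ]
    )


fragment = render_fragment("5k")
```

Calling `print(fragment)` shows the three lines for the 5k rung; unknown ranks raise `KeyError` because only a subset of the table is included here.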

🤖 Generated with Claude Code

https://claude.ai/code/session_01L2nqY5X9rSVpH65nWHCPaF

…n subcommand

Adds a `tunehuman` subcommand and a complete set of GTP configs that make KataGo
(with the Human-SL net) play at a chosen amateur rank from 9d down to 20k, where
each consecutive rank is exactly 1 KGS rank (1 stone) apart.

tunehuman subcommand (cpp/command/tunehuman.cpp, cpp/program/humansltuner.{cpp,h}):
- Plays in-process candidate-vs-baseline games and calibrates
  humanSLChosenMovePiklLambda (the strength dial) to a target winrate at fixed
  visits, reading the raw winrate (robust to the steep, ceiling-biased λ curve).
- -komi / -cand-color flags for the KGS 1-rank handicap (weaker rank as Black,
  komi 0.5, no color alternation); inherits the baseline config's ruleset so
  tuning is scored exactly like deployed play.
- Resumable per-round checkpointing (-resume-file) to survive long runs.
- Unit tests in cpp/tests/testhumansltuner.cpp (run via `katago runtests`).

The ladder (cpp/configs/gtp_human{9d..1d,1k..20k}.cfg):
- 29 configs. The 9d anchor is even-game parity vs the modern rank_9d profile;
  every weaker rank is tuned so it (Black, komi 0.5) is an even game (50%) vs the
  rung above it, with the 95% Wilson CI inside [40%, 60%]. Japanese rules.
- λ rises 0.045 (9d) → 1.223 (20k); the deep-kyu rungs need near-pure-human play
  because the Human-SL net's rank profiles compress at the weakest end.

docs/HumanSL_Rank_Ladder.md documents the method, ruleset rationale, reproduction
commands, the full results table (λ / win rate / 95% CI / games per rung), and
findings. ladder_step.sh + tune_decide.py + tune_{lambda,maxvisits}.sh are the
automated sequential root-finder harness that produced the ladder.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01L2nqY5X9rSVpH65nWHCPaF
ChinChangYang added a commit to ChinChangYang/KataGo that referenced this pull request Jun 26, 2026
…<year>)

Consolidate rank_*/preaz_* into unified <rank> keys driven by the
empirically-tuned KGS-rank ladder from lightvector/KataGo PR lightvector#1209
(preaz_<rank> + tuned humanSLChosenMovePiklLambda), and rename
proyear_<year> to "Pro <year>" derived from the 9d config with lambda 0.06.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01WoQvc49FJf5btyZsHLawb3
ChinChangYang added a commit to ChinChangYang/KataGo that referenced this pull request Jun 26, 2026
TDD plan: rewrite HumanSLModel around clean menu keys (lightvector#1209 preaz_<rank>
ladder + Pro <year>), legacy input normalization, picker wiring, and a
3-platform verification pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01WoQvc49FJf5btyZsHLawb3
ChinChangYang added a commit to ChinChangYang/KataGo that referenced this pull request Jun 26, 2026
Consolidate rank_*/preaz_* into single <rank> keys driven by PR lightvector#1209's
tuned KGS-rank ladder (preaz_<rank> + per-rank humanSLChosenMovePiklLambda),
rename proyear_<year> to 'Pro <year>' (9d-derived, lambda 0.06), and add
legacy input normalization. AI profile behavior preserved.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01WoQvc49FJf5btyZsHLawb3
ChinChangYang added a commit to ChinChangYang/KataGo that referenced this pull request Jun 26, 2026
Human-profile engine moves search a fixed 400 visits (lightvector#1209 calibration
point) instead of the time budget, so a rank plays at its calibrated
strength; "Time per move" applies only to the AI profile (human profiles
get an Engine/Human toggle). Continuous analysis stays unbounded.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01WoQvc49FJf5btyZsHLawb3
ChinChangYang added a commit to ChinChangYang/KataGo that referenced this pull request Jun 26, 2026
Add GtpCommandBuilder.searchBudgetCommands: AI profile stays time-bounded
with unbounded visits; human rank/pro profiles use a fixed 400 visits
(lightvector#1209 calibration point) + a safety time cap, ignoring Time per move.
genMoveAnalyzeCommands now takes the effective profile.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01WoQvc49FJf5btyZsHLawb3