
Fix device KeyError in tied_params_map #3403

Merged
SunMarc merged 1 commit into huggingface:main from dvrogozh:nokey
Mar 25, 2025

Conversation

@dvrogozh
Contributor

@dvrogozh dvrogozh commented Feb 20, 2025

Fixes: #3402

#3448 is a better way to fix the reported #3402, but that fix is XPU-specific (as is #3402, to be fair). I do worry that the issue might still exist for accelerators other than CUDA and XPU, which got aligned behavior after #3448; I don't have a way to verify that, however. So I am rebasing this PR and leaving it to maintainers and users/developers of non-CUDA/XPU devices to take it from here if needed. A better way, however, might be to align the behavior of these accelerators on the PyTorch side and then make a fix similar to #3448.
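For context, a minimal sketch of the failure mode, assuming tied_params_map caches copies of a tied (shared) parameter per execution device, keyed by the tensor's data pointer; names and structure here are illustrative, not the actual accelerate internals:

```python
import torch

# Illustrative only: in accelerate, tied_params_map caches copies of a tied
# (shared) parameter per execution device, keyed by the tensor's data pointer.
weight = torch.zeros(4)
tied_params_map = {weight.data_ptr(): {"cpu": weight}}

# If the outer entry exists but was never populated for the execution device,
# indexing by that device raises the reported error:
try:
    tied_params_map[weight.data_ptr()]["xpu:0"]
except KeyError as err:
    print(f"KeyError: {err}")  # prints KeyError: 'xpu:0', as in #3402
```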

CC: @SunMarc @faaany @zucchini-nlp

@dvrogozh
Contributor Author

@SunMarc @faaany @zucchini-nlp: at the moment this PR just adds an if condition to avoid stepping into the KeyError. However, I am not sure why this situation happens. I am afraid that I might not have addressed the actual issue and only fixed a symptom. Can someone suggest a better fix or explain why this fix would be the correct one?
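A minimal sketch of the kind of guard described here, reusing the illustrative structure from the earlier example; the actual condition in src/accelerate/hooks.py may be structured differently:

```python
import torch

# Hypothetical helper mirroring the guard this PR describes; the real change
# lives in src/accelerate/hooks.py and may differ in shape and naming.
def lookup_tied_param(tied_params_map, value, execution_device):
    # Index the per-device cache only when both the data pointer and the
    # device key are present, so a partially populated map cannot raise
    # a KeyError such as KeyError: 'xpu:0'.
    if (
        tied_params_map is not None
        and value.data_ptr() in tied_params_map
        and execution_device in tied_params_map[value.data_ptr()]
    ):
        return tied_params_map[value.data_ptr()][execution_device]
    return None

weight = torch.zeros(4)
cache = {weight.data_ptr(): {"cpu": weight}}
print(lookup_tied_param(cache, weight, "xpu:0"))  # None instead of KeyError
```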

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@dvrogozh
Contributor Author

@SunMarc, @muellerzr: #3448 is a better way to fix the reported #3402, but that fix is XPU-specific (as is #3402, to be fair). I do worry that the issue might still exist for accelerators other than CUDA and XPU, which got aligned behavior after #3448; I don't have a way to verify that, however. So I am rebasing this PR and leaving it to maintainers and users/developers of non-CUDA/XPU devices to take it from here if needed. A better way, however, might be to align the behavior of these accelerators on the PyTorch side and then make a fix similar to #3448.

Member

@SunMarc SunMarc left a comment


Thanks for the report. Let's merge this nevertheless, just to be more careful.

Comment thread on src/accelerate/hooks.py (outdated)
Fixes: huggingface#3402

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SunMarc SunMarc merged commit 8ab01d3 into huggingface:main Mar 25, 2025


Development

Successfully merging this pull request may close these issues.

Transformers test_cpu_offload tests fail with KeyError: 'xpu:0' (#3402)
