
Fix device KeyError in tied_params_map #3403

Merged
SunMarc merged 1 commit into huggingface:main from dvrogozh:nokey
Mar 25, 2025

Conversation

@dvrogozh
Contributor

@dvrogozh dvrogozh commented Feb 20, 2025

Fixes: #3402

#3448 is a better way to fix the reported #3402, but that fix is XPU-specific (as is #3402, to be fair). I do worry that the issue might still exist for accelerators other than CUDA and XPU, which got aligned behavior after #3448; I don't have a way to verify that, however. So I am rebasing this PR and leaving it to maintainers and users/developers of non-CUDA/XPU devices to take it from here if needed. A better way, however, might be to align the behavior of these accelerators on the PyTorch side and then make a fix similar to #3448.
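For context, a minimal sketch of the failure mode, assuming tied_params_map caches copies of a tied (shared) parameter per execution device, keyed by the tensor's data pointer; names and structure here are illustrative, not the actual accelerate internals:

```python
import torch

# Illustrative only: in accelerate, tied_params_map caches copies of a tied
# (shared) parameter per execution device, keyed by the tensor's data pointer.
weight = torch.zeros(4)
tied_params_map = {weight.data_ptr(): {"cpu": weight}}

# If the outer entry exists but was never populated for the execution device,
# indexing by that device raises the reported error:
try:
    tied_params_map[weight.data_ptr()]["xpu:0"]
except KeyError as err:
    print(f"KeyError: {err}")  # prints KeyError: 'xpu:0', as in #3402
```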

CC: @SunMarc @faaany @zucchini-nlp

@dvrogozh
Contributor Author

@SunMarc @faaany @zucchini-nlp: at the moment this PR just adds an if condition to avoid stepping into the KeyError. However, I am not sure why this situation happens. I am afraid that I might not have addressed the actual issue and only fixed a symptom. Can someone suggest a better fix or explain why this fix would be the correct one?
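A minimal sketch of the kind of guard described here, reusing the illustrative structure from the earlier example; the actual condition in src/accelerate/hooks.py may be structured differently:

```python
import torch

# Hypothetical helper mirroring the guard this PR describes; the real change
# lives in src/accelerate/hooks.py and may differ in shape and naming.
def lookup_tied_param(tied_params_map, value, execution_device):
    # Index the per-device cache only when both the data pointer and the
    # device key are present, so a partially populated map cannot raise
    # a KeyError such as KeyError: 'xpu:0'.
    if (
        tied_params_map is not None
        and value.data_ptr() in tied_params_map
        and execution_device in tied_params_map[value.data_ptr()]
    ):
        return tied_params_map[value.data_ptr()][execution_device]
    return None

weight = torch.zeros(4)
cache = {weight.data_ptr(): {"cpu": weight}}
print(lookup_tied_param(cache, weight, "xpu:0"))  # None instead of KeyError
```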

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@dvrogozh
Contributor Author

@SunMarc, @muellerzr: #3448 is a better way to fix the reported #3402, but that fix is XPU-specific (as is #3402, to be fair). I do worry that the issue might still exist for accelerators other than CUDA and XPU, which got aligned behavior after #3448; I don't have a way to verify that, however. So I am rebasing this PR and leaving it to maintainers and users/developers of non-CUDA/XPU devices to take it from here if needed. A better way, however, might be to align the behavior of these accelerators on the PyTorch side and then make a fix similar to #3448.

Member

@SunMarc SunMarc left a comment


Thanks for the report. Let's merge this nevertheless, just to be more careful.

Comment thread on src/accelerate/hooks.py (outdated)
Fixes: huggingface#3402

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SunMarc SunMarc merged commit 8ab01d3 into huggingface:main Mar 25, 2025


Development

Successfully merging this pull request may close these issues.

Transformers test_cpu_offload tests fail with KeyError: 'xpu:0' (#3402)
