set backend correctly for CUDA+FSDP2+cpu-offload #3574
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
|
@SunMarc Hi, this is a really nice patch! However, I found a corner case when configuring FSDP through kwargs like this:

Accelerator(
    gradient_accumulation_steps=1,
    mixed_precision='bf16',
    fsdp_plugin=FullyShardedDataParallelPlugin(
        fsdp_version=2,
        cpu_offload=True,
    ),
)

Currently, I have to set the backend explicitly to avoid the error, but I haven't had time to find a proper fix:

Accelerator(
    gradient_accumulation_steps=1,
    mixed_precision='bf16',
    fsdp_plugin=FullyShardedDataParallelPlugin(
        fsdp_version=2,
        cpu_offload=True,
    ),
    kwargs_handlers=[
        InitProcessGroupKwargs(
            backend="cuda:nccl,cpu:gloo"
        ),
    ],
) |
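For readers unfamiliar with the backend string in the workaround above: it uses the `device_type:backend` pair form that `torch.distributed.init_process_group` accepts, so CUDA tensors go through nccl and CPU tensors (such as offloaded parameters) go through gloo. The sketch below only illustrates how such a string reads; `parse_backend_string` is a hypothetical helper written for this note and is not part of accelerate or PyTorch.

```python
def parse_backend_string(backend: str) -> dict[str, str]:
    """Split a 'device:backend' list such as 'cuda:nccl,cpu:gloo' into a dict.

    Hypothetical illustration only; the real parsing happens inside PyTorch.
    """
    mapping = {}
    for pair in backend.split(","):
        device, sep, name = pair.strip().partition(":")
        if not sep or not device or not name:
            raise ValueError(f"expected 'device:backend', got {pair!r}")
        mapping[device] = name
    return mapping


# The string from the workaround maps each device type to one backend.
print(parse_backend_string("cuda:nccl,cpu:gloo"))
```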
|
Indeed that's an edge case that we might need to fix if we want to allow users to depend only on the plugin in the future. cc @S1ro1 |
|
I guess the easiest way for now is to update the kwargs that are passed to the partial state, depending on |
I think we should just set gloo by default together with nccl whenever FSDP2 is used. The async checkpointing I'm working on also requires gloo, so defaulting to both seems sensible, even if it adds a little overhead at launch.
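The defaulting idea in this comment could look roughly like the sketch below. It is not the accelerate implementation: `pick_backend` and its parameters are hypothetical names, and it assumes a CUDA machine where nccl would otherwise be the default. An explicitly supplied backend always wins, so the manual `InitProcessGroupKwargs` workaround keeps working.

```python
from typing import Optional


def pick_backend(fsdp_version: int, user_backend: Optional[str] = None) -> str:
    """Choose a process-group backend string (hypothetical helper).

    Assumption: CUDA devices, with nccl as the usual default.
    """
    # Respect whatever the user passed explicitly.
    if user_backend is not None:
        return user_backend
    # With FSDP2, also register gloo for CPU tensors (cpu-offload,
    # async checkpointing), at the cost of a slightly slower launch.
    if fsdp_version == 2:
        return "cuda:nccl,cpu:gloo"
    return "nccl"


print(pick_backend(fsdp_version=2))
```

Keying the default on the FSDP version alone, rather than on `cpu_offload`, follows the suggestion that other FSDP2 features also need gloo.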
|
Okay, then we can do that in the async checkpoint PR.
What does this PR do?
Supersedes #3544