
Fix environment variable forwarding to ray runtime env #265

Merged
jeffreysijuntan merged 3 commits into rllm-org:main from listar2000:fix-ray-runtime-env
Oct 24, 2025

Conversation

@listar2000
Collaborator

@listar2000 listar2000 commented Oct 24, 2025

What is this PR about?

Fix issue #262

Specifically, this PR now correctly overrides (instead of drops) the default PPO_RAY_RUNTIME_ENV in ray_runtime_env.py when certain environment variables are exported from a script. We also add a default VLLM_USE_V1 = 1, since we now pin verl to 0.5.0 (which depends on a vllm version that uses the V1 engine).

Following a suggestion from @kylemontgomery1, for a more complete fix, any driver environment variable with one of the following prefixes is now forwarded:

  • Inference Engines: VLLM_, SGL_, SGLANG_
  • HuggingFace Libraries: HF_, TOKENIZERS_, DATASETS_
  • Training Frameworks: TORCH_, PYTORCH_, DEEPSPEED_, MEGATRON_
  • CUDA/NCCL: NCCL_, CUDA_, CUBLAS_, CUDNN_, NV_, NVIDIA_

We further let the user specify an RLLM_EXCLUDE flag to rule out any prefix or particular variable they want excluded from the forwarding above. For example:

export RLLM_EXCLUDE="VLLM*,CUDA*,NCCL_IB_DISABLE"
# Excludes all VLLM_*, all CUDA_*, and the specific NCCL_IB_DISABLE variable
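To illustrate, the exclusion matching described above could be sketched as follows. This is a minimal sketch; `parse_exclude` and `is_excluded` are hypothetical helper names, not the actual rLLM implementation:

```python
def parse_exclude(spec: str):
    """Split a comma-separated RLLM_EXCLUDE spec into prefix patterns and exact names."""
    prefixes, exact = [], set()
    for token in (t.strip() for t in spec.split(",")):
        if not token:
            continue
        if token.endswith("*"):
            prefixes.append(token[:-1])  # "VLLM*" -> match any name starting with "VLLM"
        else:
            exact.add(token)             # e.g. "NCCL_IB_DISABLE" -> match that name only
    return prefixes, exact


def is_excluded(name: str, prefixes, exact) -> bool:
    """Return True if the variable name matches an exclusion rule."""
    return name in exact or any(name.startswith(p) for p in prefixes)
```

Under this sketch, the example spec above would drop `VLLM_USE_V1` and `NCCL_IB_DISABLE` while a variable like `NCCL_DEBUG` would still be forwarded.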

Corresponding tests and documentation for the ray_runtime_env module are also added.


This PR also makes sure that ray_init_settings can be properly passed into train_workflow_pipeline (so that it is now consistent with train_agent_ppo).
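The ray_init_settings part of the fix amounts to a kwargs merge before calling ray.init. A hedged sketch under assumed semantics; `build_ray_init_kwargs` is a hypothetical helper, and user-supplied keys are assumed to take precedence:

```python
def build_ray_init_kwargs(ray_init_settings=None, runtime_env=None):
    """Merge the default runtime env with user-supplied ray.init settings.

    User-supplied keys win, so callers of train_workflow_pipeline can
    override the defaults, mirroring what train_agent_ppo accepts.
    """
    kwargs = {"runtime_env": runtime_env} if runtime_env is not None else {}
    kwargs.update(ray_init_settings or {})
    return kwargs
```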

@kylemontgomery1
Collaborator

@listar2000 For a complete fix, maybe we can forward the relevant variables from the driver process (in order to ensure env variables not already in PPO_RAY_RUNTIME_ENV get forwarded).

import os

PPO_RAY_RUNTIME_ENV = {
    "env_vars": {
        "TOKENIZERS_PARALLELISM": "true",
        "NCCL_DEBUG": "WARN",
        "VLLM_LOGGING_LEVEL": "WARN",
        "VLLM_ALLOW_RUNTIME_LORA_UPDATING": "true",
        "CUDA_DEVICE_MAX_CONNECTIONS": "1",
        "VLLM_USE_V1": "1",
    },
    "worker_process_setup_hook": "rllm.patches.verl_patch_hook.setup",
}

FORWARD_PREFIXES = (
    "VLLM_", "SGL_", "SGLANG_", 
    "HF_", "TOKENIZERS_", "DATASETS_",
    "TORCH_", "PYTORCH_", "DEEPSPEED_", "MEGATRON_", 
    "NCCL_", "CUDA_", "CUBLAS_", "CUDNN_", "NV_", "NVIDIA_",
)

def get_ppo_ray_runtime_env():
    """Return the default runtime env, with matching driver env vars forwarded on top."""
    env = PPO_RAY_RUNTIME_ENV["env_vars"].copy()
    forwarded = {
        k: v for k, v in os.environ.items()
        if any(k.startswith(p) for p in FORWARD_PREFIXES)
    }
    env.update(forwarded)
    return {
        "env_vars": env,
        "worker_process_setup_hook": PPO_RAY_RUNTIME_ENV["worker_process_setup_hook"],
    }
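Building on the snippet above, the forwarding filter could additionally honor an RLLM_EXCLUDE spec roughly as follows. This is an illustrative sketch, not the merged implementation; the abbreviated prefix tuple and the function name are assumptions:

```python
FORWARD_PREFIXES_DEMO = ("VLLM_", "SGL_", "HF_", "NCCL_", "CUDA_")  # abbreviated for the demo

def forwarded_env_vars(environ, exclude_spec=""):
    """Pick driver env vars by prefix, then drop anything matching the exclude spec."""
    prefixes, exact = [], set()
    for token in (t.strip() for t in exclude_spec.split(",")):
        if not token:
            continue
        if token.endswith("*"):
            prefixes.append(token[:-1])  # wildcard -> prefix match
        else:
            exact.add(token)             # plain name -> exact match

    def excluded(name):
        return name in exact or any(name.startswith(p) for p in prefixes)

    return {
        k: v for k, v in environ.items()
        if any(k.startswith(p) for p in FORWARD_PREFIXES_DEMO) and not excluded(k)
    }
```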

@listar2000
Collaborator Author

The idea LGTM, but I wonder whether this could introduce unwanted side effects (e.g. the user setting some driver variables not intended for Ray, or even for rLLM)?

@listar2000
Collaborator Author

@kylemontgomery1
Bear with me 😂 for making this PR heavier. My recent commit integrates your suggestion, while also adding an RLLM_EXCLUDE flag to let the user decide which env vars to exclude (either by prefix matching, e.g. VLLM*, or by exact variable name).

Since the logic is more complicated (though IMO more robust now), I've added some tests and documentation for this module as well.

@listar2000 listar2000 changed the title Fix issue #262 (problem with ray_runtime_env) Fix environment variable forwarding to ray runtime env Oct 24, 2025
@kylemontgomery1
Collaborator

Looks good to me. Thanks!

@jeffreysijuntan jeffreysijuntan merged commit bd57a54 into rllm-org:main Oct 24, 2025
1 check passed
@listar2000 listar2000 deleted the fix-ray-runtime-env branch October 24, 2025 20:56