scrape_cves

Scrape CVE records from the NVD API and flatten the relevant CVSS v3.1 metrics into a CSV, along with the most-starred GitHub repository referenced by each CVE.

Setup

Requires Python 3.11+. Dependencies are managed with uv:

uv sync

Create a .env file with API keys:

NVD_API_KEY=your_nvd_api_key
GITHUB_API_KEY=your_github_personal_access_token

NVD_API_KEY — request one at https://nvd.nist.gov/developers/request-an-api-key. Note: the key is currently read from the environment but not sent with NVD requests (the key-based rate limit did not work in testing), so the scraper runs against the unauthenticated rate limit.
GITHUB_API_KEY — a GitHub personal access token (public repo scope is sufficient) used to query star counts
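
As a minimal sketch of how the GitHub token might be turned into request headers: the function name github_headers is hypothetical, and the sketch assumes the .env file has already been loaded into the process environment by whatever loader the scripts use. The header values follow GitHub's documented REST conventions; how parse_cves.py actually builds its requests is not shown in this README.

```python
import os
from typing import Mapping


def github_headers(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build GitHub REST API headers, adding the token only when it is set."""
    headers = {"Accept": "application/vnd.github+json"}
    token = env.get("GITHUB_API_KEY")
    if token:
        # A personal access token is sent as a bearer token.
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

Passing the environment in as a parameter keeps the helper easy to exercise without touching real credentials.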

Usage

1. Scrape raw CVE JSON from NVD

scrape_cves.py queries the NVD API one day at a time for the given date range and severity, and writes one JSON file per day to --output_dir. Existing files are skipped, so the scraper is resumable.

uv run scrape_cves.py \
    --start_date 2025-01-01 \
    --end_date 2025-06-30 \
    --severity CRITICAL \
    --output_dir data
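
The one-day-at-a-time query could look roughly like the sketch below. It assumes the scraper filters on publication date using the NVD CVE API 2.0 parameters pubStartDate, pubEndDate, and cvssV3Severity; the helper name nvd_day_url is hypothetical, and the real script may build its requests differently.

```python
from datetime import date, datetime, time, timedelta
from urllib.parse import urlencode

NVD_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0"


def nvd_day_url(day: date, severity: str = "CRITICAL") -> str:
    """Return a query URL covering a single calendar day (assumed window)."""
    start = datetime.combine(day, time.min)
    # End one millisecond before the next day begins.
    end = start + timedelta(days=1) - timedelta(milliseconds=1)
    params = {
        "pubStartDate": start.isoformat(timespec="milliseconds"),
        "pubEndDate": end.isoformat(timespec="milliseconds"),
        "cvssV3Severity": severity,
    }
    return f"{NVD_URL}?{urlencode(params)}"
```

Because no API key is sent, each such request counts against the unauthenticated rate limit, so a real loop would need to pause between days.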

Arguments:

--start_date (required) — YYYY-MM-DD
--end_date — YYYY-MM-DD; defaults to today
--severity — CRITICAL, HIGH, MEDIUM, or LOW (default CRITICAL)
--output_dir — defaults to the current directory
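
For reference, the arguments above map onto an argparse setup along these lines. This is a sketch reconstructed from the list, not the script's actual parser; build_parser is a hypothetical name, and parsing dates with date.fromisoformat is an assumption.

```python
import argparse
from datetime import date


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented flags and defaults of scrape_cves.py."""
    parser = argparse.ArgumentParser(description="Scrape CVE JSON from NVD")
    parser.add_argument("--start_date", required=True, type=date.fromisoformat)
    parser.add_argument("--end_date", type=date.fromisoformat, default=date.today())
    parser.add_argument(
        "--severity",
        choices=["CRITICAL", "HIGH", "MEDIUM", "LOW"],
        default="CRITICAL",
    )
    parser.add_argument("--output_dir", default=".")
    return parser
```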

Output files are named cves-{YYYY}-{MM}-{DD}-{SEVERITY}.json.
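
The naming scheme and the skip-if-exists behaviour can be sketched as follows. The helper names daily_output_path and pending_days are hypothetical, and treating --end_date as inclusive is an assumption.

```python
from datetime import date, timedelta
from pathlib import Path
from typing import Iterator


def daily_output_path(output_dir: str, day: date, severity: str) -> Path:
    """Build the per-day file name cves-{YYYY}-{MM}-{DD}-{SEVERITY}.json."""
    return Path(output_dir) / f"cves-{day:%Y-%m-%d}-{severity}.json"


def pending_days(start: date, end: date, severity: str, output_dir: str) -> Iterator[date]:
    """Yield each day in [start, end] that has no output file yet."""
    day = start
    while day <= end:
        if not daily_output_path(output_dir, day, severity).exists():
            yield day
        day += timedelta(days=1)
```

Re-running after an interruption then only revisits days whose files are missing, which is what makes the scraper resumable.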

2. Parse CVE JSON into a CSV

parse_cves.py walks the daily JSON files in --input_dir, extracts CVSS v3.1 metrics, and for each CVE resolves the GitHub reference with the highest star count. Results are appended to --output_file; CVEs already present in the file are skipped.

uv run parse_cves.py \
    --start_date 2025-01-01 \
    --end_date 2025-06-30 \
    --severity CRITICAL \
    --input_dir data \
    --output_file cves_github.csv
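
Picking the most-starred GitHub reference might work roughly as in this sketch. The names repo_slug and most_starred are hypothetical, as is the get_stars callable, which stands in for a lookup of the stargazers_count field returned by GitHub's GET /repos/{owner}/{repo} endpoint and returns None when the call fails.

```python
import re
from typing import Callable, Iterable, Optional

GITHUB_REPO_RE = re.compile(r"https?://github\.com/([\w.-]+)/([\w.-]+)")


def repo_slug(url: str) -> Optional[str]:
    """Reduce any github.com URL (issue, commit, advisory) to owner/repo."""
    match = GITHUB_REPO_RE.match(url)
    if not match:
        return None
    owner, repo = match.groups()
    return f"{owner}/{repo.removesuffix('.git')}"


def most_starred(
    urls: Iterable[str], get_stars: Callable[[str], Optional[int]]
) -> Optional[tuple[str, int]]:
    """Return (repo_url, stars) for the reference with the most stars."""
    best: Optional[tuple[str, int]] = None
    # dict.fromkeys de-duplicates slugs while keeping first-seen order.
    slugs = dict.fromkeys(s for s in map(repo_slug, urls) if s)
    for slug in slugs:
        stars = get_stars(slug)
        if stars is None:  # lookup failed; skip this repository
            continue
        if best is None or stars > best[1]:
            best = (f"https://github.com/{slug}", stars)
    return best
```

Injecting get_stars keeps the selection logic testable without network access or a token.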

CSV columns:

cveid, publishedDate, baseScore, exploitabilityScore, attackVector, attackComplexity, privilegesRequired, userInteraction, scope, impactScore, confidentialityImpact, integrityImpact, availabilityImpact, github_url, n_stars, references

CVEs without cvssMetricV31 are skipped. n_stars is N/A when no GitHub reference is found or the GitHub API call fails.
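
The flattening step can be illustrated with the sketch below. It assumes the NVD API 2.0 record layout (metrics.cvssMetricV31[].cvssData plus exploitabilityScore and impactScore on the metric entry) and takes the first v3.1 entry; flatten_cve is a hypothetical name, and the GitHub columns and references are left to the separate lookup step.

```python
from typing import Any, Optional

# Fields read from the nested cvssData object (assumed NVD 2.0 layout).
CVSS_DATA_FIELDS = [
    "attackVector", "attackComplexity", "privilegesRequired",
    "userInteraction", "scope", "confidentialityImpact",
    "integrityImpact", "availabilityImpact",
]


def flatten_cve(cve: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return one flat CSV row, or None when cvssMetricV31 is absent."""
    metrics = cve.get("metrics", {}).get("cvssMetricV31")
    if not metrics:
        return None  # CVEs without v3.1 metrics are skipped
    metric = metrics[0]  # assumption: use the first v3.1 entry
    data = metric["cvssData"]
    row = {
        "cveid": cve["id"],
        "publishedDate": cve["published"],
        "baseScore": data["baseScore"],
        "exploitabilityScore": metric["exploitabilityScore"],
        "impactScore": metric["impactScore"],
    }
    row.update({field: data[field] for field in CVSS_DATA_FIELDS})
    return row
```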
