Scrape CVE records from the NVD API and flatten the relevant CVSS v3.1 metrics into a CSV, along with the most-starred GitHub repository referenced by each CVE.
Requires Python 3.11+. Dependencies are managed with uv:

```
uv sync
```

Create a `.env` file with API keys:

```
NVD_API_KEY=your_nvd_api_key
GITHUB_API_KEY=your_github_personal_access_token
```
- `NVD_API_KEY` — request one at https://nvd.nist.gov/developers/request-an-api-key. Note: the key is currently read from the environment but not sent with NVD requests (the key-based rate limit did not work in testing), so the scraper runs against the unauthenticated rate limit.
- `GITHUB_API_KEY` — a GitHub personal access token (public repo scope is sufficient) used to query star counts.
`scrape_cves.py` queries the NVD API one day at a time for the given date range and severity, and writes one JSON file per day to `--output_dir`. Existing files are skipped, so the scraper is resumable.
```
uv run scrape_cves.py \
  --start_date 2025-01-01 \
  --end_date 2025-06-30 \
  --severity CRITICAL \
  --output_dir data
```

Arguments:

- `--start_date` (required) — `YYYY-MM-DD`
- `--end_date` — `YYYY-MM-DD`; defaults to today
- `--severity` — `CRITICAL`, `HIGH`, `MEDIUM`, or `LOW` (default `CRITICAL`)
- `--output_dir` — defaults to the current directory
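The flags above could be declared with `argparse` roughly as follows. This is a sketch mirroring the README's names and defaults, not the actual script source:

```python
import argparse
from datetime import date

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; flag names and defaults mirror the list above.
    p = argparse.ArgumentParser()
    p.add_argument("--start_date", required=True, help="YYYY-MM-DD")
    p.add_argument("--end_date", default=date.today().isoformat(), help="YYYY-MM-DD")
    p.add_argument("--severity", choices=["CRITICAL", "HIGH", "MEDIUM", "LOW"],
                   default="CRITICAL")
    p.add_argument("--output_dir", default=".")
    return p
```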
Output files are named `cves-{YYYY}-{MM}-{DD}-{SEVERITY}.json`.
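The day-by-day loop with skip-if-exists resumability can be sketched as follows. The helper names are hypothetical; only the file-naming scheme is taken from the README:

```python
from datetime import date, timedelta
from pathlib import Path

def daily_output_path(output_dir: str, day: date, severity: str) -> Path:
    # Mirrors the naming scheme cves-{YYYY}-{MM}-{DD}-{SEVERITY}.json.
    return Path(output_dir) / f"cves-{day:%Y-%m-%d}-{severity}.json"

def days_to_scrape(start: date, end: date, severity: str, output_dir: str):
    # One NVD request per day; days whose output file already exists are
    # skipped, which is what makes the scraper resumable.
    day = start
    while day <= end:
        if not daily_output_path(output_dir, day, severity).exists():
            yield day
        day += timedelta(days=1)
```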
`parse_cves.py` walks the daily JSON files in `--input_dir`, extracts CVSS v3.1 metrics, and for each CVE resolves the GitHub reference with the highest star count. Results are appended to `--output_file`; CVEs already present in the file are skipped.
```
uv run parse_cves.py \
  --start_date 2025-01-01 \
  --end_date 2025-06-30 \
  --severity CRITICAL \
  --input_dir data \
  --output_file cves_github.csv
```

CSV columns:
`cveid`, `publishedDate`, `baseScore`, `exploitabilityScore`, `attackVector`, `attackComplexity`, `privilegesRequired`, `userInteraction`, `scope`, `impactScore`, `confidentialityImpact`, `integrityImpact`, `availabilityImpact`, `github_url`, `n_stars`, `references`
CVEs without `cvssMetricV31` are skipped. `n_stars` is `N/A` when no GitHub reference is found or the GitHub API call fails.