`vaers_complete.py` is a Python script for processing VAERS (Vaccine Adverse Event Reporting System) data. Its features include multi-core parallel processing, memory-efficient chunked data handling, and change tracking across CDC data releases.
- Original author: Gary Hawkins (http://univaers.com/download/)
- Enhanced version: 2025, by Jason Page
## Installation

```bash
pip install pandas numpy tqdm zipfile-deflate64
```
## Usage

```bash
python vaers_complete.py [OPTIONS]
```
### Options

#### --dataset {covid,full}
Default: covid
Selects which dataset to process:

- `covid`: Process COVID-19 era data only (from 2020-12-13 onwards by default)
- `full`: Process the full historical VAERS dataset (from 1990-01-01 onwards by default)
```bash
python vaers_complete.py --dataset covid
python vaers_complete.py --dataset full
```
#### --cores NUMBER
Default: the number of CPU cores available on the system
Specifies the number of CPU cores to use for parallel processing.
Examples:
```bash
python vaers_complete.py --cores 8
python vaers_complete.py --cores 16
python vaers_complete.py --dataset full --cores 4
```
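As a rough illustration of what `--cores` controls, the sketch below fans a list of data chunks out to a `multiprocessing` pool whose size defaults to `os.cpu_count()`, matching the documented default. It is a minimal sketch, not the script's actual implementation: the worker `count_deaths` and the dict-per-row chunk format are hypothetical.

```python
import multiprocessing as mp
import os

def count_deaths(rows):
    """Hypothetical worker: count rows in one chunk whose DIED field is 'Y'."""
    return sum(1 for row in rows if row.get("DIED") == "Y")

def process_in_parallel(chunks, cores=None):
    """Apply the worker to each chunk using a pool of `cores` processes."""
    # Documented default for --cores: every CPU core on the machine.
    workers = cores or os.cpu_count() or 1
    # Use 'fork' where available so child processes inherit this module's
    # functions instead of re-importing it; otherwise use the platform default.
    if "fork" in mp.get_all_start_methods():
        ctx = mp.get_context("fork")
    else:
        ctx = mp.get_context()
    with ctx.Pool(processes=workers) as pool:
        # pool.map returns one result per chunk, in chunk order.
        return pool.map(count_deaths, chunks)

if __name__ == "__main__":
    chunks = [
        [{"VAERS_ID": "1", "DIED": "Y"}, {"VAERS_ID": "2", "DIED": ""}],
        [{"VAERS_ID": "3", "DIED": "Y"}],
    ]
    print(process_in_parallel(chunks, cores=2))  # [1, 1]
```

Each chunk is handled independently, so extra cores only help when there are at least as many chunks as workers.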
#### --chunk-size NUMBER
Default: 50000
Sets the chunk size for processing large datasets. Larger chunks use more memory but may be faster. Smaller chunks are more memory-efficient.
Examples:
```bash
python vaers_complete.py --chunk-size 100000
python vaers_complete.py --chunk-size 25000
```
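The memory trade-off behind `--chunk-size` can be shown with a standard-library generator that materializes only one chunk of rows at a time. This is an illustrative sketch only; the script may instead use pandas' `read_csv(..., chunksize=...)`, which yields DataFrames in the same chunked fashion. The function name `iter_chunks` is hypothetical.

```python
import csv
import io
from itertools import islice

def iter_chunks(file_obj, chunk_size=50000):
    """Yield lists of at most `chunk_size` CSV rows (as dicts).

    Only the current chunk is built in memory, which is the trade-off
    --chunk-size controls: bigger chunks mean fewer iterations but more RAM.
    """
    reader = csv.DictReader(file_obj)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        yield chunk

sample = io.StringIO("VAERS_ID,AGE_YRS\n1,45\n2,30\n3,60\n4,22\n5,51\n")
sizes = [len(c) for c in iter_chunks(sample, chunk_size=2)]
print(sizes)  # [2, 2, 1]
```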
#### --date-floor DATE
Default: 2020-12-13 for the covid dataset, 1990-01-01 for the full dataset
Sets the earliest date to process (format: YYYY-MM-DD). Records before this date will be excluded.
Examples:
```bash
python vaers_complete.py --date-floor 2021-01-01
python vaers_complete.py --dataset full --date-floor 2000-01-01
```
#### --date-ceiling DATE
Default: 2025-01-01
Sets the latest date to process (format: YYYY-MM-DD). Records after this date will be excluded.
Examples:
```bash
python vaers_complete.py --date-ceiling 2024-12-31
python vaers_complete.py --date-floor 2020-01-01 --date-ceiling 2023-12-31
```
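A minimal sketch of the date window, assuming the floor and ceiling are both inclusive (as the descriptions above imply) and that the filtered column holds `MM/DD/YYYY` strings as in CDC's VAERS CSV files. Which date column the script actually filters on is not stated here, so the helper names and the choice to drop blank dates are assumptions.

```python
from datetime import date, datetime

def parse_vaers_date(text):
    """Parse an MM/DD/YYYY string (assumed VAERS format); None if blank or invalid."""
    try:
        return datetime.strptime(text.strip(), "%m/%d/%Y").date()
    except (ValueError, AttributeError):
        return None

def in_date_window(text, floor="2020-12-13", ceiling="2025-01-01"):
    """Keep records dated on or after `floor` and on or before `ceiling`.

    Assumption: records with a missing or unparseable date are dropped.
    """
    d = parse_vaers_date(text)
    if d is None:
        return False
    return date.fromisoformat(floor) <= d <= date.fromisoformat(ceiling)

print(in_date_window("12/12/2020"))  # False (before the covid floor)
print(in_date_window("06/01/2021"))  # True
print(in_date_window("01/02/2025"))  # False (after the ceiling)
```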
#### --test
Default: Not set
Uses the test case directory (`ztestcases/`) instead of the main working directory. Useful for development and testing.
Example:
```bash
python vaers_complete.py --test
```
#### --no-progress
Default: Not set
Disables progress bars. Useful for logging output to files or when running in environments without terminal support.
Example:
```bash
python vaers_complete.py --no-progress > output.log
```
#### --merge-only
Default: Not set
Skips all processing and only creates the final merged file from existing processed data. Useful when you want to regenerate the final output without reprocessing everything.
Example:
```bash
python vaers_complete.py --merge-only
```
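For reference, the options above map naturally onto `argparse`. The parser below is a hypothetical reconstruction from this README, not the script's own code; it shows one way the `--date-floor` default could depend on `--dataset`.

```python
import argparse
import os

# Documented per-dataset defaults for --date-floor.
DEFAULT_FLOORS = {"covid": "2020-12-13", "full": "1990-01-01"}

def build_parser():
    p = argparse.ArgumentParser(prog="vaers_complete.py")
    p.add_argument("--dataset", choices=["covid", "full"], default="covid")
    p.add_argument("--cores", type=int, default=os.cpu_count())
    p.add_argument("--chunk-size", type=int, default=50000)
    p.add_argument("--date-floor", default=None)  # resolved from --dataset
    p.add_argument("--date-ceiling", default="2025-01-01")
    p.add_argument("--test", action="store_true")
    p.add_argument("--no-progress", action="store_true")
    p.add_argument("--merge-only", action="store_true")
    return p

def parse_args(argv=None):
    args = build_parser().parse_args(argv)
    # Fall back to the dataset's default floor when none was given.
    if args.date_floor is None:
        args.date_floor = DEFAULT_FLOORS[args.dataset]
    return args

args = parse_args(["--dataset", "full", "--cores", "4"])
print(args.dataset, args.cores, args.chunk_size, args.date_floor)
# full 4 50000 1990-01-01
```

`argparse` converts dashed option names to underscored attributes, so `--chunk-size` is read as `args.chunk_size`.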
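## Examples

Process the COVID dataset using 8 cores:

```bash
python vaers_complete.py --dataset covid --cores 8
```

Process the full historical dataset with 16 cores and larger chunks:

```bash
python vaers_complete.py --dataset full --cores 16 --chunk-size 100000
```

Process COVID data starting from 2021-01-01:

```bash
python vaers_complete.py --dataset covid --date-floor 2021-01-01
```

Restrict processing to 2021-01-01 through 2023-12-31 using 8 cores:

```bash
python vaers_complete.py --date-floor 2021-01-01 --date-ceiling 2023-12-31 --cores 8
```

Reduce memory use with smaller chunks and fewer cores:

```bash
python vaers_complete.py --dataset covid --chunk-size 25000 --cores 4
```

Regenerate only the final merged file:

```bash
python vaers_complete.py --merge-only
```

Run against the test cases with 4 cores:

```bash
python vaers_complete.py --test --cores 4
```

Process COVID data without progress bars, logging stdout and stderr to a file:

```bash
python vaers_complete.py --dataset covid --no-progress > processing.log 2>&1
```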
## Directory Structure

```
.
├── 0VAERSDownloads/ # Input: Raw VAERS ZIP files from CDC
├── 1vaersworking/ # Intermediate: Extracted CSV files
├── 1vaersconsolidated/ # Intermediate: Consolidated data files
├── 2vaersfull_compared/ # Output: Comparison results with change tracking
├── 3vaersflattened/ # Intermediate: Flattened data (one row per VAERS_ID)
├── stats.csv # Output: Processing statistics
├── neverpublishedany.txt # Output: VAERS IDs never published
├── everpublishedany.txt # Output: All VAERS IDs ever published
├── everpublishedcovid.txt # Output: COVID-related VAERS IDs
├── writeups_deduped.txt # Output: Deduplicated symptom descriptions
└── VAERSFINALMERGED.csv # Final output: Complete merged dataset
```
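A sketch of the first step, moving from `0VAERSDownloads/` to `1vaersworking/`. It assumes that `zipfile-deflate64` can be imported as a drop-in replacement for `zipfile` (falling back to the standard library when it is absent), and it extracts each ZIP into its own subfolder; that per-release layout and the demo file names are hypothetical.

```python
import tempfile
from pathlib import Path

try:
    # Assumption: the zipfile-deflate64 package provides a drop-in ZipFile
    # that can also read Deflate64-compressed archives.
    import zipfile_deflate64 as zipfile
except ImportError:
    import zipfile  # stdlib fallback: no Deflate64 support

def extract_downloads(download_dir, work_dir):
    """Extract the CSV members of each ZIP into work_dir/<zip name>/.

    One subfolder per ZIP keeps same-named CSVs from different releases
    from overwriting each other.
    """
    extracted = []
    for zip_path in sorted(Path(download_dir).glob("*.zip")):
        target = Path(work_dir) / zip_path.stem
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(zip_path) as zf:
            for name in zf.namelist():
                if name.lower().endswith(".csv"):
                    zf.extract(name, target)
                    extracted.append(f"{zip_path.stem}/{name}")
    return extracted

# Demo with a throwaway ZIP standing in for a CDC download.
with tempfile.TemporaryDirectory() as tmp:
    downloads = Path(tmp, "0VAERSDownloads")
    downloads.mkdir()
    with zipfile.ZipFile(downloads / "release_2023-01-15.zip", "w") as zf:
        zf.writestr("2023VAERSDATA.csv", "VAERS_ID\n1\n")
        zf.writestr("README.txt", "not a CSV")
    found = extract_downloads(downloads, Path(tmp, "1vaersworking"))
    print(found)  # ['release_2023-01-15/2023VAERSDATA.csv']
```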
When run with the `--test` flag, the script uses this layout instead:

```
ztestcases/
├── drops/ # Input: Test VAERS data
├── 1vaersworking/
├── 1vaersconsolidated/
├── 2vaersfull_compared/
├── 3vaersflattened/
└── [output files]
```
## Input Files

The script processes the following CSV file types:

- `*VAERSDATA.csv` - Main report data
- `*VAERSVAX.csv` - Vaccination details
- `*VAERSSYMPTOMS.csv` - Symptom entries
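Because release file names carry a prefix (shown as `*` above), input files can be sorted into the three tables by pattern matching. A small sketch using `fnmatch`; the table keys and the year-prefixed sample names are illustrative assumptions.

```python
from fnmatch import fnmatchcase

# File-name patterns from the list above, keyed by a hypothetical table name.
PATTERNS = {
    "data": "*VAERSDATA.csv",
    "vax": "*VAERSVAX.csv",
    "symptoms": "*VAERSSYMPTOMS.csv",
}

def classify(filenames):
    """Group file names by table; names matching no pattern are ignored."""
    groups = {key: [] for key in PATTERNS}
    for name in filenames:
        for key, pattern in PATTERNS.items():
            # Upper-case both sides for a case-insensitive match on any OS.
            if fnmatchcase(name.upper(), pattern.upper()):
                groups[key].append(name)
                break
    return groups

files = ["2021VAERSDATA.csv", "2021VAERSVAX.csv", "2021VAERSSYMPTOMS.csv",
         "2020VAERSDATA.csv", "notes.txt"]
print(classify(files))
```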
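## Processing Pipeline

1. Consolidate the extracted CSV files into `1vaersconsolidated/`.
2. Flatten the data to one row per VAERS_ID in `3vaersflattened/`.
3. Compare releases, recording modifications in the `changes` column, and write the results to `2vaersfull_compared/`.
4. Merge the results into `VAERSFINALMERGED.csv`.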
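The flattening stage reduces the one-to-many VAX and SYMPTOMS tables to one row per VAERS_ID, as the `3vaersflattened/` entry describes. The sketch below assumes CDC-style input columns (`VAX_TYPE`, `VAX_LOT`, `SYMPTOM1` to `SYMPTOM5`) and joins multiple values with `|`; the combined `vaccines` field and the separator are hypothetical, and the script's actual output format may differ.

```python
from collections import defaultdict

def flatten(data_rows, vax_rows, symptom_rows, sep="|"):
    """Return {VAERS_ID: record}, one record per report in data_rows."""
    # Gather every vaccine row for a report as "TYPE:LOT".
    vaccines = defaultdict(list)
    for row in vax_rows:
        vaccines[row["VAERS_ID"]].append(f"{row['VAX_TYPE']}:{row['VAX_LOT']}")
    # Each SYMPTOMS row holds up to five terms; skip blank slots.
    symptoms = defaultdict(list)
    for row in symptom_rows:
        for i in range(1, 6):
            term = (row.get(f"SYMPTOM{i}") or "").strip()
            if term:
                symptoms[row["VAERS_ID"]].append(term)
    flat = {}
    for row in data_rows:
        vid = row["VAERS_ID"]
        flat[vid] = {**row,
                     "vaccines": sep.join(vaccines[vid]),
                     "symptom_entries": sep.join(symptoms[vid])}
    return flat

data = [{"VAERS_ID": "100", "AGE_YRS": "45"}]
vax = [{"VAERS_ID": "100", "VAX_TYPE": "COVID19", "VAX_LOT": "A1"},
       {"VAERS_ID": "100", "VAX_TYPE": "FLU4", "VAX_LOT": "B2"}]
sym = [{"VAERS_ID": "100", "SYMPTOM1": "Headache", "SYMPTOM2": "Pyrexia", "SYMPTOM3": ""},
       {"VAERS_ID": "100", "SYMPTOM1": "Fatigue"}]
flat = flatten(data, vax, sym)
print(flat["100"]["symptom_entries"])  # Headache|Pyrexia|Fatigue
```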
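## Output Files

- `VAERSFINALMERGED.csv` - Complete merged dataset, including the `cell_edits`, `status`, and `changes` tracking columns
- `stats.csv` - Processing statistics
- `neverpublishedany.txt` - VAERS IDs never published
- `everpublishedany.txt` - All VAERS IDs ever published
- `everpublishedcovid.txt` - COVID-related VAERS IDs
- `writeups_deduped.txt` - Deduplicated symptom descriptions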
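One plausible way to derive the two `*publishedany.txt` lists, assuming VAERS IDs are integers and that a "never published" ID is a gap in the numeric range covered by all releases. This interpretation is an assumption, not something this README states.

```python
def id_lists(releases):
    """Return (ever_published, never_published) from per-release ID sets.

    Assumption: "never published" means an integer ID inside the overall
    min..max range that does not appear in any release.
    """
    ever = set().union(*releases)
    if not ever:
        return ever, set()
    never = set(range(min(ever), max(ever) + 1)) - ever
    return ever, never

r1 = {1000, 1001, 1003}
r2 = {1001, 1003, 1004, 1006}
ever, never = id_lists([r1, r2])
print(sorted(ever))   # [1000, 1001, 1003, 1004, 1006]
print(sorted(never))  # [1002, 1005]
```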
## Key Columns in VAERSFINALMERGED.csv

- `VAERS_ID` - Unique report identifier
- `AGE_YRS`, `SEX`, `STATE` - Demographic information
- `DIED`, `LTHREAT`, `ERVISIT`, `HOSPITAL`, `DISABLE` - Serious outcomes
- `VAXTYPE`, `VAXMANU`, `VAX_LOT` - Vaccine information
- `VAXDATE`, `ONSETDATE`, `RPT_DATE` - Date information
- `SYMPTOM_TEXT` - Symptom description
- `cell_edits` - Count of cells modified across all releases
- `status` - Report status (new, modified, deleted)
- `changes` - Detailed log of all changes made to the report
- `symptom_entries` - Aggregated symptom entries
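To make the `status` and `cell_edits` columns concrete, the sketch below compares two releases keyed by VAERS_ID and counts differing cells. It is a simplified, hypothetical model: it assumes an unchanged report gets a blank status, and it handles a single pair of releases.

```python
def compare_releases(old, new):
    """Compare two releases, each {VAERS_ID: {column: value}}.

    Returns {VAERS_ID: {"status": ..., "cell_edits": n}} where status is
    "new", "deleted", "modified", or "" (assumed) when nothing changed.
    """
    result = {}
    for vid in old.keys() | new.keys():
        if vid not in old:
            result[vid] = {"status": "new", "cell_edits": 0}
        elif vid not in new:
            result[vid] = {"status": "deleted", "cell_edits": 0}
        else:
            # Count columns whose value differs (or exists on one side only).
            columns = old[vid].keys() | new[vid].keys()
            edits = sum(old[vid].get(c) != new[vid].get(c) for c in columns)
            result[vid] = {"status": "modified" if edits else "", "cell_edits": edits}
    return result

old = {"1": {"AGE_YRS": "45", "SEX": "F"}, "2": {"AGE_YRS": "30", "SEX": "M"}}
new = {"1": {"AGE_YRS": "46", "SEX": "F"}, "3": {"AGE_YRS": "70", "SEX": "M"}}
result = compare_releases(old, new)
print(result["1"])  # {'status': 'modified', 'cell_edits': 1}
```

Summing `cell_edits` over each consecutive pair of releases would give a cumulative count like the one the `cell_edits` column describes.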
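## Performance Tuning

High-performance system (many cores, ample RAM):

```bash
python vaers_complete.py --dataset covid --cores 16 --chunk-size 100000
```

Memory-constrained system:

```bash
python vaers_complete.py --dataset covid --cores 4 --chunk-size 25000
```

Full historical dataset:

```bash
python vaers_complete.py --dataset full --cores 16 --chunk-size 50000
```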
## Change Tracking

Example entries in the `changes` column:

```
2023-01-15: AGE_YRS changed from "45" to "46"
2023-01-15: SYMPTOM_TEXT appended with "Patient recovered"
```
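A sketch of parsing entries shaped like the two examples above. It assumes one entry per line and recognizes only the "changed from ... to ..." and "appended with ..." forms; entries in any other format are skipped.

```python
import re

# Matches only the two entry shapes shown in the examples.
CHANGE_RE = re.compile(
    r'^(?P<date>\d{4}-\d{2}-\d{2}): (?P<field>\w+) '
    r'(?:changed from "(?P<old>[^"]*)" to "(?P<new>[^"]*)"'
    r'|appended with "(?P<added>[^"]*)")$'
)

def parse_changes(text):
    """Return one dict per recognized line, omitting groups that did not match."""
    entries = []
    for line in text.splitlines():
        m = CHANGE_RE.match(line.strip())
        if m:
            entries.append({k: v for k, v in m.groupdict().items() if v is not None})
    return entries

log = '''2023-01-15: AGE_YRS changed from "45" to "46"
2023-01-15: SYMPTOM_TEXT appended with "Patient recovered"'''
entries = parse_changes(log)
for entry in entries:
    print(entry)
```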
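## Troubleshooting

### Out of memory errors

- Reduce `--chunk-size` to 25000 or lower
- Reduce `--cores` to use fewer parallel processes
- Process a narrower date range with `--date-floor` and `--date-ceiling`

### tqdm errors

- Install it with `pip install tqdm`, or use `--no-progress` if progress bars are not needed

### Deflate64 ZIP extraction errors

- Install support with `pip install zipfile-deflate64`

### No input files found

- Place the VAERS ZIP files in the `0VAERSDownloads/` directory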
## Changelog

No changelog available.