VAERSdrops (beta)

By Page Telegram

Current Version: 1.2.5

VAERS Complete - Enhanced Data Processing Script

Overview

vaers_complete.py is a Python script for processing VAERS (Vaccine Adverse Event Reporting System) data. It adds multi-core parallel processing, memory-efficient chunked data handling, and change tracking across CDC data releases.

Original Author: Gary Hawkins - http://univaers.com/download/
Enhanced Version: 2025 by Jason Page

Features

✓ Multi-core parallel processing for faster execution
✓ Memory-efficient chunked data handling for large datasets
✓ Command-line dataset selection (COVID-19 era or full historical data)
✓ Progress bars for all major operations
✓ Comprehensive error tracking and reporting
✓ Fixed statistics functionality
✓ Change detection and tracking across data releases
✓ Deduplication and data consolidation
✓ Complete audit trail of modifications to VAERS reports

Requirements

Python Dependencies

bash
pip install pandas numpy tqdm zipfile-deflate64

pandas: Data manipulation and analysis
numpy: Numerical operations
tqdm: Progress bars (optional but recommended)
zipfile-deflate64: Enhanced ZIP file handling (optional, falls back to standard zipfile)
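
The two optional packages can be handled with guarded imports. The following is a minimal sketch of that pattern, not the script's actual code; it assumes the zipfile-deflate64 package is imported as zipfile_deflate64, and the no-op tqdm stand-in is a hypothetical helper.

```python
# Optional dependencies: prefer the enhanced packages, fall back quietly.
try:
    import zipfile_deflate64 as zipfile  # assumed import name of zipfile-deflate64
except ImportError:
    import zipfile  # standard library fallback

try:
    from tqdm import tqdm
except ImportError:
    def tqdm(iterable, **kwargs):
        """Hypothetical stand-in: iterate without drawing a progress bar."""
        return iterable

# Either way, the rest of the code can use zipfile.ZipFile and tqdm(...) unchanged.
items = list(tqdm(range(3)))
```

With this layout the rest of the code does not need to know which variant was loaded.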

System Requirements

Python 3.x
Multi-core CPU recommended for parallel processing
Sufficient RAM for large dataset processing (16GB+ recommended for full dataset)

Command-Line Options

Basic Syntax

bash
python vaers_complete.py [OPTIONS]

Options Reference

--dataset {covid,full}

Default: covid

covid: Process COVID-19 era data only (from 2020-12-13 onwards by default)
full: Process full historical VAERS dataset (from 1990-01-01 onwards by default)

Examples

bash
python vaers_complete.py --dataset covid
python vaers_complete.py --dataset full

--cores NUMBER

Default

Examples

bash
python vaers_complete.py --cores 8
python vaers_complete.py --cores 16
python vaers_complete.py --dataset full --cores 4

--chunk-size NUMBER

Default: 50000

Examples

bash
python vaers_complete.py --chunk-size 100000
python vaers_complete.py --chunk-size 25000
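
To illustrate what a chunk size means in practice, here is a small sketch that reads rows in fixed-size batches so only one batch is held in memory at a time. It uses only the standard library to stay self-contained; the script itself depends on pandas, and iter_chunks is a hypothetical helper name.

```python
import csv
import io
from itertools import islice


def iter_chunks(reader, chunk_size=50000):
    """Yield lists of at most chunk_size rows from any row iterator."""
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


# Tiny in-memory stand-in for a VAERS CSV file.
sample = io.StringIO("VAERS_ID,SEX\n1,F\n2,M\n3,F\n4,M\n5,F\n")
chunk_sizes = [len(chunk) for chunk in iter_chunks(csv.DictReader(sample), chunk_size=2)]
```

A smaller chunk size lowers peak memory use at the cost of more iterations.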

--date-floor DATE

Default: 2020-12-13 with --dataset covid, 1990-01-01 with --dataset full

Examples

bash
python vaers_complete.py --date-floor 2021-01-01
python vaers_complete.py --dataset full --date-floor 2000-01-01

--date-ceiling DATE

Default: 2025-01-01

Examples

bash
python vaers_complete.py --date-ceiling 2024-12-31
python vaers_complete.py --date-floor 2020-01-01 --date-ceiling 2023-12-31

--test

Default test directory: ztestcases

Example

bash
python vaers_complete.py --test

--no-progress

Default

Example

bash
python vaers_complete.py --no-progress > output.log

--merge-only

Default

Example

bash
python vaers_complete.py --merge-only
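
The options above could be declared with argparse roughly as follows. This is a sketch built from the documented flags and defaults, not the script's own parser. The --cores default is not stated in this document, so os.cpu_count() is an assumption, as are the helper names build_parser and parse_args.

```python
import argparse
import os

# Documented per-dataset defaults for --date-floor.
DATE_FLOORS = {"covid": "2020-12-13", "full": "1990-01-01"}


def build_parser():
    parser = argparse.ArgumentParser(prog="vaers_complete.py")
    parser.add_argument("--dataset", choices=["covid", "full"], default="covid")
    # Assumption: the real default for --cores is not documented here.
    parser.add_argument("--cores", type=int, default=os.cpu_count())
    parser.add_argument("--chunk-size", type=int, default=50000)
    parser.add_argument("--date-floor", default=None)
    parser.add_argument("--date-ceiling", default="2025-01-01")
    parser.add_argument("--test", action="store_true")
    parser.add_argument("--no-progress", action="store_true")
    parser.add_argument("--merge-only", action="store_true")
    return parser


def parse_args(argv=None):
    args = build_parser().parse_args(argv)
    if args.date_floor is None:
        # The date floor follows the selected dataset unless given explicitly.
        args.date_floor = DATE_FLOORS[args.dataset]
    return args


full_args = parse_args(["--dataset", "full", "--cores", "4"])
default_args = parse_args([])
```

Resolving --date-floor after parsing keeps an explicit value from being overridden by the dataset choice.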

Usage Examples

Process COVID-19 data with 8 cores

bash
python vaers_complete.py --dataset covid --cores 8

Process full historical dataset with 16 cores and larger chunks

bash
python vaers_complete.py --dataset full --cores 16 --chunk-size 100000

Process COVID data from a specific start date

bash
python vaers_complete.py --dataset covid --date-floor 2021-01-01

Process data for a specific date range

bash
python vaers_complete.py --date-floor 2021-01-01 --date-ceiling 2023-12-31 --cores 8

Process with smaller chunks for memory-constrained systems

bash
python vaers_complete.py --dataset covid --chunk-size 25000 --cores 4

Create final merged file only

bash
python vaers_complete.py --merge-only

Run with test data

bash
python vaers_complete.py --test --cores 4

Process without progress bars (for logging)

bash
python vaers_complete.py --dataset covid --no-progress > processing.log 2>&1

Directory Structure


.
├── 0VAERSDownloads/         # Input: Raw VAERS ZIP files from CDC
├── 1vaersworking/           # Intermediate: Extracted CSV files
├── 1vaersconsolidated/      # Intermediate: Consolidated data files
├── 2vaersfull_compared/     # Output: Comparison results with change tracking
├── 3vaersflattened/         # Intermediate: Flattened data (one row per VAERS_ID)
├── stats.csv                # Output: Processing statistics
├── neverpublishedany.txt    # Output: VAERS IDs never published
├── everpublishedany.txt     # Output: All VAERS IDs ever published
├── everpublishedcovid.txt   # Output: COVID-related VAERS IDs
├── writeups_deduped.txt     # Output: Deduplicated symptom descriptions
└── VAERSFINALMERGED.csv     # Final output: Complete merged dataset

Test Mode Directory Structure

With --test, the same layout is created under ztestcases/, and test input is read from drops/:


ztestcases/
├── drops/                      # Input: Test VAERS data
├── 1vaersworking/
├── 1vaersconsolidated/
├── 2vaersfull_compared/
├── 3vaersflattened/
└── [output files]

Processing Workflow

1. Consolidation

*VAERSDATA.csv - Main report data
*VAERSVAX.csv - Vaccination details
*VAERSSYMPTOMS.csv - Symptom entries

Output directory: 1vaersconsolidated/
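
As a rough illustration of the first step, the sketch below groups extracted file names by the three file types listed above, which is the grouping a consolidation pass would need before combining them. How the script actually consolidates is not described here, so treat group_by_type as a hypothetical helper.

```python
from collections import defaultdict

# The three file-name suffixes listed in the Consolidation step.
SUFFIXES = ("VAERSDATA.csv", "VAERSVAX.csv", "VAERSSYMPTOMS.csv")


def group_by_type(filenames):
    """Map each known suffix to the sorted file names that end with it."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        for suffix in SUFFIXES:
            if name.endswith(suffix):
                groups[suffix].append(name)
    return dict(groups)


grouped = group_by_type(
    ["2021VAERSDATA.csv", "2020VAERSVAX.csv", "2020VAERSDATA.csv", "readme.txt"]
)
```

Files that match none of the suffixes are ignored.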

2. Flattening

Groups vaccine records by VAERS_ID
Merges all related data into one row per report
Joins symptom entries

Output directory: 3vaersflattened/
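
The flattening idea can be sketched with plain dictionaries: vaccine rows and symptom rows are grouped by VAERS_ID and joined into the single report row. This is an illustration under assumptions, not the script's implementation; the "|" separator, the flatten helper, and the choice of joined columns are hypothetical.

```python
from collections import defaultdict


def flatten(data_rows, vax_rows, symptom_rows, sep="|"):
    """Return one merged dict per report, keyed on VAERS_ID."""
    vax_by_id = defaultdict(list)
    for row in vax_rows:
        vax_by_id[row["VAERS_ID"]].append(row)

    symptoms_by_id = defaultdict(list)
    for row in symptom_rows:
        for n in range(1, 6):  # SYMPTOM1 .. SYMPTOM5
            value = row.get(f"SYMPTOM{n}")
            if value:
                symptoms_by_id[row["VAERS_ID"]].append(value)

    flat = []
    for report in data_rows:
        vid = report["VAERS_ID"]
        merged = dict(report)
        for col in ("VAX_TYPE", "VAX_MANU", "VAX_LOT"):
            merged[col] = sep.join(v.get(col, "") for v in vax_by_id[vid])
        merged["symptom_entries"] = sep.join(symptoms_by_id[vid])
        flat.append(merged)
    return flat


rows = flatten(
    [{"VAERS_ID": "1", "SEX": "F"}],
    [
        {"VAERS_ID": "1", "VAX_TYPE": "COVID19", "VAX_MANU": "A", "VAX_LOT": "x1"},
        {"VAERS_ID": "1", "VAX_TYPE": "FLU4", "VAX_MANU": "B", "VAX_LOT": "y2"},
    ],
    [
        {"VAERS_ID": "1", "SYMPTOM1": "Headache", "SYMPTOM2": "Fever"},
        {"VAERS_ID": "1", "SYMPTOM1": "Nausea", "SYMPTOM2": ""},
    ],
)
```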

3. Comparison

Identifies new reports
Detects modifications to existing reports
Tracks deletions
Records all changes in the changes column
Counts cell edits

Output directory: 2vaersfull_compared/
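
A comparison between two releases can be sketched as a dictionary diff keyed on VAERS_ID. The example below is a simplified illustration, not the script's algorithm: compare_releases is a hypothetical helper, the "unchanged" status is an assumption added for completeness, and the log wording follows the change log examples in the Change Tracking section.

```python
def compare_releases(previous, current, release_date):
    """Diff two releases given as {VAERS_ID: {column: value}} dicts."""
    result = {}
    for vid, row in current.items():
        if vid not in previous:
            result[vid] = {"status": "new", "cell_edits": 0, "changes": []}
            continue
        old = previous[vid]
        changes = [
            f'{release_date}: {col} changed from "{old.get(col, "")}" to "{val}"'
            for col, val in row.items()
            if old.get(col, "") != val
        ]
        status = "modified" if changes else "unchanged"  # "unchanged" is an assumption
        result[vid] = {"status": status, "cell_edits": len(changes), "changes": changes}

    # Reports present before but missing now count as deleted.
    for vid in previous.keys() - current.keys():
        result[vid] = {
            "status": "deleted",
            "cell_edits": 0,
            "changes": [f"{release_date}: report deleted"],
        }
    return result


diff = compare_releases(
    {"1": {"AGE_YRS": "45"}, "2": {"AGE_YRS": "30"}},
    {"1": {"AGE_YRS": "46"}, "3": {"AGE_YRS": "20"}},
    "2023-01-15",
)
```

Each changed cell adds one entry to changes, so cell_edits is the length of that list for a single comparison.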

4. Final Merge

All reports with complete change history
Cell edit counts
Status indicators (new, modified, deleted)
Complete audit trail

Output file: VAERSFINALMERGED.csv

Output Files

Primary Output

VAERSFINALMERGED.csv

Complete dataset with all VAERS reports
Includes all historical changes tracked across data releases
Contains columns: cell_edits, status, changes
One row per VAERS_ID with complete information

Statistics and Tracking Files

stats.csv

Processing statistics for each data release
Counts of new reports, modifications, deletions
Date ranges and record counts

neverpublishedany.txt

VAERS IDs that were never published in any release
Identifies gaps in the VAERS ID sequence
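
Finding gaps in an ID sequence is a set difference between the full range and the published IDs. A minimal sketch, assuming IDs are integers and that the range runs from the lowest to the highest published ID (never_published is a hypothetical helper name):

```python
def never_published(published_ids):
    """Return the IDs missing between the lowest and highest published ID."""
    ids = set(published_ids)
    if not ids:
        return []
    return [i for i in range(min(ids), max(ids) + 1) if i not in ids]


gaps = never_published([5, 7, 8, 11])
```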

everpublishedany.txt

Complete list of all VAERS IDs ever published
Includes all vaccine types

everpublishedcovid.txt

List of COVID-19 vaccine-related VAERS IDs
Filtered by VAX_TYPE containing 'covid'

writeups_deduped.txt

Deduplicated symptom text descriptions
Useful for analysis of unique symptom patterns
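
Deduplicating free-text descriptions usually involves some normalization before comparing. The sketch below keeps the first occurrence of each text and treats case and whitespace differences as duplicates; that normalization rule and the dedupe_writeups helper are assumptions, not the script's documented behavior.

```python
def dedupe_writeups(texts):
    """Keep the first occurrence of each text, ignoring case and extra spaces."""
    seen = set()
    unique = []
    for text in texts:
        key = " ".join(text.split()).lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(text.strip())
    return unique


deduped = dedupe_writeups(
    ["Headache and fever", "headache  and FEVER", "", "Rash at injection site"]
)
```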

Key Columns in Output

Standard VAERS Columns

VAERS_ID - Unique report identifier
AGE_YRS, SEX, STATE - Demographic information
DIED, L_THREAT, ER_VISIT, HOSPITAL, DISABLE - Serious outcomes
VAX_TYPE, VAX_MANU, VAX_LOT - Vaccine information
VAX_DATE, ONSET_DATE, RPT_DATE - Date information
SYMPTOM_TEXT - Symptom description
And many more...

Enhanced Tracking Columns

cell_edits - Count of cells modified across all releases
status - Report status (new, modified, deleted)
changes - Detailed log of all changes made to the report
symptom_entries - Aggregated symptom entries

Performance Tuning

For Fast Processing (High RAM)

bash
python vaers_complete.py --dataset covid --cores 16 --chunk-size 100000

For Memory-Constrained Systems

bash
python vaers_complete.py --dataset covid --cores 4 --chunk-size 25000

For Very Large Full Dataset

bash
python vaers_complete.py --dataset full --cores 16 --chunk-size 50000

Error Handling

All errors are collected and displayed at the end of processing
Errors include timestamps for tracking
Processing continues when possible, skipping problematic files
Final error summary shows total errors encountered
Exit code 0 = success, 1 = errors occurred
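
The collect-and-continue behavior described above can be sketched as follows. This is an illustration of the pattern, not the script's code; ErrorLog, process_all, and demo_handler are hypothetical names.

```python
from datetime import datetime


class ErrorLog:
    """Collects errors with timestamps so they can be reported at the end."""

    def __init__(self):
        self.entries = []

    def record(self, message):
        stamp = datetime.now().isoformat(timespec="seconds")
        self.entries.append((stamp, message))

    def summary(self):
        lines = [f"[{stamp}] {message}" for stamp, message in self.entries]
        lines.append(f"Total errors: {len(self.entries)}")
        return "\n".join(lines)

    def exit_code(self):
        return 1 if self.entries else 0


def process_all(names, handler, log):
    """Run handler on each name; record failures and keep going."""
    results = []
    for name in names:
        try:
            results.append(handler(name))
        except Exception as exc:
            log.record(f"{name}: {exc}")
    return results


def demo_handler(name):
    # Hypothetical handler: pretend one input file is corrupt.
    if name == "bad.zip":
        raise ValueError("corrupt archive")
    return name.upper()


log = ErrorLog()
processed = process_all(["a.zip", "bad.zip", "b.zip"], demo_handler, log)
print(log.summary())
```

A caller would pass log.exit_code() to sys.exit() once the summary has been printed.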

Data Filtering

COVID Dataset Mode

Automatically detects the earliest COVID VAERS_ID
Removes all reports prior to first COVID vaccine report
Typically starts from VAERS_ID ~896636 (first trial report)
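
The COVID filter can be pictured as finding the lowest VAERS_ID with a COVID vaccine row, then dropping every report below it. A minimal sketch under that reading; filter_covid_era is a hypothetical helper and the sample IDs are made up for illustration.

```python
def filter_covid_era(vax_rows, data_rows):
    """Keep reports whose VAERS_ID is at or above the first COVID report."""
    covid_ids = [
        int(row["VAERS_ID"])
        for row in vax_rows
        if "covid" in row.get("VAX_TYPE", "").lower()
    ]
    if not covid_ids:
        return []
    first_covid_id = min(covid_ids)
    return [row for row in data_rows if int(row["VAERS_ID"]) >= first_covid_id]


# Made-up sample rows; real IDs are much larger.
sample_vax = [
    {"VAERS_ID": "10", "VAX_TYPE": "FLU4"},
    {"VAERS_ID": "12", "VAX_TYPE": "COVID19"},
    {"VAERS_ID": "15", "VAX_TYPE": "COVID19"},
]
sample_data = [{"VAERS_ID": "10"}, {"VAERS_ID": "11"}, {"VAERS_ID": "12"}, {"VAERS_ID": "15"}]
kept_ids = [row["VAERS_ID"] for row in filter_covid_era(sample_vax, sample_data)]
```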

Full Dataset Mode

Includes all vaccine types from 1990 onwards (or specified date-floor)
Significantly larger processing time and storage requirements

Change Tracking

New reports: First appearance in a data release
Modifications: Changes to any field in existing reports
Deletions: Reports removed from later releases
Cell edits: Count of individual cell changes
Change log: Detailed description of what changed


Example change log entries:

2023-01-15: AGE_YRS changed from "45" to "46"
2023-01-15: SYMPTOM_TEXT appended with "Patient recovered"

Troubleshooting

Out of Memory Errors

Reduce --chunk-size to 25000 or lower
Reduce --cores to use fewer parallel processes
Process smaller date ranges using --date-floor and --date-ceiling

Progress Bars Not Showing

Install tqdm: pip install tqdm
Or disable with --no-progress if not needed

ZIP File Errors

Install zipfile-deflate64: pip install zipfile-deflate64
Script falls back to standard zipfile if not available

Missing Input Files

Ensure VAERS data files are in 0VAERSDownloads/ directory
Check that files are in correct ZIP format from CDC
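
Before a long run, the input directory can be checked with a few lines of Python. This sketch uses the standard library's zipfile.is_zipfile, which only confirms that a file looks like a ZIP archive; find_input_zips is a hypothetical helper, and the directory name is taken from the Directory Structure section.

```python
import zipfile
from pathlib import Path


def find_input_zips(download_dir="0VAERSDownloads"):
    """Return (valid ZIP names, list of problems found) for the input directory."""
    folder = Path(download_dir)
    if not folder.is_dir():
        return [], [f"missing directory: {folder}"]

    good, problems = [], []
    for path in sorted(folder.glob("*.zip")):
        if zipfile.is_zipfile(path):
            good.append(path.name)
        else:
            problems.append(f"not a valid ZIP: {path.name}")

    if not good and not problems:
        problems.append(f"no ZIP files found in {folder}")
    return good, problems


missing_good, missing_problems = find_input_zips("definitely_missing_dir_xyz")
```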

License and Attribution

Notes

The script automatically handles mixed date formats (MM/DD/YYYY → YYYY-MM-DD)
Duplicate records are automatically identified and removed
String type handling is optimized for memory efficiency
All CSV files use UTF-8-sig encoding for compatibility
Progress tracking can be disabled for automated/batch processing
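
The date normalization mentioned in the notes can be sketched with datetime.strptime. This version tries both formats and leaves anything it cannot parse unchanged; that pass-through behavior and the normalize_date name are assumptions.

```python
from datetime import datetime


def normalize_date(value):
    """Convert MM/DD/YYYY (or already ISO) dates to YYYY-MM-DD."""
    value = (value or "").strip()
    for fmt in ("%m/%d/%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return value  # unparseable or empty values pass through unchanged


converted = normalize_date("01/15/2023")
```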

Support

For issues, questions, or contributions, refer to the original source or the repository where this script is maintained.

Download Options

Free Download: Source code is freely available below.
Compiled Versions: Support development with a PayPal donation.

Free Downloads

📦 Download Source Code
