VAERSdrops (beta)

By Page Telegram
Current Version: 1.2.5

VAERS Complete - Enhanced Data Processing Script

Overview

vaers_complete.py is a Python script for processing VAERS (Vaccine Adverse Event Reporting System) data. It supports multi-core parallel processing, memory-efficient chunked data handling, and change tracking across CDC data releases.

Original Author: Gary Hawkins - http://univaers.com/download/
Enhanced Version: 2025 by Jason Page

Features

  • ✓ Multi-core parallel processing for faster execution
  • ✓ Memory-efficient chunked data handling for large datasets
  • ✓ Command-line dataset selection (COVID-19 era or full historical data)
  • ✓ Progress bars for all major operations
  • ✓ Comprehensive error tracking and reporting
  • ✓ Fixed statistics functionality
  • ✓ Change detection and tracking across data releases
  • ✓ Deduplication and data consolidation
  • ✓ Complete audit trail of modifications to VAERS reports

    Requirements

    Python Dependencies

    bash
    pip install pandas numpy tqdm zipfile-deflate64
    
  • pandas: Data manipulation and analysis
  • numpy: Numerical operations
  • tqdm: Progress bars (optional but recommended)
  • zipfile-deflate64: Enhanced ZIP file handling (optional, falls back to standard zipfile)
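
The two optional packages listed above can be handled with guarded imports. The following is a minimal sketch of that pattern, not the script's actual code; the helper name `progress` is hypothetical and used only for illustration.

```python
# Optional dependencies: fall back gracefully when a package is missing.
try:
    # zipfile-deflate64 is meant to be used as a drop-in for zipfile
    import zipfile_deflate64 as zipfile
except ImportError:
    import zipfile  # standard library fallback (no Deflate64 support)

try:
    from tqdm import tqdm
except ImportError:
    tqdm = None  # progress bars are simply skipped


def progress(iterable, enabled=True, **kwargs):
    """Wrap an iterable in a tqdm bar when tqdm is installed and enabled.

    Hypothetical helper for illustration only.
    """
    if enabled and tqdm is not None:
        return tqdm(iterable, **kwargs)
    return iterable
```

With this pattern, a flag such as --no-progress only needs to pass `enabled=False`, and the rest of the code iterates the same way in both cases.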

    System Requirements

  • Python 3.x
  • Multi-core CPU recommended for parallel processing
  • Sufficient RAM for large dataset processing (16GB+ recommended for full dataset)

    Command-Line Options

    Basic Syntax

    bash
    python vaers_complete.py [OPTIONS]
    

    Options Reference

    --dataset {covid,full}
    Default: covid
    Selects which dataset to process:
  • covid: Process COVID-19 era data only (from 2020-12-13 onwards by default)
  • full: Process the full historical VAERS dataset (from 1990-01-01 onwards by default)
    Examples:
    bash
    python vaers_complete.py --dataset covid
    python vaers_complete.py --dataset full
    
    --cores NUMBER
    Default: Number of CPU cores available on the system
    Specifies the number of CPU cores to use for parallel processing.
    Examples:
    bash
    python vaers_complete.py --cores 8
    python vaers_complete.py --cores 16
    python vaers_complete.py --dataset full --cores 4
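
How a core count like this might drive parallel work can be sketched with the standard library. This is an illustration under assumptions, not the script's implementation: the function name `run_parallel` is hypothetical, and the document does not say which parallelism library the script uses.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def run_parallel(worker, items, cores=None):
    """Apply `worker` to each item using up to `cores` processes.

    `worker` must be a picklable, module-level function.
    """
    cores = cores or os.cpu_count() or 1
    items = list(items)
    if cores <= 1 or len(items) <= 1:
        # Serial path: avoids process start-up cost for tiny jobs.
        return [worker(item) for item in items]
    with ProcessPoolExecutor(max_workers=min(cores, len(items))) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(worker, items))
```

On platforms that start worker processes with "spawn" (Windows, macOS), calls to a function like this belong under an `if __name__ == "__main__":` guard.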
    
    --chunk-size NUMBER
    Default: 50000
    Sets the chunk size for processing large datasets. Larger chunks use more memory but may be faster; smaller chunks are more memory-efficient.
    Examples:
    bash
    python vaers_complete.py --chunk-size 100000
    python vaers_complete.py --chunk-size 25000
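
The memory trade-off behind the chunk size can be illustrated with a small generator. The script itself depends on pandas, where `pandas.read_csv(..., chunksize=N)` gives the same kind of iteration; this sketch swaps in the standard library so it runs without dependencies. The function names are hypothetical.

```python
import csv
from itertools import islice


def iter_chunks(rows, chunk_size=50000):
    """Yield lists of at most `chunk_size` rows from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def count_rows_chunked(path, chunk_size=50000):
    """Count CSV data rows while holding only one chunk in memory."""
    total = 0
    with open(path, newline="", encoding="utf-8-sig") as handle:
        for chunk in iter_chunks(csv.DictReader(handle), chunk_size):
            total += len(chunk)
    return total
```

Peak memory grows with `chunk_size` because each chunk is materialized as a list, which is why lowering the value helps on memory-constrained systems.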
    
    --date-floor DATE
    Default: 2020-12-13 for the COVID dataset, 1990-01-01 for the full dataset
    Sets the earliest date to process (format: YYYY-MM-DD). Records before this date will be excluded.
    Examples:
    bash
    python vaers_complete.py --date-floor 2021-01-01
    python vaers_complete.py --dataset full --date-floor 2000-01-01
    
    --date-ceiling DATE
    Default: 2025-01-01
    Sets the latest date to process (format: YYYY-MM-DD). Records after this date will be excluded.
    Examples:
    bash
    python vaers_complete.py --date-ceiling 2024-12-31
    python vaers_complete.py --date-floor 2020-01-01 --date-ceiling 2023-12-31
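
The floor and ceiling together define an inclusive date window. The sketch below shows one way such a filter could work. It assumes dates are already in YYYY-MM-DD form and that the filter applies to the RECVDATE column; this document does not state which date the script actually compares, so treat the column choice and the function name as assumptions.

```python
from datetime import date


def filter_by_date(records, floor="2020-12-13", ceiling="2025-01-01",
                   column="RECVDATE"):
    """Keep records whose date lies in [floor, ceiling], bounds included.

    `column` is an assumption; the script may filter on another date.
    """
    low = date.fromisoformat(floor)
    high = date.fromisoformat(ceiling)
    return [
        record for record in records
        if low <= date.fromisoformat(record[column]) <= high
    ]
```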
    
    --test
    Default: Not set
    Uses the test cases directory (z_test_cases) instead of the main working directory. Useful for development and testing.
    Example:
    bash
    python vaers_complete.py --test
    
    --no-progress
    Default: Not set
    Disables progress bars. Useful when logging output to files or running in environments without terminal support.
    Example:
    bash
    python vaers_complete.py --no-progress > output.log
    
    --merge-only
    Default: Not set
    Skips all processing and only creates the final merged file from existing processed data. Useful for regenerating the final output without reprocessing everything.
    Example:
    bash
    python vaers_complete.py --merge-only
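
The options documented above map naturally onto an argparse parser. The following sketch mirrors only the flags and defaults listed in this section; the real parser in vaers_complete.py may differ in detail, and the names `build_parser`, `parse_options`, and `DATE_FLOORS` are hypothetical.

```python
import argparse
import os

# Per-dataset default for --date-floor, as documented above.
DATE_FLOORS = {"covid": "2020-12-13", "full": "1990-01-01"}


def build_parser():
    parser = argparse.ArgumentParser(prog="vaers_complete.py")
    parser.add_argument("--dataset", choices=["covid", "full"], default="covid")
    parser.add_argument("--cores", type=int, default=os.cpu_count())
    parser.add_argument("--chunk-size", type=int, default=50000)
    parser.add_argument("--date-floor", default=None)  # resolved per dataset
    parser.add_argument("--date-ceiling", default="2025-01-01")
    parser.add_argument("--test", action="store_true")
    parser.add_argument("--no-progress", action="store_true")
    parser.add_argument("--merge-only", action="store_true")
    return parser


def parse_options(argv=None):
    """Parse arguments and fill in the dataset-dependent date floor."""
    args = build_parser().parse_args(argv)
    if args.date_floor is None:
        args.date_floor = DATE_FLOORS[args.dataset]
    return args
```

Leaving the --date-floor default unset until after parsing is what lets it depend on --dataset while still allowing an explicit value to override it.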
    

    Usage Examples

    Process COVID-19 data with 8 cores

    bash
    python vaers_complete.py --dataset covid --cores 8
    

    Process full historical dataset with 16 cores and larger chunks

    bash
    python vaers_complete.py --dataset full --cores 16 --chunk-size 100000
    

    Process COVID data from a specific start date

    bash
    python vaers_complete.py --dataset covid --date-floor 2021-01-01
    

    Process data for a specific date range

    bash
    python vaers_complete.py --date-floor 2021-01-01 --date-ceiling 2023-12-31 --cores 8
    

    Process with smaller chunks for memory-constrained systems

    bash
    python vaers_complete.py --dataset covid --chunk-size 25000 --cores 4
    

    Create final merged file only

    bash
    python vaers_complete.py --merge-only
    

    Run with test data

    bash
    python vaers_complete.py --test --cores 4
    

    Process without progress bars (for logging)

    bash
    python vaers_complete.py --dataset covid --no-progress > processing.log 2>&1
    

    Directory Structure

    The script expects and creates the following directory structure:
    
    .
    ├── 0_VAERS_Downloads/          # Input: Raw VAERS ZIP files from CDC
    ├── 1_vaers_working/            # Intermediate: Extracted CSV files
    ├── 1_vaers_consolidated/       # Intermediate: Consolidated data files
    ├── 2_vaers_full_compared/      # Output: Comparison results with change tracking
    ├── 3_vaers_flattened/          # Intermediate: Flattened data (one row per VAERS_ID)
    ├── stats.csv                   # Output: Processing statistics
    ├── never_published_any.txt     # Output: VAERS IDs never published
    ├── ever_published_any.txt      # Output: All VAERS IDs ever published
    ├── ever_published_covid.txt    # Output: COVID-related VAERS IDs
    ├── writeups_deduped.txt        # Output: Deduplicated symptom descriptions
    └── VAERS_FINAL_MERGED.csv      # Final output: Complete merged dataset
    

    Test Mode Directory Structure

    When using the --test flag:
    
    z_test_cases/
    ├── drops/                      # Input: Test VAERS data
    ├── 1_vaers_working/
    ├── 1_vaers_consolidated/
    ├── 2_vaers_full_compared/
    ├── 3_vaers_flattened/
    └── [output files]
    

    Processing Workflow

    The script performs the following main steps:

    1. Consolidation

    Combines the three VAERS data files for each data release:
  • *VAERSDATA.csv - Main report data
  • *VAERSVAX.csv - Vaccination details
  • *VAERSSYMPTOMS.csv - Symptom entries
    Output: Consolidated files in 1_vaers_consolidated/
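
The idea behind consolidation can be sketched as a join on VAERS_ID. The script uses pandas and its exact join strategy is not described here, so this dependency-free version is only an illustration; the function names and the "vax" and "symptoms" keys are hypothetical.

```python
from collections import defaultdict


def group_by_id(rows):
    """Index rows by VAERS_ID; one report can have several rows."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["VAERS_ID"]].append(row)
    return grouped


def consolidate(data_rows, vax_rows, symptom_rows):
    """Attach vaccine and symptom rows to each main report."""
    vax = group_by_id(vax_rows)
    symptoms = group_by_id(symptom_rows)
    return [
        {
            **report,
            "vax": vax.get(report["VAERS_ID"], []),
            "symptoms": symptoms.get(report["VAERS_ID"], []),
        }
        for report in data_rows
    ]
```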

    2. Flattening

    Aggregates multiple vaccine entries per report into single rows:
  • Groups vaccine records by VAERS_ID
  • Merges all related data into one row per report
  • Joins symptom entries
    Output: Flattened files in 3_vaers_flattened/
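
Flattening can be pictured as collapsing several vaccine rows into one row per report. In this sketch the values from each row are joined with a separator; the "|" separator, the column selection, and the function name are assumptions made for illustration, not details taken from the script.

```python
def flatten_vax(vax_rows, columns=("VAX_TYPE", "VAX_MANU", "VAX_LOT"), sep="|"):
    """Collapse several vaccine rows per VAERS_ID into one row each."""
    flat = {}
    for row in vax_rows:
        # First time an ID is seen, create an empty list per column.
        target = flat.setdefault(row["VAERS_ID"], {c: [] for c in columns})
        for column in columns:
            target[column].append(row.get(column, ""))
    return [
        {"VAERS_ID": vid, **{c: sep.join(values) for c, values in cols.items()}}
        for vid, cols in flat.items()
    ]
```

Reports keep the order in which their VAERS_ID first appears, because Python dictionaries preserve insertion order.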

    3. Comparison

    Compares current data release with previous releases to detect changes:
  • Identifies new reports
  • Detects modifications to existing reports
  • Tracks deletions
  • Records all changes in the changes column
  • Counts cell edits
    Output: Comparison files in 2_vaers_full_compared/
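
The comparison step amounts to a keyed diff between two releases. The sketch below is a simplified illustration, assuming each release is held as a dict from VAERS_ID to a row dict; the function name and the "unchanged" status are assumptions (the document lists only new, modified, and deleted).

```python
def compare_releases(previous, current, release_date):
    """Diff two releases given as {VAERS_ID: {column: value}} dicts."""
    result = {}
    for vid, row in current.items():
        if vid not in previous:
            result[vid] = {"status": "new", "cell_edits": 0, "changes": []}
            continue
        changes = []
        for column, new in row.items():
            old = previous[vid].get(column, "")
            if old != new:
                changes.append(
                    f'{release_date}: {column} changed from "{old}" to "{new}"'
                )
        result[vid] = {
            "status": "modified" if changes else "unchanged",
            "cell_edits": len(changes),  # one edit per changed cell
            "changes": changes,
        }
    # IDs present before but missing now count as deletions.
    for vid in previous.keys() - current.keys():
        result[vid] = {
            "status": "deleted",
            "cell_edits": 0,
            "changes": [f"{release_date}: report deleted"],
        }
    return result
```

This version only inspects columns present in the current row, so a column dropped entirely between releases would need extra handling.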

    4. Final Merge

    Creates the final consolidated output file containing:
  • All reports with complete change history
  • Cell edit counts
  • Status indicators (new, modified, deleted)
  • Complete audit trail
    Output: VAERS_FINAL_MERGED.csv

    Output Files

    Primary Output

    VAERS_FINAL_MERGED.csv
  • Complete dataset with all VAERS reports
  • Includes all historical changes tracked across data releases
  • Contains columns: cell_edits, status, changes
  • One row per VAERS_ID with complete information

    Statistics and Tracking Files

    stats.csv
  • Processing statistics for each data release
  • Counts of new reports, modifications, deletions
  • Date ranges and record counts

    never_published_any.txt
  • VAERS IDs that were never published in any release
  • Identifies gaps in the VAERS ID sequence
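
Finding gaps in the ID sequence is a set difference. A minimal sketch, assuming the gaps are measured between the smallest and largest VAERS_ID seen in any release (the document does not state the exact range the script uses; the function name is hypothetical):

```python
def never_published(published_ids):
    """Return IDs missing between the lowest and highest published ID."""
    ids = {int(value) for value in published_ids}
    if not ids:
        return []
    full_range = set(range(min(ids), max(ids) + 1))
    return sorted(full_range - ids)
```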

    ever_published_any.txt
  • Complete list of all VAERS IDs ever published
  • Includes all vaccine types

    ever_published_covid.txt
  • List of COVID-19 vaccine-related VAERS IDs
  • Filtered by VAX_TYPE containing 'covid'
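
The VAX_TYPE filter can be expressed in a few lines. This sketch assumes a case-insensitive substring match, which covers VAERS values such as COVID19; the function name is hypothetical and the script's exact matching rule is not given in this document.

```python
def covid_ids(vax_rows):
    """Collect VAERS_IDs whose VAX_TYPE contains 'covid' (any letter case)."""
    return sorted({
        row["VAERS_ID"]
        for row in vax_rows
        if "covid" in row.get("VAX_TYPE", "").lower()
    })
```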

    writeups_deduped.txt
  • Deduplicated symptom text descriptions
  • Useful for analysis of unique symptom patterns

    Key Columns in Output

    The final merged file contains all standard VAERS columns plus:

    Standard VAERS Columns

  • VAERS_ID - Unique report identifier
  • AGE_YRS, SEX, STATE - Demographic information
  • DIED, L_THREAT, ER_VISIT, HOSPITAL, DISABLE - Serious outcomes
  • VAX_TYPE, VAX_MANU, VAX_LOT - Vaccine information
  • VAX_DATE, ONSET_DATE, RPT_DATE - Date information
  • SYMPTOM_TEXT - Symptom description
  • And many more...

    Enhanced Tracking Columns

  • cell_edits - Count of cells modified across all releases
  • status - Report status (new, modified, deleted)
  • changes - Detailed log of all changes made to the report
  • symptom_entries - Aggregated symptom entries

    Performance Tuning

    For Fast Processing (High RAM)

    bash
    python vaers_complete.py --dataset covid --cores 16 --chunk-size 100000
    

    For Memory-Constrained Systems

    bash
    python vaers_complete.py --dataset covid --cores 4 --chunk-size 25000
    

    For Very Large Full Dataset

    bash
    python vaers_complete.py --dataset full --cores 16 --chunk-size 50000
    

    Error Handling

    The script includes comprehensive error handling:
  • All errors are collected and displayed at the end of processing
  • Errors include timestamps for tracking
  • Processing continues when possible, skipping problematic files
  • Final error summary shows total errors encountered
  • Exit code 0 = success, 1 = errors occurred
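
The collect-then-report behavior listed above can be sketched as a small error log. This is an illustration of the pattern, not the script's own code; `ErrorLog` and `process_all` are hypothetical names.

```python
import sys
from datetime import datetime


class ErrorLog:
    """Collect errors during a run and summarize them at the end."""

    def __init__(self):
        self.entries = []

    def record(self, message):
        stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        self.entries.append(f"[{stamp}] {message}")

    def summary(self, stream=None):
        """Print all errors and return the process exit code (0 or 1)."""
        stream = stream or sys.stderr
        for entry in self.entries:
            print(entry, file=stream)
        print(f"Total errors: {len(self.entries)}", file=stream)
        return 1 if self.entries else 0


def process_all(paths, handler, log):
    """Run `handler` on every path, skipping files that raise."""
    done = []
    for path in paths:
        try:
            done.append(handler(path))
        except Exception as exc:
            log.record(f"{path}: {exc}")  # keep going with the next file
    return done
```

A run would typically end with `sys.exit(log.summary())`, which yields exit code 0 on success and 1 when any error was recorded.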

    Data Filtering

    COVID Dataset Mode

    By default, the script filters to COVID-19 era data:
  • Automatically detects the earliest COVID VAERS_ID
  • Removes all reports prior to the first COVID vaccine report
  • Typically starts from VAERS_ID ~896636 (first trial report)

    Full Dataset Mode

    Processes complete historical VAERS data:
  • Includes all vaccine types from 1990 onwards (or from the date given with --date-floor)
  • Significantly larger processing time and storage requirements

    Change Tracking

    The script tracks modifications to VAERS reports across CDC data releases:
  • New reports: First appearance in a data release
  • Modifications: Changes to any field in existing reports
  • Deletions: Reports removed from later releases
  • Cell edits: Count of individual cell changes
  • Change log: Detailed description of what changed

    Example change tracking entry:
    
    2023-01-15: AGE_YRS changed from "45" to "46"
    2023-01-15: SYMPTOM_TEXT appended with "Patient recovered"
    

    Troubleshooting

    Out of Memory Errors

  • Reduce --chunk-size to 25000 or lower
  • Reduce --cores to use fewer parallel processes
  • Process smaller date ranges using --date-floor and --date-ceiling

    Progress Bars Not Showing

  • Install tqdm: pip install tqdm
  • Or disable with --no-progress if not needed

    ZIP File Errors

  • Install zipfile-deflate64: pip install zipfile-deflate64
  • Script falls back to standard zipfile if not available

    Missing Input Files

  • Ensure the VAERS data files are in the 0_VAERS_Downloads/ directory
  • Check that the files are the original ZIP archives from CDC

    License and Attribution

    Original script by Gary Hawkins (http://univaers.com/download/).
    Enhanced version with performance improvements and additional features by Jason Page.

    Notes

  • The script automatically handles mixed date formats (MM/DD/YYYY → YYYY-MM-DD)
  • Duplicate records are automatically identified and removed
  • String type handling is optimized for memory efficiency
  • All CSV files use UTF-8-sig encoding for compatibility
  • Progress tracking can be disabled for automated/batch processing
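
The mixed date formats mentioned in the notes above can be normalized with the standard library. A minimal sketch, assuming only the two formats named in this document appear and that unrecognized values should pass through unchanged (the function name is hypothetical):

```python
from datetime import datetime


def normalize_date(value):
    """Return a date as YYYY-MM-DD, accepting MM/DD/YYYY or YYYY-MM-DD."""
    value = value.strip()
    for fmt in ("%m/%d/%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return value  # empty or unrecognized values are left untouched
```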

Support

For issues, questions, or contributions, refer to the original source or the repository where this script is maintained.

Download Options

Free Download: Source code is freely available below.
Compiled Versions: Support development with a PayPal donation.

Free Downloads

📦 Download Source Code
