# Query Gram - Customizable Document Search Engine

## Version 1.10 - Enhanced Features

A powerful, customizable full-text search interface with advanced boolean logic, lazy-loading, streaming results, SRT subtitle support, and file downloads.

---

## What's New

### 1. **Customizable Branding**
- No longer hardcoded to JFK documents
- Fully customizable site title, description, about section, and footer
- Support for special characters and line breaks in content
- HTML support in about section

### 2. **Improved Boolean Search Logic**
- Fixed operator precedence (NOT > AND/NAND > OR/XOR/NOR)
- Proper combination of logic gates
- More intuitive query evaluation
- **Case-sensitive search option** to prevent false matches
- Example: `apple AND orange OR banana NOT pear` now works correctly

### 3. **Lazy Loading**
- Results load in batches of 100
- "Load More" button for seamless browsing
- **Auto-scroll detection** automatically loads more results when within 800px of the button
- Reduces initial page load time
- Better performance with large result sets

### 4. **Streaming Results**
- Results appear as they're found
- Uses PHP's `ob_flush()` for real-time output
- No waiting for entire search to complete
- Improved user experience for large searches

### 5. **Admin Panel**
- Secure login system with password hashing
- Easy configuration management
- Live preview of about content
- Session-based authentication

### 6. **SRT Subtitle File Support** *(New in 1.10)*
- Search `.srt` subtitle/transcript files alongside `.txt` documents
- Custom parser extracts entry number, timeframe, and text
- Results display entry number, timestamp, and matched text
- Context lines show surrounding subtitle entries
- Combine SRT and text file searches in one query

### 7. **File Downloads** *(New in 1.10)*
- **Download individual files** directly from the file list (⬇ button)
- **Download search results** as a formatted `.txt` file
- Result filenames include date, time, and search terms (e.g., `2026-03-01-143022-kennedy.txt`)
- Downloaded result files include query, timestamp, and match count

### 8. **SEO & Open Graph Meta Tags** *(New in 1.10)*
- Open Graph tags for social sharing (og:title, og:description, og:image, og:url)
- Dynamic title and description pulled from site config
- Improved search engine indexing support

### 9. **Print Support** *(New in 1.10)*
- Full print CSS hides UI chrome and shows clean results
- White background with black text for readable printouts

---

## Installation

### First-Time Setup

1. **Navigate to setup page:**
   ```
   http://yourdomain.com/setup.php
   ```

2. **Configure your site:**
   - Set admin username (default: "admin")
   - Set admin password (min 6 characters)
   - Customize site title
   - Write site description
   - Add about content (supports HTML and line breaks)
   - Set footer text

3. **Complete setup:**
   - Click "Complete Setup"
   - Save your credentials securely
   - Setup creates flat files in `config_data/` directory

### File Structure

```
/
├── index.php              # Main search interface
├── sum/
│   ├── index.php          # Summaries search interface
│   └── text/              # Summary document files (.txt and .srt)
├── setup.php              # Initial setup wizard
├── admin.php              # Admin panel
├── config.php             # Configuration functions
├── config_data/           # Configuration storage (auto-created)
│   ├── site_config.json   # Site settings
│   └── admin.json         # Admin credentials
├── text/                  # Document files (.txt and .srt)
├── logo.png               # Site logo
├── fav.ico                # Favicon
└── README.md              # This file
```

---

## Usage

### For Site Administrators

#### Accessing Admin Panel
1. Click the "Admin" link in the header, or go to `/admin.php`
2. Log in with your credentials
3. Modify site settings
4. Changes take effect immediately

#### Customizing Content
- **Site Title:** Appears in browser tab, header, and meta tags
- **Site Description:** Used for SEO and social sharing
- **About Content:**
  - Supports line breaks (press Enter)
  - Supports HTML tags (e.g., `<strong>`, `<a>`, `<br>`)
  - Special characters are preserved
  - Live preview shows how it will appear
- **Footer Text:** Copyright or attribution

#### Example About Content
```html
This is a full-text search interface for document records.

<strong>Features:</strong>
- Boolean search with AND, OR, NOT, XOR, NAND, NOR
- Context lines above and below matches
- Lazy-loading for large result sets

Visit our <a href="https://example.com">main site</a> for more information.
```

### For Search Users

#### Basic Search
```
kennedy
```
Finds all lines containing "kennedy" (case-insensitive)

#### Boolean Operators

| Operator | Example | Description |
|----------|---------|-------------|
| **AND** | `kennedy AND oswald` | Both terms must be present |
| **OR** | `kennedy OR johnson` | At least one term must be present |
| **NOT** | `kennedy NOT oswald` | First term present, second absent |
| **XOR** | `kennedy XOR johnson` | Exactly one term must be present |
| **NAND** | `kennedy NAND oswald` | At least one term must be absent |
| **NOR** | `kennedy NOR johnson` | Both terms must be absent |
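
The table can be read as six truth functions over two "line contains term" checks. The following is a minimal sketch, not taken from the Query Gram source: `applyOperator()` is a hypothetical helper, and binary `NOT` is interpreted as "left present, right absent", as the table describes.

```php
<?php
// Hypothetical helper: combine two "line contains term" results
// according to the operator table.
function applyOperator(string $op, bool $left, bool $right): bool
{
    switch ($op) {
        case 'AND':  return $left && $right;
        case 'OR':   return $left || $right;
        case 'NOT':  return $left && !$right;    // first present, second absent
        case 'XOR':  return $left !== $right;    // exactly one present
        case 'NAND': return !($left && $right);  // at least one absent
        case 'NOR':  return !($left || $right);  // both absent
        default:
            throw new InvalidArgumentException("Unknown operator: $op");
    }
}

$line       = 'Kennedy met Johnson in Dallas';
$hasKennedy = stripos($line, 'kennedy') !== false;
$hasOswald  = stripos($line, 'oswald') !== false;

var_dump(applyOperator('NOT', $hasKennedy, $hasOswald)); // bool(true)
```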

#### Operator Precedence
1. **NOT** (highest priority)
2. **AND, NAND**
3. **OR, XOR, NOR** (lowest priority)

#### Example Queries

**Simple AND:**
```
CIA AND FBI
```
Finds lines with both "CIA" and "FBI"

**Complex combination:**
```
assassination AND kennedy NOT conspiracy
```
Finds lines with "assassination" AND "kennedy" BUT NOT "conspiracy"

**Multiple operators:**
```
kennedy OR johnson AND assassination
```
Evaluates as: `kennedy OR (johnson AND assassination)`

**NOT combined with AND:**
```
document NOT classified AND released
```
Evaluates as: `(document NOT classified) AND released`, so it finds lines containing "document" and "released" but not "classified"

#### Context Settings
- **Above:** Number of lines to show before match (default: 2)
- **Below:** Number of lines to show after match (default: 2)
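
As an illustration of how the Above and Below settings select lines, here is a small sketch. It assumes the file is already split into an array of lines. `contextWindow()` is a hypothetical name, not a function from the project.

```php
<?php
// Hypothetical helper: return the matched line plus its surrounding context.
// $lines is the file split into lines, $index the zero-based match position.
function contextWindow(array $lines, int $index, int $above = 2, int $below = 2): array
{
    // Clamp the window so it never runs past the start or end of the file.
    $start = max(0, $index - $above);
    $end   = min(count($lines) - 1, $index + $below);
    return array_slice($lines, $start, $end - $start + 1);
}

$lines = ['one', 'two', 'three', 'four', 'five', 'six'];
print_r(contextWindow($lines, 4, 2, 2)); // three, four, five, six
```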

#### Case Sensitive Search
By default, searches are **case-insensitive**:
- `CIA` will match "CIA", "cia", "Cia", and also words containing these letters like "spe**cia**l"

Enable **Case Sensitive** checkbox for exact case matching:
- `CIA` will **only** match "CIA" (not "cia" or words like "special")
- `kennedy` will match "kennedy" but not "Kennedy" or "KENNEDY"

**When to use Case Sensitive:**
- Searching for acronyms (CIA, FBI, NSA)
- Searching for proper nouns with specific capitalization
- Avoiding false matches in words (e.g., "CIA" matching "special")
- When case matters for your search terms

#### File Selection
- Select specific files to search (`.txt` and `.srt`)
- "Check/Uncheck All" toggles all files
- Only selected files are searched
- Download any file directly using the ⬇ button next to its name

#### Lazy Loading
- First 100 results load immediately
- Click "Load More Results" for next batch, or scroll near the bottom to auto-load
- Seamless pagination without page reload

#### Downloading Results
- Click **Download Results** to save a formatted `.txt` copy of your search output
- The file includes the query, timestamp, match count, and all matched lines
- Filename format: `YYYY-MM-DD-HHMMSS-searchterms.txt`
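
A filename in this format could be produced roughly as follows. This is a sketch under assumptions: `resultFilename()` is a hypothetical helper, and the way multi-word queries are reduced to a slug (lowercase, hyphen-separated, with a `results` fallback) is a guess, not the documented behavior.

```php
<?php
// Hypothetical helper: build a download filename such as
// 2026-03-01-143022-kennedy.txt from a timestamp and the search query.
function resultFilename(string $query, int $timestamp): string
{
    // Reduce the query to lowercase letters, digits, and hyphens (assumed rule).
    $slug = strtolower(trim($query));
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    $slug = trim($slug, '-');
    if ($slug === '') {
        $slug = 'results'; // assumed fallback for queries with no usable characters
    }
    return date('Y-m-d-His', $timestamp) . '-' . $slug . '.txt';
}

echo resultFilename('kennedy', mktime(14, 30, 22, 3, 1, 2026));
// 2026-03-01-143022-kennedy.txt
```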

#### Printing Results
- Click **Print Results** for a print-optimized view
- UI controls are hidden; only search results are printed

---

## Technical Details

### Search Logic Implementation

The new search engine uses a **precedence-based stack algorithm**:

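```php
// Precedence levels (higher binds tighter); the array name is illustrative
$precedence = [
    'NOT' => 3,                          // highest
    'AND' => 2, 'NAND' => 2,
    'OR'  => 1, 'XOR'  => 1, 'NOR' => 1, // lowest
];
```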

### Example Evaluation

Query: `apple AND orange OR banana`

1. Push `apple` → Stack: `[contains(apple)]`
2. See `AND` → Queue operator
3. Push `orange` → Stack: `[contains(apple), contains(orange)]`
4. Apply `AND` → Stack: `[apple && orange]`
5. See `OR` → Queue operator
6. Push `banana` → Stack: `[(apple && orange), contains(banana)]`
7. Apply `OR` → Stack: `[(apple && orange) || banana]`
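
The walkthrough above can be approximated with a standard two-stack (shunting-yard style) evaluator. The sketch below is an assumption about how such an engine might look, not the project's actual code. `PRECEDENCE`, `applyTop()`, and `matchesQuery()` are hypothetical names. It applies a queued operator when the next operator of equal or lower precedence arrives, treats `NOT` as a binary "but not", and does not handle parentheses, quoted phrases, or malformed queries.

```php
<?php
// Illustrative precedence table; higher numbers bind tighter.
const PRECEDENCE = ['NOT' => 3, 'AND' => 2, 'NAND' => 2, 'OR' => 1, 'XOR' => 1, 'NOR' => 1];

// Pop two operands off the value stack, combine them, push the result.
function applyTop(array &$values, string $op): void
{
    $right = array_pop($values);
    $left  = array_pop($values);
    switch ($op) {
        case 'AND':  $values[] = $left && $right; break;
        case 'NAND': $values[] = !($left && $right); break;
        case 'OR':   $values[] = $left || $right; break;
        case 'XOR':  $values[] = $left !== $right; break;
        case 'NOR':  $values[] = !($left || $right); break;
        case 'NOT':  $values[] = $left && !$right; break; // binary "but not"
    }
}

// Evaluate a query against one line of text.
function matchesQuery(string $query, string $line, bool $caseSensitive = false): bool
{
    $query = trim($query);
    if ($query === '') {
        return false;
    }
    $values    = []; // boolean results of "contains" checks
    $operators = []; // operators waiting for their right-hand operand
    foreach (preg_split('/\s+/', $query) as $token) {
        if (array_key_exists($token, PRECEDENCE)) {
            // Apply queued operators that bind at least as tightly (left-associative).
            while ($operators && PRECEDENCE[end($operators)] >= PRECEDENCE[$token]) {
                applyTop($values, array_pop($operators));
            }
            $operators[] = $token;
        } else {
            $pos = $caseSensitive ? strpos($line, $token) : stripos($line, $token);
            $values[] = $pos !== false;
        }
    }
    while ($operators) {
        applyTop($values, array_pop($operators));
    }
    return (bool) array_pop($values);
}

var_dump(matchesQuery('apple AND orange OR banana', 'a ripe banana'));           // bool(true)
var_dump(matchesQuery('kennedy OR johnson AND assassination', 'Johnson spoke')); // bool(false)
```

Only uppercase tokens are recognized as operators, so a lowercase `and` is searched for as an ordinary term.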

### Case-Sensitive vs Case-Insensitive

**Case-Insensitive (default):** Uses PHP's `stripos()` function
```php
stripos($text, "CIA") !== false
// Matches: "CIA", "cia", "Cia", "special" (contains "cia")
```

**Case-Sensitive (when checkbox enabled):** Uses PHP's `strpos()` function
```php
strpos($text, "CIA") !== false
// Matches: "CIA" only (not "cia" or "special")
```
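
The two modes can be wrapped in a single check. This is a minimal sketch; `lineContains()` is a hypothetical helper name. Both modes match substrings, so case-sensitive `CIA` still matches a longer word that contains the uppercase letters "CIA".

```php
<?php
// Hypothetical wrapper around the two built-in functions shown above.
function lineContains(string $line, string $term, bool $caseSensitive): bool
{
    $pos = $caseSensitive ? strpos($line, $term) : stripos($line, $term);
    return $pos !== false; // position 0 is a valid match, so compare against false
}

var_dump(lineContains('A special report', 'CIA', false)); // bool(true)
var_dump(lineContains('A special report', 'CIA', true));  // bool(false)
var_dump(lineContains('CIA memo', 'CIA', true));          // bool(true)
```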

### Streaming Results

Results stream using PHP's output buffering:

```php
echo "<div class='match'>...</div>";
ob_flush();
flush();
```

This sends each result to the browser as soon as it is produced, while PHP continues processing. Buffering or compression in the web server or a proxy can still delay delivery.
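
A slightly more defensive variant of the same idea is sketched below. `streamResult()` is a hypothetical helper, not a function from the project. It checks `ob_get_level()` first because `ob_flush()` raises a notice when no output buffer is active.

```php
<?php
// Hypothetical helper: print one result and push it toward the client right away.
function streamResult(string $html): void
{
    echo $html;
    if (ob_get_level() > 0) {
        ob_flush(); // pass PHP's current output buffer down one level
    }
    flush();        // ask PHP to send its system-level output to the server/browser
}

foreach (['first match', 'second match'] as $text) {
    streamResult("<div class='match'>" . htmlspecialchars($text) . "</div>\n");
}
```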

### SRT File Parsing

SRT subtitle/transcript files are parsed into searchable entries:

```
Entry format: "$number | $timeframe | $text"
Example:  "42 | 00:03:15,000 --> 00:03:18,500 | Kennedy addressed the nation"
```

Context lines for SRT results show surrounding subtitle entries (not raw line numbers).
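
A parser producing entries in this format might look like the sketch below. It is an illustration under assumptions, not the project's parser: `parseSrt()` is a hypothetical name, entries are assumed to be separated by blank lines, and multi-line subtitle text is joined with spaces.

```php
<?php
// Hypothetical parser: turn raw SRT text into "number | timeframe | text" entries.
function parseSrt(string $content): array
{
    $entries = [];
    // Normalize line endings, then split on blank lines between subtitle blocks.
    $content = str_replace(["\r\n", "\r"], "\n", trim($content));
    foreach (preg_split('/\n{2,}/', $content) as $block) {
        $lines = explode("\n", trim($block));
        if (count($lines) < 3 || strpos($lines[1], '-->') === false) {
            continue; // skip blocks that are not well-formed entries
        }
        $number    = trim($lines[0]);
        $timeframe = trim($lines[1]);
        // Join multi-line subtitle text into a single searchable line.
        $text = implode(' ', array_map('trim', array_slice($lines, 2)));
        $entries[] = "$number | $timeframe | $text";
    }
    return $entries;
}

$srt = "42\n00:03:15,000 --> 00:03:18,500\nKennedy addressed\nthe nation\n";
print_r(parseSrt($srt));
// [0] => 42 | 00:03:15,000 --> 00:03:18,500 | Kennedy addressed the nation
```

Because each entry becomes one string, the same line-based matching and context logic used for `.txt` files can run over the returned array.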

### Lazy Loading

AJAX-based pagination:
- Initial request: `?q=query&page=1`
- Load more: `?q=query&page=2&ajax=1`
- JavaScript appends new results to existing ones
- **Auto-scroll:** page automatically fetches next batch when user scrolls within 800px of the "Load More" button (debounced at 150ms)
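
On the server side, each `page` request only needs a slice of the full match list. The sketch below assumes the matches are already collected in an array. `paginate()` and the `has_more` flag are hypothetical names used for illustration.

```php
<?php
// Hypothetical helper: return one page of matches plus a "has more" flag
// that the client could use to decide whether to show "Load More".
function paginate(array $matches, int $page, int $perPage = 100): array
{
    $page   = max(1, $page); // treat invalid page numbers as page 1
    $offset = ($page - 1) * $perPage;
    return [
        'results'  => array_slice($matches, $offset, $perPage),
        'has_more' => count($matches) > $offset + $perPage,
    ];
}

$matches = range(1, 250);        // stand-in for 250 matched lines
$second  = paginate($matches, 2);
echo count($second['results']);  // 100
var_dump($second['has_more']);   // bool(true)
```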

### URL Parameters

All search parameters are encoded in the URL for shareability:

| Parameter | Description | Default |
|-----------|-------------|---------|
| `?q=` | Search query | — |
| `?b=` | Context lines above match | 2 |
| `?a=` | Context lines below match | 2 |
| `?files[]=` | Selected files (repeatable) | all |
| `?page=` | Pagination page | 1 |
| `?cs=` | Case sensitivity (0 or 1) | 0 |
| `?ajax=` | AJAX/lazy-load request flag | — |
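
Reading these parameters with their defaults could be sketched as follows. `readSearchParams()` is a hypothetical helper. Passing file names through `basename()` is an added precaution assumed here, not a documented behavior of the project.

```php
<?php
// Hypothetical helper: read search parameters from a query array such as $_GET,
// applying the defaults from the table above.
function readSearchParams(array $get): array
{
    return [
        'q'     => trim((string) ($get['q'] ?? '')),
        'b'     => max(0, (int) ($get['b'] ?? 2)),
        'a'     => max(0, (int) ($get['a'] ?? 2)),
        // An empty list means "search all files"; basename() drops any directory part.
        'files' => array_map('basename', (array) ($get['files'] ?? [])),
        'page'  => max(1, (int) ($get['page'] ?? 1)),
        'cs'    => ($get['cs'] ?? '0') === '1',
        'ajax'  => isset($get['ajax']),
    ];
}

// parse_str() stands in for a real request here.
parse_str('q=kennedy&cs=1&files[]=report.txt&page=2', $get);
print_r(readSearchParams($get));
```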

### Configuration Storage

Uses flat JSON files (no database required):

**site_config.json:**
```json
{
    "site_title": "My Search Site",
    "site_description": "Search documents easily",
    "about_content": "Welcome to our search engine...",
    "footer_text": "Copyright 2025",
    "updated_at": "2025-11-17 12:34:56"
}
```

**admin.json:**
```json
{
    "username": "admin",
    "password_hash": "$2y$10$...",
    "created_at": "2025-11-17 12:30:00"
}
```
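
Reading and writing such a file takes only a few lines of PHP. The helpers below are a sketch with hypothetical names (`loadConfig()`, `saveConfig()`). The functions in `config.php` may differ.

```php
<?php
// Hypothetical helpers for reading and writing a flat JSON config file.
function loadConfig(string $path, array $defaults = []): array
{
    if (!is_file($path)) {
        return $defaults;
    }
    $data = json_decode((string) file_get_contents($path), true);
    // Fall back to defaults if the file is empty or not valid JSON.
    return is_array($data) ? array_merge($defaults, $data) : $defaults;
}

function saveConfig(string $path, array $config): bool
{
    $config['updated_at'] = date('Y-m-d H:i:s');
    $json = json_encode($config, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE);
    // LOCK_EX avoids interleaved writes from concurrent requests.
    return file_put_contents($path, $json, LOCK_EX) !== false;
}

$path = sys_get_temp_dir() . '/site_config_example.json';
saveConfig($path, ['site_title' => 'My Search Site']);
echo loadConfig($path)['site_title']; // My Search Site
```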

---

## Security Features

1. **Password Hashing:** Uses PHP's `password_hash()` with bcrypt
2. **Session Management:** Secure session-based authentication
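3. **HTML Escaping:** User-supplied input such as search queries is escaped with `htmlspecialchars()`; About content is the exception, since it is entered by an authenticated admin and rendered as HTML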
4. **XSS Protection:** Prevents cross-site scripting attacks
5. **Setup Protection:** Setup page only accessible if not completed
6. **Admin Protection:** Admin panel requires authentication
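
The hashing and escaping points above rely on standard PHP functions. The following sketch shows them in isolation; `checkLogin()` and `e()` are hypothetical helper names and the password is a placeholder.

```php
<?php
// Hash a password at setup time. PASSWORD_BCRYPT produces "$2y$..." hashes
// like the one shown in admin.json.
$hash = password_hash('correct horse', PASSWORD_BCRYPT);

// Hypothetical login check against the stored hash.
function checkLogin(string $password, string $storedHash): bool
{
    return password_verify($password, $storedHash);
}

// Hypothetical shorthand: escape user-supplied text before echoing it into HTML.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

var_dump(checkLogin('correct horse', $hash)); // bool(true)
var_dump(checkLogin('wrong', $hash));         // bool(false)
echo e('<script>alert(1)</script>');
// &lt;script&gt;alert(1)&lt;/script&gt;
```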

---

## Browser Compatibility

- Chrome/Edge (latest)
- Firefox (latest)
- Safari (latest)
- Mobile browsers (iOS Safari, Chrome Mobile)

---

## Performance

- **Streaming:** Results appear immediately
- **Lazy Loading:** Only 100 results loaded at a time
- **File-based:** No database overhead
- **Efficient:** Minimal memory footprint

---

## Troubleshooting

### Setup Issues

**Problem:** Setup page shows "Setup Already Completed"
- **Solution:** Delete files in `config_data/` directory to reset

**Problem:** Permission denied when creating config files
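- **Solution:** Ensure the web server user can write to the site directory. `chmod 755` grants write access to the owner only, so the directory must be owned by the web server user (`www-data` below is an example; the name varies by system):
  ```bash
  chown www-data /path/to/site
  chmod 755 /path/to/site
  ```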

### Admin Panel Issues

**Problem:** Can't login
- **Solution:** Reset credentials by deleting `config_data/admin.json` and running setup again

**Problem:** Changes not appearing
- **Solution:** Clear browser cache and reload page

### Search Issues

**Problem:** No results found
- **Solution:**
  - Check that `.txt` or `.srt` files exist in the `text/` directory
  - Verify the files are readable by the web server
  - Try simpler search query first

**Problem:** Boolean operators not working
- **Solution:**
  - Operators must be UPPERCASE (AND, OR, NOT)
  - Operators must have spaces around them
  - Example: `term1 AND term2` (correct)
  - Example: `term1AND term2` (incorrect)

**Problem:** Getting false matches (e.g., "CIA" matching "special")
- **Solution:**
  - Enable the "Case Sensitive" checkbox
  - "CIA" will then match only the uppercase letters, so it no longer matches "special"
  - Both modes match substrings anywhere in a word; case-sensitive mode only rules out matches that differ in case

### Performance Issues

**Problem:** Search is slow
- **Solution:**
  - Use file selection to limit search scope
  - Reduce context lines (Above/Below settings)
  - Consider breaking large files into smaller chunks

---

## Migration from JFK Version

If upgrading from the original JFK-specific version:

1. **Backup your data:**
   ```bash
   cp -r text/ text_backup/
   cp -r sum/text/ sum/text_backup/
   ```

2. **Run setup:**
   - Navigate to `setup.php`
   - Configure with your preferred branding

3. **Update content:**
   - Copy your text files back to the `text/` and `sum/text/` directories
   - Your data remains unchanged

---

## Customization Examples

### Example 1: Legal Document Search

**Setup:**
- Title: "Legal Case Search"
- Description: "Search court documents and legal records"
- About:
  ```
  Search thousands of legal documents using boolean operators.

  <strong>Available Collections:</strong>
  - Supreme Court Cases (1950-2025)
  - Federal Court Rulings
  - State Court Documents
  ```

### Example 2: Academic Research

**Setup:**
- Title: "Research Paper Archive"
- Description: "Full-text search of academic papers"
- About:
  ```
  Search academic papers and research documents.

  <strong>Search Tips:</strong>
  Use AND to combine concepts: <em>machine AND learning</em>
  Use OR for alternatives: <em>neural OR network</em>
  Use NOT to exclude: <em>AI NOT robotics</em>
  ```

### Example 3: Corporate Knowledge Base

**Setup:**
- Title: "Company Document Search"
- Description: "Internal document search portal"
- About:
  ```
  <strong>Confidential - Internal Use Only</strong>

  Search company policies, procedures, and documentation.

  For support, contact IT at: <a href="mailto:it@company.com">it@company.com</a>
  ```

---

## Support

For issues or questions:
1. Check the Troubleshooting section above
2. Review the code comments in `config.php`, `index.php`, etc.
3. Ensure all file permissions are correct

---

## License

This software is provided as-is. Modify and use as needed for your purposes.

---

## Changelog

### Version 1.10 (2026-03-01)
- **Added SRT subtitle/transcript file support** with custom parser
- **Added file download buttons** — download individual source files directly from the file list
- **Added search result download** — save formatted results as a `.txt` file with date-stamped filename
- **Added auto-scroll lazy loading** — automatically fetches next result batch when scrolling near the "Load More" button
- Increased lazy-load batch size from 50 to 100 results per page
- **Added Open Graph and SEO meta tags** for social sharing and search engine indexing
- **Added print CSS** for clean, UI-free printouts
- Improved admin panel UI with better section organization and button grouping
- Enhanced live preview in admin panel with real-time HTML rendering

### Version 1.07 (2025-11-17)
- Added customizable branding system
- Implemented flat-file configuration (no database)
- Fixed boolean search logic with proper precedence
- Added case-sensitive search option to prevent false matches
- Added lazy-loading for large result sets
- Implemented streaming results with `ob_flush()`
- Created admin panel with secure authentication
- Added setup wizard for initial configuration
- Improved mobile responsiveness
- Enhanced security with password hashing

### Version 1.06 and earlier
- Original JFK-specific implementation
- Basic boolean search
- File-based document storage
- Simple result display

---

## Credits

- Original concept: JFK Query Gram by Page Telegram Volunteer Services
- Enhanced version: Customizable Query Gram with advanced features
