Initial commit: KavCorp infrastructure documentation

- CLAUDE.md: Project configuration for Claude Code
- docs/: Infrastructure documentation
  - INFRASTRUCTURE.md: Service map, storage, network
  - CONFIGURATIONS.md: Service configs and credentials
  - CHANGELOG.md: Change history
  - DECISIONS.md: Architecture decisions
  - TASKS.md: Task tracking
- scripts/: Automation scripts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 22:07:01 -05:00
commit 120c2ec809
19 changed files with 3448 additions and 0 deletions

CLAUDE.md Normal file

@@ -0,0 +1,239 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Repository Purpose
Infrastructure documentation and management repository for the **KavCorp** Proxmox cluster - a 5-node homelab cluster running self-hosted services. This repository supports migration from Docker containers to Proxmox LXCs where appropriate.
## Cluster Architecture
**Cluster Name**: KavCorp
**Nodes**: 5 (pm1, pm2, pm3, pm4, elantris)
**Network**: 10.4.2.0/24
**Primary Management Node**: pm2 (10.4.2.6)
### Node IP Mapping
- pm1: 10.4.2.2
- pm2: 10.4.2.6 (primary for new LXC deployment)
- pm3: 10.4.2.3
- pm4: 10.4.2.5
- elantris: 10.4.2.14 (largest node, 128GB RAM, ZFS storage)
## Common Commands
### Cluster Management
```bash
# Access cluster (use pm2 as primary management node)
ssh pm2
# View cluster status
pvecm status
pvecm nodes
# List all VMs/LXCs across cluster
pvesh get /cluster/resources --type vm --output-format json
# List all nodes
pvesh get /cluster/resources --type node --output-format json
# List storage
pvesh get /cluster/resources --type storage --output-format json
```
### LXC Management
```bash
# List LXCs on a specific node
pct list
# Get LXC configuration
pvesh get /nodes/<node>/lxc/<vmid>/config
pct config <vmid>
# Start/stop/restart LXC
pct start <vmid>
pct stop <vmid>
pct restart <vmid>
# Execute command in LXC
pct exec <vmid> -- <command>
# Enter LXC console
pct enter <vmid>
# Create LXC from template
pct create <vmid> <template> --hostname <name> --cores <n> --memory <mb> --rootfs <storage>:<size>
```
### Network Configuration
```bash
# View network interfaces
ip -br addr show
# Network config location
/etc/network/interfaces
# Standard bridge: vmbr0 (connected to eno1 physical interface)
# Gateway: 10.4.2.254
```
## Storage Architecture
### Storage Pools
**Local Storage** (per-node):
- `local`: Directory storage on each node, for backups/templates/ISOs (~100GB each)
- `local-lvm`: LVM thin pool on each node, for VM/LXC disks (~350-375GB each)
**ZFS Pools**:
- `el-pool`: ZFS pool on elantris (24TB), used for large data storage
**NFS Mounts** (shared):
- `KavNas`: Primary NFS share from Synology NAS (10.4.2.13), ~23TB - used for backups, ISOs, and LXC storage
- `elantris-downloads`: NFS share from elantris, ~23TB - used for media downloads
### Storage Recommendations
- **New LXC containers**: Use `KavNas` for rootfs (NFS, easily backed up)
- **High-performance workloads**: Use `local-lvm` on the host node
- **Large data storage**: Use `elantris-downloads` or `el-pool`
- **Templates and ISOs**: Store in `KavNas` or node's `local`
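Putting these recommendations together, creating a new container might look roughly like the sketch below (the VMID, template filename, and IP are placeholders, not real allocations):
```bash
# Sketch only: adjust VMID, template, IP, and sizing to the actual deployment
pct create 130 KavNas:vztmpl/debian-12-standard_12.7-1_amd64.tar.zst \
  --hostname new-service \
  --cores 2 --memory 1024 \
  --rootfs KavNas:8 \
  --net0 name=eth0,bridge=vmbr0,ip=10.4.2.100/24,gw=10.4.2.254 \
  --unprivileged 1 --features nesting=1
```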
## Service Categories (by tags)
- **arr**: Media automation (*arr stack - Sonarr, Radarr, Prowlarr, Bazarr, Whisparr)
- **media**: Media servers (Jellyfin, Jellyseerr, Kometa)
- **proxy**: Reverse proxy (Traefik)
- **authenticator**: Authentication (Authelia)
- **nvr**: Network Video Recorder (Shinobi)
- **docker**: Docker host LXCs (docker-pm2, docker-pm4) and VMs (docker-pm3)
- **proxmox-helper-scripts**: Deployed via community scripts
- **community-script**: Deployed via ProxmoxVE Helper Scripts
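Tags are applied per container with `pct set` (sketch; Proxmox separates multiple tags with `;`):
```bash
# Example: tag the Sonarr LXC, then verify
pct set 105 --tags "arr;community-script"
pct config 105 | grep tags
```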
## Migration Strategy
**Goal**: Move services from Docker containers to dedicated LXCs where it makes sense.
**Good candidates for LXC migration**:
- Single-purpose services
- Services with simple dependencies
- Stateless applications
- Services that benefit from isolation
**Keep in Docker**:
- Complex multi-container stacks
- Services requiring Docker-specific features
- Temporary/experimental services
**Current Docker Hosts**:
- VM 109: docker-pm3 (on pm3, 4 CPU, 12GB RAM)
- LXC 110: docker-pm4 (on pm4, 4 CPU, 8GB RAM)
- LXC 113: docker-pm2 (on pm2, 4 CPU, 8GB RAM)
- LXC 107: dockge (on pm3, 12 CPU, 8GB RAM) - Docker management UI
## IP Address Allocation
**Infrastructure Services**:
- 10.4.2.10: traefik (LXC 104)
- 10.4.2.13: KavNas (Synology NAS)
- 10.4.2.14: elantris
**Media Stack**:
- 10.4.2.15: sonarr (LXC 105)
- 10.4.2.16: radarr (LXC 108)
- 10.4.2.17: prowlarr (LXC 114)
- 10.4.2.18: bazarr (LXC 119)
- 10.4.2.19: whisparr (LXC 117)
- 10.4.2.20: jellyseerr (LXC 115)
- 10.4.2.21: kometa (LXC 120)
- 10.4.2.22: jellyfin (LXC 121)
**Other Services**:
- 10.4.2.23: authelia (LXC 116)
- 10.4.2.24: notifiarr (LXC 118)
*Note: Update docs/network.md when allocating new IPs*
## Documentation Structure
**CRITICAL**: Always read `docs/README.md` first to understand the documentation system.
### Core Documentation Files (ALWAYS UPDATE, NEVER CREATE NEW)
1. **`docs/INFRASTRUCTURE.md`** - Single source of truth
- **CHECK THIS FIRST** for node IPs, service locations, storage paths
- Update whenever infrastructure changes
2. **`docs/CONFIGURATIONS.md`** - Service configurations
- API keys, config snippets, copy/paste ready configs
- Update when service configs change
3. **`docs/DECISIONS.md`** - Architecture decisions
- Why we made choices, common patterns, troubleshooting
- Update when making decisions or discovering patterns
4. **`docs/TASKS.md`** - Current work tracking
- Active, pending, blocked, and completed tasks
- Update at start and end of work sessions
5. **`docs/CHANGELOG.md`** - Historical record
- Date-stamped entries for all changes
- Update after completing any significant work
### Documentation Workflow
**MANDATORY - Before ANY work session**:
1. Read `docs/README.md` - Understand the documentation system
2. Check `docs/INFRASTRUCTURE.md` - Get current infrastructure state
3. Check `docs/TASKS.md` - See what's already in progress or pending
**MANDATORY - During work**:
- When you need node IPs, service locations, or paths → Read `docs/INFRASTRUCTURE.md`
- When you need config snippets or API keys → Read `docs/CONFIGURATIONS.md`
- When wondering "why is it done this way?" → Read `docs/DECISIONS.md`
- When you discover a pattern or make a decision → Immediately update `docs/DECISIONS.md`
- When you encounter issues → Check `docs/DECISIONS.md` Known Issues section first
**MANDATORY - After completing ANY work**:
1. Update the relevant core doc:
- Infrastructure change? → Update `docs/INFRASTRUCTURE.md`
- Config change? → Update `docs/CONFIGURATIONS.md`
- New pattern/decision? → Update `docs/DECISIONS.md`
2. Add dated entry to `docs/CHANGELOG.md` describing what changed
3. Update `docs/TASKS.md` to mark work complete or add new tasks
4. Update "Last Updated" date in `docs/README.md`
**STRICTLY FORBIDDEN**:
- Creating new documentation files without explicit user approval
- Leaving documentation outdated after making changes
- Creating session-specific notes files (use CHANGELOG for history)
- Skipping documentation updates "to save time"
- Assuming you remember infrastructure details (always check docs)
### When to Update Which File
| You just did... | Update this file |
|----------------|------------------|
| Added/removed a service | `INFRASTRUCTURE.md` (service map) |
| Changed an IP address | `INFRASTRUCTURE.md` (service map) |
| Modified service config | `CONFIGURATIONS.md` (add/update config snippet) |
| Changed API key | `CONFIGURATIONS.md` (update credentials) |
| Made architectural decision | `DECISIONS.md` (add to decisions section) |
| Discovered troubleshooting pattern | `DECISIONS.md` (add to common patterns) |
| Hit a recurring issue | `DECISIONS.md` (add to known issues) |
| Completed a task | `TASKS.md` (mark complete) + `CHANGELOG.md` (add entry) |
| Started new work | `TASKS.md` (add to in progress) |
| ANY significant change | `CHANGELOG.md` (always add dated entry) |
## Scripts
- `scripts/provisioning/`: LXC/VM creation scripts
- `scripts/backup/`: Backup automation scripts
- `scripts/monitoring/`: Monitoring and health check scripts
## Workflow Notes
- New LXCs are primarily deployed on **pm2**
- Use ProxmoxVE Helper Scripts (https://helper-scripts.com) for common services
- Always tag LXCs appropriately for organization
- Document service URLs and access details in `docs/services.md`
- Keep inventory documentation in sync with changes

README.md Normal file

@@ -0,0 +1,116 @@
# KavCorp Proxmox Infrastructure
Documentation and management repository for the KavCorp Proxmox cluster.
## Quick Start
```bash
# Connect to primary management node
ssh pm2
# View cluster status
pvecm status
# List all containers
pvesh get /cluster/resources --type vm --output-format json
```
## Repository Structure
```
proxmox-infra/
├── CLAUDE.md                # Development guidance for Claude Code
├── README.md                # This file
├── docs/                    # Documentation
│   ├── cluster-state.md     # Current cluster topology
│   ├── inventory.md         # VM/LXC inventory with specs
│   ├── network.md           # Network topology and IP assignments
│   ├── storage.md           # Storage layout and usage
│   └── services.md          # Service mappings and dependencies
└── scripts/                 # Management scripts
    ├── backup/              # Backup automation
    ├── provisioning/        # LXC/VM creation scripts
    └── monitoring/          # Health checks and monitoring
```
## Cluster Overview
- **Cluster Name**: KavCorp
- **Nodes**: 5 (pm1, pm2, pm3, pm4, elantris)
- **Total VMs**: 2
- **Total LXCs**: 19
- **Primary Network**: 10.4.2.0/24
- **Management Node**: pm2 (10.4.2.6)
### Nodes
| Node | IP | CPU | RAM | Role |
|---|---|---|---|---|
| pm1 | 10.4.2.2 | 4 cores | 16GB | General purpose |
| pm2 | 10.4.2.6 | 12 cores | 31GB | **Primary management, media stack** |
| pm3 | 10.4.2.3 | 16 cores | 33GB | Docker, NVR, gaming |
| pm4 | 10.4.2.5 | 12 cores | 31GB | Docker, NVR |
| elantris | 10.4.2.14 | 16 cores | 128GB | **Storage node, media server** |
## Key Services
- **Traefik** (10.4.2.10): Reverse proxy
- **Jellyfin** (10.4.2.22): Media server - **Recently added to Traefik**
- **Media Automation**: Sonarr, Radarr, Prowlarr, Bazarr, Whisparr (on pm2)
- **Home Assistant** (VMID 100): Home automation
- **Frigate** (VMID 111): NVR with object detection
## Recent Changes
**2025-11-16**:
- ✅ Created initial repository structure and documentation
- ✅ Documented 5-node cluster configuration
- ✅ Added Jellyfin to Traefik configuration (jellyfin.kavcorp.com)
- ✅ Inventoried 21 containers (2 VMs, 19 LXCs)
## Documentation
See the `docs/` directory for detailed information:
- [Cluster State](docs/cluster-state.md) - Node details and health
- [Inventory](docs/inventory.md) - Complete VM/LXC listing
- [Network](docs/network.md) - IP allocations and network topology
- [Storage](docs/storage.md) - Storage pools and usage
- [Services](docs/services.md) - Service mappings and access URLs
## Common Tasks
### Managing LXCs
```bash
# Start/stop/restart
pct start <vmid>
pct stop <vmid>
pct restart <vmid>
# View config
pct config <vmid>
# Execute command
pct exec <vmid> -- <command>
```
### Checking Resources
```bash
# Cluster-wide resources
pvesh get /cluster/resources --output-format json
# Storage usage
pvesh get /cluster/resources --type storage --output-format json
```
## Access
- **Web UI**: https://pm2.kavcorp.com:8006 (or any node)
- **Traefik Dashboard**: https://traefik.kavcorp.com
- **Jellyfin**: https://jellyfin.kavcorp.com
## Notes
- This is a migration project from a messy `~/infrastructure` repo
- Goal: Move services from Docker to LXCs where appropriate
- Primary new LXC deployment node: **pm2**
- Most services use community helper scripts from https://helper-scripts.com

docs/CHANGELOG.md Normal file

@@ -0,0 +1,153 @@
# Changelog
> **Purpose**: Historical record of all significant infrastructure changes
## 2025-12-07
### Service Additions
- **Vaultwarden**: Created new password manager LXC
- LXC 125 on pm4
- IP: 10.4.2.212
- Domain: vtw.kavcorp.com
- Traefik config: `/etc/traefik/conf.d/vaultwarden.yaml`
- Tagged: community-script, password-manager
- **Immich**: Migrated from Docker (dockge LXC 107 on pm3) to native LXC
- LXC 126 on pm4
- IP: 10.4.2.24:2283
- Domain: immich.kavcorp.com
- Traefik config: `/etc/traefik/conf.d/immich.yaml`
- Library storage: NFS mount from elantris (`/el-pool/downloads/immich/`)
- 38GB photo library transferred via rsync
- Fresh database (version incompatibility: old v1.129.0 → new v2.3.1)
- Services: immich-web.service, immich-ml.service
- Tagged: community-script, photos
### Infrastructure Maintenance
- **Traefik (LXC 104)**: Fixed disk full issue
- Truncated 895MB access log that filled 2GB rootfs
- Added logrotate config to prevent recurrence (50MB max, 7 day rotation)
- Cleaned apt cache and journal logs
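  - A logrotate rule matching that description would look roughly like this (sketch; the actual file and log path on LXC 104 may differ):
```
# /etc/logrotate.d/traefik (illustrative)
/var/log/traefik/*.log {
    size 50M
    rotate 7
    missingok
    notifempty
    compress
    copytruncate
}
```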
## 2025-11-20
### Service Changes
- **AMP**: Added to Traefik reverse proxy
- LXC 124 on elantris (10.4.2.26:8080)
- Domain: amp.kavcorp.com
- Traefik config: `/etc/traefik/conf.d/amp.yaml`
- Purpose: Game server management via CubeCoders AMP
## 2025-11-19
### Service Changes
- **LXC 123 (elantris)**: Migrated from Ollama to llama.cpp
- Removed Ollama installation and service
- Built llama.cpp from source with CURL support
- Downloaded TinyLlama 1.1B Q4_K_M model (~667MB)
- Created systemd service for llama.cpp server
- Server running on port 11434 (OpenAI-compatible API)
- Model path: `/opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf`
- Service: `llama-cpp.service`
- Domain remains: ollama.kavcorp.com (pointing to llama.cpp now)
- **LXC 124 (elantris)**: Created new AMP (Application Management Panel) container
- IP: 10.4.2.26
- Resources: 4 CPU cores, 4GB RAM, 16GB storage
- Storage: local-lvm on elantris
- OS: Ubuntu 24.04 LTS
- Purpose: Game server management via CubeCoders AMP
- Tagged: gaming, amp
## 2025-11-17
### Service Additions
- **Ollama**: Added to Traefik reverse proxy
- LXC 123 on elantris
- IP: 10.4.2.224:11434
- Domain: ollama.kavcorp.com
- Traefik config: `/etc/traefik/conf.d/ollama.yaml`
- Downloaded Qwen 3 Coder 30B model
- **Frigate**: Added to Traefik reverse proxy
- LXC 111 on pm3
- IP: 10.4.2.215:5000
- Domain: frigate.kavcorp.com
- Traefik config: `/etc/traefik/conf.d/frigate.yaml`
- **Foundry VTT**: Added to Traefik reverse proxy
- LXC 112 on pm3
- IP: 10.4.2.37:30000
- Domain: vtt.kavcorp.com
- Traefik config: `/etc/traefik/conf.d/foundry.yaml`
### Infrastructure Changes
- **SSH Access**: Regenerated SSH keys on pm2 and distributed to all cluster nodes
- pm3 SSH service was down, enabled and configured
- All nodes (pm1, pm2, pm3, pm4, elantris) now accessible from pm2 via Proxmox web UI
### Service Configuration
- **NZBGet**: Fixed file permissions
- Set `UMask=0000` in nzbget.conf to create files with 777 permissions
- Fixed permission issues causing Sonarr import failures
- **Sonarr**: Enabled automatic permission setting
- Media Management → Set Permissions → chmod 777
- Ensures imported files are accessible by Jellyfin
- **Jellyseerr**: Fixed Traefik routing
- Corrected IP from 10.4.2.20 to 10.4.2.18 in media-services.yaml
- **Jellyfin**: Fixed LXC mount issues
- Restarted LXC 121 to activate media mounts
- Media now visible in `/media/tv`, `/media/movies`, `/media/anime`
### Documentation
- **Major Reorganization**: Consolidated scattered docs into structured system
- Created `README.md` - Documentation index and guide
- Created `INFRASTRUCTURE.md` - All infrastructure details
- Created `CONFIGURATIONS.md` - Service configurations
- Created `DECISIONS.md` - Architecture decisions and patterns
- Created `TASKS.md` - Current and pending tasks
- Created `CHANGELOG.md` - This file
- Updated `CLAUDE.md` - Added documentation policy
## 2025-11-16
### Service Deployments
- **Home Assistant**: Added to Traefik reverse proxy
- Domain: hass.kavcorp.com
- Configured trusted proxies in Home Assistant
- **Frigate**: Added to Traefik reverse proxy
- Domain: frigate.kavcorp.com
- **Proxmox**: Added to Traefik reverse proxy
- Domain: pm.kavcorp.com
- Backend: pm2 (10.4.2.6:8006)
- **Recyclarr**: Configured TRaSH Guides automation
- Sonarr and Radarr quality profiles synced
- Dolby Vision blocking implemented
- Daily sync schedule via cron
### Configuration Changes
- **Traefik**: Removed Authelia from *arr services
- Services now use only built-in authentication
- Simplified access for Sonarr, Radarr, Prowlarr, Bazarr, Whisparr, NZBGet
### Issues Encountered
- Media organization script moved files incorrectly
- Sonarr database corruption (lost TV series tracking)
- Permission issues with NZBGet downloads
- Jellyfin LXC mount not active after deployment
### Lessons Learned
- Always verify file permissions (777 required for NFS media)
- Backup service databases before running automation scripts
- LXC mounts may need container restart to activate
- Traefik auto-reloads configs, no restart needed
## Earlier History
*To be documented from previous sessions if needed*

docs/CONFIGURATIONS.md Normal file

@@ -0,0 +1,312 @@
# Configuration Reference
> **Purpose**: Detailed configuration for all services - copy/paste ready configs and settings
> **Update Frequency**: When service configurations change
## Traefik
### SSL/TLS with Let's Encrypt
**Location**: LXC 104 on pm2
**Environment Variables** (`/etc/systemd/system/traefik.service.d/override.conf`):
```bash
NAMECHEAP_API_USER=kavren
NAMECHEAP_API_KEY=8156f3d9ef664c91b95f029dfbb62ad5
NAMECHEAP_PROPAGATION_TIMEOUT=3600
NAMECHEAP_POLLING_INTERVAL=30
NAMECHEAP_TTL=300
```
**Main Config** (`/etc/traefik/traefik.yaml`):
```yaml
certificatesResolvers:
  letsencrypt:
    acme:
      email: cory.bailey87@gmail.com
      storage: /etc/traefik/ssl/acme.json
      dnsChallenge:
        provider: namecheap
        resolvers:
          - "1.1.1.1:53"
          - "8.8.8.8:53"
```
### Service Routing Examples
**Home Assistant** (`/etc/traefik/conf.d/home-automation.yaml`):
```yaml
http:
  routers:
    homeassistant:
      rule: "Host(`hass.kavcorp.com`)"
      entryPoints:
        - websecure
      service: homeassistant
      tls:
        certResolver: letsencrypt
  services:
    homeassistant:
      loadBalancer:
        servers:
          - url: "http://10.4.2.62:8123"
```
**Ollama** (`/etc/traefik/conf.d/ollama.yaml`):
```yaml
http:
  routers:
    ollama:
      rule: "Host(`ollama.kavcorp.com`)"
      entryPoints:
        - websecure
      service: ollama
      tls:
        certResolver: letsencrypt
  services:
    ollama:
      loadBalancer:
        servers:
          - url: "http://10.4.2.224:11434"
```
**Frigate** (`/etc/traefik/conf.d/frigate.yaml`):
```yaml
http:
  routers:
    frigate:
      rule: "Host(`frigate.kavcorp.com`)"
      entryPoints:
        - websecure
      service: frigate
      tls:
        certResolver: letsencrypt
  services:
    frigate:
      loadBalancer:
        servers:
          - url: "http://10.4.2.215:5000"
```
**Foundry VTT** (`/etc/traefik/conf.d/foundry.yaml`):
```yaml
http:
  routers:
    foundry:
      rule: "Host(`vtt.kavcorp.com`)"
      entryPoints:
        - websecure
      service: foundry
      tls:
        certResolver: letsencrypt
  services:
    foundry:
      loadBalancer:
        servers:
          - url: "http://10.4.2.37:30000"
```
**Proxmox** (`/etc/traefik/conf.d/proxmox.yaml`):
```yaml
http:
  routers:
    proxmox:
      rule: "Host(`pm.kavcorp.com`)"
      entryPoints:
        - websecure
      service: proxmox
      tls:
        certResolver: letsencrypt
  services:
    proxmox:
      loadBalancer:
        servers:
          - url: "https://10.4.2.6:8006"
        serversTransport: proxmox-transport
  serversTransports:
    proxmox-transport:
      insecureSkipVerify: true
```
## AMP (Application Management Panel)
**Location**: LXC 124 on elantris
**IP**: 10.4.2.26:8080
**Domain**: amp.kavcorp.com
**Traefik Config** (`/etc/traefik/conf.d/amp.yaml`):
```yaml
http:
  routers:
    amp:
      rule: "Host(`amp.kavcorp.com`)"
      entryPoints:
        - websecure
      service: amp
      tls:
        certResolver: letsencrypt
  services:
    amp:
      loadBalancer:
        servers:
          - url: "http://10.4.2.26:8080"
```
## Home Assistant
**Location**: VM 100 on pm1
**IP**: 10.4.2.62:8123
**Reverse Proxy Config** (`/config/configuration.yaml`):
```yaml
http:
  use_x_forwarded_for: true
  trusted_proxies:
    - 10.4.2.10       # Traefik IP
    - 172.30.0.0/16   # Home Assistant internal network (for add-ons)
```
## Sonarr
**Location**: LXC 105 on pm2
**IP**: 10.4.2.15:8989
**API Key**: b331fe18ec2144148a41645d9ce8b249
**Media Management Settings**:
- Permissions: Enabled, chmod 777
- Hardlinks: Enabled
- Episode title required: Always
- Free space check: 100MB minimum
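A quick way to confirm the API key and service health (Sonarr v3 API; substitute the key above):
```bash
curl -s -H "X-Api-Key: <api-key-above>" http://10.4.2.15:8989/api/v3/system/status
```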
## Radarr
**Location**: LXC 108
**IP**: 10.4.2.16:7878
**API Key**: 5e6796988abf4d6d819a2b506a44f422
## NZBGet
**Location**: Docker on kavnas (10.4.2.13)
**Port**: 6789
**Web User**: kavren
**Web Password**: fre8ub2ax8
**Key Settings** (`/volume1/docker/nzbget/config/nzbget.conf`):
```ini
MainDir=/config
DestDir=/downloads/completed
InterDir=/downloads/intermediate
UMask=0000 # Creates files with 777 permissions
```
**Docker Mounts**:
- Config: `/volume1/docker/nzbget/config:/config`
- Downloads: `/volume1/Media/downloads:/downloads`
## Recyclarr
**Location**: LXC 122 on pm2
**IP**: 10.4.2.25
**Binary**: `/usr/local/bin/recyclarr`
**Config**: `/root/.config/recyclarr/recyclarr.yml`
**Sync Schedule**: Daily at 3 AM via cron
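The daily sync is typically a root crontab entry along these lines (sketch; the actual entry may differ):
```bash
# crontab -e (root): run Recyclarr sync daily at 3 AM
0 3 * * * /usr/local/bin/recyclarr sync
```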
**Configured Profiles**:
- **Radarr**: HD Bluray + WEB (1080p), Remux-1080p - Anime
- **Sonarr**: WEB-1080p, Remux-1080p - Anime
- **Custom Formats**: TRaSH Guides synced (Dolby Vision blocked, release group tiers)
## Jellyfin
**Location**: LXC 121 on elantris
**IP**: 10.4.2.21:8096
**Media Mounts** (inside LXC):
- `/media/tv` → `/el-pool/media/tv`
- `/media/anime` → `/el-pool/media/anime`
- `/media/movies` → `/el-pool/media/movies`
**Permissions**: Files must be 777 for Jellyfin user (UID 100107 in LXC) to access
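The mounts above are bind mounts from the elantris host into the LXC; recreating one would look roughly like this (mount-point indices are hypothetical, check `pct config 121` for the real ones):
```bash
pct set 121 -mp0 /el-pool/media/tv,mp=/media/tv
pct set 121 -mp1 /el-pool/media/movies,mp=/media/movies
pct set 121 -mp2 /el-pool/media/anime,mp=/media/anime
```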
## Vaultwarden
**Location**: LXC 125 on pm4
**IP**: 10.4.2.212:80
**Domain**: vtw.kavcorp.com
**Traefik Config** (`/etc/traefik/conf.d/vaultwarden.yaml`):
```yaml
http:
  routers:
    vaultwarden:
      rule: "Host(`vtw.kavcorp.com`)"
      entryPoints:
        - websecure
      service: vaultwarden
      tls:
        certResolver: letsencrypt
  services:
    vaultwarden:
      loadBalancer:
        servers:
          - url: "http://10.4.2.212:80"
```
## Immich
**Location**: LXC 126 on pm4
**IP**: 10.4.2.24:2283
**Domain**: immich.kavcorp.com
**Config** (`/opt/immich/.env`):
```bash
TZ=America/Indiana/Indianapolis
IMMICH_VERSION=release
NODE_ENV=production
DB_HOSTNAME=127.0.0.1
DB_USERNAME=immich
DB_PASSWORD=AulF5JhgWXrRxtaV05
DB_DATABASE_NAME=immich
DB_VECTOR_EXTENSION=pgvector
REDIS_HOSTNAME=127.0.0.1
IMMICH_MACHINE_LEARNING_URL=http://127.0.0.1:3003
MACHINE_LEARNING_CACHE_FOLDER=/opt/immich/cache
IMMICH_MEDIA_LOCATION=/mnt/immich-library
```
**NFS Mount** (configured via `pct set 126 -mp0`):
- Host path: `/mnt/pve/elantris-downloads/immich`
- Container path: `/mnt/immich-library`
- Source: elantris (`/el-pool/downloads/immich/`)
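Recreating the mount point from the values above would look like this (sketch, run on pm4):
```bash
pct set 126 -mp0 /mnt/pve/elantris-downloads/immich,mp=/mnt/immich-library
```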
**Systemd Services**:
- `immich-web.service` - Web UI and API
- `immich-ml.service` - Machine learning service
**Traefik Config** (`/etc/traefik/conf.d/immich.yaml`):
```yaml
http:
  routers:
    immich:
      rule: "Host(`immich.kavcorp.com`)"
      entryPoints:
        - websecure
      service: immich
      tls:
        certResolver: letsencrypt
  services:
    immich:
      loadBalancer:
        servers:
          - url: "http://10.4.2.24:2283"
```

docs/DECISIONS.md Normal file

@@ -0,0 +1,163 @@
# Architecture Decisions & Patterns
> **Purpose**: Record of important decisions, patterns, and "why we do it this way"
> **Update Frequency**: When making significant architectural choices
## Service Organization
### Authentication Strategy
**Decision**: Services use their own built-in authentication, not Authelia
**Reason**: Most *arr services and media tools have robust auth systems
**Exception**: Consider Authelia for future services that lack authentication
### LXC vs Docker
**Keep in Docker**:
- NZBGet (requires specific volume mapping, works well in Docker)
- Multi-container stacks
- Services requiring Docker-specific features
**Migrate to LXC**:
- Single-purpose services (Sonarr, Radarr, etc.)
- Services benefiting from isolation
- Stateless applications
## File Permissions
### Media Files
**Standard**: All media files and folders must be 777
**Reason**:
- NFS mounts between multiple systems with different UID mappings
- Jellyfin runs in LXC with UID namespace mapping (100107)
- Sonarr runs in LXC with different UID mapping
- NZBGet runs in Docker with UID 1000
**Implementation**:
- NZBGet: `UMask=0000` to create files with 777
- Sonarr: Media management → Set permissions → chmod 777
- Manual fixes: `chmod -R 777` on media directories as needed
## Network Architecture
### Reverse Proxy
**Decision**: Single Traefik instance handles all external access
**Location**: LXC 104 on pm2
**Benefits**:
- Single point for SSL/TLS management
- Automatic Let's Encrypt certificate renewal
- Centralized routing configuration
- DNS-01 challenge for wildcard certificates
### Service Domains
**Pattern**: `<service>.kavcorp.com`
**DNS**: All subdomains point to public IP (99.74.188.161)
**Routing**: Traefik inspects Host header and routes internally
## Storage Architecture
### Media Storage
**Decision**: NFS mount from elantris for all media
**Path**: `/mnt/pve/elantris-media` → elantris `/el-pool/media`
**Reason**:
- Centralized storage
- Accessible from all cluster nodes
- Large capacity (24TB ZFS pool)
- Easy to backup/snapshot
### LXC Root Filesystems
**Decision**: Store on KavNas NFS for most services
**Reason**:
- Easy backups
- Portable between nodes
- Network storage sufficient for most workloads
**Exception**: High I/O services use local-lvm
## Monitoring & Maintenance
### Configuration Management
**Decision**: Manual configuration with documentation
**Reason**: Small scale doesn't justify Ansible/Terraform complexity
**Trade-off**: Requires disciplined documentation updates
### Backup Strategy
**Decision**: Proxmox built-in backup to KavNas
**Frequency**: [To be determined]
**Retention**: [To be determined]
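A one-off manual backup with the built-in tooling would look roughly like this (illustrative VMID; scheduled jobs belong under Datacenter → Backup once frequency and retention are decided):
```bash
vzdump 125 --storage KavNas --mode snapshot --compress zstd
```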
## Common Patterns
### Adding a New Service Behind Traefik
1. Deploy service with static IP in 10.4.2.0/24 range
2. Create Traefik config in `/etc/traefik/conf.d/<service>.yaml`
3. Use pattern:
```yaml
http:
  routers:
    <service>:
      rule: "Host(`<service>.kavcorp.com`)"
      entryPoints: [websecure]
      service: <service>
      tls:
        certResolver: letsencrypt
  services:
    <service>:
      loadBalancer:
        servers:
          - url: "http://<ip>:<port>"
```
4. Traefik auto-reloads (no restart needed)
5. Update `docs/INFRASTRUCTURE.md` with service details
### Troubleshooting Permission Issues
1. Check file ownership: `ls -la /path/to/file`
2. Check if 777: `stat /path/to/file`
3. Fix permissions: `chmod -R 777 /path/to/directory`
4. For NZBGet: Verify `UMask=0000` in nzbget.conf
5. For Sonarr/Radarr: Check Settings → Media Management → Set Permissions
### Node SSH Access
**From local machine**:
- User: `kavren`
- Key: `~/.ssh/id_ed25519`
**Between cluster nodes**:
- User: `root`
- Each node has other nodes' keys in `/root/.ssh/authorized_keys`
- Proxmox web UI uses node SSH for shell access
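Regenerating and distributing a node key follows the usual pattern (illustrative; run on pm2 as root and repeat `ssh-copy-id` per node):
```bash
ssh-keygen -t ed25519 -f /root/.ssh/id_ed25519 -N ""
ssh-copy-id -i /root/.ssh/id_ed25519.pub root@10.4.2.3   # pm3
```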
## Known Issues & Workarounds
### Jellyfin Not Seeing Media After Import
**Symptom**: Files imported to `/media/tv` but Jellyfin shows empty
**Cause**: Jellyfin LXC mount not active or permissions wrong
**Fix**:
1. Restart Jellyfin LXC: `pct stop 121 && pct start 121`
2. Verify mount inside LXC: `pct exec 121 -- ls -la /media/tv/`
3. Fix permissions if needed: `chmod -R 777 /mnt/pve/elantris-media/tv/`
### Sonarr/Radarr Import Failures
**Symptom**: "Access denied" errors in logs
**Cause**: Permission mismatch between download client and *arr service
**Fix**: Ensure download folder has 777 permissions
## Future Considerations
- [ ] Automated backup strategy
- [ ] Monitoring/alerting system (Prometheus + Grafana?)
- [ ] Consider Authelia for future services without built-in auth
- [ ] Document disaster recovery procedures
- [ ] Consider consolidating Docker hosts

docs/INFRASTRUCTURE.md Normal file

@@ -0,0 +1,120 @@
# Infrastructure Reference
> **Purpose**: Single source of truth for all infrastructure details - nodes, IPs, services, storage, network
> **Update Frequency**: Immediately when infrastructure changes
## Proxmox Cluster Nodes
| Hostname | IP Address | Role | Resources |
|----------|-------------|------|-----------|
| pm1 | 10.4.2.2 | Proxmox cluster node | - |
| pm2 | 10.4.2.6 | Proxmox cluster node (primary management) | - |
| pm3 | 10.4.2.3 | Proxmox cluster node | - |
| pm4 | 10.4.2.5 | Proxmox cluster node | - |
| elantris | 10.4.2.14 | Proxmox cluster node (Debian-based) | 128GB RAM, ZFS storage (24TB) |
**Cluster Name**: KavCorp
**Network**: 10.4.2.0/24
**Gateway**: 10.4.2.254
## Service Map
| Service | IP:Port | Location | Domain | Auth |
|---------|---------|----------|--------|------|
| **Proxmox Web UI** | 10.4.2.6:8006 | pm2 | pm.kavcorp.com | Proxmox built-in |
| **Traefik** | 10.4.2.10 | LXC 104 (pm2) | - | None (reverse proxy) |
| **Authelia** | 10.4.2.19 | LXC 116 (pm2) | auth.kavcorp.com | SSO provider |
| **Sonarr** | 10.4.2.15:8989 | LXC 105 (pm2) | sonarr.kavcorp.com | Built-in |
| **Radarr** | 10.4.2.16:7878 | LXC 108 (pm2) | radarr.kavcorp.com | Built-in |
| **Prowlarr** | 10.4.2.17:9696 | LXC 114 (pm2) | prowlarr.kavcorp.com | Built-in |
| **Jellyseerr** | 10.4.2.18:5055 | LXC 115 (pm2) | jellyseerr.kavcorp.com | Built-in |
| **Whisparr** | 10.4.2.20:6969 | LXC 117 (pm2) | whisparr.kavcorp.com | Built-in |
| **Notifiarr** | 10.4.2.21 | LXC 118 (pm2) | - | API key |
| **Jellyfin** | 10.4.2.21:8096 | LXC 121 (elantris) | jellyfin.kavcorp.com | Built-in |
| **Bazarr** | 10.4.2.22:6767 | LXC 119 (pm2) | bazarr.kavcorp.com | Built-in |
| **Kometa** | 10.4.2.23 | LXC 120 (pm2) | - | N/A |
| **Recyclarr** | 10.4.2.25 | LXC 122 (pm2) | - | CLI only |
| **NZBGet** | 10.4.2.13:6789 | Docker (kavnas) | nzbget.kavcorp.com | Built-in |
| **Home Assistant** | 10.4.2.62:8123 | VM 100 (pm1) | hass.kavcorp.com | Built-in |
| **Frigate** | 10.4.2.215:5000 | LXC 111 (pm3) | frigate.kavcorp.com | Built-in |
| **Foundry VTT** | 10.4.2.37:30000 | LXC 112 (pm3) | vtt.kavcorp.com | Built-in |
| **llama.cpp** | 10.4.2.224:11434 | LXC 123 (elantris) | ollama.kavcorp.com | None (API) |
| **AMP** | 10.4.2.26:8080 | LXC 124 (elantris) | amp.kavcorp.com | Built-in |
| **Vaultwarden** | 10.4.2.212 | LXC 125 (pm4) | vtw.kavcorp.com | Built-in |
| **Immich** | 10.4.2.24:2283 | LXC 126 (pm4) | immich.kavcorp.com | Built-in |
| **KavNas** | 10.4.2.13 | Synology NAS | - | NAS auth |
## Storage Architecture
### NFS Mounts (Shared)
| Mount Name | Source | Mount Point | Size | Usage |
|------------|--------|-------------|------|-------|
| elantris-media | elantris:/el-pool/media | /mnt/pve/elantris-media | ~24TB | Media files (movies, TV, anime) |
| KavNas | kavnas (10.4.2.13):/volume1 | /mnt/pve/KavNas | ~23TB | Backups, ISOs, LXC storage, downloads |
### Local Storage (Per-Node)
| Storage | Type | Size | Usage |
|---------|------|------|-------|
| local | Directory | ~100GB | Backups, templates, ISOs |
| local-lvm | LVM thin pool | ~350-375GB | VM/LXC disks |
### ZFS Pools
| Pool | Location | Size | Usage |
|------|----------|------|-------|
| el-pool | elantris | 24TB | Large data storage |
### Media Folders
| Path | Type | Permissions | Notes |
|------|------|-------------|-------|
| /mnt/pve/elantris-media/movies | NFS | 777 | Movie library |
| /mnt/pve/elantris-media/tv | NFS | 777 | TV show library |
| /mnt/pve/elantris-media/anime | NFS | 777 | Anime library |
| /mnt/pve/elantris-media/processing | NFS | 777 | Processing/cleanup folder |
| /mnt/pve/KavNas/downloads | NFS | 777 | Download client output |
## Network Configuration
### DNS & Domains
**Domain**: kavcorp.com
**DNS Provider**: Namecheap
**Public IP**: 99.74.188.161
All `*.kavcorp.com` subdomains route through Traefik reverse proxy (10.4.2.10) for SSL termination and routing.
### Standard Bridge
**Bridge**: vmbr0
**Physical Interface**: eno1
**CIDR**: 10.4.2.0/24
**Gateway**: 10.4.2.254
## Access & Credentials
### SSH Access
- **User**: kavren (from local machine)
- **User**: root (between cluster nodes)
- **Key Type**: ed25519
- **Node-to-Node**: Passwordless SSH configured for cluster operations
### Important Paths
**Traefik (LXC 104)**:
- Config: `/etc/traefik/traefik.yaml`
- Service configs: `/etc/traefik/conf.d/*.yaml`
- SSL certs: `/etc/traefik/ssl/acme.json`
- Service file: `/etc/systemd/system/traefik.service.d/override.conf`
**Media Services**:
- Sonarr config: `/var/lib/sonarr/`
- Radarr config: `/var/lib/radarr/`
- Recyclarr config: `/root/.config/recyclarr/recyclarr.yml`
**NZBGet (Docker on kavnas)**:
- Config: `/volume1/docker/nzbget/config/nzbget.conf`
- Downloads: `/volume1/Media/downloads/`

docs/README.md Normal file

@@ -0,0 +1,145 @@
# Documentation Index
> **Last Updated**: 2025-11-17 (Added Frigate and Foundry VTT to Traefik)
> **IMPORTANT**: Update this index whenever you modify documentation files
## Quick Reference
Need to know... | Check this file
--- | ---
Node IPs, service locations, storage paths | `INFRASTRUCTURE.md`
Service configs, API keys, copy/paste configs | `CONFIGURATIONS.md`
Why we made a decision, common patterns | `DECISIONS.md`
What's currently being worked on | `TASKS.md`
Recent changes and when they happened | `CHANGELOG.md`
## Core Documentation Files
### INFRASTRUCTURE.md
**Purpose**: Single source of truth for all infrastructure
**Contains**:
- Cluster node IPs and specs
- Complete service map with IPs, ports, domains
- Storage architecture (NFS mounts, local storage, ZFS)
- Network configuration
- Important file paths
**Update when**: Infrastructure changes (new service, IP change, storage mount)
---
### CONFIGURATIONS.md
**Purpose**: Detailed service configurations
**Contains**:
- Traefik SSL/TLS setup
- Service routing examples
- API keys and credentials
- Copy/paste ready config snippets
- Service-specific settings
**Update when**: Service configuration changes, API keys rotate, new services added
---
### DECISIONS.md
**Purpose**: Architecture decisions and patterns
**Contains**:
- Why we chose LXC vs Docker for services
- Authentication strategy
- File permission standards (777 for media)
- Common troubleshooting patterns
- Known issues and workarounds
**Update when**: Making architectural decisions, discovering new patterns, solving recurring issues
---
### TASKS.md
**Purpose**: Track ongoing work and TODO items
**Contains**:
- Active tasks being worked on
- Pending tasks
- Blocked items
- Task priority
**Update when**: Starting new work, completing tasks, discovering new work
---
### CHANGELOG.md
**Purpose**: Historical record of changes
**Contains**:
- Date-stamped entries for all significant changes
- Who made the change (user/Claude)
- What was changed and why
- Links to relevant commits or files
**Update when**: After completing any significant work
---
## Legacy Files (To Be Removed)
These files will be consolidated into the core docs above:
- ~~`infrastructure-map.md`~~ → Merged into `INFRASTRUCTURE.md`
- ~~`home-assistant-traefik.md`~~ → Merged into `CONFIGURATIONS.md`
- ~~`traefik-ssl-setup.md`~~ → Merged into `CONFIGURATIONS.md`
- ~~`recyclarr-setup.md`~~ → Merged into `CONFIGURATIONS.md`
Keep for reference (detailed info):
- `cluster-state.md` - Detailed cluster topology
- `inventory.md` - Complete VM/LXC inventory
- `network.md` - Detailed network info
- `storage.md` - Detailed storage info
- `services.md` - Service dependencies and details
## Documentation Workflow
### When Making Changes
1. **Before starting**: Check `INFRASTRUCTURE.md` for current state
2. **During work**: Note what you're changing
3. **After completing**:
- Update relevant core doc (`INFRASTRUCTURE.md`, `CONFIGURATIONS.md`, or `DECISIONS.md`)
- Add entry to `CHANGELOG.md` with date and description
- Update `TASKS.md` to mark work complete
- Update `README.md` (this file) Last Updated date
### Example Workflow
```
Task: Add new service "Tautulli" to monitor Jellyfin
1. Check INFRASTRUCTURE.md → Find next available IP
2. Deploy service
3. Update INFRASTRUCTURE.md → Add Tautulli to service map
4. Update CONFIGURATIONS.md → Add Tautulli config snippet
5. Update CHANGELOG.md → "2025-11-17: Added Tautulli LXC..."
6. Update TASKS.md → Mark "Deploy Tautulli" as complete
7. Update README.md → Change Last Updated date
```
## File Organization
```
docs/
├── README.md ← You are here (index and guide)
├── INFRASTRUCTURE.md ← Infrastructure reference
├── CONFIGURATIONS.md ← Service configurations
├── DECISIONS.md ← Architecture decisions
├── TASKS.md ← Current/ongoing tasks
├── CHANGELOG.md ← Historical changes
├── cluster-state.md ← [Keep] Detailed topology
├── inventory.md ← [Keep] Full VM/LXC list
├── network.md ← [Keep] Network details
├── storage.md ← [Keep] Storage details
└── services.md ← [Keep] Service details
```
## Maintenance
- Review and update docs weekly
- Clean up completed tasks monthly
- Archive old changelog entries yearly
- Verify INFRASTRUCTURE.md matches reality regularly

docs/TASKS.md Normal file

@@ -0,0 +1,37 @@
# Current Tasks
> **Last Updated**: 2025-11-17
## In Progress
None currently.
## Pending
### Media Organization
- [ ] Verify Jellyfin can see all imported media
- [ ] Clean up `.processing-loose-episodes` folder
- [ ] Review and potentially restore TV shows from processing folder
### Configuration
- [ ] Consider custom format to prefer English audio releases
- [ ] Review Sonarr language profiles for non-English releases
### Infrastructure
- [ ] Define backup strategy and schedule
- [ ] Set up monitoring/alerting system
- [ ] Document disaster recovery procedures
## Completed (Recent)
- [x] Fixed SSH access between cluster nodes (pm2 can access all nodes)
- [x] Fixed NZBGet permissions (UMask=0000 for 777 files)
- [x] Fixed Sonarr permissions (chmod 777 on imports)
- [x] Fixed Jellyfin LXC mounts (restarted LXC)
- [x] Fixed Jellyseerr IP in Traefik config
- [x] Consolidated documentation structure
- [x] Created documentation index
## Blocked
None currently.

docs/cluster-state.md Normal file

@@ -0,0 +1,115 @@
# KavCorp Proxmox Cluster State
**Last Updated**: 2025-11-16
## Cluster Overview
- **Cluster Name**: KavCorp
- **Config Version**: 6
- **Transport**: knet
- **Quorum Status**: Quorate (5/5 nodes online)
- **Total Nodes**: 5
- **Total VMs**: 2
- **Total LXCs**: 19
## Node Details
### pm1 (10.4.2.2)
- **CPU**: 4 cores
- **Memory**: 16GB (15.4 GiB)
- **Storage**: ~100GB local
- **Uptime**: ~52 hours
- **Status**: Online
- **Running Containers**:
- VMID 100: haos12.1 (VM - Home Assistant OS)
- VMID 101: twingate (LXC)
- VMID 102: zwave-js-ui (LXC)
### pm2 (10.4.2.6) - Primary Management Node
- **CPU**: 12 cores
- **Memory**: 31GB (29.3 GiB)
- **Storage**: ~100GB local
- **Uptime**: ~52 hours
- **Status**: Online
- **Running Containers**:
- VMID 104: traefik (LXC - Reverse Proxy)
- VMID 105: sonarr (LXC)
- VMID 108: radarr (LXC)
- VMID 113: docker-pm2 (LXC - Docker host)
- VMID 114: prowlarr (LXC)
- VMID 115: jellyseerr (LXC)
- VMID 116: authelia (LXC)
- VMID 117: whisparr (LXC)
- VMID 118: notifiarr (LXC)
- VMID 119: bazarr (LXC)
- VMID 120: kometa (LXC)
### pm3 (10.4.2.3)
- **CPU**: 16 cores
- **Memory**: 33GB (30.7 GiB)
- **Storage**: ~100GB local
- **Uptime**: ~319 hours (~13 days)
- **Status**: Online
- **Running Containers**:
- VMID 106: mqtt (LXC)
- VMID 107: dockge (LXC - Docker management UI, 12 CPU, 8GB RAM)
- VMID 109: docker-pm3 (VM - Docker host, 4 CPU, 12GB RAM)
- VMID 111: frigate (LXC - NVR)
- VMID 112: foundryvtt (LXC - Virtual tabletop)
### pm4 (10.4.2.5)
- **CPU**: 12 cores
- **Memory**: 31GB (29.3 GiB)
- **Storage**: ~100GB local
- **Uptime**: ~52 hours
- **Status**: Online
- **Running Containers**:
- VMID 103: shinobi (LXC - NVR)
- VMID 110: docker-pm4 (LXC - Docker host)
### elantris (10.4.2.14) - Storage Node
- **CPU**: 16 cores
- **Memory**: 128GB (125.7 GiB) - **Largest node**
- **Storage**: ~100GB local + 24TB ZFS pool (el-pool)
- **Uptime**: ~26 minutes (recently rebooted)
- **Status**: Online
- **Running Containers**:
- VMID 121: jellyfin (LXC - Media server)
## Cluster Health
- **Quorum**: Yes (3/5 required, 5/5 available)
- **Expected Votes**: 5
- **Total Votes**: 5
- **All Nodes**: Online and healthy
## Network Architecture
- **Primary Network**: 10.4.2.0/24
- **Gateway**: 10.4.2.254
- **Bridge**: vmbr0 (on all nodes, bridged to eno1)
- **DNS**: Managed by gateway/router
## Storage Summary
### Shared Storage
- **KavNas** (NFS): 23TB total, ~9.2TB used - Primary shared storage from Synology DS918+
- **elantris-downloads** (NFS): 23TB total, ~10.6TB used - Download storage from elantris
### Node-Local Storage
Each node has:
- **local**: ~100GB directory storage (backups, templates, ISOs)
- **local-lvm**: ~350-375GB LVM thin pool (VM/LXC disks)
### ZFS Storage
- **el-pool** (elantris only): 24TB ZFS pool, ~13.8TB used
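Pool usage can be checked on elantris directly (sketch):
```bash
zpool list el-pool
zfs list -o name,used,avail -r el-pool
```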
## Migration Status
Currently migrating services from Docker containers to dedicated LXCs. Most media stack services (Sonarr, Radarr, etc.) have been successfully migrated to LXCs on pm2.
**Active Docker Hosts**:
- docker-pm2 (LXC 113): Currently empty/minimal usage
- docker-pm3 (VM 109): Active, running containerized services
- docker-pm4 (LXC 110): Active
- dockge (LXC 107): Docker management UI with web interface


@@ -0,0 +1,304 @@
# Home Assistant + Traefik Configuration
**Last Updated**: 2025-11-16
## Overview
Home Assistant is configured to work behind Traefik as a reverse proxy, accessible via `https://hass.kavcorp.com`.
## Configuration Details
### Home Assistant
- **VMID**: 100
- **Node**: pm1
- **Type**: QEMU VM (Home Assistant OS)
- **Internal IP**: 10.4.2.62
- **Internal Port**: 8123
- **External URL**: https://hass.kavcorp.com
### Traefik Configuration
**Location**: `/etc/traefik/conf.d/home-automation.yaml` (inside Traefik LXC 104)
```yaml
http:
  routers:
    homeassistant:
      rule: "Host(`hass.kavcorp.com`)"
      entryPoints:
        - websecure
      service: homeassistant
      tls:
        certResolver: letsencrypt
      # Home Assistant has its own auth
  services:
    homeassistant:
      loadBalancer:
        servers:
          - url: "http://10.4.2.62:8123"
```
### Home Assistant Configuration
**File**: `/config/configuration.yaml` (inside Home Assistant VM)
Add or merge the following section:
```yaml
http:
  use_x_forwarded_for: true
  trusted_proxies:
    - 10.4.2.10       # Traefik IP
    - 172.30.0.0/16   # Home Assistant internal network (for add-ons)
```
#### Configuration Explanation:
- **`use_x_forwarded_for: true`**: Enables Home Assistant to read the real client IP from the `X-Forwarded-For` header that Traefik adds. This is important for:
- Accurate logging of client IPs
- IP-based authentication and blocking
- Geolocation features
- **`trusted_proxies`**: Whitelist of proxy IPs that Home Assistant will trust
- `10.4.2.10` - Traefik reverse proxy
- `172.30.0.0/16` - Home Assistant's internal Docker network (needed for add-ons to communicate)
## Setup Steps
### Method 1: Web UI (Recommended)
1. **Install File Editor Add-on** (if not already installed):
- Go to **Settings** → **Add-ons** → **Add-on Store**
- Search for "File editor"
- Click **Install**
2. **Edit Configuration**:
- Open the **File editor** add-on
- Navigate to `/config/configuration.yaml`
- Add the `http:` section shown above
- If an `http:` section already exists, merge the settings
- Save the file
3. **Check Configuration**:
- Go to **Developer Tools** → **YAML**
- Click **Check Configuration**
- Fix any errors if shown
4. **Restart Home Assistant**:
- Go to **Settings** → **System** → **Restart**
- Wait for Home Assistant to come back online
### Method 2: Terminal & SSH Add-on
If you have the **Terminal & SSH** add-on installed:
```bash
# Edit the configuration
nano /config/configuration.yaml
# Add the http section shown above
# Save with Ctrl+X, Y, Enter
# Check configuration
ha core check
# Restart Home Assistant
ha core restart
```
### Method 3: SSH to VM (Advanced)
If you have SSH access to the Home Assistant VM:
```bash
# SSH to pm1 first, then to the VM
ssh pm1
ssh root@10.4.2.62
# Edit configuration
vi /config/configuration.yaml
# Restart Home Assistant
ha core restart
```
## Verification
After configuration and restart:
1. **Test Internal Access**:
```bash
curl -I http://10.4.2.62:8123
```
Should return `HTTP/1.1 200 OK` or `405 Method Not Allowed`
2. **Test Traefik Proxy**:
```bash
curl -I https://hass.kavcorp.com
```
Should return `HTTP/2 200` with valid SSL certificate
3. **Check Logs**:
- In Home Assistant: **Settings** → **System** → **Logs**
- Look for any errors related to HTTP or trusted proxies
- Client IPs should now show actual client IPs, not Traefik's IP
4. **Verify Headers**:
- Open browser developer tools (F12)
- Go to **Network** tab
- Access `https://hass.kavcorp.com`
- Check response headers for `X-Forwarded-For`, `X-Forwarded-Proto`, etc.
## Troubleshooting
### 400 Bad Request / Untrusted Proxy
**Symptom**: Home Assistant returns 400 errors when accessing via Traefik
**Solution**: Verify the `trusted_proxies` configuration includes Traefik's IP (`10.4.2.10`)
```yaml
http:
  trusted_proxies:
    - 10.4.2.10
```
### Wrong Client IP in Logs
**Symptom**: All requests show Traefik's IP (10.4.2.10) instead of real client IP
**Solution**: Enable `use_x_forwarded_for`:
```yaml
http:
  use_x_forwarded_for: true
```
### Configuration Check Fails
**Symptom**: YAML validation fails with syntax errors
**Solution**:
- Ensure proper indentation (2 spaces per level, no tabs)
- Check for special characters that need quoting
- Use `ha core check` to see detailed error messages
### Cannot Access via Domain
**Symptom**: `https://hass.kavcorp.com` doesn't work but direct IP does
**Solution**:
1. Check Traefik logs:
```bash
ssh pm2 "pct exec 104 -- tail -f /var/log/traefik/traefik.log"
```
2. Verify DNS resolves correctly:
```bash
nslookup hass.kavcorp.com
```
3. Check Traefik config was loaded:
```bash
ssh pm2 "pct exec 104 -- cat /etc/traefik/conf.d/home-automation.yaml"
```
### SSL Certificate Issues
**Symptom**: Browser shows SSL certificate errors
**Solution**:
1. Check if Let's Encrypt certificate was generated:
```bash
ssh pm2 "pct exec 104 -- cat /etc/traefik/ssl/acme.json | grep hass"
```
2. Allow time for DNS propagation (up to 1 hour with Namecheap)
3. Check Traefik logs for ACME errors
## Security Considerations
### Authentication
- Home Assistant has its own authentication system
- No Authelia middleware is applied to this route
- Users must log in to Home Assistant directly
- Consider enabling **Multi-Factor Authentication** in Home Assistant:
- **Settings** → **People** → Your User → **Enable MFA**
### Trusted Networks
If you want to bypass authentication for local network access, add to `configuration.yaml`:
```yaml
homeassistant:
  auth_providers:
    - type: trusted_networks
      trusted_networks:
        - 10.4.2.0/24  # Local network
      allow_bypass_login: true
    - type: homeassistant
```
**Warning**: Only use this if your local network is secure!
### IP Banning
Home Assistant can automatically ban IPs after failed login attempts. Ensure `use_x_forwarded_for` is enabled so it bans the actual attacker's IP, not Traefik's IP.
## Related Services
### Frigate Integration
If Frigate is integrated with Home Assistant:
- Frigate is accessible via `https://frigate.kavcorp.com` (see separate Frigate documentation)
- Home Assistant can embed Frigate camera streams
- Both services trust Traefik as reverse proxy
### Add-ons and Internal Communication
Home Assistant add-ons communicate via the internal Docker network (`172.30.0.0/16`). This network must be in `trusted_proxies` for add-ons to work correctly when accessing the Home Assistant API.
## Updating Configuration
When making changes to Home Assistant configuration:
1. **Always check configuration** before restarting:
```bash
ha core check
```
2. **Back up configuration** before major changes:
- **Settings** → **System** → **Backups** → **Create Backup**
3. **Test changes** in a development environment if possible
4. **Monitor logs** after restarting for errors
## DNS Configuration
Ensure your DNS provider (Namecheap) has the correct A record:
```
hass.kavcorp.com → Your public IP (99.74.188.161)
```
Or use a CNAME if you have a wildcard:
```
*.kavcorp.com → Your public IP
```
Traefik handles the Let's Encrypt DNS-01 challenge automatically.
## Additional Resources
- [Home Assistant Reverse Proxy Documentation](https://www.home-assistant.io/integrations/http/#reverse-proxies)
- [Traefik Documentation](https://doc.traefik.io/traefik/)
- [TRaSH Guides - Traefik Setup](https://trash-guides.info/Hardlinks/Examples/)
## Change Log
**2025-11-16**:
- Initial configuration created
- Added Home Assistant to Traefik
- Configured trusted proxies
- Set up `hass.kavcorp.com` domain


@@ -0,0 +1,44 @@
# Infrastructure Map
## Proxmox Cluster Nodes
| Hostname | IP Address | Role |
|----------|-------------|------|
| pm1 | 10.4.2.2 | Proxmox cluster node |
| pm2 | 10.4.2.6 | Proxmox cluster node |
| pm3 | 10.4.2.3 | Proxmox cluster node |
| pm4 | 10.4.2.5 | Proxmox cluster node |
| elantris | 10.4.2.14 | Proxmox cluster node (Debian-based) |
## Key Services
| Service | IP:Port | Location | Notes |
|---------|---------|----------|-------|
| Sonarr | 10.4.2.15:8989 | LXC 105 on pm2 | TV shows |
| Radarr | 10.4.2.16:7878 | - | Movies |
| Prowlarr | 10.4.2.17:9696 | - | Indexer manager |
| Bazarr | 10.4.2.18:6767 | - | Subtitles |
| Whisparr | 10.4.2.19:6969 | - | Adult content |
| Jellyseerr | 10.4.2.20:5055 | LXC 115 on pm2 | Request management |
| Jellyfin | 10.4.2.21:8096 | LXC 121 on elantris | Media server |
| NZBGet | 10.4.2.13:6789 | Docker on kavnas | Download client |
| Traefik | 10.4.2.10 | LXC 104 on pm2 | Reverse proxy |
| Home Assistant | 10.4.2.62:8123 | VM 100 on pm1 | Home automation |
| Frigate | 10.4.2.63:5000 | - | NVR/Camera system |
## Storage
| Mount | Path | Notes |
|-------|------|-------|
| elantris-media | /mnt/pve/elantris-media | NFS from elantris:/el-pool/media |
| KavNas | /mnt/pve/KavNas | NFS from kavnas:/volume1 |
## Domain Mappings
All services accessible via `*.kavcorp.com` through Traefik reverse proxy:
- pm.kavcorp.com → pm2 (10.4.2.6:8006)
- sonarr.kavcorp.com → 10.4.2.15:8989
- radarr.kavcorp.com → 10.4.2.16:7878
- jellyfin.kavcorp.com → 10.4.2.21:8096
- hass.kavcorp.com → 10.4.2.62:8123
- etc.

docs/inventory.md Normal file

@@ -0,0 +1,289 @@
# VM and LXC Inventory
**Last Updated**: 2025-11-16
## Virtual Machines
### VMID 100 - haos12.1 (Home Assistant OS)
- **Node**: pm1
- **Type**: QEMU VM
- **CPU**: 2 cores
- **Memory**: 4GB
- **Disk**: 32GB
- **Status**: Running
- **Uptime**: ~52 hours
- **Tags**: proxmox-helper-scripts
- **Purpose**: Home automation platform
### VMID 109 - docker-pm3
- **Node**: pm3
- **Type**: QEMU VM
- **CPU**: 4 cores
- **Memory**: 12GB
- **Disk**: 100GB
- **Status**: Running
- **Uptime**: ~190 hours (~8 days)
- **Purpose**: Docker host for containerized services
- **Notes**: Primary Docker host, high network traffic
## LXC Containers
### Infrastructure Services
#### VMID 104 - traefik
- **Node**: pm2
- **IP**: 10.4.2.10
- **CPU**: 2 cores
- **Memory**: 2GB
- **Disk**: 10GB (KavNas)
- **Status**: Running
- **Tags**: community-script, proxy
- **Purpose**: Reverse proxy and load balancer
- **Features**: Unprivileged, nesting enabled
- **Uptime**: ~2.5 hours
#### VMID 106 - mqtt
- **Node**: pm3
- **CPU**: 1 core
- **Memory**: 512MB
- **Disk**: 2GB (local-lvm)
- **Status**: Running
- **Tags**: proxmox-helper-scripts
- **Purpose**: MQTT message broker for IoT devices
- **Uptime**: ~319 hours (~13 days)
- **Notes**: High inbound network traffic (3.4GB)
#### VMID 116 - authelia
- **Node**: pm2
- **IP**: 10.4.2.23
- **CPU**: 1 core
- **Memory**: 512MB
- **Disk**: 2GB (KavNas)
- **Status**: Running
- **Tags**: authenticator, community-script
- **Purpose**: Authentication and authorization server
- **Features**: Unprivileged, nesting enabled
- **Uptime**: ~1.9 hours
### Media Stack (*arr services)
#### VMID 105 - sonarr
- **Node**: pm2
- **IP**: 10.4.2.15
- **CPU**: 2 cores
- **Memory**: 1GB
- **Disk**: 4GB (KavNas)
- **Mount Points**:
- /media → elantris-media (NFS)
- /mnt/kavnas → KavNas (NFS)
- **Status**: Running
- **Tags**: arr, community-script
- **Features**: Unprivileged, nesting enabled
- **Uptime**: ~56 minutes
#### VMID 108 - radarr
- **Node**: pm2
- **IP**: 10.4.2.16
- **CPU**: 2 cores
- **Memory**: 1GB
- **Disk**: 4GB (KavNas)
- **Mount Points**:
- /media → elantris-media (NFS)
- /mnt/kavnas → KavNas (NFS)
- **Status**: Running
- **Tags**: arr, community-script
- **Features**: Unprivileged, nesting enabled
- **Uptime**: ~56 minutes
#### VMID 114 - prowlarr
- **Node**: pm2
- **IP**: 10.4.2.17
- **CPU**: 2 cores
- **Memory**: 1GB
- **Disk**: 4GB (KavNas)
- **Status**: Running
- **Tags**: arr, community-script
- **Features**: Unprivileged, nesting enabled
- **Purpose**: Indexer manager for *arr services
- **Uptime**: ~56 minutes
#### VMID 117 - whisparr
- **Node**: pm2
- **IP**: 10.4.2.19
- **CPU**: 2 cores
- **Memory**: 1GB
- **Disk**: 4GB (KavNas)
- **Status**: Running
- **Tags**: arr, community-script
- **Features**: Unprivileged, nesting enabled
- **Uptime**: ~56 minutes
#### VMID 119 - bazarr
- **Node**: pm2
- **IP**: 10.4.2.18
- **CPU**: 2 cores
- **Memory**: 1GB
- **Disk**: 4GB (KavNas)
- **Status**: Running
- **Tags**: arr, community-script
- **Features**: Unprivileged, nesting enabled
- **Purpose**: Subtitle management for Sonarr/Radarr
- **Uptime**: ~56 minutes
### Media Servers
#### VMID 115 - jellyseerr
- **Node**: pm2
- **IP**: 10.4.2.20
- **CPU**: 4 cores
- **Memory**: 4GB
- **Disk**: 8GB (KavNas)
- **Status**: Running
- **Tags**: community-script, media
- **Features**: Unprivileged, nesting enabled
- **Purpose**: Request management for Jellyfin
- **Uptime**: ~56 minutes
#### VMID 120 - kometa
- **Node**: pm2
- **IP**: 10.4.2.21
- **CPU**: 2 cores
- **Memory**: 4GB
- **Disk**: 8GB (KavNas)
- **Status**: Running
- **Tags**: community-script, media, streaming
- **Features**: Unprivileged, nesting enabled
- **Purpose**: Media library metadata manager
- **Uptime**: ~1.9 hours
#### VMID 121 - jellyfin
- **Node**: elantris
- **IP**: 10.4.2.22
- **CPU**: 2 cores
- **Memory**: 2GB
- **Disk**: 16GB (el-pool)
- **Status**: Running
- **Tags**: community-script, media
- **Features**: Unprivileged, nesting enabled
- **Purpose**: Media server
- **Uptime**: ~19 minutes
- **Notes**: Recently migrated to elantris
#### VMID 118 - notifiarr
- **Node**: pm2
- **IP**: 10.4.2.24
- **CPU**: 1 core
- **Memory**: 512MB
- **Disk**: 2GB (KavNas)
- **Status**: Running
- **Tags**: arr, community-script
- **Features**: Unprivileged, nesting enabled
- **Purpose**: Notification service for *arr apps
- **Uptime**: ~1.9 hours
### Docker Hosts
#### VMID 107 - dockge
- **Node**: pm3
- **CPU**: 12 cores
- **Memory**: 8GB
- **Disk**: 120GB (local-lvm)
- **Status**: Running
- **Tags**: proxmox-helper-scripts
- **Features**: Unprivileged, nesting enabled
- **Purpose**: Docker Compose management UI
- **Uptime**: ~319 hours (~13 days)
#### VMID 110 - docker-pm4
- **Node**: pm4
- **CPU**: 4 cores
- **Memory**: 8GB
- **Disk**: 10GB (local-lvm)
- **Status**: Running
- **Tags**: community-script, docker
- **Features**: Unprivileged, nesting enabled
- **Purpose**: Docker host
- **Uptime**: ~45 hours
#### VMID 113 - docker-pm2
- **Node**: pm2
- **CPU**: 4 cores
- **Memory**: 8GB
- **Disk**: 10GB (local-lvm)
- **Status**: Running
- **Tags**: community-script, docker
- **Features**: Unprivileged, nesting enabled
- **Purpose**: Docker host
- **Uptime**: ~45 hours
- **Notes**: Currently empty/minimal usage
### Smart Home & IoT
#### VMID 101 - twingate
- **Node**: pm1
- **CPU**: 1 core
- **Memory**: 512MB
- **Disk**: 8GB (local-lvm)
- **Status**: Running
- **Features**: Unprivileged
- **Purpose**: Zero-trust network access
- **Uptime**: ~52 hours
#### VMID 102 - zwave-js-ui
- **Node**: pm1
- **CPU**: 2 cores
- **Memory**: 1GB
- **Disk**: 4GB (local-lvm)
- **Status**: Running
- **Tags**: proxmox-helper-scripts
- **Features**: Unprivileged, nesting enabled
- **Purpose**: Z-Wave device management
- **Uptime**: ~52 hours
### Surveillance & NVR
#### VMID 103 - shinobi
- **Node**: pm4
- **CPU**: 2 cores
- **Memory**: 2GB
- **Disk**: 8GB (local-lvm)
- **Status**: Running
- **Tags**: community-script, nvr
- **Features**: Unprivileged, nesting enabled
- **Purpose**: Network Video Recorder
- **Uptime**: ~52 hours
- **Notes**: Very high network traffic (407GB in, 162GB out)
#### VMID 111 - frigate
- **Node**: pm3
- **CPU**: 4 cores
- **Memory**: 8GB
- **Disk**: 120GB (local-lvm)
- **Status**: Running
- **Tags**: proxmox-helper-scripts
- **Features**: Unprivileged, nesting enabled
- **Purpose**: NVR with object detection
- **Uptime**: ~18 hours
- **Notes**: High storage and network usage
### Gaming
#### VMID 112 - foundryvtt
- **Node**: pm3
- **CPU**: 4 cores
- **Memory**: 6GB
- **Disk**: 100GB (local-lvm)
- **Status**: Running
- **Features**: Unprivileged, nesting enabled
- **Purpose**: Virtual tabletop gaming platform
- **Uptime**: ~116 hours (~5 days)
## Summary Statistics
- **Total Guests**: 21 (2 VMs + 19 LXCs)
- **All Running**: Yes
- **Total CPU Allocation**: 62 cores
- **Total Memory Allocation**: 63.5GB
- **Primary Storage**: KavNas (NFS) for most LXCs
- **Most Active Node**: pm2 (11 containers)
- **Newest Deployments**: Media stack on pm2 (mostly < 2 hours uptime)
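These allocation totals can be re-derived at any time from the cluster API. A minimal sketch, assuming `jq` is available on the machine running the command (it is not part of a stock Proxmox install):
```bash
# Summarize running guests: count, allocated cores, allocated memory (GiB).
# jq runs locally here; adjust if it is installed on pm2 instead.
ssh pm2 "pvesh get /cluster/resources --type vm --output-format json" | \
  jq '[.[] | select(.status=="running")]
      | {guests: length,
         cores: (map(.maxcpu) | add),
         memory_gib: ((map(.maxmem) | add) / 1073741824 | round)}'
```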

docs/network.md Normal file

@@ -0,0 +1,132 @@
# Network Architecture
**Last Updated**: 2025-11-16
## Network Overview
- **Primary Network**: 10.4.2.0/24
- **Gateway**: 10.4.2.254
- **Bridge**: vmbr0 (standard on all nodes)
## Node Network Configuration
All Proxmox nodes use a similar network configuration:
- **Physical Interface**: eno1 (1Gbps Ethernet)
- **Bridge**: vmbr0 (Linux bridge)
- **Bridge Config**: STP off, forward delay 0
### Example Configuration (pm2)
```
auto vmbr0
iface vmbr0 inet static
address 10.4.2.6/24
gateway 10.4.2.254
bridge-ports eno1
bridge-stp off
bridge-fd 0
```
## IP Address Allocation
### Infrastructure Devices
| IP | Device | Type | Notes |
|---|---|---|---|
| 10.4.2.2 | pm1 | Proxmox Node | 4 cores, 16GB RAM |
| 10.4.2.3 | pm3 | Proxmox Node | 16 cores, 33GB RAM |
| 10.4.2.5 | pm4 | Proxmox Node | 12 cores, 31GB RAM |
| 10.4.2.6 | pm2 | Proxmox Node | 12 cores, 31GB RAM (primary mgmt) |
| 10.4.2.13 | KavNas | Synology DS918+ | Primary NFS storage |
| 10.4.2.14 | elantris | Proxmox Node | 16 cores, 128GB RAM, Storage node |
| 10.4.2.254 | Gateway | Router | Network gateway |
### Service IPs (LXC/VM)
#### Reverse Proxy & Auth
| IP | Service | VMID | Node | Purpose |
|---|---|---|---|---|
| 10.4.2.10 | traefik | 104 | pm2 | Reverse proxy |
| 10.4.2.23 | authelia | 116 | pm2 | Authentication |
#### Media Automation Stack
| IP | Service | VMID | Node | Purpose |
|---|---|---|---|---|
| 10.4.2.15 | sonarr | 105 | pm2 | TV show management |
| 10.4.2.16 | radarr | 108 | pm2 | Movie management |
| 10.4.2.17 | prowlarr | 114 | pm2 | Indexer manager |
| 10.4.2.18 | bazarr | 119 | pm2 | Subtitle management |
| 10.4.2.19 | whisparr | 117 | pm2 | Adult content management |
| 10.4.2.24 | notifiarr | 118 | pm2 | Notification service |
#### Media Servers
| IP | Service | VMID | Node | Purpose |
|---|---|---|---|---|
| 10.4.2.20 | jellyseerr | 115 | pm2 | Request management |
| 10.4.2.21 | kometa | 120 | pm2 | Metadata manager |
| 10.4.2.22 | jellyfin | 121 | elantris | Media server |
### Dynamic/DHCP Services
The following services currently use DHCP or don't have static IPs documented:
- VMID 100: haos12.1 (Home Assistant)
- VMID 101: twingate
- VMID 102: zwave-js-ui
- VMID 103: shinobi
- VMID 106: mqtt
- VMID 107: dockge
- VMID 109: docker-pm3
- VMID 110: docker-pm4
- VMID 111: frigate
- VMID 112: foundryvtt
- VMID 113: docker-pm2
## Reserved IP Ranges
**Recommendation**: Reserve IP ranges for different service types (a `pct set` sketch for pinning a guest into its range follows this list):
- `10.4.2.1-10.4.2.20`: Infrastructure and core services
- `10.4.2.21-10.4.2.50`: Media services
- `10.4.2.51-10.4.2.100`: Home automation and IoT
- `10.4.2.101-10.4.2.150`: General applications
- `10.4.2.151-10.4.2.200`: Testing and development
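A sketch for pinning one of the DHCP guests listed above into its recommended range; the VMID, address, and interface settings are illustrative only, and `pct set` must run on the node that hosts the container:
```bash
# Example: give docker-pm2 (VMID 113, hosted on pm2) a static address in the
# "general applications" range; the values below are placeholders.
ssh pm2 "pct set 113 -net0 name=eth0,bridge=vmbr0,ip=10.4.2.110/24,gw=10.4.2.254"
ssh pm2 "pct reboot 113"   # restart so the container picks up the new address
```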
## NFS Mounts
### KavNas (10.4.2.13)
- **Source**: Synology DS918+ NAS
- **Mount**: Available on all Proxmox nodes
- **Capacity**: 23TB total
- **Usage**: ~9.2TB used
- **Purpose**: Primary shared storage for LXC rootfs, backups, ISOs, templates
- **Mount Point on Nodes**: `/mnt/pve/KavNas`
### elantris-downloads (10.4.2.14)
- **Source**: elantris node
- **Mount**: Available on all Proxmox nodes
- **Capacity**: 23TB total
- **Usage**: ~10.6TB used
- **Purpose**: Download storage, media staging
- **Mount Point on Nodes**: `/mnt/pve/elantris-downloads`
### elantris-media
- **Source**: elantris node
- **Mount**: Used by media services
- **Purpose**: Media library storage
- **Mounted in LXCs**: sonarr, radarr (mounted at `/media`)
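To confirm these shares are mounted and healthy, run the following on any Proxmox node:
```bash
# Storage status as Proxmox sees it
pvesm status | grep -E 'KavNas|elantris'
# Underlying NFS mounts and free space
df -h /mnt/pve/KavNas /mnt/pve/elantris-downloads
```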
## Firewall Notes
*TODO: Document firewall rules and port forwarding as configured*
## VLAN Configuration
Currently using a flat network (no VLANs configured). Consider implementing VLANs for:
- Management network (Proxmox nodes)
- Service network (LXC/VM services)
- IoT network (smart home devices)
- Storage network (NFS traffic)
## Future Network Improvements
- [ ] Implement VLANs for network segmentation
- [ ] Document all static IP assignments
- [ ] Set up monitoring for network traffic
- [ ] Consider 10GbE for storage traffic between nodes
- [ ] Implement proper DNS (currently using gateway)

docs/recyclarr-setup.md Normal file

@@ -0,0 +1,178 @@
# Recyclarr Setup - TRaSH Guides Automation
**Last Updated**: 2025-11-16
## Overview
Recyclarr automatically syncs TRaSH Guides recommended custom formats and quality profiles to Radarr and Sonarr.
## Installation Details
- **LXC**: VMID 122 on pm2
- **IP Address**: 10.4.2.25
- **Binary**: `/usr/local/bin/recyclarr`
- **Config**: `/root/.config/recyclarr/recyclarr.yml`
## Configuration Summary
### Radarr (Movies)
- **URL**: http://10.4.2.16:7878
- **API Key**: 5e6796988abf4d6d819a2b506a44f422
- **Quality Profiles**:
- HD Bluray + WEB (1080p standard)
- Remux-1080p - Anime
- **Custom Formats**: 34 formats synced
- **Dolby Vision**: **BLOCKED** (DV w/o HDR fallback scored at -10000)
**Key Settings**:
- Standard profile prefers 1080p Bluray and WEB releases
- Anime profile includes Remux with merged quality groups
- Blocks Dolby Vision Profile 5 (no HDR fallback) on standard profile
- Blocks unwanted formats (BR-DISK, LQ, x265 HD, 3D, AV1, Extras)
- Uses TRaSH Guides release group tiers (BD, WEB, Anime BD, Anime WEB)
### Sonarr (TV Shows)
- **URL**: http://10.4.2.15:8989
- **API Key**: b331fe18ec2144148a41645d9ce8b249
- **Quality Profiles**:
- WEB-1080p (standard)
- Remux-1080p - Anime
- **Custom Formats**: 29 formats synced
- **Dolby Vision**: **BLOCKED** (DV w/o HDR fallback scored at -10000)
**Key Settings**:
- Standard profile prefers 1080p WEB releases (WEB-DL and WEBRip)
- Anime profile includes Bluray Remux with merged quality groups
- Blocks Dolby Vision Profile 5 (no HDR fallback) on standard profile
- Blocks unwanted formats (BR-DISK, LQ, x265 HD, AV1, Extras)
- Uses TRaSH Guides WEB release group tiers and Anime tiers
## Automated Sync Schedule
Recyclarr runs daily at 6:00 AM via cron:
```bash
0 6 * * * /usr/local/bin/recyclarr sync > /dev/null 2>&1
```
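To confirm the schedule is actually installed inside the LXC (assuming the job lives in root's crontab, as set up above):
```bash
ssh pm2 "pct exec 122 -- crontab -l | grep recyclarr"
```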
## Manual Sync
To manually trigger a sync:
```bash
ssh pm2 "pct exec 122 -- /usr/local/bin/recyclarr sync"
```
## Dolby Vision Blocking
Both Radarr and Sonarr are configured to **completely block** Dolby Vision releases without HDR10 fallback (Profile 5). These releases will receive a score of **-10000**, ensuring they are never downloaded.
**What this blocks**:
- WEB-DL releases with Dolby Vision Profile 5 (no HDR10 fallback)
- Any release that only plays back in DV without falling back to HDR10
**What this allows**:
- HDR10 releases
- HDR10+ releases
- Dolby Vision Profile 7 with HDR10 fallback (from UHD Blu-ray)
## Custom Format Details
### Blocked Formats (Score: -10000)
- **DV (w/o HDR fallback)**: Blocks DV Profile 5
- **BR-DISK**: Blocks full BluRay disc images
- **LQ**: Blocks low-quality releases
- **x265 (HD)**: Blocks x265 encoded HD content (720p/1080p)
- **3D**: Blocks 3D releases
- **AV1**: Blocks AV1 codec
- **Extras**: Blocks extras, featurettes, etc.
### Preferred Formats
- **WEB Tier 01-03**: Scored 1600-1700 (high-quality WEB groups)
- **UHD Bluray Tier 01-03**: Scored 1700 (Radarr only)
- **Streaming Services**: Neutral score (AMZN, ATVP, DSNP, HBO, etc.)
- **Repack/Proper**: Scored 5-7 (prefers repacks over originals)
## Monitoring
Check Recyclarr logs:
```bash
ssh pm2 "pct exec 122 -- cat /root/.config/recyclarr/logs/recyclarr.log"
```
View last sync results:
```bash
ssh pm2 "pct exec 122 -- /usr/local/bin/recyclarr sync --preview"
```
## Updating Configuration
1. Edit config: `ssh pm2 "pct exec 122 -- nano /root/.config/recyclarr/recyclarr.yml"`
2. Test config: `ssh pm2 "pct exec 122 -- /usr/local/bin/recyclarr config check"`
3. Run sync: `ssh pm2 "pct exec 122 -- /usr/local/bin/recyclarr sync"`
## Troubleshooting
### Check if sync is working
```bash
ssh pm2 "pct exec 122 -- /usr/local/bin/recyclarr sync --preview"
```
### Verify API connectivity
```bash
# Test Radarr
curl -H "X-Api-Key: 5e6796988abf4d6d819a2b506a44f422" http://10.4.2.16:7878/api/v3/system/status
# Test Sonarr
curl -H "X-Api-Key: b331fe18ec2144148a41645d9ce8b249" http://10.4.2.15:8989/api/v3/system/status
```
### Force resync all custom formats
```bash
ssh pm2 "pct exec 122 -- /usr/local/bin/recyclarr sync --force"
```
## Important Notes
- **Do not modify custom format scores manually** in Radarr/Sonarr web UI - they will be overwritten on next sync
- **Quality profile changes** made in the web UI may be preserved unless they conflict with Recyclarr config
- **The DV blocking is automatic** - no manual intervention needed
- Recyclarr keeps custom formats up-to-date with TRaSH Guides automatically
## Next Steps
- Monitor downloads to ensure DV content is properly blocked
- Adjust quality profiles in Recyclarr config if needed (e.g., prefer 1080p over 4K)
- Review TRaSH Guides for additional custom formats: https://trash-guides.info/
## Anime Configuration
Both Radarr and Sonarr include a dedicated "Remux-1080p - Anime" quality profile for anime content.
**Key Anime Settings**:
- **Quality groups merged** per TRaSH Guides (Remux + Bluray + WEB + HDTV in combined groups)
- **Anime BD Tiers 01-08**: Scored 1300-1400 (SeaDex muxers, remuxes, fansubs, P2P, mini encodes)
- **Anime WEB Tiers 01-06**: Scored 150-350 (muxers, top fansubs, official subs)
- **Dual Audio preferred**: +101 score for releases with both Japanese and English audio
- **Unwanted blocked**: Same as standard profile (BR-DISK, LQ, x265 HD, AV1, Extras)
**Scoring Differences from Standard Profile**:
- Anime Web Tier 01 scores 350 (vs 1600 for standard WEB Tier 01)
- Emphasizes BD quality over WEB for anime (BD Tier 01 = 1400)
- Merged quality groups allow HDTV to be considered alongside WEB for anime releases
**To use anime profile**:
1. In Radarr/Sonarr, edit a movie or series
2. Change quality profile to "Remux-1080p - Anime"
3. Recyclarr will automatically manage custom format scores
## Inventory Update
Added to cluster inventory:
- **VMID**: 122
- **Name**: recyclarr
- **Node**: pm2
- **IP**: 10.4.2.25
- **CPU**: 1 core
- **Memory**: 512MB
- **Disk**: 2GB (KavNas)
- **Purpose**: TRaSH Guides automation for Radarr/Sonarr
- **Tags**: arr, community-script

docs/services.md Normal file

@@ -0,0 +1,222 @@
# Service Mappings and Dependencies
**Last Updated**: 2025-11-16
## Service Categories
### Reverse Proxy & Authentication
#### Traefik (VMID 104)
- **Node**: pm2
- **IP**: 10.4.2.10
- **Port**: 80, 443
- **Purpose**: Reverse proxy and load balancer
- **Config Location**: `/etc/traefik/traefik.yaml` (static config) and `/etc/traefik/conf.d/*.yaml` (dynamic routes) inside LXC 104; see `docs/traefik-ssl-setup.md`
- **Dependencies**: None
- **Backends**: Routes traffic to all web services
#### Authelia (VMID 116)
- **Node**: pm2
- **IP**: 10.4.2.23
- **Purpose**: Single sign-on and authentication
- **Dependencies**: Traefik
- **Protected Services**: *TODO: Document which services require auth*
### Media Automation Stack
#### Prowlarr (VMID 114)
- **Node**: pm2
- **IP**: 10.4.2.17
- **Port**: 9696 (default)
- **Purpose**: Indexer manager for *arr services
- **Dependencies**: None
- **Integrated With**: Sonarr, Radarr, Whisparr
#### Sonarr (VMID 105)
- **Node**: pm2
- **IP**: 10.4.2.15
- **Port**: 8989 (default)
- **Purpose**: TV show automation
- **Dependencies**: Prowlarr
- **Mount Points**:
- `/media` - Media library
- `/mnt/kavnas` - Download staging
- **Integrated With**: Jellyfin, Jellyseerr, Bazarr
#### Radarr (VMID 108)
- **Node**: pm2
- **IP**: 10.4.2.16
- **Port**: 7878 (default)
- **Purpose**: Movie automation
- **Dependencies**: Prowlarr
- **Mount Points**:
- `/media` - Media library
- `/mnt/kavnas` - Download staging
- **Integrated With**: Jellyfin, Jellyseerr, Bazarr
#### Whisparr (VMID 117)
- **Node**: pm2
- **IP**: 10.4.2.19
- **Port**: 6969 (default)
- **Purpose**: Adult content automation
- **Dependencies**: Prowlarr
- **Integrated With**: Jellyfin
#### Bazarr (VMID 119)
- **Node**: pm2
- **IP**: 10.4.2.18
- **Port**: 6767 (default)
- **Purpose**: Subtitle automation
- **Dependencies**: Sonarr, Radarr
- **Integrated With**: Jellyfin
### Media Servers & Requests
#### Jellyfin (VMID 121)
- **Node**: elantris
- **IP**: 10.4.2.22
- **Port**: 8096 (default)
- **Purpose**: Media server
- **Dependencies**: None (reads media library)
- **Media Sources**: *TODO: Document media library paths*
- **Status**: Needs to be added to Traefik config
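Before wiring it into Traefik, a quick reachability check from any machine on the LAN; the unauthenticated `/health` endpoint is assumed to be available (it is on recent Jellyfin releases) and returns `Healthy`:
```bash
# Assumes Jellyfin exposes /health; the HTTP status code is printed as a fallback check
curl -s -w "\n%{http_code}\n" http://10.4.2.22:8096/health
```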
#### Jellyseerr (VMID 115)
- **Node**: pm2
- **IP**: 10.4.2.20
- **Port**: 5055 (default)
- **Purpose**: Media request management
- **Dependencies**: Jellyfin, Sonarr, Radarr
- **Integrated With**: Jellyfin (for library data)
#### Kometa (VMID 120)
- **Node**: pm2
- **IP**: 10.4.2.21
- **Purpose**: Automated metadata and collection management for Jellyfin
- **Dependencies**: Jellyfin
- **Run Mode**: Scheduled/automated (not web UI)
#### Notifiarr (VMID 118)
- **Node**: pm2
- **IP**: 10.4.2.24
- **Purpose**: Notification relay for *arr apps
- **Dependencies**: Sonarr, Radarr, Prowlarr, etc.
- **Notifications For**: Downloads, upgrades, errors
### Docker Hosts
#### dockge (VMID 107)
- **Node**: pm3
- **Purpose**: Docker Compose management web UI
- **Port**: 5001 (default)
- **Manages**: Docker containers across docker-pm2, docker-pm3, docker-pm4
- **Web UI**: Accessible via browser
#### docker-pm2 (VMID 113)
- **Node**: pm2
- **Purpose**: Docker host (currently empty/minimal)
- **Status**: Available for new containerized services
#### docker-pm3 (VMID 109)
- **Node**: pm3
- **Purpose**: Primary Docker host
- **Status**: Running containerized services (details TBD)
#### docker-pm4 (VMID 110)
- **Node**: pm4
- **Purpose**: Docker host
- **Status**: Running containerized services
### Smart Home & IoT
#### Home Assistant (VMID 100)
- **Node**: pm1
- **Purpose**: Home automation platform
- **Port**: 8123 (default)
- **Type**: Full VM (HAOS)
- **Integrations**: Z-Wave, MQTT, Twingate
#### Z-Wave JS UI (VMID 102)
- **Node**: pm1
- **Purpose**: Z-Wave device management
- **Port**: 8091 (default)
- **Dependencies**: USB Z-Wave stick
- **Integrated With**: Home Assistant
#### MQTT (VMID 106)
- **Node**: pm3
- **Port**: 1883 (MQTT), 9001 (WebSocket)
- **Purpose**: Message broker for IoT devices
- **Dependencies**: None
- **Clients**: Home Assistant, IoT devices
#### Twingate (VMID 101)
- **Node**: pm1
- **Purpose**: Zero-trust network access
- **Type**: VPN alternative
### Surveillance & NVR
#### Frigate (VMID 111)
- **Node**: pm3
- **Port**: 5000 (default)
- **Purpose**: NVR with AI object detection
- **Dependencies**: None
- **Storage**: High (120GB allocated)
- **Features**: Object detection, motion detection
- **Integrated With**: Home Assistant
#### Shinobi (VMID 103)
- **Node**: pm4
- **Port**: 8080 (default)
- **Purpose**: Network Video Recorder
- **Network**: Very high traffic (407GB in, 162GB out)
- **Status**: May be deprecated in favor of Frigate
### Gaming
#### FoundryVTT (VMID 112)
- **Node**: pm3
- **Port**: 30000 (default)
- **Purpose**: Virtual tabletop for RPG gaming
- **Storage**: 100GB (for assets, maps, modules)
- **Access**: Password protected
## Service Access URLs
*TODO: Document Traefik routes for each service*
Expected format:
- Jellyfin: https://jellyfin.kavcorp.com
- Sonarr: https://sonarr.kavcorp.com
- Radarr: https://radarr.kavcorp.com
- etc.
## Service Dependencies Map
```
Traefik (proxy)
├── Authelia (auth)
├── Jellyfin (media server)
├── Jellyseerr (requests) → Jellyfin, Sonarr, Radarr
├── Sonarr → Prowlarr, Bazarr
├── Radarr → Prowlarr, Bazarr
├── Whisparr → Prowlarr
├── Prowlarr (indexers)
├── Bazarr → Sonarr, Radarr
├── Home Assistant → MQTT, Z-Wave JS UI
├── Frigate → Home Assistant (optional)
└── FoundryVTT
```
## Migration Candidates (Docker → LXC)
Services currently in Docker that could be migrated to LXC:
- *TODO: Document after reviewing Docker container inventory*
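A quick way to gather that inventory from the Docker host LXCs documented above (a sketch; `docker ps --format` with a Go template is standard Docker CLI):
```bash
# Node / VMID pairs taken from the Docker Hosts entries above
ssh pm3 "pct exec 109 -- docker ps --format '{{.Names}}\t{{.Image}}\t{{.Status}}'"
ssh pm4 "pct exec 110 -- docker ps --format '{{.Names}}\t{{.Image}}\t{{.Status}}'"
ssh pm2 "pct exec 113 -- docker ps --format '{{.Names}}\t{{.Image}}\t{{.Status}}'"
```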
## Service Maintenance Notes
- Most services auto-update or have update notifications
- Monitor Frigate storage usage (generates large video files)
- Dockge provides easy UI for managing Docker stacks
- *arr services should be updated together to maintain compatibility

docs/storage.md Normal file

@@ -0,0 +1,184 @@
# Storage Architecture
**Last Updated**: 2025-11-16
## Storage Overview
The KavCorp cluster uses a multi-tiered storage approach:
1. **Local node storage**: For node-specific data, templates, ISOs
2. **NFS shared storage**: For LXC containers, backups, and shared data
3. **ZFS pools**: For high-performance storage on specific nodes
## Storage Pools
### Local Storage (Per-Node)
Each node has two local storage pools:
#### `local` - Directory Storage
- **Type**: Directory
- **Size**: ~100GB per node
- **Content Types**: backup, vztmpl (templates), iso
- **Location**: `/var/lib/vz`
- **Usage**: Node-specific backups, templates, ISO images
- **Shared**: No
**Per-Node Status**:
| Node | Used | Total | Available |
|---|---|---|---|
| pm1 | 10.1GB | 100.9GB | 90.8GB |
| pm2 | 8.0GB | 100.9GB | 92.9GB |
| pm3 | 6.9GB | 100.9GB | 94.0GB |
| pm4 | 7.5GB | 100.9GB | 93.4GB |
| elantris | 4.1GB | 100.9GB | 96.8GB |
#### `local-lvm` - LVM Thin Pool
- **Type**: LVM Thin
- **Size**: ~350-375GB per node (varies)
- **Content Types**: rootdir, images
- **Usage**: High-performance VM/LXC disks
- **Shared**: No
- **Best For**: Services requiring fast local storage
**Per-Node Status**:
| Node | Used | Total | Available |
|---|---|---|---|
| pm1 | 16.9GB | 374.5GB | 357.6GB |
| pm2 | 0GB | 374.5GB | 374.5GB |
| pm3 | 178.8GB | 362.8GB | 184.0GB |
| pm4 | 0GB | 374.5GB | 374.5GB |
| elantris | 0GB | 362.8GB | 362.8GB |
**Note**: pm3's local-lvm is heavily used (178.8GB actual) because it hosts the largest allocated disks; the pool is thin-provisioned, so the allocations below exceed the space actually consumed:
- VMID 107: dockge (120GB)
- VMID 111: frigate (120GB)
- VMID 112: foundryvtt (100GB)
### NFS Shared Storage
#### `KavNas` - Primary Shared Storage
- **Type**: NFS
- **Source**: 10.4.2.13 (Synology DS918+ NAS)
- **Size**: 23TB (23,029,958,311,936 bytes)
- **Used**: 9.2TB (9,241,738,215,424 bytes)
- **Available**: 13.8TB
- **Content Types**: snippets, iso, images, backup, rootdir, vztmpl
- **Shared**: Yes (available on all nodes)
- **Best For**:
- LXC container rootfs (most new containers use this)
- Backups
- ISO images
- Templates
- Data that needs to be accessible across nodes
**Current Usage**:
- Most LXC containers on pm2 use KavNas for rootfs
- Provides easy migration between nodes
- Centralized backup location
#### `elantris-downloads` - Download Storage
- **Type**: NFS
- **Source**: 10.4.2.14 (elantris node)
- **Size**: 23TB (23,116,582,486,016 bytes)
- **Used**: 10.6TB (10,630,966,804,480 bytes)
- **Available**: 12.5TB
- **Content Types**: rootdir, images
- **Shared**: Yes (available on all nodes)
- **Best For**:
- Download staging area
- Media downloads
- Large file operations
### ZFS Storage
#### `el-pool` - ZFS Pool (elantris)
- **Type**: ZFS
- **Node**: elantris only
- **Size**: 24TB (26,317,550,091,635 bytes)
- **Used**: 13.8TB (13,831,934,311,603 bytes)
- **Available**: 12.5TB
- **Content Types**: images, rootdir
- **Shared**: No (elantris only)
- **Best For**:
- High-performance storage on elantris
- Large data sets requiring ZFS features
- Services that benefit from compression/deduplication
**Current Usage**:
- VMID 121: jellyfin (16GB on el-pool)
**Status on Other Nodes**: Shows as "unknown" - ZFS pool is local to elantris only
## Storage Recommendations
### For New LXC Containers
**General Purpose Services** (web apps, APIs, small databases):
- **Storage**: `KavNas`
- **Disk Size**: 4-10GB
- **Rationale**: Shared, easy to migrate, automatically backed up
**High-Performance Services** (databases, caches):
- **Storage**: `local-lvm`
- **Disk Size**: As needed
- **Rationale**: Fast local SSD storage
**Large Storage Services** (media, file storage):
- **Storage**: `elantris-downloads` or `el-pool`
- **Disk Size**: As needed
- **Rationale**: Large capacity, optimized for bulk storage
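A creation sketch following the general-purpose recommendation above; the VMID, hostname, IP, and template filename are hypothetical and should be adjusted to what actually exists on `KavNas`:
```bash
# Run on pm2: 8GB rootfs on shared NFS storage, unprivileged with nesting enabled.
# The Debian template name below is a placeholder; check `pveam list KavNas`.
pct create 130 KavNas:vztmpl/debian-12-standard_12.7-1_amd64.tar.zst \
  --hostname example-app \
  --cores 2 --memory 2048 \
  --rootfs KavNas:8 \
  --net0 name=eth0,bridge=vmbr0,ip=10.4.2.111/24,gw=10.4.2.254 \
  --unprivileged 1 --features nesting=1
```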
### Mount Points for Media Services
Media-related LXCs typically mount:
```
mp0: /mnt/pve/elantris-media,mp=/media,ro=0
mp1: /mnt/pve/KavNas,mp=/mnt/kavnas
```
This provides:
- Access to media library via `/media`
- Access to NAS storage via `/mnt/kavnas`
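To attach the same mounts to another container (a sketch; VMID 130 is hypothetical, and the command must run on the container's host node):
```bash
# Bind-mount the media library and the NAS share into the container
pct set 130 -mp0 /mnt/pve/elantris-media,mp=/media,ro=0 \
            -mp1 /mnt/pve/KavNas,mp=/mnt/kavnas
```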
## Storage Performance Notes
### Best Performance
1. `local-lvm` (local SSD on each node)
### Best Redundancy/Availability
1. `KavNas` (NAS with RAID, accessible from all nodes)
2. `elantris-downloads` (large capacity, shared)
### Best for Large Files
1. `el-pool` (ZFS on elantris, 24TB)
2. `elantris-downloads` (23TB NFS)
3. `KavNas` (23TB NFS)
## Backup Strategy
**Current Setup**:
- Backups stored on `KavNas` NFS share
- All nodes can write backups to KavNas
- Centralized backup location
**Recommendations**:
- [ ] Document automated backup schedules
- [ ] Implement off-site backup rotation
- [ ] Test restore procedures
- [ ] Monitor KavNas free space (currently ~40% used)
## Storage Monitoring
**Watch These Metrics**:
- pm3 `local-lvm`: 49% used (178.8GB / 362.8GB)
- KavNas: 40% used (9.2TB / 23TB)
- elantris-downloads: 46% used (10.6TB / 23TB)
- el-pool: 53% used (13.8TB / 24TB)
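These figures can be pulled live from the cluster API; a minimal sketch, assuming `jq` is available on the machine running the command:
```bash
# Per-storage usage percentage across the cluster
ssh pm2 "pvesh get /cluster/resources --type storage --output-format json" | \
  jq -r '.[] | select(.maxdisk > 0)
        | [.storage, .node, (((.disk / .maxdisk) * 100 | floor | tostring) + "%")]
        | @tsv' | sort
```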
## Future Storage Improvements
- [ ] Set up automated cleanup of old backups
- [ ] Implement storage quotas for LXC containers
- [ ] Consider SSD caching for NFS mounts
- [ ] Document backup retention policies
- [ ] Set up alerts for storage thresholds (80%, 90%)

docs/traefik-ssl-setup.md Normal file

@@ -0,0 +1,131 @@
# Traefik SSL/TLS Setup with Namecheap
**Last Updated**: 2025-11-16
## Configuration Summary
Traefik is configured to use Let's Encrypt with DNS-01 challenge via Namecheap for wildcard SSL certificates.
### Environment Variables
Located in: `/etc/systemd/system/traefik.service.d/override.conf` (inside Traefik LXC 104)
```bash
NAMECHEAP_API_USER=kavren
NAMECHEAP_API_KEY=8156f3d9ef664c91b95f029dfbb62ad5
NAMECHEAP_PROPAGATION_TIMEOUT=3600 # 1 hour timeout for DNS propagation
NAMECHEAP_POLLING_INTERVAL=30 # Check every 30 seconds
NAMECHEAP_TTL=300 # 5 minute TTL for DNS records
```
### Traefik Configuration
File: `/etc/traefik/traefik.yaml`
```yaml
certificatesResolvers:
  letsencrypt:
    acme:
      email: cory.bailey87@gmail.com
      storage: /etc/traefik/ssl/acme.json
      dnsChallenge:
        provider: namecheap
        resolvers:
          - "1.1.1.1:53"
          - "8.8.8.8:53"
```
### Wildcard Certificate
Configured for:
- Main domain: `kavcorp.com`
- Wildcard: `*.kavcorp.com`
## Namecheap API Requirements
1. **API Access Enabled**: Must have API access enabled in Namecheap account
2. **IP Whitelisting**: Public IP `99.74.188.161` must be whitelisted
3. **API Key**: Must have valid API key with DNS modification permissions
### Verifying API Access
Test Namecheap API from Traefik LXC:
```bash
pct exec 104 -- curl -s 'https://api.namecheap.com/xml.response?ApiUser=kavren&ApiKey=8156f3d9ef664c91b95f029dfbb62ad5&UserName=kavren&Command=namecheap.domains.getList&ClientIp=99.74.188.161'
```
## Existing Certificates
Valid Let's Encrypt certificates already obtained:
- `traefik.kavcorp.com`
- `sonarr.kavcorp.com`
- `radarr.kavcorp.com`
Stored in: `/etc/traefik/ssl/acme.json`
## Troubleshooting
### Common Issues
**DNS Propagation Timeout**:
- Error: "propagation: time limit exceeded"
- Solution: Increased `NAMECHEAP_PROPAGATION_TIMEOUT` to 3600 seconds (1 hour)
**API Authentication Failed**:
- Verify IP whitelisted: 99.74.188.161
- Verify API key is correct
- Check API access is enabled in Namecheap
**Deprecated Configuration Warning**:
- Fixed: Removed deprecated `delayBeforeCheck` option
- Now using default propagation settings controlled by environment variables
### Monitoring Certificate Generation
Check Traefik logs:
```bash
ssh pm2 "pct exec 104 -- tail -f /var/log/traefik/traefik.log"
```
Filter for ACME/certificate errors:
```bash
ssh pm2 "pct exec 104 -- cat /var/log/traefik/traefik.log | grep -i 'acme\|certificate\|error'"
```
### Manual Certificate Renewal
Certificates auto-renew. To force renewal:
```bash
# Delete acme.json and restart Traefik (will regenerate all certs)
ssh pm2 "pct exec 104 -- rm /etc/traefik/ssl/acme.json && systemctl restart traefik"
```
**WARNING**: Only do this if necessary, as Let's Encrypt has rate limits!
## Certificate Request Flow
1. New service added to `/etc/traefik/conf.d/*.yaml`
2. Traefik detects new route requiring HTTPS
3. Checks if certificate exists in acme.json
4. If not, initiates DNS-01 challenge:
- Creates TXT record via Namecheap API: `_acme-challenge.subdomain.kavcorp.com`
- Waits for DNS propagation (up to 1 hour)
- Polls DNS servers every 30 seconds
- Let's Encrypt verifies TXT record
- Certificate issued and stored in acme.json
5. Certificate served for HTTPS connections
## Next Steps
When adding new services:
1. Add route configuration to `/etc/traefik/conf.d/media-services.yaml` (or create a new file; a hedged example follows this list)
2. Traefik will automatically request certificate on first HTTPS request
3. Monitor logs for any DNS propagation issues
4. Certificate will be cached and auto-renewed before expiration
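Following step 1, a hedged sketch of what such a route file might look like, using Jellyfin as the example; the entrypoint name (`websecure`) and the file name are assumptions and should match the existing static configuration:
```bash
# From pm2, open a shell in the Traefik LXC
ssh pm2 "pct enter 104"

# Inside the container, create the route file (the quoted heredoc keeps the
# backticks in the Host() rule literal). "websecure" is an assumed entrypoint name.
cat > /etc/traefik/conf.d/jellyfin.yaml <<'EOF'
http:
  routers:
    jellyfin:
      rule: "Host(`jellyfin.kavcorp.com`)"
      entryPoints:
        - websecure
      service: jellyfin
      tls:
        certResolver: letsencrypt
  services:
    jellyfin:
      loadBalancer:
        servers:
          - url: "http://10.4.2.22:8096"
EOF
```
Per the request flow above, Traefik's file provider should pick the new file up automatically and request a certificate on the first HTTPS hit.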
## Notes
- Traefik v3.6.1 in use
- DNS-01 challenge allows wildcard certificates
- Certificates valid for 90 days, auto-renewed at 60 days
- Rate limit: 50 certificates per domain per week (Let's Encrypt)

scripts/cleanup/README.md Normal file

@@ -0,0 +1,155 @@
# Media Organization Script
## Purpose
This script identifies and organizes media files by comparing them against what Radarr and Sonarr are actively managing. Files that are not managed by either service are moved to a processing folder for manual review.
## Location
Script: `/home/kavren/proxmox-infra/scripts/cleanup/organize-media.py`
## Usage
### On pm2 (where media is mounted)
The script needs to be run on pm2 where the media directories are mounted.
```bash
# Copy script to pm2
scp /home/kavren/proxmox-infra/scripts/cleanup/organize-media.py pm2:/root/organize-media.py
# Run in DRY RUN mode (recommended first)
ssh pm2 "python3 /root/organize-media.py"
# Run with execution (actually move files)
ssh pm2 "python3 /root/organize-media.py --execute"
# Run quietly (only show summary)
ssh pm2 "python3 /root/organize-media.py --quiet"
```
## What It Does
1. **Queries Radarr API** (http://10.4.2.16:7878)
- Gets all movies and their file paths
- Identifies which files are actively managed
2. **Queries Sonarr API** (http://10.4.2.15:8989)
- Gets all TV series and their episode files
- Identifies which files are actively managed
3. **Scans Media Directories** (mounted on pm2 at `/mnt/pve/elantris-media`)
- `/mnt/pve/elantris-media/movies` - all video files
- `/mnt/pve/elantris-media/tv` - all video files
- `/mnt/pve/elantris-media/anime` - all video files
- Supported extensions: .mkv, .mp4, .avi, .m4v, .ts, .wmv, .flv, .webm
4. **Categorizes Files**
- **Managed**: Files that exist in Radarr/Sonarr (kept in place)
- **Unmanaged**: Files not in Radarr/Sonarr (marked for moving)
5. **Processes Unmanaged Files** (when --execute is used)
- Creates `/mnt/pve/elantris-media/processing/from-movies/`, `/mnt/pve/elantris-media/processing/from-tv/`, `/mnt/pve/elantris-media/processing/from-anime/`
- Moves unmanaged files preserving relative directory structure
- Creates log file: `/mnt/pve/elantris-media/processing/cleanup-log-{timestamp}.txt`
6. **Reports Empty Directories**
- Lists directories that would be empty after cleanup
- Does NOT automatically delete them (for safety)
## Safety Features
- **DRY RUN by default**: Shows what would happen without actually moving files
- **Requires --execute flag**: Must explicitly enable actual file operations
- **Detailed logging**: All operations logged with timestamps
- **Preserves structure**: Maintains relative paths when moving files
- **Permission handling**: Gracefully handles access errors
- **Empty directory detection**: Only reports, doesn't delete
## Output
The script provides:
- Real-time progress updates (unless --quiet is used)
- Summary report showing:
- Total files scanned
- Files managed by Radarr/Sonarr
- Unmanaged files found
- Breakdown by media type
- Empty directories detected
- Log file written to `/mnt/pve/elantris-media/processing/cleanup-log-{timestamp}.txt`
## Example Output
```
================================================================================
SUMMARY REPORT
================================================================================
Mode: DRY RUN MODE
Total files scanned: 2847
Files managed by Radarr/Sonarr: 2847
Unmanaged files found: 0
Unmanaged files by category:
movies: 0 files
tv: 0 files
anime: 0 files
================================================================================
```
## Configuration
The script has hardcoded configuration at the top:
```python
RADARR_URL = "http://10.4.2.16:7878"
RADARR_API_KEY = "5e6796988abf4d6d819a2b506a44f422"
SONARR_URL = "http://10.4.2.15:8989"
SONARR_API_KEY = "b331fe18ec2144148a41645d9ce8b249"
MEDIA_DIRS = {
    "movies": "/mnt/pve/elantris-media/movies",
    "tv": "/mnt/pve/elantris-media/tv",
    "anime": "/mnt/pve/elantris-media/anime"
}
# Radarr/Sonarr report paths under /media/*; PATH_MAPPING translates them
# to the node-side mount before comparing against the scan results.
PATH_MAPPING = {
    "/media/movies": "/mnt/pve/elantris-media/movies",
    "/media/tv": "/mnt/pve/elantris-media/tv",
    "/media/anime": "/mnt/pve/elantris-media/anime"
}
PROCESSING_DIR = "/mnt/pve/elantris-media/processing"
VIDEO_EXTENSIONS = {'.mkv', '.mp4', '.avi', '.m4v', '.ts', '.wmv', '.flv', '.webm'}
```
## Troubleshooting
### Permission Errors
If you see permission errors, ensure the script is running as root on pm2:
```bash
ssh pm2 "whoami" # Should show 'root'
```
### API Connection Errors
If the script can't connect to Radarr/Sonarr:
- Verify the services are running
- Check the URLs and API keys are correct
- Ensure network connectivity from pm2 to the services
### Missing Directories
If media directories don't exist, the script will log warnings and skip them.
## Maintenance
After running with --execute and reviewing files in `/media/processing/`:
1. Review the moved files
2. Add them to Radarr/Sonarr if needed
3. Delete if they're truly unwanted
4. Review empty directory list from log
5. Manually remove empty directories if desired
## Future Enhancements
Possible improvements:
- Add support for custom media directories via CLI arguments
- Add configuration file support
- Add ability to automatically delete empty directories
- Add dry-run output to file for review
- Add email notifications on completion

scripts/cleanup/organize-media.py Executable file

@@ -0,0 +1,409 @@
#!/usr/bin/env python3
"""
Media Organization Script
Compares media files against Radarr/Sonarr managed files and moves unmanaged files to processing folder.
"""
import argparse
import json
import os
import shutil
import sys
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Set, Tuple
import urllib.request
import urllib.error
# Configuration
RADARR_URL = "http://10.4.2.16:7878"
RADARR_API_KEY = "5e6796988abf4d6d819a2b506a44f422"
SONARR_URL = "http://10.4.2.15:8989"
SONARR_API_KEY = "b331fe18ec2144148a41645d9ce8b249"
MEDIA_DIRS = {
"movies": "/mnt/pve/elantris-media/movies",
"tv": "/mnt/pve/elantris-media/tv",
"anime": "/mnt/pve/elantris-media/anime"
}
# Path translation: Radarr/Sonarr see /media/* but files are at /mnt/pve/elantris-media/*
PATH_MAPPING = {
"/media/movies": "/mnt/pve/elantris-media/movies",
"/media/tv": "/mnt/pve/elantris-media/tv",
"/media/anime": "/mnt/pve/elantris-media/anime"
}
PROCESSING_DIR = "/mnt/pve/elantris-media/processing"
VIDEO_EXTENSIONS = {'.mkv', '.mp4', '.avi', '.m4v', '.ts', '.wmv', '.flv', '.webm'}
class MediaOrganizer:
def __init__(self, dry_run: bool = True, verbose: bool = True):
self.dry_run = dry_run
self.verbose = verbose
self.managed_files: Set[str] = set()
self.unmanaged_files: Dict[str, List[Path]] = {
"movies": [],
"tv": [],
"anime": []
}
self.stats = {
"total_scanned": 0,
"managed": 0,
"unmanaged": 0,
"moved": 0,
"errors": 0
}
self.log_entries: List[str] = []
def log(self, message: str, level: str = "INFO"):
"""Log a message to console and internal log"""
timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
log_entry = f"[{timestamp}] [{level}] {message}"
self.log_entries.append(log_entry)
if self.verbose:
print(log_entry)
def translate_path(self, path: str) -> str:
"""Translate Radarr/Sonarr paths to actual filesystem paths"""
for api_path, real_path in PATH_MAPPING.items():
if path.startswith(api_path):
return path.replace(api_path, real_path, 1)
return path
def api_request(self, url: str, api_key: str, endpoint: str) -> dict:
"""Make an API request to Radarr or Sonarr"""
full_url = f"{url}/api/v3/{endpoint}"
headers = {"X-Api-Key": api_key}
try:
req = urllib.request.Request(full_url, headers=headers)
with urllib.request.urlopen(req, timeout=30) as response:
return json.loads(response.read().decode())
except urllib.error.URLError as e:
self.log(f"API request failed for {full_url}: {e}", "ERROR")
return None
except json.JSONDecodeError as e:
self.log(f"Failed to decode JSON response from {full_url}: {e}", "ERROR")
return None
def get_radarr_files(self) -> Set[str]:
"""Get all file paths managed by Radarr"""
self.log("Querying Radarr for managed movie files...")
managed_files = set()
movies = self.api_request(RADARR_URL, RADARR_API_KEY, "movie")
if not movies:
self.log("Failed to retrieve movies from Radarr", "ERROR")
return managed_files
for movie in movies:
# Get the movie file path if it exists
if movie.get("hasFile") and "movieFile" in movie:
file_path = movie["movieFile"].get("path")
if file_path:
# Translate API path to real filesystem path
real_path = self.translate_path(file_path)
managed_files.add(real_path)
self.log(f" Radarr manages: {file_path} -> {real_path}", "DEBUG")
self.log(f"Found {len(managed_files)} files managed by Radarr")
return managed_files
def get_sonarr_files(self) -> Set[str]:
"""Get all file paths managed by Sonarr"""
self.log("Querying Sonarr for managed TV series files...")
managed_files = set()
series = self.api_request(SONARR_URL, SONARR_API_KEY, "series")
if not series:
self.log("Failed to retrieve series from Sonarr", "ERROR")
return managed_files
for show in series:
series_id = show.get("id")
if not series_id:
continue
# Get episode files for this series
episode_files = self.api_request(
SONARR_URL,
SONARR_API_KEY,
f"episodefile?seriesId={series_id}"
)
if episode_files:
for episode_file in episode_files:
file_path = episode_file.get("path")
if file_path:
# Translate API path to real filesystem path
real_path = self.translate_path(file_path)
managed_files.add(real_path)
self.log(f" Sonarr manages: {file_path} -> {real_path}", "DEBUG")
self.log(f"Found {len(managed_files)} files managed by Sonarr")
return managed_files
def scan_directory(self, directory: Path, media_type: str) -> List[Path]:
"""Scan a directory recursively for video files"""
self.log(f"Scanning {directory} for video files...")
video_files = []
if not directory.exists():
self.log(f"Directory does not exist: {directory}", "WARNING")
return video_files
try:
for root, dirs, files in os.walk(directory):
for file in files:
file_path = Path(root) / file
if file_path.suffix.lower() in VIDEO_EXTENSIONS:
video_files.append(file_path)
self.stats["total_scanned"] += 1
except PermissionError as e:
self.log(f"Permission denied accessing {directory}: {e}", "ERROR")
self.stats["errors"] += 1
except Exception as e:
self.log(f"Error scanning {directory}: {e}", "ERROR")
self.stats["errors"] += 1
self.log(f"Found {len(video_files)} video files in {directory}")
return video_files
def categorize_files(self):
"""Scan media directories and categorize files as managed or unmanaged"""
self.log("\n" + "="*80)
self.log("STEP 1: Querying Radarr and Sonarr for managed files")
self.log("="*80)
# Get managed files from Radarr and Sonarr
radarr_files = self.get_radarr_files()
sonarr_files = self.get_sonarr_files()
self.managed_files = radarr_files | sonarr_files
self.log(f"\nTotal managed files: {len(self.managed_files)}")
self.log("\n" + "="*80)
self.log("STEP 2: Scanning media directories")
self.log("="*80)
# Scan each media directory
for media_type, directory in MEDIA_DIRS.items():
dir_path = Path(directory)
video_files = self.scan_directory(dir_path, media_type)
# Categorize each file
for file_path in video_files:
file_str = str(file_path)
if file_str in self.managed_files:
self.stats["managed"] += 1
self.log(f" MANAGED: {file_path}", "DEBUG")
else:
self.stats["unmanaged"] += 1
self.unmanaged_files[media_type].append(file_path)
self.log(f" UNMANAGED: {file_path}", "DEBUG")
def create_processing_structure(self):
"""Create processing directory structure"""
self.log("\n" + "="*80)
self.log("STEP 3: Creating processing directory structure")
self.log("="*80)
processing_path = Path(PROCESSING_DIR)
for media_type in MEDIA_DIRS.keys():
subdir = processing_path / f"from-{media_type}"
if self.dry_run:
self.log(f"[DRY RUN] Would create directory: {subdir}")
else:
try:
subdir.mkdir(parents=True, exist_ok=True)
self.log(f"Created directory: {subdir}")
except Exception as e:
self.log(f"Failed to create directory {subdir}: {e}", "ERROR")
self.stats["errors"] += 1
def move_unmanaged_files(self):
"""Move unmanaged files to processing folder"""
self.log("\n" + "="*80)
self.log("STEP 4: Moving unmanaged files to processing folder")
self.log("="*80)
processing_path = Path(PROCESSING_DIR)
for media_type, files in self.unmanaged_files.items():
if not files:
self.log(f"No unmanaged files found in {media_type}")
continue
self.log(f"\nProcessing {len(files)} unmanaged files from {media_type}...")
source_dir = Path(MEDIA_DIRS[media_type])
dest_base = processing_path / f"from-{media_type}"
for file_path in files:
try:
# Preserve relative path structure
relative_path = file_path.relative_to(source_dir)
dest_path = dest_base / relative_path
if self.dry_run:
self.log(f"[DRY RUN] Would move: {file_path}")
self.log(f" To: {dest_path}")
else:
# Create destination directory if needed
dest_path.parent.mkdir(parents=True, exist_ok=True)
# Move the file
shutil.move(str(file_path), str(dest_path))
self.log(f"Moved: {file_path} -> {dest_path}")
self.stats["moved"] += 1
except Exception as e:
self.log(f"Failed to move {file_path}: {e}", "ERROR")
self.stats["errors"] += 1
def find_empty_directories(self) -> List[Path]:
"""Find directories that would be empty after moving files"""
self.log("\n" + "="*80)
self.log("STEP 5: Identifying empty directories")
self.log("="*80)
empty_dirs = []
for media_type, directory in MEDIA_DIRS.items():
dir_path = Path(directory)
if not dir_path.exists():
continue
try:
for root, dirs, files in os.walk(dir_path, topdown=False):
root_path = Path(root)
# Skip if this is the root media directory
if root_path == dir_path:
continue
# Check if directory is empty or would be empty
try:
contents = list(root_path.iterdir())
if not contents:
empty_dirs.append(root_path)
self.log(f"Empty directory: {root_path}")
except PermissionError:
self.log(f"Permission denied checking {root_path}", "WARNING")
except Exception as e:
self.log(f"Error finding empty directories in {directory}: {e}", "ERROR")
return empty_dirs
def write_log_file(self):
"""Write log file to processing directory"""
timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
log_path = Path(PROCESSING_DIR) / f"cleanup-log-{timestamp}.txt"
try:
if self.dry_run:
self.log(f"\n[DRY RUN] Would write log file to: {log_path}")
else:
with open(log_path, 'w') as f:
f.write('\n'.join(self.log_entries))
self.log(f"\nLog file written to: {log_path}")
except Exception as e:
self.log(f"Failed to write log file: {e}", "ERROR")
def print_summary(self, empty_dirs: List[Path]):
"""Print summary report"""
self.log("\n" + "="*80)
self.log("SUMMARY REPORT")
self.log("="*80)
mode = "DRY RUN MODE" if self.dry_run else "EXECUTION MODE"
self.log(f"\nMode: {mode}")
self.log(f"\nTotal files scanned: {self.stats['total_scanned']}")
self.log(f"Files managed by Radarr/Sonarr: {self.stats['managed']}")
self.log(f"Unmanaged files found: {self.stats['unmanaged']}")
if not self.dry_run:
self.log(f"Files successfully moved: {self.stats['moved']}")
if self.stats['errors'] > 0:
self.log(f"Errors encountered: {self.stats['errors']}", "WARNING")
self.log("\nUnmanaged files by category:")
for media_type, files in self.unmanaged_files.items():
self.log(f" {media_type}: {len(files)} files")
if empty_dirs:
self.log(f"\nEmpty directories found: {len(empty_dirs)}")
self.log("(These directories can be manually removed if desired)")
self.log("\n" + "="*80)
def run(self):
"""Main execution method"""
self.log("="*80)
self.log("MEDIA ORGANIZATION SCRIPT")
self.log("="*80)
self.log(f"Mode: {'DRY RUN' if self.dry_run else 'EXECUTE'}")
self.log(f"Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
# Step 1 & 2: Categorize files
self.categorize_files()
# Step 3: Create processing structure
self.create_processing_structure()
# Step 4: Move unmanaged files
self.move_unmanaged_files()
# Step 5: Find empty directories
empty_dirs = self.find_empty_directories()
# Print summary
self.print_summary(empty_dirs)
# Write log file
self.write_log_file()
return self.stats
def main():
parser = argparse.ArgumentParser(
description="Organize media files by comparing against Radarr/Sonarr managed files"
)
parser.add_argument(
"--execute",
action="store_true",
help="Actually move files (default is dry run mode)"
)
parser.add_argument(
"--quiet",
action="store_true",
help="Reduce verbosity (only show summary)"
)
args = parser.parse_args()
# Create organizer instance
organizer = MediaOrganizer(
dry_run=not args.execute,
verbose=not args.quiet
)
# Run the organization
stats = organizer.run()
# Exit with appropriate code
if stats["errors"] > 0:
sys.exit(1)
else:
sys.exit(0)
if __name__ == "__main__":
main()