Initial commit: KavCorp infrastructure documentation

- CLAUDE.md: Project configuration for Claude Code
- docs/: Infrastructure documentation
  - INFRASTRUCTURE.md: Service map, storage, network
  - CONFIGURATIONS.md: Service configs and credentials
  - CHANGELOG.md: Change history
  - DECISIONS.md: Architecture decisions
  - TASKS.md: Task tracking
- scripts/: Automation scripts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 22:07:01 -05:00
commit 120c2ec809
19 changed files with 3448 additions and 0 deletions

docs/CHANGELOG.md Normal file

@@ -0,0 +1,153 @@
# Changelog
> **Purpose**: Historical record of all significant infrastructure changes
## 2025-12-07
### Service Additions
- **Vaultwarden**: Created new password manager LXC
- LXC 125 on pm4
- IP: 10.4.2.212
- Domain: vtw.kavcorp.com
- Traefik config: `/etc/traefik/conf.d/vaultwarden.yaml`
- Tagged: community-script, password-manager
- **Immich**: Migrated from Docker (dockge LXC 107 on pm3) to native LXC
- LXC 126 on pm4
- IP: 10.4.2.24:2283
- Domain: immich.kavcorp.com
- Traefik config: `/etc/traefik/conf.d/immich.yaml`
- Library storage: NFS mount from elantris (`/el-pool/downloads/immich/`)
- 38GB photo library transferred via rsync
- Fresh database (version incompatibility: old v1.129.0 → new v2.3.1)
- Services: immich-web.service, immich-ml.service
- Tagged: community-script, photos
### Infrastructure Maintenance
- **Traefik (LXC 104)**: Fixed disk full issue
- Truncated 895MB access log that filled 2GB rootfs
- Added logrotate config to prevent recurrence (50MB max, 7 day rotation)
- Cleaned apt cache and journal logs
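  - For reference, a logrotate policy matching those limits would look roughly like the sketch below (the file name `/etc/logrotate.d/traefik`, the wildcard log path, and `copytruncate` are assumptions, not copied from the deployed config):
```bash
# Sketch: recreate the logrotate policy inside the Traefik LXC (104)
cat > /etc/logrotate.d/traefik <<'EOF'
/var/log/traefik/*.log {
    size 50M
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}
EOF
```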
## 2025-11-20
### Service Changes
- **AMP**: Added to Traefik reverse proxy
- LXC 124 on elantris (10.4.2.26:8080)
- Domain: amp.kavcorp.com
- Traefik config: `/etc/traefik/conf.d/amp.yaml`
- Purpose: Game server management via CubeCoders AMP
## 2025-11-19
### Service Changes
- **LXC 123 (elantris)**: Migrated from Ollama to llama.cpp
- Removed Ollama installation and service
- Built llama.cpp from source with CURL support
- Downloaded TinyLlama 1.1B Q4_K_M model (~667MB)
- Created systemd service for llama.cpp server
- Server running on port 11434 (OpenAI-compatible API)
- Model path: `/opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf`
- Service: `llama-cpp.service`
- Domain remains: ollama.kavcorp.com (pointing to llama.cpp now)
- **LXC 124 (elantris)**: Created new AMP (Application Management Panel) container
- IP: 10.4.2.26
- Resources: 4 CPU cores, 4GB RAM, 16GB storage
- Storage: local-lvm on elantris
- OS: Ubuntu 24.04 LTS
- Purpose: Game server management via CubeCoders AMP
- Tagged: gaming, amp
## 2025-11-17
### Service Additions
- **Ollama**: Added to Traefik reverse proxy
- LXC 123 on elantris
- IP: 10.4.2.224:11434
- Domain: ollama.kavcorp.com
- Traefik config: `/etc/traefik/conf.d/ollama.yaml`
- Downloaded Qwen 3 Coder 30B model
- **Frigate**: Added to Traefik reverse proxy
- LXC 111 on pm3
- IP: 10.4.2.215:5000
- Domain: frigate.kavcorp.com
- Traefik config: `/etc/traefik/conf.d/frigate.yaml`
- **Foundry VTT**: Added to Traefik reverse proxy
- LXC 112 on pm3
- IP: 10.4.2.37:30000
- Domain: vtt.kavcorp.com
- Traefik config: `/etc/traefik/conf.d/foundry.yaml`
### Infrastructure Changes
- **SSH Access**: Regenerated SSH keys on pm2 and distributed to all cluster nodes
- pm3 SSH service was down, enabled and configured
- All nodes (pm1, pm2, pm3, pm4, elantris) now accessible from pm2 via Proxmox web UI
### Service Configuration
- **NZBGet**: Fixed file permissions
- Set `UMask=0000` in nzbget.conf to create files with 777 permissions
- Fixed permission issues causing Sonarr import failures
- **Sonarr**: Enabled automatic permission setting
- Media Management → Set Permissions → chmod 777
- Ensures imported files are accessible by Jellyfin
- **Jellyseerr**: Fixed Traefik routing
- Corrected IP from 10.4.2.20 to 10.4.2.18 in media-services.yaml
- **Jellyfin**: Fixed LXC mount issues
- Restarted LXC 121 to activate media mounts
- Media now visible in `/media/tv`, `/media/movies`, `/media/anime`
### Documentation
- **Major Reorganization**: Consolidated scattered docs into structured system
- Created `README.md` - Documentation index and guide
- Created `INFRASTRUCTURE.md` - All infrastructure details
- Created `CONFIGURATIONS.md` - Service configurations
- Created `DECISIONS.md` - Architecture decisions and patterns
- Created `TASKS.md` - Current and pending tasks
- Created `CHANGELOG.md` - This file
- Updated `CLAUDE.md` - Added documentation policy
## 2025-11-16
### Service Deployments
- **Home Assistant**: Added to Traefik reverse proxy
- Domain: hass.kavcorp.com
- Configured trusted proxies in Home Assistant
- **Frigate**: Added to Traefik reverse proxy
- Domain: frigate.kavcorp.com
- **Proxmox**: Added to Traefik reverse proxy
- Domain: pm.kavcorp.com
- Backend: pm2 (10.4.2.6:8006)
- **Recyclarr**: Configured TRaSH Guides automation
- Sonarr and Radarr quality profiles synced
- Dolby Vision blocking implemented
- Daily sync schedule via cron
### Configuration Changes
- **Traefik**: Removed Authelia from *arr services
- Services now use only built-in authentication
- Simplified access for Sonarr, Radarr, Prowlarr, Bazarr, Whisparr, NZBGet
### Issues Encountered
- Media organization script moved files incorrectly
- Sonarr database corruption (lost TV series tracking)
- Permission issues with NZBGet downloads
- Jellyfin LXC mount not active after deployment
### Lessons Learned
- Always verify file permissions (777 required for NFS media)
- Backup service databases before running automation scripts
- LXC mounts may need container restart to activate
- Traefik auto-reloads configs, no restart needed
## Earlier History
*To be documented from previous sessions if needed*

docs/CONFIGURATIONS.md Normal file

@@ -0,0 +1,312 @@
# Configuration Reference
> **Purpose**: Detailed configuration for all services - copy/paste ready configs and settings
> **Update Frequency**: When service configurations change
## Traefik
### SSL/TLS with Let's Encrypt
**Location**: LXC 104 on pm2
**Environment Variables** (`/etc/systemd/system/traefik.service.d/override.conf`):
```bash
NAMECHEAP_API_USER=kavren
NAMECHEAP_API_KEY=8156f3d9ef664c91b95f029dfbb62ad5
NAMECHEAP_PROPAGATION_TIMEOUT=3600
NAMECHEAP_POLLING_INTERVAL=30
NAMECHEAP_TTL=300
```
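In a systemd drop-in these values are normally wrapped in `Environment=` directives under `[Service]`; a sketch of what the override file likely contains (exact quoting is an assumption):
```bash
# Run inside LXC 104: recreate the drop-in and restart Traefik
mkdir -p /etc/systemd/system/traefik.service.d
cat > /etc/systemd/system/traefik.service.d/override.conf <<'EOF'
[Service]
Environment="NAMECHEAP_API_USER=kavren"
Environment="NAMECHEAP_API_KEY=8156f3d9ef664c91b95f029dfbb62ad5"
Environment="NAMECHEAP_PROPAGATION_TIMEOUT=3600"
Environment="NAMECHEAP_POLLING_INTERVAL=30"
Environment="NAMECHEAP_TTL=300"
EOF
systemctl daemon-reload && systemctl restart traefik
```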
**Main Config** (`/etc/traefik/traefik.yaml`):
```yaml
certificatesResolvers:
  letsencrypt:
    acme:
      email: cory.bailey87@gmail.com
      storage: /etc/traefik/ssl/acme.json
      dnsChallenge:
        provider: namecheap
        resolvers:
          - "1.1.1.1:53"
          - "8.8.8.8:53"
```
### Service Routing Examples
**Home Assistant** (`/etc/traefik/conf.d/home-automation.yaml`):
```yaml
http:
  routers:
    homeassistant:
      rule: "Host(`hass.kavcorp.com`)"
      entryPoints:
        - websecure
      service: homeassistant
      tls:
        certResolver: letsencrypt
  services:
    homeassistant:
      loadBalancer:
        servers:
          - url: "http://10.4.2.62:8123"
```
**Ollama** (`/etc/traefik/conf.d/ollama.yaml`):
```yaml
http:
  routers:
    ollama:
      rule: "Host(`ollama.kavcorp.com`)"
      entryPoints:
        - websecure
      service: ollama
      tls:
        certResolver: letsencrypt
  services:
    ollama:
      loadBalancer:
        servers:
          - url: "http://10.4.2.224:11434"
```
**Frigate** (`/etc/traefik/conf.d/frigate.yaml`):
```yaml
http:
  routers:
    frigate:
      rule: "Host(`frigate.kavcorp.com`)"
      entryPoints:
        - websecure
      service: frigate
      tls:
        certResolver: letsencrypt
  services:
    frigate:
      loadBalancer:
        servers:
          - url: "http://10.4.2.215:5000"
```
**Foundry VTT** (`/etc/traefik/conf.d/foundry.yaml`):
```yaml
http:
  routers:
    foundry:
      rule: "Host(`vtt.kavcorp.com`)"
      entryPoints:
        - websecure
      service: foundry
      tls:
        certResolver: letsencrypt
  services:
    foundry:
      loadBalancer:
        servers:
          - url: "http://10.4.2.37:30000"
```
**Proxmox** (`/etc/traefik/conf.d/proxmox.yaml`):
```yaml
http:
  routers:
    proxmox:
      rule: "Host(`pm.kavcorp.com`)"
      entryPoints:
        - websecure
      service: proxmox
      tls:
        certResolver: letsencrypt
  services:
    proxmox:
      loadBalancer:
        servers:
          - url: "https://10.4.2.6:8006"
        serversTransport: proxmox-transport
  serversTransports:
    proxmox-transport:
      insecureSkipVerify: true
```
## AMP (Application Management Panel)
**Location**: LXC 124 on elantris
**IP**: 10.4.2.26:8080
**Domain**: amp.kavcorp.com
**Traefik Config** (`/etc/traefik/conf.d/amp.yaml`):
```yaml
http:
  routers:
    amp:
      rule: "Host(`amp.kavcorp.com`)"
      entryPoints:
        - websecure
      service: amp
      tls:
        certResolver: letsencrypt
  services:
    amp:
      loadBalancer:
        servers:
          - url: "http://10.4.2.26:8080"
```
## Home Assistant
**Location**: VM 100 on pm1
**IP**: 10.4.2.62:8123
**Reverse Proxy Config** (`/config/configuration.yaml`):
```yaml
http:
  use_x_forwarded_for: true
  trusted_proxies:
    - 10.4.2.10      # Traefik IP
    - 172.30.0.0/16  # Home Assistant internal network (for add-ons)
```
## Sonarr
**Location**: LXC 105 on pm2
**IP**: 10.4.2.15:8989
**API Key**: b331fe18ec2144148a41645d9ce8b249
**Media Management Settings**:
- Permissions: Enabled, chmod 777
- Hardlinks: Enabled
- Episode title required: Always
- Free space check: 100MB minimum
## Radarr
**Location**: LXC 108
**IP**: 10.4.2.16:7878
**API Key**: 5e6796988abf4d6d819a2b506a44f422
## NZBGet
**Location**: Docker on kavnas (10.4.2.13)
**Port**: 6789
**Web User**: kavren
**Web Password**: fre8ub2ax8
**Key Settings** (`/volume1/docker/nzbget/config/nzbget.conf`):
```ini
MainDir=/config
DestDir=/downloads/completed
InterDir=/downloads/intermediate
UMask=0000 # Creates files with 777 permissions
```
**Docker Mounts**:
- Config: `/volume1/docker/nzbget/config:/config`
- Downloads: `/volume1/Media/downloads:/downloads`
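A sketch of how this container is typically launched on the Synology (the `lscr.io/linuxserver/nzbget` image and the PUID/PGID environment variables are assumptions; the port and mounts are the documented ones):
```bash
# Run on kavnas (10.4.2.13); image name and PUID/PGID are assumptions
docker run -d --name nzbget \
  -p 6789:6789 \
  -v /volume1/docker/nzbget/config:/config \
  -v /volume1/Media/downloads:/downloads \
  -e PUID=1000 -e PGID=1000 \
  --restart unless-stopped \
  lscr.io/linuxserver/nzbget:latest
```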
## Recyclarr
**Location**: LXC 122 on pm2
**IP**: 10.4.2.25
**Binary**: `/usr/local/bin/recyclarr`
**Config**: `/root/.config/recyclarr/recyclarr.yml`
**Sync Schedule**: Daily at 3 AM via cron
**Configured Profiles**:
- **Radarr**: HD Bluray + WEB (1080p), Remux-1080p - Anime
- **Sonarr**: WEB-1080p, Remux-1080p - Anime
- **Custom Formats**: TRaSH Guides synced (Dolby Vision blocked, release group tiers)
## Jellyfin
**Location**: LXC 121 on elantris
**IP**: 10.4.2.21:8096
**Media Mounts** (inside LXC):
- `/media/tv` → `/el-pool/media/tv`
- `/media/anime` → `/el-pool/media/anime`
- `/media/movies` → `/el-pool/media/movies`
**Permissions**: Files must be 777 for Jellyfin user (UID 100107 in LXC) to access
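To confirm the mounts and permissions from the Proxmox side, something like the following should work (the `mp*` key names depend on how the container config was written):
```bash
# On elantris: list the bind mounts configured for the Jellyfin container
pct config 121 | grep '^mp'

# Verify the libraries are visible from inside the container
pct exec 121 -- ls -la /media/tv /media/movies /media/anime

# Fix permissions on the shared export if anything is unreadable
chmod -R 777 /mnt/pve/elantris-media/tv
```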
## Vaultwarden
**Location**: LXC 125 on pm4
**IP**: 10.4.2.212:80
**Domain**: vtw.kavcorp.com
**Traefik Config** (`/etc/traefik/conf.d/vaultwarden.yaml`):
```yaml
http:
  routers:
    vaultwarden:
      rule: "Host(`vtw.kavcorp.com`)"
      entryPoints:
        - websecure
      service: vaultwarden
      tls:
        certResolver: letsencrypt
  services:
    vaultwarden:
      loadBalancer:
        servers:
          - url: "http://10.4.2.212:80"
```
## Immich
**Location**: LXC 126 on pm4
**IP**: 10.4.2.24:2283
**Domain**: immich.kavcorp.com
**Config** (`/opt/immich/.env`):
```bash
TZ=America/Indiana/Indianapolis
IMMICH_VERSION=release
NODE_ENV=production
DB_HOSTNAME=127.0.0.1
DB_USERNAME=immich
DB_PASSWORD=AulF5JhgWXrRxtaV05
DB_DATABASE_NAME=immich
DB_VECTOR_EXTENSION=pgvector
REDIS_HOSTNAME=127.0.0.1
IMMICH_MACHINE_LEARNING_URL=http://127.0.0.1:3003
MACHINE_LEARNING_CACHE_FOLDER=/opt/immich/cache
IMMICH_MEDIA_LOCATION=/mnt/immich-library
```
**NFS Mount** (configured via `pct set 126 -mp0`):
- Host path: `/mnt/pve/elantris-downloads/immich`
- Container path: `/mnt/immich-library`
- Source: elantris (`/el-pool/downloads/immich/`)
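A sketch of the `pct set` call that produces this mapping, using the paths above (run on pm4; mount points take effect when the container starts):
```bash
# Bind the elantris NFS export into the Immich container as its library path
pct set 126 -mp0 /mnt/pve/elantris-downloads/immich,mp=/mnt/immich-library
pct reboot 126   # apply the new mount point
```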
**Systemd Services**:
- `immich-web.service` - Web UI and API
- `immich-ml.service` - Machine learning service
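To check or restart them from the node, the same `pct exec` pattern used elsewhere in these docs should apply (service names as listed above):
```bash
# On pm4: check both Immich services inside LXC 126
pct exec 126 -- systemctl status immich-web.service immich-ml.service --no-pager

# Restart after editing /opt/immich/.env
pct exec 126 -- systemctl restart immich-web.service immich-ml.service
```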
**Traefik Config** (`/etc/traefik/conf.d/immich.yaml`):
```yaml
http:
  routers:
    immich:
      rule: "Host(`immich.kavcorp.com`)"
      entryPoints:
        - websecure
      service: immich
      tls:
        certResolver: letsencrypt
  services:
    immich:
      loadBalancer:
        servers:
          - url: "http://10.4.2.24:2283"
```

docs/DECISIONS.md Normal file

@@ -0,0 +1,163 @@
# Architecture Decisions & Patterns
> **Purpose**: Record of important decisions, patterns, and "why we do it this way"
> **Update Frequency**: When making significant architectural choices
## Service Organization
### Authentication Strategy
**Decision**: Services use their own built-in authentication, not Authelia
**Reason**: Most *arr services and media tools have robust auth systems
**Exception**: Consider Authelia for future services that lack authentication
### LXC vs Docker
**Keep in Docker**:
- NZBGet (requires specific volume mapping, works well in Docker)
- Multi-container stacks
- Services requiring Docker-specific features
**Migrate to LXC**:
- Single-purpose services (Sonarr, Radarr, etc.)
- Services benefiting from isolation
- Stateless applications
## File Permissions
### Media Files
**Standard**: All media files and folders must be 777
**Reason**:
- NFS mounts between multiple systems with different UID mappings
- Jellyfin runs in LXC with UID namespace mapping (100107)
- Sonarr runs in LXC with different UID mapping
- NZBGet runs in Docker with UID 1000
**Implementation**:
- NZBGet: `UMask=0000` to create files with 777
- Sonarr: Media management → Set permissions → chmod 777
- Manual fixes: `chmod -R 777` on media directories as needed
## Network Architecture
### Reverse Proxy
**Decision**: Single Traefik instance handles all external access
**Location**: LXC 104 on pm2
**Benefits**:
- Single point for SSL/TLS management
- Automatic Let's Encrypt certificate renewal
- Centralized routing configuration
- DNS-01 challenge for wildcard certificates
### Service Domains
**Pattern**: `<service>.kavcorp.com`
**DNS**: All subdomains point to public IP (99.74.188.161)
**Routing**: Traefik inspects Host header and routes internally
## Storage Architecture
### Media Storage
**Decision**: NFS mount from elantris for all media
**Path**: `/mnt/pve/elantris-media` → elantris `/el-pool/media`
**Reason**:
- Centralized storage
- Accessible from all cluster nodes
- Large capacity (24TB ZFS pool)
- Easy to backup/snapshot
### LXC Root Filesystems
**Decision**: Store on KavNas NFS for most services
**Reason**:
- Easy backups
- Portable between nodes
- Network storage sufficient for most workloads
**Exception**: High I/O services use local-lvm
## Monitoring & Maintenance
### Configuration Management
**Decision**: Manual configuration with documentation
**Reason**: Small scale doesn't justify Ansible/Terraform complexity
**Trade-off**: Requires disciplined documentation updates
### Backup Strategy
**Decision**: Proxmox built-in backup to KavNas
**Frequency**: [To be determined]
**Retention**: [To be determined]
## Common Patterns
### Adding a New Service Behind Traefik
1. Deploy service with static IP in 10.4.2.0/24 range
2. Create Traefik config in `/etc/traefik/conf.d/<service>.yaml`
3. Use pattern:
```yaml
http:
  routers:
    <service>:
      rule: "Host(`<service>.kavcorp.com`)"
      entryPoints: [websecure]
      service: <service>
      tls:
        certResolver: letsencrypt
  services:
    <service>:
      loadBalancer:
        servers:
          - url: "http://<ip>:<port>"
```
4. Traefik auto-reloads (no restart needed)
5. Update `docs/INFRASTRUCTURE.md` with service details
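Putting steps 2–4 together, a minimal sketch (Tautulli and the IP/port used here are hypothetical placeholders):
```bash
# Run on pm2: drop a new router/service file into the Traefik LXC (104)
pct exec 104 -- bash -c 'cat > /etc/traefik/conf.d/tautulli.yaml <<"EOF"
http:
  routers:
    tautulli:
      rule: "Host(`tautulli.kavcorp.com`)"
      entryPoints: [websecure]
      service: tautulli
      tls:
        certResolver: letsencrypt
  services:
    tautulli:
      loadBalancer:
        servers:
          - url: "http://10.4.2.30:8181"
EOF'

# Watch Traefik pick up the new file (no restart required)
pct exec 104 -- tail -n 20 /var/log/traefik/traefik.log
```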
### Troubleshooting Permission Issues
1. Check file ownership: `ls -la /path/to/file`
2. Check if 777: `stat /path/to/file`
3. Fix permissions: `chmod -R 777 /path/to/directory`
4. For NZBGet: Verify `UMask=0000` in nzbget.conf
5. For Sonarr/Radarr: Check Settings → Media Management → Set Permissions
### Node SSH Access
**From local machine**:
- User: `kavren`
- Key: `~/.ssh/id_ed25519`
**Between cluster nodes**:
- User: `root`
- Each node has other nodes' keys in `/root/.ssh/authorized_keys`
- Proxmox web UI uses node SSH for shell access
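A matching `~/.ssh/config` entry on the local machine would look roughly like this (assumes the node hostnames already resolve; otherwise add per-host `HostName` lines with the IPs from `INFRASTRUCTURE.md`):
```bash
# Append cluster node entries to the local SSH config (sketch)
cat >> ~/.ssh/config <<'EOF'
Host pm1 pm2 pm3 pm4 elantris
    User kavren
    IdentityFile ~/.ssh/id_ed25519
EOF
```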
## Known Issues & Workarounds
### Jellyfin Not Seeing Media After Import
**Symptom**: Files imported to `/media/tv` but Jellyfin shows empty
**Cause**: Jellyfin LXC mount not active or permissions wrong
**Fix**:
1. Restart Jellyfin LXC: `pct stop 121 && pct start 121`
2. Verify mount inside LXC: `pct exec 121 -- ls -la /media/tv/`
3. Fix permissions if needed: `chmod -R 777 /mnt/pve/elantris-media/tv/`
### Sonarr/Radarr Import Failures
**Symptom**: "Access denied" errors in logs
**Cause**: Permission mismatch between download client and *arr service
**Fix**: Ensure download folder has 777 permissions
## Future Considerations
- [ ] Automated backup strategy
- [ ] Monitoring/alerting system (Prometheus + Grafana?)
- [ ] Consider Authelia for future services without built-in auth
- [ ] Document disaster recovery procedures
- [ ] Consider consolidating Docker hosts

docs/INFRASTRUCTURE.md Normal file

@@ -0,0 +1,120 @@
# Infrastructure Reference
> **Purpose**: Single source of truth for all infrastructure details - nodes, IPs, services, storage, network
> **Update Frequency**: Immediately when infrastructure changes
## Proxmox Cluster Nodes
| Hostname | IP Address | Role | Resources |
|----------|-------------|------|-----------|
| pm1 | 10.4.2.2 | Proxmox cluster node | - |
| pm2 | 10.4.2.6 | Proxmox cluster node (primary management) | - |
| pm3 | 10.4.2.3 | Proxmox cluster node | - |
| pm4 | 10.4.2.5 | Proxmox cluster node | - |
| elantris | 10.4.2.14 | Proxmox cluster node (Debian-based) | 128GB RAM, ZFS storage (24TB) |
**Cluster Name**: KavCorp
**Network**: 10.4.2.0/24
**Gateway**: 10.4.2.254
## Service Map
| Service | IP:Port | Location | Domain | Auth |
|---------|---------|----------|--------|------|
| **Proxmox Web UI** | 10.4.2.6:8006 | pm2 | pm.kavcorp.com | Proxmox built-in |
| **Traefik** | 10.4.2.10 | LXC 104 (pm2) | - | None (reverse proxy) |
| **Authelia** | 10.4.2.19 | LXC 116 (pm2) | auth.kavcorp.com | SSO provider |
| **Sonarr** | 10.4.2.15:8989 | LXC 105 (pm2) | sonarr.kavcorp.com | Built-in |
| **Radarr** | 10.4.2.16:7878 | LXC 108 (pm2) | radarr.kavcorp.com | Built-in |
| **Prowlarr** | 10.4.2.17:9696 | LXC 114 (pm2) | prowlarr.kavcorp.com | Built-in |
| **Jellyseerr** | 10.4.2.18:5055 | LXC 115 (pm2) | jellyseerr.kavcorp.com | Built-in |
| **Whisparr** | 10.4.2.20:6969 | LXC 117 (pm2) | whisparr.kavcorp.com | Built-in |
| **Notifiarr** | 10.4.2.21 | LXC 118 (pm2) | - | API key |
| **Jellyfin** | 10.4.2.21:8096 | LXC 121 (elantris) | jellyfin.kavcorp.com | Built-in |
| **Bazarr** | 10.4.2.22:6767 | LXC 119 (pm2) | bazarr.kavcorp.com | Built-in |
| **Kometa** | 10.4.2.23 | LXC 120 (pm2) | - | N/A |
| **Recyclarr** | 10.4.2.25 | LXC 122 (pm2) | - | CLI only |
| **NZBGet** | 10.4.2.13:6789 | Docker (kavnas) | nzbget.kavcorp.com | Built-in |
| **Home Assistant** | 10.4.2.62:8123 | VM 100 (pm1) | hass.kavcorp.com | Built-in |
| **Frigate** | 10.4.2.215:5000 | LXC 111 (pm3) | frigate.kavcorp.com | Built-in |
| **Foundry VTT** | 10.4.2.37:30000 | LXC 112 (pm3) | vtt.kavcorp.com | Built-in |
| **llama.cpp** | 10.4.2.224:11434 | LXC 123 (elantris) | ollama.kavcorp.com | None (API) |
| **AMP** | 10.4.2.26:8080 | LXC 124 (elantris) | amp.kavcorp.com | Built-in |
| **Vaultwarden** | 10.4.2.212 | LXC 125 (pm4) | vtw.kavcorp.com | Built-in |
| **Immich** | 10.4.2.24:2283 | LXC 126 (pm4) | immich.kavcorp.com | Built-in |
| **KavNas** | 10.4.2.13 | Synology NAS | - | NAS auth |
## Storage Architecture
### NFS Mounts (Shared)
| Mount Name | Source | Mount Point | Size | Usage |
|------------|--------|-------------|------|-------|
| elantris-media | elantris:/el-pool/media | /mnt/pve/elantris-media | ~24TB | Media files (movies, TV, anime) |
| KavNas | kavnas (10.4.2.13):/volume1 | /mnt/pve/KavNas | ~23TB | Backups, ISOs, LXC storage, downloads |
### Local Storage (Per-Node)
| Storage | Type | Size | Usage |
|---------|------|------|-------|
| local | Directory | ~100GB | Backups, templates, ISOs |
| local-lvm | LVM thin pool | ~350-375GB | VM/LXC disks |
### ZFS Pools
| Pool | Location | Size | Usage |
|------|----------|------|-------|
| el-pool | elantris | 24TB | Large data storage |
### Media Folders
| Path | Type | Permissions | Notes |
|------|------|-------------|-------|
| /mnt/pve/elantris-media/movies | NFS | 777 | Movie library |
| /mnt/pve/elantris-media/tv | NFS | 777 | TV show library |
| /mnt/pve/elantris-media/anime | NFS | 777 | Anime library |
| /mnt/pve/elantris-media/processing | NFS | 777 | Processing/cleanup folder |
| /mnt/pve/KavNas/downloads | NFS | 777 | Download client output |
## Network Configuration
### DNS & Domains
**Domain**: kavcorp.com
**DNS Provider**: Namecheap
**Public IP**: 99.74.188.161
All `*.kavcorp.com` subdomains route through Traefik reverse proxy (10.4.2.10) for SSL termination and routing.
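A quick way to confirm resolution before debugging anything at the Traefik layer (any subdomain works as a probe):
```bash
# Both should return the public IP (99.74.188.161)
dig +short hass.kavcorp.com
dig +short immich.kavcorp.com
```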
### Standard Bridge
**Bridge**: vmbr0
**Physical Interface**: eno1
**CIDR**: 10.4.2.0/24
**Gateway**: 10.4.2.254
## Access & Credentials
### SSH Access
- **User**: kavren (from local machine)
- **User**: root (between cluster nodes)
- **Key Type**: ed25519
- **Node-to-Node**: Passwordless SSH configured for cluster operations
### Important Paths
**Traefik (LXC 104)**:
- Config: `/etc/traefik/traefik.yaml`
- Service configs: `/etc/traefik/conf.d/*.yaml`
- SSL certs: `/etc/traefik/ssl/acme.json`
- Service file: `/etc/systemd/system/traefik.service.d/override.conf`
**Media Services**:
- Sonarr config: `/var/lib/sonarr/`
- Radarr config: `/var/lib/radarr/`
- Recyclarr config: `/root/.config/recyclarr/recyclarr.yml`
**NZBGet (Docker on kavnas)**:
- Config: `/volume1/docker/nzbget/config/nzbget.conf`
- Downloads: `/volume1/Media/downloads/`

docs/README.md Normal file

@@ -0,0 +1,145 @@
# Documentation Index
> **Last Updated**: 2025-11-17 (Added Frigate and Foundry VTT to Traefik)
> **IMPORTANT**: Update this index whenever you modify documentation files
## Quick Reference
Need to know... | Check this file
--- | ---
Node IPs, service locations, storage paths | `INFRASTRUCTURE.md`
Service configs, API keys, copy/paste configs | `CONFIGURATIONS.md`
Why we made a decision, common patterns | `DECISIONS.md`
What's currently being worked on | `TASKS.md`
Recent changes and when they happened | `CHANGELOG.md`
## Core Documentation Files
### INFRASTRUCTURE.md
**Purpose**: Single source of truth for all infrastructure
**Contains**:
- Cluster node IPs and specs
- Complete service map with IPs, ports, domains
- Storage architecture (NFS mounts, local storage, ZFS)
- Network configuration
- Important file paths
**Update when**: Infrastructure changes (new service, IP change, storage mount)
---
### CONFIGURATIONS.md
**Purpose**: Detailed service configurations
**Contains**:
- Traefik SSL/TLS setup
- Service routing examples
- API keys and credentials
- Copy/paste ready config snippets
- Service-specific settings
**Update when**: Service configuration changes, API keys rotate, new services added
---
### DECISIONS.md
**Purpose**: Architecture decisions and patterns
**Contains**:
- Why we chose LXC vs Docker for services
- Authentication strategy
- File permission standards (777 for media)
- Common troubleshooting patterns
- Known issues and workarounds
**Update when**: Making architectural decisions, discovering new patterns, solving recurring issues
---
### TASKS.md
**Purpose**: Track ongoing work and TODO items
**Contains**:
- Active tasks being worked on
- Pending tasks
- Blocked items
- Task priority
**Update when**: Starting new work, completing tasks, discovering new work
---
### CHANGELOG.md
**Purpose**: Historical record of changes
**Contains**:
- Date-stamped entries for all significant changes
- Who made the change (user/Claude)
- What was changed and why
- Links to relevant commits or files
**Update when**: After completing any significant work
---
## Legacy Files (To Be Removed)
These files will be consolidated into the core docs above:
- ~~`infrastructure-map.md`~~ → Merged into `INFRASTRUCTURE.md`
- ~~`home-assistant-traefik.md`~~ → Merged into `CONFIGURATIONS.md`
- ~~`traefik-ssl-setup.md`~~ → Merged into `CONFIGURATIONS.md`
- ~~`recyclarr-setup.md`~~ → Merged into `CONFIGURATIONS.md`
Keep for reference (detailed info):
- `cluster-state.md` - Detailed cluster topology
- `inventory.md` - Complete VM/LXC inventory
- `network.md` - Detailed network info
- `storage.md` - Detailed storage info
- `services.md` - Service dependencies and details
## Documentation Workflow
### When Making Changes
1. **Before starting**: Check `INFRASTRUCTURE.md` for current state
2. **During work**: Note what you're changing
3. **After completing**:
- Update relevant core doc (`INFRASTRUCTURE.md`, `CONFIGURATIONS.md`, or `DECISIONS.md`)
- Add entry to `CHANGELOG.md` with date and description
- Update `TASKS.md` to mark work complete
- Update `README.md` (this file) Last Updated date
### Example Workflow
```
Task: Add new service "Tautulli" to monitor Jellyfin
1. Check INFRASTRUCTURE.md → Find next available IP
2. Deploy service
3. Update INFRASTRUCTURE.md → Add Tautulli to service map
4. Update CONFIGURATIONS.md → Add Tautulli config snippet
5. Update CHANGELOG.md → "2025-11-17: Added Tautulli LXC..."
6. Update TASKS.md → Mark "Deploy Tautulli" as complete
7. Update README.md → Change Last Updated date
```
## File Organization
```
docs/
├── README.md ← You are here (index and guide)
├── INFRASTRUCTURE.md ← Infrastructure reference
├── CONFIGURATIONS.md ← Service configurations
├── DECISIONS.md ← Architecture decisions
├── TASKS.md ← Current/ongoing tasks
├── CHANGELOG.md ← Historical changes
├── cluster-state.md ← [Keep] Detailed topology
├── inventory.md ← [Keep] Full VM/LXC list
├── network.md ← [Keep] Network details
├── storage.md ← [Keep] Storage details
└── services.md ← [Keep] Service details
```
## Maintenance
- Review and update docs weekly
- Clean up completed tasks monthly
- Archive old changelog entries yearly
- Verify INFRASTRUCTURE.md matches reality regularly

docs/TASKS.md Normal file

@@ -0,0 +1,37 @@
# Current Tasks
> **Last Updated**: 2025-11-17
## In Progress
None currently.
## Pending
### Media Organization
- [ ] Verify Jellyfin can see all imported media
- [ ] Clean up `.processing-loose-episodes` folder
- [ ] Review and potentially restore TV shows from processing folder
### Configuration
- [ ] Consider custom format to prefer English audio releases
- [ ] Review Sonarr language profiles for non-English releases
### Infrastructure
- [ ] Define backup strategy and schedule
- [ ] Set up monitoring/alerting system
- [ ] Document disaster recovery procedures
## Completed (Recent)
- [x] Fixed SSH access between cluster nodes (pm2 can access all nodes)
- [x] Fixed NZBGet permissions (UMask=0000 for 777 files)
- [x] Fixed Sonarr permissions (chmod 777 on imports)
- [x] Fixed Jellyfin LXC mounts (restarted LXC)
- [x] Fixed Jellyseerr IP in Traefik config
- [x] Consolidated documentation structure
- [x] Created documentation index
## Blocked
None currently.

docs/cluster-state.md Normal file

@@ -0,0 +1,115 @@
# KavCorp Proxmox Cluster State
**Last Updated**: 2025-11-16
## Cluster Overview
- **Cluster Name**: KavCorp
- **Config Version**: 6
- **Transport**: knet
- **Quorum Status**: Quorate (5/5 nodes online)
- **Total Nodes**: 5
- **Total VMs**: 2
- **Total LXCs**: 19
## Node Details
### pm1 (10.4.2.2)
- **CPU**: 4 cores
- **Memory**: 16GB (15.4 GiB)
- **Storage**: ~100GB local
- **Uptime**: ~52 hours
- **Status**: Online
- **Running Containers**:
- VMID 100: haos12.1 (VM - Home Assistant OS)
- VMID 101: twingate (LXC)
- VMID 102: zwave-js-ui (LXC)
### pm2 (10.4.2.6) - Primary Management Node
- **CPU**: 12 cores
- **Memory**: 31GB (29.3 GiB)
- **Storage**: ~100GB local
- **Uptime**: ~52 hours
- **Status**: Online
- **Running Containers**:
- VMID 104: traefik (LXC - Reverse Proxy)
- VMID 105: sonarr (LXC)
- VMID 108: radarr (LXC)
- VMID 113: docker-pm2 (LXC - Docker host)
- VMID 114: prowlarr (LXC)
- VMID 115: jellyseerr (LXC)
- VMID 116: authelia (LXC)
- VMID 117: whisparr (LXC)
- VMID 118: notifiarr (LXC)
- VMID 119: bazarr (LXC)
- VMID 120: kometa (LXC)
### pm3 (10.4.2.3)
- **CPU**: 16 cores
- **Memory**: 33GB (30.7 GiB)
- **Storage**: ~100GB local
- **Uptime**: ~319 hours (~13 days)
- **Status**: Online
- **Running Containers**:
- VMID 106: mqtt (LXC)
- VMID 107: dockge (LXC - Docker management UI, 12 CPU, 8GB RAM)
- VMID 109: docker-pm3 (VM - Docker host, 4 CPU, 12GB RAM)
- VMID 111: frigate (LXC - NVR)
- VMID 112: foundryvtt (LXC - Virtual tabletop)
### pm4 (10.4.2.5)
- **CPU**: 12 cores
- **Memory**: 31GB (29.3 GiB)
- **Storage**: ~100GB local
- **Uptime**: ~52 hours
- **Status**: Online
- **Running Containers**:
- VMID 103: shinobi (LXC - NVR)
- VMID 110: docker-pm4 (LXC - Docker host)
### elantris (10.4.2.14) - Storage Node
- **CPU**: 16 cores
- **Memory**: 128GB (125.7 GiB) - **Largest node**
- **Storage**: ~100GB local + 24TB ZFS pool (el-pool)
- **Uptime**: ~26 minutes (recently rebooted)
- **Status**: Online
- **Running Containers**:
- VMID 121: jellyfin (LXC - Media server)
## Cluster Health
- **Quorum**: Yes (3/5 required, 5/5 available)
- **Expected Votes**: 5
- **Total Votes**: 5
- **All Nodes**: Online and healthy
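These figures come from corosync; to re-check them on any node, the standard Proxmox commands are:
```bash
# Quorum state, vote counts, and ring status
pvecm status

# Per-node membership list
pvecm nodes
```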
## Network Architecture
- **Primary Network**: 10.4.2.0/24
- **Gateway**: 10.4.2.254
- **Bridge**: vmbr0 (on all nodes, bridged to eno1)
- **DNS**: Managed by gateway/router
## Storage Summary
### Shared Storage
- **KavNas** (NFS): 23TB total, ~9.2TB used - Primary shared storage from Synology DS918+
- **elantris-downloads** (NFS): 23TB total, ~10.6TB used - Download storage from elantris
### Node-Local Storage
Each node has:
- **local**: ~100GB directory storage (backups, templates, ISOs)
- **local-lvm**: ~350-375GB LVM thin pool (VM/LXC disks)
### ZFS Storage
- **el-pool** (elantris only): 24TB ZFS pool, ~13.8TB used
## Migration Status
Currently migrating services from Docker containers to dedicated LXCs. Most media stack services (Sonarr, Radarr, etc.) have been successfully migrated to LXCs on pm2.
**Active Docker Hosts**:
- docker-pm2 (LXC 113): Currently empty/minimal usage
- docker-pm3 (VM 109): Active, running containerized services
- docker-pm4 (LXC 110): Active
- dockge (LXC 107): Docker management UI with web interface

docs/home-assistant-traefik.md Normal file

@@ -0,0 +1,304 @@
# Home Assistant + Traefik Configuration
**Last Updated**: 2025-11-16
## Overview
Home Assistant is configured to work behind Traefik as a reverse proxy, accessible via `https://hass.kavcorp.com`.
## Configuration Details
### Home Assistant
- **VMID**: 100
- **Node**: pm1
- **Type**: QEMU VM (Home Assistant OS)
- **Internal IP**: 10.4.2.62
- **Internal Port**: 8123
- **External URL**: https://hass.kavcorp.com
### Traefik Configuration
**Location**: `/etc/traefik/conf.d/home-automation.yaml` (inside Traefik LXC 104)
```yaml
http:
  routers:
    homeassistant:
      rule: "Host(`hass.kavcorp.com`)"
      entryPoints:
        - websecure
      service: homeassistant
      tls:
        certResolver: letsencrypt
      # Home Assistant has its own auth
  services:
    homeassistant:
      loadBalancer:
        servers:
          - url: "http://10.4.2.62:8123"
```
### Home Assistant Configuration
**File**: `/config/configuration.yaml` (inside Home Assistant VM)
Add or merge the following section:
```yaml
http:
  use_x_forwarded_for: true
  trusted_proxies:
    - 10.4.2.10      # Traefik IP
    - 172.30.0.0/16  # Home Assistant internal network (for add-ons)
```
#### Configuration Explanation:
- **`use_x_forwarded_for: true`**: Enables Home Assistant to read the real client IP from the `X-Forwarded-For` header that Traefik adds. This is important for:
- Accurate logging of client IPs
- IP-based authentication and blocking
- Geolocation features
- **`trusted_proxies`**: Whitelist of proxy IPs that Home Assistant will trust
- `10.4.2.10` - Traefik reverse proxy
- `172.30.0.0/16` - Home Assistant's internal Docker network (needed for add-ons to communicate)
## Setup Steps
### Method 1: Web UI (Recommended)
1. **Install File Editor Add-on** (if not already installed):
- Go to **Settings** → **Add-ons** → **Add-on Store**
- Search for "File editor"
- Click **Install**
2. **Edit Configuration**:
- Open the **File editor** add-on
- Navigate to `/config/configuration.yaml`
- Add the `http:` section shown above
- If an `http:` section already exists, merge the settings
- Save the file
3. **Check Configuration**:
- Go to **Developer Tools** → **YAML**
- Click **Check Configuration**
- Fix any errors if shown
4. **Restart Home Assistant**:
- Go to **Settings** → **System** → **Restart**
- Wait for Home Assistant to come back online
### Method 2: Terminal & SSH Add-on
If you have the **Terminal & SSH** add-on installed:
```bash
# Edit the configuration
nano /config/configuration.yaml
# Add the http section shown above
# Save with Ctrl+X, Y, Enter
# Check configuration
ha core check
# Restart Home Assistant
ha core restart
```
### Method 3: SSH to VM (Advanced)
If you have SSH access to the Home Assistant VM:
```bash
# SSH to pm1 first, then to the VM
ssh pm1
ssh root@10.4.2.62
# Edit configuration
vi /config/configuration.yaml
# Restart Home Assistant
ha core restart
```
## Verification
After configuration and restart:
1. **Test Internal Access**:
```bash
curl -I http://10.4.2.62:8123
```
Should return `HTTP/1.1 200 OK` or `405 Method Not Allowed`
2. **Test Traefik Proxy**:
```bash
curl -I https://hass.kavcorp.com
```
Should return `HTTP/2 200` with valid SSL certificate
3. **Check Logs**:
- In Home Assistant: **Settings** → **System** → **Logs**
- Look for any errors related to HTTP or trusted proxies
- Client IPs should now show actual client IPs, not Traefik's IP
4. **Verify Headers**:
- Open browser developer tools (F12)
- Go to **Network** tab
- Access `https://hass.kavcorp.com`
- Check response headers for `X-Forwarded-For`, `X-Forwarded-Proto`, etc.
## Troubleshooting
### 400 Bad Request / Untrusted Proxy
**Symptom**: Home Assistant returns 400 errors when accessing via Traefik
**Solution**: Verify the `trusted_proxies` configuration includes Traefik's IP (`10.4.2.10`)
```yaml
http:
  trusted_proxies:
    - 10.4.2.10
```
### Wrong Client IP in Logs
**Symptom**: All requests show Traefik's IP (10.4.2.10) instead of real client IP
**Solution**: Enable `use_x_forwarded_for`:
```yaml
http:
  use_x_forwarded_for: true
```
### Configuration Check Fails
**Symptom**: YAML validation fails with syntax errors
**Solution**:
- Ensure proper indentation (2 spaces per level, no tabs)
- Check for special characters that need quoting
- Use `ha core check` to see detailed error messages
### Cannot Access via Domain
**Symptom**: `https://hass.kavcorp.com` doesn't work but direct IP does
**Solution**:
1. Check Traefik logs:
```bash
ssh pm2 "pct exec 104 -- tail -f /var/log/traefik/traefik.log"
```
2. Verify DNS resolves correctly:
```bash
nslookup hass.kavcorp.com
```
3. Check Traefik config was loaded:
```bash
ssh pm2 "pct exec 104 -- cat /etc/traefik/conf.d/home-automation.yaml"
```
### SSL Certificate Issues
**Symptom**: Browser shows SSL certificate errors
**Solution**:
1. Check if Let's Encrypt certificate was generated:
```bash
ssh pm2 "pct exec 104 -- cat /etc/traefik/ssl/acme.json | grep hass"
```
2. Allow time for DNS propagation (up to 1 hour with Namecheap)
3. Check Traefik logs for ACME errors
## Security Considerations
### Authentication
- Home Assistant has its own authentication system
- No Authelia middleware is applied to this route
- Users must log in to Home Assistant directly
- Consider enabling **Multi-Factor Authentication** in Home Assistant:
- **Settings** → **People** → Your User → **Enable MFA**
### Trusted Networks
If you want to bypass authentication for local network access, add to `configuration.yaml`:
```yaml
homeassistant:
  auth_providers:
    - type: trusted_networks
      trusted_networks:
        - 10.4.2.0/24  # Local network
      allow_bypass_login: true
    - type: homeassistant
```
**Warning**: Only use this if your local network is secure!
### IP Banning
Home Assistant can automatically ban IPs after failed login attempts. Ensure `use_x_forwarded_for` is enabled so it bans the actual attacker's IP, not Traefik's IP.
## Related Services
### Frigate Integration
If Frigate is integrated with Home Assistant:
- Frigate is accessible via `https://frigate.kavcorp.com` (see separate Frigate documentation)
- Home Assistant can embed Frigate camera streams
- Both services trust Traefik as reverse proxy
### Add-ons and Internal Communication
Home Assistant add-ons communicate via the internal Docker network (`172.30.0.0/16`). This network must be in `trusted_proxies` for add-ons to work correctly when accessing the Home Assistant API.
## Updating Configuration
When making changes to Home Assistant configuration:
1. **Always check configuration** before restarting:
```bash
ha core check
```
2. **Back up configuration** before major changes:
- **Settings** → **System** → **Backups** → **Create Backup**
3. **Test changes** in a development environment if possible
4. **Monitor logs** after restarting for errors
## DNS Configuration
Ensure your DNS provider (Namecheap) has the correct A record:
```
hass.kavcorp.com → Your public IP (99.74.188.161)
```
Or use a wildcard record that covers every subdomain:
```
*.kavcorp.com → Your public IP
```
Traefik handles the Let's Encrypt DNS-01 challenge automatically.
## Additional Resources
- [Home Assistant Reverse Proxy Documentation](https://www.home-assistant.io/integrations/http/#reverse-proxies)
- [Traefik Documentation](https://doc.traefik.io/traefik/)
- [TRaSH Guides - Traefik Setup](https://trash-guides.info/Hardlinks/Examples/)
## Change Log
**2025-11-16**:
- Initial configuration created
- Added Home Assistant to Traefik
- Configured trusted proxies
- Set up `hass.kavcorp.com` domain

docs/infrastructure-map.md Normal file

@@ -0,0 +1,44 @@
# Infrastructure Map
## Proxmox Cluster Nodes
| Hostname | IP Address | Role |
|----------|-------------|------|
| pm1 | 10.4.2.2 | Proxmox cluster node |
| pm2 | 10.4.2.6 | Proxmox cluster node |
| pm3 | 10.4.2.3 | Proxmox cluster node |
| pm4 | 10.4.2.5 | Proxmox cluster node |
| elantris | 10.4.2.14 | Proxmox cluster node (Debian-based) |
## Key Services
| Service | IP:Port | Location | Notes |
|---------|---------|----------|-------|
| Sonarr | 10.4.2.15:8989 | LXC 105 on pm2 | TV shows |
| Radarr | 10.4.2.16:7878 | - | Movies |
| Prowlarr | 10.4.2.17:9696 | - | Indexer manager |
| Bazarr | 10.4.2.18:6767 | - | Subtitles |
| Whisparr | 10.4.2.19:6969 | - | Adult content |
| Jellyseerr | 10.4.2.20:5055 | LXC 115 on pm2 | Request management |
| Jellyfin | 10.4.2.21:8096 | LXC 121 on elantris | Media server |
| NZBGet | 10.4.2.13:6789 | Docker on kavnas | Download client |
| Traefik | 10.4.2.10 | LXC 104 on pm2 | Reverse proxy |
| Home Assistant | 10.4.2.62:8123 | VM 100 on pm1 | Home automation |
| Frigate | 10.4.2.63:5000 | - | NVR/Camera system |
## Storage
| Mount | Path | Notes |
|-------|------|-------|
| elantris-media | /mnt/pve/elantris-media | NFS from elantris:/el-pool/media |
| KavNas | /mnt/pve/KavNas | NFS from kavnas:/volume1 |
## Domain Mappings
All services accessible via `*.kavcorp.com` through Traefik reverse proxy:
- pm.kavcorp.com → pm2 (10.4.2.6:8006)
- sonarr.kavcorp.com → 10.4.2.15:8989
- radarr.kavcorp.com → 10.4.2.16:7878
- jellyfin.kavcorp.com → 10.4.2.21:8096
- hass.kavcorp.com → 10.4.2.62:8123
- etc.

docs/inventory.md Normal file

@@ -0,0 +1,289 @@
# VM and LXC Inventory
**Last Updated**: 2025-11-16
## Virtual Machines
### VMID 100 - haos12.1 (Home Assistant OS)
- **Node**: pm1
- **Type**: QEMU VM
- **CPU**: 2 cores
- **Memory**: 4GB
- **Disk**: 32GB
- **Status**: Running
- **Uptime**: ~52 hours
- **Tags**: proxmox-helper-scripts
- **Purpose**: Home automation platform
### VMID 109 - docker-pm3
- **Node**: pm3
- **Type**: QEMU VM
- **CPU**: 4 cores
- **Memory**: 12GB
- **Disk**: 100GB
- **Status**: Running
- **Uptime**: ~190 hours (~8 days)
- **Purpose**: Docker host for containerized services
- **Notes**: Primary Docker host, high network traffic
## LXC Containers
### Infrastructure Services
#### VMID 104 - traefik
- **Node**: pm2
- **IP**: 10.4.2.10
- **CPU**: 2 cores
- **Memory**: 2GB
- **Disk**: 10GB (KavNas)
- **Status**: Running
- **Tags**: community-script, proxy
- **Purpose**: Reverse proxy and load balancer
- **Features**: Unprivileged, nesting enabled
- **Uptime**: ~2.5 hours
#### VMID 106 - mqtt
- **Node**: pm3
- **CPU**: 1 core
- **Memory**: 512MB
- **Disk**: 2GB (local-lvm)
- **Status**: Running
- **Tags**: proxmox-helper-scripts
- **Purpose**: MQTT message broker for IoT devices
- **Uptime**: ~319 hours (~13 days)
- **Notes**: High inbound network traffic (3.4GB)
#### VMID 116 - authelia
- **Node**: pm2
- **IP**: 10.4.2.23
- **CPU**: 1 core
- **Memory**: 512MB
- **Disk**: 2GB (KavNas)
- **Status**: Running
- **Tags**: authenticator, community-script
- **Purpose**: Authentication and authorization server
- **Features**: Unprivileged, nesting enabled
- **Uptime**: ~1.9 hours
### Media Stack (*arr services)
#### VMID 105 - sonarr
- **Node**: pm2
- **IP**: 10.4.2.15
- **CPU**: 2 cores
- **Memory**: 1GB
- **Disk**: 4GB (KavNas)
- **Mount Points**:
- /media → elantris-media (NFS)
- /mnt/kavnas → KavNas (NFS)
- **Status**: Running
- **Tags**: arr, community-script
- **Features**: Unprivileged, nesting enabled
- **Uptime**: ~56 minutes
#### VMID 108 - radarr
- **Node**: pm2
- **IP**: 10.4.2.16
- **CPU**: 2 cores
- **Memory**: 1GB
- **Disk**: 4GB (KavNas)
- **Mount Points**:
- /media → elantris-media (NFS)
- /mnt/kavnas → KavNas (NFS)
- **Status**: Running
- **Tags**: arr, community-script
- **Features**: Unprivileged, nesting enabled
- **Uptime**: ~56 minutes
#### VMID 114 - prowlarr
- **Node**: pm2
- **IP**: 10.4.2.17
- **CPU**: 2 cores
- **Memory**: 1GB
- **Disk**: 4GB (KavNas)
- **Status**: Running
- **Tags**: arr, community-script
- **Features**: Unprivileged, nesting enabled
- **Purpose**: Indexer manager for *arr services
- **Uptime**: ~56 minutes
#### VMID 117 - whisparr
- **Node**: pm2
- **IP**: 10.4.2.19
- **CPU**: 2 cores
- **Memory**: 1GB
- **Disk**: 4GB (KavNas)
- **Status**: Running
- **Tags**: arr, community-script
- **Features**: Unprivileged, nesting enabled
- **Uptime**: ~56 minutes
#### VMID 119 - bazarr
- **Node**: pm2
- **IP**: 10.4.2.18
- **CPU**: 2 cores
- **Memory**: 1GB
- **Disk**: 4GB (KavNas)
- **Status**: Running
- **Tags**: arr, community-script
- **Features**: Unprivileged, nesting enabled
- **Purpose**: Subtitle management for Sonarr/Radarr
- **Uptime**: ~56 minutes
### Media Servers
#### VMID 115 - jellyseerr
- **Node**: pm2
- **IP**: 10.4.2.20
- **CPU**: 4 cores
- **Memory**: 4GB
- **Disk**: 8GB (KavNas)
- **Status**: Running
- **Tags**: community-script, media
- **Features**: Unprivileged, nesting enabled
- **Purpose**: Request management for Jellyfin
- **Uptime**: ~56 minutes
#### VMID 120 - kometa
- **Node**: pm2
- **IP**: 10.4.2.21
- **CPU**: 2 cores
- **Memory**: 4GB
- **Disk**: 8GB (KavNas)
- **Status**: Running
- **Tags**: community-script, media, streaming
- **Features**: Unprivileged, nesting enabled
- **Purpose**: Media library metadata manager
- **Uptime**: ~1.9 hours
#### VMID 121 - jellyfin
- **Node**: elantris
- **IP**: 10.4.2.22
- **CPU**: 2 cores
- **Memory**: 2GB
- **Disk**: 16GB (el-pool)
- **Status**: Running
- **Tags**: community-script, media
- **Features**: Unprivileged, nesting enabled
- **Purpose**: Media server
- **Uptime**: ~19 minutes
- **Notes**: Recently migrated to elantris
#### VMID 118 - notifiarr
- **Node**: pm2
- **IP**: 10.4.2.24
- **CPU**: 1 core
- **Memory**: 512MB
- **Disk**: 2GB (KavNas)
- **Status**: Running
- **Tags**: arr, community-script
- **Features**: Unprivileged, nesting enabled
- **Purpose**: Notification service for *arr apps
- **Uptime**: ~1.9 hours
### Docker Hosts
#### VMID 107 - dockge
- **Node**: pm3
- **CPU**: 12 cores
- **Memory**: 8GB
- **Disk**: 120GB (local-lvm)
- **Status**: Running
- **Tags**: proxmox-helper-scripts
- **Features**: Unprivileged, nesting enabled
- **Purpose**: Docker Compose management UI
- **Uptime**: ~319 hours (~13 days)
#### VMID 110 - docker-pm4
- **Node**: pm4
- **CPU**: 4 cores
- **Memory**: 8GB
- **Disk**: 10GB (local-lvm)
- **Status**: Running
- **Tags**: community-script, docker
- **Features**: Unprivileged, nesting enabled
- **Purpose**: Docker host
- **Uptime**: ~45 hours
#### VMID 113 - docker-pm2
- **Node**: pm2
- **CPU**: 4 cores
- **Memory**: 8GB
- **Disk**: 10GB (local-lvm)
- **Status**: Running
- **Tags**: community-script, docker
- **Features**: Unprivileged, nesting enabled
- **Purpose**: Docker host
- **Uptime**: ~45 hours
- **Notes**: Currently empty/minimal usage
### Smart Home & IoT
#### VMID 101 - twingate
- **Node**: pm1
- **CPU**: 1 core
- **Memory**: 512MB
- **Disk**: 8GB (local-lvm)
- **Status**: Running
- **Features**: Unprivileged
- **Purpose**: Zero-trust network access
- **Uptime**: ~52 hours
#### VMID 102 - zwave-js-ui
- **Node**: pm1
- **CPU**: 2 cores
- **Memory**: 1GB
- **Disk**: 4GB (local-lvm)
- **Status**: Running
- **Tags**: proxmox-helper-scripts
- **Features**: Unprivileged, nesting enabled
- **Purpose**: Z-Wave device management
- **Uptime**: ~52 hours
### Surveillance & NVR
#### VMID 103 - shinobi
- **Node**: pm4
- **CPU**: 2 cores
- **Memory**: 2GB
- **Disk**: 8GB (local-lvm)
- **Status**: Running
- **Tags**: community-script, nvr
- **Features**: Unprivileged, nesting enabled
- **Purpose**: Network Video Recorder
- **Uptime**: ~52 hours
- **Notes**: Very high network traffic (407GB in, 162GB out)
#### VMID 111 - frigate
- **Node**: pm3
- **CPU**: 4 cores
- **Memory**: 8GB
- **Disk**: 120GB (local-lvm)
- **Status**: Running
- **Tags**: proxmox-helper-scripts
- **Features**: Unprivileged, nesting enabled
- **Purpose**: NVR with object detection
- **Uptime**: ~18 hours
- **Notes**: High storage and network usage
### Gaming
#### VMID 112 - foundryvtt
- **Node**: pm3
- **CPU**: 4 cores
- **Memory**: 6GB
- **Disk**: 100GB (local-lvm)
- **Status**: Running
- **Features**: Unprivileged, nesting enabled
- **Purpose**: Virtual tabletop gaming platform
- **Uptime**: ~116 hours (~5 days)
## Summary Statistics
- **Total Containers**: 21 (2 VMs + 19 LXCs)
- **All Running**: Yes
- **Total CPU Allocation**: 62 cores
- **Total Memory Allocation**: 63.5GB
- **Primary Storage**: KavNas (NFS) for most LXCs
- **Most Active Node**: pm2 (11 containers)
- **Newest Deployments**: Media stack on pm2 (mostly < 2 hours uptime)

docs/network.md Normal file

@@ -0,0 +1,132 @@
# Network Architecture
**Last Updated**: 2025-11-16
## Network Overview
- **Primary Network**: 10.4.2.0/24
- **Gateway**: 10.4.2.254
- **Bridge**: vmbr0 (standard on all nodes)
## Node Network Configuration
All Proxmox nodes use a similar network configuration:
- **Physical Interface**: eno1 (1Gbps Ethernet)
- **Bridge**: vmbr0 (Linux bridge)
- **Bridge Config**: STP off, forward delay 0
### Example Configuration (pm2)
```
auto vmbr0
iface vmbr0 inet static
    address 10.4.2.6/24
    gateway 10.4.2.254
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
```
## IP Address Allocation
### Infrastructure Devices
| IP | Device | Type | Notes |
|---|---|---|---|
| 10.4.2.2 | pm1 | Proxmox Node | 4 cores, 16GB RAM |
| 10.4.2.3 | pm3 | Proxmox Node | 16 cores, 33GB RAM |
| 10.4.2.5 | pm4 | Proxmox Node | 12 cores, 31GB RAM |
| 10.4.2.6 | pm2 | Proxmox Node | 12 cores, 31GB RAM (primary mgmt) |
| 10.4.2.13 | KavNas | Synology DS918+ | Primary NFS storage |
| 10.4.2.14 | elantris | Proxmox Node | 16 cores, 128GB RAM, Storage node |
| 10.4.2.254 | Gateway | Router | Network gateway |
### Service IPs (LXC/VM)
#### Reverse Proxy & Auth
| IP | Service | VMID | Node | Purpose |
|---|---|---|---|---|
| 10.4.2.10 | traefik | 104 | pm2 | Reverse proxy |
| 10.4.2.23 | authelia | 116 | pm2 | Authentication |
#### Media Automation Stack
| IP | Service | VMID | Node | Purpose |
|---|---|---|---|---|
| 10.4.2.15 | sonarr | 105 | pm2 | TV show management |
| 10.4.2.16 | radarr | 108 | pm2 | Movie management |
| 10.4.2.17 | prowlarr | 114 | pm2 | Indexer manager |
| 10.4.2.18 | bazarr | 119 | pm2 | Subtitle management |
| 10.4.2.19 | whisparr | 117 | pm2 | Adult content management |
| 10.4.2.24 | notifiarr | 118 | pm2 | Notification service |
#### Media Servers
| IP | Service | VMID | Node | Purpose |
|---|---|---|---|---|
| 10.4.2.20 | jellyseerr | 115 | pm2 | Request management |
| 10.4.2.21 | kometa | 120 | pm2 | Metadata manager |
| 10.4.2.22 | jellyfin | 121 | elantris | Media server |
### Dynamic/DHCP Services
The following services currently use DHCP or don't have static IPs documented:
- VMID 100: haos12.1 (Home Assistant)
- VMID 101: twingate
- VMID 102: zwave-js-ui
- VMID 103: shinobi
- VMID 106: mqtt
- VMID 107: dockge
- VMID 109: docker-pm3
- VMID 110: docker-pm4
- VMID 111: frigate
- VMID 112: foundryvtt
- VMID 113: docker-pm2
## Reserved IP Ranges
**Recommendation**: Reserve IP ranges for different service types:
- `10.4.2.1-10.4.2.20`: Infrastructure and core services
- `10.4.2.21-10.4.2.50`: Media services
- `10.4.2.51-10.4.2.100`: Home automation and IoT
- `10.4.2.101-10.4.2.150`: General applications
- `10.4.2.151-10.4.2.200`: Testing and development
## NFS Mounts
### KavNas (10.4.2.13)
- **Source**: Synology DS918+ NAS
- **Mount**: Available on all Proxmox nodes
- **Capacity**: 23TB total
- **Usage**: ~9.2TB used
- **Purpose**: Primary shared storage for LXC rootfs, backups, ISOs, templates
- **Mount Point on Nodes**: `/mnt/pve/KavNas`
### elantris-downloads (10.4.2.14)
- **Source**: elantris node
- **Mount**: Available on all Proxmox nodes
- **Capacity**: 23TB total
- **Usage**: ~10.6TB used
- **Purpose**: Download storage, media staging
- **Mount Point on Nodes**: `/mnt/pve/elantris-downloads`
### elantris-media
- **Source**: elantris node
- **Mount**: Used by media services
- **Purpose**: Media library storage
- **Mounted in LXCs**: sonarr, radarr (mounted at `/media`)
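For reference, shared NFS storage like this is normally registered cluster-wide with `pvesm`; a sketch for the KavNas mount (the content types listed are an assumption, adjust to match the actual `storage.cfg`):
```bash
# Register the Synology export as shared storage on the cluster (sketch)
pvesm add nfs KavNas \
    --server 10.4.2.13 \
    --export /volume1 \
    --content backup,iso,vztmpl,rootdir

# Confirm it mounted under /mnt/pve/KavNas
pvesm status
df -h /mnt/pve/KavNas
```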
## Firewall Notes
*TODO: Document firewall rules and port forwarding as configured*
## VLAN Configuration
Currently using a flat network (no VLANs configured). Consider implementing VLANs for:
- Management network (Proxmox nodes)
- Service network (LXC/VM services)
- IoT network (smart home devices)
- Storage network (NFS traffic)
## Future Network Improvements
- [ ] Implement VLANs for network segmentation
- [ ] Document all static IP assignments
- [ ] Set up monitoring for network traffic
- [ ] Consider 10GbE for storage traffic between nodes
- [ ] Implement proper DNS (currently using gateway)

docs/recyclarr-setup.md Normal file

@@ -0,0 +1,178 @@
# Recyclarr Setup - TRaSH Guides Automation
**Last Updated**: 2025-11-16
## Overview
Recyclarr automatically syncs TRaSH Guides recommended custom formats and quality profiles to Radarr and Sonarr.
## Installation Details
- **LXC**: VMID 122 on pm2
- **IP Address**: 10.4.2.25
- **Binary**: `/usr/local/bin/recyclarr`
- **Config**: `/root/.config/recyclarr/recyclarr.yml`
## Configuration Summary
### Radarr (Movies)
- **URL**: http://10.4.2.16:7878
- **API Key**: 5e6796988abf4d6d819a2b506a44f422
- **Quality Profiles**:
- HD Bluray + WEB (1080p standard)
- Remux-1080p - Anime
- **Custom Formats**: 34 formats synced
- **Dolby Vision**: **BLOCKED** (DV w/o HDR fallback scored at -10000)
**Key Settings**:
- Standard profile prefers 1080p Bluray and WEB releases
- Anime profile includes Remux with merged quality groups
- Blocks Dolby Vision Profile 5 (no HDR fallback) on standard profile
- Blocks unwanted formats (BR-DISK, LQ, x265 HD, 3D, AV1, Extras)
- Uses TRaSH Guides release group tiers (BD, WEB, Anime BD, Anime WEB)
### Sonarr (TV Shows)
- **URL**: http://10.4.2.15:8989
- **API Key**: b331fe18ec2144148a41645d9ce8b249
- **Quality Profiles**:
- WEB-1080p (standard)
- Remux-1080p - Anime
- **Custom Formats**: 29 formats synced
- **Dolby Vision**: **BLOCKED** (DV w/o HDR fallback scored at -10000)
**Key Settings**:
- Standard profile prefers 1080p WEB releases (WEB-DL and WEBRip)
- Anime profile includes Bluray Remux with merged quality groups
- Blocks Dolby Vision Profile 5 (no HDR fallback) on standard profile
- Blocks unwanted formats (BR-DISK, LQ, x265 HD, AV1, Extras)
- Uses TRaSH Guides WEB release group tiers and Anime tiers
## Automated Sync Schedule
Recyclarr runs daily at 6:00 AM via cron:
```bash
0 6 * * * /usr/local/bin/recyclarr sync > /dev/null 2>&1
```
## Manual Sync
To manually trigger a sync:
```bash
ssh pm2 "pct exec 122 -- /usr/local/bin/recyclarr sync"
```
## Dolby Vision Blocking
Both Radarr and Sonarr are configured to **completely block** Dolby Vision releases without HDR10 fallback (Profile 5). These releases will receive a score of **-10000**, ensuring they are never downloaded.
**What this blocks**:
- WEB-DL releases with Dolby Vision Profile 5 (no HDR10 fallback)
- Any release that only plays back in DV without falling back to HDR10
**What this allows**:
- HDR10 releases
- HDR10+ releases
- Dolby Vision Profile 7 with HDR10 fallback (from UHD Blu-ray)
## Custom Format Details
### Blocked Formats (Score: -10000)
- **DV (w/o HDR fallback)**: Blocks DV Profile 5
- **BR-DISK**: Blocks full BluRay disc images
- **LQ**: Blocks low-quality releases
- **x265 (HD)**: Blocks x265 encoded HD content (720p/1080p)
- **3D**: Blocks 3D releases
- **AV1**: Blocks AV1 codec
- **Extras**: Blocks extras, featurettes, etc.
### Preferred Formats
- **WEB Tier 01-03**: Scored 1600-1700 (high-quality WEB groups)
- **UHD Bluray Tier 01-03**: Scored 1700 (Radarr only)
- **Streaming Services**: Neutral score (AMZN, ATVP, DSNP, HBO, etc.)
- **Repack/Proper**: Scored 5-7 (prefers repacks over originals)
## Monitoring
Check Recyclarr logs:
```bash
ssh pm2 "pct exec 122 -- cat /root/.config/recyclarr/logs/recyclarr.log"
```
View last sync results:
```bash
ssh pm2 "pct exec 122 -- /usr/local/bin/recyclarr sync --preview"
```
## Updating Configuration
1. Edit config: `ssh pm2 "pct exec 122 -- nano /root/.config/recyclarr/recyclarr.yml"`
2. Test config: `ssh pm2 "pct exec 122 -- /usr/local/bin/recyclarr config check"`
3. Run sync: `ssh pm2 "pct exec 122 -- /usr/local/bin/recyclarr sync"`
## Troubleshooting
### Check if sync is working
```bash
ssh pm2 "pct exec 122 -- /usr/local/bin/recyclarr sync --preview"
```
### Verify API connectivity
```bash
# Test Radarr
curl -H "X-Api-Key: 5e6796988abf4d6d819a2b506a44f422" http://10.4.2.16:7878/api/v3/system/status
# Test Sonarr
curl -H "X-Api-Key: b331fe18ec2144148a41645d9ce8b249" http://10.4.2.15:8989/api/v3/system/status
```
### Force resync all custom formats
```bash
ssh pm2 "pct exec 122 -- /usr/local/bin/recyclarr sync --force"
```
## Important Notes
- **Do not modify custom format scores manually** in Radarr/Sonarr web UI - they will be overwritten on next sync
- **Quality profile changes** made in the web UI may be preserved unless they conflict with Recyclarr config
- **The DV blocking is automatic** - no manual intervention needed
- Recyclarr keeps custom formats up-to-date with TRaSH Guides automatically
## Next Steps
- Monitor downloads to ensure DV content is properly blocked
- Adjust quality profiles in Recyclarr config if needed (e.g., prefer 1080p over 4K)
- Review TRaSH Guides for additional custom formats: https://trash-guides.info/
## Anime Configuration
Both Radarr and Sonarr include a dedicated "Remux-1080p - Anime" quality profile for anime content.
**Key Anime Settings**:
- **Quality groups merged** per TRaSH Guides (Remux + Bluray + WEB + HDTV in combined groups)
- **Anime BD Tiers 01-08**: Scored 1300-1400 (SeaDex muxers, remuxes, fansubs, P2P, mini encodes)
- **Anime WEB Tiers 01-06**: Scored 150-350 (muxers, top fansubs, official subs)
- **Dual Audio preferred**: +101 score for releases with both Japanese and English audio
- **Unwanted blocked**: Same as standard profile (BR-DISK, LQ, x265 HD, AV1, Extras)
**Scoring Differences from Standard Profile**:
- Anime Web Tier 01 scores 350 (vs 1600 for standard WEB Tier 01)
- Emphasizes BD quality over WEB for anime (BD Tier 01 = 1400)
- Merged quality groups allow HDTV to be considered alongside WEB for anime releases
**To use anime profile**:
1. In Radarr/Sonarr, edit a movie or series
2. Change quality profile to "Remux-1080p - Anime"
3. Recyclarr will automatically manage custom format scores
## Inventory Update
Added to cluster inventory:
- **VMID**: 122
- **Name**: recyclarr
- **Node**: pm2
- **IP**: 10.4.2.25
- **CPU**: 1 core
- **Memory**: 512MB
- **Disk**: 2GB (KavNas)
- **Purpose**: TRaSH Guides automation for Radarr/Sonarr
- **Tags**: arr, community-script

docs/services.md Normal file

@@ -0,0 +1,222 @@
# Service Mappings and Dependencies
**Last Updated**: 2025-11-16
## Service Categories
### Reverse Proxy & Authentication
#### Traefik (VMID 104)
- **Node**: pm2
- **IP**: 10.4.2.10
- **Port**: 80, 443
- **Purpose**: Reverse proxy and load balancer
- **Config Location**: *TODO: Document Traefik config location*
- **Dependencies**: None
- **Backends**: Routes traffic to all web services
#### Authelia (VMID 116)
- **Node**: pm2
- **IP**: 10.4.2.23
- **Purpose**: Single sign-on and authentication
- **Dependencies**: Traefik
- **Protected Services**: *TODO: Document which services require auth*
### Media Automation Stack
#### Prowlarr (VMID 114)
- **Node**: pm2
- **IP**: 10.4.2.17
- **Port**: 9696 (default)
- **Purpose**: Indexer manager for *arr services
- **Dependencies**: None
- **Integrated With**: Sonarr, Radarr, Whisparr
#### Sonarr (VMID 105)
- **Node**: pm2
- **IP**: 10.4.2.15
- **Port**: 8989 (default)
- **Purpose**: TV show automation
- **Dependencies**: Prowlarr
- **Mount Points**:
- `/media` - Media library
- `/mnt/kavnas` - Download staging
- **Integrated With**: Jellyfin, Jellyseerr, Bazarr
#### Radarr (VMID 108)
- **Node**: pm2
- **IP**: 10.4.2.16
- **Port**: 7878 (default)
- **Purpose**: Movie automation
- **Dependencies**: Prowlarr
- **Mount Points**:
- `/media` - Media library
- `/mnt/kavnas` - Download staging
- **Integrated With**: Jellyfin, Jellyseerr, Bazarr
#### Whisparr (VMID 117)
- **Node**: pm2
- **IP**: 10.4.2.19
- **Port**: 6969 (default)
- **Purpose**: Adult content automation
- **Dependencies**: Prowlarr
- **Integrated With**: Jellyfin
#### Bazarr (VMID 119)
- **Node**: pm2
- **IP**: 10.4.2.18
- **Port**: 6767 (default)
- **Purpose**: Subtitle automation
- **Dependencies**: Sonarr, Radarr
- **Integrated With**: Jellyfin
### Media Servers & Requests
#### Jellyfin (VMID 121)
- **Node**: elantris
- **IP**: 10.4.2.22
- **Port**: 8096 (default)
- **Purpose**: Media server
- **Dependencies**: None (reads media library)
- **Media Sources**: *TODO: Document media library paths*
- **Status**: Needs to be added to Traefik config
#### Jellyseerr (VMID 115)
- **Node**: pm2
- **IP**: 10.4.2.20
- **Port**: 5055 (default)
- **Purpose**: Media request management
- **Dependencies**: Jellyfin, Sonarr, Radarr
- **Integrated With**: Jellyfin (for library data)
#### Kometa (VMID 120)
- **Node**: pm2
- **IP**: 10.4.2.21
- **Purpose**: Automated metadata and collection management for Jellyfin
- **Dependencies**: Jellyfin
- **Run Mode**: Scheduled/automated (not web UI)
#### Notifiarr (VMID 118)
- **Node**: pm2
- **IP**: 10.4.2.24
- **Purpose**: Notification relay for *arr apps
- **Dependencies**: Sonarr, Radarr, Prowlarr, etc.
- **Notifications For**: Downloads, upgrades, errors
### Docker Hosts
#### dockge (VMID 107)
- **Node**: pm3
- **Purpose**: Docker Compose management web UI
- **Port**: 5001 (default)
- **Manages**: Docker containers across docker-pm2, docker-pm3, docker-pm4
- **Web UI**: Accessible via browser
#### docker-pm2 (VMID 113)
- **Node**: pm2
- **Purpose**: Docker host (currently empty/minimal)
- **Status**: Available for new containerized services
#### docker-pm3 (VMID 109)
- **Node**: pm3
- **Purpose**: Primary Docker host
- **Status**: Running containerized services (details TBD)
#### docker-pm4 (VMID 110)
- **Node**: pm4
- **Purpose**: Docker host
- **Status**: Running containerized services
### Smart Home & IoT
#### Home Assistant (VMID 100)
- **Node**: pm1
- **Purpose**: Home automation platform
- **Port**: 8123 (default)
- **Type**: Full VM (HAOS)
- **Integrations**: Z-Wave, MQTT, Twingate
#### Z-Wave JS UI (VMID 102)
- **Node**: pm1
- **Purpose**: Z-Wave device management
- **Port**: 8091 (default)
- **Dependencies**: USB Z-Wave stick
- **Integrated With**: Home Assistant
#### MQTT (VMID 106)
- **Node**: pm3
- **Ports**: 1883 (MQTT), 9001 (WebSocket)
- **Purpose**: Message broker for IoT devices
- **Dependencies**: None
- **Clients**: Home Assistant, IoT devices
#### Twingate (VMID 101)
- **Node**: pm1
- **Purpose**: Zero-trust network access
- **Type**: VPN alternative
### Surveillance & NVR
#### Frigate (VMID 111)
- **Node**: pm3
- **Port**: 5000 (default)
- **Purpose**: NVR with AI object detection
- **Dependencies**: None
- **Storage**: High (120GB allocated)
- **Features**: Object detection, motion detection
- **Integrated With**: Home Assistant
#### Shinobi (VMID 103)
- **Node**: pm4
- **Port**: 8080 (default)
- **Purpose**: Network Video Recorder
- **Network**: High inbound traffic (407GB in)
- **Status**: May be deprecated in favor of Frigate
### Gaming
#### FoundryVTT (VMID 112)
- **Node**: pm3
- **Port**: 30000 (default)
- **Purpose**: Virtual tabletop for RPG gaming
- **Storage**: 100GB (for assets, maps, modules)
- **Access**: Password protected
## Service Access URLs
*TODO: Document Traefik routes for each service*
Expected format (per-service route files under `/etc/traefik/conf.d/`; see the sketch below):
- Jellyfin: https://jellyfin.kavcorp.com
- Sonarr: https://sonarr.kavcorp.com
- Radarr: https://radarr.kavcorp.com
- etc.
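As a sketch of what one of those route files might look like (the file name and the `websecure` entry point name are assumptions based on the conf.d convention used elsewhere in these docs), Jellyfin could be routed with:
```yaml
# /etc/traefik/conf.d/jellyfin.yaml (hypothetical file; adjust the entry point name to match traefik.yaml)
http:
  routers:
    jellyfin:
      rule: "Host(`jellyfin.kavcorp.com`)"
      entryPoints:
        - websecure
      service: jellyfin
      tls:
        certResolver: letsencrypt
  services:
    jellyfin:
      loadBalancer:
        servers:
          - url: "http://10.4.2.22:8096"   # Jellyfin LXC 121 on elantris
```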
## Service Dependencies Map
```
Traefik (proxy)
├── Authelia (auth)
├── Jellyfin (media server)
├── Jellyseerr (requests) → Jellyfin, Sonarr, Radarr
├── Sonarr → Prowlarr, Bazarr
├── Radarr → Prowlarr, Bazarr
├── Whisparr → Prowlarr
├── Prowlarr (indexers)
├── Bazarr → Sonarr, Radarr
├── Home Assistant → MQTT, Z-Wave JS UI
├── Frigate → Home Assistant (optional)
└── FoundryVTT
```
## Migration Candidates (Docker → LXC)
Services currently in Docker that could be migrated to LXC:
- *TODO: Document after reviewing Docker container inventory*
## Service Maintenance Notes
- Most services auto-update or have update notifications
- Monitor Frigate storage usage (generates large video files)
- Dockge provides easy UI for managing Docker stacks
- *arr services should be updated together to maintain compatibility

184
docs/storage.md Normal file
View File

@@ -0,0 +1,184 @@
# Storage Architecture
**Last Updated**: 2025-11-16
## Storage Overview
The KavCorp cluster uses a multi-tiered storage approach:
1. **Local node storage**: For node-specific data, templates, ISOs
2. **NFS shared storage**: For LXC containers, backups, and shared data
3. **ZFS pools**: For high-performance storage on specific nodes
## Storage Pools
### Local Storage (Per-Node)
Each node has two local storage pools:
#### `local` - Directory Storage
- **Type**: Directory
- **Size**: ~100GB per node
- **Content Types**: backup, vztmpl (templates), iso
- **Location**: `/var/lib/vz`
- **Usage**: Node-specific backups, templates, ISO images
- **Shared**: No
**Per-Node Status**:
| Node | Used | Total | Available |
|---|---|---|---|
| pm1 | 10.1GB | 100.9GB | 90.8GB |
| pm2 | 8.0GB | 100.9GB | 92.9GB |
| pm3 | 6.9GB | 100.9GB | 94.0GB |
| pm4 | 7.5GB | 100.9GB | 93.4GB |
| elantris | 4.1GB | 100.9GB | 96.8GB |
#### `local-lvm` - LVM Thin Pool
- **Type**: LVM Thin
- **Size**: ~350-375GB per node (varies)
- **Content Types**: rootdir, images
- **Usage**: High-performance VM/LXC disks
- **Shared**: No
- **Best For**: Services requiring fast local storage
**Per-Node Status**:
| Node | Used | Total | Available |
|---|---|---|---|
| pm1 | 16.9GB | 374.5GB | 357.6GB |
| pm2 | 0GB | 374.5GB | 374.5GB |
| pm3 | 178.8GB | 362.8GB | 184.0GB |
| pm4 | 0GB | 374.5GB | 374.5GB |
| elantris | 0GB | 362.8GB | 362.8GB |
**Note**: pm3's local-lvm is heavily used (178.8GB) due to:
- VMID 107: dockge (120GB)
- VMID 111: frigate (120GB)
- VMID 112: foundryvtt (100GB)
### NFS Shared Storage
#### `KavNas` - Primary Shared Storage
- **Type**: NFS
- **Source**: 10.4.2.13 (Synology DS918+ NAS)
- **Size**: 23TB (23,029,958,311,936 bytes)
- **Used**: 9.2TB (9,241,738,215,424 bytes)
- **Available**: 13.8TB
- **Content Types**: snippets, iso, images, backup, rootdir, vztmpl
- **Shared**: Yes (available on all nodes)
- **Best For**:
- LXC container rootfs (most new containers use this)
- Backups
- ISO images
- Templates
- Data that needs to be accessible across nodes
**Current Usage**:
- Most LXC containers on pm2 use KavNas for rootfs
- Provides easy migration between nodes
- Centralized backup location
#### `elantris-downloads` - Download Storage
- **Type**: NFS
- **Source**: 10.4.2.14 (elantris node)
- **Size**: 23TB (23,116,582,486,016 bytes)
- **Used**: 10.6TB (10,630,966,804,480 bytes)
- **Available**: 12.5TB
- **Content Types**: rootdir, images
- **Shared**: Yes (available on all nodes)
- **Best For**:
- Download staging area
- Media downloads
- Large file operations
### ZFS Storage
#### `el-pool` - ZFS Pool (elantris)
- **Type**: ZFS
- **Node**: elantris only
- **Size**: 26.3TB (26,317,550,091,635 bytes)
- **Used**: 13.8TB (13,831,934,311,603 bytes)
- **Available**: 12.5TB
- **Content Types**: images, rootdir
- **Shared**: No (elantris only)
- **Best For**:
- High-performance storage on elantris
- Large data sets requiring ZFS features
- Services that benefit from compression/deduplication
**Current Usage**:
- VMID 121: jellyfin (16GB on el-pool)
**Status on Other Nodes**: Shows as "unknown" - ZFS pool is local to elantris only
## Storage Recommendations
### For New LXC Containers
**General Purpose Services** (web apps, APIs, small databases):
- **Storage**: `KavNas`
- **Disk Size**: 4-10GB
- **Rationale**: Shared, easy to migrate, automatically backed up
**High-Performance Services** (databases, caches):
- **Storage**: `local-lvm`
- **Disk Size**: As needed
- **Rationale**: Fast local SSD storage
**Large Storage Services** (media, file storage):
- **Storage**: `elantris-downloads` or `el-pool`
- **Disk Size**: As needed
- **Rationale**: Large capacity, optimized for bulk storage
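As a concrete example of the general-purpose recommendation above, a small LXC with its rootfs on `KavNas` could be created roughly as follows (the VMID, hostname, template file name, and `vmbr0` bridge are illustrative and should be adjusted):
```bash
# Create a small unprivileged container on the shared KavNas storage (run on the target node)
pct create 130 KavNas:vztmpl/debian-12-standard_12.7-1_amd64.tar.zst \
  --hostname example-service \
  --cores 1 --memory 512 \
  --rootfs KavNas:8 \
  --net0 name=eth0,bridge=vmbr0,ip=dhcp \
  --unprivileged 1
```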
### Mount Points for Media Services
Media-related LXCs typically mount:
```
mp0: /mnt/pve/elantris-media,mp=/media,ro=0
mp1: /mnt/pve/KavNas,mp=/mnt/kavnas
```
This provides:
- Access to media library via `/media`
- Access to NAS storage via `/mnt/kavnas`
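These mount points can be added to an existing container with `pct set` on the node hosting it; a minimal sketch using Sonarr (VMID 105 on pm2) as the example, noting that the container generally needs a restart before the new mounts appear:
```bash
# Add the media library and NAS staging mounts, then restart the container
ssh pm2 "pct set 105 -mp0 /mnt/pve/elantris-media,mp=/media,ro=0 -mp1 /mnt/pve/KavNas,mp=/mnt/kavnas"
ssh pm2 "pct reboot 105"
```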
## Storage Performance Notes
### Best Performance
1. `local-lvm` (local SSD on each node)
### Best Redundancy/Availability
1. `KavNas` (NAS with RAID, accessible from all nodes)
2. `elantris-downloads` (large capacity, shared)
### Best for Large Files
1. `el-pool` (ZFS on elantris, 26.3TB)
2. `elantris-downloads` (23TB NFS)
3. `KavNas` (23TB NFS)
## Backup Strategy
**Current Setup**:
- Backups stored on `KavNas` NFS share
- All nodes can write backups to KavNas
- Centralized backup location
**Recommendations**:
- [ ] Document automated backup schedules
- [ ] Implement off-site backup rotation
- [ ] Test restore procedures
- [ ] Monitor KavNas free space (currently ~40% used)
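A manual, ad-hoc backup to `KavNas` (useful when testing restore procedures) can be taken with `vzdump` from the node hosting the guest, e.g. for the Recyclarr container:
```bash
# Snapshot-mode backup of LXC 122 to the shared KavNas storage, compressed with zstd
ssh pm2 "vzdump 122 --storage KavNas --mode snapshot --compress zstd"
```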
## Storage Monitoring
**Watch These Metrics**:
- pm3 `local-lvm`: 49% used (178.8GB / 362.8GB)
- KavNas: 40% used (9.2TB / 23TB)
- elantris-downloads: 46% used (10.6TB / 23TB)
- el-pool: 53% used (13.8TB / 26.3TB)
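These figures can be re-checked from any cluster node with `pvesm`, which reports usage for every storage visible to that node (shared NFS pools show cluster-wide usage):
```bash
# Report usage for all storages, or narrow to a single one
ssh pm2 "pvesm status"
ssh pm2 "pvesm status --storage KavNas"
```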
## Future Storage Improvements
- [ ] Set up automated cleanup of old backups
- [ ] Implement storage quotas for LXC containers
- [ ] Consider SSD caching for NFS mounts
- [ ] Document backup retention policies
- [ ] Set up alerts for storage thresholds (80%, 90%)

131
docs/traefik-ssl-setup.md Normal file
View File

@@ -0,0 +1,131 @@
# Traefik SSL/TLS Setup with Namecheap
**Last Updated**: 2025-11-16
## Configuration Summary
Traefik is configured to use Let's Encrypt with DNS-01 challenge via Namecheap for wildcard SSL certificates.
### Environment Variables
Located in: `/etc/systemd/system/traefik.service.d/override.conf` (inside Traefik LXC 104)
```bash
NAMECHEAP_API_USER=kavren
NAMECHEAP_API_KEY=8156f3d9ef664c91b95f029dfbb62ad5
NAMECHEAP_PROPAGATION_TIMEOUT=3600 # 1 hour timeout for DNS propagation
NAMECHEAP_POLLING_INTERVAL=30 # Check every 30 seconds
NAMECHEAP_TTL=300 # 5 minute TTL for DNS records
```
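Note that a systemd drop-in only takes effect if the variables are declared under a `[Service]` section; if they are set directly in `override.conf` (rather than via an `EnvironmentFile`), the drop-in would look roughly like this (a sketch, not verified against the running container):
```ini
[Service]
Environment="NAMECHEAP_API_USER=kavren"
Environment="NAMECHEAP_API_KEY=8156f3d9ef664c91b95f029dfbb62ad5"
Environment="NAMECHEAP_PROPAGATION_TIMEOUT=3600"
Environment="NAMECHEAP_POLLING_INTERVAL=30"
Environment="NAMECHEAP_TTL=300"
```
After editing, run `systemctl daemon-reload && systemctl restart traefik` inside LXC 104 for the change to apply.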
### Traefik Configuration
File: `/etc/traefik/traefik.yaml`
```yaml
certificatesResolvers:
  letsencrypt:
    acme:
      email: cory.bailey87@gmail.com
      storage: /etc/traefik/ssl/acme.json
      dnsChallenge:
        provider: namecheap
        resolvers:
          - "1.1.1.1:53"
          - "8.8.8.8:53"
```
### Wildcard Certificate
Configured for:
- Main domain: `kavcorp.com`
- Wildcard: `*.kavcorp.com`
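With DNS-01, the wildcard is typically requested by attaching a `domains` list to the TLS section of one router in the dynamic config; a minimal sketch (the dashboard router is used here only as an example, and the `websecure` entry point name is an assumption):
```yaml
http:
  routers:
    traefik-dashboard:
      rule: "Host(`traefik.kavcorp.com`)"
      entryPoints:
        - websecure
      service: api@internal
      tls:
        certResolver: letsencrypt
        domains:
          - main: "kavcorp.com"
            sans:
              - "*.kavcorp.com"
```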
## Namecheap API Requirements
1. **API Access Enabled**: Must have API access enabled in Namecheap account
2. **IP Whitelisting**: Public IP `99.74.188.161` must be whitelisted
3. **API Key**: Must have valid API key with DNS modification permissions
### Verifying API Access
Test Namecheap API from Traefik LXC:
```bash
pct exec 104 -- curl -s 'https://api.namecheap.com/xml.response?ApiUser=kavren&ApiKey=8156f3d9ef664c91b95f029dfbb62ad5&UserName=kavren&Command=namecheap.domains.getList&ClientIp=99.74.188.161'
```
## Existing Certificates
Valid Let's Encrypt certificates already obtained:
- `traefik.kavcorp.com`
- `sonarr.kavcorp.com`
- `radarr.kavcorp.com`
Stored in: `/etc/traefik/ssl/acme.json`
## Troubleshooting
### Common Issues
**DNS Propagation Timeout**:
- Error: "propagation: time limit exceeded"
- Solution: Increased `NAMECHEAP_PROPAGATION_TIMEOUT` to 3600 seconds (1 hour)
**API Authentication Failed**:
- Verify IP whitelisted: 99.74.188.161
- Verify API key is correct
- Check API access is enabled in Namecheap
**Deprecated Configuration Warning**:
- Fixed: Removed deprecated `delayBeforeCheck` option
- Now using default propagation settings controlled by environment variables
### Monitoring Certificate Generation
Check Traefik logs:
```bash
ssh pm2 "pct exec 104 -- tail -f /var/log/traefik/traefik.log"
```
Filter for ACME/certificate errors:
```bash
ssh pm2 "pct exec 104 -- cat /var/log/traefik/traefik.log | grep -i 'acme\|certificate\|error'"
```
### Manual Certificate Renewal
Certificates auto-renew. To force renewal:
```bash
# Delete acme.json and restart Traefik inside the container (will regenerate all certs)
ssh pm2 "pct exec 104 -- bash -c 'rm /etc/traefik/ssl/acme.json && systemctl restart traefik'"
```
**WARNING**: Only do this if necessary, as Let's Encrypt has rate limits!
## Certificate Request Flow
1. New service added to `/etc/traefik/conf.d/*.yaml`
2. Traefik detects new route requiring HTTPS
3. Checks if certificate exists in acme.json
4. If not, initiates DNS-01 challenge:
- Creates TXT record via Namecheap API: `_acme-challenge.subdomain.kavcorp.com`
- Waits for DNS propagation (up to 1 hour)
- Polls DNS servers every 30 seconds
- Let's Encrypt verifies TXT record
- Certificate issued and stored in acme.json
5. Certificate served for HTTPS connections
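While a challenge is pending, the TXT record can be checked from any machine to confirm it has propagated to the resolvers Traefik polls (replace the subdomain placeholder as appropriate):
```bash
# Query the same resolvers Traefik is configured to use
dig +short TXT _acme-challenge.<subdomain>.kavcorp.com @1.1.1.1
dig +short TXT _acme-challenge.<subdomain>.kavcorp.com @8.8.8.8
```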
## Next Steps
When adding new services:
1. Add route configuration to `/etc/traefik/conf.d/media-services.yaml` (or create new file)
2. Traefik will automatically request certificate on first HTTPS request
3. Monitor logs for any DNS propagation issues
4. Certificate will be cached and auto-renewed before expiration
## Notes
- Traefik v3.6.1 in use
- DNS-01 challenge allows wildcard certificates
- Certificates valid for 90 days, auto-renewed at 60 days
- Rate limit: 50 certificates per domain per week (Let's Encrypt)