# Architecture Decisions & Patterns

> **Purpose**: Record of important decisions, patterns, and "why we do it this way"
>
> **Update Frequency**: When making significant architectural choices

## Service Organization

### Authentication Strategy

**Decision**: Services use their own built-in authentication, not Authelia

**Reason**: Most *arr services and media tools have robust auth systems

**Exception**: Consider Authelia for future services that lack authentication

### LXC vs Docker

**Keep in Docker**:
- NZBGet (requires specific volume mapping, works well in Docker)
- Multi-container stacks
- Services requiring Docker-specific features

**Migrate to LXC**:
- Single-purpose services (Sonarr, Radarr, etc.)
- Services benefiting from isolation
- Stateless applications

## File Permissions

### Media Files

**Standard**: All media files and folders must be 777

**Reason**:
- NFS mounts between multiple systems with different UID mappings
- Jellyfin runs in LXC with UID namespace mapping (100107)
- Sonarr runs in LXC with different UID mapping
- NZBGet runs in Docker with UID 1000

**Implementation**:
- NZBGet: `UMask=0000` to create files with 777
- Sonarr: Media management → Set permissions → chmod 777
- Manual fixes: `chmod -R 777` on media directories as needed

## Network Architecture

### VLAN Strategy (Planned)

**Decision**: Segment network into 4 VLANs

**See**: [NETWORK-UPGRADE-PLAN.md](NETWORK-UPGRADE-PLAN.md)

| VLAN | Name | Subnet | Purpose |
|------|------|--------|---------|
| 1 | Default | 10.4.2.0/24 | Management, trusted PCs, Proxmox hosts |
| 10 | Servers | 10.4.10.0/24 | Server containers, NAS |
| 20 | IoT | 10.4.20.0/24 | Cameras, smart home, Home Assistant |
| 30 | Guest | 10.4.30.0/24 | Guest WiFi, isolated |

**VLAN Tagging Methods**:
- WiFi: UniFi APs (SSID → VLAN mapping)
- Cameras: GS308EP (port-based VLAN)
- Containers: Proxmox (bridge VLAN tag)
- Wired PCs: Untagged (VLAN 1 via unmanaged switches)

### Router/Firewall (Planned)

**Decision**: OPNsense VM
on Elantris

**Reason**:
- Free, full-featured firewall/router
- VLAN routing and inter-VLAN firewall rules
- IDS/IPS capability
- Elantris has ample resources (128GB RAM)

**Alternative Considered**: Ubiquiti Dream Machine
- Rejected due to cost and ecosystem lock-in
- OPNsense more flexible for homelab

### 10G Backhaul (Planned)

**Decision**: 10G RJ45 between server closet and basement

**Hardware**: 2× GiGaPlus 6-Port 10G PoE switches ($101 each)

**Why GiGaPlus over UniFi**:
- Native 10G RJ45 (no SFP+ transceivers needed)
- Includes PoE for APs
- $202 total vs $800+ for UniFi equivalent
- Cat6 can handle 10G at house distances (<55m)

### WiFi (Planned)

**Decision**: UniFi APs with mixed models

**Hardware**:
- 1× U6 Enterprise (existing) - server closet/upstairs
- 2× U7 Pro ($189 each) - basement + main floor

**Why UniFi**:
- Multiple SSIDs mapped to VLANs
- Seamless roaming between APs
- Centralized management via controller
- Better than Asus mesh for VLAN support

**Controller**: LXC on Proxmox (free) via community helper script

### Reverse Proxy

**Decision**: Single Traefik instance handles all external access

**Location**: LXC 104 on pm2

**Benefits**:
- Single point for SSL/TLS management
- Automatic Let's Encrypt certificate renewal
- Centralized routing configuration
- DNS-01 challenge for wildcard certificates

### Service Domains

**Pattern**: `.kavcorp.com`

**DNS**: All subdomains point to public IP (99.74.188.161)

**Routing**: Traefik inspects Host header and routes internally

## Storage Architecture

### Media Storage

**Decision**: NFS mount from elantris for all media

**Path**: `/mnt/pve/elantris-media` → elantris `/el-pool/media`

**Reason**:
- Centralized storage
- Accessible from all cluster nodes
- Large capacity (24TB ZFS pool)
- Easy to backup/snapshot

### LXC Root Filesystems

**Decision**: Store on KavNas NFS for most services

**Reason**:
- Easy backups
- Portable between nodes
- Network storage sufficient for most workloads

**Exception**: High I/O
services use local-lvm

## Monitoring & Maintenance

### Configuration Management

**Decision**: Manual configuration with documentation

**Reason**: Small scale doesn't justify Ansible/Terraform complexity

**Trade-off**: Requires disciplined documentation updates

### Backup Strategy

**Decision**: Proxmox built-in backup to KavNas

**Frequency**: [To be determined]

**Retention**: [To be determined]

## Common Patterns

### Adding a New Service Behind Traefik

1. Deploy service with static IP in 10.4.2.0/24 range
2. Create Traefik config in `/etc/traefik/conf.d/.yaml`
3. Use pattern:
   ```yaml
   http:
     routers:
       :
         rule: "Host(`.kavcorp.com`)"
         entryPoints: [websecure]
         service:
         tls:
           certResolver: letsencrypt
     services:
       :
         loadBalancer:
           servers:
             - url: "http://:"
   ```
4. Traefik auto-reloads (no restart needed)
5. Update `docs/INFRASTRUCTURE.md` with service details

### Troubleshooting Permission Issues

1. Check file ownership: `ls -la /path/to/file`
2. Check if 777: `stat /path/to/file`
3. Fix permissions: `chmod -R 777 /path/to/directory`
4. For NZBGet: Verify `UMask=0000` in nzbget.conf
5. For Sonarr/Radarr: Check Settings → Media Management → Set Permissions

### Node SSH Access

**From local machine**:
- User: `kavren`
- Key: `~/.ssh/id_ed25519`

**Between cluster nodes**:
- User: `root`
- Each node has other nodes' keys in `/root/.ssh/authorized_keys`
- Proxmox web UI uses node SSH for shell access

## Known Issues & Workarounds

### Jellyfin Not Seeing Media After Import

**Symptom**: Files imported to `/media/tv` but Jellyfin shows empty

**Cause**: Jellyfin LXC mount not active or permissions wrong

**Fix**:
1. Restart Jellyfin LXC: `pct stop 121 && pct start 121`
2. Verify mount inside LXC: `pct exec 121 -- ls -la /media/tv/`
3.
   Fix permissions if needed: `chmod -R 777 /mnt/pve/elantris-media/tv/`

### Sonarr/Radarr Import Failures

**Symptom**: "Access denied" errors in logs

**Cause**: Permission mismatch between download client and *arr service

**Fix**: Ensure download folder has 777 permissions

## Future Considerations

- [ ] Automated backup strategy
- [ ] Monitoring/alerting system (Prometheus + Grafana?)
- [ ] Consider Authelia for future services without built-in auth
- [ ] Document disaster recovery procedures
- [ ] Consider consolidating Docker hosts
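
The permission checks described under "Troubleshooting Permission Issues" and in the Known Issues fixes (`ls -la`, `stat`, then `chmod -R 777`) can be turned into a read-only audit that lists which entries break the 777 standard before anything is changed. The following is a minimal sketch, not an existing tool in this setup: the function name `find_non_777` and the command-line usage are assumptions made for illustration, and the script only reports, it never modifies permissions.

```python
import os
import stat
import sys
from pathlib import Path


def find_non_777(root):
    """Return (path, mode) pairs under root, including root itself,
    whose permission bits are not exactly 777.

    Hypothetical helper for illustration; it only reads metadata.
    """
    root = Path(root)
    candidates = [root]
    # os.walk does not follow symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            candidates.append(Path(dirpath) / name)

    offenders = []
    for path in candidates:
        # lstat inspects the entry itself rather than a symlink target.
        mode = stat.S_IMODE(os.lstat(path).st_mode)
        if mode != 0o777:
            offenders.append((str(path), oct(mode)))
    return offenders


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print one offending entry per line: octal mode, then path.
    for path, mode in find_non_777(sys.argv[1]):
        print(mode, path)
```

Saved under a name of your choosing and run with a media directory as its only argument (for example `/mnt/pve/elantris-media/tv/`), it prints the mode and path of each entry that would need the `chmod -R 777` fix. An empty result suggests the cause is more likely the mount than permissions.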