# Architecture Decisions & Patterns

> **Purpose**: Record of important decisions, patterns, and "why we do it this way"
> **Update Frequency**: When making significant architectural choices

## Service Organization

### Authentication Strategy

**Decision**: Services use their own built-in authentication, not Authelia
**Reason**: Most *arr services and media tools have robust auth systems
**Exception**: Consider Authelia for future services that lack authentication

### LXC vs Docker

**Keep in Docker**:
- NZBGet (requires specific volume mapping, works well in Docker)
- Multi-container stacks
- Services requiring Docker-specific features

**Migrate to LXC**:
- Single-purpose services (Sonarr, Radarr, etc.)
- Services benefiting from isolation
- Stateless applications

## File Permissions

### Media Files

**Standard**: All media files and folders must be 777
**Reason**: The same files are written and read by several systems whose UIDs don't line up, so no single owner/group works everywhere:
- NFS mounts are shared between multiple systems with different UID mappings
- Jellyfin runs in LXC with UID namespace mapping (100107)
- Sonarr runs in LXC with a different UID mapping
- NZBGet runs in Docker with UID 1000

**Implementation**:
- NZBGet: `UMask=0000`, so nothing is masked and new files and folders are created world-writable
- Sonarr: Settings → Media Management → Set Permissions → chmod 777
- Manual fixes: `chmod -R 777` on media directories as needed

## Network Architecture

### Network Isolation Strategy

**Goal**: Isolate the IoT (KavCorp-IOT) and Guest (KavCorp-Guest) WiFi networks from the main LAN, while still allowing the Smart Home VMs to reach IoT devices.

#### Constraint: Unmanaged Gigabyte Switches

The Gigabyte 10G switches provide 10G backhaul and 2.5G PoE to the UniFi APs, but they are **unmanaged** and don't support VLAN tagging. As a result, VLAN tags from the UniFi APs are stripped when traffic passes through them.
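With VLAN tags stripped, the router can only tell segments apart by the source address each client was given, which is why the workaround below relies on L3 firewall rules. The POSIX shell sketch below models the planned OPNsense rule table as a pure function of source and destination IP; it is an illustration, not OPNsense configuration. The Smart Home VM addresses in `SMART_HOME_VMS` are hypothetical placeholders, and treating every address outside the three /24 subnets as "Internet" is an assumption.

```shell
#!/bin/sh
# Sketch only: models the planned OPNsense rule table as a function of
# source and destination IP. This is not OPNsense configuration.

# Hypothetical Smart Home VM addresses (the real IPs are not recorded here).
SMART_HOME_VMS="10.4.2.50 10.4.2.51"

# Map an IPv4 address to a segment by its first three octets (/24 subnets).
segment_of() {
  case "$1" in
    10.4.2.*)  echo lan ;;
    10.4.10.*) echo iot ;;
    10.4.20.*) echo guest ;;
    *)         echo internet ;;  # assumption: everything else is "Internet"
  esac
}

# Succeed if the address is one of the (hypothetical) Smart Home VMs.
is_smart_home_vm() {
  for vm in $SMART_HOME_VMS; do
    [ "$vm" = "$1" ] && return 0
  done
  return 1
}

# Print allow, block, or unspecified for a SRC -> DST flow.
decide() {
  src=$(segment_of "$1")
  dst=$(segment_of "$2")
  if is_smart_home_vm "$1" && [ "$dst" = iot ]; then
    echo allow
    return
  fi
  case "$src:$dst" in
    iot:lan | guest:lan | guest:iot) echo block ;;
    iot:internet | guest:internet)   echo allow ;;
    *)                               echo unspecified ;;  # not in the planned table
  esac
}

decide 10.4.10.23 10.4.2.15   # block: IoT -> LAN
decide 10.4.2.50 10.4.10.23   # allow: Smart Home VM -> IoT
decide 10.4.20.9 1.1.1.1      # allow: Guest -> Internet
decide 10.4.2.99 10.4.2.15    # unspecified: a self-assigned LAN address is never blocked
```

The last call illustrates the spoofing limitation of this approach: because the decision depends only on the source address, a device that gives itself a 10.4.2.x address is treated as a LAN client.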
**Workaround**: DHCP-based isolation (L3 firewall rules instead of L2 VLANs)

#### IP Subnet Scheme

| Subnet | Range | Purpose | DHCP Source |
|--------|-------|---------|-------------|
| Main LAN | 10.4.2.0/24 | Trusted devices, Proxmox hosts, services | OPNsense |
| IoT | 10.4.10.0/24 | KavCorp-IOT SSID devices | OPNsense or UniFi |
| Guest | 10.4.20.0/24 | KavCorp-Guest SSID devices | OPNsense or UniFi |

#### OPNsense Firewall Rules (Planned)

| Source | Destination | Action | Notes |
|--------|-------------|--------|-------|
| 10.4.10.0/24 (IoT) | 10.4.2.0/24 (LAN) | **Block** | Isolate IoT from LAN |
| 10.4.20.0/24 (Guest) | 10.4.2.0/24 (LAN) | **Block** | Isolate Guest from LAN |
| 10.4.20.0/24 (Guest) | 10.4.10.0/24 (IoT) | **Block** | Isolate Guest from IoT |
| Smart Home VMs | 10.4.10.0/24 (IoT) | **Allow** | Home Assistant → IoT devices |
| 10.4.10.0/24 (IoT) | Internet | **Allow** | IoT internet access |
| 10.4.20.0/24 (Guest) | Internet | **Allow** | Guest internet access |

#### Limitations of DHCP Workaround

- **Not true L2 isolation**: All traffic shares the same broadcast domain
- **IP spoofing possible**: A malicious device could self-assign an address in the LAN range and reach LAN hosts directly, bypassing OPNsense
- **Sufficient for**: IoT devices and guests (low-threat actors)
- **Future upgrade**: Replace the Gigabyte switches with managed 2.5G PoE switches for proper VLANs

#### VLAN IDs (For Future Reference)

| VLAN | Name | Subnet | Purpose |
|------|------|--------|---------|
| 1 | Default | 10.4.2.0/24 | Management, trusted PCs, Proxmox hosts |
| 10 | IoT | 10.4.10.0/24 | IoT devices, cameras, smart home |
| 20 | Guest | 10.4.20.0/24 | Guest WiFi, isolated |

### Router/Firewall

**Decision**: OPNsense VM 130 on pm4 (server closet)
**Status**: Deployed, pending WAN cutover
**Reason**:
- Free, full-featured firewall/router
- Inter-subnet firewall rules for IoT/Guest isolation
- IDS/IPS capability
- pm4 is in the server closet next to the AT&T modem (avoids routing WAN over the backhaul)

**Network Interfaces (VM 130)**:

| Interface | Bridge | Purpose | Status |
|-----------|--------|---------|--------|
| net0 | vmbr0 | LAN (10.4.2.0/24) | Configured |
| net1 | vmbr1 | WAN (to AT&T modem) | Configured |

**pm4 Bridge Configuration**:

| Bridge | Physical NIC | Purpose |
|--------|--------------|---------|
| vmbr0 | eno1 (Intel I226-V) | LAN - all VMs/LXCs |
| vmbr1 | enx6c1ff76e4d47 (USB 2.5G) | WAN - OPNsense only |

**HA/Failover Consideration**:
- Current: Single OPNsense on pm4 (single point of failure)
- Future options:
  1. OPNsense HA with CARP (requires a second USB NIC on another node)
  2. Keep the current router as a cold standby (swap cables if pm4 fails)

**Alternative Considered**: Ubiquiti Dream Machine
- Rejected due to cost and ecosystem lock-in
- OPNsense is more flexible for a homelab

**Alternative Considered**: OPNsense on Elantris (basement)
- Rejected because WAN traffic would need to traverse the 10G backhaul
- Would require managed switches for WAN VLAN isolation

### 10G Backhaul (Planned)

**Decision**: 10G RJ45 between server closet and basement
**Hardware**: 2× GiGaPlus 6-Port 10G PoE switches ($101 each)
**Why GiGaPlus over UniFi**:
- Native 10G RJ45 (no SFP+ transceivers needed)
- Includes PoE for APs
- $202 total vs $800+ for a UniFi equivalent
- Cat6 can handle 10G at house distances (<55m)

### WiFi (Planned)

**Decision**: UniFi APs with mixed models
**Hardware**:
- 1× U6 Enterprise (existing) - server closet/upstairs
- 2× U7 Pro ($189 each) - basement + main floor

**Why UniFi**:
- Multiple SSIDs mapped to VLANs
- Seamless roaming between APs
- Centralized management via controller
- Better VLAN support than Asus mesh

**Controller**: LXC on Proxmox (free) via community helper script

### Reverse Proxy

**Decision**: Single Traefik instance handles all external access
**Location**: LXC 104 on pm2
**Benefits**:
- Single point for SSL/TLS management
- Automatic Let's Encrypt certificate renewal
- Centralized routing configuration
- DNS-01 challenge for wildcard certificates

### Service Domains

**Pattern**: `.kavcorp.com`
**DNS**: All subdomains point to public IP (99.74.188.161)
**Routing**: Traefik inspects the Host header and routes internally

## Storage Architecture

### Media Storage

**Decision**: NFS mount from elantris for all media
**Path**: `/mnt/pve/elantris-media` → elantris `/el-pool/media`
**Reason**:
- Centralized storage
- Accessible from all cluster nodes
- Large capacity (24TB ZFS pool)
- Easy to back up/snapshot

### LXC Root Filesystems

**Decision**: Store on KavNas NFS for most services
**Reason**:
- Easy backups
- Portable between nodes
- Network storage is sufficient for most workloads

**Exception**: High-I/O services use local-lvm

## Monitoring & Maintenance

### Configuration Management

**Decision**: Manual configuration with documentation
**Reason**: Small scale doesn't justify Ansible/Terraform complexity
**Trade-off**: Requires disciplined documentation updates

### Backup Strategy

**Decision**: Proxmox built-in backup to KavNas
**Frequency**: [To be determined]
**Retention**: [To be determined]

## Common Patterns

### Adding a New Service Behind Traefik

1. Deploy the service with a static IP in the 10.4.2.0/24 range
2. Create a Traefik config in `/etc/traefik/conf.d/.yaml`
3. Use this pattern:
   ```yaml
   http:
     routers:
       :
         rule: "Host(`.kavcorp.com`)"
         entryPoints: [websecure]
         service:
         tls:
           certResolver: letsencrypt
     services:
       :
         loadBalancer:
           servers:
             - url: "http://:"
   ```
4. Traefik auto-reloads (no restart needed)
5. Update `docs/INFRASTRUCTURE.md` with service details

### Troubleshooting Permission Issues

1. Check file ownership: `ls -la /path/to/file`
2. Check whether the mode is 777: `stat -c '%a' /path/to/file` (prints the octal mode)
3. Fix permissions: `chmod -R 777 /path/to/directory`
4. For NZBGet: Verify `UMask=0000` in nzbget.conf
5. For Sonarr/Radarr: Check Settings → Media Management → Set Permissions

### Node SSH Access

**From local machine**:
- User: `kavren`
- Key: `~/.ssh/id_ed25519`

**Between cluster nodes**:
- User: `root`
- Each node has the other nodes' keys in `/root/.ssh/authorized_keys`
- Proxmox web UI uses node SSH for shell access

## Known Issues & Workarounds

### Jellyfin Not Seeing Media After Import

**Symptom**: Files are imported to `/media/tv` but Jellyfin shows an empty library
**Cause**: Jellyfin LXC mount not active or permissions wrong
**Fix**:
1. Restart the Jellyfin LXC: `pct stop 121 && pct start 121`
2. Verify the mount inside the LXC: `pct exec 121 -- ls -la /media/tv/`
3. Fix permissions on the Proxmox host if needed: `chmod -R 777 /mnt/pve/elantris-media/tv/`

### Sonarr/Radarr Import Failures

**Symptom**: "Access denied" errors in logs
**Cause**: Permission mismatch between the download client and the *arr service
**Fix**: Ensure the download folder has 777 permissions

## Future Considerations

- [ ] Automated backup strategy
- [ ] Monitoring/alerting system (Prometheus + Grafana?)
- [ ] Consider Authelia for future services without built-in auth
- [ ] Document disaster recovery procedures
- [ ] Consider consolidating Docker hosts