proxmox-infra/docs/DECISIONS.md
kavren 3674bcc147 docs: Update network plan - OPNsense on pm4 with USB NIC
- OPNsense moves to pm4 (server closet, next to AT&T modem)
- USB 2.5G NIC for WAN (~$25), Intel I226-V for LAN
- pm4 has USB 3.1 (10Gbps) - verified
- Updated topology diagram with pm4/OPNsense placement
- Total cost now ~$605

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-18 12:41:38 -05:00

Architecture Decisions & Patterns

Purpose: Record of important decisions, patterns, and "why we do it this way"
Update Frequency: When making significant architectural choices

Service Organization

Authentication Strategy

Decision: Services use their own built-in authentication, not Authelia
Reason: Most *arr services and media tools have robust auth systems
Exception: Consider Authelia for future services that lack authentication

LXC vs Docker

Keep in Docker:

  • NZBGet (requires specific volume mapping, works well in Docker)
  • Multi-container stacks
  • Services requiring Docker-specific features

Migrate to LXC:

  • Single-purpose services (Sonarr, Radarr, etc.)
  • Services benefiting from isolation
  • Stateless applications

File Permissions

Media Files

Standard: All media files and folders must be 777
Reason:

  • NFS mounts between multiple systems with different UID mappings
  • Jellyfin runs in LXC with UID namespace mapping (100107)
  • Sonarr runs in LXC with different UID mapping
  • NZBGet runs in Docker with UID 1000

Implementation:

  • NZBGet: UMask=0000 to create files with 777
  • Sonarr: Media management → Set permissions → chmod 777
  • Manual fixes: chmod -R 777 on media directories as needed
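
The manual fix can be narrowed so that it only touches entries that are not already 777, which avoids rewriting metadata for a whole library over NFS. The following is a minimal sketch; `fix_media_perms` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Sketch: reset only the entries whose mode is not already exactly 777.
# fix_media_perms is a hypothetical helper, introduced here for illustration.
fix_media_perms() {
    dir=$1
    # "! -perm 777" matches anything whose mode differs from 777,
    # so entries that are already correct are left untouched.
    find "$dir" ! -perm 777 -exec chmod 777 {} +
}

# Example, run on a host that has the media share mounted:
# fix_media_perms /mnt/pve/elantris-media/tv
```

The effect is the same as `chmod -R 777` for any entry that needed fixing; the only difference is that correct entries are skipped.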

Network Architecture

VLAN Strategy (Planned)

Decision: Segment network into 4 VLANs
See: NETWORK-UPGRADE-PLAN.md

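| VLAN | Name    | Subnet       | Purpose                                |
|------|---------|--------------|----------------------------------------|
| 1    | Default | 10.4.2.0/24  | Management, trusted PCs, Proxmox hosts |
| 10   | Servers | 10.4.10.0/24 | Server containers, NAS                 |
| 20   | IoT     | 10.4.20.0/24 | Cameras, smart home, Home Assistant    |
| 30   | Guest   | 10.4.30.0/24 | Guest WiFi, isolated                   |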

VLAN Tagging Methods:

  • WiFi: UniFi APs (SSID → VLAN mapping)
  • Cameras: GS308EP (port-based VLAN)
  • Containers: Proxmox (bridge VLAN tag)
  • Wired PCs: Untagged (VLAN 1 via unmanaged switches)
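
For the container case, the bridge VLAN tag is set in the container's network option. The sketch below builds a `net0` string from a VLAN ID in the table above. Several details are assumptions not stated in this document: the bridge is called `vmbr0` and is VLAN-aware, each subnet's gateway sits at `.1`, and VLAN 1 stays untagged. `lxc_net0` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: build a Proxmox net0 option string for a container on a planned VLAN.
# Assumptions: bridge vmbr0 (VLAN-aware), gateway at .1, VLAN 1 left untagged.
lxc_net0() {
    vlan=$1   # VLAN ID from the VLAN table
    host=$2   # last octet of the container's address
    case $vlan in
        1)        third=2 ;;       # VLAN 1 uses 10.4.2.0/24
        10|20|30) third=$vlan ;;   # the other VLANs use 10.4.<vlan>.0/24
        *)        echo "unknown VLAN: $vlan" >&2; return 1 ;;
    esac
    if [ "$vlan" -eq 1 ]; then
        tag=""
    else
        tag=",tag=$vlan"
    fi
    printf 'name=eth0,bridge=vmbr0,ip=10.4.%s.%s/24,gw=10.4.%s.1%s\n' \
        "$third" "$host" "$third" "$tag"
}

# Usage on a Proxmox node (container ID left as a placeholder):
# pct set <vmid> -net0 "$(lxc_net0 10 50)"
```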

Router/Firewall (Planned)

Decision: OPNsense VM on pm4 (server closet)
Reason:

  • Free, full-featured firewall/router
  • VLAN routing and inter-VLAN firewall rules
  • IDS/IPS capability
  • pm4 is in server closet next to AT&T modem (avoids routing WAN over backhaul)
  • pm4 has Intel I226-V (2.5G) + USB 3.1 for second NIC

Network Interfaces:

  • WAN: USB 2.5G NIC (~$25) → AT&T modem
  • LAN: Intel I226-V → GiGaPlus switch (VLAN trunk)

Alternative Considered: Ubiquiti Dream Machine

  • Rejected due to cost and ecosystem lock-in
  • OPNsense more flexible for homelab

Alternative Considered: OPNsense on Elantris (basement)

  • Rejected because WAN would need to traverse 10G backhaul
  • Would require managed switches for WAN VLAN isolation

10G Backhaul (Planned)

Decision: 10G RJ45 between server closet and basement
Hardware: 2× GiGaPlus 6-Port 10G PoE switches ($101 each)
Why GiGaPlus over UniFi:

  • Native 10G RJ45 (no SFP+ transceivers needed)
  • Includes PoE for APs
  • $202 total vs $800+ for UniFi equivalent
  • Cat6 supports 10GBASE-T up to 55 m, which covers the cable runs in the house

WiFi (Planned)

Decision: UniFi APs with mixed models
Hardware:

  • 1× U6 Enterprise (existing) - server closet/upstairs
  • 2× U7 Pro ($189 each) - basement + main floor

Why UniFi:

  • Multiple SSIDs mapped to VLANs
  • Seamless roaming between APs
  • Centralized management via controller
  • Better than Asus mesh for VLAN support

Controller: LXC on Proxmox (free) via community helper script
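
As a quick sanity check on the planned purchases, the prices quoted in the network sections above can be added up. This sketch uses only figures from this document; the USB NIC price is the approximate one given, and the existing U6 Enterprise is not counted.

```shell
#!/bin/sh
# Prices as listed in this document (USD).
switches=$((2 * 101))   # 2x GiGaPlus 6-Port 10G PoE switch
aps=$((2 * 189))        # 2x U7 Pro (the U6 Enterprise is already owned)
usb_nic=25              # USB 2.5G NIC for WAN, approximate
total=$((switches + aps + usb_nic))
echo "switches=$switches aps=$aps usb_nic=$usb_nic total=$total"
# prints: switches=202 aps=378 usb_nic=25 total=605
```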

Reverse Proxy

Decision: Single Traefik instance handles all external access
Location: LXC 104 on pm2
Benefits:

  • Single point for SSL/TLS management
  • Automatic Let's Encrypt certificate renewal
  • Centralized routing configuration
  • DNS-01 challenge for wildcard certificates

Service Domains

Pattern: <service>.kavcorp.com
DNS: All subdomains point to public IP (99.74.188.161)
Routing: Traefik inspects Host header and routes internally
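
Because routing depends only on the Host header, a route can be checked from inside the LAN without going through public DNS. The sketch below is one way to do that with curl. `service_host` and `check_route` are hypothetical helper names, and `TRAEFIK_IP` is an assumption standing in for the address of the Traefik LXC, which this document does not list.

```shell
#!/bin/sh
# Sketch: verify that Traefik routes a hostname, bypassing DNS.
# TRAEFIK_IP is an assumed variable; set it to the Traefik container's address.
service_host() {
    # Apply the <service>.kavcorp.com naming pattern.
    printf '%s.kavcorp.com\n' "$1"
}

check_route() {
    host=$(service_host "$1")
    # --resolve pins the hostname to the given IP, so the request carries
    # the expected Host header and TLS SNI without a DNS lookup.
    curl -sS -o /dev/null -w '%{http_code}\n' \
        --resolve "$host:443:$TRAEFIK_IP" "https://$host/"
}

# Usage: TRAEFIK_IP=<traefik-lxc-ip> check_route sonarr
```

The function prints the HTTP status code returned for the hostname; what counts as a healthy code depends on the service behind it.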

Storage Architecture

Media Storage

Decision: NFS mount from elantris for all media
Path: /mnt/pve/elantris-media → elantris /el-pool/media
Reason:

  • Centralized storage
  • Accessible from all cluster nodes
  • Large capacity (24TB ZFS pool)
  • Easy to backup/snapshot

LXC Root Filesystems

Decision: Store on KavNas NFS for most services
Reason:

  • Easy backups
  • Portable between nodes
  • Network storage sufficient for most workloads

Exception: High I/O services use local-lvm

Monitoring & Maintenance

Configuration Management

Decision: Manual configuration with documentation
Reason: Small scale doesn't justify Ansible/Terraform complexity
Trade-off: Requires disciplined documentation updates

Backup Strategy

Decision: Proxmox built-in backup to KavNas
Frequency: [To be determined]
Retention: [To be determined]

Common Patterns

Adding a New Service Behind Traefik

  1. Deploy service with static IP in 10.4.2.0/24 range
  2. Create Traefik config in /etc/traefik/conf.d/<service>.yaml
  3. Use pattern:
    http:
      routers:
        <service>:
          rule: "Host(`<service>.kavcorp.com`)"
          entryPoints: [websecure]
          service: <service>
          tls:
            certResolver: letsencrypt
      services:
        <service>:
          loadBalancer:
            servers:
              - url: "http://<ip>:<port>"
    
  4. Traefik auto-reloads (no restart needed)
  5. Update docs/INFRASTRUCTURE.md with service details
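
Step 3 can be scripted so the pattern is filled in consistently for each new service. The following is a minimal sketch that prints the config for a given service name, IP, and port; `traefik_route` is a hypothetical helper name, and the example values in the usage line are placeholders rather than real assignments.

```shell
#!/bin/sh
# Sketch: emit the Traefik dynamic config from the pattern above.
# traefik_route is a hypothetical helper, introduced here for illustration.
traefik_route() {
    svc=$1; ip=$2; port=$3
    # Backticks are escaped so the heredoc prints them literally
    # instead of running a command substitution.
    cat <<EOF
http:
  routers:
    ${svc}:
      rule: "Host(\`${svc}.kavcorp.com\`)"
      entryPoints: [websecure]
      service: ${svc}
      tls:
        certResolver: letsencrypt
  services:
    ${svc}:
      loadBalancer:
        servers:
          - url: "http://${ip}:${port}"
EOF
}

# Usage inside the Traefik LXC (name, IP, and port are placeholders):
# traefik_route myservice 10.4.2.50 8080 > /etc/traefik/conf.d/myservice.yaml
```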

Troubleshooting Permission Issues

  1. Check file ownership: ls -la /path/to/file
  2. Check if 777: stat /path/to/file
  3. Fix permissions: chmod -R 777 /path/to/directory
  4. For NZBGet: Verify UMask=0000 in nzbget.conf
  5. For Sonarr/Radarr: Check Settings → Media Management → Set Permissions
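
The checks in steps 1, 2, and 4 can be sketched as two small read-only helpers, useful for seeing what is wrong before changing anything. `list_not_777` and `umask_ok` are hypothetical names, and the path to `nzbget.conf` is left to the caller because this document does not give it.

```shell
#!/bin/sh
# Sketch: read-only diagnostics for the permission checklist.
list_not_777() {
    # Show owner, group, and mode for every entry under $1
    # whose mode is not exactly 777. Prints nothing if all is well.
    find "$1" ! -perm 777 -exec ls -ld {} +
}

umask_ok() {
    # Succeeds only if the given nzbget.conf has an active UMask=0000 line.
    grep -q '^UMask=0000[[:space:]]*$' "$1"
}

# Usage:
# list_not_777 /path/to/directory
# umask_ok /path/to/nzbget.conf && echo "UMask OK"
```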

Node SSH Access

From local machine:

  • User: kavren
  • Key: ~/.ssh/id_ed25519

Between cluster nodes:

  • User: root
  • Each node has other nodes' keys in /root/.ssh/authorized_keys
  • Proxmox web UI uses node SSH for shell access

Known Issues & Workarounds

Jellyfin Not Seeing Media After Import

Symptom: Files imported to /media/tv but Jellyfin shows empty
Cause: Jellyfin LXC mount not active or permissions wrong
Fix:

  1. Restart Jellyfin LXC: pct stop 121 && pct start 121
  2. Verify mount inside LXC: pct exec 121 -- ls -la /media/tv/
  3. Fix permissions if needed: chmod -R 777 /mnt/pve/elantris-media/tv/
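
Steps 1 and 2 can be combined into one function, sketched below. The `PCT` variable is an assumption added for illustration: it defaults to the real `pct` binary, and setting `PCT=echo` gives a dry run that only prints the commands. `jellyfin_recover` is a hypothetical helper name. The permission fix in step 3 is left as a separate manual step.

```shell
#!/bin/sh
# Sketch: restart the container, then list the media path from inside it.
# PCT defaults to pct; set PCT=echo to print the commands without running them.
PCT=${PCT:-pct}

jellyfin_recover() {
    ctid=$1    # container ID (121 in this document)
    media=$2   # media path inside the container, e.g. /media/tv/
    # Each step runs only if the previous one succeeded.
    "$PCT" stop "$ctid" &&
    "$PCT" start "$ctid" &&
    "$PCT" exec "$ctid" -- ls -la "$media"
}

# Usage on the Proxmox node hosting the container:
# jellyfin_recover 121 /media/tv/
```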

Sonarr/Radarr Import Failures

Symptom: "Access denied" errors in logs
Cause: Permission mismatch between download client and *arr service
Fix: Ensure download folder has 777 permissions

Future Considerations

  • Automated backup strategy
  • Monitoring/alerting system (Prometheus + Grafana?)
  • Consider Authelia for future services without built-in auth
  • Document disaster recovery procedures
  • Consider consolidating Docker hosts