proxmox-infra/docs/DECISIONS.md
kavren 120c2ec809 Initial commit: KavCorp infrastructure documentation
2025-12-07 22:07:01 -05:00

Architecture Decisions & Patterns

Purpose: Record of important decisions, patterns, and "why we do it this way"
Update Frequency: When making significant architectural choices

Service Organization

Authentication Strategy

Decision: Services use their own built-in authentication, not Authelia
Reason: Most *arr services and media tools have robust auth systems
Exception: Consider Authelia for future services that lack authentication

LXC vs Docker

Keep in Docker:

  • NZBGet (requires specific volume mapping, works well in Docker)
  • Multi-container stacks
  • Services requiring Docker-specific features

Migrate to LXC:

  • Single-purpose services (Sonarr, Radarr, etc.)
  • Services benefiting from isolation
  • Stateless applications

File Permissions

Media Files

Standard: All media files and folders must have mode 777
Reason:

  • NFS mounts between multiple systems with different UID mappings
  • Jellyfin runs in LXC with UID namespace mapping (100107)
  • Sonarr runs in LXC with different UID mapping
  • NZBGet runs in Docker with UID 1000

Implementation:

  • NZBGet: UMask=0000 so that no permission bits are masked on newly created files and folders
  • Sonarr: Settings → Media Management → Set Permissions → chmod 777
  • Manual fixes: chmod -R 777 on media directories as needed
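
A zero umask removes no bits from the mode the creating program asks for. It does not add any: programs usually request 666 for ordinary files and 777 for directories. The sketch below illustrates this on a scratch directory. The function name `umask_demo` is hypothetical, and `stat -c` assumes GNU coreutils (as on a Proxmox/Debian host).

```shell
# Show the modes a zero umask yields for a new directory and a new file.
# The body runs in a subshell (note the parentheses), so the caller's
# umask is left untouched.
umask_demo() (
    tmp=$(mktemp -d) || exit 1
    umask 0000
    mkdir "$tmp/dir"    # mkdir requests 777; nothing is masked
    : > "$tmp/file"     # shell redirection requests 666; nothing is masked
    echo "dir=$(stat -c '%a' "$tmp/dir") file=$(stat -c '%a' "$tmp/file")"
    rm -rf "$tmp"
)
```

On a typical Linux filesystem, `umask_demo` prints `dir=777 file=666`, which is why the manual chmod step can still be needed for files when a strict 777 is required.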

Network Architecture

Reverse Proxy

Decision: Single Traefik instance handles all external access
Location: LXC 104 on pm2
Benefits:

  • Single point for SSL/TLS management
  • Automatic Let's Encrypt certificate renewal
  • Centralized routing configuration
  • DNS-01 challenge for wildcard certificates

Service Domains

Pattern: <service>.kavcorp.com
DNS: All subdomains point to public IP (99.74.188.161)
Routing: Traefik inspects Host header and routes internally

Storage Architecture

Media Storage

Decision: NFS mount from elantris for all media
Path: /mnt/pve/elantris-media → elantris /el-pool/media
Reason:

  • Centralized storage
  • Accessible from all cluster nodes
  • Large capacity (24 TB ZFS pool)
  • Easy to back up and snapshot

LXC Root Filesystems

Decision: Store on KavNas NFS for most services
Reason:

  • Easy backups
  • Portable between nodes
  • Network storage sufficient for most workloads

Exception: High I/O services use local-lvm

Monitoring & Maintenance

Configuration Management

Decision: Manual configuration with documentation
Reason: Small scale doesn't justify Ansible/Terraform complexity
Trade-off: Requires disciplined documentation updates

Backup Strategy

Decision: Proxmox built-in backup to KavNas
Frequency: [To be determined]
Retention: [To be determined]

Common Patterns

Adding a New Service Behind Traefik

  1. Deploy service with static IP in 10.4.2.0/24 range
  2. Create Traefik config in /etc/traefik/conf.d/<service>.yaml
  3. Use pattern:
    http:
      routers:
        <service>:
          rule: "Host(`<service>.kavcorp.com`)"
          entryPoints: [websecure]
          service: <service>
          tls:
            certResolver: letsencrypt
      services:
        <service>:
          loadBalancer:
            servers:
              - url: "http://<ip>:<port>"
    
  4. Traefik's file provider watches the config directory and reloads automatically (no restart needed)
  5. Update docs/INFRASTRUCTURE.md with service details
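
Steps 2 and 3 can be scripted so the pattern is filled in the same way every time. The following is a minimal sketch, not an existing script in this repo: the function name `render_traefik_conf` is hypothetical, and it only prints the YAML so the result can be reviewed before it is written anywhere.

```shell
# Hypothetical helper: print a Traefik dynamic-config file for one service,
# following the pattern above.
# Usage: render_traefik_conf <service> <ip> <port>
render_traefik_conf() {
    if [ "$#" -ne 3 ]; then
        echo "usage: render_traefik_conf <service> <ip> <port>" >&2
        return 1
    fi
    svc=$1
    ip=$2
    port=$3
    # Unquoted heredoc: variables expand, and \` yields a literal backtick.
    cat <<EOF
http:
  routers:
    ${svc}:
      rule: "Host(\`${svc}.kavcorp.com\`)"
      entryPoints: [websecure]
      service: ${svc}
      tls:
        certResolver: letsencrypt
  services:
    ${svc}:
      loadBalancer:
        servers:
          - url: "http://${ip}:${port}"
EOF
}
```

Example use on the Traefik host, with an illustrative (assumed) IP and port: `render_traefik_conf sonarr 10.4.2.20 8989 > /etc/traefik/conf.d/sonarr.yaml`.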

Troubleshooting Permission Issues

  1. Check file ownership: ls -la /path/to/file
  2. Check whether the mode is 777: stat -c '%a %U:%G' /path/to/file
  3. Fix permissions: chmod -R 777 /path/to/directory
  4. For NZBGet: Verify UMask=0000 in nzbget.conf
  5. For Sonarr/Radarr: Check Settings → Media Management → Set Permissions
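
On a large media tree, it can be quicker to look only for entries that deviate from 777 than to inspect files one by one. A minimal sketch of steps 2 and 3 using `find`; the function names `find_not_777` and `fix_not_777` are hypothetical.

```shell
# List every entry under a directory whose mode is not exactly 777.
# Usage: find_not_777 <directory>
find_not_777() {
    find "$1" ! -perm 777 -print
}

# Set mode 777 only on the entries that deviate, leaving the rest untouched.
# Usage: fix_not_777 <directory>
fix_not_777() {
    find "$1" ! -perm 777 -exec chmod 777 {} +
}
```

For example, `find_not_777 /mnt/pve/elantris-media/tv` lists the offending paths, and `fix_not_777` with the same argument corrects them without rewriting modes that are already right.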

Node SSH Access

From local machine:

  • User: kavren
  • Key: ~/.ssh/id_ed25519

Between cluster nodes:

  • User: root
  • Each node has other nodes' keys in /root/.ssh/authorized_keys
  • Proxmox web UI uses node SSH for shell access

Known Issues & Workarounds

Jellyfin Not Seeing Media After Import

Symptom: Files are imported to /media/tv, but Jellyfin shows an empty library
Cause: Jellyfin LXC mount is not active, or permissions are wrong
Fix:

  1. Restart Jellyfin LXC: pct stop 121 && pct start 121
  2. Verify mount inside LXC: pct exec 121 -- ls -la /media/tv/
  3. Fix permissions if needed: chmod -R 777 /mnt/pve/elantris-media/tv/
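
Since an inactive mount is one of the listed causes, it may help to confirm the host-side NFS mount before restarting the container. A minimal sketch, assuming the util-linux `mountpoint` command is available on the node; the function name `media_mount_ok` is hypothetical.

```shell
# Return 0 if the given path is an active mount point, non-zero otherwise.
# Defaults to the host-side media path used in this document.
media_mount_ok() {
    mountpoint -q "${1:-/mnt/pve/elantris-media}"
}
```

Example use on the node hosting Jellyfin: `media_mount_ok || echo "media mount is not active on $(hostname)"`.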

Sonarr/Radarr Import Failures

Symptom: "Access denied" errors in logs
Cause: Permission mismatch between download client and *arr service
Fix: Ensure the download folder has 777 permissions

Future Considerations

  • Automated backup strategy
  • Monitoring/alerting system (Prometheus + Grafana?)
  • Consider Authelia for future services without built-in auth
  • Document disaster recovery procedures
  • Consider consolidating Docker hosts