Files
proxmox-infra/docs/CHANGELOG.md
kavren ae071a5064 docs: VLAN isolation working, OPNsense WAN cutover complete
- Updated INFRASTRUCTURE.md with VLAN traffic path and required configs
- Updated CHANGELOG.md with WAN cutover and VLAN troubleshooting fixes
- Updated TASKS.md to reflect completed network work
- pm4 bridge VLAN config made persistent via post-up commands
- Pi-hole listeningMode changed to ALL for multi-subnet DNS

Key fixes:
- pm4 vmbr0 bridge-vlan-aware with VLANs 10,20,30 on eno1
- Pi-hole veth added to VLANs for routed traffic
- Pi-hole gateway set to OPNsense (10.4.2.1)
- OPNsense default route fixed to use WAN gateway

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-21 22:29:19 -05:00

13 KiB
Raw Blame History

Changelog

Purpose: Historical record of all significant infrastructure changes

2025-12-21

Traefik Updates

  • UniFi Controller: Added Traefik route

  • OPNsense: Added Traefik route

    • Domain: opnsense.kavcorp.com
    • Backend: https://10.4.2.1
    • Config: /etc/traefik/conf.d/opnsense.yaml
  • Traefik LXC 104: Resized rootfs from 2GB to 4GB (was filling up repeatedly)

OPNsense WAN Configuration

  • pm4 vmbr1: Created new bridge for OPNsense WAN interface

    • Physical NIC: enx6c1ff76e4d47 (USB 2.5G adapter)
    • Added to /etc/network/interfaces on pm4
    • Bridge is UP and connected to switch
  • OPNsense VM 130: Added second network interface

    • net0: vmbr0 (LAN - 10.4.2.0/24)
    • net1: vmbr1 (WAN - to AT&T modem)
    • Ready for WAN cutover when AT&T modem is connected

OPNsense VLAN Configuration (Implemented)

  • VLANs Created on vtnet0 (LAN interface):

    • VLAN 10 (vlan01): Trusted network - 10.4.10.0/24
    • VLAN 20 (vlan02): IoT network - 10.4.20.0/24
    • VLAN 30 (vlan03): Guest network - 10.4.30.0/24
  • VLAN Interfaces Configured:

    • vlan01: 10.4.10.1/24 (gateway for Trusted)
    • vlan02: 10.4.20.1/24 (gateway for IoT)
    • vlan03: 10.4.30.1/24 (gateway for Guest)
  • DHCP Configured on all interfaces:

    • LAN: 10.4.2.100-200, DNS: 10.4.2.129 (Pi-hole)
    • Trusted: 10.4.10.100-200
    • IoT: 10.4.20.100-200
    • Guest: 10.4.30.100-200
  • Firewall Rules Implemented:

    • Allow DNS: IoT/Guest → 10.4.2.129:53 (Pi-hole)
    • Block IoT → LAN: 10.4.20.0/24 → 10.4.2.0/24
    • Block Guest → LAN: 10.4.30.0/24 → 10.4.2.0/24
    • Block Guest → IoT: 10.4.30.0/24 → 10.4.20.0/24
    • Allow Home Assistant → IoT: 10.4.2.62 → 10.4.20.0/24
    • Allow IoT/Guest → Internet
  • Note: Unmanaged Gigabyte switches pass VLAN tags through (they just don't understand them). UniFi APs tag traffic per SSID, OPNsense receives tagged traffic on VLAN interfaces.

  • Documentation Updated:

    • DECISIONS.md: Complete VLAN architecture and firewall rules
    • INFRASTRUCTURE.md: VLANs and subnets table, pm4 bridges

OPNsense WAN Cutover (Completed)

  • Connected USB NIC (vmbr1) to AT&T modem
  • WAN IP: 192.168.1.183 (DHCP from AT&T gateway 192.168.1.254)
  • Fixed default route to use WAN gateway instead of Asus
  • Internet working through OPNsense

VLAN Troubleshooting & Fixes

  • pm4 vmbr0: Added bridge-vlan-aware yes to enable VLAN filtering
  • Bridge VLAN Memberships: Added VLANs 10, 20, 30 to eno1 and tap130i0
    • Made persistent via post-up commands in /etc/network/interfaces
  • Pi-hole veth: Added VLANs 10, 20, 30 to veth103i0 for routed traffic
  • OPNsense VLANs: Rebooted to fix broken vlan02/vlan03 parent interface
  • Trusted VLAN Firewall: Added allow-all rule for opt2 (Trusted)
  • Pi-hole listeningMode: Changed from "LOCAL" to "ALL" in pihole.toml
    • Required for Pi-hole to accept DNS queries from non-local subnets
  • Pi-hole Gateway: Set to 10.4.2.1 (OPNsense) for proper return routing

Verified Working

  • All VLANs (10, 20, 30) receiving DHCP from OPNsense
  • DNS resolution via Pi-hole from all VLANs
  • Internet access from all VLANs
  • Firewall isolation rules in place

2025-12-19

Network Upgrade Progress

  • UniFi Controller: Deployed LXC 111 on pm4 for AP management

    • IP: 10.4.2.242 (DHCP, will be assigned static via OPNsense later)
    • Port: 8443 (HTTPS web UI)
    • Deployed via ProxmoxVE community helper script
    • Configured 3 SSIDs: KavCorp-Trusted, KavCorp-IOT (2.4GHz only), KavCorp-Guest
  • OPNsense: Deployed VM 130 on pm4 as future router/firewall

    • Hostname: KavSense
    • IP: 10.4.2.1 (WAN interface, static)
    • Gateway: 10.4.2.254 (Asus router as upstream during transition)
    • Memory: 8GB, 2 vCPU, 32GB disk
    • VLAN 10 interface configured: 10.4.10.1/24 with DHCP (10.4.10.100-200)
    • Web UI: https://10.4.2.1
    • Status: Running, ready for migration when GiGaPlus switches arrive
  • pm4 vmbr0: Enabled VLAN-aware bridge for VLAN support

  • VLAN Testing: Attempted VLAN 10 through existing Netgear GS308EP

    • GS308EP trunk mode configuration unsuccessful
    • Decision: Wait for GiGaPlus 10G switches for proper VLAN support
    • UniFi VLAN10-Test network created, ready for use

2025-12-18

Service Additions

  • Pi-hole: Added network-wide ad blocker with recursive DNS
    • LXC 103 on pm4
    • IP: 10.4.2.129
    • Domain: pihole.kavcorp.com
    • Unbound configured for recursive DNS resolution
    • Traefik config: /etc/traefik/conf.d/pihole.yaml
    • Deployed via ProxmoxVE community helper script
    • Tagged: adblock, dns

Planning

  • Network Upgrade Plan: Created comprehensive plan for network overhaul
    • Replace Asus mesh with UniFi APs (U6 Enterprise existing + 2× U7 Pro)
    • Add 10G backhaul between server closet and basement
    • Hardware: 2× GiGaPlus 10G PoE switches ($202), 2× U7 Pro ($378)
    • Total estimated cost: ~$580
    • VLAN segmentation: Trusted (1), Servers (10), IoT (20), Guest (30)
    • OPNsense VM on Elantris for routing/firewall
    • UniFi Controller LXC for AP management
    • See docs/NETWORK-UPGRADE-PLAN.md for full details

2025-12-15

Frigate Migration & Upgrade

  • Frigate: Migrated from source install (LXC 111) to Docker-based (LXC 128)

    • Old: LXC 111 on pm3 (source install, 0.14.1)
    • New: LXC 128 on pm3 (Docker, 0.17.0-beta1)
    • IP: 10.4.2.8
    • Domain: frigate.kavcorp.com
    • Privileged LXC required for USB device passthrough (Coral TPU)
    • Coral USB TPU successfully passed through
    • NFS mount for media storage: /mnt/pve/KavNas/frigate-media
  • Frigate Configuration Updates:

    • Enabled built-in authentication (port 8971)
    • Updated MQTT to correct Home Assistant IP (10.4.2.199)
    • Consolidated camera configs using global defaults
    • Fixed garage stream bug (was using wrong ffmpeg source)
    • Added stationary car filtering (stops tracking after 30 seconds)
  • Traefik Updates:

    • Updated Frigate route to use HTTPS backend (port 8971)
    • Added serversTransport for self-signed cert (insecureSkipVerify)
    • Fixed disk full issue (removed 903MB old access log)
    • Added logrotate config: 50MB max, 3 rotations, daily

Service Recovery

  • Power Outage Recovery: Started all stopped LXCs on pm2, pm3, pm4
  • VM 109 (docker-pm3): Fixed missing onboot setting

Infrastructure Notes

  • LXC 111 (old Frigate) pending deletion after new setup confirmed
  • Port 5000 on Frigate remains available for Home Assistant integration (unauthenticated)
  • Admin credentials logged on first auth-enabled startup

2025-12-08

Service Configuration

  • Shinobi (LXC 103): Configured NVR storage and Traefik endpoint
    • Added to Traefik reverse proxy: shinobi.kavcorp.com
    • Traefik config: /etc/traefik/conf.d/shinobi.yaml
    • Created NFS storage on elantris (/el-pool/shinobi) - 11TB available
    • Added Proxmox NFS storage: elantris-shinobi
    • Mounted NFS to LXC 103: /opt/Shinobi/videos
    • Coral USB TPU device passed through to container
    • Coral object detection plugin attempted but blocked by TensorFlow Lite unavailability for Ubuntu 24.04/Python 3.12
    • Motion detection available and working

Notes

  • Coral TPU native plugin requires building TensorFlow Lite from source, which is complex for Ubuntu 24.04
  • Basic motion detection works out of the box for event recording
  • Object detection may require alternative approach (Frigate, or CPU-based detection)

2025-12-07

Service Additions

  • Vaultwarden: Created new password manager LXC

    • LXC 125 on pm4
    • IP: 10.4.2.212
    • Domain: vtw.kavcorp.com
    • Traefik config: /etc/traefik/conf.d/vaultwarden.yaml
    • Tagged: community-script, password-manager
  • Immich: Migrated from Docker (dockge LXC 107 on pm3) to native LXC

    • LXC 126 on pm4
    • IP: 10.4.2.24:2283
    • Domain: immich.kavcorp.com
    • Traefik config: /etc/traefik/conf.d/immich.yaml
    • Library storage: NFS mount from elantris (/el-pool/downloads/immich/)
    • 38GB photo library transferred via rsync
    • Fresh database (version incompatibility: old v1.129.0 → new v2.3.1)
    • Services: immich-web.service, immich-ml.service
    • Tagged: community-script, photos
  • Gitea: Added self-hosted Git server

    • LXC 127 on pm4
    • IP: 10.4.2.7:3000
    • Domain: git.kavcorp.com
    • Traefik config: /etc/traefik/conf.d/gitea.yaml
    • Config: /etc/gitea/app.ini
    • Push-to-create enabled for users and orgs
    • Initial repo: proxmox-infra (infrastructure documentation)
    • Tagged: community-script, git

Infrastructure Maintenance

  • Traefik (LXC 104): Fixed disk full issue
    • Truncated 895MB access log that filled 2GB rootfs
    • Added logrotate config to prevent recurrence (50MB max, 7 day rotation)
    • Cleaned apt cache and journal logs

2025-11-20

Service Changes

  • AMP: Added to Traefik reverse proxy
    • LXC 124 on elantris (10.4.2.26:8080)
    • Domain: amp.kavcorp.com
    • Traefik config: /etc/traefik/conf.d/amp.yaml
    • Purpose: Game server management via CubeCoders AMP

2025-11-19

Service Changes

  • LXC 123 (elantris): Migrated from Ollama to llama.cpp

    • Removed Ollama installation and service
    • Built llama.cpp from source with CURL support
    • Downloaded TinyLlama 1.1B Q4_K_M model (~667MB)
    • Created systemd service for llama.cpp server
    • Server running on port 11434 (OpenAI-compatible API)
    • Model path: /opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
    • Service: llama-cpp.service
    • Domain remains: ollama.kavcorp.com (pointing to llama.cpp now)
  • LXC 124 (elantris): Created new AMP (Application Management Panel) container

    • IP: 10.4.2.26
    • Resources: 4 CPU cores, 4GB RAM, 16GB storage
    • Storage: local-lvm on elantris
    • OS: Ubuntu 24.04 LTS
    • Purpose: Game server management via CubeCoders AMP
    • Tagged: gaming, amp

2025-11-17

Service Additions

  • Ollama: Added to Traefik reverse proxy

    • LXC 123 on elantris
    • IP: 10.4.2.224:11434
    • Domain: ollama.kavcorp.com
    • Traefik config: /etc/traefik/conf.d/ollama.yaml
    • Downloaded Qwen 3 Coder 30B model
  • Frigate: Added to Traefik reverse proxy

    • LXC 111 on pm3
    • IP: 10.4.2.215:5000
    • Domain: frigate.kavcorp.com
    • Traefik config: /etc/traefik/conf.d/frigate.yaml
  • Foundry VTT: Added to Traefik reverse proxy

    • LXC 112 on pm3
    • IP: 10.4.2.37:30000
    • Domain: vtt.kavcorp.com
    • Traefik config: /etc/traefik/conf.d/foundry.yaml

Infrastructure Changes

  • SSH Access: Regenerated SSH keys on pm2 and distributed to all cluster nodes
    • pm3 SSH service was down, enabled and configured
    • All nodes (pm1, pm2, pm3, pm4, elantris) now accessible from pm2 via Proxmox web UI

Service Configuration

  • NZBGet: Fixed file permissions

    • Set UMask=0000 in nzbget.conf to create files with 777 permissions
    • Fixed permission issues causing Sonarr import failures
  • Sonarr: Enabled automatic permission setting

    • Media Management → Set Permissions → chmod 777
    • Ensures imported files are accessible by Jellyfin
  • Jellyseerr: Fixed Traefik routing

    • Corrected IP from 10.4.2.20 to 10.4.2.18 in media-services.yaml
  • Jellyfin: Fixed LXC mount issues

    • Restarted LXC 121 to activate media mounts
    • Media now visible in /media/tv, /media/movies, /media/anime

Documentation

  • Major Reorganization: Consolidated scattered docs into structured system
    • Created README.md - Documentation index and guide
    • Created INFRASTRUCTURE.md - All infrastructure details
    • Created CONFIGURATIONS.md - Service configurations
    • Created DECISIONS.md - Architecture decisions and patterns
    • Created TASKS.md - Current and pending tasks
    • Created CHANGELOG.md - This file
    • Updated CLAUDE.md - Added documentation policy

2025-11-16

Service Deployments

  • Home Assistant: Added to Traefik reverse proxy

    • Domain: hass.kavcorp.com
    • Configured trusted proxies in Home Assistant
  • Frigate: Added to Traefik reverse proxy

    • Domain: frigate.kavcorp.com
  • Proxmox: Added to Traefik reverse proxy

    • Domain: pm.kavcorp.com
    • Backend: pm2 (10.4.2.6:8006)
  • Recyclarr: Configured TRaSH Guides automation

    • Sonarr and Radarr quality profiles synced
    • Dolby Vision blocking implemented
    • Daily sync schedule via cron

Configuration Changes

  • Traefik: Removed Authelia from *arr services
    • Services now use only built-in authentication
    • Simplified access for Sonarr, Radarr, Prowlarr, Bazarr, Whisparr, NZBGet

Issues Encountered

  • Media organization script moved files incorrectly
  • Sonarr database corruption (lost TV series tracking)
  • Permission issues with NZBGet downloads
  • Jellyfin LXC mount not active after deployment

Lessons Learned

  • Always verify file permissions (777 required for NFS media)
  • Backup service databases before running automation scripts
  • LXC mounts may need container restart to activate
  • Traefik auto-reloads configs, no restart needed

Earlier History

To be documented from previous sessions if needed