# Changelog
> **Purpose**: Historical record of all significant infrastructure changes
## 2026-01-03
### Basement HTPC Added
- Added basement HTPC to infrastructure (10.4.2.190)
- Created SSH profile `htpc` in ~/.ssh/config
- Added new "Clients / Endpoints" section to INFRASTRUCTURE.md for DHCP-range devices
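
For reference, the `htpc` profile might look like this in `~/.ssh/config` (the username and key path are assumptions; only the IP comes from this entry):

```
Host htpc
    HostName 10.4.2.190
    User root                    # assumed login user
    IdentityFile ~/.ssh/id_ed25519
```

`ssh htpc` then connects without spelling out the address.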
## 2026-01-02
### Synology DSM Traefik Route
- Added Traefik route for `dsm.kavcorp.com` → KavNas DSM (10.4.2.13:5001)
- Config: `/etc/traefik/conf.d/dsm.yaml`
- Note: DSM is serving HTTP on port 5001 (not HTTPS), Traefik terminates TLS
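
A `/etc/traefik/conf.d/dsm.yaml` along these lines would match the entry above (a sketch in Traefik's file-provider format; the entrypoint and certResolver names are assumptions):

```yaml
http:
  routers:
    dsm:
      rule: "Host(`dsm.kavcorp.com`)"
      entryPoints:
        - websecure               # assumed HTTPS entrypoint name
      service: dsm
      tls:
        certResolver: letsencrypt # assumed resolver name
  services:
    dsm:
      loadBalancer:
        servers:
          # DSM serves plain HTTP on 5001; Traefik terminates TLS
          - url: "http://10.4.2.13:5001"
```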
## 2025-12-28
### Guest VLAN Traefik Access
- Added firewall rule allowing Guest VLAN to access Traefik (10.4.2.10:443)
- Guests can now use `https://jellyfin.kavcorp.com` etc. with valid certs
### Internal DNS for kavcorp.com Domains
- Added Pi-hole DNS entries for `*.kavcorp.com` pointing to Traefik (10.4.2.10)
- Internal clients can now access `https://jellyfin.kavcorp.com` etc. with valid Let's Encrypt certs
- No port numbers needed; the same URLs work internally and externally
- Also added Traefik `internal` entrypoint on port 8080 for .kav HTTP access (optional)
### Guest VLAN Media Access
- Added firewall rules allowing Guest VLAN to access Jellyseerr (10.4.2.25) and Jellyfin (10.4.2.26)
- Rules inserted before "Block Guest to LAN" to allow media streaming for guests
### Guest VLAN Internet Fix
- Fixed Guest VLAN (10.4.30.0/24) having no internet access
- Root cause: OPNsense DHCP and firewall rules referenced non-existent 10.4.2.129 for DNS
- Fix: Updated all DNS references in OPNsense config.xml from 10.4.2.129 to 10.4.2.11 (Pi-hole)
- Affected: DHCP DNS server settings for all VLANs, firewall DNS allow rules
- Guest clients need DHCP lease renewal to get correct DNS server
### RustDesk Server Deployment
- Deployed RustDesk server LXC 129 on pm2 via ProxmoxVE helper script
- Configured static IP: 10.4.2.36
- Added local DNS: rustdesk.kav
- Public key: `UCLpXJifKwWZRWIPqVkyrVfFH89DE8Ca0iBNZselaSU=`
- Services: hbbs (signal), hbbr (relay), api
- Ports: 21115-21119 (TCP), 21116 (UDP)
### Network Infrastructure Cleanup
#### Static IP Migration Complete
All containers now have static IPs in organized ranges:
- **Core Infrastructure** (10.4.2.10-19): Pi-hole→.11, Authelia→.12, Vaultwarden→.15
- **Media Stack** (10.4.2.20-29): All *arr services, Jellyfin, etc.
- **Services** (10.4.2.30-39): Immich→.30, Gitea→.31, Frigate→.32, Ollama→.34
- **IoT** (10.4.2.50-99): Z-Wave→.50, MQTT→.51
- **Docker Hosts** (10.4.2.200-209): docker-pm2→.200, docker-pm4→.201
#### Pi-hole Local DNS (.kav domain)
- Configured Pi-hole (10.4.2.11) as local DNS resolver
- All services now have `.kav` hostnames (e.g., traefik.kav, sonarr.kav)
- DNS records added via `dns.hosts` array in `/etc/pihole/pihole.toml`
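
In Pi-hole v6's `pihole.toml`, `dns.hosts` entries follow `/etc/hosts` syntax, one `"IP hostname"` string per record — a sketch (the sonarr address is illustrative; only traefik's is stated in these notes):

```toml
[dns]
  hosts = [
    "10.4.2.10 traefik.kav",
    "10.4.2.20 sonarr.kav",   # illustrative; media stack lives in 10.4.2.20-29
  ]
```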
#### SSH Access to All Containers
- Created provisioning script: `scripts/provisioning/setup-ssh-access.sh`
- All LXC containers now have SSH enabled with key-based auth
- Access via: `ssh root@<service>.kav`
#### Traefik Route Updates
- Updated backend IPs for: authelia.yaml, vaultwarden.yaml, pihole.yaml
- All routes now point to new static IPs
#### Documentation Updates
- Created `docs/NETWORK-MAP.md` with complete IP allocation
- Created `scripts/monitoring/network-map.sh` for dynamic map generation
- Updated `docs/INFRASTRUCTURE.md` with new service map
- Updated gateway references from 10.4.2.254 to 10.4.2.1
#### Pending
- Update OPNsense DHCP to distribute Pi-hole (10.4.2.11) as DNS
- Configure Home Assistant static IP (10.4.2.33) via HAOS UI
## 2025-12-22
### NAT Reflection & External Access Fix
- **Root cause**: Traefik (LXC 104) had gateway set to 10.4.2.254 (Asus) instead of 10.4.2.1 (OPNsense)
- **Symptom**: External traffic and VLAN traffic to Traefik via WAN IP failed (asymmetric routing)
- **Fix**: Changed Traefik gateway to 10.4.2.1 in both runtime and `/etc/pve/lxc/104.conf`
### OPNsense NAT Configuration
- Enabled NAT reflection (Pure NAT mode) in Firewall → Settings → Advanced
- Enabled automatic outbound NAT for reflection
- Port forwards for HTTPS (443) → Traefik (10.4.2.10) now work from all VLANs and external
### NFS Storage Issues
- KavNas has two NICs with different IPs; primary is 10.4.2.13
- Fixed stale NFS mounts on pm2 and pm4 by updating `/etc/pve/storage.cfg` to correct IP
- Pi-hole (LXC 103) and other containers recovered after NFS fix
## 2025-12-21
### Traefik Updates
- **UniFi Controller**: Added Traefik route
  - Domain: unifi.kavcorp.com
  - Backend: https://10.4.2.242:8443
  - Config: `/etc/traefik/conf.d/unifi.yaml`
- **OPNsense**: Added Traefik route
  - Domain: opnsense.kavcorp.com
  - Backend: https://10.4.2.1
  - Config: `/etc/traefik/conf.d/opnsense.yaml`
- **Traefik LXC 104**: Resized rootfs from 2GB to 4GB (was filling up repeatedly)
### OPNsense WAN Configuration
- **pm4 vmbr1**: Created new bridge for OPNsense WAN interface
  - Physical NIC: enx6c1ff76e4d47 (USB 2.5G adapter)
  - Added to `/etc/network/interfaces` on pm4
  - Bridge is UP and connected to switch
- **OPNsense VM 130**: Added second network interface
  - net0: vmbr0 (LAN - 10.4.2.0/24)
  - net1: vmbr1 (WAN - to AT&T modem)
  - Ready for WAN cutover when AT&T modem is connected
### OPNsense VLAN Configuration (Implemented)
- **VLANs Created** on vtnet0 (LAN interface):
  - VLAN 10 (vlan01): Trusted network - 10.4.10.0/24
  - VLAN 20 (vlan02): IoT network - 10.4.20.0/24
  - VLAN 30 (vlan03): Guest network - 10.4.30.0/24
- **VLAN Interfaces Configured**:
  - vlan01: 10.4.10.1/24 (gateway for Trusted)
  - vlan02: 10.4.20.1/24 (gateway for IoT)
  - vlan03: 10.4.30.1/24 (gateway for Guest)
- **DHCP Configured** on all interfaces:
  - LAN: 10.4.2.100-200, DNS: 10.4.2.129 (Pi-hole)
  - Trusted: 10.4.10.100-200
  - IoT: 10.4.20.100-200
  - Guest: 10.4.30.100-200
- **Firewall Rules Implemented**:
  - Allow DNS: IoT/Guest → 10.4.2.129:53 (Pi-hole)
  - Block IoT → LAN: 10.4.20.0/24 → 10.4.2.0/24
  - Block Guest → LAN: 10.4.30.0/24 → 10.4.2.0/24
  - Block Guest → IoT: 10.4.30.0/24 → 10.4.20.0/24
  - Allow Home Assistant → IoT: 10.4.2.62 → 10.4.20.0/24
  - Allow IoT/Guest → Internet
- **Note**: Unmanaged Gigabyte switches pass VLAN tags through (they just don't understand them). UniFi APs tag traffic per SSID, and OPNsense receives the tagged traffic on its VLAN interfaces.
- **Documentation Updated**:
  - DECISIONS.md: Complete VLAN architecture and firewall rules
  - INFRASTRUCTURE.md: VLANs and subnets table, pm4 bridges
### OPNsense WAN Cutover (Completed)
- Connected USB NIC (vmbr1) to AT&T modem
- WAN IP: 192.168.1.183 (DHCP from AT&T gateway 192.168.1.254)
- Fixed default route to use WAN gateway instead of Asus
- Internet working through OPNsense
### VLAN Troubleshooting & Fixes
- **pm4 vmbr0**: Added `bridge-vlan-aware yes` to enable VLAN filtering
- **Bridge VLAN Memberships**: Added VLANs 10, 20, 30 to eno1 and tap130i0
  - Made persistent via `post-up` commands in `/etc/network/interfaces`
- **Pi-hole veth**: Added VLANs 10, 20, 30 to veth103i0 for routed traffic
- **OPNsense VLANs**: Rebooted to fix broken vlan02/vlan03 parent interface
- **Trusted VLAN Firewall**: Added allow-all rule for opt2 (Trusted)
- **Pi-hole listeningMode**: Changed from "LOCAL" to "ALL" in pihole.toml
  - Required for Pi-hole to accept DNS queries from non-local subnets
- **Pi-hole Gateway**: Set to 10.4.2.1 (OPNsense) for proper return routing
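
The persistent bridge-VLAN setup would look roughly like this in pm4's `/etc/network/interfaces` (the node's own address and gateway lines are assumptions; the `bridge vlan add` commands mirror what the entry describes for eno1):

```
auto vmbr0
iface vmbr0 inet static
    address 10.4.2.4/24          # pm4's address -- assumed
    gateway 10.4.2.1
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
    # Re-add VLAN memberships on the uplink whenever the bridge comes up
    post-up bridge vlan add dev eno1 vid 10
    post-up bridge vlan add dev eno1 vid 20
    post-up bridge vlan add dev eno1 vid 30
```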
### Asus DHCP Disabled
- Disabled DHCP on Asus router
- OPNsense now sole DHCP server for LAN (10.4.2.0/24)
- LAN DHCP range: 10.4.2.100-200, DNS: 10.4.2.129 (Pi-hole)
### Firewall Rule Fixes
- **LAN → IoT Access**: Added rule allowing LAN net (10.4.2.0/24) to reach IoT subnet (10.4.20.0/24)
  - Enables Home Assistant, Frigate, and other LAN services to access IoT devices
  - Rule added via OPNsense UI: Firewall → Rules → LAN
  - Interface must be `opt1` (not `lan`) in config.xml
- **Broken IPv6 Rule Fix**: Fixed "Default allow LAN IPv6" rule
  - Was using an IPv4 address (10.4.2.0/24) with the inet6 protocol
  - Changed source from `<address>10.4.2.0/24</address>` to `<network>opt1</network>`
  - This was preventing all custom firewall rules from loading
- **Interface Naming Discovery**: OPNsense interface names in config.xml:
  - `opt1` = LAN (vtnet0, 10.4.2.0/24)
  - `opt2` = Trusted (vlan01, 10.4.10.0/24)
  - `opt3` = IoT (vlan02, 10.4.20.0/24)
  - `opt4` = Guest (vlan03, 10.4.30.0/24)
### WireGuard VPN Setup
- **WireGuard configured** on OPNsense (built into 25.7 core)
  - Server: wg0, port 51820, tunnel 10.10.10.1/24
  - Allows remote access to all internal subnets
  - Firewall rule added for WireGuard interface
- **AT&T IP Passthrough configured**:
  - Mode: DHCPS-fixed
  - MAC: bc:24:11:cb:12:82 (OPNsense WAN)
  - OPNsense now receives public IP directly (99.74.188.161)
  - Required for both WireGuard and Traefik to work properly
- **Plugins installed**:
  - os-qemu-guest-agent (for Proxmox integration)
  - os-tailscale (backup VPN, not yet configured)
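
The wg0 server above corresponds to a config like this (OPNsense manages it through its UI rather than a wg-quick file; keys and the peer address are placeholders):

```ini
[Interface]
Address = 10.10.10.1/24
ListenPort = 51820
PrivateKey = <server-private-key>

[Peer]
# One roaming client; its AllowedIPs entry is its tunnel address
PublicKey = <client-public-key>
AllowedIPs = 10.10.10.2/32
```

On the client side, AllowedIPs would list the internal subnets (10.4.0.0/16 or the individual /24s) so traffic to them routes through the tunnel.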
### NAT Port Forwards Migrated
- **Port forwards migrated from Asus router** to OPNsense:
  - HTTP (80) → Traefik (10.4.2.10)
  - HTTPS (443) → Traefik (10.4.2.10)
  - Game server ports → AMP (10.4.2.26):
    - 2223-2323, 2456-2556, 5678-5778, 7777-7877, 8766-8866 (AMP)
    - 25565-25570 (Minecraft), 27004-27025 (CS/Steam)
    - 15637 (Enshrouded), 16261-16262 (Project Zomboid)
    - 9876-9877 (V Rising), 8211 (Palworld), 25576 (Palworld RCON)
    - 27016 (Palworld Query), 26900-26910 (7 Days to Die)
- **Port range NAT fix**: OPNsense config.xml requires `<local-port>` to contain only the **starting port** (e.g., `2223`), not the full range (e.g., `2223-2323`). OPNsense maps ranges 1:1 automatically.
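
Abbreviated sketch of one such rule in config.xml (other elements like `<interface>` omitted; element names follow OPNsense's port-forward schema as observed here, so treat the exact structure as an assumption):

```xml
<rule>
  <protocol>tcp</protocol>
  <destination>
    <any>1</any>
    <port>2223-2323</port>       <!-- full external range -->
  </destination>
  <target>10.4.2.26</target>
  <local-port>2223</local-port>  <!-- starting port only, not 2223-2323 -->
</rule>
```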
### NAT Reflection Fixed
- **Problem**: Internal clients couldn't access services via public domain names (kavcorp.com)
- **Root cause**: Pure NAT reflection mode doesn't source-NAT traffic, so return packets bypass OPNsense
- **Solution**: Enabled `enablenatreflectionhelper` (NAT+proxy mode) instead of pure NAT
- **Config changes**:
- `<enablenatreflectionpurenat>no</enablenatreflectionpurenat>`
- `<enablenatreflectionhelper>yes</enablenatreflectionhelper>`
- Added `<natreflection>purenat</natreflection>` to HTTP/HTTPS port forward rules
- Internal and external access now both work via public domain names
### Verified Working
- All VLANs (10, 20, 30) receiving DHCP from OPNsense
- LAN (10.4.2.0/24) receiving DHCP from OPNsense
- DNS resolution via Pi-hole from all VLANs
- Internet access from all VLANs
- Firewall isolation rules in place
## 2025-12-19
### Network Upgrade Progress
- **UniFi Controller**: Deployed LXC 111 on pm4 for AP management
  - IP: 10.4.2.242 (DHCP, will be assigned static via OPNsense later)
  - Port: 8443 (HTTPS web UI)
  - Deployed via ProxmoxVE community helper script
  - Configured 3 SSIDs: KavCorp-Trusted, KavCorp-IOT (2.4GHz only), KavCorp-Guest
- **OPNsense**: Deployed VM 130 on pm4 as future router/firewall
  - Hostname: KavSense
  - IP: 10.4.2.1 (WAN interface, static)
  - Gateway: 10.4.2.254 (Asus router as upstream during transition)
  - Memory: 8GB, 2 vCPU, 32GB disk
  - VLAN 10 interface configured: 10.4.10.1/24 with DHCP (10.4.10.100-200)
  - Web UI: https://10.4.2.1
  - Status: Running, ready for migration when GiGaPlus switches arrive
- **pm4 vmbr0**: Enabled VLAN-aware bridge for VLAN support
- **VLAN Testing**: Attempted VLAN 10 through existing Netgear GS308EP
  - GS308EP trunk mode configuration unsuccessful
  - Decision: Wait for GiGaPlus 10G switches for proper VLAN support
  - UniFi VLAN10-Test network created, ready for use
## 2025-12-18
### Service Additions
- **Pi-hole**: Added network-wide ad blocker with recursive DNS
  - LXC 103 on pm4
  - IP: 10.4.2.129
  - Domain: pihole.kavcorp.com
  - Unbound configured for recursive DNS resolution
  - Traefik config: `/etc/traefik/conf.d/pihole.yaml`
  - Deployed via ProxmoxVE community helper script
  - Tagged: adblock, dns
### Planning
- **Network Upgrade Plan**: Created comprehensive plan for network overhaul
  - Replace Asus mesh with UniFi APs (U6 Enterprise existing + 2× U7 Pro)
  - Add 10G backhaul between server closet and basement
  - Hardware: 2× GiGaPlus 10G PoE switches (~$202), 2× U7 Pro (~$378)
  - Total estimated cost: ~$580
  - VLAN segmentation: Trusted (1), Servers (10), IoT (20), Guest (30)
  - OPNsense VM on Elantris for routing/firewall
  - UniFi Controller LXC for AP management
  - See `docs/NETWORK-UPGRADE-PLAN.md` for full details
## 2025-12-15
### Frigate Migration & Upgrade
- **Frigate**: Migrated from source install (LXC 111) to Docker-based (LXC 128)
  - Old: LXC 111 on pm3 (source install, 0.14.1)
  - New: LXC 128 on pm3 (Docker, 0.17.0-beta1)
  - IP: 10.4.2.8
  - Domain: frigate.kavcorp.com
  - Privileged LXC required for USB device passthrough (Coral TPU)
  - Coral USB TPU successfully passed through
  - NFS mount for media storage: `/mnt/pve/KavNas/frigate-media`
- **Frigate Configuration Updates**:
  - Enabled built-in authentication (port 8971)
  - Updated MQTT to correct Home Assistant IP (10.4.2.199)
  - Consolidated camera configs using global defaults
  - Fixed garage stream bug (was using wrong ffmpeg source)
  - Added stationary car filtering (stops tracking after 30 seconds)
- **Traefik Updates**:
  - Updated Frigate route to use HTTPS backend (port 8971)
  - Added serversTransport for self-signed cert (insecureSkipVerify)
  - Fixed disk full issue (removed 903MB old access log)
  - Added logrotate config: 50MB max, 3 rotations, daily
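
The logrotate policy translates to something like this (path assumed; `maxsize` rather than `size` so the daily schedule still applies, and `copytruncate` avoids having to signal Traefik to reopen the log):

```
/var/log/traefik/access.log {
    daily
    maxsize 50M
    rotate 3
    missingok
    notifempty
    compress
    copytruncate
}
```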
### Service Recovery
- **Power Outage Recovery**: Started all stopped LXCs on pm2, pm3, pm4
- **VM 109 (docker-pm3)**: Fixed missing onboot setting
### Infrastructure Notes
- LXC 111 (old Frigate) pending deletion after new setup confirmed
- Port 5000 on Frigate remains available for Home Assistant integration (unauthenticated)
- Admin credentials logged on first auth-enabled startup
## 2025-12-08
### Service Configuration
- **Shinobi (LXC 103)**: Configured NVR storage and Traefik endpoint
  - Added to Traefik reverse proxy: shinobi.kavcorp.com
  - Traefik config: `/etc/traefik/conf.d/shinobi.yaml`
  - Created NFS storage on elantris (`/el-pool/shinobi`) - 11TB available
  - Added Proxmox NFS storage: `elantris-shinobi`
  - Mounted NFS to LXC 103: `/opt/Shinobi/videos`
  - Coral USB TPU device passed through to container
  - Coral object detection plugin attempted but blocked by TensorFlow Lite unavailability for Ubuntu 24.04/Python 3.12
  - Motion detection available and working
### Notes
- Coral TPU native plugin requires building TensorFlow Lite from source, which is complex for Ubuntu 24.04
- Basic motion detection works out of the box for event recording
- Object detection may require alternative approach (Frigate, or CPU-based detection)
## 2025-12-07
### Service Additions
- **Vaultwarden**: Created new password manager LXC
  - LXC 125 on pm4
  - IP: 10.4.2.212
  - Domain: vtw.kavcorp.com
  - Traefik config: `/etc/traefik/conf.d/vaultwarden.yaml`
  - Tagged: community-script, password-manager
- **Immich**: Migrated from Docker (dockge LXC 107 on pm3) to native LXC
  - LXC 126 on pm4
  - IP: 10.4.2.24:2283
  - Domain: immich.kavcorp.com
  - Traefik config: `/etc/traefik/conf.d/immich.yaml`
  - Library storage: NFS mount from elantris (`/el-pool/downloads/immich/`)
  - 38GB photo library transferred via rsync
  - Fresh database (version incompatibility: old v1.129.0 → new v2.3.1)
  - Services: immich-web.service, immich-ml.service
  - Tagged: community-script, photos
- **Gitea**: Added self-hosted Git server
  - LXC 127 on pm4
  - IP: 10.4.2.7:3000
  - Domain: git.kavcorp.com
  - Traefik config: `/etc/traefik/conf.d/gitea.yaml`
  - Config: `/etc/gitea/app.ini`
  - Push-to-create enabled for users and orgs
  - Initial repo: `proxmox-infra` (infrastructure documentation)
  - Tagged: community-script, git
### Infrastructure Maintenance
- **Traefik (LXC 104)**: Fixed disk full issue
  - Truncated 895MB access log that filled 2GB rootfs
  - Added logrotate config to prevent recurrence (50MB max, 7 day rotation)
  - Cleaned apt cache and journal logs
## 2025-11-20
### Service Changes
- **AMP**: Added to Traefik reverse proxy
  - LXC 124 on elantris (10.4.2.26:8080)
  - Domain: amp.kavcorp.com
  - Traefik config: `/etc/traefik/conf.d/amp.yaml`
  - Purpose: Game server management via CubeCoders AMP
## 2025-11-19
### Service Changes
- **LXC 123 (elantris)**: Migrated from Ollama to llama.cpp
  - Removed Ollama installation and service
  - Built llama.cpp from source with CURL support
  - Downloaded TinyLlama 1.1B Q4_K_M model (~667MB)
  - Created systemd service for llama.cpp server
  - Server running on port 11434 (OpenAI-compatible API)
  - Model path: `/opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf`
  - Service: `llama-cpp.service`
  - Domain remains: ollama.kavcorp.com (pointing to llama.cpp now)
- **LXC 124 (elantris)**: Created new AMP (Application Management Panel) container
  - IP: 10.4.2.26
  - Resources: 4 CPU cores, 4GB RAM, 16GB storage
  - Storage: local-lvm on elantris
  - OS: Ubuntu 24.04 LTS
  - Purpose: Game server management via CubeCoders AMP
  - Tagged: gaming, amp
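
The `llama-cpp.service` unit from the LXC 123 migration could look roughly like this (the binary path and flags are assumptions about llama.cpp's `llama-server`; only the model path and port come from the entry):

```ini
# /etc/systemd/system/llama-cpp.service (sketch)
[Unit]
Description=llama.cpp OpenAI-compatible server
After=network-online.target

[Service]
ExecStart=/opt/llama.cpp/build/bin/llama-server \
    --model /opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
    --host 0.0.0.0 --port 11434
Restart=on-failure

[Install]
WantedBy=multi-user.target
```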
## 2025-11-17
### Service Additions
- **Ollama**: Added to Traefik reverse proxy
  - LXC 123 on elantris
  - IP: 10.4.2.224:11434
  - Domain: ollama.kavcorp.com
  - Traefik config: `/etc/traefik/conf.d/ollama.yaml`
  - Downloaded Qwen 3 Coder 30B model
- **Frigate**: Added to Traefik reverse proxy
  - LXC 111 on pm3
  - IP: 10.4.2.215:5000
  - Domain: frigate.kavcorp.com
  - Traefik config: `/etc/traefik/conf.d/frigate.yaml`
- **Foundry VTT**: Added to Traefik reverse proxy
  - LXC 112 on pm3
  - IP: 10.4.2.37:30000
  - Domain: vtt.kavcorp.com
  - Traefik config: `/etc/traefik/conf.d/foundry.yaml`
### Infrastructure Changes
- **SSH Access**: Regenerated SSH keys on pm2 and distributed to all cluster nodes
  - pm3 SSH service was down; enabled and configured it
  - All nodes (pm1, pm2, pm3, pm4, elantris) now accessible from pm2 via Proxmox web UI
### Service Configuration
- **NZBGet**: Fixed file permissions
  - Set `UMask=0000` in nzbget.conf to create files with 777 permissions
  - Fixed permission issues causing Sonarr import failures
- **Sonarr**: Enabled automatic permission setting
  - Media Management → Set Permissions → chmod 777
  - Ensures imported files are accessible by Jellyfin
- **Jellyseerr**: Fixed Traefik routing
  - Corrected IP from 10.4.2.20 to 10.4.2.18 in media-services.yaml
- **Jellyfin**: Fixed LXC mount issues
  - Restarted LXC 121 to activate media mounts
  - Media now visible in `/media/tv`, `/media/movies`, `/media/anime`
### Documentation
- **Major Reorganization**: Consolidated scattered docs into structured system
  - Created `README.md` - Documentation index and guide
  - Created `INFRASTRUCTURE.md` - All infrastructure details
  - Created `CONFIGURATIONS.md` - Service configurations
  - Created `DECISIONS.md` - Architecture decisions and patterns
  - Created `TASKS.md` - Current and pending tasks
  - Created `CHANGELOG.md` - This file
  - Updated `CLAUDE.md` - Added documentation policy
## 2025-11-16
### Service Deployments
- **Home Assistant**: Added to Traefik reverse proxy
  - Domain: hass.kavcorp.com
  - Configured trusted proxies in Home Assistant
- **Frigate**: Added to Traefik reverse proxy
  - Domain: frigate.kavcorp.com
- **Proxmox**: Added to Traefik reverse proxy
  - Domain: pm.kavcorp.com
  - Backend: pm2 (10.4.2.6:8006)
- **Recyclarr**: Configured TRaSH Guides automation
  - Sonarr and Radarr quality profiles synced
  - Dolby Vision blocking implemented
  - Daily sync schedule via cron
### Configuration Changes
- **Traefik**: Removed Authelia from *arr services
  - Services now use only built-in authentication
  - Simplified access for Sonarr, Radarr, Prowlarr, Bazarr, Whisparr, NZBGet
### Issues Encountered
- Media organization script moved files incorrectly
- Sonarr database corruption (lost TV series tracking)
- Permission issues with NZBGet downloads
- Jellyfin LXC mount not active after deployment
### Lessons Learned
- Always verify file permissions (777 required for NFS media)
- Backup service databases before running automation scripts
- LXC mounts may need container restart to activate
- Traefik auto-reloads configs, no restart needed
## Earlier History
*To be documented from previous sessions if needed*