Architecture Decisions & Patterns
Purpose: Record of important decisions, patterns, and "why we do it this way"
Update Frequency: When making significant architectural choices
Service Organization
Authentication Strategy
Decision: Services use their own built-in authentication, not Authelia
Reason: Most *arr services and media tools have robust auth systems
Exception: Consider Authelia for future services that lack authentication
LXC vs Docker
Keep in Docker:
- NZBGet (requires specific volume mapping, works well in Docker)
- Multi-container stacks
- Services requiring Docker-specific features
Migrate to LXC:
- Single-purpose services (Sonarr, Radarr, etc.)
- Services benefiting from isolation
- Stateless applications
File Permissions
Media Files
Standard: All media files and folders must be 777
Reason:
- NFS mounts between multiple systems with different UID mappings
- Jellyfin runs in LXC with UID namespace mapping (100107)
- Sonarr runs in LXC with different UID mapping
- NZBGet runs in Docker with UID 1000
Implementation:
- NZBGet: `UMask=0000` to create files with 777
- Sonarr: Media management → Set permissions → chmod 777
- Manual fixes: `chmod -R 777` on media directories as needed
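Before running a recursive chmod, it can help to see which entries actually deviate from the 777 standard. The following is a minimal sketch, not part of the existing tooling: `list_non_777` is a hypothetical helper name, and the directory is whatever path you pass in.

```shell
# List everything under a directory whose mode is not exactly 777.
# Usage: list_non_777 /path/to/media
list_non_777() {
    # "-perm 0777" matches the exact mode; "!" inverts the match.
    # Symlinks are skipped because their own mode bits are not meaningful.
    find "$1" ! -type l ! -perm 0777 -print
}
```

An empty result means the tree already meets the standard, so no manual fix is needed.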
Network Architecture
Local DNS (.kav TLD)
Decision: Use .kav as the local top-level domain for internal services
Reason:
- Unique to KavCorp network, avoids conflicts with real TLDs
- Short and memorable
- Works without additional configuration
- Pi-hole handles resolution via `dns.hosts` in pihole.toml
Alternatives Considered:
- `.lan` - Common but can conflict with some routers
- `.local` - Conflicts with mDNS/Bonjour
- `.home.arpa` - RFC 8375 compliant but verbose
Usage: All services accessible via <service>.kav (e.g., traefik.kav, sonarr.kav)
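As a rough illustration of the naming pattern, the sketch below prints one record in the "IP hostname" form, assuming `dns.hosts` accepts quoted strings in that form. `kav_hosts_entry` is a hypothetical helper, and the address in the example is a placeholder rather than a real assignment.

```shell
# Print one quoted "IP name.kav" record for the dns.hosts array.
# Arguments: $1 = IPv4 address, $2 = service name without the TLD.
kav_hosts_entry() {
    printf '"%s %s.kav"\n' "$1" "$2"
}

# Placeholder address and name, for illustration only:
kav_hosts_entry 10.4.2.99 example
```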
SSH Access Policy
Decision: SSH from workstation only, no container-to-container SSH
Reason:
- Reduces attack surface
- Single key to manage
- Containers don't need to communicate via SSH
Implementation:
- Workstation ed25519 key added to all containers
- `PermitRootLogin prohibit-password` (key-only)
- Provisioning script: `scripts/provisioning/setup-ssh-access.sh`
IP Allocation Scheme
Decision: Organized IP ranges by service type
Reason: Easy to identify service type from IP, logical grouping
| Range | Purpose |
|---|---|
| 10.4.2.1 | Gateway (OPNsense) |
| 10.4.2.2-9 | Proxmox nodes |
| 10.4.2.10-19 | Core infrastructure |
| 10.4.2.20-29 | Media stack |
| 10.4.2.30-39 | Other services |
| 10.4.2.40-49 | Game servers |
| 10.4.2.50-99 | IoT / Reserved |
| 10.4.2.100-199 | DHCP pool |
| 10.4.2.200-209 | Docker hosts |
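The allocation table can be expressed as a small lookup, which is handy when checking whether a proposed static IP lands in the intended range. This is a sketch only; `ip_category` is a hypothetical helper name and simply mirrors the ranges in the table.

```shell
# Map an address in 10.4.2.0/24 to its range from the allocation table.
ip_category() {
    # Only addresses in 10.4.2.0/24 are covered by the table.
    case "$1" in
        10.4.2.*) ;;
        *) echo "outside 10.4.2.0/24"; return 1 ;;
    esac
    octet="${1##*.}"
    case "$octet" in
        ''|*[!0-9]*) echo "invalid address"; return 1 ;;
    esac
    if   [ "$octet" -eq 1 ];                           then echo "Gateway (OPNsense)"
    elif [ "$octet" -ge 2 ]   && [ "$octet" -le 9 ];   then echo "Proxmox nodes"
    elif [ "$octet" -ge 10 ]  && [ "$octet" -le 19 ];  then echo "Core infrastructure"
    elif [ "$octet" -ge 20 ]  && [ "$octet" -le 29 ];  then echo "Media stack"
    elif [ "$octet" -ge 30 ]  && [ "$octet" -le 39 ];  then echo "Other services"
    elif [ "$octet" -ge 40 ]  && [ "$octet" -le 49 ];  then echo "Game servers"
    elif [ "$octet" -ge 50 ]  && [ "$octet" -le 99 ];  then echo "IoT / Reserved"
    elif [ "$octet" -ge 100 ] && [ "$octet" -le 199 ]; then echo "DHCP pool"
    elif [ "$octet" -ge 200 ] && [ "$octet" -le 209 ]; then echo "Docker hosts"
    else echo "unassigned"
    fi
}

ip_category 10.4.2.26   # prints "Media stack"
```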
Network Isolation Strategy
Goal: Isolate IoT (KavCorp-IOT) and Guest (KavCorp-Guest) WiFi networks from the main LAN, while allowing Smart Home VMs to access IoT devices.
Status: Implemented via OPNsense VLANs and firewall rules.
VLAN Architecture
Unmanaged Gigabit switches pass VLAN tags through without interpreting them. UniFi APs tag traffic per SSID, and OPNsense receives the tagged traffic on its VLAN interfaces.
| VLAN | Interface | Subnet | Gateway | Purpose |
|---|---|---|---|---|
| - | vtnet0 (LAN) | 10.4.2.0/24 | 10.4.2.1 | Infrastructure (Proxmox, core services) |
| 10 | vlan01 | 10.4.10.0/24 | 10.4.10.1 | Trusted (user devices) |
| 20 | vlan02 | 10.4.20.0/24 | 10.4.20.1 | IoT (KavCorp-IOT SSID) |
| 30 | vlan03 | 10.4.30.0/24 | 10.4.30.1 | Guest (KavCorp-Guest SSID) |
DHCP Configuration
All DHCP served by OPNsense:
- LAN: 10.4.2.100-200, DNS: 10.4.2.11 (Pi-hole)
- Trusted: 10.4.10.100-200, DNS: 10.4.2.11
- IoT: 10.4.20.100-200, DNS: 10.4.2.11
- Guest: 10.4.30.100-200, DNS: 10.4.2.11
OPNsense Firewall Rules (Implemented)
| Rule | Source | Destination | Action |
|---|---|---|---|
| Allow DNS | IoT/Guest | 10.4.2.11:53 | Pass |
| Allow Guest→Media | 10.4.30.0/24 | 10.4.2.25, 10.4.2.26 | Pass |
| Block IoT→LAN | 10.4.20.0/24 | 10.4.2.0/24 | Block |
| Block Guest→LAN | 10.4.30.0/24 | 10.4.2.0/24 | Block |
| Block Guest→IoT | 10.4.30.0/24 | 10.4.20.0/24 | Block |
| Allow LAN→IoT | 10.4.2.0/24 | 10.4.20.0/24 | Pass |
| Allow IoT Internet | 10.4.20.0/24 | any | Pass |
| Allow Guest Internet | 10.4.30.0/24 | any | Pass |
Note: LAN→IoT rule allows Home Assistant, Frigate, and other LAN services to access IoT devices (cameras, sensors, etc.).
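The rule table can be read as a first-match lookup. The sketch below is a simplified model under two assumptions: rules are evaluated top to bottom with the first match winning, and per-interface placement is ignored. `fw_decision` is a hypothetical helper for reasoning about the table, not an OPNsense command.

```shell
# Print "pass" or "block" for a flow, following the rule table in order.
# Arguments: $1 = source IP, $2 = destination IP, $3 = destination port (optional).
fw_decision() {
    src="$1"; dst="$2"; port="${3:-}"
    case "$src" in
        10.4.20.*) zone=iot ;;
        10.4.30.*) zone=guest ;;
        10.4.2.*)  zone=lan ;;
        *)         zone=other ;;
    esac
    # Allow DNS: IoT/Guest to Pi-hole on port 53
    if [ "$zone" = iot ] || [ "$zone" = guest ]; then
        if [ "$dst" = 10.4.2.11 ] && [ "$port" = 53 ]; then echo pass; return 0; fi
    fi
    # Allow Guest to the two media hosts
    if [ "$zone" = guest ]; then
        case "$dst" in
            10.4.2.25|10.4.2.26) echo pass; return 0 ;;
        esac
    fi
    # Remaining rules, in table order
    case "$zone:$dst" in
        iot:10.4.2.*|guest:10.4.2.*|guest:10.4.20.*) echo block ;;
        lan:10.4.20.*)                               echo pass ;;
        iot:*|guest:*)                               echo pass ;;
        *)                                           echo "no matching rule in table" ;;
    esac
}

fw_decision 10.4.30.50 10.4.2.26   # guest to a media host: pass
fw_decision 10.4.30.50 10.4.2.10   # guest to any other LAN host: block
```

Because the Guest→Media pass rule sits above the Guest→LAN block rule, the two media hosts stay reachable while the rest of the LAN does not.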
Network Segmentation Philosophy
| Network | Contains | Access Level |
|---|---|---|
| 10.4.2.0/24 (LAN) | Proxmox hosts, OPNsense, Pi-hole, Traefik, NAS | Full infrastructure access |
| 10.4.10.0/24 (Trusted) | User PCs, laptops | Full access to LAN and services |
| 10.4.20.0/24 (IoT) | Smart devices, cameras | Internet + DNS only, no LAN access |
| 10.4.30.0/24 (Guest) | Guest WiFi | Internet + DNS, plus Jellyseerr (10.4.2.25) and Jellyfin (10.4.2.26); no other local access |
Future Considerations
- Consider adding a Servers VLAN to isolate services (media stack, Bitwarden) from infrastructure
- Consider OPNsense HA (CARP) with second USB NIC on another node for failover
Router/Firewall
Decision: OPNsense VM 130 on pm4 (server closet)
Status: Deployed, pending WAN cutover
Reason:
- Free, full-featured firewall/router
- Inter-subnet firewall rules for IoT/Guest isolation
- IDS/IPS capability
- pm4 is in server closet next to AT&T modem (avoids routing WAN over backhaul)
Network Interfaces (VM 130):
| Interface | Bridge | Purpose | Status |
|---|---|---|---|
| net0 | vmbr0 | LAN (10.4.2.0/24) | Configured |
| net1 | vmbr1 | WAN (to AT&T modem) | Configured |
pm4 Bridge Configuration:
| Bridge | Physical NIC | Purpose |
|---|---|---|
| vmbr0 | eno1 (Intel I226-V) | LAN - all VMs/LXCs |
| vmbr1 | enx6c1ff76e4d47 (USB 2.5G) | WAN - OPNsense only |
HA/Failover Consideration:
- Current: Single OPNsense on pm4 (SPOF)
- Future options:
- OPNsense HA with CARP (requires second USB NIC on another node)
- Keep current router as cold standby (swap cables if pm4 fails)
- Protectli Vault as backup router (limited by port speeds)
Alternative Considered: Ubiquiti Dream Machine
- Rejected due to cost and ecosystem lock-in
- OPNsense more flexible for homelab
Alternative Considered: OPNsense on Elantris (basement)
- Rejected because WAN would need to traverse 10G backhaul
- Would require managed switches for WAN VLAN isolation
10G Backhaul (Planned)
Decision: 10G RJ45 between server closet and basement
Hardware: 2× GiGaPlus 6-Port 10G PoE switches ($101 each)
Why GiGaPlus over UniFi:
- Native 10G RJ45 (no SFP+ transceivers needed)
- Includes PoE for APs
- $202 total vs $800+ for UniFi equivalent
- Cat6 can handle 10G at house distances (<55m)
WiFi (Planned)
Decision: UniFi APs with mixed models
Hardware:
- 1× U6 Enterprise (existing) - server closet/upstairs
- 2× U7 Pro ($189 each) - basement + main floor
Why UniFi:
- Multiple SSIDs mapped to VLANs
- Seamless roaming between APs
- Centralized management via controller
- Better than Asus mesh for VLAN support
Controller: LXC on Proxmox (free) via community helper script
OPNsense Configuration Patterns
Interface Names in config.xml (IMPORTANT):
| UI Name | config.xml | Physical | Subnet |
|---|---|---|---|
| LAN | opt1 | vtnet0 | 10.4.2.0/24 |
| WAN | wan | vtnet1 | DHCP |
| Trusted | opt2 | vlan01 | 10.4.10.0/24 |
| IoT | opt3 | vlan02 | 10.4.20.0/24 |
| Guest | opt4 | vlan03 | 10.4.30.0/24 |
Why This Matters: When editing config.xml directly, use opt1 not lan. Using the wrong name causes rules to fail silently.
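To avoid typing the wrong name when hand-editing config.xml, the mapping table can be wrapped in a small lookup. This is a sketch; `opn_config_name` is a hypothetical helper that only restates the table above.

```shell
# Translate an OPNsense UI interface name to the name used in config.xml.
opn_config_name() {
    case "$1" in
        LAN)     echo opt1 ;;
        WAN)     echo wan ;;
        Trusted) echo opt2 ;;
        IoT)     echo opt3 ;;
        Guest)   echo opt4 ;;
        *)       echo "unknown interface: $1" >&2; return 1 ;;
    esac
}

opn_config_name LAN   # prints "opt1"
```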
Firewall Rule Reload Commands:
    # Reload firewall rules (safe, full reload)
    configctl filter reload
    # Check active rules
    pfctl -sr
    # Test rules file for syntax errors
    pfctl -nf /tmp/rules.debug
    # View generated rules before loading
    cat /tmp/rules.debug
Common Gotchas:
- IPv6 rules with IPv4 addresses cause entire ruleset to fail loading
- Rules added via config.xml need proper interface names (opt1, not lan)
- After config.xml edits, run `configctl filter reload` to apply
- NAT port range rules: `<local-port>` must be just the starting port, not the full range
  - Correct: `<port>2223-2323</port>` with `<local-port>2223</local-port>`
  - Wrong: `<port>2223-2323</port>` with `<local-port>2223-2323</local-port>` (rule will be commented out)
- NAT reflection requires `enablenatreflectionhelper` (not just `purenat`) when clients and servers are on the same subnet; pure NAT doesn't source-NAT, so return traffic bypasses OPNsense
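The NAT port-range gotcha is easy to check before touching config.xml. The sketch below uses two hypothetical helpers: `is_single_port` rejects a range where a single port is expected, and `local_port_for` derives the `<local-port>` value by keeping only the starting port of a `<port>` range.

```shell
# Succeed only when the value is a single port number (1-65535).
is_single_port() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit such as "-"
        *) [ "$1" -ge 1 ] && [ "$1" -le 65535 ] ;;
    esac
}

# Keep only the starting port of a range like 2223-2323.
local_port_for() {
    echo "${1%%-*}"
}

local_port_for 2223-2323   # prints "2223"
```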
Reverse Proxy
Decision: Single Traefik instance handles all external access
Location: LXC 104 on pm2
Benefits:
- Single point for SSL/TLS management
- Automatic Let's Encrypt certificate renewal
- Centralized routing configuration
- DNS-01 challenge for wildcard certificates
Service Domains
Pattern: <service>.kavcorp.com
DNS: All subdomains point to public IP (99.74.188.161)
Routing: Traefik inspects Host header and routes internally
Storage Architecture
Media Storage
Decision: NFS mount from elantris for all media
Path: /mnt/pve/elantris-media → elantris /el-pool/media
Reason:
- Centralized storage
- Accessible from all cluster nodes
- Large capacity (24TB ZFS pool)
- Easy to backup/snapshot
LXC Root Filesystems
Decision: Store on KavNas NFS for most services
Reason:
- Easy backups
- Portable between nodes
- Network storage sufficient for most workloads
Exception: High I/O services use local-lvm
Monitoring & Maintenance
Configuration Management
Decision: Manual configuration with documentation
Reason: Small scale doesn't justify Ansible/Terraform complexity
Trade-off: Requires disciplined documentation updates
Backup Strategy
Decision: Proxmox built-in backup to KavNas
Frequency: [To be determined]
Retention: [To be determined]
Common Patterns
Adding a New Service Behind Traefik
- Deploy service with static IP in 10.4.2.0/24 range
- Create Traefik config in `/etc/traefik/conf.d/<service>.yaml`
- Use pattern:

      http:
        routers:
          <service>:
            rule: "Host(`<service>.kavcorp.com`)"
            entryPoints: [websecure]
            service: <service>
            tls:
              certResolver: letsencrypt
        services:
          <service>:
            loadBalancer:
              servers:
                - url: "http://<ip>:<port>"

- Traefik auto-reloads (no restart needed)
- Update `docs/INFRASTRUCTURE.md` with service details
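Since every service follows the same pattern, the config file can be generated rather than hand-written. This is a minimal sketch: `render_traefik_conf` is a hypothetical helper, and the service name, IP and port in the usage comment are placeholders.

```shell
# Print a Traefik dynamic config for one service, following the pattern above.
# Arguments: $1 = service name, $2 = backend IP, $3 = backend port.
render_traefik_conf() {
    service="$1"; ip="$2"; port="$3"
    # Backticks are escaped so the heredoc emits them literally.
    cat <<EOF
http:
  routers:
    ${service}:
      rule: "Host(\`${service}.kavcorp.com\`)"
      entryPoints: [websecure]
      service: ${service}
      tls:
        certResolver: letsencrypt
  services:
    ${service}:
      loadBalancer:
        servers:
          - url: "http://${ip}:${port}"
EOF
}

# Example (placeholder values), run on the Traefik host:
# render_traefik_conf myservice 10.4.2.30 8080 > /etc/traefik/conf.d/myservice.yaml
```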
Troubleshooting Permission Issues
- Check file ownership: `ls -la /path/to/file`
- Check if 777: `stat /path/to/file`
- Fix permissions: `chmod -R 777 /path/to/directory`
- For NZBGet: Verify `UMask=0000` in nzbget.conf
- For Sonarr/Radarr: Check Settings → Media Management → Set Permissions
Node SSH Access
From local machine:
- User: `kavren`
- Key: `~/.ssh/id_ed25519`
Between cluster nodes:
- User: `root`
- Each node has other nodes' keys in `/root/.ssh/authorized_keys`
- Proxmox web UI uses node SSH for shell access
Known Issues & Workarounds
Jellyfin Not Seeing Media After Import
Symptom: Files imported to /media/tv but Jellyfin shows empty
Cause: Jellyfin LXC mount not active or permissions wrong
Fix:
- Restart Jellyfin LXC: `pct stop 121 && pct start 121`
- Verify mount inside LXC: `pct exec 121 -- ls -la /media/tv/`
- Fix permissions if needed: `chmod -R 777 /mnt/pve/elantris-media/tv/`
Sonarr/Radarr Import Failures
Symptom: "Access denied" errors in logs
Cause: Permission mismatch between download client and *arr service
Fix: Ensure download folder has 777 permissions
Future Considerations
- Automated backup strategy
- Monitoring/alerting system (Prometheus + Grafana?)
- Consider Authelia for future services without built-in auth
- Document disaster recovery procedures
- Consider consolidating Docker hosts