Initial commit: KavCorp infrastructure documentation

- CLAUDE.md: Project configuration for Claude Code
- docs/: Infrastructure documentation
  - INFRASTRUCTURE.md: Service map, storage, network
  - CONFIGURATIONS.md: Service configs and credentials
  - CHANGELOG.md: Change history
  - DECISIONS.md: Architecture decisions
  - TASKS.md: Task tracking
- scripts/: Automation scripts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
commit 120c2ec809
Date: 2025-12-07 22:07:01 -05:00
19 changed files with 3448 additions and 0 deletions

CLAUDE.md (new file, 239 lines)

@@ -0,0 +1,239 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Repository Purpose
Infrastructure documentation and management repository for the **KavCorp** Proxmox cluster - a 5-node homelab cluster running self-hosted services. This repository supports migration from Docker containers to Proxmox LXCs where appropriate.
## Cluster Architecture
**Cluster Name**: KavCorp
**Nodes**: 5 (pm1, pm2, pm3, pm4, elantris)
**Network**: 10.4.2.0/24
**Primary Management Node**: pm2 (10.4.2.6)
### Node IP Mapping
- pm1: 10.4.2.2
- pm2: 10.4.2.6 (primary for new LXC deployment)
- pm3: 10.4.2.3
- pm4: 10.4.2.5
- elantris: 10.4.2.14 (largest node, 128GB RAM, ZFS storage)
## Common Commands
### Cluster Management
```bash
# Access cluster (use pm2 as primary management node)
ssh pm2
# View cluster status
pvecm status
pvecm nodes
# List all VMs/LXCs across cluster
pvesh get /cluster/resources --type vm --output-format json
# List all nodes
pvesh get /cluster/resources --type node --output-format json
# List storage
pvesh get /cluster/resources --type storage --output-format json
```
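A hedged example of post-processing that JSON (assuming `jq` is installed on the management node):
```bash
# Sketch: tabulate every VM/LXC as "vmid  name  node  status"
pvesh get /cluster/resources --type vm --output-format json \
  | jq -r '.[] | [.vmid, .name, .node, .status] | @tsv'
```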
### LXC Management
```bash
# List LXCs on a specific node
pct list
# Get LXC configuration
pvesh get /nodes/<node>/lxc/<vmid>/config
pct config <vmid>
# Start/stop/reboot LXC
pct start <vmid>
pct stop <vmid>
pct reboot <vmid>
# Execute command in LXC
pct exec <vmid> -- <command>
# Enter LXC console
pct enter <vmid>
# Create LXC from template
pct create <vmid> <template> --hostname <name> --cores <n> --memory <mb> --rootfs <storage>:<size>
```
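A filled-in example following the conventions above; the VMID, template filename, and IP here are hypothetical, so check the allocation list and available templates first:
```bash
# Illustrative only: VMID 122, template version, and 10.4.2.25 are hypothetical
pct create 122 KavNas:vztmpl/debian-12-standard_12.2-1_amd64.tar.zst \
  --hostname example-svc \
  --cores 2 \
  --memory 2048 \
  --rootfs KavNas:8 \
  --net0 name=eth0,bridge=vmbr0,ip=10.4.2.25/24,gw=10.4.2.254 \
  --start 1
```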
### Network Configuration
```bash
# View network interfaces
ip -br addr show
# Network config lives in /etc/network/interfaces
cat /etc/network/interfaces
# Standard bridge: vmbr0 (attached to the eno1 physical interface)
# Gateway: 10.4.2.254
```
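For reference, a typical vmbr0 stanza looks roughly like this (a sketch using pm2's address from the node map; verify against the actual node before copying):
```bash
# Sketch of a standard Proxmox bridge stanza in /etc/network/interfaces
auto vmbr0
iface vmbr0 inet static
    address 10.4.2.6/24
    gateway 10.4.2.254
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
```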
## Storage Architecture
### Storage Pools
**Local Storage** (per-node):
- `local`: Directory storage on each node, for backups/templates/ISOs (~100GB each)
- `local-lvm`: LVM thin pool on each node, for VM/LXC disks (~350-375GB each)
**ZFS Pools**:
- `el-pool`: ZFS pool on elantris (24TB), used for large data storage
**NFS Mounts** (shared):
- `KavNas`: Primary NFS share from Synology NAS (10.4.2.13), ~23TB - used for backups, ISOs, and LXC storage
- `elantris-downloads`: NFS share from elantris, ~23TB - used for media downloads
### Storage Recommendations
- **New LXC containers**: Use `KavNas` for rootfs (NFS, easily backed up)
- **High-performance workloads**: Use `local-lvm` on the host node
- **Large data storage**: Use `elantris-downloads` or `el-pool`
- **Templates and ISOs**: Store in `KavNas` or node's `local`
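To see what a node actually offers before choosing, something like this works (the API query assumes `jq` is installed):
```bash
# Quick view of storage pools and free space on the current node
pvesm status
# Per-node detail via the API: storage name, content types, bytes free
pvesh get /nodes/pm2/storage --output-format json \
  | jq -r '.[] | [.storage, .content, .avail] | @tsv'
```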
## Service Categories (by tags)
- **arr**: Media automation (*arr stack - Sonarr, Radarr, Prowlarr, Bazarr, Whisparr)
- **media**: Media servers (Jellyfin, Jellyseerr, Kometa)
- **proxy**: Reverse proxy (Traefik)
- **authenticator**: Authentication (Authelia)
- **nvr**: Network Video Recorder (Shinobi)
- **docker**: Docker host LXCs (docker-pm2, docker-pm4) and VMs (docker-pm3)
- **proxmox-helper-scripts**: Deployed via the ProxmoxVE community helper scripts
- **community-script**: Deployed via ProxmoxVE Helper Scripts (https://helper-scripts.com)
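A sketch for finding services by tag, assuming `jq` is installed and the Proxmox version exposes `tags` in `/cluster/resources` (tags are semicolon-separated):
```bash
# List all LXCs/VMs carrying the "arr" tag
pvesh get /cluster/resources --type vm --output-format json \
  | jq -r '.[] | select((.tags // "") | split(";") | index("arr")) | [.vmid, .name, .node] | @tsv'
```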
## Migration Strategy
**Goal**: Move services from Docker containers to dedicated LXCs where it makes sense.
**Good candidates for LXC migration**:
- Single-purpose services
- Services with simple dependencies
- Stateless applications
- Services that benefit from isolation
**Keep in Docker**:
- Complex multi-container stacks
- Services requiring Docker-specific features
- Temporary/experimental services
**Current Docker Hosts**:
- VM 109: docker-pm3 (on pm3, 4 CPU, 12GB RAM)
- LXC 110: docker-pm4 (on pm4, 4 CPU, 8GB RAM)
- LXC 113: docker-pm2 (on pm2, 4 CPU, 8GB RAM)
- LXC 107: dockge (on pm3, 12 CPU, 8GB RAM) - Docker management UI
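When scoping a migration, first inventory what each Docker host is running. A minimal sketch (`pct exec` must run on the node hosting the LXC; docker-pm3 is a VM, so reach it over SSH instead):
```bash
# Run on pm2: list containers inside docker-pm2 (LXC 113)
pct exec 113 -- docker ps --format '{{.Names}}\t{{.Image}}'
```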
## IP Address Allocation
**Infrastructure Services**:
- 10.4.2.10: traefik (LXC 104)
- 10.4.2.13: KavNas (Synology NAS)
- 10.4.2.14: elantris
**Media Stack**:
- 10.4.2.15: sonarr (LXC 105)
- 10.4.2.16: radarr (LXC 108)
- 10.4.2.17: prowlarr (LXC 114)
- 10.4.2.18: bazarr (LXC 119)
- 10.4.2.19: whisparr (LXC 117)
- 10.4.2.20: jellyseerr (LXC 115)
- 10.4.2.21: kometa (LXC 120)
- 10.4.2.22: jellyfin (LXC 121)
**Other Services**:
- 10.4.2.23: authelia (LXC 116)
- 10.4.2.24: notifiarr (LXC 118)
*Note: Update the service map in `docs/INFRASTRUCTURE.md` when allocating new IPs*
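Before claiming a new address, a quick liveness check helps; treat a silent host as a hint rather than proof, since ICMP may be blocked:
```bash
NEW_IP=10.4.2.25   # hypothetical candidate for the next free slot
ping -c 2 -W 1 "$NEW_IP" >/dev/null 2>&1 \
  && echo "$NEW_IP answers - in use" \
  || echo "$NEW_IP silent - probably free"
```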
## Documentation Structure
**CRITICAL**: Always read `docs/README.md` first to understand the documentation system.
### Core Documentation Files (ALWAYS UPDATE, NEVER CREATE NEW)
1. **`docs/INFRASTRUCTURE.md`** - Single source of truth
- **CHECK THIS FIRST** for node IPs, service locations, storage paths
- Update whenever infrastructure changes
2. **`docs/CONFIGURATIONS.md`** - Service configurations
- API keys, config snippets, copy/paste ready configs
- Update when service configs change
3. **`docs/DECISIONS.md`** - Architecture decisions
- Why we made choices, common patterns, troubleshooting
- Update when making decisions or discovering patterns
4. **`docs/TASKS.md`** - Current work tracking
- Active, pending, blocked, and completed tasks
- Update at start and end of work sessions
5. **`docs/CHANGELOG.md`** - Historical record
- Date-stamped entries for all changes
- Update after completing any significant work
### Documentation Workflow
**MANDATORY - Before ANY work session**:
1. Read `docs/README.md` - Understand the documentation system
2. Check `docs/INFRASTRUCTURE.md` - Get current infrastructure state
3. Check `docs/TASKS.md` - See what's already in progress or pending
**MANDATORY - During work**:
- When you need node IPs, service locations, or paths → Read `docs/INFRASTRUCTURE.md`
- When you need config snippets or API keys → Read `docs/CONFIGURATIONS.md`
- When wondering "why is it done this way?" → Read `docs/DECISIONS.md`
- When you discover a pattern or make a decision → Immediately update `docs/DECISIONS.md`
- When you encounter issues → Check `docs/DECISIONS.md` Known Issues section first
**MANDATORY - After completing ANY work**:
1. Update the relevant core doc:
- Infrastructure change? → Update `docs/INFRASTRUCTURE.md`
- Config change? → Update `docs/CONFIGURATIONS.md`
- New pattern/decision? → Update `docs/DECISIONS.md`
2. Add dated entry to `docs/CHANGELOG.md` describing what changed
3. Update `docs/TASKS.md` to mark work complete or add new tasks
4. Update "Last Updated" date in `docs/README.md`
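For step 4, a minimal sketch, assuming `docs/README.md` carries a line of the form `Last Updated: YYYY-MM-DD`:
```bash
# Bump the Last Updated stamp to today (assumes the "Last Updated:" line format)
sed -i "s/^Last Updated:.*/Last Updated: $(date +%F)/" docs/README.md
```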
**STRICTLY FORBIDDEN**:
- Creating new documentation files without explicit user approval
- Leaving documentation outdated after making changes
- Creating session-specific notes files (use CHANGELOG for history)
- Skipping documentation updates "to save time"
- Assuming you remember infrastructure details (always check docs)
### When to Update Which File
| You just did... | Update this file |
|----------------|------------------|
| Added/removed a service | `INFRASTRUCTURE.md` (service map) |
| Changed an IP address | `INFRASTRUCTURE.md` (service map) |
| Modified service config | `CONFIGURATIONS.md` (add/update config snippet) |
| Changed API key | `CONFIGURATIONS.md` (update credentials) |
| Made architectural decision | `DECISIONS.md` (add to decisions section) |
| Discovered troubleshooting pattern | `DECISIONS.md` (add to common patterns) |
| Hit a recurring issue | `DECISIONS.md` (add to known issues) |
| Completed a task | `TASKS.md` (mark complete) + `CHANGELOG.md` (add entry) |
| Started new work | `TASKS.md` (add to in progress) |
| ANY significant change | `CHANGELOG.md` (always add dated entry) |
## Scripts
- `scripts/provisioning/`: LXC/VM creation scripts
- `scripts/backup/`: Backup automation scripts
- `scripts/monitoring/`: Monitoring and health check scripts
## Workflow Notes
- New LXCs are primarily deployed on **pm2**
- Use ProxmoxVE Helper Scripts (https://helper-scripts.com) for common services
- Always tag LXCs appropriately for organization (see the tagging sketch below)
- Document service URLs and access details in the service map in `docs/INFRASTRUCTURE.md`
- Keep inventory documentation in sync with changes
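Tagging sketch (assumes a Proxmox version with guest tag support; `pct set --tags` replaces the full tag list, so include any existing tags):
```bash
# Example: tag sonarr (LXC 105) with its category
pct set 105 --tags "arr"
```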