Initial commit: KavCorp infrastructure documentation

- CLAUDE.md: Project configuration for Claude Code
- docs/: Infrastructure documentation
  - INFRASTRUCTURE.md: Service map, storage, network
  - CONFIGURATIONS.md: Service configs and credentials
  - CHANGELOG.md: Change history
  - DECISIONS.md: Architecture decisions
  - TASKS.md: Task tracking
- scripts/: Automation scripts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
commit 120c2ec809
Date: 2025-12-07 22:07:01 -05:00
19 changed files with 3448 additions and 0 deletions

CLAUDE.md (new file, 239 lines)

@@ -0,0 +1,239 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Repository Purpose
Infrastructure documentation and management repository for the **KavCorp** Proxmox cluster - a 5-node homelab cluster running self-hosted services. This repository supports migration from Docker containers to Proxmox LXCs where appropriate.
## Cluster Architecture
**Cluster Name**: KavCorp
**Nodes**: 5 (pm1, pm2, pm3, pm4, elantris)
**Network**: 10.4.2.0/24
**Primary Management Node**: pm2 (10.4.2.6)
### Node IP Mapping
- pm1: 10.4.2.2
- pm2: 10.4.2.6 (primary for new LXC deployment)
- pm3: 10.4.2.3
- pm4: 10.4.2.5
- elantris: 10.4.2.14 (largest node, 128GB RAM, ZFS storage)
## Common Commands
### Cluster Management
```bash
# Access cluster (use pm2 as primary management node)
ssh pm2
# View cluster status
pvecm status
pvecm nodes
# List all VMs/LXCs across cluster
pvesh get /cluster/resources --type vm --output-format json
# List all nodes
pvesh get /cluster/resources --type node --output-format json
# List storage
pvesh get /cluster/resources --type storage --output-format json
```
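A hedged example of post-processing that JSON (assuming `jq` is installed on the management node):
```bash
# Sketch: tabulate every VM/LXC as "vmid  name  node  status"
pvesh get /cluster/resources --type vm --output-format json \
  | jq -r '.[] | [.vmid, .name, .node, .status] | @tsv'
```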
### LXC Management
```bash
# List LXCs on a specific node
pct list
# Get LXC configuration
pvesh get /nodes/<node>/lxc/<vmid>/config
pct config <vmid>
# Start/stop/reboot LXC
pct start <vmid>
pct stop <vmid>
pct reboot <vmid>
# Execute command in LXC
pct exec <vmid> -- <command>
# Enter LXC console
pct enter <vmid>
# Create LXC from template
pct create <vmid> <template> --hostname <name> --cores <n> --memory <mb> --rootfs <storage>:<size>
```
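A filled-in example following the conventions above; the VMID, template filename, and IP here are hypothetical, so check the allocation list and available templates first:
```bash
# Illustrative only: VMID 122, template version, and 10.4.2.25 are hypothetical
pct create 122 KavNas:vztmpl/debian-12-standard_12.2-1_amd64.tar.zst \
  --hostname example-svc \
  --cores 2 \
  --memory 2048 \
  --rootfs KavNas:8 \
  --net0 name=eth0,bridge=vmbr0,ip=10.4.2.25/24,gw=10.4.2.254 \
  --start 1
```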
### Network Configuration
```bash
# View network interfaces
ip -br addr show
# Network config lives in /etc/network/interfaces
cat /etc/network/interfaces
# Standard bridge: vmbr0 (attached to the eno1 physical interface)
# Gateway: 10.4.2.254
```
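For reference, a typical vmbr0 stanza looks roughly like this (a sketch using pm2's address from the node map; verify against the actual node before copying):
```bash
# Sketch of a standard Proxmox bridge stanza in /etc/network/interfaces
auto vmbr0
iface vmbr0 inet static
    address 10.4.2.6/24
    gateway 10.4.2.254
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
```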
## Storage Architecture
### Storage Pools
**Local Storage** (per-node):
- `local`: Directory storage on each node, for backups/templates/ISOs (~100GB each)
- `local-lvm`: LVM thin pool on each node, for VM/LXC disks (~350-375GB each)
**ZFS Pools**:
- `el-pool`: ZFS pool on elantris (24TB), used for large data storage
**NFS Mounts** (shared):
- `KavNas`: Primary NFS share from Synology NAS (10.4.2.13), ~23TB - used for backups, ISOs, and LXC storage
- `elantris-downloads`: NFS share from elantris, ~23TB - used for media downloads
### Storage Recommendations
- **New LXC containers**: Use `KavNas` for rootfs (NFS, easily backed up)
- **High-performance workloads**: Use `local-lvm` on the host node
- **Large data storage**: Use `elantris-downloads` or `el-pool`
- **Templates and ISOs**: Store in `KavNas` or node's `local`
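To see what a node actually offers before choosing, something like this works (the API query assumes `jq` is installed):
```bash
# Quick view of storage pools and free space on the current node
pvesm status
# Per-node detail via the API: storage name, content types, bytes free
pvesh get /nodes/pm2/storage --output-format json \
  | jq -r '.[] | [.storage, .content, .avail] | @tsv'
```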
## Service Categories (by tags)
- **arr**: Media automation (*arr stack - Sonarr, Radarr, Prowlarr, Bazarr, Whisparr)
- **media**: Media servers (Jellyfin, Jellyseerr, Kometa)
- **proxy**: Reverse proxy (Traefik)
- **authenticator**: Authentication (Authelia)
- **nvr**: Network Video Recorder (Shinobi)
- **docker**: Docker host LXCs (docker-pm2, docker-pm4) and VMs (docker-pm3)
- **proxmox-helper-scripts**: Deployed via the ProxmoxVE community helper scripts
- **community-script**: Deployed via ProxmoxVE Helper Scripts (https://helper-scripts.com)
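A sketch for finding services by tag, assuming `jq` is installed and the Proxmox version exposes `tags` in `/cluster/resources` (tags are semicolon-separated):
```bash
# List all LXCs/VMs carrying the "arr" tag
pvesh get /cluster/resources --type vm --output-format json \
  | jq -r '.[] | select((.tags // "") | split(";") | index("arr")) | [.vmid, .name, .node] | @tsv'
```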
## Migration Strategy
**Goal**: Move services from Docker containers to dedicated LXCs where it makes sense.
**Good candidates for LXC migration**:
- Single-purpose services
- Services with simple dependencies
- Stateless applications
- Services that benefit from isolation
**Keep in Docker**:
- Complex multi-container stacks
- Services requiring Docker-specific features
- Temporary/experimental services
**Current Docker Hosts**:
- VM 109: docker-pm3 (on pm3, 4 CPU, 12GB RAM)
- LXC 110: docker-pm4 (on pm4, 4 CPU, 8GB RAM)
- LXC 113: docker-pm2 (on pm2, 4 CPU, 8GB RAM)
- LXC 107: dockge (on pm3, 12 CPU, 8GB RAM) - Docker management UI
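When scoping a migration, first inventory what each Docker host is running. A minimal sketch (`pct exec` must run on the node hosting the LXC; docker-pm3 is a VM, so reach it over SSH instead):
```bash
# Run on pm2: list containers inside docker-pm2 (LXC 113)
pct exec 113 -- docker ps --format '{{.Names}}\t{{.Image}}'
```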
## IP Address Allocation
**Infrastructure Services**:
- 10.4.2.10: traefik (LXC 104)
- 10.4.2.13: KavNas (Synology NAS)
- 10.4.2.14: elantris
**Media Stack**:
- 10.4.2.15: sonarr (LXC 105)
- 10.4.2.16: radarr (LXC 108)
- 10.4.2.17: prowlarr (LXC 114)
- 10.4.2.18: bazarr (LXC 119)
- 10.4.2.19: whisparr (LXC 117)
- 10.4.2.20: jellyseerr (LXC 115)
- 10.4.2.21: kometa (LXC 120)
- 10.4.2.22: jellyfin (LXC 121)
**Other Services**:
- 10.4.2.23: authelia (LXC 116)
- 10.4.2.24: notifiarr (LXC 118)
*Note: Update the service map in `docs/INFRASTRUCTURE.md` when allocating new IPs*
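Before claiming a new address, a quick liveness check helps; treat a silent host as a hint rather than proof, since ICMP may be blocked:
```bash
NEW_IP=10.4.2.25   # hypothetical candidate for the next free slot
ping -c 2 -W 1 "$NEW_IP" >/dev/null 2>&1 \
  && echo "$NEW_IP answers - in use" \
  || echo "$NEW_IP silent - probably free"
```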
## Documentation Structure
**CRITICAL**: Always read `docs/README.md` first to understand the documentation system.
### Core Documentation Files (ALWAYS UPDATE, NEVER CREATE NEW)
1. **`docs/INFRASTRUCTURE.md`** - Single source of truth
- **CHECK THIS FIRST** for node IPs, service locations, storage paths
- Update whenever infrastructure changes
2. **`docs/CONFIGURATIONS.md`** - Service configurations
- API keys, config snippets, copy/paste ready configs
- Update when service configs change
3. **`docs/DECISIONS.md`** - Architecture decisions
- Why we made choices, common patterns, troubleshooting
- Update when making decisions or discovering patterns
4. **`docs/TASKS.md`** - Current work tracking
- Active, pending, blocked, and completed tasks
- Update at start and end of work sessions
5. **`docs/CHANGELOG.md`** - Historical record
- Date-stamped entries for all changes
- Update after completing any significant work
### Documentation Workflow
**MANDATORY - Before ANY work session**:
1. Read `docs/README.md` - Understand the documentation system
2. Check `docs/INFRASTRUCTURE.md` - Get current infrastructure state
3. Check `docs/TASKS.md` - See what's already in progress or pending
**MANDATORY - During work**:
- When you need node IPs, service locations, or paths → Read `docs/INFRASTRUCTURE.md`
- When you need config snippets or API keys → Read `docs/CONFIGURATIONS.md`
- When wondering "why is it done this way?" → Read `docs/DECISIONS.md`
- When you discover a pattern or make a decision → Immediately update `docs/DECISIONS.md`
- When you encounter issues → Check `docs/DECISIONS.md` Known Issues section first
**MANDATORY - After completing ANY work**:
1. Update the relevant core doc:
- Infrastructure change? → Update `docs/INFRASTRUCTURE.md`
- Config change? → Update `docs/CONFIGURATIONS.md`
- New pattern/decision? → Update `docs/DECISIONS.md`
2. Add dated entry to `docs/CHANGELOG.md` describing what changed
3. Update `docs/TASKS.md` to mark work complete or add new tasks
4. Update "Last Updated" date in `docs/README.md`
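For step 4, a minimal sketch, assuming `docs/README.md` carries a line of the form `Last Updated: YYYY-MM-DD`:
```bash
# Bump the Last Updated stamp to today (assumes the "Last Updated:" line format)
sed -i "s/^Last Updated:.*/Last Updated: $(date +%F)/" docs/README.md
```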
**STRICTLY FORBIDDEN**:
- Creating new documentation files without explicit user approval
- Leaving documentation outdated after making changes
- Creating session-specific notes files (use CHANGELOG for history)
- Skipping documentation updates "to save time"
- Assuming you remember infrastructure details (always check docs)
### When to Update Which File
| You just did... | Update this file |
|----------------|------------------|
| Added/removed a service | `INFRASTRUCTURE.md` (service map) |
| Changed an IP address | `INFRASTRUCTURE.md` (service map) |
| Modified service config | `CONFIGURATIONS.md` (add/update config snippet) |
| Changed API key | `CONFIGURATIONS.md` (update credentials) |
| Made architectural decision | `DECISIONS.md` (add to decisions section) |
| Discovered troubleshooting pattern | `DECISIONS.md` (add to common patterns) |
| Hit a recurring issue | `DECISIONS.md` (add to known issues) |
| Completed a task | `TASKS.md` (mark complete) + `CHANGELOG.md` (add entry) |
| Started new work | `TASKS.md` (add to in progress) |
| ANY significant change | `CHANGELOG.md` (always add dated entry) |
## Scripts
- `scripts/provisioning/`: LXC/VM creation scripts
- `scripts/backup/`: Backup automation scripts
- `scripts/monitoring/`: Monitoring and health check scripts
## Workflow Notes
- New LXCs are primarily deployed on **pm2**
- Use ProxmoxVE Helper Scripts (https://helper-scripts.com) for common services
- Always tag LXCs appropriately for organization (see the tagging sketch below)
- Document service URLs and access details in the service map in `docs/INFRASTRUCTURE.md`
- Keep inventory documentation in sync with changes
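Tagging sketch (assumes a Proxmox version with guest tag support; `pct set --tags` replaces the full tag list, so include any existing tags):
```bash
# Example: tag sonarr (LXC 105) with its category
pct set 105 --tags "arr"
```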