# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Git Commit Policy - IMPORTANT

**You MUST commit and push changes frequently.** Evaluate after each tool call whether a commit makes sense:

### Commit immediately after:
- Any Edit or Write to documentation files (docs/*.md, CLAUDE.md)
- Creating or modifying Traefik configs
- Adding new services to infrastructure
- Completing a discrete task or fix

### Commit in batches for:
- Multiple related file edits that form one logical change (e.g., updating INFRASTRUCTURE.md + CHANGELOG.md together): commit once after the last edit
- Exploratory changes that may be reverted

### Commit message format:
```
<type>: <short description>

<optional body>

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
```

Types: `add`, `update`, `fix`, `docs`, `config`

**Remote**: `origin` → http://10.4.2.7:3000/kavren/proxmox-infra.git
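
The message format above can be assembled with a small helper. This is a minimal sketch, not an existing script in this repository: `build_commit_message` is a hypothetical name, and rejecting types outside the list above is an assumption about how strictly the list should be enforced.

```bash
# Sketch: print a commit message in the documented format.
# build_commit_message is a hypothetical helper, not part of this repo.
build_commit_message() {
  if [ "$#" -lt 2 ]; then
    echo "usage: build_commit_message <type> <short description> [body]" >&2
    return 1
  fi
  local type="$1" summary="$2" body="${3:-}"

  # Only the documented types are accepted.
  case "$type" in
    add|update|fix|docs|config) ;;
    *) echo "unknown commit type: $type" >&2; return 1 ;;
  esac

  printf '%s: %s\n\n' "$type" "$summary"
  if [ -n "$body" ]; then
    printf '%s\n\n' "$body"
  fi
  printf '%s\n\n' '🤖 Generated with [Claude Code](https://claude.com/claude-code)'
  printf '%s\n' 'Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>'
}
```

One possible use is `git commit -m "$(build_commit_message docs "Update storage recommendations")"` followed by a push to `origin`; the branch to push is not specified in this file.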

## Repository Purpose

Infrastructure documentation and management repository for the **KavCorp** Proxmox cluster - a 5-node homelab cluster running self-hosted services. This repository supports migration from Docker containers to Proxmox LXCs where appropriate.

## Cluster Architecture

**Cluster Name**: KavCorp
**Nodes**: 5 (pm1, pm2, pm3, pm4, elantris)
**Network**: 10.4.2.0/24
**Primary Management Node**: pm2 (10.4.2.6)

### Node IP Mapping
- pm1: 10.4.2.2
- pm2: 10.4.2.6 (primary for new LXC deployment)
- pm3: 10.4.2.3
- pm4: 10.4.2.5
- elantris: 10.4.2.14 (largest node, 128GB RAM, ZFS storage)
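
For scripts that need a node's address, the mapping above can be wrapped in a lookup function. This is a sketch only: `node_ip` is a hypothetical helper, and the hard-coded values simply mirror the list above, so they would need to be kept in sync with `docs/INFRASTRUCTURE.md`.

```bash
# Sketch: resolve a node name to its IP, mirroring the list above.
# node_ip is a hypothetical helper, not part of this repo.
node_ip() {
  case "${1:-}" in
    pm1)      echo "10.4.2.2" ;;
    pm2)      echo "10.4.2.6" ;;
    pm3)      echo "10.4.2.3" ;;
    pm4)      echo "10.4.2.5" ;;
    elantris) echo "10.4.2.14" ;;
    *) echo "unknown node: ${1:-}" >&2; return 1 ;;
  esac
}
```

For example, `ssh "root@$(node_ip pm3)" pct list` would list the LXCs on pm3, assuming root SSH access to the node.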

## Common Commands

### Cluster Management
```bash
# Access cluster (use pm2 as primary management node)
ssh pm2

# View cluster status
pvecm status
pvecm nodes

# List all VMs/LXCs across cluster
pvesh get /cluster/resources --type vm --output-format json

# List all nodes
pvesh get /cluster/resources --type node --output-format json

# List storage
pvesh get /cluster/resources --type storage --output-format json
```

### LXC Management
```bash
# List LXCs on the current node (run this on the node you want to inspect)
pct list

# Get LXC configuration
pvesh get /nodes/<node>/lxc/<vmid>/config
pct config <vmid>

# Start/stop/reboot LXC
pct start <vmid>
pct stop <vmid>
pct reboot <vmid>

# Execute command in LXC
pct exec <vmid> -- <command>

# Open a shell inside the LXC (use `pct console <vmid>` for the console)
pct enter <vmid>

# Create LXC from template
pct create <vmid> <template> --hostname <name> --cores <n> --memory <mb> --rootfs <storage>:<size>
```
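
To preview a full `pct create` invocation before running it, the command can be printed by a helper. This is a sketch under stated assumptions: `print_pct_create` and the `LXC_*` variables are hypothetical, the defaults for cores, memory, and disk size are illustrative, and the `eth0` interface name and `/24` prefix are assumed from the network details in this file.

```bash
# Sketch: print (do not run) a pct create command for a new LXC.
# print_pct_create and the LXC_* overrides are hypothetical names.
print_pct_create() {
  if [ "$#" -ne 5 ]; then
    echo "usage: print_pct_create <vmid> <template> <hostname> <ip> <tag>" >&2
    return 1
  fi
  local vmid="$1" template="$2" hostname="$3" ip="$4" tag="$5"

  # Illustrative defaults; override via environment variables.
  local storage="${LXC_STORAGE:-KavNas}"
  local cores="${LXC_CORES:-2}"
  local memory="${LXC_MEMORY_MB:-2048}"
  local disk_gb="${LXC_DISK_GB:-8}"

  # Bridge vmbr0 and gateway 10.4.2.254 come from this document.
  printf 'pct create %s %s --hostname %s --cores %s --memory %s --rootfs %s:%s --net0 name=eth0,bridge=vmbr0,ip=%s/24,gw=10.4.2.254 --tags %s\n' \
    "$vmid" "$template" "$hostname" "$cores" "$memory" "$storage" "$disk_gb" "$ip" "$tag"
}
```

Review the printed command, then run it on the target node. The VMID, template path, IP, and tag are caller-supplied; check `docs/INFRASTRUCTURE.md` for values that are actually free.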

### Network Configuration
```bash
# View network interfaces
ip -br addr show

# Network config location: /etc/network/interfaces
cat /etc/network/interfaces

# Standard bridge: vmbr0 (connected to eno1 physical interface)
# Gateway: 10.4.2.254
```

## Storage Architecture

### Storage Pools

**Local Storage** (per-node):
- `local`: Directory storage on each node, for backups/templates/ISOs (~100GB each)
- `local-lvm`: LVM thin pool on each node, for VM/LXC disks (~350-375GB each)

**ZFS Pools**:
- `el-pool`: ZFS pool on elantris (24TB), used for large data storage

**NFS Mounts** (shared):
- `KavNas`: Primary NFS share from Synology NAS (10.4.2.13), ~23TB - used for backups, ISOs, and LXC storage
- `elantris-downloads`: NFS share from elantris, ~23TB - used for media downloads

### Storage Recommendations
- **New LXC containers**: Use `KavNas` for rootfs (NFS, easily backed up)
- **High-performance workloads**: Use `local-lvm` on the host node
- **Large data storage**: Use `elantris-downloads` or `el-pool`
- **Templates and ISOs**: Store in `KavNas` or node's `local`
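
The recommendations above can be expressed as a lookup for use in provisioning scripts. This is a sketch: `recommended_storage` and its workload keywords (`lxc-rootfs`, `high-performance`, `large-data`, `templates`) are hypothetical labels chosen for illustration, not names used elsewhere in this repository.

```bash
# Sketch: map a workload kind to the storage recommended above.
# recommended_storage and its keywords are hypothetical.
recommended_storage() {
  case "${1:-}" in
    lxc-rootfs)       echo "KavNas" ;;                      # NFS, easily backed up
    high-performance) echo "local-lvm" ;;                   # LVM thin pool on the host node
    large-data)       echo "elantris-downloads el-pool" ;;  # either option
    templates)        echo "KavNas local" ;;                # either option
    *) echo "unknown workload: ${1:-}" >&2; return 1 ;;
  esac
}
```

Where two names are printed, either is acceptable per the list above. Running `pvesm status` on the target node shows whether the chosen storage is available there.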

## Service Categories (by tags)

- **arr**: Media automation (*arr stack - Sonarr, Radarr, Prowlarr, Bazarr, Whisparr)
- **media**: Media servers (Jellyfin, Jellyseerr, Kometa)
- **proxy**: Reverse proxy (Traefik)
- **authenticator**: Authentication (Authelia)
- **nvr**: Network Video Recorder (Shinobi)
- **docker**: Docker host LXCs (docker-pm2, docker-pm4) and VMs (docker-pm3)
- **proxmox-helper-scripts**: Deployed via ProxmoxVE Helper Scripts (community scripts)
- **community-script**: Deployed via ProxmoxVE Helper Scripts

## Migration Strategy

**Goal**: Move services from Docker containers to dedicated LXCs where it makes sense.

**Good candidates for LXC migration**:
- Single-purpose services
- Services with simple dependencies
- Stateless applications
- Services that benefit from isolation

**Keep in Docker**:
- Complex multi-container stacks
- Services requiring Docker-specific features
- Temporary/experimental services

**Current Docker Hosts**:
- VM 109: docker-pm3 (on pm3, 4 CPU, 12GB RAM)
- LXC 110: docker-pm4 (on pm4, 4 CPU, 8GB RAM)
- LXC 113: docker-pm2 (on pm2, 4 CPU, 8GB RAM)
- LXC 107: dockge (on pm3, 12 CPU, 8GB RAM) - Docker management UI

## IP Address Allocation

**Infrastructure Services**:
- 10.4.2.10: traefik (LXC 104)
- 10.4.2.13: KavNas (Synology NAS)
- 10.4.2.14: elantris

**Media Stack**:
- 10.4.2.15: sonarr (LXC 105)
- 10.4.2.16: radarr (LXC 108)
- 10.4.2.17: prowlarr (LXC 114)
- 10.4.2.18: bazarr (LXC 119)
- 10.4.2.19: whisparr (LXC 117)
- 10.4.2.20: jellyseerr (LXC 115)
- 10.4.2.21: kometa (LXC 120)
- 10.4.2.22: jellyfin (LXC 121)

**Other Services**:
- 10.4.2.23: authelia (LXC 116)
- 10.4.2.24: notifiarr (LXC 118)

*Note: Update `docs/INFRASTRUCTURE.md` (service map) when allocating new IPs*
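
When allocating an address for a new service, the first unused one can be picked from the documented allocations. This is a sketch: `next_free_ip` is a hypothetical helper, it assumes addresses are handed out in ascending order within 10.4.2.0/24, and it only knows about the addresses passed to it. It does not detect undocumented hosts.

```bash
# Sketch: print the first 10.4.2.x address at or above a starting
# octet that is not in the list of used addresses.
# next_free_ip is a hypothetical helper, not part of this repo.
next_free_ip() {
  if [ "$#" -lt 1 ]; then
    echo "usage: next_free_ip <start-octet> [used-ip ...]" >&2
    return 1
  fi
  local octet="$1" ip used candidate
  shift

  # Stop before .254, which is the gateway.
  while [ "$octet" -lt 254 ]; do
    ip="10.4.2.${octet}"
    used=0
    for candidate in "$@"; do
      if [ "$candidate" = "$ip" ]; then
        used=1
        break
      fi
    done
    if [ "$used" -eq 0 ]; then
      echo "$ip"
      return 0
    fi
    octet=$((octet + 1))
  done

  echo "no free address found below 10.4.2.254" >&2
  return 1
}
```

Called with a start octet of 15 and the ten addresses from 10.4.2.15 through 10.4.2.24 listed above, it prints `10.4.2.25`. Confirm the result against the current documentation before using it.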

## Documentation Structure

**CRITICAL**: Always read `docs/README.md` first to understand the documentation system.

### Core Documentation Files (ALWAYS UPDATE, NEVER CREATE NEW)

1. **`docs/INFRASTRUCTURE.md`** - Single source of truth
   - **CHECK THIS FIRST** for node IPs, service locations, storage paths
   - Update whenever infrastructure changes

2. **`docs/CONFIGURATIONS.md`** - Service configurations
   - API keys, config snippets, copy/paste ready configs
   - Update when service configs change

3. **`docs/DECISIONS.md`** - Architecture decisions
   - Why we made choices, common patterns, troubleshooting
   - Update when making decisions or discovering patterns

4. **`docs/TASKS.md`** - Current work tracking
   - Active, pending, blocked, and completed tasks
   - Update at start and end of work sessions

5. **`docs/CHANGELOG.md`** - Historical record
   - Date-stamped entries for all changes
   - Update after completing any significant work

### Documentation Workflow

**MANDATORY - Before ANY work session**:
1. Read `docs/README.md` - Understand the documentation system
2. Check `docs/INFRASTRUCTURE.md` - Get current infrastructure state
3. Check `docs/TASKS.md` - See what's already in progress or pending

**MANDATORY - During work**:
- When you need node IPs, service locations, or paths → Read `docs/INFRASTRUCTURE.md`
- When you need config snippets or API keys → Read `docs/CONFIGURATIONS.md`
- When wondering "why is it done this way?" → Read `docs/DECISIONS.md`
- When you discover a pattern or make a decision → Immediately update `docs/DECISIONS.md`
- When you encounter issues → Check `docs/DECISIONS.md` Known Issues section first

**MANDATORY - After completing ANY work**:
1. Update the relevant core doc:
   - Infrastructure change? → Update `docs/INFRASTRUCTURE.md`
   - Config change? → Update `docs/CONFIGURATIONS.md`
   - New pattern/decision? → Update `docs/DECISIONS.md`
2. Add dated entry to `docs/CHANGELOG.md` describing what changed
3. Update `docs/TASKS.md` to mark work complete or add new tasks
4. Update "Last Updated" date in `docs/README.md`
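
Steps 2 to 4 above touch the same three files after every piece of work, so a quick check before committing can catch a missed update. This is a sketch: `missing_doc_updates` is a hypothetical helper that reads changed paths on stdin (for example from `git diff --cached --name-only`) and prints any of the three files that are absent. It cannot tell whether the content of an updated file is adequate.

```bash
# Sketch: report which always-required docs are missing from a list
# of changed paths read on stdin. Returns 1 if any are missing.
# missing_doc_updates is a hypothetical helper, not part of this repo.
missing_doc_updates() {
  local changed required missing=0
  changed="$(cat)"

  for required in docs/CHANGELOG.md docs/TASKS.md docs/README.md; do
    # -x matches the whole line, -F treats the path as a literal string.
    if ! printf '%s\n' "$changed" | grep -qxF "$required"; then
      echo "$required"
      missing=1
    fi
  done
  return "$missing"
}
```

A possible use is `git diff --cached --name-only | missing_doc_updates`, which prints nothing and returns 0 when all three files are staged.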

**STRICTLY FORBIDDEN**:
- Creating new documentation files without explicit user approval
- Leaving documentation outdated after making changes
- Creating session-specific notes files (use CHANGELOG for history)
- Skipping documentation updates "to save time"
- Assuming you remember infrastructure details (always check docs)

### When to Update Which File

| You just did... | Update this file |
|----------------|------------------|
| Added/removed a service | `INFRASTRUCTURE.md` (service map) |
| Changed an IP address | `INFRASTRUCTURE.md` (service map) |
| Modified service config | `CONFIGURATIONS.md` (add/update config snippet) |
| Changed API key | `CONFIGURATIONS.md` (update credentials) |
| Made architectural decision | `DECISIONS.md` (add to decisions section) |
| Discovered troubleshooting pattern | `DECISIONS.md` (add to common patterns) |
| Hit a recurring issue | `DECISIONS.md` (add to known issues) |
| Completed a task | `TASKS.md` (mark complete) + `CHANGELOG.md` (add entry) |
| Started new work | `TASKS.md` (add to in progress) |
| ANY significant change | `CHANGELOG.md` (always add dated entry) |

## Scripts

- `scripts/provisioning/`: LXC/VM creation scripts
- `scripts/backup/`: Backup automation scripts
- `scripts/monitoring/`: Monitoring and health check scripts

## Workflow Notes

- New LXCs are primarily deployed on **pm2**
- Use ProxmoxVE Helper Scripts (https://helper-scripts.com) for common services
- Always tag LXCs appropriately for organization
- Document service URLs and access details in `docs/services.md`
- Keep inventory documentation in sync with changes