proxmox-infra/CLAUDE.md
kavren 7b78c474f5 docs: Add git commit policy to CLAUDE.md
Instructs Claude to commit frequently and evaluate after each tool call
whether changes should be committed. Includes commit message format
and guidelines for immediate vs batched commits.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 22:18:42 -05:00


# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Git Commit Policy - IMPORTANT
**You MUST commit and push changes frequently.** Evaluate after each tool call whether a commit makes sense:
### Commit immediately after:
- Any Edit or Write to documentation files (docs/*.md, CLAUDE.md)
- Creating or modifying Traefik configs
- Adding new services to infrastructure
- Completing a discrete task or fix
### Commit in batches for:
- Multiple related file edits (e.g., updating INFRASTRUCTURE.md + CHANGELOG.md together)
- Exploratory changes that may be reverted
### Commit message format:
```
<type>: <short description>

<optional body>

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
```
Types: `add`, `update`, `fix`, `docs`, `config`

**Remote**: `origin` → http://10.4.2.7:3000/kavren/proxmox-infra.git
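
A minimal sketch of how the commit message format above could be assembled in a script. The function name `build_commit_message` is hypothetical, and the blank lines between subject, body, and trailer lines follow the usual git convention, which is an assumption about the intended layout.

```bash
# Build a commit message in the format described above.
# Usage: build_commit_message <type> <short description> [body]
build_commit_message() {
    # Only the types listed in this file are accepted.
    case "$1" in
        add|update|fix|docs|config) ;;
        *) echo "unknown commit type: $1" >&2; return 1 ;;
    esac
    printf '%s: %s\n\n' "$1" "$2"
    # The body is optional; emit it only when a third argument is given.
    if [ -n "${3:-}" ]; then
        printf '%s\n\n' "$3"
    fi
    printf '%s\n\n' '🤖 Generated with [Claude Code](https://claude.com/claude-code)'
    printf '%s\n' 'Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>'
}

# Example (not run here): commit staged changes, then push to the remote.
#   git commit -m "$(build_commit_message docs 'Update storage notes')" && git push origin HEAD
```

Rejecting unknown types up front keeps a typo in the type from ending up in the history.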
## Repository Purpose
Infrastructure documentation and management repository for the **KavCorp** Proxmox cluster - a 5-node homelab cluster running self-hosted services. This repository supports migration from Docker containers to Proxmox LXCs where appropriate.
## Cluster Architecture
- **Cluster Name**: KavCorp
- **Nodes**: 5 (pm1, pm2, pm3, pm4, elantris)
- **Network**: 10.4.2.0/24
- **Primary Management Node**: pm2 (10.4.2.6)
### Node IP Mapping
- pm1: 10.4.2.2
- pm2: 10.4.2.6 (primary for new LXC deployment)
- pm3: 10.4.2.3
- pm4: 10.4.2.5
- elantris: 10.4.2.14 (largest node, 128GB RAM, ZFS storage)
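
For scripts that need these addresses, the mapping above could be wrapped in a small lookup. This is only a sketch: the function name `node_ip` is hypothetical, and `docs/INFRASTRUCTURE.md` remains the source of truth if the two ever disagree.

```bash
# Map a node name to its IP (values copied from the list above).
node_ip() {
    case "$1" in
        pm1)      echo "10.4.2.2" ;;
        pm2)      echo "10.4.2.6" ;;
        pm3)      echo "10.4.2.3" ;;
        pm4)      echo "10.4.2.5" ;;
        elantris) echo "10.4.2.14" ;;
        *) echo "unknown node: $1" >&2; return 1 ;;
    esac
}

# Example (not run here): connect to pm3 by IP instead of by SSH alias.
#   ssh "$(node_ip pm3)"
```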
## Common Commands
### Cluster Management
```bash
# Access cluster (use pm2 as primary management node)
ssh pm2
# View cluster status
pvecm status
pvecm nodes
# List all VMs/LXCs across cluster
pvesh get /cluster/resources --type vm --output-format json
# List all nodes
pvesh get /cluster/resources --type node --output-format json
# List storage
pvesh get /cluster/resources --type storage --output-format json
```
### LXC Management
```bash
# List LXCs on the current node (pct only sees containers on the node it runs on)
pct list
# Get LXC configuration
pvesh get /nodes/<node>/lxc/<vmid>/config
pct config <vmid>
# Start/stop/restart LXC
pct start <vmid>
pct stop <vmid>
pct reboot <vmid>
# Execute command in LXC
pct exec <vmid> -- <command>
# Enter LXC console
pct enter <vmid>
# Create LXC from template
pct create <vmid> <template> --hostname <name> --cores <n> --memory <mb> --rootfs <storage>:<size>
```
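
One way to review a `pct create` invocation before running it is to print the command first. The sketch below does only that. The function name `pct_create_cmd`, VMID 130, the hostname, and the template path are hypothetical placeholders, not values from this repository.

```bash
# Print (rather than run) a `pct create` command so it can be reviewed first.
# Usage: pct_create_cmd <vmid> <template> <hostname> <cores> <memory_mb> <storage> <size_gb>
pct_create_cmd() {
    if [ "$#" -ne 7 ]; then
        echo "usage: pct_create_cmd <vmid> <template> <hostname> <cores> <memory_mb> <storage> <size_gb>" >&2
        return 1
    fi
    printf 'pct create %s %s --hostname %s --cores %s --memory %s --rootfs %s:%s\n' \
        "$1" "$2" "$3" "$4" "$5" "$6" "$7"
}

# Example with placeholder values (nothing is created, the command is only printed):
pct_create_cmd 130 local:vztmpl/example-template.tar.zst demo 2 2048 KavNas 8
# prints: pct create 130 local:vztmpl/example-template.tar.zst --hostname demo --cores 2 --memory 2048 --rootfs KavNas:8
```

Once the printed line looks right, it can be pasted into a shell on the target node.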
### Network Configuration
```bash
# View network interfaces
ip -br addr show
# Show the network config
cat /etc/network/interfaces
# Standard bridge: vmbr0 (connected to eno1 physical interface)
# Gateway: 10.4.2.254
```
## Storage Architecture
### Storage Pools
**Local Storage** (per-node):
- `local`: Directory storage on each node, for backups/templates/ISOs (~100GB each)
- `local-lvm`: LVM thin pool on each node, for VM/LXC disks (~350-375GB each)

**ZFS Pools**:
- `el-pool`: ZFS pool on elantris (24TB), used for large data storage

**NFS Mounts** (shared):
- `KavNas`: Primary NFS share from Synology NAS (10.4.2.13), ~23TB - used for backups, ISOs, and LXC storage
- `elantris-downloads`: NFS share from elantris, ~23TB - used for media downloads
### Storage Recommendations
- **New LXC containers**: Use `KavNas` for rootfs (NFS, easily backed up)
- **High-performance workloads**: Use `local-lvm` on the host node
- **Large data storage**: Use `elantris-downloads` or `el-pool`
- **Templates and ISOs**: Store in `KavNas` or node's `local`
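
The recommendations above can be expressed as a lookup for use in provisioning scripts. This is a sketch under assumptions: the function name `recommend_storage` and the workload class names (`lxc`, `performance`, `data`, `template`) are hypothetical labels. Where the list offers two options, the first one is returned.

```bash
# Suggest a storage pool for a workload class, following the list above.
recommend_storage() {
    case "$1" in
        lxc)         echo "KavNas" ;;               # NFS, easily backed up
        performance) echo "local-lvm" ;;            # on the host node
        data)        echo "elantris-downloads" ;;   # el-pool is the alternative
        template)    echo "KavNas" ;;               # or the node's local storage
        *) echo "unknown workload class: $1" >&2; return 1 ;;
    esac
}

recommend_storage lxc   # prints KavNas
```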
## Service Categories (by tags)
- **arr**: Media automation (*arr stack - Sonarr, Radarr, Prowlarr, Bazarr, Whisparr)
- **media**: Media servers (Jellyfin, Jellyseerr, Kometa)
- **proxy**: Reverse proxy (Traefik)
- **authenticator**: Authentication (Authelia)
- **nvr**: Network Video Recorder (Shinobi)
- **docker**: Docker host LXCs (docker-pm2, docker-pm4) and VMs (docker-pm3)
- **proxmox-helper-scripts**: Deployed via community scripts
- **community-script**: Deployed via ProxmoxVE Helper Scripts
## Migration Strategy
**Goal**: Move services from Docker containers to dedicated LXCs where it makes sense.

**Good candidates for LXC migration**:
- Single-purpose services
- Services with simple dependencies
- Stateless applications
- Services that benefit from isolation

**Keep in Docker**:
- Complex multi-container stacks
- Services requiring Docker-specific features
- Temporary/experimental services

**Current Docker Hosts**:
- VM 109: docker-pm3 (on pm3, 4 CPU, 12GB RAM)
- LXC 110: docker-pm4 (on pm4, 4 CPU, 8GB RAM)
- LXC 113: docker-pm2 (on pm2, 4 CPU, 8GB RAM)
- LXC 107: dockge (on pm3, 12 CPU, 8GB RAM) - Docker management UI
## IP Address Allocation
**Infrastructure Services**:
- 10.4.2.10: traefik (LXC 104)
- 10.4.2.13: KavNas (Synology NAS)
- 10.4.2.14: elantris

**Media Stack**:
- 10.4.2.15: sonarr (LXC 105)
- 10.4.2.16: radarr (LXC 108)
- 10.4.2.17: prowlarr (LXC 114)
- 10.4.2.18: bazarr (LXC 119)
- 10.4.2.19: whisparr (LXC 117)
- 10.4.2.20: jellyseerr (LXC 115)
- 10.4.2.21: kometa (LXC 120)
- 10.4.2.22: jellyfin (LXC 121)

**Other Services**:
- 10.4.2.23: authelia (LXC 116)
- 10.4.2.24: notifiarr (LXC 118)

*Note: Update docs/network.md when allocating new IPs*
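
When picking an address for a new service, a helper like the following could propose the next unused one. It is a sketch that only knows the addresses mentioned in this file. The variable `USED_OCTETS`, the function `next_free_ip`, and the default starting point of 25 are assumptions, and any suggestion should still be checked against the documentation and the live network.

```bash
# Last octets already taken in 10.4.2.0/24, as mentioned in this file:
# nodes (2 3 5 6 14), the git remote host (7), services (10 13 15-24), gateway (254).
USED_OCTETS="2 3 5 6 7 10 13 14 15 16 17 18 19 20 21 22 23 24 254"

# Print the lowest unused address at or above the given last octet (default 25).
next_free_ip() {
    local candidate=${1:-25}
    while [ "$candidate" -le 253 ]; do
        case " $USED_OCTETS " in
            *" $candidate "*) candidate=$((candidate + 1)) ;;   # taken, try the next one
            *) echo "10.4.2.$candidate"; return 0 ;;
        esac
    done
    echo "no free address in 10.4.2.0/24" >&2
    return 1
}

next_free_ip      # prints 10.4.2.25
next_free_ip 15   # prints 10.4.2.25 (15 through 24 are allocated)
```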
## Documentation Structure
**CRITICAL**: Always read `docs/README.md` first to understand the documentation system.
### Core Documentation Files (ALWAYS UPDATE, NEVER CREATE NEW)
1. **`docs/INFRASTRUCTURE.md`** - Single source of truth
   - **CHECK THIS FIRST** for node IPs, service locations, storage paths
   - Update whenever infrastructure changes
2. **`docs/CONFIGURATIONS.md`** - Service configurations
   - API keys, config snippets, copy/paste ready configs
   - Update when service configs change
3. **`docs/DECISIONS.md`** - Architecture decisions
   - Why we made choices, common patterns, troubleshooting
   - Update when making decisions or discovering patterns
4. **`docs/TASKS.md`** - Current work tracking
   - Active, pending, blocked, and completed tasks
   - Update at start and end of work sessions
5. **`docs/CHANGELOG.md`** - Historical record
   - Date-stamped entries for all changes
   - Update after completing any significant work
### Documentation Workflow
**MANDATORY - Before ANY work session**:
1. Read `docs/README.md` - Understand the documentation system
2. Check `docs/INFRASTRUCTURE.md` - Get current infrastructure state
3. Check `docs/TASKS.md` - See what's already in progress or pending

**MANDATORY - During work**:
- When you need node IPs, service locations, or paths → Read `docs/INFRASTRUCTURE.md`
- When you need config snippets or API keys → Read `docs/CONFIGURATIONS.md`
- When wondering "why is it done this way?" → Read `docs/DECISIONS.md`
- When you discover a pattern or make a decision → Immediately update `docs/DECISIONS.md`
- When you encounter issues → Check `docs/DECISIONS.md` Known Issues section first

**MANDATORY - After completing ANY work**:
1. Update the relevant core doc:
   - Infrastructure change? → Update `docs/INFRASTRUCTURE.md`
   - Config change? → Update `docs/CONFIGURATIONS.md`
   - New pattern/decision? → Update `docs/DECISIONS.md`
2. Add dated entry to `docs/CHANGELOG.md` describing what changed
3. Update `docs/TASKS.md` to mark work complete or add new tasks
4. Update "Last Updated" date in `docs/README.md`

**STRICTLY FORBIDDEN**:
- Creating new documentation files without explicit user approval
- Leaving documentation outdated after making changes
- Creating session-specific notes files (use CHANGELOG for history)
- Skipping documentation updates "to save time"
- Assuming you remember infrastructure details (always check docs)
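
The "add dated entry to `docs/CHANGELOG.md`" step from the after-work checklist above could be scripted roughly as follows. The function name `add_changelog_entry` is hypothetical, and the `## YYYY-MM-DD` heading layout is an assumption. Match it to whatever format the existing changelog already uses.

```bash
# Append a dated entry to a changelog file.
# Usage: add_changelog_entry <file> <description>
# Assumed layout: a "## YYYY-MM-DD" heading followed by a bullet.
add_changelog_entry() {
    local file=$1 description=$2
    printf '\n## %s\n- %s\n' "$(date +%F)" "$description" >> "$file"
}

# Example (not run here):
#   add_changelog_entry docs/CHANGELOG.md "Moved notifiarr to LXC 118"
```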
### When to Update Which File
| You just did... | Update this file |
|----------------|------------------|
| Added/removed a service | `INFRASTRUCTURE.md` (service map) |
| Changed an IP address | `INFRASTRUCTURE.md` (service map) |
| Modified service config | `CONFIGURATIONS.md` (add/update config snippet) |
| Changed API key | `CONFIGURATIONS.md` (update credentials) |
| Made architectural decision | `DECISIONS.md` (add to decisions section) |
| Discovered troubleshooting pattern | `DECISIONS.md` (add to common patterns) |
| Hit a recurring issue | `DECISIONS.md` (add to known issues) |
| Completed a task | `TASKS.md` (mark complete) + `CHANGELOG.md` (add entry) |
| Started new work | `TASKS.md` (add to in progress) |
| ANY significant change | `CHANGELOG.md` (always add dated entry) |
## Scripts
- `scripts/provisioning/`: LXC/VM creation scripts
- `scripts/backup/`: Backup automation scripts
- `scripts/monitoring/`: Monitoring and health check scripts
## Workflow Notes
- New LXCs are primarily deployed on **pm2**
- Use ProxmoxVE Helper Scripts (https://helper-scripts.com) for common services
- Always tag LXCs appropriately for organization
- Document service URLs and access details in `docs/services.md`
- Keep inventory documentation in sync with changes
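
Tagging can be done from the CLI with `pct set <vmid> --tags <tags>`. The sketch below prints such a command after checking the tag against the categories listed under "Service Categories". The function name `tag_lxc_cmd` is hypothetical.

```bash
# Print a `pct set` command that applies one of the tags used in this cluster.
# Printing instead of running keeps this a dry run.
tag_lxc_cmd() {
    local vmid=$1 tag=$2
    case "$tag" in
        arr|media|proxy|authenticator|nvr|docker|proxmox-helper-scripts|community-script) ;;
        *) echo "unknown tag: $tag" >&2; return 1 ;;
    esac
    printf 'pct set %s --tags %s\n' "$vmid" "$tag"
}

tag_lxc_cmd 105 arr   # prints: pct set 105 --tags arr
```

Note that `--tags` sets the container's whole tag list, so when an LXC should carry several tags, pass them together, separated by semicolons and quoted.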