Initial commit: KavCorp infrastructure documentation

- CLAUDE.md: Project configuration for Claude Code
- docs/: Infrastructure documentation
  - INFRASTRUCTURE.md: Service map, storage, network
  - CONFIGURATIONS.md: Service configs and credentials
  - CHANGELOG.md: Change history
  - DECISIONS.md: Architecture decisions
  - TASKS.md: Task tracking
- scripts/: Automation scripts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

docs/storage.md:
# Storage Architecture
**Last Updated**: 2025-11-16
## Storage Overview
The KavCorp cluster uses a multi-tiered storage approach:
1. **Local node storage**: For node-specific data, templates, ISOs
2. **NFS shared storage**: For LXC containers, backups, and shared data
3. **ZFS pools**: For high-performance storage on specific nodes
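A quick way to see how these tiers map to actual pools is `pvesm status`, which lists every configured storage with its type, capacity, and usage (standard Proxmox tooling; live figures will differ from the snapshots below):
```
# Run on any cluster node. Shared pools (KavNas, elantris-downloads)
# appear on every node; local and local-lvm reflect only that node.
pvesm status
# Restrict the report to a single pool:
pvesm status --storage KavNas
```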
## Storage Pools
### Local Storage (Per-Node)
Each node has two local storage pools:
#### `local` - Directory Storage
- **Type**: Directory
- **Size**: ~100GB per node
- **Content Types**: backup, vztmpl (templates), iso
- **Location**: `/var/lib/vz`
- **Usage**: Node-specific backups, templates, ISO images
- **Shared**: No
**Per-Node Status**:
| Node | Used | Total | Available |
|---|---|---|---|
| pm1 | 10.1GB | 100.9GB | 90.8GB |
| pm2 | 8.0GB | 100.9GB | 92.9GB |
| pm3 | 6.9GB | 100.9GB | 94.0GB |
| pm4 | 7.5GB | 100.9GB | 93.4GB |
| elantris | 4.1GB | 100.9GB | 96.8GB |
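Since `local` is plain directory storage backed by `/var/lib/vz`, these numbers can be cross-checked with ordinary filesystem tools on each node:
```
# Reports the filesystem holding /var/lib/vz (per-node backups, ISOs, templates)
df -h /var/lib/vz
```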
#### `local-lvm` - LVM Thin Pool
- **Type**: LVM Thin
- **Size**: ~363-375GB per node (varies)
- **Content Types**: rootdir, images
- **Usage**: High-performance VM/LXC disks
- **Shared**: No
- **Best For**: Services requiring fast local storage
**Per-Node Status**:
| Node | Used | Total | Available |
|---|---|---|---|
| pm1 | 16.9GB | 374.5GB | 357.6GB |
| pm2 | 0GB | 374.5GB | 374.5GB |
| pm3 | 178.8GB | 362.8GB | 184.0GB |
| pm4 | 0GB | 374.5GB | 374.5GB |
| elantris | 0GB | 362.8GB | 362.8GB |
**Note**: pm3's local-lvm shows 178.8GB actually written because three large guest disks live there (disks are thin-provisioned, so the 340GB allocated exceeds real usage):
- VMID 107: dockge (120GB allocated)
- VMID 111: frigate (120GB allocated)
- VMID 112: foundryvtt (100GB allocated)
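The thin pool's true fill level can be inspected with standard LVM tools; on a default Proxmox install the pool is `pve/data`, but verify the names on each node:
```
# Data% shows how full the thin pool actually is
lvs pve/data
# Per-guest volumes report written blocks, not allocated size
lvs -a pve | grep vm-
```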
### NFS Shared Storage
#### `KavNas` - Primary Shared Storage
- **Type**: NFS
- **Source**: 10.4.2.13 (Synology DS918+ NAS)
- **Size**: 23TB (23,029,958,311,936 bytes)
- **Used**: 9.2TB (9,241,738,215,424 bytes)
- **Available**: 13.8TB
- **Content Types**: snippets, iso, images, backup, rootdir, vztmpl
- **Shared**: Yes (available on all nodes)
- **Best For**:
- LXC container rootfs (most new containers use this)
- Backups
- ISO images
- Templates
- Data that needs to be accessible across nodes
**Current Usage**:
- Most LXC containers on pm2 use KavNas for rootfs
- Provides easy migration between nodes
- Centralized backup location
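Because the rootfs sits on shared NFS, moving such a container between nodes is mostly a configuration change rather than a disk copy. A minimal sketch, with a hypothetical VMID:
```
# Restart-mode migration: the container is stopped, moved, and started
# on pm4; its KavNas-backed rootfs is not copied.
pct migrate 105 pm4 --restart
```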
#### `elantris-downloads` - Download Storage
- **Type**: NFS
- **Source**: 10.4.2.14 (elantris node)
- **Size**: 23TB (23,116,582,486,016 bytes)
- **Used**: 10.6TB (10,630,966,804,480 bytes)
- **Available**: 12.5TB
- **Content Types**: rootdir, images
- **Shared**: Yes (available on all nodes)
- **Best For**:
- Download staging area
- Media downloads
- Large file operations
### ZFS Storage
#### `el-pool` - ZFS Pool (elantris)
- **Type**: ZFS
- **Node**: elantris only
- **Size**: 26.3TB (26,317,550,091,635 bytes)
- **Used**: 13.8TB (13,831,934,311,603 bytes)
- **Available**: 12.5TB
- **Content Types**: images, rootdir
- **Shared**: No (elantris only)
- **Best For**:
- High-performance storage on elantris
- Large data sets requiring ZFS features
- Services that benefit from compression/deduplication
**Current Usage**:
- VMID 121: jellyfin (16GB on el-pool)
**Status on Other Nodes**: Shows as "unknown" - ZFS pool is local to elantris only
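On elantris itself, the pool's capacity, health, and feature settings can be checked with standard ZFS tooling (whether compression/dedup are actually enabled should be verified, not assumed):
```
# Capacity and health of the pool
zpool list el-pool
# Confirm compression/deduplication settings
zfs get compression,dedup el-pool
# Per-dataset usage, e.g. the jellyfin disk (VMID 121)
zfs list -r el-pool
```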
## Storage Recommendations
### For New LXC Containers
**General Purpose Services** (web apps, APIs, small databases):
- **Storage**: `KavNas`
- **Disk Size**: 4-10GB
- **Rationale**: Shared, easy to migrate, backed up centrally (creation example follows these recommendations)
**High-Performance Services** (databases, caches):
- **Storage**: `local-lvm`
- **Disk Size**: As needed
- **Rationale**: Fast local SSD storage
**Large Storage Services** (media, file storage):
- **Storage**: `elantris-downloads` or `el-pool`
- **Disk Size**: As needed
- **Rationale**: Large capacity, optimized for bulk storage
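As referenced under General Purpose Services, a minimal creation sketch for that case (VMID and template filename are illustrative; flags are standard `pct` options):
```
# 8GB rootfs on shared KavNas; the container can later be migrated
# between nodes without copying its disk.
pct create 130 KavNas:vztmpl/debian-12-standard_12.2-1_amd64.tar.zst \
  --hostname example-app \
  --rootfs KavNas:8 \
  --memory 1024 --cores 2 \
  --net0 name=eth0,bridge=vmbr0,ip=dhcp
```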
### Mount Points for Media Services
Media-related LXCs typically mount:
```
mp0: /mnt/pve/elantris-media,mp=/media,ro=0
mp1: /mnt/pve/KavNas,mp=/mnt/kavnas
```
This provides:
- Access to media library via `/media`
- Access to NAS storage via `/mnt/kavnas`
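Applying those entries to an existing container looks roughly like this (VMID 121/jellyfin used as an illustrative media guest; the `/mnt/pve/...` paths are the node-side NFS mounts):
```
# Bind-mount the node's NFS mounts into the container
pct set 121 -mp0 /mnt/pve/elantris-media,mp=/media,ro=0 \
            -mp1 /mnt/pve/KavNas,mp=/mnt/kavnas
```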
## Storage Performance Notes
### Best Performance
1. `local-lvm` (local SSD on each node)
2. `el-pool` (local ZFS, elantris only)
### Best Redundancy/Availability
1. `KavNas` (NAS with RAID, accessible from all nodes)
2. `elantris-downloads` (large capacity, shared)
### Best for Large Files
1. `el-pool` (ZFS on elantris, 26.3TB)
2. `elantris-downloads` (23TB NFS)
3. `KavNas` (23TB NFS)
## Backup Strategy
**Current Setup**:
- Backups stored on `KavNas` NFS share
- All nodes can write backups to KavNas
- Centralized backup location
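A manual backup to the shared pool, as a reference point for the automation still to be documented (standard `vzdump` flags; VMID illustrative):
```
# Snapshot-mode backup of one guest, zstd-compressed, written to KavNas
vzdump 107 --storage KavNas --mode snapshot --compress zstd
```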
**Recommendations**:
- [ ] Document automated backup schedules
- [ ] Implement off-site backup rotation
- [ ] Test restore procedures
- [ ] Monitor KavNas free space (currently 40% used)
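For the restore test, one approach is restoring a container backup into a spare VMID so the original guest is untouched (the archive filename is hypothetical; list `/mnt/pve/KavNas/dump/` for real ones):
```
# Restore to a new VMID on fast local storage
pct restore 999 /mnt/pve/KavNas/dump/vzdump-lxc-107-2025_11_16-03_00_00.tar.zst \
  --storage local-lvm
```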
## Storage Monitoring
**Watch These Metrics**:
- pm3 `local-lvm`: 49% used (178.8GB / 362.8GB)
- KavNas: 40% used (9.2TB / 23TB)
- elantris-downloads: 46% used (10.6TB / 23TB)
- el-pool: 53% used (13.8TB / 26.3TB)
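Until proper alerting exists (see Future Storage Improvements below), a rough check like this could be run ad hoc or from cron; it parses `pvesm status` output, and the column position should be verified against the installed PVE version:
```
# Flag any pool above 80% used (column 7 is the usage percentage)
pvesm status | awk 'NR>1 && $7+0 > 80 { print $1, "is", $7, "used" }'
```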
## Future Storage Improvements
- [ ] Set up automated cleanup of old backups
- [ ] Implement storage quotas for LXC containers
- [ ] Consider SSD caching for NFS mounts
- [ ] Document backup retention policies
- [ ] Set up alerts for storage thresholds (80%, 90%)