# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

OADP (OpenShift API for Data Protection) is a Kubernetes operator that installs and manages Velero for backup and restore operations in OpenShift clusters. It extends Velero with OpenShift-specific features like Security Context Constraints (SCC), cloud credential management, and monitoring integration.

## Prerequisites

**Go Version**: Go 1.24.0 (with toolchain go1.24.5)

**macOS Users**: Install GNU sed (required for bundle generation and other targets)

```bash
brew install gnu-sed
```
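
To confirm that a GNU sed is reachable, a check along these lines works on both Linux and macOS. It relies on GNU sed printing `(GNU sed)` in its `--version` banner, and on Homebrew's `gnu-sed` formula installing the binary as `gsed` unless its `gnubin` directory is added to `PATH`. The helper name `find_gnu_sed` is hypothetical; how the Makefile itself locates GNU sed is not covered here.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper: prints the name of a GNU sed binary, or fails.
find_gnu_sed() {
  local candidate
  for candidate in sed gsed; do
    # BSD sed (the macOS default) rejects --version, so its error is discarded;
    # GNU sed prints a banner such as "sed (GNU sed) 4.9".
    if command -v "$candidate" >/dev/null 2>&1 &&
      "$candidate" --version 2>/dev/null | grep -q '(GNU sed)'; then
      echo "$candidate"
      return 0
    fi
  done
  echo "GNU sed not found; on macOS run: brew install gnu-sed" >&2
  return 1
}
```

For example, `SED="$(find_gnu_sed)" || exit 1` stores the usable binary name for later commands.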

**Container Tool**: Docker or Podman (auto-detected, defaults to Docker if available)

- Override with: `CONTAINER_TOOL=podman make <target>`
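
The documented detection order (an explicit `CONTAINER_TOOL` wins, otherwise Docker if it is installed, otherwise Podman) can be approximated in a few lines of shell. This is a sketch under that assumption: the Makefile's real rule may differ in details, and `detect_container_tool` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the documented container tool detection order.
detect_container_tool() {
  # 1. An explicit override, e.g. CONTAINER_TOOL=podman, always wins.
  if [ -n "${CONTAINER_TOOL:-}" ]; then
    echo "$CONTAINER_TOOL"
  # 2. Prefer Docker when it is on PATH.
  elif command -v docker >/dev/null 2>&1; then
    echo docker
  # 3. Fall back to Podman.
  elif command -v podman >/dev/null 2>&1; then
    echo podman
  else
    echo "neither docker nor podman found on PATH" >&2
    return 1
  fi
}
```

Keeping the override first means a developer with both tools installed can still pin Podman per invocation, exactly as the `CONTAINER_TOOL=podman make <target>` form above suggests.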

**Tool Version Checking**: Run `make versions` to check all tool versions and detect mismatches

## Development Commands

### Essential Commands

```bash
# Discovery and validation
make help # Display all available targets with descriptions
make versions # Check tool versions and detect mismatches

# Development workflow
make test # Run unit tests, linting, and validation (recommended before commits)
make build # Build manager binary
make deploy-olm # Deploy for testing via OLM (recommended for PR testing)
make undeploy-olm # Remove OLM deployment

# Code generation (run after API changes)
make generate # Generate DeepCopy methods
make manifests # Generate CRDs and RBAC manifests
make bundle # Generate OLM bundle
make api-isupdated # Check if API is up to date
make bundle-isupdated # Check if bundle is up to date

# Linting and formatting
make lint # Run golangci-lint
make lint-fix # Fix linting issues automatically
make fmt # Format code with go fmt

# Special targets
make update-non-admin-manifests # Update Non-Admin Controller (NAC) manifests from the external repo
```

### Testing Commands

```bash
make test-e2e # Run end-to-end tests (requires setup)
make test-e2e-setup # Set up E2E test environment
make test-e2e-cleanup # Clean up after E2E tests

# Test variations
TEST_VIRT=true make test-e2e # Run virtualization tests
TEST_UPGRADE=true make test-e2e # Run upgrade tests
TEST_CLI=true make test-e2e # Run CLI-based tests

# Run focused tests
GINKGO_ARGS="--ginkgo.focus='test name'" make test-e2e
```

### Cloud Authentication Deployment

Deploy OADP with cloud-native authentication (STS, Workload Identity, WIF):

```bash
make deploy-olm-stsflow # Deploy with standardized flow UI (interactive)
make deploy-olm-stsflow-aws # Deploy with AWS STS
make deploy-olm-stsflow-gcp # Deploy with GCP Workload Identity Federation
make deploy-olm-stsflow-azure # Deploy with Azure Workload Identity
```

These targets automate cloud credential setup using cloud-native identity providers instead of manual credential files. The standardized flow provides an interactive UI for configuration.

### E2E Test Setup Requirements

E2E tests require these environment variables:

- `OADP_CRED_FILE`: Path to backup location credentials
- `OADP_BUCKET`: S3 bucket name for backups
- `CI_CRED_FILE`: Path to snapshot location credentials
- `VSL_REGION`: Volume snapshot location region
- `BSL_REGION`: Backup storage location region (optional, defaults to us-east-1)

**Test Labels**: Tests are filtered by labels for cloud provider (`aws`, `gcp`, `azure`, `ibmcloud`) and test type (`virt`, `hcp`, `cli`, `upgrade`)

**Common Test Issues**:

- ttl.sh images expire after `TTL_DURATION` (default 1h), so tests run long after the initial deployment may fail
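
A fail-fast check for the variables listed above can save a long wait before the first test errors out. This sketch assumes only what the list states (four required variables, two of them file paths, and a `us-east-1` default for `BSL_REGION`); `check_e2e_env` is a hypothetical name, and the real test setup may validate differently.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the E2E variables listed above.
check_e2e_env() {
  local missing=0 var
  for var in OADP_CRED_FILE OADP_BUCKET CI_CRED_FILE VSL_REGION; do
    # ${!var:-} is bash indirect expansion: the value of the variable named $var.
    if [ -z "${!var:-}" ]; then
      echo "missing required variable: $var" >&2
      missing=1
    fi
  done
  # The credential variables are paths, so also check that the files exist.
  for var in OADP_CRED_FILE CI_CRED_FILE; do
    if [ -n "${!var:-}" ] && [ ! -f "${!var}" ]; then
      echo "$var points to a missing file: ${!var}" >&2
      missing=1
    fi
  done
  # Mirror the documented default region for the backup storage location.
  export BSL_REGION="${BSL_REGION:-us-east-1}"
  return "$missing"
}
```

Typical use: `check_e2e_env && make test-e2e`.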

## Important Environment Variables

**Operator Configuration**:

- `IMG`: Custom operator image (default: `quay.io/konveyor/oadp-operator:latest`)
- `VERSION`: Override version (default: `99.0.0`)
- `OADP_TEST_NAMESPACE`: Namespace for operator (default: `openshift-adp`)

**Image Build and Registry**:

- `CONTAINER_TOOL`: Container tool to use (`docker` or `podman`, auto-detected)
- `TTL_DURATION`: ttl.sh image expiry time (default: `1h`, max: `24h`)
- `BUNDLE_IMG`: Custom bundle image
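
ttl.sh reads an image's lifetime from its tag (for example `ttl.sh/<name>:1h`). Assuming `TTL_DURATION` ends up as that tag, it must be a duration ttl.sh accepts and must fit the 24h cap. The sketch below validates a value before a long E2E session; it assumes only `s`, `m`, and `h` suffixes are used, and `ttl_to_seconds` is a hypothetical helper rather than part of the Makefile.

```bash
#!/usr/bin/env bash
# Hypothetical helper: converts a TTL such as 30m or 1h to seconds and rejects
# values above the documented 24h maximum. Only s/m/h suffixes are assumed.
ttl_to_seconds() {
  local ttl="$1" n unit
  if [[ ! "$ttl" =~ ^([0-9]+)([smh])$ ]]; then
    echo "unsupported TTL format: $ttl" >&2
    return 1
  fi
  # 10# forces base 10 so values like "08" are not parsed as octal.
  n=$((10#${BASH_REMATCH[1]}))
  unit="${BASH_REMATCH[2]}"
  case "$unit" in
    s) ;;
    m) n=$((n * 60)) ;;
    h) n=$((n * 3600)) ;;
  esac
  if (( n > 86400 )); then
    echo "TTL $ttl exceeds the 24h maximum" >&2
    return 1
  fi
  echo "$n"
}
```

For a session expected to run a few hours, a value such as `TTL_DURATION=4h` passes this check and leaves headroom over the 1h default.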

**Cloud Provider Credentials** (for E2E tests):

- `OADP_CRED_FILE`, `OADP_BUCKET`, `CI_CRED_FILE`: Backup/snapshot credentials and target bucket
- `VSL_REGION`, `BSL_REGION`: Cloud regions for volume/backup storage locations

## Git Repository Information

**Upstream Repository**: `openshift/oadp-operator`

**IMPORTANT - Pull Request Target**: Always target the `oadp-dev` branch for PRs, NOT `main`

**Branch Structure**:

- Development branch: `oadp-dev` (target for all PRs)
- Release branches: `oadp-major.minor` (e.g., `oadp-1.4`, `oadp-1.5`)
- Many other branches from various contributors also exist on the remote

You can verify the current default branch with `git ls-remote --symref upstream HEAD` (assuming your `upstream` remote points to `openshift/oadp-operator`).
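
The branch naming rules above can be encoded in a small check, for example inside a pre-push hook. `classify_branch` is a hypothetical helper, not part of the repository; the comments also show one common way to create the `upstream` remote that the `git ls-remote` command assumes, with `my-fix` as a placeholder branch name.

```bash
#!/usr/bin/env bash
# One-time setup (assumes HTTPS access to GitHub):
#   git remote add upstream https://github.com/openshift/oadp-operator.git
#   git fetch upstream
#   git switch -c my-fix upstream/oadp-dev   # branch off the PR target

# Hypothetical helper: classifies an upstream branch name per the rules above.
classify_branch() {
  local branch="$1"
  if [ "$branch" = "oadp-dev" ]; then
    echo "development (PR target)"
  # Release branches follow oadp-<major>.<minor>, e.g. oadp-1.5.
  elif [[ "$branch" =~ ^oadp-[0-9]+\.[0-9]+$ ]]; then
    echo "release"
  else
    echo "other (do not target with PRs)"
  fi
}
```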

## Architecture Overview

### Core APIs (Custom Resources)

- **DataProtectionApplication (DPA)**: Primary resource that configures the entire OADP/Velero stack
- **CloudStorage**: Manages cloud storage configurations for backup locations
- **DataProtectionTest**: Framework for testing backup/restore operations
- **Non-Admin resources**: Enable multi-tenant backup scenarios (NonAdminBackup, NonAdminRestore)
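
To make the DataProtectionApplication concrete, here is a minimal AWS-flavored manifest rendered from a shell function. The field layout follows the commonly published OADP sample, but treat it as an illustrative assumption and check `api/v1alpha1/` for the authoritative schema; `render_dpa`, the `dpa-sample` name, and the `cloud-credentials` Secret are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints a minimal DPA manifest for an AWS S3 bucket.
# Usage: render_dpa <bucket> <region> | oc apply -f -
render_dpa() {
  local bucket="$1" region="$2"
  # Unquoted EOF so ${bucket}, ${region} and the namespace are expanded.
  cat <<EOF
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: dpa-sample
  namespace: ${OADP_TEST_NAMESPACE:-openshift-adp}
spec:
  configuration:
    velero:
      defaultPlugins:
        - openshift
        - aws
  backupLocations:
    - velero:
        provider: aws
        default: true
        objectStorage:
          bucket: ${bucket}
          prefix: velero
        config:
          region: ${region}
        credential:
          name: cloud-credentials
          key: cloud
EOF
}
```

Rendering to stdout rather than applying directly makes it easy to inspect the manifest (or diff it) before piping it to `oc apply -f -`.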

### Key Controllers

- **DataProtectionApplicationReconciler**: Main controller that orchestrates Velero deployment and configuration
- **CloudStorageReconciler**: Manages cloud storage backend setup
- **DataProtectionTestReconciler**: Handles data protection testing workflows

### Package Structure

- `api/v1alpha1/`: CRD type definitions and API schemas
- `internal/controller/`: Controller implementations and reconciliation logic
- `pkg/credentials/`: Cloud credential management and authentication flows
- `pkg/velero/`: Velero-specific utilities and integration code
- `pkg/cloudprovider/`: Multi-cloud provider abstractions (AWS, Azure, GCP, IBM)
- `tests/e2e/`: Comprehensive end-to-end test suites using Ginkgo

### Integration Points

The operator manages these key integrations:

- **Velero**: Core backup/restore engine with OpenShift-specific patches
- **Cloud Providers**: AWS (including STS), Azure (Workload Identity), GCP (WIF), IBM Cloud, OpenStack
- **OpenShift**: SCC management, monitoring integration, image registry
- **Storage**: CSI snapshots, data mover functionality for cross-cluster scenarios

### Development Workflow

1. Use `make deploy-olm` for testing code changes (builds and deploys current branch)
2. Always run `make test` before committing to validate code quality
3. For API changes: run `make generate && make manifests && make bundle`
4. E2E tests require cloud credentials and should be run in appropriate test environments
5. The operator follows standard controller-runtime patterns with comprehensive validation and status reporting
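
Step 3 can be wrapped in a small helper so the three generators always run in the documented order and stop at the first failure, like the `&&` chain. `regen_api` is a hypothetical convenience, and honoring a `MAKE` variable (so the sequence can be dry-run with `MAKE=echo`) is an assumption of this sketch rather than a repository convention.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for step 3: regenerate DeepCopy code, CRDs/RBAC, and the
# OLM bundle. Set MAKE=echo to print the targets instead of running them.
regen_api() {
  local make_cmd="${MAKE:-make}" target
  for target in generate manifests bundle; do
    # Stop at the first failing target, matching the && chain in step 3.
    "$make_cmd" "$target" || return 1
  done
}
```

After it succeeds, review the regenerated files with `git status --short` and commit them with the API change; the `api-isupdated` and `bundle-isupdated` targets listed earlier exist to catch a stale API or bundle.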

### Special Features

- **Multi-cloud standardized authentication**: Supports cloud-native identity (STS, WIF, Workload Identity)
- **Non-admin backup**: Multi-tenant backup capabilities for namespace-scoped users
- **Data mover**: Cross-cluster backup/restore of volume data using Velero's built-in data mover
- **OpenShift Virtualization**: Backup/restore support for KubeVirt VMs
- **Must-gather integration**: Diagnostic collection for troubleshooting

### Bundle and Release Management

- Uses OLM (Operator Lifecycle Manager) for deployment and upgrades
- Bundle generation includes multiple service accounts (velero, non-admin-controller)
- Supports multiple channels (dev, stable) for different release streams
- Version compatibility matrix maintained in `PARTNERS.md`

When making changes, always consider the multi-cloud nature of the operator and test against the comprehensive E2E suite that covers various cloud providers and backup scenarios.

## CI/Prow Testing

E2E tests in presubmit CI are automatically triggered via OpenShift's Prow infrastructure:

**CI Configuration**: Tests are defined in the [openshift/release](https://github.com/openshift/release) repository at:

- `ci-operator/config/openshift/oadp-operator/openshift-oadp-operator-oadp-dev__4.20.yaml`

**Test Container Image**: The `test-oadp-operator` image is built from [build/ci-Dockerfile](build/ci-Dockerfile), which:

- Uses `quay.io/konveyor/builder` as the base image
- Installs kubectl for cluster operations
- Downloads Go dependencies and prepares the build environment
- Provides the runtime environment for executing E2E tests in CI

**How it works**:

1. When a PR is opened against `oadp-dev`, Prow automatically triggers configured test jobs
2. The ci-Dockerfile builds a test container with all necessary dependencies
3. E2E tests run inside this container against a provisioned OpenShift cluster
4. Test results are reported back to the PR

**Viewing test results**: Check the PR's "Checks" tab or visit [prow.ci.openshift.org](https://prow.ci.openshift.org) for detailed test logs.

### Automated Failure Analysis with Claude

When E2E tests fail in Prow CI, Claude Code automatically analyzes the failures and generates a comprehensive report.

**How it works**:

1. After test execution completes with failures, the analysis script (`tests/e2e/scripts/analyze_failures.sh`) is invoked
2. Claude runs in headless mode (`--print` flag) for non-interactive CI automation via Vertex AI
3. Claude analyzes artifacts written by the E2E test code: JUnit reports, must-gather diagnostics, and per-test pod logs
4. A detailed markdown report is generated at `${ARTIFACT_DIR}/claude-failure-analysis.md`
5. The report includes root cause analysis, known flake detection, and actionable recommendations
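
A stripped-down version of steps 2 to 4 might look like the following. It assumes Claude Code's Vertex AI switch (`CLAUDE_CODE_USE_VERTEX=1`) and its `--print` flag; the prompt text, the `CLAUDE_BIN` override, and the function name `run_failure_analysis` are hypothetical, and the real `analyze_failures.sh` likely does more (artifact discovery, flake matching, redaction).

```bash
#!/usr/bin/env bash
# Hypothetical sketch of steps 2-4; not the contents of analyze_failures.sh.
# Claude Code's Vertex AI mode also expects ANTHROPIC_VERTEX_PROJECT_ID and
# CLOUD_ML_REGION to be set in the environment.
run_failure_analysis() {
  if [ -z "${ARTIFACT_DIR:-}" ]; then
    echo "ARTIFACT_DIR is not set" >&2
    return 1
  fi
  local report="${ARTIFACT_DIR}/claude-failure-analysis.md"
  # Illustrative prompt: point Claude at the artifacts the E2E code wrote.
  local prompt="Analyze the E2E test failures using the JUnit reports, \
must-gather output, and per-test pod logs under ${ARTIFACT_DIR}. \
Write a Markdown report with root causes, known flakes, and recommendations."
  # --print runs Claude Code non-interactively and writes the answer to stdout.
  CLAUDE_CODE_USE_VERTEX=1 "${CLAUDE_BIN:-claude}" --print "$prompt" >"$report"
}
```

The `CLAUDE_BIN` indirection exists only so the plumbing can be exercised without credentials (for example with `CLAUDE_BIN=echo`).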

**Important**: Claude analyzes only artifacts generated during test execution (JUnit, must-gather, per-test logs). Prow's `build-log.txt` is written by CI infrastructure after tests complete and is not available during analysis.

**Accessing the analysis**:

- Find `claude-failure-analysis.md` in the Prow artifacts directory alongside other test outputs
- URL pattern: `https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_oadp-operator/<PR>/<job-name>/<build-id>/artifacts/claude-failure-analysis.md`
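
Filling in the URL pattern by hand is error-prone, so a tiny helper can assemble it. `analysis_url` is hypothetical, and the PR number, job name, and build ID in any call are placeholders you copy from the Prow job page.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fills in the documented artifact URL pattern.
# Usage: analysis_url <pr-number> <job-name> <build-id>
analysis_url() {
  if [ "$#" -ne 3 ]; then
    echo "usage: analysis_url <pr-number> <job-name> <build-id>" >&2
    return 1
  fi
  local base="https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull"
  printf '%s/openshift_oadp-operator/%s/%s/%s/artifacts/claude-failure-analysis.md\n' \
    "$base" "$1" "$2" "$3"
}
```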

**Configuration**:

- Analysis requires Vertex AI credentials configured in the CI environment
- Gracefully skips if credentials are not available (no impact on test execution)
- Can be disabled by setting `SKIP_CLAUDE_ANALYSIS=true`
- **Automatic secret redaction**: API keys, tokens, passwords, and credentials are automatically redacted from output
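
The gating and redaction rules above can be sketched as two small functions. Both are hypothetical: the use of `GOOGLE_APPLICATION_CREDENTIALS` as the credential signal is an assumption about how the CI job exposes Vertex AI access, and the redaction patterns are illustrative, not the script's actual list. The `sed` call uses GNU extensions (`-E` plus the case-insensitive `I` flag).

```bash
#!/usr/bin/env bash
# Hypothetical gate: succeed only when analysis is enabled and credentials exist.
should_run_analysis() {
  if [ "${SKIP_CLAUDE_ANALYSIS:-false}" = "true" ]; then
    echo "SKIP_CLAUDE_ANALYSIS=true; skipping analysis" >&2
    return 1
  fi
  # Assumption: the CI job points this variable at a service account key file.
  if [ ! -f "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    echo "no Vertex AI credentials found; skipping analysis" >&2
    return 1
  fi
}

# Hypothetical filter: masks values that follow common secret-bearing keys,
# e.g. "password=hunter2" becomes "password=[REDACTED]".
redact_secrets() {
  sed -E 's/((api[_-]?key|token|password|secret)[[:space:]]*[:=][[:space:]]*)[^[:space:]]+/\1[REDACTED]/Ig'
}
```

A caller would typically treat a failed gate as success (for example `should_run_analysis || exit 0`) so that a skipped analysis never changes the test job's exit status.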

For more details, see the [design document](docs/design/claude-prow-failure-analysis_design.md).