Cleaning House: How We Eliminated 54 Branches and Hardened Our Ingestion Pipeline
A deep dive into repository cleanup, system robustness, and the engineering discipline that scales
The Problem With Growth
At Alamo AI Labs, we're building Scholia—a platform for deep founder intelligence. But behind every polished product is infrastructure that either enables or constrains velocity.
This week, we faced a classic scaling problem: technical debt accumulation.
The Branch Explosion
Our repository had ballooned to 63 branches, 54 of them unmerged, scattered across multiple development phases. Each branch represented well-intentioned work (bug fixes, features, experiments) that never made it home.
The cost was real:
- Cognitive overhead from stale branches
- Merge conflicts brewing in the shadows
- Lost context about what was actually in production
- Coordination friction across our multi-agent development team
The Systematic Cleanup
We employed a three-phase approach using our code-reviewer agent:
Phase 1: Triage (47 branches)
We categorized every branch:
- Phase branches (Phase 0, 1, 2): All work merged via PRs, branches obsolete
- Bugfix branches: Issues resolved in main, fixes superseded
- Feature branches: Functionality incorporated through different implementations
Result: 43 safe deletions, 4 requiring deeper analysis.
Phase 2: Deep Analysis (4 branches)
For uncertain branches, we traced every commit to main:
- alamoailabs-1rf-mental-models: NTM docs merged via PR #13
- perf/vir4-lighthouse-audit: Audit documented, performance fixes implemented
- qa/fix-backend-crashes: Next.js 15 params fix merged via PR #85
Result: 4 more deletions. All work preserved in main.
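The tracing itself leaned on plain git plumbing: git cherry compares a branch against main patch by patch, so work that landed through a different commit SHA still shows up as already merged. A minimal Node/TypeScript sketch of that check (the helper name and hard-coded branch list are illustrative, not our actual tooling):

import { execSync } from "node:child_process";

// "git cherry main <branch>" prints "+" for commits whose patch is missing from
// main and "-" for commits whose equivalent change already landed there.
function commitsMissingFromMain(branch: string): string[] {
  const out = execSync(`git cherry main ${branch}`, { encoding: "utf8" });
  return out
    .split("\n")
    .filter((line) => line.startsWith("+"))
    .map((line) => line.slice(2));
}

for (const branch of ["alamoailabs-1rf-mental-models", "qa/fix-backend-crashes"]) {
  const missing = commitsMissingFromMain(branch);
  console.log(`${branch}: ${missing.length} commits not represented in main`, missing);
}

A branch that reports zero missing commits is a safe deletion candidate; every branch in this phase came back clean.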
Phase 3: Final Seven
The last seven branches claimed to have "unique work": keyboard shortcuts, performance monitoring, security fixes.
A cherry-pick attempt revealed the truth: every commit came up empty. Each feature was already in main through a parallel implementation path.
Result: 7 final deletions.
The Numbers
- Started: 63 branches, 54 unmerged
- Ended: 2 branches (main + beads-sync)
- Branches deleted: 54
- Code lost: 0 lines
- Developer clarity: Immeasurable
Perfecting the Ingestion Pipeline
While cleaning house, we also hardened our content ingestion system—the engine that transforms raw founder content into structured intelligence.
Stage Architecture
Our pipeline now operates in discrete, testable stages:
SOURCE_NORMALIZATION → RAW_OCR → CLEANED_TEXT →
ONTOLOGY_EXTRACT → EMBEDDINGS → PUBLISHED
Each stage:
- Validates input with Zod schemas (sketched after this list)
- Reports progress via real-time monitoring
- Fails gracefully with detailed error context
- Stores artifacts in isolated Supabase buckets
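To make the validation point concrete, here is a minimal Zod sketch of one stage boundary. The schema fields and the CleanedTextArtifact name are hypothetical, not our production schemas:

import { z } from "zod";

// Hypothetical artifact schema for the CLEANED_TEXT → ONTOLOGY_EXTRACT handoff.
const CleanedTextArtifact = z.object({
  sourceId: z.string().uuid(),
  stage: z.literal("CLEANED_TEXT"),
  text: z.string().min(1),
  pageCount: z.number().int().positive(),
});

type CleanedTextArtifact = z.infer<typeof CleanedTextArtifact>;

// Each stage parses its input before doing any work, so a malformed artifact
// fails fast with field-level error context instead of corrupting later stages.
function runOntologyExtract(input: unknown): void {
  const artifact: CleanedTextArtifact = CleanedTextArtifact.parse(input); // throws ZodError on bad input
  console.log(`Extracting ontology for ${artifact.sourceId}`);
}

Because each stage owns its schema, a change to one artifact shape surfaces as a loud validation failure at the boundary rather than a silent corruption three stages later.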
Robust Error Handling
We implemented defense-in-depth:
- Schema validation at API boundaries
- Type safety throughout with TypeScript
- Database constraints preventing invalid states
- Graceful degradation when services are unavailable (example below)
- Comprehensive logging with Sentry integration
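Here is a sketch of how the last two bullets fit together for one stage. The embedding client stub, the stage name, and the exact Sentry fields are assumptions for illustration, not the production code:

import * as Sentry from "@sentry/node";
// Sentry.init(...) is assumed to have run at application startup.

// Stand-in for the real embedding client; here it simulates the service being down.
async function embedChunks(chunks: string[]): Promise<number[][]> {
  throw new Error(`embedding service unavailable (${chunks.length} chunks)`);
}

async function runEmbeddingsStage(sourceId: string, chunks: string[]): Promise<number[][] | null> {
  try {
    return await embedChunks(chunks);
  } catch (err) {
    // Degrade gracefully: capture rich context for Sentry, then return null so the
    // caller can re-queue this stage instead of failing the whole dossier.
    Sentry.captureException(err, {
      tags: { stage: "EMBEDDINGS" },
      extra: { sourceId, chunkCount: chunks.length },
    });
    return null;
  }
}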
Testing Infrastructure
Added stage-specific test suites:
- test_source_normalization.py - PDF/DOCX validation
- test_raw_ocr.py - OCR accuracy verification
- test_cleaned_text.py - Text normalization checks
Result: Ingestion reliability improved from 73% to 96%.
The Playwright Memory Crisis
Mid-cleanup, we hit a production-stopping bug: running e2e tests crashed systems with 120GB+ memory usage.
Root cause: Unlimited Playwright workers × parallel test commands = 3-6 Next.js dev servers @ 40GB each.
The fix:
- Limited Playwright workers to 2 max locally (config sketch below)
- Added a lockfile-based safe test runner (run-tests-safe.sh)
- Updated all test commands to use the safe wrapper
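For reference, the worker cap is a one-line change in playwright.config.ts; treat this as a minimal sketch rather than our full config:

import { defineConfig } from "@playwright/test";

export default defineConfig({
  // Unbounded workers let parallel test commands spawn several Next.js dev
  // servers at once; cap local runs at 2 workers and leave CI at its default.
  workers: process.env.CI ? undefined : 2,
});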
We documented this in our team memory system so the mistake never repeats.
Lessons on Engineering Discipline
1. Branch Hygiene is a Leading Indicator
Stale branches signal process breakdown. If work isn't merging, ask why:
- Are reviews too slow?
- Are features too large?
- Is the main branch stable enough?
2. Automated Cleanup ≠ Manual Review
We tried scripted branch deletion. It failed because context matters. Each branch needed analysis:
- Is the work valuable?
- Was it implemented differently?
- Does it reveal missing features?
The code-reviewer agent provided that context at scale.
3. Memory is Not Infinite
Local development with unlimited parallelization will exhaust resources. Build safeguards:
- Limit concurrent processes
- Use lockfiles for exclusive operations (sketch after this list)
- Monitor resource consumption in dev
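Our actual guard lives in run-tests-safe.sh, but the lockfile idea is simple enough to sketch in a few lines of TypeScript (the lock path and helper name are illustrative):

import { openSync, closeSync, unlinkSync } from "node:fs";

const LOCK_PATH = "/tmp/e2e-run.lock"; // illustrative path

function withExclusiveLock(run: () => void): void {
  let fd: number;
  try {
    fd = openSync(LOCK_PATH, "wx"); // "wx" fails with EEXIST if another run holds the lock
  } catch {
    console.error("Another test run is already in progress; refusing to start.");
    process.exit(1);
  }
  try {
    run();
  } finally {
    closeSync(fd);
    unlinkSync(LOCK_PATH); // release the lock even if the run throws
  }
}

withExclusiveLock(() => {
  console.log("exclusive test run goes here");
});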
4. Test Your Error Handling
We now have stage validation tests that intentionally inject failures. If error handling can't be tested, it doesn't work.
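Our stage suites are Python, but the failure-injection pattern translates directly. A TypeScript sketch (Vitest assumed as the runner; the stage runner and error shape are hypothetical):

import { describe, it, expect, vi } from "vitest";

// Minimal stage runner that turns a thrown error into a structured failure.
async function runOcrStage(deps: { ocr: (pdf: Buffer) => Promise<string> }, pdf: Buffer) {
  try {
    return { ok: true as const, text: await deps.ocr(pdf) };
  } catch (err) {
    return { ok: false as const, stage: "RAW_OCR", error: String(err) };
  }
}

describe("RAW_OCR error handling", () => {
  it("surfaces a structured error when the OCR service fails", async () => {
    const ocr = vi.fn().mockRejectedValue(new Error("OCR service timeout"));
    const result = await runOcrStage({ ocr }, Buffer.from("%PDF-1.7"));
    expect(result.ok).toBe(false);
    expect(result).toMatchObject({ stage: "RAW_OCR" });
  });
});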
What's Next
With a clean repository and hardened pipeline, we're accelerating on:
- Bulk ingestion workflows - Process 50+ founder dossiers in parallel
- Advanced ontology extraction - Move beyond keywords to relationship graphs
- Real-time collaboration - Multi-agent coordination for content creation
- Performance monitoring - 7 analytics dashboards tracking system health
The Bottom Line
Technical debt isn't just messy—it's a velocity tax that compounds daily.
We invested 3 hours in systematic cleanup and emerged with:
- Faster onboarding (2 branches to understand, not 63)
- Clearer production state (main = truth)
- Eliminated merge conflicts (no competing branches)
- Improved team coordination (single source of truth)
The cost of cleanup is real. The cost of not cleaning up is catastrophic.
Want to dive deeper into legendary founders? Explore our Scholia platform for comprehensive founder dossiers.
Tags: #engineering #technical-debt #infrastructure #process