What if Your Codebase Could Tell the Truth?
A wild experiment in teaching an AI to continuously analyze, document, and explain an entire system.
Every developer has a project graveyard, and most of the headstones read the same: "The documentation was a lie." I've spent more hours than I can count playing documentation archaeologist, digging through stale READMEs and deprecated architecture diagrams, trying to understand what a system actually does.
I was tired of inheriting mysteries. I was tired of spending 60% of my time reverse-engineering code before I could even write a single new line.
So, I built a system that forces my codebase to explain itself. In real time. With every single commit.
This post is the story of how I did it. I’ll show you the architecture, the prompts, and the mind-blowing results of orchestrating multiple AI agents to act as a tireless, all-knowing tech lead.
The Universal Lie of Software Development
Let's be honest: documentation is almost always out of date. The moment it's written, it begins to decay. The architecture diagram on the wall doesn't show the hotfix from last Tuesday. The README doesn't mention the new dependency added yesterday.
This isn't a moral failing; it's a systems problem. We're handed the keys to a spaceship with a manual that's mostly coffee stains and wishful thinking.
But what if the manual rewrote itself? What if your codebase could just... tell you things? What if it could:
- Automatically update its own documentation after every commit?
- Detect architectural shifts and flag potential issues before they hit production?
- Build a living knowledge graph of every component, dependency, and data flow?
That's the rabbit hole I went down. The results broke my brain a little.
The Experiment: An AI Swarm for Code Intelligence
I had a crazy idea: what if I orchestrated multiple Claude Code instances to analyze a codebase like a swarm of hyper-efficient, specialized developers?
I used Dagster, an asset-based orchestrator, to model this workflow. Here’s the high-level view:
Each part of this "AI swarm" has a specific, focused job.
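Here's a stripped-down sketch of how those jobs map onto Dagster assets. The helper functions (`ask_claude`, the prompt builders) are hypothetical stand-ins for the real plumbing; the dependency structure is the point:

# A simplified sketch of the asset graph; helper functions are hypothetical stand-ins.
from dagster import asset

@asset
def code_changes() -> list[dict]:
    """Git Detective: one structured analysis per new commit."""
    return [ask_claude(build_detective_prompt(c)) for c in unanalyzed_commits()]

@asset
def impact_assessment(code_changes: list[dict]) -> dict:
    """Tech Lead: zooms out across the whole batch of commit analyses."""
    return ask_claude(build_tech_lead_prompt(code_changes))

@asset
def documentation(impact_assessment: dict) -> None:
    """Librarian: rewrites the files under /docs to match reality."""
    write_doc_updates(ask_claude(build_librarian_prompt(impact_assessment)))

Because each asset names its upstream asset as a function parameter, Dagster wires the graph for me.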
Why Not Just One Big AI?
I chose to use a "swarm" of specialized AI agents instead of one monolithic prompt for a few key reasons:
- Specialization: A prompt designed to analyze git diffs is very different from one that updates architectural diagrams. Specialization yields higher-quality results.
- Efficiency: By breaking the problem down, I can run multiple analyses in parallel, dramatically cutting down the time from commit to insight.
- Modularity: It's easier to debug and improve a small, focused prompt than a giant, all-encompassing one.
Why Dagster?
Dagster was the perfect tool for this because of two features:
- Asset-Based Orchestration: I can model each output, like `code_changes` or `documentation`, as a software-defined asset. Dagster understands the dependency graph and orchestrates the execution automatically.
- Sensors and Schedules: I can use a `commit_sensor` to trigger the entire pipeline on every new git push and a `schedule` to run a deep quality audit every week.
It’s the ideal backbone for an event-driven, analytical system like this.
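In code, those two triggers are only a few lines. This is a rough sketch; the repo path, polling interval, and cron string are placeholders you'd swap for your own:

# A sketch of the commit sensor and weekly audit schedule; the repo path,
# polling interval, and cron string are placeholders.
import subprocess
from dagster import (
    AssetSelection,
    RunRequest,
    ScheduleDefinition,
    define_asset_job,
    sensor,
)

analysis_job = define_asset_job("analysis_job", selection=AssetSelection.all())

@sensor(job=analysis_job, minimum_interval_seconds=30)
def commit_sensor(context):
    # Poll the repo for the current HEAD and only fire when it has moved.
    head = subprocess.check_output(
        ["git", "-C", "/path/to/repo", "rev-parse", "HEAD"], text=True
    ).strip()
    if head and head != context.cursor:
        context.update_cursor(head)
        yield RunRequest(run_key=head)

# The weekly deep quality audit runs the same asset graph on a cron schedule.
weekly_audit = ScheduleDefinition(job=analysis_job, cron_schedule="0 6 * * 1")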
How It Works: The Anatomy of an AI Code Reviewer
1. The Git Detective Is Triggered
Every `git push` triggers a sensor that kicks off the process. The first Claude instance analyzes the diff.
The prompt is crucial. It’s not just "what changed?"—it's "what does this mean?"
# The prompt for the "Git Detective" Claude
prompt = f"""
You are a senior developer performing a code review. Analyze the following commit.
**Commit Message:** {commit.message}
**Files Changed:** {files_changed}
**Code Diff:**
{diff_text}
**Your Analysis (in JSON format):**
- **summary:** A one-sentence summary of the actual change.
- **change_type:** (e.g., 'Refactor', 'Bugfix', 'Feature', 'Documentation'). Do not trust the commit message; base this on the code.
- **potential_impact:** What systems or components could break because of this?
- **red_flags:** Are there any code quality issues, new dependencies, or security concerns?
- **review_comment:** What would you tell the developer in a pull request review?
"""
2. The Tech Lead Assesses the Big Picture
The outputs from individual commit analyses are then fed to a second "Tech Lead" Claude. This one zooms out.
# The prompt for the "Tech Lead" Claude
prompt = f"""
You are the tech lead for this project. Based on the following batch of commit analyses, provide a high-level impact assessment.
**Commit Analyses:**
{all_the_commit_analyses_json}
**Your Report (in JSON format):**
- **architecture_shift:** Did the fundamental architecture change? (e.g., 'Monolith to Microservice', 'New Service Added', 'None').
- **accumulated_risk:** Is technical debt increasing? Is there a new potential point of failure?
- **production_impact:** What is the likelihood this will cause a production issue? ('High', 'Medium', 'Low').
- **worry_level:** Should the team be alerted? ('It's fine', 'Heads up', 'Call a meeting').
"""
This is where the magic happens. It connects dots a human might miss, like how three small, seemingly unrelated changes are actually part of a larger, unannounced refactor.
3. The Librarian Updates the Docs
The impact assessment is then passed to a "Librarian" Claude, whose only job is to update the documentation.
# The prompt for the "Documentation" Claude
prompt = f"""
You are a technical writer. Update the documentation in the `/docs` directory based on this impact assessment.
**Impact Assessment:**
{impact_assessment_json}
**Instructions:**
1. Identify which documentation files need to be updated (e.g., `architecture.md`, `api_guide.md`).
2. Generate the new content for those files.
3. If a file doesn't exist, create it.
4. Focus on clarity, accuracy, and providing code examples where relevant.
"""
Because this runs with every change, the documentation never lies. It is always an accurate reflection of the current state of the code.
The Results Blew My Mind
I pointed this system at my own codebase, and it was both humbling and incredibly validating.
- Architecture Quality Score: 7.5/10
- Modularity: 8/10 ("Components are well-separated.")
- Separation of Concerns: 9/10 ("The architect knows what they're doing.")
- Documentation Coverage: 85% (and climbing automatically)
- Dependency Health: 0 circular dependencies across 43 total dependencies.
- Technical Debt Ratio: 15% (with a prioritized list of what to fix)
It's Fast. Incredibly Fast.
The most shocking part is the speed. What would take a human developer hours of manual work now happens in the time it takes to make a coffee.
| Task | Time | Before vs. After |
| --- | --- | --- |
| Analyzing Diffs | 45s | Before: A developer skims the code. After: A deep, semantic analysis of every line. |
| System Impact | 2m | Before: A best guess during a stand-up. After: A detailed report on architectural impact. |
| Updating Docs | 3m | Before: A ticket for next sprint. After: The docs are rewritten to reflect reality. |
| Total Time | ~7 mins | From commit to complete codebase intelligence. |
This speed doesn't just save time; it changes how I think about development. It transforms development from a series of guesses into a feedback-driven conversation with my own system.
The Day Claude Roasted Me (and Made My Code Better)
The "aha!" moment came when the system analyzed itself and called me out.
🚨 HIGH RISK - "Your external dependency is going to bite you"
- File: `claude_integration.py`
- Problem: "No circuit breaker, no fallback, just pure faith in the API."
- Claude's Suggestion: "Implement circuit breakers and timeout patterns. This will fail in production."
It was right. I was so focused on the "what," I missed the "how."
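The pattern it asked for is a standard one: put a timeout on the request and stop hammering the API once failures pile up. A bare-bones sketch, with illustrative thresholds:

# A minimal circuit breaker around the Claude call; thresholds are illustrative.
import time

class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 60.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        # While "open", refuse calls until the cool-down has elapsed.
        if self.failures >= self.max_failures:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("Circuit open: skipping Claude call")
            self.failures = 0  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
            self.failures = 0
            return result
        except Exception:
            self.failures += 1
            self.opened_at = time.monotonic()
            raise

breaker = CircuitBreaker()
# Usage: breaker.call(ask_claude, prompt), with a request timeout configured
# on the underlying client.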
⚠️ MEDIUM RISK - "This file is doing too much"
- File: `assets.py` (~800 lines of orchestration)
- Problem: "High coupling score (8/10), multiple responsibilities."
- Claude's Suggestion: "Split this into `git_assets.py`, `analysis_assets.py`, and `documentation_assets.py`. Your future self will thank you."
The brutal, specific, and actionable honesty was something I'd never gotten from a traditional linter or static analysis tool.
This Changes Everything
I've been writing code for over a decade, and this is the first time I feel like I'm on the cusp of something fundamentally new.
I stopped playing documentation archaeologist. I stopped guessing about the impact of my changes. I built a system that watches my code evolve and tells me what it means.
The future isn't about replacing developers with AI. It's about giving developers superpowers. It's about building systems that make us smarter, faster, and more confident. Systems that let us focus on creating great software instead of reverse-engineering the software we've already built.
After watching an AI analyze my own codebase better than I could, I'm convinced that future is closer than anyone realizes.
Get Started and Ask Your Codebase a Question
This is a proof of concept, but it's one that works on real codebases today. You can find the complete source code, deployment guides, and prompts in the repository.
Clone the repo and see for yourself:
# 1. Clone the repository
git clone https://github.com/patricktrainer/codebase-intelligence.git
# 2. Go into the directory
cd codebase-intelligence
# 3. Follow the setup instructions in the README.md
I'll leave you with a question:
If your codebase could answer one question for you right now, what would you ask it?
Let me know!
not made by a 🤖