AI development toolkit
For the last few months, I’ve been pouring most of my building time into a skills-based AI software development toolkit that we built at Sancrisoft and use day to day with our clients.
The short version: it gives an AI agent — Claude, primarily, but anything that speaks the open agent-skills spec — a fixed set of slash-command skills and six agent personas, and walks it through the full lifecycle from a fuzzy idea to a tested release. Between phases there are mandatory human-approval checkpoints, because the point isn’t to replace the engineer; it’s to give the engineer leverage they didn’t have before.
Why we built it
We kept hitting the same failure mode. A teammate would hand Claude a vague feature ask, get back a confident-looking implementation in twenty minutes, and discover during review that the agent had quietly skipped half the requirements, invented an API endpoint that didn’t exist, or built something that didn’t compose with the rest of the codebase.
The underlying issue was always the same: too much asked of the agent in a single shot, with no structure for “here is how a feature actually moves through this team.”
The toolkit is our answer. It’s opinionated about the workflow, not about the code.
The three phases
Every feature moves through three phases, each one closing with a human review:
- Spec — Product Manager + Architect + QA collaborate on a real spec, an architecture doc with ADRs, and a risk register. The agents actually talk to each other and resolve their own doubts before coming back to me.
- Build — A Tech Lead plans the work, then Developer subagents implement in parallel where it makes sense. Small features collapse to a single subagent; medium features fan out across dependency layers. TDD red-green-refactor inside each subagent.
- Test — A QA Engineer reads the risk register from the spec phase, categorizes the risks, and I pick what’s worth manually verifying. Findings stay around between runs and sync to GitHub Issues.
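To make the Build-phase fan-out concrete, here is a minimal sketch of grouping tasks into dependency layers, where everything in one layer could go to parallel Developer subagents. It is an illustration only, not the toolkit's actual planner: the `plan_layers` function and the task names are hypothetical, and it assumes the Tech Lead's plan can be expressed as a map from each task to the tasks it depends on.

```python
from graphlib import TopologicalSorter


def plan_layers(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into layers; tasks in the same layer do not depend on each other.

    `dependencies` maps each task to the set of tasks it depends on
    (a hypothetical plan format, assumed for this sketch).
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a cycle

    layers: list[list[str]] = []
    while sorter.is_active():
        ready = sorter.get_ready()    # every task whose dependencies are done
        layers.append(sorted(ready))  # sorted only to make the output stable
        sorter.done(*ready)           # unblock the next layer
    return layers


# Hypothetical medium-sized feature: three layers, one of them parallel.
medium_feature = {
    "schema": set(),
    "api": {"schema"},
    "docs": {"schema"},
    "ui": {"api"},
}
print(plan_layers(medium_feature))  # [['schema'], ['api', 'docs'], ['ui']]

# A small feature collapses to a single layer with a single task.
print(plan_layers({"fix-typo": set()}))  # [['fix-typo']]
```

Each inner list is one layer. A layer only starts once the previous one has finished, which is what keeps parallel subagents from building on code that does not exist yet.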
The risk register is the load-bearing piece. It’s the bridge that makes prioritization possible at the test step, instead of the QA agent re-deriving “what matters” from the code every time.
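As a rough picture of why a persistent register helps, here is a small sketch of a register entry and a categorization step. Everything in it is an assumption made for illustration: the field names, the 1 to 3 scales, the likelihood-times-impact score, and the thresholds are hypothetical, not the toolkit's real schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    risk_id: str
    description: str
    likelihood: int  # hypothetical scale: 1 (unlikely) to 3 (likely)
    impact: int      # hypothetical scale: 1 (minor) to 3 (severe)

    @property
    def score(self) -> int:
        # Simple likelihood x impact product; an assumed scoring rule.
        return self.likelihood * self.impact


def categorize(risks: list[Risk]) -> dict[str, list[Risk]]:
    """Bucket risks by score, highest score first within each bucket."""
    buckets: dict[str, list[Risk]] = {"high": [], "medium": [], "low": []}
    for risk in sorted(risks, key=lambda r: r.score, reverse=True):
        if risk.score >= 6:
            level = "high"
        elif risk.score >= 3:
            level = "medium"
        else:
            level = "low"
        buckets[level].append(risk)
    return buckets


# Hypothetical register written during the Spec phase.
register = [
    Risk("R1", "Webhook retries could double-charge a customer", 2, 3),
    Risk("R2", "Tooltip copy truncated on narrow screens", 1, 1),
    Risk("R3", "Search slows down on large tenants", 3, 1),
]

for level, items in categorize(register).items():
    print(level, [r.risk_id for r in items])
```

The point of the sketch is the direction of data flow: the register is written once during Spec and only read during Test, so the human choosing what to verify manually starts from a ranked list rather than from the diff.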
The six personas
| Persona | Role |
|---|---|
| Product Manager | Defines the feature, owns requirements |
| Architect | Designs the technical approach, writes ADRs |
| Tech Lead | Plans the build, delegates to developers |
| Developer | Implements the code |
| Reviewer | Reviews diffs with specialized lenses (security, performance, accessibility, architecture, data-integrity) |
| QA Engineer | Risk register, test planning, release recommendation |
Each persona has the same “Sancrisoft DNA” baked into its prompt — client orientation, initiative, craftsmanship, teamwork, commitment — so even when the agents disagree, they argue inside the same value system.
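One way to picture the shared value system is as a common preamble prepended to every role-specific prompt. The sketch below assumes exactly that. The roles and values are taken from the table and paragraph above, but the prompt wording and the `build_system_prompt` helper are hypothetical, since the real prompts are not shown here.

```python
# Values listed in the post; how they are worded in the real prompts is not shown.
SHARED_VALUES = (
    "client orientation",
    "initiative",
    "craftsmanship",
    "teamwork",
    "commitment",
)

# Persona -> role, copied from the table above.
PERSONAS = {
    "Product Manager": "Defines the feature, owns requirements",
    "Architect": "Designs the technical approach, writes ADRs",
    "Tech Lead": "Plans the build, delegates to developers",
    "Developer": "Implements the code",
    "Reviewer": "Reviews diffs with specialized lenses",
    "QA Engineer": "Risk register, test planning, release recommendation",
}


def build_system_prompt(persona: str) -> str:
    """Compose a persona prompt from the shared values plus its own role.

    Hypothetical wording; raises KeyError for an unknown persona.
    """
    role = PERSONAS[persona]
    values = ", ".join(SHARED_VALUES)
    return (
        f"You are the {persona}. Your role: {role}.\n"
        f"Values you share with every other persona: {values}."
    )


print(build_system_prompt("Architect"))
```

Keeping the values in one place means that a change to them reaches all six personas at once, instead of drifting apart prompt by prompt.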
What’s working, what isn’t
What’s working:
- Project-scoped install. The team gets the toolkit on clone. Nobody has to remember to set up anything.
- Pre-commit hooks are non-negotiable. Agents generate a lot of code; lint/format gates catch entire categories of issues without per-line review.
- The brief spec, not the full one, gets passed to build subagents. Keeps token usage sane and forces the spec phase to produce a real artifact, not a 4,000-line stream of consciousness.
- Inline learning capture. At the end of every build/review/test, the agent flags what it learned. Periodic `compound` runs synthesize cross-feature patterns.
What’s not (yet):
- External APIs still need verification. The toolkit researches docs, but training data goes stale. We’ve been burned by generated Supabase and AWS integrations more than once.
- Spec quality scales with prompt precision. This is obvious in hindsight, but worth saying — a sloppy ask still gets you a sloppy spec, just a longer one.
- The name. We have a working codename internally, but the consensus is it doesn’t sell. The toolkit keeps its codename behind the curtain; the customer-facing methodology built on top of it is getting a real one.
If you’re working in this space and want to compare notes, send an email to hola at juango.nz.
Written by Juan & Claude.