From Vibe to Verified: Building Production Software with AI


As a seasoned software engineering professional with over two decades of experience, I specialize in leading and architecting complex software projects.

Context

Over three days spanning late October and early November 2025 (October 30 to November 1), I built an intranet application template using FastAPI and React, then reverse-engineered it into 417 BDD scenarios and 278 formal requirements with complete traceability.

This wasn't about seeing how fast AI could generate code. It was about validating a process: Can AI-assisted development produce a well-architected application with the documentation and testing rigor needed for enterprise systems?

The short answer: yes, but only if you approach it as engineering, not magic.

The Experiment Design

I started with a typical enterprise need: an intranet web application with authentication, user management, API access controls, and a sample CRUD entity. Nothing exotic - the kind of system you'd build for internal tools or departmental applications.

The constraint was deliberate: work with AI as a development partner, but maintain engineering discipline. That meant:

  • Start with a detailed plan, not ad-hoc prompting

  • Build iteratively with verification at each step

  • Write tests, not just features

  • Generate documentation as you go, not after

  • Reverse-engineer to BDD and requirements for traceability

The tech stack - FastAPI (Python), React with TypeScript, Docker - wasn't chosen for novelty. These are production-grade tools with mature ecosystems. The goal wasn't to prove AI could write toy demos. It was to see if AI could contribute to the kind of software that survives code review, security audits, and multi-year maintenance cycles.

Day 1: Foundations That Matter

The first move matters. I didn't ask AI to "build me an app." I gave it a project overview and asked it to generate an implementation plan broken into phases with effort estimates, dependencies, and success criteria.

That plan became the contract. It forced clarity on scope, architecture decisions, and what "done" actually meant. Too often, AI development devolves into prompt whack-a-mole - asking for features without understanding how they connect. A plan changes that dynamic.
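To make the idea of a plan with phases, effort estimates, dependencies, and success criteria concrete, here is a minimal sketch of how such a plan could be represented and ordered. The `Phase` fields, the phase names, and the hour figures are all illustrative assumptions, not the plan used in this project.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Phase:
    """One phase of an implementation plan (field names are illustrative)."""
    name: str
    effort_hours: float
    depends_on: list[str] = field(default_factory=list)
    success_criteria: list[str] = field(default_factory=list)


def execution_order(phases: list[Phase]) -> list[str]:
    """Return phase names in an order that respects their dependencies."""
    graph = {p.name: set(p.depends_on) for p in phases}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical plan: names, estimates, and criteria are made up for the example.
plan = [
    Phase("auth", 6, depends_on=["scaffold"],
          success_criteria=["login works end-to-end"]),
    Phase("scaffold", 3,
          success_criteria=["backend and frontend start"]),
    Phase("user-management", 5, depends_on=["auth"],
          success_criteria=["admin can edit users"]),
]

order = execution_order(plan)
total_effort = sum(p.effort_hours for p in plan)
print(order, total_effort)
```

Keeping dependencies explicit means a cycle or a missing prerequisite shows up as an error before any code is generated, which is the kind of clarity the plan is meant to force.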

With the plan in hand, Day 1 became straightforward:

  • Set up Python and Node environments

  • Scaffold backend and frontend structures

  • Implement JWT-based authentication with Argon2 password hashing

  • Build login and registration pages

  • Wire up protected routes
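As a rough illustration of what JWT-based authentication involves, the sketch below signs and verifies an HS256 token using only the Python standard library. It is not the project's implementation: the article does not name a JWT library, the function names are hypothetical, and a real service would normally rely on a maintained JWT package. Argon2 password hashing is not shown because it needs a third-party package.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(subject: str, secret: bytes, ttl_seconds: int = 900,
                now: Optional[float] = None) -> str:
    """Create a signed token carrying sub, iat, and exp claims."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "iat": issued, "exp": issued + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: bytes,
                 now: Optional[float] = None) -> Optional[dict]:
    """Return the payload if the signature is valid and the token is unexpired."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        expected = hmac.new(
            secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
        ).digest()
        # Constant-time comparison avoids leaking how many bytes matched.
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        header = json.loads(_b64url_decode(header_b64))
        payload = json.loads(_b64url_decode(payload_b64))
    except ValueError:  # malformed structure, base64, or JSON
        return None
    if header.get("alg") != "HS256":
        return None
    current = time.time() if now is None else now
    if payload.get("exp", 0) <= current:
        return None
    return payload
```

A protected route would call something like `verify_token` on the bearer token and reject the request when it returns `None`.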

The interesting part wasn't that AI could generate this code. The interesting part was when it hit Pydantic v2 compatibility issues with FastAPI. AI didn't just error out. It researched the migration guide, understood the breaking changes in FieldInfo, and fixed the problem. That's not code generation. That's troubleshooting.

By evening, authentication worked end-to-end. Users could register, log in, and access protected routes. More importantly, I had working tests verifying the behavior.

Day 2: Building on Solid Ground

Day 2 was about expanding features while maintaining quality: user management, personal access tokens, and a sample CRUD entity (items).

But here's where most AI-generated projects fall apart: consistency. It's easy to get five features that look like they were written by five different people with five different UI patterns.

The fix: establish shared components early. A reusable UserModal for create/edit operations. A Layout component with sidebar navigation. A toast notification system. CSS variables for theming. FormField components with built-in validation.

AI excelled at this once the pattern was established. "Create a PAT management page following the same patterns as the user management page" produced consistent, predictable results. The navigation worked the same way. The forms validated the same way. Error handling looked the same.

Personal Access Tokens (PATs) showcased something important: scope-based authorization. Users could create API keys with read, write, or admin permissions. Tokens were hashed before storage. They displayed only once after creation. The UX included security warnings about token handling.

This wasn't just CRUD. It was thinking through the security model and user workflow. AI contributed ideas here—suggesting one-time token display, implementing secure hashing, recommending scope validation middleware.
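The token workflow described above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's code: the `pat_` prefix, the use of SHA-256 for the stored hash, and the idea that scopes are hierarchical (admin implies write, write implies read) are all choices made for the example.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass

# Assumption: each scope implies the ones ranked below it.
SCOPE_RANK = {"read": 1, "write": 2, "admin": 3}


@dataclass
class StoredToken:
    name: str
    scope: str
    token_hash: str  # only the hash is persisted, never the plaintext


def create_token(name: str, scope: str) -> tuple[str, StoredToken]:
    """Return the plaintext (to show the user once) and the record to store."""
    if scope not in SCOPE_RANK:
        raise ValueError(f"unknown scope: {scope}")
    plaintext = "pat_" + secrets.token_urlsafe(32)
    digest = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
    return plaintext, StoredToken(name, scope, digest)


def authorize(presented: str, record: StoredToken, required_scope: str) -> bool:
    """Check the presented token against the stored hash, then check its scope."""
    digest = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    if not hmac.compare_digest(digest, record.token_hash):
        return False
    return SCOPE_RANK[record.scope] >= SCOPE_RANK[required_scope]
```

Because only the hash is stored, the plaintext returned by `create_token` cannot be recovered later, which is why the UI has to display it exactly once.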

By evening, the application had enough features to be useful. But "works on my machine" isn't production-ready.

Day 3 Morning: Quality Is a Feature

Day 3 started with testing infrastructure. Not aspirational testing, not "we'll add tests later"—actual tests running in CI-ready format.

The suite broke down as follows, all passing:

  • Backend: 52 pytest cases covering authentication, user management, CRUD operations, and token lifecycle

  • Frontend: 21 Vitest component tests with mocking and assertions

  • Security: 19 dedicated tests for SQL injection prevention, XSS sanitization, password hashing verification, and access control enforcement
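To give a sense of what a SQL injection prevention test can look like, here is a small pytest-style sketch. It uses the standard library's `sqlite3` module as a stand-in for the project's SQLModel layer, and the `users` table and `find_user` helper are hypothetical.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, username: str):
    """Look up a user with a bound parameter instead of string formatting."""
    # The value is bound as a parameter, so it is never parsed as SQL.
    return conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    ).fetchone()


def test_injection_payload_is_treated_as_data():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.execute("INSERT INTO users (username) VALUES ('alice')")

    # A legitimate lookup still works.
    assert find_user(conn, "alice") == (1, "alice")

    # A classic injection payload matches no row instead of every row.
    assert find_user(conn, "' OR '1'='1") is None

    # The table is unchanged after the attempted injection.
    assert conn.execute("SELECT COUNT(*) FROM users").fetchone() == (1,)
    conn.close()
```

The test asserts on behavior (no rows leak) rather than on how the query is built, so it keeps its value if the data access layer changes.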

Then came UI polish that actually mattered:

  • Modern top navigation bar with active page highlighting

  • Dropdown menus for user actions

  • Consistent styling via CSS variables

  • Accessibility improvements (ARIA labels, keyboard navigation, reduced motion support)

  • Toast notifications that worked cross-browser

The key wasn't that AI could write this code. It was that I could ask AI to verify it worked. "Use Puppeteer to test the login flow and screenshot the result." AI would spin up the app, run the browser automation, capture screenshots, and report findings.

That verification loop is critical. It's the difference between "I think this works" and "I confirmed this works."

Day 3 Midday: Enterprise Authentication

LDAP integration is where enterprise tools live or die. Nobody wants to manage two sets of credentials. Users expect to sign in with their existing Active Directory accounts.

This is complex territory: LDAP connection pooling, group-based role assignment, fallback to local authentication when LDAP is unavailable, secure credential handling, troubleshooting tools for admins.

AI researched LDAP best practices, studied the ldap3 library documentation, and generated a 330-line service with:

  • Dual authentication (LDAP first, local fallback)

  • Group-based admin role assignment (AD groups → app roles)

  • Health monitoring endpoints

  • Comprehensive error handling and logging

  • 22 unit tests

  • A 650-line configuration guide

The documentation is important. LDAP configuration is notoriously finicky. The guide walks through connection strings, search filters, group mappings, troubleshooting steps, and security considerations. This isn't generated fluff - it's operationally useful.
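The dual-authentication flow (LDAP first, local fallback, group-based roles) can be sketched without any LDAP library by injecting the two login functions. Everything named here is hypothetical: `DirectoryUnavailable`, `GROUP_ROLES`, and the group name are illustrative. The sketch also assumes that local fallback happens only when the directory is unreachable, not when LDAP rejects the credentials. The article does not specify that detail.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class DirectoryUnavailable(Exception):
    """Raised by the LDAP callable when the directory cannot be reached."""


@dataclass
class AuthResult:
    username: str
    role: str
    source: str  # "ldap" or "local"


# Hypothetical mapping from directory group names to application roles.
GROUP_ROLES = {"App-Admins": "admin"}


def authenticate(
    username: str,
    password: str,
    ldap_login: Callable[[str, str], Optional[list]],
    local_login: Callable[[str, str], Optional[str]],
) -> Optional[AuthResult]:
    """Try LDAP first; use local accounts only if LDAP is unreachable."""
    try:
        # ldap_login returns the user's group names, or None if rejected.
        groups = ldap_login(username, password)
    except DirectoryUnavailable:
        # local_login returns the user's role, or None if rejected.
        role = local_login(username, password)
        return AuthResult(username, role, "local") if role else None
    if groups is None:  # directory reachable, credentials rejected
        return None
    role = next((GROUP_ROLES[g] for g in groups if g in GROUP_ROLES), "user")
    return AuthResult(username, role, "ldap")
```

Passing the login functions in as parameters keeps the decision logic testable without a directory server. In a real service the LDAP callable would wrap the ldap3 bind and group lookup.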

Day 3 Afternoon: Deployment Reality

Docker isn't optional for modern deployment. The application needed to run the same way in development, testing, and production.

Multi-stage Docker builds reduced image sizes. Non-root users improved container security. Health check endpoints enabled orchestration monitoring. Podman compatibility meant no vendor lock-in.

AI wrote the Dockerfiles, docker-compose configuration, and automated deployment tests. Then it ran them. Spun up containers, verified endpoints, captured logs, validated behavior.
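A deployment smoke test of the kind described here usually starts by waiting for the container's health endpoint to respond. The sketch below shows one way to do that with the standard library. The function names are hypothetical and this is not the project's test code.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as unhealthy.
        return False


def wait_until_healthy(
    probe: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `probe` until it succeeds; return the number of attempts made."""
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        if probe():
            return attempts
        if time.monotonic() >= deadline:
            raise TimeoutError(f"service not healthy after {attempts} attempts")
        sleep(interval)
```

After starting the containers, a test could call `wait_until_healthy(lambda: http_probe("http://localhost:8000/health"))`, where the URL is an assumed example, and only then exercise the real endpoints.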

This is where "AI can code" becomes "AI can engineer." It's not just writing syntax. It's understanding deployment concerns, testing assumptions, and documenting operational procedures.

The Reverse Engineering Phase

By late Day 3, I had a working application. Tested, documented, deployable. But enterprise systems need more than code - they need traceability.

Could we reverse-engineer this codebase into BDD features and formal requirements? Not just describe what it does, but document it in a form that satisfies compliance, supports long-term maintenance, and serves as living documentation?

Phase 9: BDD Feature Documentation

AI analyzed the implemented features and generated 23 Gherkin feature files organized into 7 categories:

  • Authentication & Authorization (5 files, 64 scenarios)

  • User Management (3 files, 49 scenarios)

  • Personal Access Tokens (3 files, 60 scenarios)

  • Items Management (2 files, 39 scenarios)

  • Security (4 files, 84 scenarios)

  • UI/UX (4 files, 79 scenarios)

  • System Administration (2 files, 42 scenarios)

Total: 417 scenarios written in declarative Gherkin format following industry best practices.

Each scenario describes behavior from a user perspective:

Scenario: User creates personal access token with read scope
  Given the user is authenticated
  When the user creates a token named "API Reader" with read scope
  Then the token is generated successfully
  And the token displays only once
  And the user receives a security warning about token handling

This isn't code documentation. This is behavior specification that non-technical stakeholders can read and verify.
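Because Gherkin is structured text, scenarios like the one above are easy to process mechanically, for example to count steps or feed a traceability report. The following is a minimal sketch of a step parser, not a full Gherkin implementation. It handles only `Scenario:` lines and Given/When/Then/And/But steps.

```python
import re

STEP = re.compile(r"^\s*(Given|When|Then|And|But)\s+(.*\S)\s*$")


def parse_scenario(text: str) -> dict:
    """Group steps under Given/When/Then, folding And/But into the previous keyword."""
    parsed = {"name": "", "Given": [], "When": [], "Then": []}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Scenario:"):
            parsed["name"] = stripped[len("Scenario:"):].strip()
            continue
        match = STEP.match(line)
        if not match:
            continue  # blank lines, comments, tags
        keyword, step = match.groups()
        if keyword in ("And", "But"):
            if current is None:
                raise ValueError(f"'{keyword}' step has no preceding Given/When/Then")
            keyword = current
        current = keyword
        parsed[keyword].append(step)
    return parsed


scenario = """\
Scenario: User creates personal access token with read scope
  Given the user is authenticated
  When the user creates a token named "API Reader" with read scope
  Then the token is generated successfully
  And the token displays only once
  And the user receives a security warning about token handling
"""

steps = parse_scenario(scenario)
print(steps["name"], len(steps["Then"]))
```

Here the two `And` lines are counted as `Then` steps, which matches how Gherkin readers interpret them.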

Phase 10: Requirements Generation

From those BDD features, AI extracted formal requirements written in EARS (Easy Approach to Requirements Syntax) and aligned with the ISO/IEC/IEEE 29148:2018 requirements engineering standard.

Eight requirements documents with 278 total requirements:

  • REQ-001: Authentication (50 requirements)

  • REQ-002: User Management (32 requirements)

  • REQ-003: Items Management (38 requirements)

  • REQ-004: Personal Access Tokens (54 requirements)

  • REQ-005: Security (28 requirements)

  • REQ-006: UI/UX (26 requirements)

  • REQ-007: System Administration (24 requirements)

  • REQ-008: Quality Attributes (26 requirements)

Each requirement follows EARS patterns:

REQ-AUTH-FUNC-002: WHEN a user submits the registration form 
THEN the system shall validate that the password meets complexity 
requirements (minimum 8 characters, uppercase, lowercase, number, 
special character).

Priority: CRITICAL
Category: Functional
Source: features/authentication/01-user-registration.feature
Test Cases: TC-AUTH-002, TC-AUTH-003
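The event-driven EARS template is usually written "When <trigger>, the <system> shall <response>". The sample above uses a WHEN ... THEN variant of it. A requirements set this large benefits from a mechanical check that each statement fits the pattern. The sketch below is one possible check that accepts both forms. The regular expression and function name are my own and cover only the event-driven pattern.

```python
import re
from typing import Optional

# "When <trigger>, the <system> shall <response>", also accepting
# "WHEN <trigger> THEN the <system> shall <response>".
EVENT_DRIVEN = re.compile(
    r"^when (?P<trigger>.+?)(?:,(?: then)?| then) the (?P<system>[\w ]+?) "
    r"shall (?P<response>.+)$",
    re.IGNORECASE,
)


def parse_event_driven(statement: str) -> Optional[dict]:
    """Return trigger/system/response if the statement fits the pattern, else None."""
    normalized = " ".join(statement.split())  # collapse line breaks and extra spaces
    match = EVENT_DRIVEN.match(normalized)
    return match.groupdict() if match else None


requirement = (
    "WHEN a user submits the registration form \n"
    "THEN the system shall validate that the password meets complexity \n"
    "requirements (minimum 8 characters, uppercase, lowercase, number, \n"
    "special character)."
)

parsed = parse_event_driven(requirement)
print(parsed["trigger"])
```

Statements that return `None` are candidates for manual review. They may follow a different EARS pattern (ubiquitous, state-driven, and so on) or may simply be malformed.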

Finally, AI generated a complete traceability matrix mapping:

  • Feature files → Scenarios → Requirements → Test cases → Implementation

Forward traceability: "Which code implements this requirement?"
Backward traceability: "Which requirements does this code satisfy?"
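The two traceability questions above are lookups in opposite directions over the same set of links. As a minimal sketch, assuming each link is a (requirement, test case, implementation file) triple: the file paths and the `REQ-AUTH-FUNC-001`, `REQ-AUTH-FUNC-003`, and `TC-AUTH-001` identifiers below are hypothetical. Only `REQ-AUTH-FUNC-002` and its two test cases come from the example earlier in this section.

```python
from collections import defaultdict

# (requirement, test case, implementation file); paths are illustrative.
LINKS = [
    ("REQ-AUTH-FUNC-002", "TC-AUTH-002", "app/auth/passwords.py"),
    ("REQ-AUTH-FUNC-002", "TC-AUTH-003", "app/auth/passwords.py"),
    ("REQ-AUTH-FUNC-002", "TC-AUTH-002", "app/auth/routes.py"),
    ("REQ-AUTH-FUNC-001", "TC-AUTH-001", "app/auth/routes.py"),
]


def build_indexes(links):
    """Build forward (requirement -> code) and backward (code -> requirements) maps."""
    forward = defaultdict(set)
    backward = defaultdict(set)
    for requirement, _test_case, implementation in links:
        forward[requirement].add(implementation)
        backward[implementation].add(requirement)
    return forward, backward


def untraced(requirements, forward):
    """List requirements that have no implementation linked to them."""
    return sorted(r for r in requirements if not forward.get(r))


forward, backward = build_indexes(LINKS)

# Forward: which code implements this requirement?
print(sorted(forward["REQ-AUTH-FUNC-002"]))
# Backward: which requirements does this code satisfy?
print(sorted(backward["app/auth/routes.py"]))
```

The `untraced` helper shows why the matrix is worth keeping machine-readable: gaps in coverage become a query rather than a manual audit.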

Output Screenshots

[Screenshot: Login and Main Dashboard View]

[Screenshot: Item View and Management]

[Screenshot: API Token Management]

[Screenshot: User Management]

Take-aways

Core Argument

The process demonstrated here isn't about AI replacing developers. It's about AI enabling a development workflow that produces better artifacts.

Consider what this three-day sprint created:

  • 32,000+ lines of code and documentation (backend, frontend, tests, docs)

  • 92 automated tests (100% passing)

  • 417 BDD scenarios documenting every behavior

  • 278 formal requirements with full traceability

  • Deployment-ready Docker configuration

  • Comprehensive user and admin documentation

A traditional development team would likely need weeks, maybe months, to produce this scope with equivalent documentation quality. Not because developers are slow, but because documentation usually happens last, if at all.

The AI-assisted approach inverted that. Documentation wasn't an afterthought - it was continuous. The BDD features and requirements weren't retrofitted to match code - they were extracted from working implementation with verification.

Technical and Practical Insights

What works:

  1. Plan-first development: Generate detailed implementation plans before coding. Plans provide context that improves every subsequent AI interaction.

  2. Iterative verification: Don't assume AI output is correct. Test it. AI can write the tests, run them, interpret failures, and fix issues. Use that capability.

  3. Pattern establishment: Define architectural patterns early (component structure, API conventions, error handling). AI excels at applying consistent patterns.

  4. Automated validation: AI can run linters, security scans, browser automation, and deployment tests. Leverage this for rapid feedback loops.

  5. Documentation as process: Generate docs alongside code, not after. BDD features, API guides, and configuration documentation should evolve with implementation.

  6. Reverse engineering works: You can build a "vibe-based" prototype, then formalize it into rigorous specifications. The traceability isn't fake—it accurately reflects what exists.

What doesn't work:

  1. Ad-hoc prompting: "Build me X" without structure produces inconsistent results. Context matters.

  2. Blind trust: AI makes mistakes. It hallucinates APIs that don't exist. It misunderstands requirements. Verification is mandatory.

  3. Ignoring fundamentals: AI won't save a bad architecture. If you don't understand authentication security, LDAP integration, or database transactions, AI will happily generate plausible-looking broken code.

  4. Documentation substitution: Living documentation is valuable, but it doesn't replace understanding. You still need engineers who can debug production incidents.

Key heuristics:

  • Ask AI to create verification tools (tests, scripts, automation) not just features

  • Provide reference documents for specialized domains (BDD best practices, EARS patterns, deployment guides)

  • Request detailed plans with dependencies and success criteria before implementation

  • Use Markdown for planning and tracking - AI reads it effectively

  • Break large tasks into phases; review output before proceeding

Broader Implications

This experiment suggests several shifts in how we should think about software development with AI assistance:

From Code Generation to Artifact Generation

AI's value isn't writing code - it's producing the full set of artifacts professional software requires. Code, tests, documentation, deployment configuration, requirements specifications. The velocity gain comes from generating everything in parallel, not just the code.

From Documentation Debt to Living Specifications

Documentation typically lags implementation by weeks or months. With AI assistance, documentation generation keeps pace with coding. Better: you can reverse-engineer implementations into formal specifications that maintain traceability to source.

From "Move Fast and Break Things" to "Move Fast and Document Things"

Startup culture glorified shipping features over writing documentation. AI changes that tradeoff. You can move fast and maintain rigorous documentation. The constraint isn't time—it's discipline.

From Individual Coding to System Engineering

AI shifts the developer role from "writing functions" toward "architecting systems." You spend more time on:

  • Defining architecture and patterns

  • Reviewing AI-generated code for correctness

  • Writing verification criteria

  • Ensuring consistency across artifacts

  • Making tradeoff decisions

Less time on:

  • Boilerplate implementation

  • Routine refactoring

  • Manual test writing

  • Documentation formatting

  • Configuration file generation

This is a good shift. It moves human effort toward higher-leverage activities.

Closing Thought

This project wasn't about proving AI can code. That's solved. It was about proving AI can contribute to engineering - the discipline of building systems that work reliably, are documented clearly, and evolve maintainably over years.

The answer is yes, but only if you treat AI as a capable junior engineer who needs clear direction, thorough review, and systematic verification. Give it a detailed plan. Establish architectural patterns. Verify everything. Generate documentation continuously. Reverse-engineer to formal specifications when needed.

What matters most isn't the AI's capability - it's your engineering process. AI amplifies whatever process you give it. A sloppy process produces sloppy results faster. A rigorous process produces rigorous results faster.

Three days, 32,000+ lines of code and documentation, 417 BDD scenarios, 278 requirements, full traceability. Not because AI is magic, but because structured engineering with AI assistance is efficient.

The template is proven. The workflow works. The artifacts meet enterprise standards.

Now the real work begins: adapting this process to production systems, compliance frameworks, and multi-year maintenance cycles. But that's a solvable problem—because we have a foundation that's both fast and rigorous.

That's the mindset we need: not "how fast can AI write code" but "how can AI help us build better systems."

Appendix: Metrics Summary

Development Time: 3 days (Oct 30 - Nov 1, 2025)

Code Generated:

  • Backend: ~8,500 lines (Python)

  • Frontend: ~7,200 lines (TypeScript/React)

  • Tests: ~2,200 lines

  • Total Code: ~17,900 lines

Documentation Generated:

  • BDD Features: ~6,000 lines (Gherkin)

  • Requirements: ~8,500 lines (EARS format)

  • User/Admin/API Guides: ~1,850 lines

  • Implementation Plans: ~2,400 lines

  • Total Documentation: ~18,750 lines

Testing Coverage:

  • Backend Tests: 52 (pytest)

  • Frontend Tests: 21 (Vitest)

  • Security Tests: 19

  • BDD Scenarios: 417

  • Total Automated Tests: 92 (100% passing)
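The totals reported in this article can be re-derived from their parts. The short script below does that arithmetic using only figures stated in the article. The variable names are mine.

```python
# Line counts from the "Code Generated" and "Documentation Generated" lists.
code_lines = {"backend": 8_500, "frontend": 7_200, "tests": 2_200}
doc_lines = {"bdd_features": 6_000, "requirements": 8_500,
             "guides": 1_850, "plans": 2_400}

# Automated test counts by suite.
automated_tests = {"backend": 52, "frontend": 21, "security": 19}

# Scenarios per BDD category and requirements per document (REQ-001 to REQ-008).
scenarios = [64, 49, 60, 39, 84, 79, 42]
requirements = [50, 32, 38, 54, 28, 26, 24, 26]

total_code = sum(code_lines.values())         # 17,900
total_docs = sum(doc_lines.values())          # 18,750
total_lines = total_code + total_docs         # 36,650
total_tests = sum(automated_tests.values())   # 92
total_scenarios = sum(scenarios)              # 417
total_requirements = sum(requirements)        # 278

print(total_code, total_docs, total_lines)
print(total_tests, total_scenarios, total_requirements)
```

The combined figure of roughly 36,650 lines is consistent with the "32,000+" headline number, which counts code and documentation together.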

Traceability:

  • Feature Files: 23

  • Requirements: 278

  • Test Cases: 87 unique mappings

  • Forward and backward traceability: Complete

Technologies:

  • Backend: FastAPI, Python 3.13, SQLModel, Argon2, ldap3

  • Frontend: React 18, TypeScript, Vite, React Router

  • Testing: pytest, Vitest, Puppeteer

  • Deployment: Docker, Podman, multi-stage builds

  • Documentation: Gherkin BDD, EARS requirements, Markdown

Process Innovation:

  • Forward development: Requirements → Implementation

  • Reverse engineering: Implementation → BDD → Requirements

  • Continuous verification: AI-driven testing throughout

  • Living documentation: Maintained alongside code