What Is Vibe Testing? How to Test Code You Didn't Write
Vibe Testing, Defined
Vibe testing is the practice of systematically reviewing and verifying AI-generated code for the specific categories of bugs, security vulnerabilities, and production risks that AI coding tools consistently introduce. It's not a replacement for traditional testing — it's an additional layer designed for the reality that most of your codebase was written by something that optimizes for “it compiles and runs” rather than “it's correct, secure, and production-ready.”
The term emerged alongside vibe coding — the practice of building software by describing what you want to AI tools like Cursor, Claude Code, Lovable, and Bolt, then shipping whatever they generate. Vibe coding made building fast. Vibe testing is what keeps that speed from becoming a liability.
If you've inherited a codebase that was mostly AI-generated, or you've been vibe coding and are now preparing to put real users on it, this guide covers what to check, why traditional testing misses it, and how to build vibe testing into your workflow.
Why Traditional Testing Falls Short for AI-Generated Code
Traditional testing assumes a human wrote the code. That assumption shapes everything about how we test: unit tests verify the logic the developer intended, integration tests confirm components work together as designed, and code review catches mistakes the author made. The entire framework is built around the idea that a person understood what they were building and might have made specific, human-shaped errors.
AI-generated code breaks this model. The failure modes are fundamentally different. A human developer who builds a login system probably understands authentication and might forget edge cases. An AI that generates a login system produces plausible-looking code without any real understanding of the security model underneath. It might generate a JWT implementation that never verifies signatures, or an OAuth flow that stores tokens in localStorage where any XSS payload can steal them. The code looks right. It even works in the happy path. But the failure isn't a missed edge case — it's a fundamental misunderstanding of the domain that happens to produce working code.
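To make the JWT failure concrete, here is a minimal sketch of the antipattern: decoding the payload without ever checking the signature. The `naiveDecode` function and the forged token are illustrative, not taken from any specific tool's output.

```typescript
// ANTIPATTERN (sketch): decoding a JWT payload without verifying its signature.
// Anyone can forge a token like this, because the signature is never checked.
function naiveDecode(token: string): Record<string, unknown> {
  const payload = token.split(".")[1];
  return JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
}

// A token with an empty signature still "authenticates":
const forged =
  Buffer.from(JSON.stringify({ alg: "none" })).toString("base64url") +
  "." +
  Buffer.from(JSON.stringify({ sub: "attacker", role: "admin" })).toString("base64url") +
  ".";

const claims = naiveDecode(forged);
// claims.role is "admin", accepted without any cryptographic check.
// The fix: verify with a real library call (e.g. jwt.verify(token, secret)
// from jsonwebtoken) and reject tokens that claim alg: "none".
```

The tell in a code review is any `atob`, `Buffer.from(..., "base64")`, or `jwt.decode` call on a token where you expected `jwt.verify`.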
There's a second problem: AI tools generate their own tests. When you ask Cursor to build a feature and write tests for it, you get tests that validate the AI's implementation of the feature — not the feature's actual requirements. If the AI implemented the feature wrong in a subtle way, the tests will confirm the wrong behavior passes. You end up with 100% test coverage and zero confidence, because the same blind spots that produced the code also produced the tests.
Vibe testing addresses this gap. Instead of testing whether the code does what the AI intended, it tests whether the code does what production actually requires: secure authentication, safe data handling, proper error boundaries, and all the non-functional requirements that AI tools consistently skip.
The Vibe Testing Checklist
This is not an exhaustive list. It's a prioritized set of the issues that appear most frequently in AI-generated codebases, organized by the damage they cause when missed.
Security Patterns
Security is where AI-generated code fails most dangerously, because the failures are invisible during development. The app works perfectly — it just also works perfectly for attackers.
Start by grepping your codebase for hardcoded secrets. AI tools embed API keys, database URLs, and tokens directly in source files with alarming frequency. Check client-side bundles especially — a Stripe secret key in a React component is a live credit card processing credential visible to anyone who opens DevTools.
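A scan for the most recognizable key formats can be a few lines of code. The patterns below are a sketch, not an exhaustive set; dedicated scanners like gitleaks or trufflehog cover far more formats and entropy-based detection.

```typescript
// Minimal hardcoded-secret scan. Patterns are illustrative, not exhaustive.
const SECRET_PATTERNS: [string, RegExp][] = [
  ["Stripe secret key", /sk_live_[0-9a-zA-Z]{24,}/],
  ["AWS access key ID", /AKIA[0-9A-Z]{16}/],
  // Generic "secret-looking assignment" catch-all; prone to false positives.
  ["Generic secret assignment", /(api[_-]?key|secret|token)\s*[:=]\s*["'][^"']{16,}["']/i],
];

function findSecrets(source: string): string[] {
  const hits: string[] = [];
  for (const [name, pattern] of SECRET_PATTERNS) {
    if (pattern.test(source)) hits.push(name);
  }
  return hits;
}
```

Run something like this over your built client bundle as well as your source tree; a secret that survives into the bundle is already public.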
Next, trace every authenticated route. AI tools often add auth checks to the UI layer (hiding a button, redirecting in a useEffect) but skip server-side verification entirely. A direct API call bypasses the UI, and if the server doesn't check the session, any unauthenticated request succeeds. This is the single most common security gap in vibe-coded apps.
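What the server-side check should look like, sketched framework-agnostically (the `Session` shape and the in-memory token store are assumptions; in a real app the lookup hits your session store or verifies a signed token):

```typescript
interface Session { userId: string; role: string }

// Assumed lookup: in a real app this reads a session store or verifies a JWT.
function getSession(authHeader: string | undefined, store: Map<string, Session>): Session | null {
  if (!authHeader?.startsWith("Bearer ")) return null;
  return store.get(authHeader.slice(7)) ?? null;
}

// The route handler checks BOTH authentication and authorization on the server.
// Hiding a button in the UI is not an auth check.
function handleAdminRoute(authHeader: string | undefined, store: Map<string, Session>): number {
  const session = getSession(authHeader, store);
  if (!session) return 401;                 // not logged in
  if (session.role !== "admin") return 403; // logged in, but not authorized
  return 200;                               // checks passed server-side
}
```

The audit question for every route: if you call it with curl and no cookie, does it still return data?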
Finally, test input handling. AI-generated code frequently builds SQL queries with string interpolation, renders user input as raw HTML, and passes unsanitized data to shell commands. These aren't edge cases — they're the default patterns that AI tools reach for unless specifically instructed otherwise. For a deeper dive, see the vibe coding security guide.
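The SQL case in miniature, sketched without a database driver (the `$1` placeholder syntax is Postgres-style; MySQL and SQLite use `?`):

```typescript
// UNSAFE (the default AI pattern): string interpolation puts user input
// into the query text itself, so ' OR '1'='1 rewrites the query.
function unsafeQuery(email: string): string {
  return `SELECT * FROM users WHERE email = '${email}'`;
}

// SAFE: the query text is fixed; values travel separately as parameters
// and the driver treats them as literal data, never as SQL.
function safeQuery(email: string): { text: string; values: string[] } {
  return { text: "SELECT * FROM users WHERE email = $1", values: [email] };
}

const attack = "x' OR '1'='1";
// unsafeQuery(attack) now matches every row in the table;
// safeQuery(attack) searches for the literal string and matches nothing.
```

The same separation applies to the other sinks: use a templating layer that escapes HTML by default, and pass argument arrays to `execFile` rather than interpolating into a shell string.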
Production Readiness
AI-generated code is optimized for the demo. It works when one user hits the happy path. Production is not the demo. Production is a thousand concurrent users hitting edge cases your AI never considered while your third-party payment provider is having an outage.
Check error handling first. AI tools wrap code in try/catch blocks that either swallow errors silently or return the full stack trace to the client. Neither is acceptable. Every error boundary should log the error with enough context to debug it, return a safe message to the user, and surface in your monitoring system. If your app doesn't have a monitoring system, that is the first thing to fix.
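A boundary that does all three jobs might look like the sketch below. `notifyMonitoring` is a stand-in for your real alerting SDK (Sentry, Datadog, etc.), not a real API:

```typescript
// Placeholder: forward to your monitoring SDK of choice.
function notifyMonitoring(err: Error, context: Record<string, unknown>): void {
  // e.g. Sentry.captureException(err, { extra: context })
}

function handleRequestError(
  err: Error,
  requestId: string
): { status: number; body: { error: string } } {
  console.error(`[${requestId}]`, err.stack);  // full detail stays server-side
  notifyMonitoring(err, { requestId });        // surfaces in monitoring
  // The client gets a safe, generic message: no stack trace, no internals.
  return { status: 500, body: { error: "Something went wrong. Please try again." } };
}
```

Compare that against the two default patterns: `catch (e) {}` fails the logging job, and `res.status(500).send(e.stack)` fails the safe-message job.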
Check rate limiting. AI tools almost never add it. Without rate limiting, a single script can exhaust your API quotas, run up your cloud bill, or brute-force every account on your platform. Authentication endpoints, form submissions, and any route that triggers external API calls need rate limits.
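The core of a rate limiter is small. This fixed-window, in-memory sketch is fine for a single process; production setups usually back the counters with Redis so every instance shares them, and the limits (5 per minute here) are assumptions to tune per endpoint:

```typescript
class RateLimiter {
  private hits = new Map<string, { count: number; windowStart: number }>();
  constructor(private limit: number, private windowMs: number) {}

  // Returns true if the request is allowed, false if the key is over its limit.
  allow(key: string, now = Date.now()): boolean {
    const entry = this.hits.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(key, { count: 1, windowStart: now }); // new window
      return true;
    }
    entry.count++;
    return entry.count <= this.limit;
  }
}

// e.g. 5 login attempts per minute, keyed by client IP
const loginLimiter = new RateLimiter(5, 60_000);
```

In a route handler: if `loginLimiter.allow(clientIp)` returns false, respond with 429 before any work happens.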
Check data validation. AI tools validate on the client and trust on the server. Every API route that accepts user input needs server-side validation with a schema library like Zod — not just type checking, but business logic validation. Is that quantity field actually a positive integer? Is that email address in a valid format? Is that date in the future when it should be?
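To show what "business logic validation" means beyond type checking, here is the same set of rules hand-rolled; a schema library like Zod expresses these declaratively, but the checks are the point. The `OrderInput` shape is a hypothetical example:

```typescript
interface OrderInput { quantity: unknown; email: unknown; deliveryDate: unknown }

function validateOrder(input: OrderInput): string[] {
  const errors: string[] = [];
  // Not just "is it a number": is it a positive integer?
  if (typeof input.quantity !== "number" || !Number.isInteger(input.quantity) || input.quantity <= 0) {
    errors.push("quantity must be a positive integer");
  }
  // A loose shape check; real validators use a stricter email grammar.
  if (typeof input.email !== "string" || !/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(input.email)) {
    errors.push("email is invalid");
  }
  // Business rule: the delivery date must be in the future.
  if (typeof input.deliveryDate !== "string" || new Date(input.deliveryDate).getTime() <= Date.now()) {
    errors.push("deliveryDate must be a future date");
  }
  return errors;
}
```

Whatever the route's client-side form enforces, this runs on the server anyway, because the attacker's request never went through your form.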
Architecture Smells
AI tools have a distinctive architectural fingerprint. They over-abstract early, create unnecessary wrapper classes, and pull in dependencies for problems that don't exist yet. A three-page marketing site doesn't need a state management library. A CRUD app doesn't need an event sourcing framework.
Audit your dependency tree. AI tools add packages without evaluating maintenance status, bundle size, or security posture. You might find packages with known vulnerabilities, packages that haven't been updated in years, or five different libraries that all do the same thing because the AI used a different one each time it generated a feature.
Look for dead code. AI tools generate utility functions, helper classes, and abstraction layers that nothing uses — artifacts of earlier prompts that were superseded but never cleaned up. Dead code isn't just clutter; it's surface area for bugs and a tax on every developer who has to understand the codebase.
Performance
AI-generated code works. It does not necessarily work efficiently. The performance patterns AI tools produce are the ones most commonly found in tutorials and Stack Overflow answers — which are optimized for clarity, not for scale.
The most common issue is N+1 queries. AI tools fetch a list of items, then loop through and fetch related data for each item individually. With 10 items, this is 11 database queries instead of 2. With 1,000 items, it's 1,001 queries, and your database is the bottleneck. Check every loop that contains a database call or API request.
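The pattern and its fix, sketched with an in-memory "database" that counts round-trips (the table contents are made up; with a real driver the batched version is a single `WHERE id IN (...)` query):

```typescript
const authorsTable = new Map([[1, "Ada"], [2, "Grace"]]);
let queryCount = 0;

// N+1 shape: one round-trip per call.
function fetchAuthor(id: number): string | undefined {
  queryCount++;
  return authorsTable.get(id);
}

// Batched shape: one round-trip total, e.g. SELECT ... WHERE id IN (...).
function fetchAuthorsBatch(ids: number[]): Map<number, string> {
  queryCount++;
  return new Map(ids.map((id) => [id, authorsTable.get(id)!]));
}

const postAuthorIds = [1, 2, 1, 2];

// N+1: one query per post (4 here, 1,000 for 1,000 posts).
postAuthorIds.forEach((id) => fetchAuthor(id));

// Batched: dedupe the ids, fetch once, join in memory.
queryCount = 0;
const authors = fetchAuthorsBatch([...new Set(postAuthorIds)]);
```

Most ORMs have a first-class fix (eager loading, `include`, dataloaders); the review task is spotting the loop that needs it.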
Check for unbounded data fetching. AI tools write queries that fetch all records without pagination or limits. This works in development with 50 rows in the database. It crashes in production when the table has 500,000 rows and your server runs out of memory trying to serialize the response.
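A defensive pagination helper makes "fetch everything" impossible by construction. The hard cap of 100 rows below is an assumption to tune for your payload sizes:

```typescript
// Clamp client-supplied paging params before they reach the query.
function paginate(page = 1, pageSize = 20): { limit: number; offset: number } {
  const limit = Math.min(Math.max(1, Math.floor(pageSize)), 100); // hard cap
  const offset = (Math.max(1, Math.floor(page)) - 1) * limit;
  return { limit, offset };
}

// Even a hostile ?pageSize=500000 request fetches at most 100 rows:
// SELECT * FROM orders ORDER BY created_at DESC LIMIT 100 OFFSET 0
```

Grep for queries with no `LIMIT` (or ORM calls like `findAll` with no `take`/`limit` option) and route them through something like this.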
Look at database indexes. AI tools create tables and write queries but rarely add indexes for the columns being filtered and sorted. Your queries work fine until the table grows past a few thousand rows, then response times climb from milliseconds to seconds. Check every WHERE clause and ORDER BY against your index definitions.
Vibe Testing Tools
No single tool covers everything. A practical vibe testing setup combines several approaches, each targeting a different failure mode.
Static Analysis
ESLint with security-focused plugins (eslint-plugin-security, eslint-plugin-no-secrets) catches hardcoded credentials and common injection patterns. Semgrep provides deeper taint analysis — tracking data flow from user input to dangerous sinks like SQL queries and HTML rendering. These tools run fast, catch the mechanical issues, and integrate into any CI pipeline.
Dependency Auditing
npm audit, Snyk, and Socket.dev scan your dependency tree for known vulnerabilities and supply chain risks. This matters more for AI-generated code because AI tools add dependencies without evaluating them. Run these on every PR, not just periodically.
AI-Aware Code Review
Tools like Vibe Check are designed specifically for AI-generated codebases. Rather than looking for generic code quality issues, they scan for the specific patterns AI tools produce: missing server-side auth checks, client-side-only validation, hardcoded development configs, and the other failure modes covered in this guide. Vibe Check runs as a plugin inside your AI coding tool and scans across 12 production domains, generating fix prompts you can apply directly.
Manual Review
Automated tools catch patterns. They don't catch logic errors, business rule violations, or architectural decisions that are technically valid but wrong for your specific context. Every AI-generated feature needs at least one human reading the code with the question: “does this actually do what we need, or does it do what the AI assumed we need?” This is especially important for payment flows, permission systems, and anything involving user data.
Vibe Testing in CI/CD
Vibe testing belongs in your CI pipeline, not in a quarterly audit. The whole point is catching issues before they reach production, and in a vibe coding workflow where features ship in hours, “we'll review it later” means “it's already in production with real user data.”
A minimal CI vibe testing pipeline has three stages. First, static analysis runs on every commit — ESLint security rules, secret detection, and dependency auditing. These are fast (under 30 seconds) and catch the highest-severity issues. Second, an AI-aware scan runs on every pull request — tools like Vibe Check or Semgrep with custom rules for AI-generated patterns. This is slower (1-3 minutes) but catches architectural and logic issues that static analysis misses. Third, a human reviews every PR that touches authentication, payments, data access, or user permissions. No exceptions.
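The first two stages can be wired up in a few lines of CI config. This GitHub Actions sketch assumes the tools named above are already configured in the repo; swap in your own stack:

```yaml
# Sketch of stages 1 and 2; action versions and tool choices are assumptions.
name: vibe-test
on: [pull_request]
jobs:
  static-analysis:        # stage 1: fast, highest-severity checks
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npx eslint .                  # incl. security plugins
      - run: npm audit --audit-level=high  # dependency vulnerabilities
      - uses: gitleaks/gitleaks-action@v2  # secret detection
  ai-aware-scan:          # stage 2: deeper pattern scan on every PR
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: semgrep scan --config auto
# Stage 3 is enforced with branch protection: require human review on
# auth, payment, and data-access paths (e.g. via a CODEOWNERS file).
```

The point of the split is speed: stage 1 fails within seconds of a push, while the slower scan gates the merge.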
The goal is not to slow down the vibe coding workflow. It's to make the feedback loop tight enough that issues get caught in the PR, not in production. A developer who sees “missing server-side auth check on /api/admin” in their PR within 2 minutes will fix it immediately. The same developer discovering it in a security audit three months later has to context-switch back into code they've forgotten, while the vulnerability has been exploitable the entire time.
Vibe Test Your Codebase Now
Find out what your AI coding tool missed — across security, performance, and 10 other production domains.