Feature Flags Checklist for AI-Built Apps

Control rollouts and run experiments

When you vibe code feature flags with tools like Cursor, Lovable, Bolt, v0, or Claude Code, the generated code often works in development but misses critical production requirements. This checklist helps you catch what AI missed before you ship.

Danger Zone

moderate risk

Feature flags are supposed to reduce risk, until they become the risk

A feature flag starts simple: an if-statement that checks whether something is on or off. But then you want to show the new checkout to just 5% of users. Then you want to exclude enterprise customers from that test. Then you want different behavior in different regions. Soon you have dozens of flags checking dozens of conditions on every page load, and nobody remembers which ones are still being used or what happens if they fail to load.

Failure scenario

You launch a new pricing page behind a flag, set to 10% of users. It works great. Three months later you're working on something else and accidentally delete the flag config. The code treats a missing flag as "on", so suddenly 100% of users see the new page, which was never finished. Or worse: the flag service goes down for 30 seconds and your entire app freezes because every page is waiting on flag checks that never return.

Common mistakes

  • Flags that call an external service on every page load, slowing everything down
  • No default behavior when flags can't be fetched โ€” the app just breaks
  • Flags left in the code forever after the experiment ends, creating confusion
  • Critical features gated behind flags that could accidentally get toggled
  • Flag checks that happen after showing content, causing the page to flicker
  • No record of what each flag does or who owns it
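One way to tackle the last mistake is to keep a typed registry in the codebase, so that every flag ships with a description, an owner, a safe default and a creation date. Below is a TypeScript sketch. The `FlagDefinition` fields and the example flags are hypothetical and do not follow any service's schema.

```typescript
// Hypothetical metadata every flag must have before it ships.
interface FlagDefinition {
  description: string;   // what the flag controls
  owner: string;         // who to ask before changing or removing it
  defaultValue: boolean; // value used when the flag can't be fetched
  createdAt: string;     // ISO date, so stale flags are easy to spot
  critical: boolean;     // true for checkout, login, payments and similar
}

// Listing keys explicitly means a mistyped flag name fails to compile,
// and Record<FlagKey, ...> forces every declared key to have an entry.
type FlagKey = "new-pricing-page" | "checkout-v2";

// Illustrative entries; names, owners and dates are made up.
const FLAGS: Record<FlagKey, FlagDefinition> = {
  "new-pricing-page": {
    description: "Shows the redesigned pricing page",
    owner: "growth-team",
    defaultValue: false,
    createdAt: "2024-01-15",
    critical: false,
  },
  "checkout-v2": {
    description: "Routes users to the new checkout flow",
    owner: "payments-team",
    defaultValue: false,
    createdAt: "2024-03-02",
    critical: true,
  },
};

// One-line summary, handy for admin pages or cleanup reviews.
function describeFlag(key: FlagKey): string {
  const flag = FLAGS[key];
  return `${key}: ${flag.description} (owner: ${flag.owner}, default: ${flag.defaultValue ? "on" : "off"})`;
}
```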

Time to break: 3-12 months as flags accumulate and nobody cleans them up

Audit Prompts

Copy these into your AI coding assistant to check your implementation.

Are flags slowing down your app?
performance
Look at where and how often we check feature flags. Are flags being checked on every page load? Are those checks cached (remembered) so we don't call the flag service repeatedly? Does the page wait for flags to load before showing anything? What happens if the flag service takes 3 seconds to respond? Is there a timeout?

Feature flag checks can add hundreds of milliseconds to every page load. Multiply that by every user on every page and your app feels sluggish for no obvious reason.
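A common mitigation is to keep flag values in memory for a short time, so that most checks never leave the process. Below is a minimal TypeScript sketch, and it is not any SDK's API. `fetchFlags` is a hypothetical function that returns all flag values in one request, and the 30-second TTL is an arbitrary example. Many managed SDKs already cache or evaluate flags locally, so check their docs before adding your own layer.

```typescript
type FlagValues = Record<string, boolean>;

// Keeps the last fetched flag values in memory for `ttlMs` milliseconds.
class CachedFlags {
  private values: FlagValues | null = null;
  private fetchedAt = 0;
  private inFlight: Promise<FlagValues> | null = null;
  private readonly fetchFlags: () => Promise<FlagValues>;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(fetchFlags: () => Promise<FlagValues>, ttlMs = 30000, now: () => number = Date.now) {
    this.fetchFlags = fetchFlags; // hypothetical loader for all flags at once
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, useful in tests
  }

  get(): Promise<FlagValues> {
    // Fresh cache hit: no network call at all.
    if (this.values !== null && this.now() - this.fetchedAt < this.ttlMs) {
      return Promise.resolve(this.values);
    }
    // Concurrent callers share one request instead of each hitting the service.
    let pending = this.inFlight;
    if (pending === null) {
      const stale = this.values;
      pending = this.fetchFlags().then(
        (fresh) => {
          this.values = fresh;
          this.fetchedAt = this.now();
          this.inFlight = null;
          return fresh;
        },
        (err: unknown) => {
          this.inFlight = null;
          // If the refresh fails, keep serving the last known values; rethrow only if there are none.
          if (stale !== null) return stale;
          throw err;
        },
      );
      this.inFlight = pending;
    }
    return pending;
  }
}
```

Creating one `CachedFlags` per server process (not per request) is what turns "a flag call on every page load" into "a flag call every 30 seconds".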

What happens when the flag service is down?
reliability
Check what happens if our feature flag service (LaunchDarkly, PostHog, etc.) is temporarily unavailable. Does each flag have a sensible default? Does the app still work or does it crash/freeze? Are defaults set close to where flags are used, or buried somewhere central where they're hard to find?

If your app breaks every time your flag service hiccups, you've turned an optional tool into a critical dependency. Defaults should let your app keep running.
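Here is a sketch of the fallback idea. It races the flag fetch against a timeout and uses the flag's default whenever the fetch is slow, fails, or returns no value for that key. That last case also covers a deleted flag config. `withTimeout`, `getFlag`, the hypothetical `fetchFlags` loader and the 500 ms budget are assumptions for illustration. Some managed SDKs (LaunchDarkly's, for example) take a fallback value when you evaluate a flag, and that plays the same role.

```typescript
// Rejects if `promise` hasn't settled within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    promise.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Returns the remote flag value, or `defaultValue` if the service is slow,
// down, or doesn't know the flag (e.g. its config was deleted).
async function getFlag(
  key: string,
  defaultValue: boolean,
  fetchFlags: () => Promise<Record<string, boolean>>, // hypothetical loader
  timeoutMs = 500,
): Promise<boolean> {
  try {
    const flags = await withTimeout(fetchFlags(), timeoutMs);
    const value = flags[key];
    return typeof value === "boolean" ? value : defaultValue;
  } catch {
    return defaultValue;
  }
}
```

Pick a default that matches the safe behavior. For the pricing-page example, `false` keeps users on the old, finished page.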

Can flags accidentally break critical features?
reliability
List all features currently behind flags. Which ones are customer-facing and revenue-critical (like checkout, login, payments)? Are there safeguards preventing someone from accidentally toggling a critical flag? Is there a review process or is anyone on the team able to turn things on and off?

Feature flags give you the power to change behavior without deploying code, which also means the power to accidentally break things in production without deploying code.
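Managed services typically offer roles and approval workflows for this. If your app toggles flags through its own admin code, a minimal guard could work like this: a flag on the critical list changes only when someone other than the requester has approved it, and every change is logged. The sketch below is in TypeScript. `FlagChange`, `CRITICAL_FLAGS` and `applyChange` are hypothetical names, and a real version would persist the state and the audit log instead of keeping them in memory.

```typescript
interface FlagChange {
  key: string;
  enabled: boolean;
  requestedBy: string;
  approvedBy?: string; // a second person who reviewed the change
}

// Hypothetical list of revenue-critical flags (checkout, login, payments).
const CRITICAL_FLAGS = new Set<string>(["checkout-v2", "login-v2"]);

// Applies a toggle, refusing unapproved changes to critical flags and
// recording every accepted change so you can see who flipped what, and when.
function applyChange(state: Map<string, boolean>, auditLog: string[], change: FlagChange): void {
  if (CRITICAL_FLAGS.has(change.key)) {
    const approved = change.approvedBy !== undefined && change.approvedBy !== change.requestedBy;
    if (!approved) {
      throw new Error(`"${change.key}" is critical and needs approval from someone other than ${change.requestedBy}`);
    }
  }
  state.set(change.key, change.enabled);
  const approval = change.approvedBy ? `, approved by ${change.approvedBy}` : "";
  auditLog.push(`${new Date().toISOString()} ${change.requestedBy} set ${change.key}=${change.enabled}${approval}`);
}
```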

Are old flags cleaned up?
reliability
Check how many feature flags we have and when each was created. Are there flags older than 6 months? For each flag, is it documented what it does and why it exists? Is there a process for removing flags once an experiment is over?

Every flag adds complexity and slows things down slightly. Old unused flags accumulate like technical debt โ€” eventually nobody knows what's safe to remove.
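If each flag's metadata is recorded somewhere your code can read, finding cleanup candidates becomes mechanical. The TypeScript sketch below assumes a hypothetical `FlagRecord` list, which could be exported from your flag service or kept in a registry file. The 180-day default stands in for the 6-month threshold in the prompt, and the sketch also reports flags that have no description.

```typescript
interface FlagRecord {
  key: string;
  owner: string;
  createdAt: Date;
  description?: string;
}

interface CleanupCandidate {
  key: string;
  owner: string;
  ageDays: number;
  reasons: string[];
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Flags older than `maxAgeDays` or lacking a description, oldest first,
// so the owner can decide whether to remove them.
function findCleanupCandidates(flags: FlagRecord[], now: Date, maxAgeDays = 180): CleanupCandidate[] {
  const candidates: CleanupCandidate[] = [];
  for (const flag of flags) {
    const ageDays = Math.floor((now.getTime() - flag.createdAt.getTime()) / DAY_MS);
    const reasons: string[] = [];
    if (ageDays > maxAgeDays) reasons.push(`older than ${maxAgeDays} days`);
    if (!flag.description) reasons.push("no description");
    if (reasons.length > 0) candidates.push({ key: flag.key, owner: flag.owner, ageDays, reasons });
  }
  // Oldest first: these are the likeliest to be dead code.
  return candidates.sort((a, b) => b.ageDays - a.ageDays);
}
```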

Smart Move

It depends

Simple on/off flags that default to the safe behavior? Fine to build yourself. But the moment you need percentage rollouts, user targeting, or analytics on how users interact with different versions โ€” use a service. The math for consistent percentage rollouts is trickier than it looks, and tracking which users saw which version gets complicated fast.
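Consistent percentage rollouts are usually built on deterministic bucketing. You hash the user ID together with the flag key, map the hash to a number from 0 to 99, and turn the flag on when that number is below the rollout percentage. The same user always lands in the same bucket, and raising the percentage from 5 to 10 only adds users. This sketch uses 32-bit FNV-1a as the hash, which is a choice made for this example. Real services use their own hashing schemes, so don't expect these buckets to match theirs.

```typescript
// 32-bit FNV-1a over UTF-16 code units (same as byte-wise FNV-1a for ASCII input).
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, multiplied modulo 2^32
  }
  return hash >>> 0; // reinterpret as unsigned
}

// Maps a user to a stable bucket 0..99 for one flag. Including the flag key means
// each flag gets an independent slice of users instead of the same 5% every time.
function bucketFor(userId: string, flagKey: string): number {
  return fnv1a32(`${flagKey}:${userId}`) % 100;
}

// A user is in the rollout when their bucket is below the percentage, so raising
// the percentage keeps everyone who was already in and adds more.
function isInRollout(userId: string, flagKey: string, percentage: number): boolean {
  return bucketFor(userId, flagKey) < percentage;
}
```

The assignment is computed rather than stored, so every server gives the same answer without coordination. What it does not give you is a record of who saw which version, and that tracking is the part a service handles for you.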

PostHog

Feature flags plus analytics and session replay in one tool: great for understanding how users actually use new features

1 million events free per month

Flagsmith

Open source and self-hostable if you want control, with a hosted option if you don't want to manage it

50,000 requests per month free

Vercel Edge Config

Dead simple if you're already on Vercel: just key-value storage at the edge for instant flag checks

Included in the Vercel Hobby plan

LaunchDarkly

Enterprise-grade with advanced targeting, gradual rollouts, and approval workflows; probably overkill unless you're already scaling

1,000 monthly active users free

Tradeoffs

Services add another external dependency and monthly cost as you grow. DIY flags are simple and fast but get messy quickly. If you build your own, commit to keeping it simple โ€” just on/off switches with sensible defaults, nothing fancier.
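If you do build your own, "keep it simple" can look roughly like this: a fixed set of on/off switches whose safe defaults live in code, which configuration can optionally override. Only the exact strings `true` and `false` override a default. A deleted, empty or mistyped setting therefore leaves the feature in its safe state. The flag names and the `FLAG_` prefix are hypothetical conventions for this TypeScript sketch.

```typescript
// Every flag and its safe default, in one place. Names are hypothetical.
const DEFAULTS: Record<"new_pricing_page" | "beta_search", boolean> = {
  new_pricing_page: false,
  beta_search: false,
};

type FlagName = keyof typeof DEFAULTS;
type Flags = Record<FlagName, boolean>;

// Builds the flag set from raw string config such as environment variables.
// FLAG_NEW_PRICING_PAGE=true turns new_pricing_page on; only the exact strings
// "true" and "false" are honored, anything else (or nothing) keeps the default.
function loadFlags(raw: Record<string, string | undefined>): Flags {
  const flags: Flags = { ...DEFAULTS };
  for (const name of Object.keys(DEFAULTS) as FlagName[]) {
    const value = raw[`FLAG_${name.toUpperCase()}`];
    if (value === "true") flags[name] = true;
    else if (value === "false") flags[name] = false;
  }
  return flags;
}
```

On a Node.js server you could call `loadFlags(process.env)` once at startup. Checking a flag after that costs nothing at runtime. The tradeoff against a service is that changing a flag means a config change plus a restart or redeploy.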

Did you know?

Knight Capital lost $440 million in 45 minutes in 2012 after reusing an old feature flag. One of its eight trading servers never received the new code, so on that server the flag re-activated dormant legacy code, which sent millions of erroneous stock orders.

Source: SEC investigation report on Knight Capital trading error
