When to Kill an AI Feature
Every product team I talk to is shipping AI features.
Most of them will be dead within a year.
Not because the models are bad. Not because the
engineering was sloppy. Because nobody asked the
only question that matters: does this feature earn
its place in the product?
I have shipped AI features that transformed workflows
and AI features that users ignored completely. The
difference was never the model quality. It was always
the product thinking behind it.
The AI Feature Graveyard
Here are real features I have seen shipped, celebrated
in internal demos, and then quietly removed.
The AI meeting summarizer. A SaaS product added
automatic meeting summaries. Demo looked great. In
production, 4% of users ever opened a summary. The
problem: people already had their own notes. The AI
summary was a second, worse version of something they
already had.
The AI-powered onboarding assistant. A chatbot
that guided new users through setup. It answered
questions accurately. Usage data showed 92% of users
dismissed it within 8 seconds. The existing step-by-step
wizard worked fine. Users did not want a conversation.
They wanted a checklist.
The predictive text composer. An email tool added
AI-generated reply suggestions. Accuracy was 70%.
That sounds decent until you realize users spent more
time reading and editing the suggestions than they
would have spent typing from scratch. Net time saved:
negative.
The smart dashboard. An analytics product shipped
an AI that surfaced "insights" from user data. The
insights were technically correct but obvious. "Your
traffic increased 23% this week." Users could see that
from the chart. They needed the why, and the AI
could not deliver it.
Every one of these features cost $40,000-$150,000
to build. Every one got killed within six months. The
pattern is always the same: the team built for the
demo, not for the daily workflow.
The "AI Because AI" Anti-Pattern
The most expensive product decision in 2025-2026 is
adding a chatbot because your competitor has one.
I watched a B2B platform spend four months building
a conversational interface for their settings page.
Four months. The existing UI had dropdown menus and
toggle switches. Users could configure everything in
under a minute. The chatbot took longer because users
had to describe what they wanted instead of just
clicking it.
When I asked why they built it, the PM said: "Our
competitor launched an AI assistant last quarter."
That is not a product decision. That is a fear
response.
The question is never "should we add AI?" The question
is "what user problem are we solving, and is AI the
best way to solve it?" If you cannot answer the second
part with specifics, you are building a demo, not a
feature.
Three Questions Before Building Any AI Feature
I run every AI feature idea through three filters.
If it fails any one of them, I do not build it.
1. Does the User Have This Problem Today?
Not hypothetically. Not in a survey where you asked
leading questions. Today, right now, are users
struggling with this task?
Check support tickets. Look at session recordings.
Find the places where users abandon flows or repeat
actions. Those are real problems.
I once proposed an AI feature that would auto-categorize
uploaded documents. Seemed obvious. Then I looked at
the data: 89% of users uploaded fewer than 5 documents
per month. They did not have a categorization problem.
They had a "where did I put that file" problem, which
was a search problem, not a classification problem.
The fix was a better search bar. Cost: two weeks of
engineering. The AI classifier would have cost three
months.
2. Is AI the Simplest Solution?
This is the filter that kills the most ideas, and
it should.
A filter dropdown beats an AI recommendation engine
when users know what they want. Autocomplete beats a
chatbot when the input space is bounded. Templates
beat generative AI when the output format is
predictable.
I use a simple test: can a hardcoded rule or a
database query solve 80% of this problem? If yes,
start there. You can always add AI later. You cannot
easily remove it once users depend on the 20% it
handles.
A logistics company I worked with wanted an AI system
to recommend optimal shipping routes. We looked at
their data. Twelve routes accounted for 94% of
shipments. A lookup table solved the problem. Total
development time: three days.
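The whole approach fits in a dictionary. A minimal sketch of that lookup table; the city pairs and route IDs are hypothetical, but the shape is the point:

```python
# Minimal sketch of the lookup-table approach. City pairs and route
# IDs are hypothetical; the real table held twelve high-volume routes.
BEST_ROUTE = {
    ("chicago", "dallas"): "route_07",
    ("chicago", "atlanta"): "route_02",
    ("memphis", "dallas"): "route_11",
}

def recommend_route(origin: str, destination: str) -> str:
    """Return the precomputed route; the long tail goes to a human."""
    return BEST_ROUTE.get((origin.lower(), destination.lower()), "manual_review")
```

The ~6% of shipments the table does not cover fall through to manual review, which is exactly where they were going before.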
3. Can You Measure Success?
If you cannot define what "good output" looks like,
you cannot improve the feature and you cannot justify
its cost.
For every AI feature, I define three metrics before
writing a single line of code:
- Task completion rate: did the user finish what
they started?
- Accuracy/quality score: is the AI output
correct? (This requires a rubric.)
- Cost per successful outcome: not cost per API
call, but cost per task the user actually completed.
If you cannot define these three for your feature,
you do not understand the feature well enough to
build it.
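One way to force that definition is to make the three metrics a concrete structure before any feature code exists. A sketch, with field names of my own choosing:

```python
from dataclasses import dataclass, field

@dataclass
class AIFeatureMetrics:
    tasks_started: int = 0
    tasks_completed: int = 0
    quality_scores: list = field(default_factory=list)  # rubric scores, 0.0-1.0
    total_cost_usd: float = 0.0

    @property
    def task_completion_rate(self) -> float:
        """Did users finish what they started?"""
        return self.tasks_completed / self.tasks_started if self.tasks_started else 0.0

    @property
    def avg_quality(self) -> float:
        """Average rubric score for AI output."""
        return sum(self.quality_scores) / len(self.quality_scores) if self.quality_scores else 0.0

    @property
    def cost_per_successful_outcome(self) -> float:
        """Cost per completed task, not per API call."""
        return self.total_cost_usd / self.tasks_completed if self.tasks_completed else float("inf")
```

If filling in these fields for your feature feels impossible, that is the signal.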
Measuring AI Feature Adoption
You shipped the feature. It passed the three filters.
Now you need to know if it is working. I track four
numbers.
Task Completion Rate
What percentage of users who start using the AI
feature actually finish the task? This is not the
same as usage. A user who opens the AI assistant,
gets a bad response, and manually completes the task
is a failure, not a success.
Benchmark: below 60% task completion after 30 days
means the feature has a problem. Below 40% means
kill it.
Time-to-Value
How long from clicking the AI feature to getting a
useful result? I measure this in seconds, not minutes.
An AI code review tool I built initially took 45
seconds to return results. Developers alt-tabbed
away and forgot about it. We got the latency to 8
seconds. Usage tripled. Same quality, same accuracy.
The only difference was speed.
If your AI feature takes longer than the manual
alternative, users will not adopt it regardless of
quality.
Return Usage Rate
Of users who try the feature once, what percentage
use it again within 7 days? This is the single best
signal for AI feature value.
- Above 40% return rate: the feature is working
- 20-40%: the feature needs iteration
- Below 20%: the feature is not solving a real problem
I have never seen a feature recover from below 15%
return usage. Not once.
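Those bands are easy to encode as a triage rule. The thresholds are the ones above; the function itself is just an illustrative sketch:

```python
def triage_return_rate(rate: float) -> str:
    """Classify 7-day return usage against the bands above."""
    if rate > 0.40:
        return "working"
    if rate >= 0.20:
        return "needs iteration"
    return "not solving a real problem"
```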
User Effort Reduction
Measure the same task with and without the AI
feature. Count clicks, keystrokes, time, and
error rate. If the AI version does not reduce effort
by at least 30%, users will default to the manual
path because it is familiar.
The Cost-Value Equation
Here is where most teams stop thinking, and where the
real decisions happen.
Every AI feature has a per-user-per-month cost.
Calculate it:
Monthly feature cost =
(API calls x avg cost per call)
+ compute costs
+ (engineering maintenance hours x hourly rate)
Cost per active user =
monthly feature cost / active users
For a real example, I ran these numbers on an AI
document analysis feature:
| Component | Monthly Cost |
|-----------|-------------|
| LLM API calls (GPT-4o) | $2,400 |
| Embedding generation | $180 |
| Vector storage | $95 |
| Engineering maintenance (8 hrs) | $1,200 |
| Total | $3,875 |
With 1,200 active users, the cost per user per month
was $3.23.
The feature saved each user approximately 25 minutes
per month. At an average user salary of $85/hour,
that is $35.42 in time saved.
$3.23 cost for $35.42 in value. That is a 10.9x
return. That feature lives.
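The arithmetic is easy to verify in a few lines:

```python
# Reproduce the document-analysis example from the table.
total_cost = 2400 + 180 + 95 + 1200   # $3,875 per month
per_user = total_cost / 1200          # ~ $3.23 per active user
value_per_user = 85 / 60 * 25         # ~ $35.42 in time saved per user
roi = value_per_user / per_user       # ~ 10.9x return
```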
Now consider the AI meeting summarizer from earlier.
Similar cost structure, but only 4% adoption. The
cost per active user jumped to $80/month. For a
feature that saved maybe 5 minutes. That feature
dies.
The math is simple. Running the math is the part most
teams skip.
The Kill Criteria
I use three triggers. If any one fires, the feature
goes into a 30-day review. If two fire, I kill it
immediately.
Trigger 1: Adoption below 10% after 90 days.
You gave users three months. You iterated on the UX.
You sent product emails. If 90% of users still do
not touch the feature, the problem it solves is not
important enough.
Trigger 2: Cost per active user exceeds $15/month
(for a B2B product with $50-200 ARPU). Adjust this
threshold based on your pricing, but the principle
holds: if a single feature costs more than 10% of
what users pay you, it needs to deliver proportional
value.
Trigger 3: No measurable improvement after three
iterations. You changed the prompts. You improved
the retrieval. You redesigned the UI. Quality scores
did not move. This usually means the problem is
structural, not solvable with better engineering.
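Expressed as code, the review logic looks something like this. The thresholds are the ones above; the function shape is a sketch, not a product spec:

```python
def review_decision(adoption_rate: float, days_live: int,
                    cost_per_active_user: float, arpu: float,
                    flat_iterations: int) -> str:
    """Count fired kill triggers: one -> 30-day review, two or more -> kill."""
    fired = 0
    if days_live >= 90 and adoption_rate < 0.10:   # Trigger 1
        fired += 1
    if cost_per_active_user > 0.10 * arpu:          # Trigger 2, e.g. $15 on $150 ARPU
        fired += 1
    if flat_iterations >= 3:                        # Trigger 3
        fired += 1
    if fired >= 2:
        return "kill"
    if fired == 1:
        return "30-day review"
    return "keep"
```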
Killing a feature is not failure. Keeping a feature
that costs money and delivers nothing is failure.
What to Ship Instead
When I kill an AI feature, I almost always replace it
with something simpler that works better.
Autocomplete instead of generation. Users type
three characters, get suggestions from their own
historical data. No LLM calls. Sub-50ms latency.
Works offline.
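A minimal sketch of that pattern, ranking a user's own historical entries by frequency, with no model calls anywhere:

```python
from collections import Counter

def suggest(prefix: str, history: list[str], limit: int = 5) -> list[str]:
    """Suggest the user's most frequent past entries matching the prefix."""
    if len(prefix) < 3:  # wait for three characters, per the pattern above
        return []
    counts = Counter(history)
    return [entry for entry, _ in counts.most_common()
            if entry.lower().startswith(prefix.lower())][:limit]
```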
Templates instead of AI drafting. Give users 5-10
proven templates they can customize. Faster than
waiting for AI output, more predictable, zero per-use
cost.
Smart defaults instead of AI recommendations.
Analyze user behavior in batch. Set defaults based on
what 80% of similar users chose. Update weekly, not
in real-time. The compute cost drops from dollars per
user to fractions of a cent.
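The batch job behind that pattern can be this small. An illustrative sketch; the 80% threshold is the one from the text above:

```python
from collections import Counter

def smart_default(peer_choices: list[str], threshold: float = 0.8):
    """Return the dominant choice among similar users, or None if there isn't one."""
    if not peer_choices:
        return None
    value, count = Counter(peer_choices).most_common(1)[0]
    return value if count / len(peer_choices) >= threshold else None
```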
Structured workflows instead of chatbots. A
three-step wizard with conditional branching handles
90% of use cases better than a free-text conversation.
Users prefer guided paths over open-ended prompts.
These alternatives are not exciting. They do not demo
well. They do, however, get used.
The One AI Feature That Always Works
In every product I have worked on, one AI feature
consistently justifies its cost: search.
Search works because the user intent is explicit. When
someone types a query, you know exactly what they
want. You can measure whether they found it. You can
calculate precision and recall. You can A/B test
ranking algorithms.
Semantic search with embeddings costs $0.001-0.01 per
query at scale. It handles typos, synonyms, and
natural language queries. Users already understand
the interaction model. There is no adoption friction.
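Once documents and queries are embedded (by whatever provider you use), the ranking itself is a few lines of cosine similarity. A pure-Python sketch with toy vectors:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_search(query_vec, docs, top_k=3):
    """docs: list of (doc_id, embedding). Returns the top_k closest doc ids."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]
```

At scale you would swap the sorted scan for a vector index, but the measurable contract, query in, ranked ids out, stays the same.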
Compare that to a chatbot where:
- User intent is ambiguous
- Success is subjective
- Quality measurement requires human evaluation
- Cost per interaction is 100x higher
If you are starting your AI product strategy, start
with search. Get it fast. Get it accurate. Then
measure whether users are asking questions that search
cannot answer. Those unanswered questions are your
roadmap for the next AI feature.
The Framework in Practice
Here is how I evaluate every AI feature request now:
- Problem validation: show me the support tickets,
session recordings, or churn data that proves this
problem exists
- Simplicity check: can we solve this with a
filter, template, or search improvement first?
- Measurement plan: what are the three metrics,
and what are the kill thresholds?
- Cost model: what is the per-user-per-month
cost at 1x, 10x, and 100x current scale?
- 90-day review: automatic check-in with data,
not opinions
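The checklist runs in order, and each gate blocks the next. Sketched out, with step names taken straight from the list above:

```python
STEPS = ["problem validation", "simplicity check", "measurement plan",
         "cost model", "90-day review"]

def evaluate_request(passed: dict[str, bool]) -> str:
    """Fail fast at the first gate an AI feature request cannot clear."""
    for step in STEPS:
        if not passed.get(step, False):
            return f"rejected at: {step}"
    return "build it"
```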
Most AI feature ideas die at step 1. That is the
point. The features that survive all five steps are
the ones that actually make your product better.
The goal is not to ship AI. The goal is to solve user
problems. Sometimes AI is the answer. Most of the
time, it is not. The teams that figure out the
difference are the ones building products that last.