What to Automate First in Customer Support: A Tier List for Ecommerce
The exact order to automate customer support tickets, which categories to start with, which to add later, which to leave with humans, and why the sequence matters.
Pick the wrong category to automate first and you can spend six weeks fighting issues that wouldn't have appeared if you'd started somewhere else. Refunds are the canonical wrong answer (everyone wants to "prove" automation by tackling them; everyone gets stuck on judgment and exceptions instead). WISMO is the canonical right answer (high volume, fully data-driven, low downside if the AI is slightly wrong).
Sequence matters more than vendor choice. The tier list below is the order to attack categories in, based on the patterns visible in publicly documented automation rollouts and our pilot conversations.
How to read this
Each tier has three dimensions:
- Volume: how much of your inbound this category represents
- Difficulty: how hard the category is for AI to handle
- Risk: cost of getting it wrong
The optimal sequence is high-volume + low-difficulty + low-risk first, decreasing volume and increasing difficulty over time.
Tier 1: Automate first (week 1–2)
These are the slam dunks. Automate them in week 1 and you'll see immediate wins.
WISMO ("Where is my order?")
- Volume: 40–60% of typical ecommerce inbound. Often higher right after holidays.
- Difficulty: Low. Pure data lookup.
- Risk: Very low. The worst case is the AI sending stale tracking info, which is easy to fix.
- Expected resolution rate: 90–98%.
The canonical first category. Connect to Shopify, pull tracking, send the customer a clean reply with carrier name, status, ETA. Your team will immediately notice the relief.
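The lookup-and-reply pattern can be sketched in a few lines. This is a minimal illustration, not real Shopify API code: the `fulfillment` dict and its field names are hypothetical stand-ins for whatever your order lookup returns.

```python
from datetime import date

def wismo_reply(fulfillment: dict) -> str:
    """Build a plain-language tracking reply from fulfillment data.

    `fulfillment` is a hypothetical dict shaped like the tracking info
    an order lookup might return; field names are illustrative, not a
    real Shopify API schema.
    """
    if not fulfillment.get("tracking_number"):
        return "Your order is confirmed and will ship soon."
    eta = fulfillment.get("eta")
    eta_text = f" Estimated delivery: {eta}." if eta else ""
    return (
        f"Your order shipped via {fulfillment['carrier']} "
        f"(tracking {fulfillment['tracking_number']}) and is currently "
        f"{fulfillment['status']}.{eta_text}"
    )

reply = wismo_reply({
    "carrier": "UPS",
    "tracking_number": "1Z999AA10123456784",
    "status": "in transit",
    "eta": date(2025, 3, 14),
})
```

The point of the sketch is the shape: one data lookup, one templated reply, and a safe fallback when tracking isn't available yet.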
Order status questions
- Volume: 5–15% (often counted with WISMO).
- Difficulty: Low.
- Risk: Very low.
- Expected resolution rate: 90–98%.
Variations on WISMO that don't quite fit ("did my order ship", "is my order processed", "when will it arrive"). Same playbook.
Address changes pre-shipment
- Volume: 2–5%.
- Difficulty: Low. Single rule with a clear data check.
- Risk: Low. Worst case is a customer's package goes to the wrong place, recoverable via rerouting.
- Expected resolution rate: 85–95%.
If the order hasn't been picked up by the carrier, update the address. If it has, escalate. The rule is simple and the customer benefit is huge: fixing an address mistake instantly is the kind of thing that earns CSAT points.
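The rule above is small enough to write down directly. A minimal sketch, with illustrative field names rather than any real platform's schema:

```python
def handle_address_change(order: dict, new_address: str) -> str:
    """Pre-shipment address rule: update the address if the carrier
    hasn't picked up the package yet, otherwise escalate to a human.
    The `carrier_picked_up` field is an illustrative stand-in for
    whatever shipment-state check your platform exposes."""
    if order.get("carrier_picked_up"):
        return "escalate"  # too late to edit safely; a human can reroute
    order["shipping_address"] = new_address
    return "updated"
```

The single binary check is what makes this a Tier 1 category: there is exactly one data point the AI needs, and both outcomes are safe.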
Order modifications within a defined window
- Volume: 1–3%.
- Difficulty: Low to medium depending on the action.
- Risk: Low.
- Expected resolution rate: 80–90%.
Adding a note, swapping a size on a pre-fulfillment order, applying a coupon retroactively (within reason). Bound the window tightly (e.g., within 1 hour of order, payment not yet captured) and let the AI handle it.
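A tightly bounded window is just two conditions joined with AND. A sketch, where the one-hour window and the field names are example choices to tune per brand:

```python
from datetime import datetime, timedelta

MOD_WINDOW = timedelta(hours=1)  # example bound; tune per brand

def can_modify(order: dict, now: datetime) -> bool:
    """Order-modification gate: inside the window AND payment not
    yet captured. Field names are illustrative."""
    within_window = (now - order["placed_at"]) <= MOD_WINDOW
    return within_window and not order["payment_captured"]
```

Either condition failing means escalation, which keeps the automated path conservative by construction.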
Tier 2: Automate after a month of data (week 3–6)
Once Tier 1 is humming, expand into the next layer.
Returns initiation
- Volume: 8–15%.
- Difficulty: Medium. The AI generates a label, sends an email, updates the order.
- Risk: Low. Worst case is the customer returns something not eligible, recoverable via inspection.
- Expected resolution rate: 85–95%.
The mechanical part of returns is rule-bound: generate label, email it, mark the order. The judgment part (whether to refund or replace, exception handling) stays in Tier 3.
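The mechanical sequence can be expressed as a three-step pipeline. In this sketch, `generate_label` and `send_email` are hypothetical callables standing in for your carrier and mail integrations; nothing here is a real library API.

```python
def initiate_return(order: dict, generate_label, send_email) -> str:
    """Mechanical returns flow: generate a label, email it, mark the
    order. The refund-or-replace judgment is deliberately NOT here;
    that decision stays with a human (Tier 3)."""
    label = generate_label(order["id"])
    send_email(order["customer_email"], label)
    order["status"] = "return_initiated"
    return label
```

Keeping the judgment call out of the function signature entirely is the design point: the automated path can only do the reversible, mechanical steps.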
Refunds within policy
- Volume: 5–10%.
- Difficulty: Medium. Strict policy rules with clear data checks.
- Risk: Medium. Bad refunds are recoverable but visible.
- Expected resolution rate: 80–95%.
Within 30 days, unshipped, under threshold, no fraud signals → refund. Anything outside those bounds → escalate. Our refund automation walkthrough has the full pattern.
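The four-condition rule above translates directly into code. A sketch with illustrative thresholds and field names; the defaults shown are examples, not recommendations:

```python
from datetime import datetime, timedelta

def refund_decision(order: dict, now: datetime,
                    max_age_days: int = 30,
                    max_amount: float = 100.0) -> str:
    """In-policy refund bounds: within the window, unshipped, under
    the amount threshold, no fraud signals -> refund. Anything
    outside those bounds -> escalate. Thresholds and field names
    are illustrative."""
    in_window = (now - order["placed_at"]) <= timedelta(days=max_age_days)
    if (in_window
            and not order["shipped"]
            and order["amount"] <= max_amount
            and not order["fraud_signals"]):
        return "refund"
    return "escalate"
```

Note the asymmetry: every ambiguous or missing condition falls through to "escalate", so the automated path only ever acts when all four checks pass.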
Subscription pause / skip / swap / change frequency
- Volume: 5–10% for subscription brands.
- Difficulty: Low to medium.
- Risk: Low. Pauses and skips are reversible.
- Expected resolution rate: 90–95%.
Customers love instant subscription edits. The AI checks for active status and no failed payments, then applies the change. Pause for 30 days, skip the next cycle, swap product flavor: all rule-bound.
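The two preconditions plus an allowlist of edits make a compact gate. A sketch; the edit names and subscription fields are illustrative, not any billing platform's real schema:

```python
ALLOWED_EDITS = {"pause_30_days", "skip_next_cycle", "swap_flavor",
                 "change_frequency"}

def apply_subscription_edit(sub: dict, edit: str) -> str:
    """Gate an edit behind the two checks named above: subscription
    is active and has no failed payments. Unknown edits escalate,
    which is what keeps cancellations out of this path."""
    if edit not in ALLOWED_EDITS:
        return "escalate"
    if sub["status"] != "active" or sub["failed_payments"] > 0:
        return "escalate"
    sub.setdefault("pending_edits", []).append(edit)
    return "applied"
```

The allowlist is doing quiet but important work: "cancel" isn't in it, so an upset customer asking to cancel falls through to the Tier 3 retention flow rather than being auto-executed.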
Account questions (password, login, email)
- Volume: 3–8%.
- Difficulty: Low to medium.
- Risk: Low for password resets, slightly higher for email changes (security).
- Expected resolution rate: 70–90%.
Password resets are dead simple. Login issues sometimes need investigation. Email changes need verification (security risk if automated naively). Build in proper verification flows.
FAQ-style policy questions
- Volume: 5–15% if your KB is good.
- Difficulty: Low if KB is comprehensive, high if it isn't.
- Risk: Low.
- Expected resolution rate: 80–95% with good docs.
"What's your shipping policy?" "How long do refunds take?" "Do you ship internationally?" If these are well-documented, AI handles them flawlessly. If they're not, fix the docs first.
Promo code and pricing questions
- Volume: 2–5%.
- Difficulty: Low.
- Risk: Low.
- Expected resolution rate: 85–95%.
"Is X on sale?" "Can I use this code?" "Why am I being charged $Y?" Mostly data lookups against your active promotions and the customer's cart.
Tier 3: Augment, don't fully automate (week 6+)
These categories benefit from AI assistance but humans should still be in the loop.
Refunds outside policy
- Mode: AI drafts, human approves.
- Why hybrid: Outside-policy refunds are judgment calls. The AI can recommend ("approve goodwill refund of $X based on customer history") but a human should sign off.
- Path to full automation: After 60 days of hybrid, automate the cleanest sub-cases (e.g., 1–7 days outside window for repeat customers). Keep the more ambiguous ones in hybrid mode.
Sizing and fit complaints
- Mode: AI answers the data part, hybrid for emotional part.
- Why hybrid: A sizing complaint often combines a data question ("the size chart says X") with disappointment ("but it doesn't fit"). The AI can answer the first part; humans handle the second.
Damaged or wrong item complaints
- Mode: AI gathers info and starts the workflow, human decides resolution.
- Why hybrid: Customers want acknowledgment first ("oh no, that's awful"). AI can express sympathy but humans do it more credibly. The AI's job is to gather photos, document the issue, and start the replacement workflow, leaving the resolution decision to a human.
Cancellations from upset customers
- Mode: AI offers retention path, hybrid if customer pushes through.
- Why hybrid: Cancellation is a moment of friction. AI offers the standard retention options (pause, discount, swap). If the customer accepts, AI completes it. If the customer rejects and insists, escalate to a human who might find a better answer.
Loyalty / VIP issues (lower-tier VIPs)
- Mode: AI drafts replies, humans send.
- Why hybrid: VIP customers expect a personal touch. Even good AI feels transactional. The AI can prepare a perfect reply with full context; the human reviews and personalizes before sending.
Tier 4: Don't automate (yet)
Some categories should stay with humans permanently, or at least for the foreseeable future.
VIP-tier customer issues
- Why: VIP relationships are about feeling known. Automation breaks that feeling.
- Threshold: Define VIP explicitly. Common: top 5% by LTV, or named accounts, or lifetime spend > $X.
- Exception: VIPs can opt into AI handling for routine questions. Some prefer instant response.
Legal, regulatory, fraud language
- Why: Cost of getting it wrong is too high (legal liability, reputation, fraud losses).
- Detection: Keyword-based escalation. Words like "lawyer", "lawsuit", "fraud", "chargeback dispute", "regulator" trigger immediate human routing.
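The keyword trigger described above is deliberately crude. A minimal sketch; a production version would also handle word boundaries, misspellings, and multilingual variants:

```python
ESCALATION_KEYWORDS = {"lawyer", "lawsuit", "fraud",
                       "chargeback dispute", "regulator"}

def needs_legal_escalation(message: str) -> bool:
    """Keyword-based escalation: any match routes the ticket
    straight to a human. Substring matching is intentionally
    over-broad here; false positives are cheap, false negatives
    are not."""
    text = message.lower()
    return any(kw in text for kw in ESCALATION_KEYWORDS)
```

Over-matching is the right failure mode for this category: a ticket wrongly routed to a human costs minutes, while a legal-adjacent ticket wrongly answered by the AI can cost far more.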
Wholesale, B2B, partnership inquiries
- Why: Different conversation style, longer cycles, often relationship-driven.
- Routing: Tag and route to the appropriate team (sales, partnerships, BD).
PR and media inquiries
- Why: A wrong answer here ends up in print.
- Routing: Tag, route to founder or PR contact.
Customers explicitly asking for a human
- Why: Always honor this. The cost of refusing is huge; the cost of routing is minimal.
- Detection: Keywords ("speak to a human", "real person", "agent please") plus repeated frustration patterns.
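The two signals named above, explicit keywords plus repeated frustration, can be combined in one check. In this sketch the frustration signal is naively proxied by exclamation runs and all-caps words; that proxy is purely illustrative, not a real sentiment model:

```python
HUMAN_KEYWORDS = ("speak to a human", "real person", "agent please")

def wants_human(messages: list[str], frustration_limit: int = 2) -> bool:
    """Route to a human on an explicit request, or once repeated
    frustration crosses a threshold. The frustration heuristic
    ('!!' or shouted words) is an illustrative stand-in."""
    frustrated = 0
    for msg in messages:
        low = msg.lower()
        if any(kw in low for kw in HUMAN_KEYWORDS):
            return True  # always honor an explicit request
        if "!!" in msg or any(w.isupper() and len(w) > 2 for w in msg.split()):
            frustrated += 1
    return frustrated >= frustration_limit
```

The explicit-request branch returns immediately and unconditionally, which encodes the "always honor this" rule from the section above.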
Sequencing in practice
For a typical ecommerce brand starting from zero, here's a realistic 12-week plan:
Week 1: WISMO live in autonomous mode. Watch metrics daily. Fix issues as they appear.
Week 2: Address changes added. Order status questions added. Tier 1 should now cover ~50% of inbound autonomously.
Week 3–4: Subscription pause/skip/swap added (if applicable). Account password resets added. Tier 1+ now covers ~60–65%.
Week 5–6: Returns initiation added. FAQ-style policy questions added. Total ~70%.
Week 7–8: Refunds within policy added. Promo code questions added. Total ~75–80%.
Week 9–10: Tier 3 hybrid mode for refunds outside policy and damaged items. Humans now get pre-drafted replies for the harder tickets.
Week 11–12: Tune escalation rules based on real data. Audit CSAT by category. Identify any category dragging metrics.
By week 12, teams that follow this sequence with clean docs and tight escalation rules tend to land in the 60%+ autonomous-resolution range with healthy CSAT. The actual ceiling depends on your ticket mix and policy clarity. The next 6 months are about tuning, not adding new categories.
What can go wrong
The most common derailment points:
- Skipping Tier 1. Teams think WISMO is "too easy" and start with Tier 2. They struggle. They restart with WISMO. They lose 4–6 weeks.
- Adding Tier 2 too fast. Don't add a new category until the previous one is stable for 1–2 weeks.
- Forgetting Tier 4 boundaries. A VIP customer hits the AI by accident, has a bad experience, complains publicly. Define VIP escalation rules in week 1.
- Skipping the metrics review. Without weekly tracking, you won't notice when a category's CSAT drops.
If you remember three things
- Sequence beats vendor choice. Skipping Tier 1 to start with refunds is the single most common reason rollouts stall.
- Tier 4 is permanent. The list of categories not to automate isn't a temporary state to grow out of; it's a standing rule that those tickets always route to humans.
- The compounding effect of doing Tier 1 well is bigger than most teams expect. A clean WISMO deployment is the foundation that makes Tier 2 work.
For the full rollout playbook including measurement, see the customer support automation playbook. Once you've picked Tier 1 and are ready to write the rules, the When/If/Then framework covers the exact pattern.
Sources
- Bank of America, A Decade of AI Innovation: Erica Surpasses 3 Billion Client Interactions, customer-side reference for what Tier 1 categories look like at sustained scale (50M users, 58M monthly interactions on routine financial queries).
- Forrester, Predictions 2026: AI Gets Real For Customer Service, But It's Not Glamorous Work, analyst view on why sequencing and category-by-category rollout matters more than vendor selection.
Frequently asked questions
Why is the order so important?
Confidence compounds. WISMO success in week 2 builds team trust, gives you data on AI behavior, and exposes knowledge base gaps cheaply. Starting with refunds means dealing with judgment calls before you have any of that. Teams that skip the order end up restarting from Tier 1 anyway, just 6 weeks later.
How long should I stay in Tier 1 before adding Tier 2?
Two to four weeks of stable Tier 1 metrics. "Stable" means autonomous resolution >85% on Tier 1 categories, CSAT within 5 points of human agents, and a flat escalation rate week-over-week. If those numbers are still moving, fix Tier 1 before adding complexity.
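The three stability conditions make a simple go/no-go gate. A sketch; the 2-point "flat" tolerance on escalation rate is an illustrative choice, not a benchmark:

```python
def tier1_stable(auto_resolution: float, csat_ai: float,
                 csat_human: float, escalation_trend: list[float]) -> bool:
    """Go/no-go check before adding Tier 2: >85% autonomous
    resolution, AI CSAT within 5 points of human CSAT, and an
    escalation rate flat week-over-week (here: latest week within
    2 points of the prior week; tolerance is illustrative)."""
    flat = (len(escalation_trend) >= 2
            and abs(escalation_trend[-1] - escalation_trend[-2]) <= 2.0)
    return (auto_resolution > 85.0
            and abs(csat_human - csat_ai) <= 5.0
            and flat)
```

Running this weekly against real numbers turns "are we ready for Tier 2?" from a gut call into a checklist.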
Can I skip a tier if I'm confident?
We strongly recommend not skipping. The teams who insist on jumping to Tier 3 (the high-value, high-judgment work) without proving Tier 1 and 2 first usually fail in ways that take weeks to recover from. The tier order is empirical: it works because each tier earns the right to add the next.
What if my volume mix is unusual (e.g., mostly subscription complaints)?
The principle still holds: start with whatever subset of your volume is highest-quantity, lowest-judgment, lowest-stakes. For subscription-heavy businesses, that's often pause/skip/swap before cancellations or billing disputes. Adapt the categories, keep the order.
When should I revisit Tier 4 (don't automate)?
Most Tier 4 categories should stay there permanently. The exception is hybrid AI assistance, using AI to draft replies for VIP issues that humans then send. That's not "automating" VIPs, it's giving humans better tools. Worth revisiting in year 2.