The When/If/Then Framework for Customer Support Workflows
How to write workflow rules in plain English that don't break in production: the exact pattern, real examples for ecommerce, and the mistakes that kill rule-based automation.
The single biggest reason support automation rules fail in production isn't the AI; it's how the rules are written. Vague rules produce vague behavior. The When/If/Then pattern is a forcing function for specificity.
The pattern aligns closely with how the model providers themselves describe building reliable LLM agents. Anthropic's "Building Effective Agents" and OpenAI's function-calling documentation both reach the same conclusion: predictable agent behavior comes from explicit, structured triggers and bounded actions, not from open-ended prompting. The When/If/Then format is the operational version of that engineering discipline.
If you've struggled with workflow rules that work in testing but fall apart with real tickets, this is the format that fixes it.
The pattern
A workflow rule has four parts:
When [trigger event] / If [conditions are met] / Then [take this action] / Else [escalate or fall back]
Read it out loud. If any of the four parts is fuzzy, the rule will produce fuzzy results.
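The four parts map naturally onto a small data structure. A minimal sketch in Python, assuming a hypothetical `Rule` type and field names (illustrative only, not any platform's API):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    when: str                     # trigger: intent name, e.g. "refund_request"
    if_: Callable[[dict], bool]   # conditions: checkable over ticket data
    then: Callable[[dict], str]   # action when conditions are met
    else_: Callable[[dict], str]  # fallback -- never omitted

def run(rule: Rule, ticket: dict) -> str:
    if ticket["intent"] != rule.when:
        return "no-match"
    return rule.then(ticket) if rule.if_(ticket) else rule.else_(ticket)

# The refund-within-policy rule, reduced to two conditions for brevity
refund = Rule(
    when="refund_request",
    if_=lambda t: t["days_since_order"] <= 30 and not t["shipped"],
    then=lambda t: "refund-processed",
    else_=lambda t: "escalated-to-human",
)

print(run(refund, {"intent": "refund_request", "days_since_order": 10, "shipped": False}))
# refund-processed
```

Note that the fallback field is required by construction: you can't build a `Rule` without an Else.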
Why each part matters
When (the trigger)
The trigger is the inbound event that activates the rule. It should be specific enough that you'd recognize it on a real ticket.
- ✅ Good: "When a customer asks where their order is"
- ✅ Good: "When a customer requests a refund"
- ❌ Bad: "When the AI is unsure" (circular: deciding what the AI does is the rule's job)
- ❌ Bad: "When a customer is frustrated" (too vague, define how to detect frustration)
The intent classifier handles the matching. Your job is to define the intents clearly.
If (the conditions)
Conditions are the data that determines whether the action is appropriate. Be specific. Use AND/OR explicitly.
- ✅ Good: "If the order is unshipped AND placed within the last 30 days"
- ✅ Good: "If the customer is flagged VIP OR has spent over $1,000 in the last year"
- ❌ Bad: "If it's reasonable" (whose definition?)
- ❌ Bad: "If the customer is unhappy" (you'd need a sentiment threshold)
Each condition should be something the AI can actually check: a data field, a flag, a computed value. If you find yourself writing conditions the AI can't verify, the rule won't work.
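One way to keep yourself honest is to write each condition as a function over concrete data fields; if you can't write the function, the AI can't check the condition either. A sketch with assumed field names:

```python
# Each condition reads a concrete field the system can verify -- no vibes.
def is_unshipped(order: dict) -> bool:
    return order["fulfillment_status"] == "unfulfilled"

def within_days(order: dict, days: int) -> bool:
    return order["days_since_placed"] <= days

def is_vip(customer: dict) -> bool:
    # Explicit OR, mirroring the rule text
    return customer.get("vip_flag", False) or customer.get("spend_last_year", 0) > 1000

order = {"fulfillment_status": "unfulfilled", "days_since_placed": 12}
print(is_unshipped(order) and within_days(order, 30))  # True
```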
Then (the action)
The action is what happens when the conditions are met. Be explicit about every system change.
- ✅ Good: "Process a full refund to the original payment method, send the customer a confirmation email, tag the ticket 'auto-refund-within-policy'"
- ✅ Good: "Generate a return label, attach it to the reply, update the order status to 'return-initiated'"
- ❌ Bad: "Handle the refund" (what does that mean?)
- ❌ Bad: "Apologize and resolve" (how?)
Multiple actions in a single Then are fine; list them all explicitly. The AI executes them in order.
Else (the fallback)
The Else is what happens when conditions aren't met. This is non-negotiable. Without it, the AI is guessing.
- ✅ Good: "Else escalate to a human with a recommendation to refund as goodwill"
- ✅ Good: "Else ask the customer to provide their order number"
- ✅ Good: "Else tag the ticket 'refund-outside-policy' and queue for manual review"
- ❌ Bad: "Else do nothing" (the customer is left hanging)
- ❌ Bad: (omitted) (the AI improvises)
Always have an Else. Always.
Real examples for ecommerce
These are production-grade rules, written for clarity and execution.
WISMO (where is my order?)
When a customer asks for order status (e.g., "where's my order", "shipping update", "tracking info")
If the customer's most recent order has a tracking number AND it's in transit or delivered
Then reply with the carrier name, tracking link, last-known status, and estimated delivery; tag the ticket "wismo-resolved"
Else ask for the order number if not provided, OR escalate to a human if the order shows no tracking and was placed >5 days ago
Refund within policy
When a customer requests a refund
If the order was placed in the last 30 days AND no items have shipped AND the order total is under $200 AND the customer has <3 refunds in the last 90 days
Then process the refund to the original payment method, send confirmation, tag "auto-refund-within-policy", note the timestamp and trigger
Else escalate to a human with the customer's order details, refund history, and a recommended action (approve / partial / decline with goodwill offer)
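The If branch of that rule translates directly into a boolean expression, one clause per AND condition. A sketch with assumed field names:

```python
def refund_within_policy(order: dict, customer: dict) -> bool:
    # One clause per AND condition in the rule's If branch
    return (
        order["days_since_placed"] <= 30
        and not order["items_shipped"]
        and order["total"] < 200
        and customer["refunds_last_90_days"] < 3
    )

print(refund_within_policy(
    {"days_since_placed": 5, "items_shipped": False, "total": 80},
    {"refunds_last_90_days": 1},
))  # True
```

Any clause failing sends the ticket down the Else branch, which is exactly the behavior the rule text describes.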
Subscription pause
When a subscriber asks to pause their subscription
If the subscription is active AND has no failed payments in the last 90 days AND the next billing cycle is more than 7 days away
Then pause until the date the customer requested (or default to 30 days if no date), update Stripe, send confirmation
Else if the next bill is within 7 days, ask if they want to skip this cycle or pause after the next charge; if there's a payment failure, escalate to a human to investigate
Return initiation
When a customer wants to return an item
If the order was placed in the last 30 days AND items have shipped AND the customer specifies which item to return
Then generate a prepaid return label via the warehouse API, email it to the customer, update the order status to "return-pending", set expected refund amount based on item price
Else if the return window has expired, explain the policy and offer store credit if eligible; if items haven't shipped, treat as cancellation; if the customer hasn't specified items, ask which items
Address change
When a customer asks to change the shipping address on a recent order
If the order has not yet been picked up by the carrier (status is "processing" or "pending fulfillment")
Then update the address in Shopify, confirm the new address with the customer, tag "address-updated"
Else if the order has shipped, explain that the address can't be changed and offer to help with carrier-side reroute; escalate if the customer pushes back
Common mistakes (and how to spot them)
Mistake 1: Conditions that aren't actually checkable
You wrote: "If the customer seems reasonable." The AI can't check that. Replace with something checkable: "If the customer's tone is neutral (sentiment score above 0.3 on your platform's sentiment scale)."
Mistake 2: Multi-part Then with no order
You wrote: "Then process the refund and send the email." Fine, but in what order? If the email fails, did the refund happen? Make it explicit: "Then 1) process refund, 2) send confirmation, 3) tag ticket. If step 1 fails, escalate before step 2."
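A sketch of that ordering discipline: run the Then-steps in listed order and escalate the moment one fails, so later steps never fire after a failed earlier step (the step functions are hypothetical):

```python
def execute_ordered(steps, ticket: dict) -> str:
    """Run Then-steps in listed order; stop and escalate on the first failure."""
    for i, step in enumerate(steps, start=1):
        try:
            step(ticket)
        except Exception as exc:
            return f"escalated-at-step-{i}: {exc}"
    return "completed"

# Hypothetical steps, in the order the rule lists them
def process_refund(t):
    if t["total"] >= 200:
        raise ValueError("over auto-refund limit")

def send_confirmation(t):
    t["emailed"] = True

def tag_ticket(t):
    t["tags"] = ["auto-refund-within-policy"]

steps = [process_refund, send_confirmation, tag_ticket]
print(execute_ordered(steps, {"total": 80}))   # completed
print(execute_ordered(steps, {"total": 500}))  # escalated-at-step-1: over auto-refund limit
```

If step 1 fails here, the confirmation email never goes out, which is exactly what the rewritten rule demands.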
Mistake 3: Else that defaults to "do nothing"
You wrote: "Else don't take action." The customer is left hanging. Replace with an explicit fallback: "Else send the customer a message acknowledging the request and tag for human follow-up within 4 hours."
Mistake 4: Rules that depend on context the AI doesn't have
You wrote: "If this is a regular customer." What's "regular"? Number of orders? Lifetime value? Frequency? Replace with: "If the customer has placed 3+ orders OR has lifetime value over $300."
Mistake 5: Rules that contradict each other
You have two rules: "Refund all returns within 30 days" and "Don't refund items damaged in transit." If a customer returns a damaged item within 30 days, what wins? Make the priority explicit. Most platforms support rule priority; use it.
Layering rules
Most categories need a few rules at different priority levels:
Priority 1 (highest): Hard escalations, VIP customers, legal language, fraud signals, dollar threshold breaches. These always escalate, regardless of what other rules say.
Priority 2: Specific exceptions: "If the customer has spent >$1,000 in the last year, refund outside the 30-day window automatically."
Priority 3 (default): The general policy rule: "Refund within 30 days, unshipped, etc."
When a ticket comes in, the highest-priority matching rule wins. This lets you handle exceptions cleanly without breaking the general logic.
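Priority resolution is simple to sketch: sort rules by priority and return the first match (rule names and fields below are illustrative):

```python
def pick_rule(rules: list[dict], ticket: dict) -> str:
    """Highest-priority matching rule wins (lower number = higher priority)."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if rule["matches"](ticket):
            return rule["name"]
    return "no-match"

rules = [
    {"name": "default-refund-policy", "priority": 3,
     "matches": lambda t: t["intent"] == "refund"},
    {"name": "vip-exception", "priority": 2,
     "matches": lambda t: t["intent"] == "refund" and t.get("spend_last_year", 0) > 1000},
    {"name": "fraud-escalation", "priority": 1,
     "matches": lambda t: t.get("fraud_signal", False)},
]

print(pick_rule(rules, {"intent": "refund", "spend_last_year": 1500}))  # vip-exception
```

The default rule still matches the VIP ticket, but it never fires, because the exception sits above it.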
How to test rules before going live
Before turning on autonomous mode for a category:
- Pull 100 historical tickets in that category from the last 30 days.
- Run them through the rule on paper. For each ticket, write down: would the rule match? What action would it take? Does that match what your team did in reality?
- Categorize the misses: tickets where the rule would have done something different from what your team did. Are these:
- Cases where the AI would actually be right (your team made an exception that wasn't policy-aligned)?
- Cases where the rule is too aggressive (need to add conditions)?
- Cases where the rule is too cautious (need to remove conditions)?
- Iterate. Update the rule. Run again on the same 100 tickets.
- Don't go live until you're down to fewer than 5 honest misses out of 100.
This is tedious. It also gives you confidence that the rule will behave correctly in production. Skipping it is the most common reason rollouts fail in week 3.
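The paper test above can also be run as code once the rule is expressed as a decision function: replay historical tickets and count where the rule disagrees with what your team actually did (ticket fields are assumed):

```python
def backtest(rule_decision, tickets: list[dict]) -> list[tuple]:
    """Replay historical tickets through the rule; return the honest misses."""
    misses = []
    for t in tickets:
        predicted = rule_decision(t)
        if predicted != t["human_action"]:
            misses.append((t["id"], predicted, t["human_action"]))
    return misses

tickets = [
    {"id": 1, "days_since_order": 5, "shipped": False, "human_action": "refund"},
    {"id": 2, "days_since_order": 45, "shipped": True, "human_action": "escalate"},
    {"id": 3, "days_since_order": 10, "shipped": False, "human_action": "escalate"},  # human made an exception
]
decide = lambda t: "refund" if t["days_since_order"] <= 30 and not t["shipped"] else "escalate"

misses = backtest(decide, tickets)
print(len(misses), "misses out of", len(tickets))  # 1 misses out of 3
```

Each miss then gets triaged by hand into the three categories above; the code only finds the disagreements, it can't tell you which side was right.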
Why this framework works
The When/If/Then format does three things rule-based automation needs:
- Forces specificity. Vague terms get exposed when you try to write them in this structure.
- Maps cleanly to AI execution. Modern platforms convert this format directly into the underlying logic without translation loss.
- Is auditable by anyone. A rule written this way can be reviewed by your operations lead, legal, finance, or a new hire; none of them need to know the platform's internals.
A common pattern when teams refactor existing automation: 30+ flowchart-style workflows compress into 8–12 well-written When/If/Then rules. Less complexity, better coverage, faster iteration.
The rule of thumb
If a rule can't be read out loud and understood by someone who's never seen your platform, it won't survive production. That's the whole framework in one sentence. Everything above is just how to get rules to that bar.
For a worked example of these rules running in production, see the refund automation walkthrough for Shopify and Stripe. Every step of that post is a When/If/Then rule. And if you've struggled to get rule-based automation working at all, the 7 reasons automation projects fail post covers the rule-design failures specifically (#3 and #6).
Sources
- Anthropic, "Building Effective Agents": model-provider research on structured triggers, bounded actions, and predictable LLM agent behavior. The When/If/Then pattern is the operations layer of these engineering principles.
- OpenAI, function-calling documentation: model-provider documentation on tool definitions, parameter schemas, and reliable execution patterns for LLM agents.
- Forrester, "Predictions 2026: AI Gets Real For Customer Service, But It's Not Glamorous Work": analyst view on the operational discipline that determines whether AI rule-design holds up in production.
Frequently asked questions
Why is plain English better than visual flowchart editors?
Three reasons: anyone on your team can write or audit a rule, the rule is self-documenting, and modern AI platforms convert plain-English rules into the underlying logic better than visual editors do. Flowchart editors made sense in 2018, when AI couldn't parse intent; in 2026 they're an unnecessary abstraction layer.
How specific should each rule be?
Specific enough that someone could execute it without asking questions. 'Refund within policy' is too vague. 'Refund the full order amount to the original payment method, if order was placed in the last 30 days, no items have shipped, and the order total is under $200' is the right level.
What if a rule depends on multiple conditions?
Combine them in the If branch with AND or OR. Keep it readable: 'If the order is unshipped AND the request is within 30 days AND the customer has fewer than 3 prior refunds.' If you need more than 4 conditions, you probably have multiple rules collapsed into one, split them.
How do I handle exceptions to my rules?
Make exceptions explicit rules of their own, layered above the main rule. The general rule covers 90%; the exception rule (with higher priority) covers the 10% special case. AI platforms with rule priority features handle this cleanly.
Should every rule have an Else branch?
Yes. The Else is where reliability lives. Without it, the AI will guess what to do when conditions aren't met, and guessing leads to bad outcomes. Common Else patterns: 'escalate with [specific reason]', 'send a follow-up question to gather missing info', 'tag and queue for human review.'