Welcome to SIGNAL by Happy Robots. We review recently published papers - separating the signal from the noise for business operators using LLMs in their day-to-day.
Here's the uncomfortable truth about GenAI adoption: it's not working equally for everyone.
A massive field experiment involving millions of users at a leading online retail platform just revealed something most vendors don't want you to know: When the company deployed GenAI across seven different workflows, productivity gains ranged from 0% to 16.3%.
Same company. Same technology. Wildly different results.
If you're a leader trying to justify GenAI investments, this should terrify you—or at least make you ask better questions.
The Study Nobody Can Dismiss
This isn't another vendor case study or anecdotal success story. Researchers from Cornell and Columbia partnered with one of the world's largest cross-border e-commerce platforms, running controlled experiments over six months between September 2023 and June 2024, involving millions of users and products.
They tested GenAI across seven distinct consumer-facing workflows:
1. Pre-sale Service Chatbot - 24/7 customer service answering product questions
2. Search Query Refinement - Understanding and improving multilingual search queries
3. Product Description Generation - Creating comprehensive product descriptions
4. Marketing Push Messages - Personalizing promotional messages
5. Google Advertising Titles - Optimizing product ad titles
6. Chargeback Defense - Automating dispute resolution for sellers
7. Live Chat Translation - Real-time translation for customer service
Here's what makes this bulletproof: They held inputs and prices constant. They randomized who got GenAI features and who didn't. They tracked every transaction. No variables except the GenAI enhancement changed between test and control groups.
The results? Four applications showed measurable gains. Three showed nothing—or worse.
The Winners and Losers
Big wins:
- Pre-sale chatbot: +16.3% in sales (conservative estimate; combined with human escalation hit +25%)
- Chargeback defense: +15% success rate in resolving disputes
- Live chat translation: +5.2% customer satisfaction
Modest but meaningful gains:
- Search query refinement: +2.93% in sales
- Product descriptions: +2.05% in sales
The failures:
- Marketing push messages: +1.6% (not statistically significant)
- Google ad titles: -4.5% (actually negative; the model stripped out commercial keywords advertisers rely on)
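Why can a +1.6% lift count as a failure? Because a lift only matters if it is distinguishable from noise. Here is a minimal sketch of one common way to check that, a two-proportion z-test on conversion counts. The function name and all the counts are invented for illustration; the study's actual data and statistical method are not described here.

```python
import math

def two_proportion_ztest(conv_control, n_control, conv_treat, n_treat):
    """Return (relative_lift, z, two_sided_p) for treatment vs. control conversion."""
    p_c = conv_control / n_control
    p_t = conv_treat / n_treat
    # Pooled conversion rate under the null hypothesis of "no difference"
    p_pool = (conv_control + conv_treat) / (n_control + n_treat)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_treat))
    z = (p_t - p_c) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    relative_lift = (p_t - p_c) / p_c
    return relative_lift, z, p_value

# Hypothetical counts: the same 1.6% relative lift at two sample sizes
small = two_proportion_ztest(1_000, 50_000, 1_016, 50_000)
large = two_proportion_ztest(100_000, 5_000_000, 101_600, 5_000_000)

for label, (lift, z, p) in (("small test", small), ("large test", large)):
    print(f"{label}: lift={lift:.1%}  z={z:.2f}  p={p:.4f}")
```

With these made-up numbers, the small test is nowhere near significant while the large one clears the usual 0.05 threshold. Same lift, different verdict: the headline percentage alone tells you very little.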
Aggregate value across the four successful applications? Approximately $5 per consumer annually. At the platform's scale (hundreds of millions of users), that's hundreds of millions in incremental revenue.
But here's what matters most: That $5 represents 5.5-6% of the entire per-user revenue growth observed in global e-commerce between 2023 and 2024. These aren't trivial gains—they're capturing a measurable chunk of industry growth.
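To make that arithmetic concrete, here is a quick back-of-the-envelope sketch. The $5 figure and the 5.5-6% share come from the numbers above; the user count is a placeholder assumption, since the platform's exact size isn't given.

```python
ANNUAL_GAIN_PER_CONSUMER = 5.0       # USD per consumer per year, from the study summary
SHARE_OF_GROWTH = (0.055, 0.06)      # the $5 as a share of per-user revenue growth

def incremental_revenue(users: int) -> float:
    """Annual incremental revenue if every user yields the average gain."""
    return users * ANNUAL_GAIN_PER_CONSUMER

def implied_growth_per_user(share: float) -> float:
    """Per-user revenue growth implied by the $5 being `share` of it."""
    return ANNUAL_GAIN_PER_CONSUMER / share

# ASSUMPTION: 100 million users, the low end of "hundreds of millions"
hypothetical_users = 100_000_000
print(f"Incremental revenue: ${incremental_revenue(hypothetical_users):,.0f}")

low = implied_growth_per_user(SHARE_OF_GROWTH[1])
high = implied_growth_per_user(SHARE_OF_GROWTH[0])
print(f"Implied per-user revenue growth: ${low:.2f} to ${high:.2f}")
```

Even at the low-end user assumption, the total lands at half a billion dollars a year, and the 5.5-6% share implies industry-wide per-user growth of roughly $83 to $91.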
Why Some Applications Won While Others Failed
The critical factor: GenAI delivered the biggest lift where existing capabilities were weak or nonexistent.
Look at what GenAI replaced in each workflow:
Big gains when GenAI filled a void:
- Pre-sale chatbot replaced nothing: the platform offered zero pre-sale support due to resource constraints. Going from no help to instant 24/7 multilingual assistance? Massive impact.
- Chargeback defense replaced inadequate support: over half of disputes went unaddressed, and the few that were handled used generic templates that rarely worked.
Modest gains when GenAI augmented existing systems:
- Search refinement improved already functional machine-learning algorithms. Better, but incremental.
- Product descriptions enhanced human-written content that was already there. Helpful, but not transformative.
Failures when GenAI undermined existing optimization:
- Google ad titles replaced carefully crafted human-generated titles with commercial keywords optimized over years. GenAI stripped out those keywords, making ads less effective. The model wasn't trained on what actually converts in e-commerce advertising.
The Real Mechanism: Friction Reduction, Not Cost Savings
Here's what's remarkable: GenAI didn't make the platform faster or cheaper. It made buying easier.
Across every successful workflow, the gains came through higher conversion rates—up to 22% increases in the likelihood customers would actually complete a purchase. Average cart values? Unchanged.
This means:
- Better chatbots reduced information asymmetries (customers got answers)
- Refined search queries lowered discovery friction (customers found what they wanted)
- Richer descriptions reduced uncertainty (customers understood products better)
- Real-time translation closed communication gaps (customers got help in their language)
GenAI didn't optimize costs. It expanded the market by reducing reasons not to buy.
The Hidden Pattern: Who Benefits Most
Here's where this research gets even more interesting. The researchers analyzed who benefited most from each GenAI enhancement by examining seller characteristics, buyer experience, and product categories.
The pattern is clear and consistent:
On the supply side: Smaller and newer sellers saw dramatically larger gains.
These sellers lacked the resources to create comprehensive product listings, optimize search presence, or provide customer support. GenAI gave them enterprise-level capabilities instantly.
Think about it: A small manufacturer in a developing market suddenly has:
- Professional product descriptions in 20 languages
- 24/7 multilingual customer service
- Optimized search visibility
That's transformative. Meanwhile, large established sellers with mature operations saw modest improvements—they'd already optimized these workflows manually.
On the demand side: Less experienced consumers benefited disproportionately.
New platform users and infrequent shoppers struggled most with navigation, product discovery, and understanding descriptions. GenAI assistance helped them complete purchases they would have abandoned.
Power users who already knew how to navigate the platform efficiently? Minimal improvement. They didn't need the help.
This isn't "skill-biased" technology—it's skill-leveling.
Unlike previous technology waves (computers, internet) that favored already-skilled workers and widened inequality, GenAI is closing capability gaps. It's giving the least capable participants access to tools that bring them closer to the most capable.
The platform itself confirmed this: GenAI adoption didn't just improve productivity—it reduced the variance in outcomes across sellers and buyers.
What This Means for Your GenAI Strategy
If you're deploying GenAI across your organization and expecting uniform results, you're setting yourself up for disappointment—and wasted budget.
This research destroys three common myths about GenAI adoption:
Myth 1: "GenAI saves money by reducing headcount."
Reality: The biggest gains came from expanding markets and improving customer experience, not cutting costs. The pre-sale chatbot didn't replace workers—it filled a gap where no workers existed. Real productivity came from higher conversion, not lower costs.
Myth 2: "Deploy GenAI everywhere to maximize ROI."
Reality: Three out of seven workflows saw zero or negative returns. The Google ad title workflow actually decreased performance by 4.5% because the model wasn't trained on what works in e-commerce advertising. Blanket deployment is how you waste money.
Myth 3: "GenAI benefits everyone equally."
Reality: Benefits were wildly uneven. Small sellers and inexperienced buyers saw massive gains. Large sellers and power users saw minimal improvement. If you're already good at something, GenAI won't help much.
The question isn't "Will GenAI improve productivity?"
The question is: "Where is GenAI's marginal contribution highest relative to what we already do well?"
Four Questions to Ask Before Your Next GenAI Deployment
- What exists there now? If the answer is "nothing" or "inadequate manual process," GenAI will likely deliver significant gains. If it's "optimized automated system," expect minimal returns.
- What friction exists in this workflow? Information asymmetry? Search difficulty? Communication barriers? GenAI excels at reducing these. No clear friction? Look elsewhere.
- Who struggles most with this workflow? Less skilled users, smaller teams, and new participants will benefit most. If everyone's already an expert, the marginal contribution is near zero.
- Can you measure the result? This platform tracked conversion rates, sales, and satisfaction. Without measurement, you're flying blind. The Google ad failure only surfaced because they measured it.
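If you want to run a backlog of candidate workflows through these four questions, a rough screening sketch might look like this. Everything here is hypothetical: the `Workflow` fields, the tier labels, and the decision rules are one illustrative way to encode the questions, not a method from the study.

```python
from dataclasses import dataclass

@dataclass
class Workflow:
    name: str
    baseline: str              # "none", "inadequate", or "optimized" (question 1)
    has_clear_friction: bool   # question 2
    users_mostly_novice: bool  # question 3
    measurable: bool           # question 4

def screen(w: Workflow) -> str:
    """Return a rough priority tier. The rules are illustrative assumptions."""
    if not w.measurable:
        return "instrument first"   # no measurement means no way to spot a failure
    if w.baseline == "optimized":
        return "low priority"       # GenAI may undercut existing tuning
    if w.has_clear_friction or w.users_mostly_novice:
        return "pilot first"        # weak baseline plus a clear gap to fill
    return "test cautiously"

# Example entries loosely modeled on the workflows discussed above
candidates = [
    Workflow("pre-sale chatbot", "none", True, True, True),
    Workflow("ad titles", "optimized", False, False, True),
]
for w in candidates:
    print(f"{w.name}: {screen(w)}")
```

Treat the output as a conversation starter for prioritization, not a verdict. The only real verdict comes from running the test.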
The Real Lesson (And a Reality Check)
GenAI isn't magic. It's a tool that reduces friction and fills capability gaps—when deployed in the right places.
The research makes the pattern crystal clear:
GenAI delivers maximum value when:
- Current processes have gaps or inadequacies
- Users face clear friction points
- Existing capabilities are weak or nonexistent
- You're expanding access to previously unavailable services
GenAI delivers minimal or negative value when:
- Current processes are already optimized
- Users are already expert-level
- Existing tools already solve the problem effectively
- You're replacing carefully tuned human optimization
This e-commerce platform spent six months rigorously testing seven workflows across millions of transactions. They found that four delivered meaningful returns and three didn't. That's a 57% success rate (4 of 7) for a sophisticated tech company with extensive AI resources and a rigorous implementation.
What's your success rate going to be if you're not systematically testing and measuring?
The companies winning with GenAI aren't deploying everywhere. They're deploying strategically—finding high-friction, low-capability workflows first, measuring rigorously, and scaling only what works.
Stop chasing uniform adoption. Start chasing marginal contribution.
Because somewhere between 0% and 16.3% is exactly where your GenAI ROI lives. And the only way to find it is to test, measure, and ask better questions about where GenAI actually adds value you can't get another way.
P.S. - Speaking of measurement: This is exactly why we built PRISM. Before you deploy GenAI to millions of users, shouldn't you test which models and prompts actually deliver the highest marginal contribution for your specific workflows? That Google ad title failure cost this platform real money because they didn't optimize the prompt for e-commerce conversion. We can help you avoid that.