At what seems like every other conference on mid-market business and digitalization, the same format is being offered: a one-day workshop that ends with a list of tasks the company could automate with AI. It's a format that sells well. It provides an overview, it can be presented, it creates the feeling that something has been set in motion. What it rarely provides is clarification of what actually happens when one of the selected tasks goes wrong in practice.
But this is precisely the decisive question. Because the suitability of a task for AI cannot be derived from its properties—not from whether it's repetitive or creative, rule-based or judgment-dependent. It can only be derived from the costs incurred by its possible errors. And these costs are practically absent from the standard classifications that circulate in workshops and guides.
A clarification upfront: AI in companies is far more than just automation. If you look around, you'll find several conceptually different application areas—alongside automation, also insight generation from your own data, decision support, product innovation, and creative generation work. Each of these areas has its own logic, its own success metrics, its own risk profile (I've developed a more detailed overview in a supplementary article).
This text addresses the first and most prominent area: automation. Not because it would be the most valuable—in fact, the greater leverage probably lies elsewhere—but because it's the one where the risk question is most frequently suppressed. Those automating tasks think first about efficiency. This article starts precisely at that point.
What Standard Schemas Achieve—and Where They Fail
If you search the relevant consulting literature, you'll find essentially four classification schemas used to sort tasks by AI suitability.
The first is the distinction between repetitive and creative tasks. It stems from the early 2010s, when automation primarily meant robotics in industry. With today's generative AI systems, it has become largely worthless. Modern language models perform strongest precisely in the domains that were once considered creative and thus not automatable: texts, images, code, arguments. If you classify using this heuristic today, you end up with a list of the wrong tasks.
The second schema distinguishes between rule-based and judgment-based tasks. It's somewhat more robust, but the boundary blurs here too. Language models make judgments about tone, plausibility, and context that couldn't be encoded in classical expert systems. They don't do this reliably, but they do it. The dividing line between "machines for rules, humans for judgments" has become porous.
The third schema asks about structured versus unstructured data. That's a technical input question, not a task classification. It says something about tool requirements, but nothing about whether a task can reasonably be automated.
Finally, the fourth schema sorts by value: high versus low ROI expectations. This is the most common form in consulting presentations because it visualizes well. But there's a problem: it says nothing about why a task should be suitable. It sorts by wishful thinking, not by realistic criteria.
All four schemas share the same blind spot: they examine the task, not the failure case. They ask what AI can do—not what happens when it does it wrong. But that's precisely where the actual leverage lies.
First Dimension: Who Bears the Error?
The first and most important question for any planned AI application is: If the system makes a mistake, who faces the consequences?
It sounds trivial, but it's not. Most AI use cases can be sorted along this axis into two groups—and the two groups behave fundamentally differently.
In the first group, the error stays in-house. The AI makes a suggestion, summarizes something, proposes a classification—and a human checks the result before the information is processed further. If the AI hallucinates or picks the wrong category, it's annoying, but it's internally annoying. Examples: internal email summaries, first drafts for texts, pre-classification of incoming documents to prepare for human processing, research support in creating reports. In all these cases, AI is a preprocessor for human work. Errors are caught in normal workflow.
In the second group, the error leaves the house. The AI generates a response that goes directly to a customer. It calculates a price that is automatically adopted. It makes a decision with legal consequences. Here an internal annoyance becomes an external risk—reputational damage, public legal consequences, financial loss, and in the worst case, harm to people who never consented to being judged by the AI.
What this first dimension clarifies is not whether a task can be automated, but what degree of human oversight is necessary. A task in the first group can be relatively unproblematically supported with modern AI, often with significant time savings. A task in the second group needs either a human review step between AI output and external recipient—or it shouldn't be automated, at least not now.
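As a rough illustration of the review step described above, here is a minimal Python sketch. The names `AIOutput`, `Route`, and `route` are hypothetical and chosen for this example only; the sketch assumes that each AI output can be tagged with whether it leaves the company.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    DELIVER = "deliver"            # internal use; errors are caught in the normal workflow
    HUMAN_REVIEW = "human_review"  # held until a person approves it


@dataclass(frozen=True)
class AIOutput:
    task: str
    leaves_company: bool  # assumption: True if a customer or other external party receives it


def route(output: AIOutput) -> Route:
    """Hold anything bound for an external recipient for human review."""
    return Route.HUMAN_REVIEW if output.leaves_company else Route.DELIVER


outputs = [
    AIOutput("internal email summary", leaves_company=False),
    AIOutput("customer reply", leaves_company=True),
]
routes = {o.task: route(o) for o in outputs}
```

The point of the sketch is only that the gate depends on where the error would land, not on how capable the model is at the task.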
Remarkably, this distinction is rarely discussed in strategy workshops. Instead, the logic of end-to-end automation dominates: the less human in the process, the more successful the implementation. This logic completely ignores the fact that the human in the process is often not an efficiency problem, but a risk management element.
Second Dimension: Is the Error Visible?
The second dimension is more subtle—and precisely for that reason, more dangerous.
When an AI generates text containing incorrect information about a described fact, the error becomes visible upon reading. Someone pauses, researches, corrects. The error leaves a trace you can grab.
When an AI classifies a pile of 5,000 customer inquiries, however, and mistakenly marks 47 as "not critical," the error is virtually invisible. No one sees what the AI overlooked. They only see what it processed. The overlooked cases turn into complaints, lost customers, and escalations, but by then no one connects them to the AI decision: too much time has passed, and the original filtering process can no longer be reconstructed.
That's the real problem with classification, filtering, and prioritization tasks: their errors are structurally invisible. An AI that sorts a thousand documents daily sorts seemingly successfully even if it consistently mistreats the same type of cases. As long as no one spot-checks rejections or classifications, the error accumulates quietly.
The same applies to AI-assisted search and research. When an employee asks a question and receives a summarized answer, they see what the AI says. They don't see what the AI didn't find: neither the relevant sources that were missed nor the contradictory clues that were filtered out. The invisibility lies not in the result, but in the selection that preceded the result.
The practical consequence is significant: processes whose output can be directly checked (texts, suggestions, translations) are robust against errors because a correction loop exists. Tasks whose output is part of a larger pipeline (classifications, filters, prioritizations) are fragile—not because the error rate is higher, but because the correction system fails.
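One way to give classification and filtering tasks a correction loop is the spot check mentioned earlier: regularly re-reviewing a random sample of what the AI sorted out. The following Python sketch shows the idea under stated assumptions. The function names, the sample size of 200, and the human verdicts are all hypothetical and serve only as illustration.

```python
import random


def spot_check_sample(items, sample_size, seed=None):
    """Draw a uniform random sample of AI-classified items for human re-review."""
    rng = random.Random(seed)
    return rng.sample(items, min(sample_size, len(items)))


def estimated_miss_rate(verdicts):
    """Share of sampled 'not critical' items that a human judged critical after all."""
    if not verdicts:
        return 0.0
    return sum(1 for was_critical in verdicts if was_critical) / len(verdicts)


# Hypothetical IDs of inquiries the AI marked "not critical"
not_critical_ids = list(range(5000))
sample = spot_check_sample(not_critical_ids, sample_size=200, seed=42)

# Hypothetical human verdicts for the sample: True means "was critical after all"
verdicts = [False] * 198 + [True] * 2
rate = estimated_miss_rate(verdicts)
```

A sample like this does not find every overlooked case, but it makes a systematic error show up as a number someone can track over time.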
In most standard classifications, both task types are treated the same. This is one of the central blind spots in common AI introduction recommendations.
Third Dimension: Is the Error Reversible?
The third dimension asks about the temporal structure of the error: can what the AI did be undone, or is it "too late"?
Two simple examples here too. An AI suggestion for a newsletter subject line: completely reversible. If the suggestion is poor, it's discarded. An AI-generated offer automatically sent to a customer: not reversible. It was sent; the customer already has it and has seen it. The impression has already left its mark.
Reversibility is a safety net that is almost never systematically mentioned in standard guides—although for a technology whose error rate is not zero and will not be zero in the foreseeable future, it may be the most important selection criterion. An AI is allowed to make mistakes as long as those mistakes can be corrected without collateral damage. Once that's no longer the case, an efficiency tool becomes a risk factor.
There's another aspect: speed reduces reversibility. A task that must be decided quickly leaves less room for correction. That's precisely why real-time-driven application areas—chatbots in customer service, automatic replies, live translation in meetings—are particularly tricky. They often combine all three risky properties: external impact, low visibility of errors (because no one reads what the bot replied anymore), and no way to recall.
Whoever plans an AI project should therefore ask: how long is the corridor between AI output and actual impact? The longer this corridor, the safer the application. The shorter it is, the more carefully you must check beforehand whether the task should really be automated.
What Emerges from the Three Dimensions
If you take the three dimensions seriously—responsibility for the error, visibility of the error, reversibility of the error—a different introduction sequence emerges than what is recommended in most strategy workshops.
The usual recommendation is: high impact first. Which task saves the most money, which creates the greatest competitive advantage, which use case works as a flagship? This logic is understandable—it's been standard in innovation management for decades—but it ignores that high impact usually also means high exposure. The most spectacular application areas are often precisely those with external impact, low visibility, and low reversibility.
The recommendation that emerges from the three dimensions is the reverse: high reversibility first. Start with tasks where errors are harmless. Not because they'd be spectacular, but because they enable a real learning curve without collateral damage. Concretely, that means:
- internal preprocessors rather than external endpoints,
- suggestions to employees rather than automatic actions,
- asynchronous tasks rather than real-time applications,
- limited exposure rather than mass rollout.
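The "high reversibility first" ordering can be sketched as a simple sort over the three dimensions. This is a minimal Python illustration, not a scoring method from any standard: the `Task` fields, the yes/no simplification, and the ratings assigned to the three example tasks are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    error_stays_internal: bool  # first dimension: who bears the error
    error_visible: bool         # second dimension: is the error noticed
    error_reversible: bool      # third dimension: can it be undone


def rollout_order(tasks):
    """Sort tasks so the safest come first: reversible, then internal, then visible."""
    return sorted(
        tasks,
        key=lambda t: (t.error_reversible, t.error_stays_internal, t.error_visible),
        reverse=True,
    )


# Ratings below are illustrative assumptions, not measurements.
tasks = [
    Task("customer service chatbot", False, False, False),
    Task("newsletter subject line suggestions", True, True, True),
    Task("pre-classification of incoming documents", True, False, True),
]
ordered = [t.name for t in rollout_order(tasks)]
```

In practice each dimension is a matter of degree rather than a yes/no answer, so a real assessment would need finer scales and a case-by-case look.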
It may sound like an unspectacular approach, because it leads to pilot projects that don't make it into the executive quarterly report. It doesn't provide material for press releases and seems rather unambitious. But it has one property that is dramatically underestimated in the current AI wave: it allows an organization to build experience at low risk. Those who start with high-reversibility tasks learn how the technology actually behaves in their own organizational reality before moving on to application areas where errors become expensive.
There's a second effect often overlooked. Tasks in the "safe zone"—internal impact, visible results, high reversibility—are the ones where employees are most willing to accept the AI as a tool. They experience the benefit without being at the system's mercy. Tasks in the "risk zone" are often precisely those where resistance arises: employees sense that they're being held responsible for decisions they didn't make themselves. An AI strategy that ignores this psychological dimension doesn't fail at the tool level, but at the organizational level.
Classification Is Not Strategy
The question of what AI can take over in a company is almost always treated in practice as a technical question. In truth, it is a question of responsibility architecture. Those who have clarity there can implement pragmatically: gradually, without organizational disruption, with measurable benefit. Those lacking that clarity build up risks that only emerge months later and often can no longer be traced back to the original decision.
The three dimensions proposed here don't replace careful examination of the individual case. But they shift the focus from the question of what can be automated to the question of what may be automated without the organization taking on risks it doesn't understand. That's a different question. And in my view, it's the question every corporate AI strategy should begin with, not end with.
The real bottleneck in introducing AI in mid-market companies is not the technology. It's available, it works, it's remarkably cheap. The bottleneck is understanding where errors can occur, who bears their consequences, and how they remain reversible. These questions aren't spectacular, but their answers determine whether an AI initiative becomes a lasting productivity gain—or an expensive pause before the next reset.
