
A Taxonomy of AI Risk

Understand the three major categories of AI risk: malicious use by attackers, unintentional malfunctions including reliability and alignment failures, and systemic risks from competitive and structural pressures. Learn to classify AI dangers and grasp their impact on building safer AI systems.

In the previous lesson, we established our foundational map of terms. We drew a clear, critical line between:

  • AI safety (stopping unintentional accidents)  

  • AI security (stopping intentional attackers)

That distinction concerned the source of risk.

We now turn to mapping the risks themselves. Knowing the type of harm we face is just as important as knowing its source. For example, the risk of an AI system being used to generate fake political robocalls differs substantially from the risk of a self-driving car’s sensor failing in fog.

Both, in turn, differ from the risk of an AI race, in which companies feel pressured to cut safety corners in order to be first to market. To organize our analysis, we adopt a three-category taxonomy. This framework is widely used by leading safety researchers and major international reports, including the 2025 International AI Safety Report.

We will classify all AI risks into one of these three buckets:

[Figure: The taxonomy of AI risk]

Let's briefly define each one before we dive deep.

  1. Malicious use: This is our AI security problem. A human intentionally uses an AI system as a tool or weapon to cause harm. Think of an attacker using an LLM to help write malware or generate mass-scale disinformation.  

  2. Malfunctions: This is our core AI safety problem. The AI causes harm unintentionally, without a malicious user. This is an accident or failure. This category spans a massive spectrum: from present-day reliability issues (like a chatbot giving biased medical advice) to theoretical, future-facing risks (like a superintelligent system evading human control). While these seem different, they share a root cause: the system acting in ways we didn't intend.

  3. Systemic and structural risk: This is the zoomed-out view. The risk doesn't come from a single bad model or a single bad actor. It emerges from the context and structure of AI development itself. The classic example is a competitive "AI race" that pressures companies to skip safety checks and deploy dangerously immature systems.
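
To make the classification concrete, here is a minimal sketch (in Python) of the taxonomy as a simple data structure. The category names and example scenarios are our own illustrations, not taken from any report:

```python
from enum import Enum, auto

class RiskCategory(Enum):
    """The three top-level buckets used throughout this course."""
    MALICIOUS_USE = auto()  # a human intentionally weaponizes the AI
    MALFUNCTION = auto()    # the AI causes harm unintentionally (an accident)
    SYSTEMIC = auto()       # harm emerges from the structure and incentives around AI

# Illustrative scenarios tagged with the bucket they best fit.
EXAMPLES = {
    "An attacker uses an LLM to help write malware": RiskCategory.MALICIOUS_USE,
    "A chatbot confidently gives incorrect medical advice": RiskCategory.MALFUNCTION,
    "Labs skip safety testing to win a race to market": RiskCategory.SYSTEMIC,
}

for scenario, category in EXAMPLES.items():
    print(f"{category.name:>13}: {scenario}")
```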

Let's dive deep into the first category.

Risks from malicious use

This category is our AI security problem.

This is the most straightforward risk to understand. It’s not about the AI “going rogue”; it’s about a malicious human (an attacker, a criminal, or a hostile state) intentionally picking up a powerful AI and using it as a tool or a weapon to cause harm.

Why cover this in an AI safety course? Because safety and security often overlap. A safety flaw (like a model that is easily tricked) can become a security vulnerability (allowing attackers to bypass guardrails). Understanding the attacker's mindset helps us build more robust safety measures.

[Figure: Intentional harm]

The AI is working exactly as designed (following instructions), but the human directing it has malicious intent.

Leading AI safety reports, like the International AI Safety Report, break this risk into several key areas. Let's look at the four most significant ones.

We likely won't build these defenses directly (that's for the security team). However, understanding this threat landscape is critical because safety failures often cascade into security vulnerabilities. A brittle model (safety issue) is easier to jailbreak (security issue).

Harm to individuals through fake content

This is a harm that is already well-established. Malicious actors can use generative AI to create fake text, images, and audio to harm specific people. This includes:

  • Scams and fraud: Using AI-generated “voice clones” to impersonate a family member in distress to ask for money.

In 2023, multiple grandparent scams were reported (https://www.fcc.gov/consumers/scam-alert/grandparent-scams-get-more-sophisticated) in which attackers used AI voice cloning to mimic a grandchild’s distress, convincing victims to transfer money immediately.

  • Extortion: Generating non-consensual intimate imagery (NCII), also known as “deepfakes,” of a person and then threatening to release it.

  • Reputational sabotage: Creating fake images or audio of a political candidate or business executive saying or doing something compromising.  

Manipulation of public opinion

This is a societal-scale version of the harm above. The risk is that AI makes it dramatically cheaper and easier to generate persuasive, micro-targeted content at a massive scale. A malicious actor could use this capability in several ways:

  • Automated disinformation: Spreading disinformation to influence an election.

  • Institutional erosion: Eroding public trust in institutions (like science or the media) by flooding social media with plausible-sounding fake content.

In January 2024, a deepfake audio recording simulating President Biden’s voice (https://www.nbcnews.com/tech/misinformation/joe-biden-new-hampshire-robocall-fake-voice-deep-ai-primary-rcna135120) was used in robocalls to discourage New Hampshire voters from participating in the primary election.

Cyber offence (hacking)

This issue is a major concern for engineers. AI can act as a powerful force multiplier for attackers, making it easier and faster for malicious actors to conduct cyberattacks.

This includes:

  • Vulnerability research: Using an LLM to scan millions of lines of code to find new, previously unknown (or “zero-day”) vulnerabilities.

In 2024, researchers demonstrated that autonomous LLM agents could successfully exploit real-world 'one-day' vulnerabilities in websites without human intervention (Fang, Richard, Rohan Bindu, Akul Gupta, and Daniel Kang. "LLM Agents Can Autonomously Exploit One-Day Vulnerabilities." arXiv preprint arXiv:2404.08144, 2024).

  • Exploit generation: Assisting a lower-skilled attacker in writing the specific code (the “exploit”) needed to take advantage of a vulnerability.

  • Social engineering: Automating highly personalized spear-phishing campaigns at scale.  

Biological and chemical attacks

This is one of the most severe risks in this category. Much scientific knowledge is “dual-use”: it can be used to create both life-saving medicines and dangerous weapons. The risk is that a powerful AI, trained on scientific data, could lower the barrier for creating these weapons. This includes:

  • Lowering barriers to entry: Helping an attacker design novel toxic compounds or proteins.  

  • Dual-use knowledge: Providing clear, step-by-step instructions for reproducing known biological threats or chemical weapons, troubleshooting the process along the way.

In a famous 2022 experiment, researchers repurposed a drug-discovery AI (originally designed to cure diseases) to generate designs for 40,000 toxic chemical warfare agents, including VX nerve gas, in less than six hours (https://www.theverge.com/2022/3/17/22983197/ai-new-possible-chemical-weapons-generative-models-vx).

Recent evaluations of advanced models have shown they are becoming more capable in these scientific domains, leading some AI labs to increase their formal assessment of this risk.  

Malicious use is about AI being weaponized.

Now, we move to the second major category on our risk map. This one is the true heart of our course.

Risks from malfunctions (accidents)

This is our core AI safety problem.

If Malicious use is about an attacker, Malfunctions are about accidents. This is harm that happens unintentionally, without any bad actor. The system simply fails to operate as it was designed, or the design itself was flawed.

[Figure: Examples of malfunctions (unintentional failures)]

The unintentional nature of these harms is what makes them particularly challenging to address. Rather than merely blocking a malicious actor, it is necessary to design the system itself to be inherently safe, reliable, and trustworthy.

One useful way to understand system malfunctions is to distinguish between two categories: harms that are occurring today and more advanced harms that researchers anticipate in the future.

Present-day malfunctions

These are risks from systems currently in deployment. They are reliability and fairness problems.

  • Reliability issues: This is when the AI is confidently wrong. The most famous example is hallucinations, where an LLM generates falsehoods. This is a serious malfunction. The International AI Safety Report, for example, notes the risk of a user consulting an AI for medical or legal advice and the system generating an answer that is dangerously incorrect.  

  • Bias: This is when the AI amplifies our own worst tendencies. AI models are trained on vast amounts of human-generated text and images, so they learn our existing social and political biases. This malfunction can lead to real-world discriminatory outcomes, such as a hiring tool that is biased against certain groups or a loan application model that unfairly denies a specific demographic.

    • This is precisely where fairness, a component of AI safety, comes in: algorithmic bias is a form of unintentional harm.

  • Poor strategic judgment: Unlike a hallucination (a factual error regarding data), this is a failure of reasoning regarding consequences. A trading bot might execute a valid trade that mathematically optimizes its immediate profit function but fails to predict that this trade will crash the market and destroy its own long-term value.
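
To illustrate the trading-bot example above, here is a toy sketch of poor strategic judgment: a hypothetical agent that optimizes only its immediate objective versus one that also accounts for downstream consequences. The action names and numbers are invented for illustration:

```python
# Toy illustration of "poor strategic judgment". All values are made up.
actions = {
    # action: (immediate_profit, delayed_market_impact)
    "small_trade": (1.0, 0.0),
    "dump_position": (5.0, -20.0),  # large profit now, crashes the market later
}

def myopic_choice(actions):
    """Maximize immediate profit only (what the flawed bot does)."""
    return max(actions, key=lambda a: actions[a][0])

def consequence_aware_choice(actions):
    """Maximize the total outcome, immediate plus delayed."""
    return max(actions, key=lambda a: sum(actions[a]))

print("Myopic agent picks:           ", myopic_choice(actions))             # dump_position
print("Consequence-aware agent picks:", consequence_aware_choice(actions))  # small_trade
```

Both agents execute "valid" trades; the difference is whether the objective they optimize includes the consequences we actually care about.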

Future-facing malfunctions (The alignment problem)

This brings us to the second, more advanced category of malfunction: loss of control.  

This is the hypothetical, high-stakes scenario often called the rogue AI risk. Loss of control refers to scenarios in which an AI system's behavior can no longer be corrected, constrained, or shut down by human operators. It is the ultimate consequence of severe misalignment.

It is critical to be clear here: this does not mean the AI becomes evil or conscious, as in a movie. Rather, it means that a future, highly capable AI could pursue its programmed goal in an unintended way that is harmful and potentially catastrophic.

  • A classic analogy: You tell a future super-capable AI, “Your only goal is to maximize paperclip production.” The AI takes this literal instruction and, in its single-minded pursuit, converts all of Earth's resources (including humans) into paperclips.

  • The AI didn't become malicious. It did exactly what you told it to do. This wasn't a malfunction in the traditional sense (like a crash); it was a specification failure. We failed to specify the full, complex range of human values (like “...and don't harm anyone, or destroy the planet...”).
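
The paperclip story is really about the gap between the objective we literally specify and the objective we intend. Below is a toy sketch of that gap, with invented plans, numbers, and a deliberately crude "harm penalty":

```python
# Toy "specification failure" sketch. Plans, numbers, and the harm penalty
# are invented for illustration; real objective specification is far subtler.
plans = {
    # plan: (paperclips_produced, harm_caused)
    "run_factory_normally": (1_000, 0),
    "convert_everything_to_paperclips": (10**9, 10**9),  # maximizes the literal goal
}

def literal_objective(plan):
    """What we literally asked for: paperclip count, nothing else."""
    paperclips, _harm = plans[plan]
    return paperclips

def intended_objective(plan, harm_weight=1_000):
    """What we actually wanted: paperclips, heavily penalized by harm."""
    paperclips, harm = plans[plan]
    return paperclips - harm_weight * harm

print("Literal objective prefers: ", max(plans, key=literal_objective))
print("Intended objective prefers:", max(plans, key=intended_objective))
```

The failure is not in the optimizer; it is in the objective we handed it.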

The International AI Safety Report (Bengio, Yoshua, Sören Mindermann, Daniel Privitera, Tamay Besiroglu, Rishi Bommasani, Stephen Casper, Yejin Choi, et al. "International AI Safety Report." arXiv preprint arXiv:2501.17805, 2025) notes that there is broad consensus that current AI models lack the capabilities (like long-term planning or self-preservation) to pose this risk. However, there is significant debate among experts regarding the likelihood of this risk as AI systems become more powerful.

This loss of control scenario represents the central challenge of AI safety: the alignment problem.

Risks from systemic and structural factors

These are emergent, system-level risks that arise from the context in which AI is built and used. These risks are not purely technical; they are sociotechnical. This means that the risks emerge from the interaction between the technology and complex social systems, and therefore cannot be resolved by fixing the model’s code alone.

[Figure: Systemic risk driven by competitive pressures]

Competitive dynamics (The AI race)

This is the most-discussed structural risk. The pressure to be the first to market with a more powerful model is immense. This creates a high-stakes, competitive race that can incentivize organizations to:  

  • Cut corners on safety: Rushing development and skipping thorough testing.

  • Deploy immature systems: Releasing models before their risks are fully understood, just to stay ahead. This is often described as a "winner-takes-all" dynamic: when the perceived penalty for coming second is so high, the incentive to prioritize speed over safety becomes a powerful structural risk.

Why this matters to you: You might wonder why we are covering market dynamics. The answer is that structural pressures shape the context in which you make safety decisions. Understanding why your organization might pressure you to skip a safety test helps you push back effectively.

Market concentration

A related risk, also highlighted in the International AI Safety Report, is that the market for building the most powerful general-purpose AI is dominated by a very small number of companies. This is because the cost of training these models is enormous, creating a substantial barrier to entry.

This concentration can create:  

  • Single points of failure: If our entire economy, from finance to healthcare, becomes dependent on just one or two AI models, a bug, malfunction, or security breach in that one system could cause a simultaneous, cascading failure across all of society.  

  • Dependency: This can also create a global AI divide, where countries without the resources to build their own models become dependent on the few nations that can.

Large-scale societal harms

This category includes broad risks to society from the widespread use of AI. The International AI Safety Report (Bengio et al. 2025, arXiv:2501.17805) identifies several, including:

  • Labor market disruption: The risk that rapid, widespread automation of cognitive tasks could disrupt the labor market faster than societies can adapt, potentially increasing inequality.

  • Environmental risks: The energy and water consumption required to train these massive models is a significant and growing environmental concern.

Question: Classify the following scenarios as Malicious Use, Malfunction, or Systemic Risk.

  • Scenario 1: A home assistant robot creates a fire hazard by covering a heating vent with a rug because its goal was simply 'Tidy the room'.

  • Scenario 2: Political activists use a jailbroken LLM to automatically generate and send thousands of unique, localized emails to flood a senator's inbox.

  • Scenario 3: The widespread use of AI-generated content floods the internet, making it impossible to find high-quality human data to train future models.


Summary

We now have a working three-part risk map that we will use for the rest of this course.

When we encounter a potential AI risk, we can now classify it:

  1. Is this a malicious use problem? (An attacker weaponizing AI).

  2. Is this a malfunction problem? (The AI accidentally causing harm).

  3. Is this a systemic risk problem? (The context of AI development itself is creating the risk).

With this map, we can now focus our attention. Systemic risks are often policy problems. Malicious use is an AI security problem.

Our course is an AI safety course, so we will focus on the most complex and technical challenge of all: Malfunctions.

To do that, we must go deeper than just saying "the AI had an accident." We need to understand why. This leads us directly to the central technical puzzle of our field.

In our next lesson, we will begin to deconstruct the ultimate malfunction: The Alignment problem.