Want to drive more secure GenAI? Try automating your red teaming

Although 55% of organizations are currently piloting or using a generative AI (GenAI) solution, securely deploying the technology remains a significant focus for cyber leaders. A recent ISMG poll of business and cybersecurity professionals revealed that some of the top concerns around GenAI implementation include data security or leakage of sensitive data, privacy, hallucinations, misuse and fraud, and model or output bias.

As organizations look for better ways to innovate responsibly with the latest advancements in artificial intelligence, red teaming is one way for security professionals and machine learning engineers to proactively uncover risks in their GenAI systems. Keep reading to learn how.

3 unique considerations when red-teaming GenAI

Red teaming AI systems is a complex, multistep process. At Microsoft, we leverage a dedicated interdisciplinary group of security, adversarial machine learning (ML), and responsible AI experts to map, measure, and minimize AI risks.

Over the past year, the Microsoft AI Red Team has proactively assessed several high-value GenAI systems and models before they were released to Microsoft customers. In doing so, we found that red-teaming GenAI systems differs from red-teaming classical AI systems or traditional software in three prominent ways:

GenAI red teams must simultaneously evaluate security and responsible AI risks: While red teaming traditional software or classical AI systems mainly focuses on identifying security failures, red teaming GenAI systems means identifying both security risks and responsible AI risks. Like security risks, responsible AI risks can vary widely, ranging from generating content with fairness issues to producing ungrounded or inaccurate content. AI red teams must simultaneously explore the potential risk space of security and responsible AI failures to provide a truly comprehensive evaluation of the technology.

GenAI red teaming is more probabilistic than traditional red teaming: GenAI systems have multiple layers of non-determinism. While executing the same attack path multiple times on a traditional software system would likely yield similar results, the same input can produce different outputs from an AI system. This variability can come from the app-specific logic; the GenAI model itself; the orchestrator, which controls the output of the system and can engage different extensibility features or plugins; and even the input itself (which tends to be natural language), where small variations can lead to different outputs. Unlike traditional software systems with well-defined APIs and parameters that can be examined using tools during red teaming, GenAI systems require a red teaming strategy that accounts for the probabilistic nature of their underlying elements (see the short sketch after these three considerations).

GenAI system architecture varies widely: From standalone applications to integrations in existing applications to the input and output modalities, such as text, audio, images, and video, GenAI system architectures vary widely. To surface just one type of risk (for example, violent content generation) in one modality of the application (for example, a browser chat interface), red teams need to try different strategies multiple times to gather evidence of potential failures. Doing this manually for all types of harm, across all modalities and strategies, can be exceedingly tedious and slow.
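To make the non-determinism point concrete, here is a minimal sketch, assuming the OpenAI Python client (v1+), an API key in the environment, and a placeholder model name: the same prompt, sent twice to a sampling-based chat model, can come back with noticeably different answers, which is one reason a single probe rarely proves or rules out a failure.

```python
# Minimal sketch: the same prompt sent twice to a sampling-based chat model
# can yield different outputs, one source of the non-determinism red teams
# must account for. Assumes the OpenAI Python client (v1+) and an API key
# in the environment; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()
prompt = "Summarize the main risks of deploying a GenAI chat assistant."

for attempt in range(2):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0.8,      # sampling makes outputs vary from run to run
    )
    print(f"--- Attempt {attempt + 1} ---")
    print(response.choices[0].message.content)
```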

Why automate GenAI red teaming?

When red-teaming GenAI, manual probing is a time-intensive but necessary part of identifying potential security blind spots. However, automation can help scale those efforts by handling routine tasks and flagging potentially risky areas that require more attention.

At Microsoft, we released the Python Risk Identification Tool for generative AI (PyRIT)—an open-access framework designed to help security researchers and ML engineers assess the robustness of their LLM endpoints against different harm categories such as fabrication/ungrounded content like hallucinations, misuse issues like machine bias, and prohibited content such as harassment.

PyRIT is battle-tested by the Microsoft AI Red Team. It started off as a set of one-off scripts as we began red teaming GenAI systems in 2022, and we’ve continued to evolve the library ever since. Today, PyRIT acts as an efficiency gain for the Microsoft AI Red Team, shining a light on risk hot spots so that security professionals can then explore them. This allows the security professional to retain control of the AI red team strategy and execution. PyRIT simply provides the automation code that takes the initial dataset of harmful prompts supplied by the security professional and uses the LLM endpoint to generate more harmful prompts. It can also change tactics based on the response from the GenAI system and generate the next input. This automation continues until PyRIT achieves the security professional’s intended goal.
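To make that loop concrete, here is a simplified, hypothetical sketch of the adaptive automation described above. It is not PyRIT’s actual API: send_to_target, score_response, and generate_adversarial_prompt are illustrative stubs standing in for the GenAI endpoint under test, a harm scorer (for example, a classifier or an LLM judge), and an attacker LLM that rewrites the prompt based on the target’s last reply.

```python
# Hypothetical sketch of the adaptive red-teaming loop described above;
# this is NOT PyRIT's actual API. The three functions below are stubs
# standing in for the GenAI endpoint under test, a harm scorer, and an
# attacker LLM that adapts tactics based on the target's last reply.
import random

MAX_TURNS = 5            # how many adaptive attempts per seed prompt
OBJECTIVE_THRESHOLD = 0.8  # score at which we consider the objective reached

def send_to_target(prompt: str) -> str:
    """Stub: call the GenAI system under test and return its reply."""
    return f"[target reply to: {prompt}]"

def score_response(response: str) -> float:
    """Stub: rate how close the reply is to the red teamer's objective (0-1)."""
    return random.random()

def generate_adversarial_prompt(previous: str, target_reply: str) -> str:
    """Stub: ask an attacker LLM for a new prompt, conditioned on the last reply."""
    return f"{previous} (rephrased after seeing: {target_reply[:40]}...)"

def red_team(seed_prompts: list[str]) -> list[dict]:
    """Run the adaptive loop for each seed prompt supplied by the red teamer."""
    findings = []
    for seed in seed_prompts:
        prompt = seed
        for _turn in range(MAX_TURNS):
            reply = send_to_target(prompt)
            score = score_response(reply)
            if score >= OBJECTIVE_THRESHOLD:
                # Objective reached: record the conversation for human review.
                findings.append({"seed": seed, "prompt": prompt, "reply": reply, "score": score})
                break
            # Objective not reached: change tactics and try again.
            prompt = generate_adversarial_prompt(prompt, reply)
    return findings

if __name__ == "__main__":
    print(red_team(["Seed prompt supplied by the security professional"]))
```

The key design point is that the seed prompts and the stopping objective stay in the hands of the security professional; the automation only explores and adapts within that brief, surfacing conversations worth a human’s attention.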

While automation is not a replacement for manual red team probing, it can help augment an AI red teamer’s existing domain expertise and offload some of the tedious tasks for them. To learn more about the latest emergent security trends, visit Microsoft Security Insider.
