
Google Scientist Uses ChatGPT 4 to Trick AI Guardian

By Krishi Chowdhary, Journalist


Nicholas Carlini, a research scientist at Google's AI research arm DeepMind, demonstrated that OpenAI's large language model (LLM) GPT-4 can be used to break through safeguards built around other machine learning models.

In his research paper titled "A LLM Assisted Exploitation of AI-Guardian," Carlini explained how he directed GPT-4 to devise an attack against AI-Guardian.

What Is AI-Guardian and How Can GPT-4 Defeat It?

AI-Guardian is a defense mechanism built to protect machine learning models against adversarial attacks. It works by identifying and blocking inputs that contain suspicious artifacts.

Adversarial examples include text prompts designed to make text-based machine learning models say things they aren't supposed to, a practice commonly referred to as jailbreaking.

"The idea of AI-Guardian is quite simple, using an injected backdoor to defeat adversarial attacks; the former suppresses the latter based on our findings."
– Shengzhi Zhang, co-author of AI-Guardian

In the case of image classifiers, an adversarial example might be a stop sign with extra graphic elements added to it, designed to confuse self-driving cars.

AI-Guardian's authors have acknowledged Carlini's success at breaking through the defense.

To bypass the defense, an attacker must first identify the mask AI-Guardian uses to detect adversarial examples. This is done by providing the defended model with pairs of pictures that differ by only a single pixel and observing which changes affect its output.

Once this brute-force probing reveals the backdoor trigger function, adversarial examples can be devised that circumvent it.
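As an illustration only (this is not Carlini's actual code), the single-pixel probing idea can be sketched as follows; `probe_mask`, `toy_classifier`, and the 4x4 image are hypothetical stand-ins for the real defended model:

```python
import numpy as np

def probe_mask(classifier, base_image):
    """Estimate which pixels a backdoor-based defense overwrites with
    its trigger. Flipping a pixel inside the secret mask has no effect
    on the defended model's output, so pixels whose flip leaves the
    prediction unchanged are candidates for the mask."""
    mask_estimate = np.ones(base_image.shape, dtype=bool)
    base_pred = classifier(base_image)
    for y in range(base_image.shape[0]):
        for x in range(base_image.shape[1]):
            probe = base_image.copy()
            probe[y, x] = 1.0 - probe[y, x]  # change a single pixel
            if classifier(probe) != base_pred:
                mask_estimate[y, x] = False  # pixel influences the output
    return mask_estimate

def toy_classifier(img):
    """Stand-in for a defended model: it ignores the top-left 2x2 block,
    as if a trigger pattern were pasted there before classification."""
    visible = img.copy()
    visible[:2, :2] = 0.0
    return int(visible.sum() % 2)

estimate = probe_mask(toy_classifier, np.zeros((4, 4)))
# estimate is True exactly where the toy "mask" sits (top-left 2x2 block)
```

In this toy setup, probing recovers the masked region because those are the only pixels whose value never influences the prediction.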

Carlini's research paper demonstrates that GPT-4 can be used as a research assistant to evade AI-Guardian's defenses and trick a machine learning model. His experiment involved fooling image classifiers by tweaking images.

When directed through prompts, OpenAI's large language model can generate scripts and descriptions to tweak images, deceiving a classifier without triggering AI-Guardian's defense mechanism.

For instance, the classifier could be provided with a picture of a person holding a gun and made to think that the person is holding an apple.

AI-Guardian is supposed to detect tweaked images that have likely been altered to trick the classifier. This is where GPT-4 comes into play: it helps exploit AI-Guardian's vulnerabilities by generating the necessary scripts and descriptions.

The research paper also includes the Python code suggested by GPT-4, which allowed Carlini to carry out the attack successfully and overcome AI-Guardian's defenses.

"Our attacks reduce the robustness of AI-Guardian from a claimed 98 percent to just 8 percent under the threat model studied by the original [AI-Guardian] paper."
– Nicholas Carlini

The Evolution of AI Security Research

Carlini's experiment marks a notable milestone in AI security research by demonstrating how LLMs can be leveraged to uncover vulnerabilities. As AI models continue to evolve, LLM-assisted techniques like this could reshape cybersecurity and inspire new ways to defend against adversarial attacks.

However, Zhang pointed out that while Carlini's approach worked against the prototype system described in the research paper, it has several caveats that limit its effectiveness in real-world scenarios.

For instance, it requires access to the confidence vector from the defense model, which isn’t always available. He and his colleagues have also developed a new prototype with a more complex triggering mechanism that isn’t vulnerable to Carlini’s approach.

However, one thing is certain – AI will continue to play a definitive role in cybersecurity research.

