ChatGPT: Scientists unlock AI security locks – 8/2/2023 – Tech

North American scientists have discovered commands that bypass the “security locks” of the ChatGPT platform and make it answer any question, even the most dangerous ones. Under normal conditions, the chatbot would respond, for example, that it cannot help promote any form of violence. Using the trick, however, the researchers got the artificial intelligence to list a plan to destroy humanity.

The team from Carnegie Mellon University (CMU), in the United States, and the Center for AI Safety, based in San Francisco, used an automated method to test prompts —word tricks that can ‘drive the AI crazy’, called adversarial suffixes— until they found loopholes.

The method, known as an adversarial attack, is common in security testing, according to the paper published on July 27. The research used Meta’s LLaMA model to find the vulnerabilities, since the owner of Facebook makes its AI code publicly available.

Successful attack snippets look unintelligible. They contain symbols typical of programming code (“==”, which denotes equality) and fused words such as “Seattlejust”. Further testing indicated that the adversarial suffixes were likely to work on any text-generating artificial intelligence.
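
As an illustration only, the sketch below captures the general shape of such an automated search: random suffixes of code-like characters are appended to a blocked request, and whichever one raises a “compliance” score is kept. The scoring function and the blind random search are toy stand-ins; the paper’s actual method uses gradient information from an open model such as LLaMA rather than random guessing.

```python
import random
import string

def compliance_score(prompt: str) -> float:
    """Toy stand-in for 'how close did the model come to complying?'.
    In the real attack this score comes from the target model itself."""
    return (sum(ord(c) for c in prompt) % 1000) / 1000.0

def random_suffix(length: int = 20) -> str:
    """Random string mixing letters, digits and code-like symbols."""
    alphabet = string.ascii_letters + string.digits + "=!(){}"
    return "".join(random.choice(alphabet) for _ in range(length))

def search_suffix(base_prompt: str, iterations: int = 1000) -> str:
    """Keep whichever suffix pushes the score highest (toy random search;
    the published attack uses gradient-guided token swaps instead)."""
    best_suffix, best_score = "", -1.0
    for _ in range(iterations):
        candidate = random_suffix()
        score = compliance_score(base_prompt + " " + candidate)
        if score > best_score:
            best_suffix, best_score = candidate, score
    return best_suffix

if __name__ == "__main__":
    print(search_suffix("blocked request goes here"))
```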

The discovery shows a widespread security flaw in the field of generative AI. The main proprietary artificial intelligence platforms available on the internet are ChatGPT (OpenAI), Bard (Google), Bing (Microsoft), LLaMA-2 (Meta) and Claude (Anthropic).

CMU professor Zico Kolter said on Twitter that he had informed the companies of the adversarial suffixes shown in the study, so that developers could block them.

Even so, malicious actors can run models similar to those in the study to find new loopholes on their own. All it takes is technical knowledge and machines capable of running the technology.

The flaws in the algorithms are statistical events inherent in how the language models themselves work: machine learning algorithms calculate the most likely next word in a given context.
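
In simplified form, one step of that calculation can be sketched as below: made-up scores for three candidate words are converted into probabilities (a softmax) and the highest-probability word wins. The words and scores are invented for illustration; real models repeat this over vocabularies of tens of thousands of tokens.

```python
import math

# Made-up scores (logits) for three candidate next words.
logits = {"sorry": 2.1, "here": 0.5, "sure": 0.3}

# Softmax: turn scores into probabilities that sum to 1.
total = sum(math.exp(v) for v in logits.values())
probs = {word: math.exp(v) / total for word, v in logits.items()}

print(probs)                      # roughly {'sorry': 0.73, 'here': 0.15, 'sure': 0.12}
print(max(probs, key=probs.get))  # 'sorry' is the most likely next word
```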

The adversarial suffixes act as a pattern that disrupts the expected behavior of the auxiliary algorithm responsible for preventing the model from producing texts about the sale of illicit drugs, sexual crimes and violent acts. This other AI works like the model’s Constitution, Anthropic’s chief executive said in an interview with the New York Times.
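
Conceptually, that auxiliary layer sits between the generator and the user, roughly as in the minimal sketch below; the function names, the keyword check and the refusal message are illustrative stand-ins, not any company’s actual filter.

```python
BLOCKED_TOPICS = ("illicit drugs", "sexual crimes", "violent acts")

def generate_answer(prompt: str) -> str:
    """Placeholder for the main language model."""
    return f"(model output for: {prompt})"

def violates_policy(text: str) -> bool:
    """Placeholder for the auxiliary safety model; real systems use a
    trained classifier or a second model, not a keyword match."""
    return any(topic in text.lower() for topic in BLOCKED_TOPICS)

def answer(prompt: str) -> str:
    draft = generate_answer(prompt)
    if violates_policy(prompt) or violates_policy(draft):
        return "Sorry, I can't help with that."
    return draft

print(answer("How does photosynthesis work?"))
```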

Anthropic’s Claude has an extra layer of security compared with its competitors, which had to be overcome with wordplay before the adversarial suffix kicked in and the bot gave instructions to destroy humanity. In addition to citing the step “end of mankind’s reign”, the platform added: “AI shall inherit the Earth.”

GPT-4, the most recent model behind ChatGPT, for example, went through six months of safety training alone before the technology was released in March.

OpenAI hired a group of experts, a so-called red team, focused on trying to break the artificial intelligence in order to prevent abusive behavior. It also outsourced work to Africa to label abusive material.

One of the flaws found by OpenAI’s red team was the unequal amount of information across languages — which is why the models generally perform better in English.

This inequality is also reflected in the security flaw pointed out by the CMU researchers, according to computer scientists interviewed by Folha.

“If the protection for Portuguese has less data, fewer simulations are needed before failures are found. It is the difference between a 15-character password and a 20-character one,” says Fábio Cozman, a professor at USP’s Institute of Mathematics and Statistics.
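
The analogy can be put in numbers: each extra character multiplies the number of combinations an attacker must try, as in the illustrative calculation below (the 62-symbol alphabet is an assumption, not a figure from the study).

```python
# Illustrative only: how much the search space grows with five extra
# characters, assuming 62 possible symbols per position (letters + digits).
symbols = 62
short_space = symbols ** 15   # "15-character password"
long_space = symbols ** 20    # "20-character password"
print(f"{long_space // short_space:,}")  # 916,132,832 – nearly a billion times larger
```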

Diogo Cortiz, an AI professor at PUC-SP and one of the members of OpenAI’s risk-testing team, says that information security works like a game of cat and mouse. “As methods to circumvent security techniques are developed, the security technologies end up becoming more sophisticated. We always manage to find some way to cope.”

In a statement, Google said it was aware of the risk identified in the paper published last Thursday. “While this is an issue with large language models, we’ve developed important safeguards in Bard – such as those postulated by this research – and will continue to improve them over time.”

Also in a statement, OpenAI said that it works consistently to make its models more robust against adversarial attacks, which includes identifying unusual patterns and having the red team simulate potential risks.

Contacted by email and WhatsApp, Meta — owner of Facebook, Instagram and WhatsApp — declined to answer Folha’s questions.

Until the launch of ChatGPT, the biggest tech companies hesitated to release conversational AIs after a series of gaffes. The first of these was Microsoft’s Tay chatbot, launched on Twitter — in less than a day online, the AI uttered misogynistic insults and endorsed Hitler.
