OpenAI hired experts to ‘crack’ ChatGPT – 04/17/2023 – Tech

After Andrew White gained access to GPT-4, the new artificial intelligence system that powers the popular ChatGPT, he used it to suggest an entirely new nerve agent.

The University of Rochester chemical engineering professor was among 50 academics and experts hired to test the system last year by OpenAI, the Microsoft-backed company that developed GPT-4. Over the course of six months, this “red team” would “qualitatively investigate and adversarially test” the new model, trying to crack it.

White told the Financial Times that he used GPT-4 to suggest a compound that could act as a chemical weapon and used plug-ins that fed the model new sources of information, such as scientific papers and a list of chemical manufacturers. The chatbot even found a place to make the compound.

“I think it will equip everyone with a tool to do chemistry faster and more accurately,” he said. “But there is also a significant risk of people doing dangerous chemistry. Today, that risk already exists.”

These alarming findings enabled OpenAI to ensure that such results would not appear when the technology was released more widely to the public last month.

Indeed, the red team exercise was designed to address widespread fears about the dangers of deploying powerful AI systems in society. The team’s job was to ask probing or dangerous questions to test the tool, which responds to human queries with detailed, nuanced answers.

OpenAI wanted to look for issues such as toxicity, prejudice, and linguistic bias in the model. So the red team tested for falsehoods, verbal manipulation, and dangerous scientific know-how. They also examined the model’s potential for aiding and abetting plagiarism, for illegal activities such as financial crime and cyber attacks, and for compromising national security and battlefield communications.

The FT spoke with more than a dozen members of the GPT-4 red team. It’s an eclectic mix of white-collar professionals: academics, professors, lawyers, risk analysts and security researchers, mostly based in the US and Europe.

Their findings were relayed to OpenAI, which used them to mitigate problems and “retrain” GPT-4 before releasing it more widely. Each of the experts spent 10 to 40 hours testing the model over several months. Most of those interviewed were paid approximately $100 an hour for the work, according to several of them.

Those who spoke with the FT shared common concerns about the rapid progress of language models and, specifically, the risks of connecting them to external sources of knowledge via plug-ins.

“Today the system is frozen, which means that it no longer learns, nor does it have memory,” said José Hernández-Orallo, a member of the GPT-4 red team and a professor at the Valencian Institute for Research in Artificial Intelligence. “But what if we give access to the internet? It could be a very powerful system connected to the world.”

OpenAI said it takes safety seriously, tested the plug-ins before release, and will update GPT-4 regularly as more people use it.

Roya Pakzad, a technology and human rights researcher, used English and Farsi prompts to test the model for gendered responses, racial preferences, and religious biases, specifically around the use of head coverings.

Pakzad acknowledged the tool’s benefits for non-native English speakers, but found that the model exhibited glaring stereotypes about marginalized communities, even in its later versions.

She also found that so-called hallucinations – when the chatbot responds with fabricated information – were worse when testing the model in Farsi, where Pakzad found a higher proportion of made-up names, numbers and events compared to English.

“I’m concerned about the potential decrease in linguistic diversity and the culture behind languages,” she said.

Boru Gollo, a lawyer from Nairobi, Kenya, who was the only African tester, also noted the model’s discriminatory tone. “There was a time when I was testing the model and it acted like a white person was talking to me,” Gollo said. “You’d ask about a certain group, and it would give you a biased opinion or a very prejudiced kind of answer.” OpenAI acknowledged that GPT-4 may still be biased.

Red team members who evaluated the model from a national security perspective had mixed opinions about the safety of the new GPT-4. Lauren Kahn, a researcher at the Council on Foreign Relations, said that when she started looking at how the technology could be used in a cyberattack on military systems, “I didn’t expect it to be such a detailed process that I could refine.”

However, Kahn and other security testers found that the model’s responses became considerably safer over the course of the testing. OpenAI said it trained GPT-4 to refuse malicious cybersecurity requests before releasing it.

Many red team members said that OpenAI did a rigorous security assessment before launch. “They’ve done a great job of getting rid of the obvious toxicity in these systems,” said Maarten Sap, an expert on the toxicity of language models at Carnegie Mellon University.

Sap analyzed how different genders were portrayed by the model and found that its biases reflected social disparities. However, he also found that OpenAI had made some active, politically minded choices to combat this.

“I’m a queer person. I was trying really hard to get it to talk me into conversion therapy. It would really push back, even if I took on a character, like saying I was religious or from the American South.”

However, since GPT-4’s launch, OpenAI has faced plenty of criticism, including a complaint to the United States Federal Trade Commission by a technology ethics group claiming that GPT-4 is “biased, misleading and a risk to privacy and public safety”.

The company recently released a feature known as ChatGPT plugins, whereby partner apps like Expedia, OpenTable, and Instacart can give ChatGPT access to their services, allowing it to pick and order items on behalf of human users.

Dan Hendrycks, an AI safety expert who served on the red team, said the plug-ins risked creating a world where humans are “out of the loop”.

“What if a chatbot could post your private information online, access your bank account or send the police to your house?” he asked. “Overall, we need much more robust security assessments before allowing AIs to harness the power of the internet.”

Respondents also cautioned that OpenAI could not stop safety testing just because its software was now live. Heather Frase, who works at Georgetown University’s Center for Security and Emerging Technology and tested GPT-4 for its ability to help criminals, said the risks would continue to grow as more people use the technology.

“The reason you do operational testing is because things behave differently when they’re actually in use in the real environment,” she said.

Frase argued that a public ledger should be created for reporting incidents arising from large language models, similar to reporting systems for cybersecurity or consumer fraud.

Sara Kingsley, a labor economist and researcher, suggested that the best solution would be to clearly advertise the harms and risks, “like a nutrition label.”

“It’s about having a framework and knowing what the frequent issues are so you can have a safety valve,” she said. “That’s why I say the work is never done.”

MEET THE EXPERTS WHO TRIED TO BREAK GPT-4
  • Paul Röttger, Oxford Internet Institute, UK — PhD student focusing on using AI to detect online hate speech
  • Anna Mills, English instructor, College of Marin, USA — Writing teacher at a community college, tested for learning impairment
  • Maarten Sap, Carnegie Mellon University, USA — Assistant professor specializing in the toxicity of large language models
  • Sara Kingsley, Carnegie Mellon University, USA — PhD researcher specializing in online job markets and technology’s impact on work
  • Boru Gollo, TripleOKlaw LLP, Kenya — Attorney who has studied AI opportunities in Kenya
  • Andrew White, University of Rochester, USA — Associate professor, computational chemist, interested in AI and drug design
  • José Hernández-Orallo, Valencian Institute for Research in Artificial Intelligence (VRAIN), Polytechnic University of Valencia, Spain — Professor and AI researcher working on AI software evaluation and accuracy
  • Lauren Kahn, Council on Foreign Relations, USA — Research fellow focusing on the use of AI in military systems and how it alters risk dynamics on battlefields, raising the risk of unintended conflict and inadvertent escalation
  • Aviv Ovadya, Berkman Klein Center for Internet & Society, Harvard University, USA — Focused on the impacts of AI on society and democracy
  • Nathan Labenz, Co-founder of Waymark, USA — Founder of Waymark, an AI-based video-editing startup
  • Lexin Zhou, VRAIN, Polytechnic University of Valencia, Spain — Junior researcher working to make AI more socially beneficial
  • Dan Hendrycks, Director, Center for AI Safety, University of California, Berkeley, USA — Expert in AI safety and societal-scale AI risk mitigation
  • Roya Pakzad, Founder, Taraaz, USA/Iran — Founder and director of Taraaz, a nonprofit working on technology and human rights
  • Heather Frase, Senior Fellow, Center for Security and Emerging Technology, Georgetown University, USA — Expert in the use of AI for intelligence purposes and operational testing of major defense systems
