British institute looks for dangers lurking in AI

On a recent Tuesday in an Edwardian government building along Parliament Square in London, four artificial intelligence experts were busy tricking an artificial intelligence chatbot into sharing instructions for making the deadly bioweapon anthrax.

The experts asked the chatbot in various ways to provide a list of necessary ingredients. When the system refused – “I’m sorry, I can’t help you with that” – they used a proprietary algorithm to bombard the AI tool with thousands of automated questions and prompts.

Eventually, the AI relented. It provided a detailed list of materials and equipment, along with a recipe for making the lethal mixture at home. (The New York Times agreed to withhold the name of the AI system for security reasons.)

“There are some questions you definitely don’t want the model to answer,” said Xander Davies, a 25-year-old American who heads the so-called red team at Britain’s AI Security Institute. “We’re really trying to get answers.”

Mr. Davies and his red team, which simulates attacks on AI systems, also recently breached the security measures on OpenAI’s latest ChatGPT chatbot, getting it to provide hacking tips in about six hours. After identifying problems, they share the results with the companies.

“They’re trying to fix it, report something to us,” said Mr. Davies, a computer scientist who, after graduating from Harvard, decided to work at the institute instead of taking a tech job in San Francisco. “They’re actually strengthening their system with us.”

A combination of weapons inspectors, epidemiologists and codebreakers, the AI Security Institute is one of the largest and best-funded government efforts in the world dedicated to investigating the technology’s potentially catastrophic risks.

The institute’s 100 or so staff members – drawn from British intelligence agencies, academia and technology companies – have found major security holes in every leading AI model they tested, including Anthropic’s Claude and Google’s Gemini. The organization, which was formed nearly three years ago, said it had coaxed artificial intelligence systems into sharing instructions for making chemical and biological weapons and for planning and executing cyberattacks. It publishes its research and also works with UK national security agencies to identify and prepare for emerging threats.

Now the institute’s work is becoming a blueprint for other governments as concerns about AI security grow. The Trump administration is considering rules for vetting AI models that have some similarities to the approach pioneered by the British group. As many governments lack the technical expertise to control the technology and rely on big tech firms to self-regulate, the institute can offer another avenue for AI experts to bring real technological know-how to government decision-making.

“Companies cannot be left to mark their own homework,” Rishi Sunak, the former British prime minister who created the institute, said in an interview. “That’s the job of democratic institutions.”

In April, Anthropic announced a new AI model, Mythos, which it has not made public due to concerns that it could find and exploit cybersecurity flaws in global networks. The British institute was the only non-US government organization to gain access to the model for safety testing. Its findings, published six days after the Mythos announcement, were widely cited by security experts.

The United States has its own AI security group, the Center for AI Standards and Innovation. But the British version, backed by 360 million pounds of government money, equal to about $480 million, is bigger and better funded than its American counterpart, which will receive about $10 million this year. Australia, Canada, China, France, India, Japan and Singapore have created similar institutes.

Even so, global investment in AI security pales in comparison to the huge sums spent building and commercializing the technology. OpenAI, Anthropic and Google have teams working on security checks, but outside researchers regularly find dangerous loopholes. Academics in Italy recently used poetry to trick an artificial intelligence model into delivering bomb-related instructions.

Governments have largely not created systems dedicated to assessing AI for safety and security risks, as they have for industries such as drug development and car manufacturing.

“The thing that keeps me up at night is the relative speed of technology compared to institutions like governments that have to respond,” said Jade Leung, AI adviser to Prime Minister Keir Starmer and chief technology officer of the AI Security Institute.

The British institute grew out of a 2023 meeting at 10 Downing Street between Mr. Sunak and three of the world’s top AI leaders – Sam Altman of OpenAI, Dario Amodei of Anthropic and Demis Hassabis of Google DeepMind. Mr. Sunak recalled that they said the capabilities of artificial intelligence were accelerating, with profound implications for government, jobs and national security.

“The pace of development was surprising even to them,” he said.

In November 2023, Mr. Sunak announced the creation of the institute at a summit of world leaders on AI security at Bletchley Park, where Alan Turing and others broke German encryption codes during World War II.

The institute has become a template for others, said Olivia Shen, director of the strategic technology program at the United States Studies Center, an Australian think tank at the University of Sydney. Last year, Ms. Leung traveled from the British institute to Australia to meet with government officials. This year, Australia opened its own AI security center.

“Governments need to catch up,” said Ms. Shen, who helped organize the visit. “With the pace at which technology is coming, governments are losing pace every day.”

The British institute works on the most serious potential risks of advanced artificial intelligence: cyber threats, chemical and biological weapons and manipulation of human behavior. In recent weeks, it found that artificial intelligence models from Anthropic and OpenAI can complete a complex, 32-step attack on an enterprise network far faster than the 20 hours an experienced human hacker would typically need.

Other research examines whether AI models recognize when they are being tested and change their behavior, a development that would signal a level of awareness and an ability to deceive.

Adam Beaumont, interim director of the AI Security Institute, said the main fear is technology imitating human behavior. Last year, the institute published a study that found that chatbots can influence people’s political opinions.

“A lot of people in this building are looking at each of these things,” said Mr. Beaumont, a former top AI officer at GCHQ, Britain’s intelligence, security and cyber agency.

Many fear that the institute’s work is insufficient. The British group has no regulatory authority and its researchers are not given information about how cutting-edge AI models are trained and created. It keeps much of its research private and only shares it with certain government agencies and companies.

Recruitment is also a challenge. Apart from senior executives, its workers can earn up to 145,000 pounds a year, or about $195,000. Many have left multimillion-dollar pay packages at AI companies to do what some have called a government “tour of duty.”

Ian Hogarth, the technology investor who co-founded the institute, was an early backer of Anthropic. To avoid a conflict of interest, he sold his Anthropic stake after joining. The AI startup could soon be worth $900 billion, up from about $4 billion in early 2023.

“I’ve got a mortgage so it wasn’t a trivial decision at all,” said Mr. Hogarth, 44, who is now chairman of the institute. He added that it was an “expensive” choice, but the right one.

“I believe in the importance of having the right technology and I believe the government has a role to play,” he said.