UNITED KINGDOM

Oxford University researchers make AI safety breakthrough
A team of artificial intelligence experts has reported a major advancement in safeguarding open-weight language models, reports Jacob Manuschka for the Oxford Mail. An open-weight language model is a large language model whose ‘weights’ – the parameters that determine how the model processes language – are publicly available, allowing anyone to download, inspect, and modify the model. The group, including Christ Church’s Professor Yarin Gal, has developed a method to shield these models from malicious updates by filtering out potentially harmful knowledge during training.
Open-weight models are key to collaborative AI research, but their openness also poses risks, as anyone can modify them for harmful purposes. By filtering unwanted knowledge from the training data before the model ever sees it, the team’s approach embeds safety from the start rather than adding it later, reducing risk without hindering transparency.
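To make the idea concrete, here is a minimal sketch of what data-level filtering can look like in principle. This is purely illustrative and not the Oxford team’s actual pipeline: the blocklist terms, function names, and matching strategy are all hypothetical, and a real system would use far more sophisticated classifiers than keyword matching.

```python
# Illustrative sketch only -- not the researchers' actual method.
# Documents matching a (hypothetical) blocklist of hazardous topics
# are dropped from the corpus before any training occurs, so the
# model never learns the filtered material in the first place.

BLOCKLIST = {"pathogen synthesis", "weaponisation"}  # hypothetical terms

def is_safe(document: str) -> bool:
    """Return True if the document mentions no blocklisted topic."""
    text = document.lower()
    return not any(term in text for term in BLOCKLIST)

def filter_corpus(corpus: list[str]) -> list[str]:
    """Keep only documents that pass the safety check."""
    return [doc for doc in corpus if is_safe(doc)]

corpus = [
    "A history of the printing press.",
    "Step-by-step pathogen synthesis protocol.",
]
print(filter_corpus(corpus))  # only the first document survives
```

The key design point the article describes is *when* the safety step happens: because unwanted content is removed at the data stage, safety is built into the released weights themselves rather than bolted on as a post-hoc guardrail that a downstream user could strip away.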
Full report on the Oxford Mail site