Forecasting team: the story

This is the story of the team that focused on generating an agenda of relevant questions one should ask in order to meaningfully forecast AI capabilities.

The story of the project

After the first part of the AISRP, our project plan was to write a research agenda for AI forecasting and a supporting literature review. During the months leading up to the second part, we surveyed the forecasting literature to better understand the unique challenges AI forecasting poses and which methods had been used for it, or for similar problems in technological and long-term forecasting.

By the end of December we had a reasonable understanding of the state of the field, and we had started to think about how to make sure the agenda got buy-in from the rest of the researchers in the field as well as from more mainstream academia. In the end we decided to run a Delphi study with the aim of collecting expert opinion on the most important problems and most promising methods in AI forecasting. We received 15 responses from prominent researchers in academia, government and industry just before the second part of the AISRP started in February, although close to half of them arrived at various points during the camp itself, which complicated the data analysis.

We started the week by aggregating the results of the survey and using them as a starting point for the structure of the research agenda. After a few days of work it became clear that our original goal of writing the agenda and outlining the literature review was too ambitious for the duration of the programme, so we decided to instead get the agenda to the point where other people who had expressed interest in the project could start contributing their thoughts on the draft. This has mostly been achieved, and we expect to have it done by March 15th after a few rounds of internal corrections.

Summary of results and some research suggestions

This section is a short summary of our preliminary results and some ideas for projects in AI forecasting. When the agenda is released it will contain many more vetted research ideas, so if you are interested in the field, stay tuned.

AI forecasting is a very new field, so, good news, there is a lot of low-hanging fruit. There are many methods, such as tech mining, Bayesian networks and probabilistic modelling, that, to the best of our knowledge, have never been used for forecasting AI, even though they are considered very promising or have produced good results in similar fields such as technological forecasting for innovation. Even very simple things, like taking a dataset and extrapolating some measurement of AI progress from it, have almost never been done.
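
As a toy illustration of the simplest version of that last point, here is a minimal sketch that fits a trend to a hypothetical series of benchmark results and extrapolates it a few years out. The numbers and the assumption of exponential improvement are made up for illustration; a real analysis would start from an actual dataset of AI progress measurements and compare several trend models.

```python
# Minimal sketch: fitting and extrapolating a made-up AI progress metric.
# The data points and the log-linear trend are illustrative assumptions,
# not real measurements.
import numpy as np

# Hypothetical benchmark error rates on some task, by year.
years = np.array([2012, 2014, 2016, 2018, 2020])
error_rate = np.array([0.26, 0.17, 0.10, 0.06, 0.035])

# Assume roughly exponential improvement: fit a line to log(error_rate) vs. year.
slope, intercept = np.polyfit(years, np.log(error_rate), deg=1)

# Extrapolate a few years past the data.
future_years = np.arange(2021, 2026)
projected = np.exp(intercept + slope * future_years)

for year, err in zip(future_years, projected):
    print(f"{year}: projected error rate {err:.3f}")
```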

The most important open questions were clustered into three main topics. We’ll introduce them with some examples, without going into too much detail:

  • Decomposing “AI progress” into concrete targets and metrics that can actually be forecast, and understanding the implications of doing so
    • Identifying the best metrics for AI progress
    • Decomposing targets into more easily forecastable technologies.
    • What are the concrete implications of different AI timelines?
  • Methods-related questions: how to find better inputs, what the best modelling techniques are, how we generate better scenarios of how AI will develop, and how we generally improve forecasting efforts around AI
    • What are the relevant performance metrics when forecasting progress? (e.g. what benchmarks are actually useful and how much signal can we extract from them?)
    • What are the best methods for modelling AI progress?
    • What are the most plausible paths towards transformative AI?
  • Dissemination questions, related to how we use forecasts to improve decision making
    • How do we identify the relevant stakeholders and make sure they get access to the forecasts?
    • How do we control info hazards when reporting forecasting results?

Some possibly obvious advice if you want to do a similar project

This is advice that we feel would have been useful to have internalized before coming to the second part of the programme. However, you should always remember the law of equal and opposite advice.

  • Working with surveys is hard and deadlines are a metaphor. People will send you data whenever they find the time, which makes it hard to plan. There is a reason why Delphi studies are conducted over months rather than weeks.
  • Making things easier for respondents in a survey will make things harder for you afterwards. We didn’t make any answers compulsory, so that people wouldn’t feel they had to spend too much time answering, which was supposed to help the response rate. Unfortunately, it also meant that we didn’t have complete data and had to do some imputation, which was a mess, especially given our small sample size and the fact that we kept receiving new answers until the last day; a sketch of the kind of simple imputation we mean appears after this list.
  • Trying to publish in mainstream academia means that you have to perform the right rituals even when some of them feel like they don’t add value to your study and might be a waste of time in any other case.
  • Even if your project starts very well and you think you can get it done in a few days and move on to the second part sooner, you shouldn’t work 14 hours a day. This seems incredibly obvious in retrospect, but it doesn’t feel that obvious when you’re on a roll and everything is flowing well. Also, it’s okay if different members of the team work at different speeds; trying to match the fastest member will just burn you all out faster.
  • Sometimes you just need to go find a cat to cuddle.
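
On the imputation point above, here is a minimal sketch of the kind of simple imputation we mean, assuming the survey answers sit in a pandas DataFrame. The column names and values are hypothetical, and with a sample as small as ours any imputation choice materially affects the results, so it should be reported explicitly.

```python
# Minimal sketch of simple imputation for incomplete survey data.
# Column names and values are hypothetical; with ~15 respondents any
# imputation strategy materially affects the results and should be reported.
import pandas as pd

responses = pd.DataFrame({
    "importance_rating": [5, 4, None, 3, 5],       # 1-5 Likert item, some missing
    "tractability_rating": [2, None, 4, 3, None],
})

# Fill missing numeric answers with the per-question median.
imputed = responses.fillna(responses.median(numeric_only=True))
print(imputed)
```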

Acknowledgements

We would particularly like to thank the anonymous Delphi participants, without whom this study would not be what it is. We would also like to thank the AI Safety Research Program for its key role in facilitating the organization of the project and team, and for providing the environment and resources necessary for success during the bulk of the project.