Safer ML paradigms team: the story

This is a summary of the project undertaken by the team focusing on combining Inductive Logic Programming with deep learning.

Exploration phase

During the first retreat we formed a team based on a loosely shared interest in re-interpreting existing AI technologies. Our interests included symbolic Reinforcement Learning (RL), emergent communication between RL agents, translating open philosophy problems into RL experiments, factored cognition, and combining Inductive Logic Programming (ILP) with deep learning.

We ranked all these project ideas based on “expected quality of the output if the project is successful”, “tractability”, “novelty”, “how much we’d learn”, “non-tediousness”, “how well it ties in with existing work” and “safety impact”. Combining ILP with deep learning came out on top.

In the time between the pre-retreat and the Louti retreat we read up on ILP (as two of the four team members had never heard of the method) and generated ideas about how to augment it with deep learning. Before the start of the second retreat we had collected about seven rough ideas and lost our only member with prior ILP experience to deadly Oxford deadlines.

We considered extending the existing literature that combines higher-order logic with ILP to obtain better performance, either by improving its open-source implementation or by proving theoretical results. The latter would entail making a stronger argument for the design choices in that work: building intuition for why they led to empirical improvements, formulating hypotheses, and proving theorems. Although we were reasonably confident that we could produce some results, this project seemed unlikely to have any impact on the direction that AI or ILP research is taking.

A project related to factored cognition was writing an algorithm that takes in a question, uses a neural network or deep RL to produce subquestions, answers the subquestions using ILP, and then combines the answers to the subquestions into an answer to the original question. We could have approached this with a focus either on “how to learn to factorize problems” or on “how to improve ILP performance by decomposing tasks”. Ought has worked on factored cognition for a while, so it seemed unlikely that we would have interesting insights on the first perspective that they have not already had. We also considered reproducing OpenAI’s iterated amplification results on experiments in five algorithmic domains, but decided against it because it would be a lot of work in proportion to what we would end up contributing. On the whole, we decided not to work on factored cognition because we expected the project to be brittle in the face of bottlenecks: if one part of the pipeline is not working, the entire algorithm is useless.

We had some very underspecified ideas on combining ILP with ML, among them “predict the distance (in number of predicates) between a theory and a completed theory” and “use ILP or some other proof-checking algorithm to check hypotheses, and a generative algorithm to generate them”. In the end we went with the idea that seemed most straightforward: developing a toy version of an algorithm that predicts which pieces of the background in an ILP problem are relevant. This was a natural choice since it has a small base case (bag-of-words input to an ordinary MLP), many possible extensions, and since the exact “background forgetting” problem is NP-hard (Cropper 2019).

ILP implementation project

ILP takes as input background knowledge (rule-like facts about the world), B, a set of positive examples, E+, and a set of negative examples, E-, and outputs a hypothesis, H, that is consistent with all the positive examples and inconsistent with the negative examples, using the background to minimise hypothesis size. ILP runtime grows with the size of the background, so reducing the background size reduces computation time. Our goal is to prune the background set without losing relevant predicates, i.e. without changing the output hypothesis.
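To make the input-output format concrete, here is a toy illustration with predicates written as Prolog-style strings inside Python. The family-relations example and all names are invented for illustration and are not from our actual task domain.

```python
# Illustrative only: a tiny ILP task in the (B, E+, E-) form described above.
ilp_task = {
    "background": [
        "parent(alice, bob).",
        "parent(bob, carol).",
        "male(bob).",
        "color(sky, blue).",   # irrelevant to the target concept
    ],
    "pos_examples": ["grandparent(alice, carol)."],
    "neg_examples": ["grandparent(bob, alice)."],
}

# A hypothesis H consistent with E+ and inconsistent with E-:
#   grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
# Pruning "color(sky, blue)." and "male(bob)." from the background leaves H
# unchanged while shrinking the ILP search space.
```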

Approaches include:

  • Within some domain, such as computational chemistry, there is a broad, fixed background. For each combination of positive and negative examples (a problem statement), only a subset of the background is needed to find the correct hypothesis. If we can predict which subset of the background is needed for which problem, then we can reduce the time it takes to solve those problems. Such an algorithm would probably be based on recognizing similarities between background predicates and examples: for each element in the background it could predict whether that predicate is relevant for the problem at hand.
  • Alternatively, we could develop a general-purpose “background size reduction” algorithm that works on any background. Such a distillation process would have to be based on intrinsic similarities between elements of the background. The algorithm could, for example, recognize which elements are duplicates. However, if we want to deal with the relationships between predicates in the background, then we cannot use a predictor that only predicts the relevance of individual elements. For example, if we have five duplicates of one predicate, then we may predict that each individual predicate should be thrown away, but we would like to throw away exactly four. Hence, in this setting we probably want to make predictions about the redundancy of subsets of the background rather than the relevance of individual predicates (a minimal sketch of the duplicate case follows below).
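As a minimal sketch of the duplicate-handling part of this idea: group predicates by a canonical form and keep exactly one representative per group, so five duplicates lose exactly four. The `canonicalize` helper is a hypothetical placeholder; a real version would at least need to identify alpha-equivalent predicates (equal up to variable renaming), which whitespace cleanup alone does not.

```python
from collections import defaultdict

def canonicalize(predicate: str) -> str:
    # Placeholder: whitespace-normalise only; a real implementation would
    # also rename variables into a canonical order (alpha-equivalence).
    return " ".join(predicate.split())

def deduplicate_background(background: list[str]) -> list[str]:
    groups = defaultdict(list)
    for pred in background:
        groups[canonicalize(pred)].append(pred)
    # Keep exactly one predicate per duplicate group.
    return [members[0] for members in groups.values()]
```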

Relevance predictor

  • Input to the relevance predictor. Train on:
    • One specific background within a specific domain.
    • Backgrounds of a specific size.
    • Backgrounds of variable sizes smaller than n, padded with empty input-vector placeholders (in random slots).
    • Backgrounds of variable sizes.
  • Output of the relevance predictor. Given a background B, either:
  1. Return a relevance prediction for each element, predicting whether the ILP hypothesis stays the same after that element is removed.
  2. Return a subset B’ of B (a smallest subset for which the ILP hypotheses H_B and H_B’ are the same).

To go from predictions about individual predicates to predictions about subsets of the background, we could apply such a predictor iteratively, removing one predicate at a time.
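A rough sketch of that loop, assuming a trained per-predicate scorer `predict_relevance` and an ILP solver `solve_ilp` used as an oracle (both names are hypothetical stand-ins):

```python
def prune_background(background, pos, neg, predict_relevance, solve_ilp,
                     threshold=0.5):
    """Greedily remove predicates the predictor scores as irrelevant."""
    target = solve_ilp(background, pos, neg)          # reference hypothesis H_B
    pruned = list(background)
    while pruned:
        scores = predict_relevance(pruned, pos, neg)  # one score per predicate
        idx, score = min(enumerate(scores), key=lambda t: t[1])
        if score >= threshold:
            break                        # everything left looks relevant
        candidate = pruned[:idx] + pruned[idx + 1:]
        if solve_ilp(candidate, pos, neg) != target:
            break                        # predictor was wrong; stop pruning
        pruned = candidate               # safe to drop this predicate
    return pruned
```

Calling the ILP oracle at every step is of course expensive; a trained predictor only saves time to the extent that the oracle check can eventually be skipped or batched.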

Very ambitious ideas (that we did not focus on) are:

  • Instead of just taking out semantic duplicates, we could build a system that generates summarizing predicates. 
  • We could also be more ambitious and hope to clean up noise, for example by eliminating contradictory predicates. This is more difficult than reducing redundancy, because here we are trying to change the background such that the output hypothesis could change (or could be found in the first place), and many changes may look valid at face value. Presumably, statistics could help: a predicate that contradicts many other predicates, where the background would be consistent without it, is a plausible noise candidate. However, when two predicates are inconsistent with each other but each is consistent with the rest of the background, it seems almost necessary to have domain knowledge to distinguish the noise from the true predicate.
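A very rough sketch of that statistical heuristic, assuming a hypothetical pairwise `contradicts` oracle (which in practice would itself need a consistency check, e.g. from a theorem prover):

```python
from collections import Counter
from itertools import combinations

def noise_candidates(background, contradicts):
    # Rank predicates by how many others they contradict; the most
    # conflicted predicates are the most plausible noise candidates.
    conflicts = Counter()
    for p, q in combinations(background, 2):
        if contradicts(p, q):
            conflicts[p] += 1
            conflicts[q] += 1
    return conflicts.most_common()
```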

The prototype developed over the course of the second retreat consists of the following modules:

  • A formal grammar generator. For a fixed set of symbols, which constitute the background, we generate sets of positive (resp. negative) examples by using (resp. breaking) the rules of the grammar on a subset of the symbols. For each set of examples, the background knowledge we can dispose of is the set of predicates corresponding to the symbols not used in the grammar. We intend this very simple task as a first proof of concept; the rest of the pipeline is agnostic to the ILP problem being solved, and we plan to implement other tasks in the future.
  • A Prolog parser that reads the predicates in the background and examples files and returns their parse trees. This way we obtain a graph representation of the input that encodes the relations between terms of the theory.
  • A predicate embedder module that maps parse trees to vectors, implemented as a graph net. Graph nets are models that receive an annotated graph as input and return the same graph with updated annotations. The annotations are simply labels on the graph’s nodes and/or edges, and they can have any type. Update functions are usually implemented by neural networks and respect certain locality rules, e.g. a node label is updated using only the current values of the labels on neighbouring nodes and/or incoming edges. In our case, the labels are vectors attached to the nodes; they are initialized to one-hot vectors and updated following the rules of FormulaNet.
  • A relevance prediction module that receives the vector representation of the triple (B, E+, E-) and outputs, for each predicate in the background, the probability that it is relevant for the ILP task given the examples. This module is implemented as an MLP with dropout. The embedder-predictor model is trained end-to-end using cross-entropy loss; a simplified sketch follows below.
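Below is a much-simplified sketch of the embedder-predictor pipeline in PyTorch. It uses generic mean-aggregation message passing rather than the actual FormulaNet update rules, a learned embedding in place of the one-hot initialization, and invented module names; treat it as a shape-level illustration, not our implementation.

```python
import torch
import torch.nn as nn

class PredicateEmbedder(nn.Module):
    """Maps a parse tree (nodes + adjacency) to a single predicate vector."""
    def __init__(self, num_symbols: int, dim: int = 64, rounds: int = 2):
        super().__init__()
        self.embed = nn.Embedding(num_symbols, dim)  # stand-in for one-hot init
        self.update = nn.Linear(2 * dim, dim)
        self.rounds = rounds

    def forward(self, node_ids, adj):
        # node_ids: (n,) symbol indices; adj: (n, n) row-normalised adjacency.
        h = self.embed(node_ids)
        for _ in range(self.rounds):
            msg = adj @ h                             # neighbour average
            h = torch.relu(self.update(torch.cat([h, msg], dim=-1)))
        return h.mean(dim=0)                          # pooled predicate vector

class RelevancePredictor(nn.Module):
    """Scores one background predicate against pooled example embeddings."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(dim, 1),
        )

    def forward(self, pred_vec, pos_vec, neg_vec):
        x = torch.cat([pred_vec, pos_vec, neg_vec], dim=-1)
        return torch.sigmoid(self.mlp(x))  # P(predicate is relevant)
```

Training would then minimise binary cross-entropy between these probabilities and the per-predicate relevance labels, with gradients flowing back through the embedder.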

Implementation choices:

  • We could simply have fed a raw string into our network, but we expected that would be very hard to learn from. We tried two approaches to translating predicates into vectors that can be used as input for a predictor network:
    • One approach is to count the number of occurrences of words/symbols in the predicate (bag of words; sketched after this list).
    • The other is to first translate the predicate into an abstract syntax tree, then use that tree as input to a graph network, which learns to output a useful vector, which in turn is fed to an MLP. This pipeline was trained end-to-end with a training signal from (at first) known ground truth, and later from a full ILP system used as an oracle.
  • Initially we planned to produce ground truth for training our predictor using an ILP module, but we realized we could simply adjust our example generator to generate the solutions automatically (possible because of the simple grammar domain we used). For a predictor applicable to a wider range of problems, such an ILP module is still needed.
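For reference, the bag-of-words baseline from the list above can be as simple as the sketch below; the tokeniser and the vocabulary are crude placeholders.

```python
import re

def bag_of_words(predicate: str, vocab: list[str]) -> list[int]:
    tokens = re.findall(r"[a-zA-Z_]\w*", predicate)  # crude symbol tokeniser
    return [tokens.count(symbol) for symbol in vocab]

vocab = ["parent", "male", "color", "alice", "bob", "carol", "sky", "blue"]
print(bag_of_words("parent(alice, bob).", vocab))  # [1, 0, 0, 1, 1, 0, 0, 0]
```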

We have two outputs planned: a case for and against ILP from a safety perspective, and the writeup and repository for our relevance predictor. We are seeking feedback from the ILP community, and we have enough ideas for a number of future projects of similar size.