Neural Theorem Provers

07 Oct 2019 Logic Learning Query answering Neural-Symbolic

This paper (Rocktäschel & Riedel, 2017) presents Neural Theorem Provers (NTP), a framework for fuzzy/real logic i.e. [0,1]-query answering with backwards chaining and subsymbolic representation learning.

Idea

We start with a first order logic language with no existential quantifiers or function symbols. Knowledge bases are composed of facts and rules in Horn clause format (ala Prolog).

We assign to each constant c and predicate P symbol a fixed length k vector \(\Theta\) i.e. \(\Theta_c \in \mathbb{R}^k\) and \(\Theta_P \in \mathbb{R}^k\).

Then, allow unification to match any two symbols (constant or predicate) and in turn add a loss to the query answer with respect to the difference between those two symbols.

Unification tree for a query:

useful image

This is how symbols are unified:

useful image

Example of unification:

useful image

OR operation:

useful image

OR example:

useful image

AND operation:

useful image

Finally, optimize vectors with respect to the well known facts of the knowledge base.

useful image

How

Start with a first order knowledge base which contains facts and rules (ala Prolog). No \(\exists\)-quantifiers and all variables are \(\forall\)-quantified implicitly.
Each symbol is assigned a random vector.
Take the known facts as positive examples and replace symbols with other (incorrect) ones to generate negative examples.
In each optimization/training pass run backwards chaining with a set of positive and negative examples as the goals (removing them from the knowledge base temporarily).
At each execution step a confidence score for one goal is calculated (even though we know if that goal is true or not), then we calculate the error of the whole training pass through the log-likelihood function and the gradient of the error with respect to the vector representation of each symbol.
After training pass, we slightly modify the vector representations according to their gradient to minimize the error. Repeat this \(n \in \textrm{N}\) times.

Results

Achieved state-of-the-art accuracy with respect to ComplEx. Example dataset \(\textit{Nations}\) contains 56 binary predicates, 111 unary predicates, 14 constants and 2565 true facts. This method has a horrible time complexity because backwards chaining by itself is very demanding and doing it successively with large sets of examples leads to hard explosions. Note: proof depth is limited and no circular reasoning.
Can learn vector representations that:
1. Match symbols with identical meaning (\(\textit{ABC}\) and \(\textit{aBc}\)),
2. Express similarity between related symbols (\(\textit{fatherOf}\) and \(\textit{parentOf}\)), and
3. Enable to discover and utilize rules by using \(\textit{parametrized rules}\) in which the symbols are unknown but are assigned meaning through training (e.g. parametrized transitivity as \(p(X,Y) \text{ :- } q(X,Z), q(Z,Y).\)).

References

Rocktäschel, T., & Riedel, S. (2017). End-to-end differentiable proving. Advances in Neural Information Processing Systems, 3788–3800.

Martin Jedwabny

Neural Theorem Provers

Idea

How

Results

References

Related Posts

Statistical relational learning 25 Nov 2019

Markov Logic Networks 23 Oct 2019

Bayesian and Markov networks 23 Oct 2019