
TensorLog

TensorLog (Cohen et al., 2017) is an implementation of probabilistic knowledge bases that (i) leverages deep learning to answer queries efficiently, (ii) does so by restricting the first-order logic language to a class called ptree-SDKG, and (iii) learns weights on knowledge base facts.

How

The paper explains that the problem with past approaches is that they rely on grounding first-order clauses (either for training or for answering queries). For example, given a (Prolog-like) rule:

\[p(X,Y) \text{ :- } q(Y,Z),r(Z,Y)\]

Representing the formula:

\[\forall X \forall Y \forall Z \ (p(X,Y) \lor \neg (q(Y,Z) \land r(Z,Y)))\]

and a vocabulary with a set of constants \(\mathcal{C}\) of size \(n\), the number of clauses generated by grounding this rule is \(n^3\), since each of its three variables can be replaced by any constant. It becomes apparent that, as the number of variables in a clause grows, grounding quickly becomes intractable.
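
As a quick sanity check on that count, here is a minimal Python sketch (the constant names and the rule encoding are illustrative, not from the paper) that enumerates every grounding of the rule above:

```python
from itertools import product

def ground_rule(constants):
    """Enumerate all groundings of  p(X,Y) :- q(Y,Z), r(Z,Y).

    The clause has three variables (X, Y, Z), so there are
    len(constants) ** 3 ground instances.
    """
    return [
        (("p", x, y), [("q", y, z), ("r", z, y)])
        for x, y, z in product(constants, repeat=3)
    ]

constants = [f"c{i}" for i in range(100)]   # n = 100 constants
print(len(ground_rule(constants)))          # 1000000 == 100 ** 3
```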

Let us quickly define the following: a database is a set of ground facts \(\mathcal{DB} = \{f_1, ..., f_N\}\), and a theory \(\mathcal{T}\) is a set of Horn clauses (Prolog-like), which can also be seen as knowledge base rules (as in the example above).

The proof methodology here is backward chaining, i.e. top-down unification of literals through rules. A proof graph for a query \(Q\) is then a graph whose nodes are pairs \((S, L)\), consisting of a solution state \(S\) (the query or a successive unification state) and a list \(L\) of (non-grounded) literals left to unify, with \((Q, [Q])\) as the root.

(Figure: an example proof graph for a query, image omitted.)
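
To make the \((S, L)\) bookkeeping concrete, here is a minimal backward-chaining sketch in Python. It is not the paper's implementation; the facts and the aunt/sister/parent rule are invented for illustration, and a real prover would also rename rule variables apart.

```python
# Depth-first search over proof-graph nodes (substitution, remaining goals);
# the root is ({}, [query]) and a node with no remaining goals is an answer.
FACTS = {("sister", "cara", "ann"), ("parent", "ann", "bob")}
RULES = [(("aunt", "X", "Y"), [("sister", "X", "Z"), ("parent", "Z", "Y")])]

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    while is_var(t) and t in s:   # follow chains of variable bindings
        t = s[t]
    return t

def subst(lit, s):
    return (lit[0],) + tuple(walk(t, s) for t in lit[1:])

def unify(a, b, s):
    """Extend substitution s so that literals a and b match, or return None."""
    if a[0] != b[0] or len(a) != len(b):
        return None
    s = dict(s)
    for x, y in zip(a[1:], b[1:]):
        x, y = walk(x, s), walk(y, s)
        if x == y:
            continue
        if is_var(x):
            s[x] = y
        elif is_var(y):
            s[y] = x
        else:
            return None
    return s

def prove(goals, s):
    """Yield substitutions reached at nodes of the form (answer, [])."""
    if not goals:
        yield s
        return
    goal, rest = subst(goals[0], s), goals[1:]
    for fact in FACTS:                          # close the goal with a fact
        s2 = unify(goal, fact, s)
        if s2 is not None:
            yield from prove(rest, s2)
    for head, body in RULES:                    # or expand it with a rule
        s2 = unify(goal, head, s)
        if s2 is not None:
            yield from prove(body + rest, s2)

query = ("aunt", "W", "V")
for answer in prove([query], {}):
    print(subst(query, answer))                 # ('aunt', 'cara', 'bob')
```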

Now, in order to incorporate probabilistic reasoning into this scheme, we have to assign weights to our facts and rules and extend query answering over proof graphs accordingly.

A stochastic logic program (SLP) is a tuple \((\mathcal{DB}, \mathcal{T}, \Theta)\) where \(\Theta\) maps database facts and rules to non-negative real numbers. Given a query \(Q\) with proof graph \(G\) and an answer \(f\), the weight of \(f\) with respect to \(Q\) is:

\[w_Q(f) = \sum\limits_{v_0=(Q,[Q])\rightarrow ... \rightarrow v_n=(f,[])} \prod\limits_{i=0}^{n-1} \Theta_{r_{v_i,v_{i+1}}}\]

Where \(r_{v_i,v_{i+1}}\) is the rule used between nodes \(v_i\) and \(v_{i+1}\) in \(G\).

Then, we can normalize all the answers \(\{ f_1,...,f_k \}\) to the query as:

\[P(f_i | Q) = \frac{w_Q(f_i)}{\sum\limits_{j=1}^{k} w_Q(f_j)}\]
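
To illustrate these two formulas on a tiny, hand-built proof graph (the graph and the weights below are made up), one can enumerate root-to-answer paths, multiply the \(\Theta\) values along each path, and normalize. Theorem 1 below is precisely the statement that this kind of naive enumeration does not scale in general.

```python
from collections import defaultdict

# Hand-built proof graph: each edge carries the Theta of the rule used to
# move between proof states.  "f1" and "f2" are the answer nodes.
EDGES = {
    "root": [("s1", 0.5), ("s2", 0.5)],
    "s1":   [("f1", 2.0)],
    "s2":   [("f1", 1.0), ("f2", 2.0)],
}
ANSWERS = {"f1", "f2"}

def path_weights(node="root", acc=1.0, out=None):
    """Accumulate the product of Theta over every root-to-answer path."""
    if out is None:
        out = defaultdict(float)
    if node in ANSWERS:
        out[node] += acc
        return out
    for child, theta in EDGES.get(node, []):
        path_weights(child, acc * theta, out)
    return out

w = path_weights()                          # w(f1) = 0.5*2 + 0.5*1 = 1.5, w(f2) = 1.0
Z = sum(w.values())
print({f: wf / Z for f, wf in w.items()})   # {'f1': 0.6, 'f2': 0.4}
```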

(Theorem 1) Computing \(P(f|Q)\) for all possible answers \(f\) of the query \(Q\) is #P-hard, even if there are only two such answers, the theory contains only two non-recursive clauses, and the knowledge graph contains only 13 facts.

To bypass this, the authors propose a series of restrictions that lead to more efficient query answering.

A polytree-limited stochastic deductive knowledge graph (ptree-SDKG) is an SLP where:

An example:

(Figure: an example ptree-SDKG, image omitted.)

(Theorem 2) For any ptree-SDKG, \(P(f\mid Q)\) can be computed in time linear in the size of \(\mathcal{T}\) and \(\mathcal{DB}\).

With these restrictions, it holds that:

\[w'_Q(f) = \sum\limits_{v_0=(Q,[Q])\rightarrow ... \rightarrow v_n=(f,[])} \ \prod\limits_{\{ g \in \mathcal{DB} \mid g \text{ is used along the path} \}} \Theta_{g}\]

\[P(f_i | Q) = \frac{w'_Q(f_i)}{\sum\limits_{j=1}^{k} w'_Q(f_j)}\]

This means that the normalized answer to a query \(Q\) can be computed by looking only at the database facts used for unification in the proof graph, as they are the only components whose weight differs from 1.
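
As a small sketch of this restricted scoring (the facts and weights below are invented), rule applications contribute a factor of 1, so each path's score reduces to the product of \(\Theta\) over the database facts it uses:

```python
import math

# Theta is non-trivial only on DB facts; rules implicitly have weight 1.
THETA = {"q(a,b)": 0.9, "r(b,a)": 0.8, "q(a,c)": 0.3, "r(c,a)": 0.5}

# For each answer, the DB facts used along each of its proof paths.
PROOF_PATHS = {
    "f1": [["q(a,b)", "r(b,a)"]],
    "f2": [["q(a,c)", "r(c,a)"]],
}

def restricted_weight(paths):
    # Product over the *set* of facts used on a path, summed over paths.
    return sum(math.prod(THETA[g] for g in set(path)) for path in paths)

w = {ans: restricted_weight(paths) for ans, paths in PROOF_PATHS.items()}
Z = sum(w.values())
print({ans: wf / Z for ans, wf in w.items()})
# w(f1) = 0.9 * 0.8 = 0.72, w(f2) = 0.3 * 0.5 = 0.15  ->  P(f1 | Q) ≈ 0.83
```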

Conclusion

References

  1. Cohen, W. W., Yang, F., & Mazaitis, K. (2017). TensorLog: Deep Learning Meets Probabilistic DBs. CoRR, abs/1707.05390. http://arxiv.org/abs/1707.05390