Structure learning of probabilistic logic programs by searching the clause space

ELENA BELLODI; FABRIZIO RIGUZZI

doi:10.1017/S1471068413000689

Published online by Cambridge University Press: 15 January 2014

ELENA BELLODI: Affiliation:
Dipartimento di Ingegneria – University of Ferrara, Via Saragat 1, 44122, Ferrara, Italy (e-mail: elena.bellodi@unife.it)
FABRIZIO RIGUZZI: Affiliation:
Dipartimento di Matematica e Informatica – University of Ferrara, Via Saragat 1, 44122, Ferrara, Italy (e-mail: fabrizio.riguzzi@unife.it)

Abstract

Learning probabilistic logic programming languages is receiving increasing attention, and systems are available for learning the parameters (PRISM, LeProbLog, LFI-ProbLog and EMBLEM) or both the structure and the parameters (SEM-CP-logic and SLIPCASE) of these languages. In this paper we present the algorithm SLIPCOVER for “Structure LearnIng of Probabilistic logic programs by searChing OVER the clause space.” It performs a beam search in the space of probabilistic clauses and a greedy search in the space of theories, using the log likelihood of the data as the guiding heuristic. To estimate the log likelihood, SLIPCOVER performs Expectation Maximization with EMBLEM. The algorithm has been tested on five real-world datasets and compared with SLIPCASE, SEM-CP-logic, Aleph and two algorithms for learning Markov Logic Networks (Learning using Structural Motifs (LSM) and ALEPH++ExactL1). SLIPCOVER achieves higher areas under the precision-recall and receiver operating characteristic curves in most cases.
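
The two-level search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the clause encoding (a sorted tuple of propositional body literals), the toy dataset, the refinement operator and the function names are all assumptions made for illustration, and the scoring function is a fixed-parameter stand-in for the log likelihood that SLIPCOVER estimates by running EM with EMBLEM.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Clause = Tuple[str, ...]   # hypothetical encoding: sorted body literals
Theory = Tuple[Clause, ...]
Example = Tuple[Set[str], bool]

# Toy data (assumed): facts true in each example, and its label.
LITERALS = ("a", "b", "c", "d")
EXAMPLES: List[Example] = [
    ({"a", "b"}, True), ({"a"}, False), ({"b"}, False),
    ({"c"}, True), ({"d"}, False),
]


def log_likelihood(theory: Theory) -> float:
    """Toy stand-in for the EM-estimated log likelihood.

    An example is predicted positive with probability 0.9 if some clause
    covers it and 0.1 otherwise; these parameters are fixed, not learned.
    """
    correct = 0
    for facts, label in EXAMPLES:
        covered = any(all(lit in facts for lit in clause) for clause in theory)
        correct += covered == label
    wrong = len(EXAMPLES) - correct
    return correct * math.log(0.9) + wrong * math.log(0.1)


def refine(clause: Clause) -> List[Clause]:
    """Specialise a clause by adding one literal to its body."""
    return [tuple(sorted(clause + (lit,))) for lit in LITERALS if lit not in clause]


def beam_search(seeds: Sequence[Clause], beam_width: int, steps: int) -> List[Clause]:
    """Beam search in clause space; returns every scored clause, best first."""
    scored: Dict[Clause, float] = {c: log_likelihood((c,)) for c in seeds}

    def rank(clauses) -> List[Clause]:
        # Highest score first; ties broken by clause for determinism.
        return sorted(clauses, key=lambda c: (-scored[c], c))

    beam = rank(scored)[:beam_width]
    for _ in range(steps):
        new = {r for c in beam for r in refine(c) if r not in scored}
        if not new:
            break
        scored.update((c, log_likelihood((c,))) for c in new)
        beam = rank(new)[:beam_width]
    return rank(scored)


def greedy_theory_search(ranked: Sequence[Clause]) -> Theory:
    """Greedy search in theory space: keep a clause only if the score improves."""
    theory: Theory = ()
    best = log_likelihood(theory)
    for clause in ranked:
        candidate = theory + (clause,)
        score = log_likelihood(candidate)
        if score > best:
            theory, best = candidate, score
    return theory


ranked = beam_search([(lit,) for lit in LITERALS], beam_width=2, steps=1)
theory = greedy_theory_search(ranked)
print(theory)
```

On this toy data the sketch keeps two clauses, one with body a, b and one with body c, and rejects every other candidate because adding it does not raise the score. In the actual system each score would require a parameter-learning run, which is why the beam limits how many clauses are refined at each step.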

Keywords

probabilistic inductive logic programming; statistical relational learning; structure learning; distribution semantics; logic programs with annotated disjunction; CP-logic

Type: Regular Papers
Information: Theory and Practice of Logic Programming, Volume 15, Issue 2: Probability, Logic and Learning, March 2015, pp. 169–212

DOI: https://doi.org/10.1017/S1471068413000689
Copyright: Copyright © Cambridge University Press 2014
