Crowd-assessing quality in uncertain data linking datasets

Daniel Faria; Alfio Ferrara; Ernesto Jiménez-Ruiz; Stefano Montanelli; Catia Pesquita

doi:10.1017/S0269888920000363

Part of: Ontology Alignment: Algorithms and Evaluation

Published online by Cambridge University Press: 02 July 2020

Daniel Faria: Affiliation:
Instituto Gulbenkian de Ciência, Oeiras, Portugal e-mail: dfaria@igc.gulbenkian.pt INESC-ID, Lisboa, Portugal
Alfio Ferrara: Affiliation:
Department of Computer Science, Università degli Studi di Milano, Milan, Italy e-mails: alfio.ferrara@unimi.it, stefano.montanelli@unimi.it Data Science Research Center, Università degli Studi di Milano, Milan, Italy
Ernesto Jiménez-Ruiz: Affiliation:
City, University of London, London, UK e-mail: ernesto.jimenez-ruiz@city.ac.uk Department of Informatics, University of Oslo, Oslo, Norway e-mail: ernestoj@ifi.uio.no
Stefano Montanelli: Affiliation:
Department of Computer Science, Università degli Studi di Milano, Milan, Italy e-mails: alfio.ferrara@unimi.it, stefano.montanelli@unimi.it Data Science Research Center, Università degli Studi di Milano, Milan, Italy
Catia Pesquita: Affiliation:
Lasige, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal e-mail: clpesquita@fc.ul.pt

Abstract

The quality of a dataset used for evaluating data linking methods, techniques, and tools depends on the availability of a set of mappings, called the reference alignment, that is known to be correct. In particular, it is crucial that mappings represent relations between pairs of entities that are indeed similar because they denote the same object. Since the reliability of mappings is decisive for a fair evaluation of automatic linking methods and tools, we call this property mapping fairness. In this article, we propose a crowd-based approach, called Crowd Quality (CQ), for assessing the quality of data linking datasets by measuring the fairness of the mappings in the reference alignment. Moreover, we present a real experiment in which we evaluate two state-of-the-art data linking tools before and after refining the reference alignment with the CQ approach, in order to show the benefits of crowd assessment of mapping fairness.

Type: Research Article
Information: The Knowledge Engineering Review, Volume 35, 2020, e33

DOI: https://doi.org/10.1017/S0269888920000363
Copyright: © The Author(s), 2020. Published by Cambridge University Press
