
Context identification of sentences in research articles: Towards developing intelligent tools for the research community

Published online by Cambridge University Press:  10 October 2012

M. A. ANGROSH, STEPHEN CRANEFIELD and NIGEL STANGER
Affiliation:
Department of Information Science, University of Otago, Dunedin, New Zealand e-mail: angrosh@infoscience.otago.ac.nz, scranefield@infoscience.otago.ac.nz, nstanger@infoscience.otago.ac.nz

Abstract

Scientific literature is an important medium for disseminating scientific knowledge. However, in recent times, a dramatic increase in research output has created challenges for the research community. There is a growing need for tools that exploit the full content of an article and provide insightful services with value beyond quantitative measures such as impact factors and citation counts. However, the intricacies of language and thought, and the unstructured format of research articles, make such services difficult to provide. Identifying sentence contexts, which encode the role of specific sentences in advancing an article's scientific argument, can facilitate the development of intelligent tools for the research community. This paper describes our research in this direction. First, we investigate the possibility of identifying contexts associated with sentences and propose a scheme of thirteen context type definitions for sentences, based on the generic rhetorical pattern found in scientific articles. We then present the results of our experiments using sequential classifiers, specifically conditional random fields, to achieve automatic context identification. We also describe the Semantic Web application we developed to provide citation-context-based information services for the research community. Finally, we compare and analyse our results against those of similar studies and explain the distinct features of our application.

Type
Articles
Copyright
Copyright © Cambridge University Press 2012 
