Machine learning in requirements elicitation: a literature review

Published online by Cambridge University Press: 26 October 2022

Cheligeer Cheligeer
Affiliation:
Concordia Institute for Information Systems Engineering, Concordia University, Montreal, QC, Canada
Jingwei Huang
Affiliation:
Engineering Management & Systems Engineering, Old Dominion University, Norfolk, VA, USA
Guosong Wu
Affiliation:
Center for Health Informatics, Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Nadia Bhuiyan
Affiliation:
Department of Mechanical, Industrial and Aerospace Engineering, Concordia University, Montreal, QC, Canada
Yuan Xu
Affiliation:
Center for Health Informatics, Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Yong Zeng*
Affiliation:
Concordia Institute for Information Systems Engineering, Concordia University, Montreal, QC, Canada
*Author for correspondence: Yong Zeng, E-mail: zeng@ciise.concordia.ca

Abstract

A growing trend in requirements elicitation is the use of machine learning (ML) techniques to automate the cumbersome requirements-handling process. This literature review summarizes and analyzes studies that incorporate ML and natural language processing (NLP) into requirements elicitation. We answer the following research questions: (1) What requirements elicitation activities are supported by ML? (2) What data sources are used to build ML-based requirements solutions? (3) What technologies, algorithms, and tools are used to build ML-based requirements elicitation? (4) How can an ML-based requirements elicitation method be constructed? (5) What tools are available to support ML-based requirements elicitation methodology? Keywords derived from these research questions led to 975 records initially retrieved from 7 scientific search engines. Ultimately, 86 articles were selected for inclusion in the review. As the primary research finding, we identified 15 ML-based requirements elicitation tasks and classified them into four categories. We also identified and classified twelve different data sources for building a data-driven model. In addition, we categorized the techniques for constructing ML-based requirements elicitation methods into five parts: Data Cleansing and Preprocessing, Textual Feature Extraction, Learning, Evaluation, and Tools. More specifically, we identified 3 categories of preprocessing methods, 3 different feature extraction strategies, 12 different families of learning methods, 2 different evaluation strategies, and various off-the-shelf publicly available tools. Furthermore, we discussed the limitations of the current studies and proposed eight potential directions for future research.
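To make the stages named in the abstract concrete (preprocessing, textual feature extraction, learning, and evaluation), the following is a minimal sketch of how they might chain together for one illustrative task, labelling requirement statements as functional ("F") or non-functional ("NF"). It is not a method taken from the review. The stopword list, the toy sentences, the labels, and every function name are assumptions made for illustration, and multinomial naive Bayes stands in for just one of the many possible learning families.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stopword list, chosen only for this toy example.
STOPWORDS = {"the", "a", "an", "shall", "be", "to", "of", "and", "in", "is", "must"}


def preprocess(text):
    """Data cleansing and preprocessing: lowercase, tokenize, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def extract_features(tokens):
    """Textual feature extraction: bag-of-words term counts."""
    return Counter(tokens)


class NaiveBayes:
    """Learning: multinomial naive Bayes with add-one smoothing."""

    def fit(self, feature_list, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(feature_list, labels):
            self.word_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for word, count in feats.items():
                p = (self.word_counts[label][word] + 1) / (total_words + len(self.vocab))
                score += count * math.log(p)  # smoothed log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(y_true, y_pred):
    """Evaluation: fraction of correctly predicted labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def run_pipeline(train, test):
    """Chain the four stages over (text, label) pairs."""
    x_train = [extract_features(preprocess(text)) for text, _ in train]
    y_train = [label for _, label in train]
    model = NaiveBayes().fit(x_train, y_train)
    y_pred = [model.predict(extract_features(preprocess(text))) for text, _ in test]
    return y_pred, accuracy([label for _, label in test], y_pred)


# Made-up requirement statements; not data from any reviewed study.
TRAIN = [
    ("The system shall allow users to upload documents", "F"),
    ("The user must be able to export reports", "F"),
    ("The system shall send an email notification", "F"),
    ("The system shall respond within two seconds", "NF"),
    ("The application must be available 99 percent of the time", "NF"),
    ("Response time must remain fast under heavy load", "NF"),
]
TEST = [
    ("Users shall upload and export documents", "F"),
    ("The system must respond fast under load", "NF"),
]

if __name__ == "__main__":
    predictions, score = run_pipeline(TRAIN, TEST)
    print(predictions, score)
```

Each stage is a separate function so that any one of them (for example, swapping term counts for a different feature representation) could be replaced without touching the others. A data set this small says nothing about real-world performance.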

Information

Type
Review Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
Copyright © The Author(s), 2022. Published by Cambridge University Press
Table 1. Related works

Table 2. The research scope under Cooper's literature review taxonomy

Table 3. Inclusion/exclusion criteria

Table 4. Elements of data extraction table

Fig. 1. PRISMA flowchart.

Fig. 2. The number of included papers by year.

Fig. 3. An illustration of the categorization schema of the collected studies.

Fig. 4. ML-based requirement elicitation tasks.

Fig. 5. The data source for building ML-based requirement elicitation solutions.

Fig. 6. Technologies and algorithms.

Table 5. Tools mentioned by included works

Table A1. The included works (#: study id)

Table A2. The extracted tasks for the included works

Table A3. The data source categorization

Table A4. The applied textual features

Table A5. The applied learning algorithms