
A semantic similarity-based method to support the conversion from EXPRESS to OWL

Published online by Cambridge University Press:  03 November 2023

Yan Liu*
Affiliation:
Department of Computer Science, College of Engineering, Shantou University, Shantou 515063, China
Qingquan Jian
Affiliation:
School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, Jilin, China
Claudia M. Eckert
Affiliation:
School of Engineering and Innovation, The Open University, Walton Hall, Milton Keynes MK7 6AA, UK
*
Corresponding author: Yan Liu; Email: yanliu@stu.edu.cn

Abstract

Product data sharing is fundamental for collaborative product design and development. Although the STandard for Exchange of Product model data (STEP) enables this by providing a unified data definition and description, it lacks the ability to provide a more semantically enriched product data model. Many researchers suggest converting STEP models to ontology models and propose rules for mapping EXPRESS, the descriptive language of STEP, to Web Ontology Language (OWL). In most research, this mapping is a manual process which is time-consuming and prone to misunderstandings. To support this conversion, this research proposes an automatic method based on natural language processing techniques (NLP). The similarities of language elements in the reference manuals of EXPRESS and OWL have been analyzed in terms of three aspects: heading semantics, text semantics, and heading hierarchy. The paper focusses on translating between language elements, but the same approach has also been applied to the definition of the data models. Two forms of the semantic analysis with NLP are proposed: a Combination of Random Walks (RW) and Global Vectors for Word Representation (GloVe) for heading semantic similarity; and a Decoding-enhanced BERT with disentangled attention (DeBERTa) ensemble model for text semantic similarity. The evaluation shows the feasibility of the proposed method. The results not only cover most language elements mapped by current research, but also identify the mappings of the elements that have not been included. It also indicates the potential to identify the OWL segments for the EXPRESS declarations.

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
Copyright © The Author(s), 2023. Published by Cambridge University Press

Introduction

STandard for Exchange of Product model data (STEP) has long provided a reliable format for the exchange of data in product development processes. As its descriptive language EXPRESS lacks the richness to describe formal semantics for more complex knowledge representation with both geometry (such as size and shape) and non-geometry (such as function and behavior) information (Barbau et al., Reference Barbau, Krima, Rachuri, Narayanan, Fiorentini, Foufou and Sriram2012; Qin et al., Reference Qin, Lu, Qi, Liu, Zhong, Scott and Jiang2017; Kwon et al., Reference Kwon, Monnier, Barbau and Bernstein2020), many researchers suggest converting STEP models to Web Ontology Language (referred to as OWL) models to explicitly express, represent, and exchange semantic information. To assist the conversion process, this paper proposes a method that automatically locates the corresponding language elements of EXPRESS and OWL based on semantic analysis of the official reference manuals. It can also be applied to the identification of OWL segments for the EXPRESS declarations on the data model level.

Product data sharing and exchange supports collaborative product design and development (Eslami et al., Reference Eslami, Lakemond and Brusoni2018; Andres et al., Reference Andres, Poler and Sanchis2021). To enable this exchange, the International Organization for Standardization (ISO) has developed STEP, an international standard referenced as ISO 10303, to provide a mechanism capable of describing products, independent of any particular product modeling environment. The product data models in STEP are described by EXPRESS, a formal information requirement specification language which consists of language elements allowing unambiguous data definitions and their constraints. EXPRESS has been implemented in many computer-aided systems (e.g., CAD, CAE, and CAM) and product data/lifecycle management systems.

To include more semantic information, researchers have suggested using models described in OWL. OWL proposed by the World Wide Web Consortium (W3C) is an ontology language for the Semantic Web with a formally defined meaning. It is designed to represent rich and complex knowledge about objects and their relations by using classes, properties, individuals, and data values. Product data models based on OWL have been applied to enhance the semantic interoperability in the manufacturing industry (Ramos, Reference Ramos2015; Alkahtani et al., Reference Alkahtani, Choudhary, De and Harding2019; Fraga et al., Reference Fraga, Vegetti and Leone2020).

Translating EXPRESS-based models into OWL ontologies could enhance product data sharing and exchange with rich semantic information, but this process requires a mapping between the two languages. Some researchers have suggested solutions, for example, Krima et al. (Reference Krima, Barbau, Fiorentini, Sudarsan and Sriram2009), Barbau et al. (Reference Barbau, Krima, Rachuri, Narayanan, Fiorentini, Foufou and Sriram2012), and Pauwels and Terkaj (Reference Pauwels and Terkaj2016). However, most research is based on the experts' understanding of the two languages and finds the mapping relations manually. The experts have to look up the language elements among hundreds of items, which makes the mapping process effort-intensive and error-prone.

The mapping process identifies pairs of language elements with similar usage from the two modeling languages, whose definitions are described in the respective language manuals. In this sense, the mapping is a semantic matching problem over the texts of the two manuals. The same semantic matching also applies to the data models built upon the language elements. This can be handled by natural language processing (NLP) techniques. This paper proposes an EXPRESS-to-OWL 2 framework, which analyzes the similarities in terms of three aspects, that is, heading semantics, text semantics, and heading hierarchy. A method combining Random Walks (RW) and Global Vectors for Word Representation (GloVe) models was designed to calculate the heading similarities. A model based on Decoding-enhanced BERT with disentangled attention (DeBERTa) is suggested for text similarity computation. The similarities from heading semantics, text semantics, and heading hierarchy are then aggregated into an overall similarity for a pair of language elements by an Analytic Hierarchy Process–Simple Additive Weighting (AHP-SAW) method. The top four potential mapping elements of OWL 2 are provided to support the conversion from EXPRESS to OWL, which narrows down the look-up scope. The application shows that the results cover most mappings already identified in the literature while also providing new ones. The proposed model is also applied to mapping on the data model level. A data model consists of EXPRESS declarations or OWL expressions, which provide the template and govern the format of product data in a physical file. An example of a linear bearing with a stamping outer ring is used to illustrate the identification of OWL expressions for EXPRESS declarations.

The rest of the paper is organized as follows. Section “Related works” discusses the literature on the mapping between EXPRESS and OWL 2. Section “Fundamentals for method development” introduces the research approach and the fundamental elements of the proposed method, that is, the language manuals and the RW, GloVe, and DeBERTa models. The proposed framework is presented in Section “Multiple similarity-based method for mapping”, followed by its evaluation in Section “Evaluation and mapping results”. Section “Conclusion” concludes the paper.

Related works

The conversion of EXPRESS-based models to OWL ontologies enhances the exchange of semantic information. Researchers have proposed mapping rules mainly based on a manual process to link the language elements. The earliest work that we have found is the research of Schevers and Drogemuller (Reference Schevers and Drogemuller2005), which takes Industry Foundation Classes (IFC), a particular EXPRESS schema, as an example and converts several EXPRESS elements to OWL, such as Entity to Class and List to List. Later, more structured mapping approaches were proposed by Agostinho et al. (Reference Agostinho, Dutra, Jardim-Gonçalves, Ghodous and Steiger-Garção2007), Beetz et al. (Reference Beetz, van Leeuwen and de Vries2008), and Krima et al. (Reference Krima, Barbau, Fiorentini, Sudarsan and Sriram2009), where EXPRESS elements are grouped into Schema, Entity, Attribute, simple, constructed, and aggregated data types and the corresponding OWL elements are presented for the EXPRESS elements under each group. Following Beetz's work in transforming IFC to an “ifcOWL” ontology, Terkaj and Šojić (Reference Terkaj and Šojić2015) and Pauwels et al. (Reference Pauwels, Krijnen, Terkaj and Beetz2017) provide a more detailed conversion procedure. These efforts on developing ifcOWL keep the resulting OWL ontology as close as possible to the original EXPRESS schema of IFC (Pauwels et al., Reference Pauwels, Krijnen, Terkaj and Beetz2017) and have been used to build knowledge-based models for storing geometric information (Farias et al., Reference Farias, Roxin and Nicolle2018; González et al., Reference González, Piñeiro, Toledo, Arnay and Acosta2021; Wagner et al., Reference Wagner, Sprenger, Maurer, Kuhn and Rüppel2022). Different from ifcOWL, “OntoSTEP” from Krima et al. (Reference Krima, Barbau, Fiorentini, Sudarsan and Sriram2009) serves a broader scope, based on which Barbau et al. 
(Reference Barbau, Krima, Rachuri, Narayanan, Fiorentini, Foufou and Sriram2012) developed a plug-in for ontology editor Protégé. OntoSTEP integrates the concepts and semantic relationships into the geometry-enhanced ontology model and has been used to build a semantic-rich product model for a range of applications, such as program-generation for robotic manufacturing systems (Zheng et al., Reference Zheng, Xing, Wang, Qin, Eynard, Li, Bai and Zhang2022), knowledge graph establishment for product quality assurance (Kwon et al., Reference Kwon, Monnier, Barbau and Bernstein2020), and feature extraction for assembly sequence planning (Gong et al., Reference Gong, Shi, Liu, Qian and Zhang2021).

Different researchers have applied slightly different interpretations to the mapping. As illustrated in Table 1, the first column lists the main language elements of EXPRESS and the remaining columns show the corresponding OWL elements proposed by different researchers. “–” denotes an unaddressed element. It can be seen that some conversions have converged, such as Entity to Class, while others are handled differently. For example, Agostinho et al. (Reference Agostinho, Dutra, Jardim-Gonçalves, Ghodous and Steiger-Garção2007) consider direct conversion for simple data types, as they have equivalents in OWL, like string to string. However, this is not always the case. Number and real are represented in slightly different ways in OWL, and Krima et al. (Reference Krima, Barbau, Fiorentini, Sudarsan and Sriram2009) translated them into decimal and double. Instead of a one-to-one mapping, Beetz et al. (Reference Beetz, van Leeuwen and de Vries2008) and Terkaj and Šojić (Reference Terkaj and Šojić2015) create new classes for these types by setting properties. The differences between the mapping approaches lie in the understanding and choices of the researchers. In addition, some elements appear not to be covered at all. For example, EXPRESS also has logical operators (NOT, AND, OR, and XOR), which are not mentioned in the reviewed research.

Table 1. The mapping results of the existing research

The conversion requires the researchers to study the whole reference manuals, keep the concepts in mind, and recall where the corresponding elements are located in the reference manuals. This process not only requires effort but is also prone to misunderstandings. A computerized method could address this by automatically locating the corresponding language element pairs of EXPRESS and OWL 2. However, only limited research exists on automatic mapping. This paper aims to fill this gap by proposing a method based on NLP techniques to locate the language elements and further to match the expressions based upon the languages.

Fundamentals for method development

NLP techniques enable building an intelligent method to support the conversion from EXPRESS to OWL and reduce the effort of manual mapping. This section first describes the steps of the research and how the two language reference manuals were analyzed to show the suitability of applying NLP techniques. At the end, the NLP models used in this research are introduced.

Overview of the research approach

As illustrated in Figure 1, this research started with identifying the research topic through examining the literature, and then built the translation method based on NLP techniques, followed by evaluations. While Figure 1 shows the logical sequence of the steps, the research was carried out iteratively, with the evaluation leading to improvements in the method.

Figure 1. Research overview.

The method development started with the analysis of the two languages and the analysis of the structure of reference manuals. This informed the selection of NLP techniques. Figure 2 gives an overview of the steps taken in developing the method.

Figure 2. The steps of developing the mapping method.

The method evaluation and application were carried out on three levels:

  • Level 1: evaluates the performance of the method. The Pearson correlation coefficient and the Spearman correlation coefficient were used to measure the capability of the NLP-based models for calculating the similarity. Three datasets (MC, RG, and Agirre) were used for word mappings and the STSBenchmark dataset for sentence mappings (see Sections “The combined RW and GloVe models” and “The DeBERTa ensemble”).

  • Level 2: checks the mapping results of the language elements of EXPRESS and OWL against the literature (see Section “The mapping results of language elements and discussion”).

  • Level 3: applies the proposed method to a data model which presents the mapping results for 12 EXPRESS declarations about high-level product definition (see Section “Application to the mapping of data models”).

The language reference manual analysis

The mapping process finds the pairs of similar language expressions, and thus, the documents that describe the definitions and usage of the two languages are analyzed. The language reference manuals compared in this research are “Industrial automation systems and integration – Product data representation and exchange – Part 11: Description methods: The EXPRESS language reference manual” (ref. no. ISO 10303-11:2004(E)) and “OWL 2 Web Ontology Language Structural Specification and Functional-Style Syntax” (2nd Edition). Both documents define the languages, specifying their elements in terms of representation, meaning, and usage. The two manuals are the input of the proposed method for the semantic similarity analysis.

The EXPRESS manual consists of 16 sections and the OWL 2 manual has 11 sections, excluding the appendix. Section/subsection headings and the descriptive text under these headings are analyzed for each language and subsequently compared. The headings consist of words and short phrases, reflecting not only the essence of the sections’ content but also the language elements that allow an unambiguous data definition and specification of constraints to construct the syntax. For example, both headings “Data types” in the EXPRESS document and “Datatype Maps” in the OWL 2 document show that the sections define the data types. Their subheadings “String data type” and “Strings”, respectively, describe a particular data type – String. The semantic similarity of the heading directly points out the mapping between the elements.

The descriptive text under each section/subsection describes the language elements in detail. Take subtype in EXPRESS and subclass in OWL 2 for example. From the two words, it is not obvious that the two can be mapped to each other. The following two paragraphs from the manuals suggest a possible mapping because both indicate a “child–parent” hierarchical relation.

EXPRESS allows for the specification of entities as subtypes of other entities, where a subtype entity is a specialization of its supertype. This establishes an inheritance (that is, subtype/supertype) relationship between the entities in which the subtype inherits the properties (that is, attributes and constraints) of its supertype. – from section 9.2.3 of EXPRESS manual

A subclass axiom SubClassOf (CE1 CE2) states that the class expression CE1 is a subclass of the class expression CE2. Roughly speaking, this states that CE1 is more specific than CE2. Subclass axioms are a fundamental type of axioms in OWL 2 and can be used to construct a class hierarchy. – from section 9.1.1 of OWL 2 manual

Both the headings and the text can be analyzed by NLP techniques but need different models, because headings are short whereas text consists of sentences. An examination of the manuals revealed that for the elements with similar functions their headings are also at similar levels of hierarchy, for example subtype and subclass. The heading hierarchy is, therefore, used as an extra dimension for mapping analysis.

RW and GloVe for heading similarity analysis

The headings of the manual sections are composed of single or few words. The semantic matching of two headings can be handled by NLP models for word similarity tasks. Random Walks (RW) (Goikoetxea et al., Reference Goikoetxea, Soroa and Agirre2015) and Global Vectors for Word Representation (GloVe) (Pennington et al., Reference Pennington, Socher and Manning2014) are two such models with outstanding performance.

The RW model

The RW model is obtained through two stages, that is, establishing the corpus by random walks over WordNet (Miller, Reference Miller1995) and training on the established corpus using Word2Vec (Mikolov et al., Reference Mikolov, Chen, Corrado and Dean2013). WordNet is a large lexical database of English designed at Princeton University, which is a network of meaningfully related words and concepts. Nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets) and these synsets are further interrelated by means of conceptual-semantic and lexical relations. In this sense, the adjacent words in the corpus established from WordNet are interlinked with semantics and thus the word vectors trained on the corpus can better capture the semantic relationships between words. The random walk algorithm for establishing the corpus is described in Algorithm 1 based on Goikoetxea et al. (Reference Goikoetxea, Soroa and Agirre2015).

Two neural network models from Word2Vec, that is CBOW and Skip-gram, are then trained with several iterations of random walks over WordNet (Mikolov et al., Reference Mikolov, Chen, Corrado and Dean2013). CBOW predicts the current word based on the context, whereas Skip-gram predicts surrounding words given the current word (Mikolov et al., Reference Mikolov, Chen, Corrado and Dean2013).
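The corpus-building stage can be sketched with a small adjacency dictionary standing in for WordNet's synset links; `random_walk_corpus` and its parameters (`restart_prob`, `max_len`) are illustrative choices, not the paper's exact Algorithm 1:

```python
import random

def random_walk_corpus(graph, n_walks=1000, restart_prob=0.15, max_len=20, seed=0):
    """Emit pseudo-sentences by walking a graph of semantically related words.

    graph: dict mapping a word to a list of its related words (a stand-in for
    WordNet's synset links). The resulting corpus can be fed to Word2Vec.
    """
    rng = random.Random(seed)
    nodes = list(graph)
    corpus = []
    for _ in range(n_walks):
        current, walk = rng.choice(nodes), []
        for _ in range(max_len):
            walk.append(current)
            # End the pseudo-sentence with a fixed restart probability,
            # or when the current node has no neighbours to walk to.
            if rng.random() < restart_prob or not graph[current]:
                break
            current = rng.choice(graph[current])
        corpus.append(' '.join(walk))
    return corpus
```

Each emitted line contains only semantically linked words, which is why embeddings trained on such a corpus capture word relatedness well.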

The GloVe model

The GloVe model produces a word vector space with meaningful substructure through training on global word-word co-occurrence counts. It is based on the simple observation that ratios of word-word co-occurrence probabilities have the potential for encoding some form of meaning. The training objective of GloVe is to learn word vectors of high consistency with the established matrix of word-word co-occurrence counts. The model is established in three stages:

  • Stage 1: construct the co-occurrence matrix, denoted X = [Xij], whose entry Xij is the number of times word j occurs in the context of word i. A decreasing weighting function is used to reflect the influence of word distance on the contribution of the words’ relationship to one another. In other words, very distant word pairs are expected to contain less relevant information and thus word pairs that are m words apart contribute 1/m to the total count.

  • Stage 2: establish the symmetry exchange relationship between word vectors and the co-occurrence matrix. Equation (1) approximately expresses this relationship where wi and $\tilde{w}_j$ are the word vectors to be learned, and bi and $\tilde{b}_j$ are the bias for wi and $\tilde{w}_j$, respectively, to restore the symmetry:

    (1)$$w_i^T \tilde{w}_j + b_i + \tilde{b}_j = \log ( X_{ij}). $$
  • Stage 3: construct and optimize the loss function and train the model. The loss function in Eq. (2) is derived from Eq. (1), where V is the size of the vocabulary and f(Xij) is the weighting function. AdaGrad (Duchi et al., Reference Duchi, Hazan and Singer2011), a subgradient method, is used to train the model by optimizing the loss function. The sum of $w_i$ and $\tilde{w}_i$ gives the resulting word vectors:

    (2)$$J = \sum\limits_{i, j = 1}^V {\,f( X_{ij}) } ( w_i^T \tilde{w}_j + b_i + \tilde{b}_j-\log ( X_{ij}) ) ^2.$$
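The loss in Eq. (2) can be sketched in a few lines of NumPy; `glove_loss` is an illustrative name, and the weighting-function parameters `x_max` and `alpha` follow the defaults of the original GloVe paper rather than anything stated here:

```python
import numpy as np

def glove_loss(W, W_tilde, b, b_tilde, X, x_max=100.0, alpha=0.75):
    """Eq. (2): weighted squared error between w_i^T w~_j + b_i + b~_j and log X_ij.

    W, W_tilde: (V, d) word and context vectors; b, b_tilde: (V,) biases;
    X: (V, V) matrix of co-occurrence counts.
    """
    # Weighting function f(X_ij): ramps up with the count, capped at 1.
    f = np.where(X < x_max, (X / x_max) ** alpha, 1.0)
    f = np.where(X > 0, f, 0.0)             # zero-count pairs contribute nothing
    logX = np.log(np.where(X > 0, X, 1.0))  # placeholder 1.0 avoids log(0); masked by f
    err = W @ W_tilde.T + b[:, None] + b_tilde[None, :] - logX
    return float(np.sum(f * err ** 2))
```

In practice the vectors and biases would be updated by AdaGrad to drive this loss down, as the text describes.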

DeBERTa for text similarity analysis

DeBERTa (He et al., Reference He, Liu, Gao and Chen2021) is a pre-trained neural language model, which enhances the BERT (Devlin et al., Reference Devlin, Chang, Lee and Toutanova2019) model using two novel techniques: a disentangled attention mechanism and an enhanced mask decoder. Two main stages can be summarized for establishing the model.

  • Stage 1: improve the architecture of BERT. The disentangled attention mechanism makes use of relative positions as well as the contents of a word pair. It represents each word by two vectors encoding its content and position, and computes the attention weights among words using disentangled matrices. The disentangled self-attention with relative position bias is calculated using Eq. (3). $\tilde{A}_{i, j}$ is the attention score from token i to token j. $Q_i^c$ and $K_j^c$ are the i-th and j-th rows of the projected content matrices $Q^c$ and $K^c$. $K_{\delta(i,j)}^r$ and $Q_{\delta(j,i)}^r$ are the $\delta(i,j)$-th and $\delta(j,i)$-th rows of the projected relative position matrices $K^r$ and $Q^r$, with respect to the relative distances $\delta(i,j)$ and $\delta(j,i)$.

    (3)$$\eqalign{ {\widetilde{A}} _{i, j} &= Q_i^c K_j^{cT} + {Q_i^c K_{\delta ( i, j) }^r}^T + Q_{\delta ( j, i) }^r K_j^{cT}, \cr H_o &= {\rm softmax}\left({\displaystyle{{\widetilde A } \over {\sqrt {3{d}} }}} \right)V_c.} $$
  • As the decoding component of DeBERTa, the enhanced mask decoder then captures the absolute positions of words as complementary information when decoding the masked words. In this way, both relative and absolute positions as well as contents are used in the training stage.

  • Stage 2: train the model. A virtual adversarial training algorithm, namely Scale-invariant-Fine-Tuning, is used for the training, which applies the perturbations to the normalized word embeddings to improve the stability.
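Eq. (3) can be sketched with NumPy on toy matrices. `disentangled_attention`, the shapes, and the bucketed `delta` matrix are illustrative simplifications of DeBERTa's actual implementation, not its real code:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def disentangled_attention(Qc, Kc, Vc, Qr, Kr, delta):
    """Eq. (3): content-to-content + content-to-position + position-to-content.

    Qc, Kc, Vc: (n, d) projected content matrices; Qr, Kr: (2k, d) projected
    relative-position matrices; delta: (n, n) integer matrix of bucketed
    relative distances delta(i, j).
    """
    n, d = Qc.shape
    c2c = Qc @ Kc.T                                 # Q_i^c . K_j^c
    c2p = np.einsum('id,ijd->ij', Qc, Kr[delta])    # Q_i^c . K_{delta(i,j)}^r
    p2c = np.einsum('ijd,jd->ij', Qr[delta.T], Kc)  # Q_{delta(j,i)}^r . K_j^c
    A = c2c + c2p + p2c
    return softmax(A / np.sqrt(3 * d)) @ Vc         # H_o = softmax(A / sqrt(3d)) V_c
```

The 1/√(3d) scaling reflects that three dot-product terms, each of dimension d, are summed before the softmax.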

Multiple similarity-based method for mapping

Based on the NLP models, this paper proposes a framework for language element mapping between EXPRESS and OWL 2, which consists of five parts as shown in Figure 3.

Figure 3. The proposed framework for mapping EXPRESS and OWL.

The details of the following concepts are presented in this section:

  • Document preprocessing shortens the long sentences in the two language manuals and prepares the inputs for the NLP models.

  • Heading similarity analysis calculates the semantic similarities of the headings in both manuals using RW and GloVe.

  • Text similarity analysis calculates the semantic similarities of the text under each section/subsection where a DeBERTa ensemble is suggested.

  • Heading hierarchical similarity analysis calculates the similarities of the heading levels.

  • Similarity aggregation generates the overall similarity scores of the language elements, combined with the weights of different types of similarities.

Document preprocessing

In both manuals, language elements are mainly described by text. There are also figures, tables, and codes which are mostly examples to help understanding. These have been removed from the documents for this research.

Each section or subsection targets a particular language element type or language element. The length of the text for each section/subsection varies considerably, with some sections exceeding 500 words (see Fig. 4a). This increases the difficulty of semantic similarity calculation by NLP models, as a very long text could lead to inaccurate semantic extraction. To make the length of each section/subsection text in the two manuals relatively short and balanced, this research adopts the Lead-3 algorithm to extract the representative text without losing semantic information. Lead-3 is a baseline method for text summarization and relies on the observation that the first three sentences of a text can generally summarize its main semantic information. Lead-3 was chosen because it is simple to implement and capable of obtaining effective results (Nallapati et al., Reference Nallapati, Zhai and Zhou2017). After processing the manuals, most of the texts are within 100 words, as shown in Figure 4b.

Figure 4. Word counts (a) before and (b) after text summarization.
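A minimal Lead-3 sketch follows; the regex sentence splitter is a naive assumption, not necessarily the tokenizer used in the paper:

```python
import re

def lead3(text, k=3):
    """Return the first k sentences of a text as its summary (Lead-3 baseline)."""
    # Split on ., !, or ? followed by whitespace -- a deliberately simple rule.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    return ' '.join(sentences[:k])
```

Applied to each section/subsection, this keeps the opening sentences that typically define the language element and discards the longer tail of detail.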

The preprocessed documents from the two manuals are used for analyzing semantic similarities. The words of the headings are collected as input to the heading similarity analysis model; the summarized paragraphs are the input to the text similarity analysis model; and heading levels are the input to the heading hierarchical similarity analysis model.

Analysis of heading similarity based on RW and GloVe

Two aspects of a language element's heading impact the mapping of the manual headings:

  • Keywords: RW captures word meaning through the semantic context in which words are embedded; for example, both subtype and subclass indicate a hierarchical relation.

  • Synonyms: GloVe performs well on word analogies, such as subtype and subclass or entity and class.

To include both aspects, this paper combines RW and GloVe to analyze the semantic similarities of headings. The headings from both documents are the input to the RW and GloVe models. Two word-embedding matrices are then generated, denoted $W^{\rm RW}$ and $W^{\rm GloVe}$, respectively. Given two words $w_i$ and $w_j$, their similarity is calculated as follows.

  • Step 1: locate the word in the word-embedding matrix. As Eq. (4) indicates, the word vectors of the two input words are found from the matrices W RW and W GloVe.

    (4)$$\eqalign{& vec_i^{{\rm RW}} = W^{{\rm RW}}( w_i) , \;vec_j^{{\rm RW}} = W^{{\rm RW}}( w_j) , \;\cr & vec_i^{{\rm GloVe}} = W^{{\rm GloVe}}( w_i) , \;vec_j^{{\rm GloVe}} = W^{{\rm GloVe}}( w_j), } $$

where $vec_i^{\rm RW}$ and $vec_j^{\rm RW}$ are the word vectors of $w_i$ and $w_j$ in the matrix $W^{\rm RW}$, and $vec_i^{\rm GloVe}$ and $vec_j^{\rm GloVe}$ are the word vectors of $w_i$ and $w_j$ in the matrix $W^{\rm GloVe}$.

  • Step 2: compute the similarity in the individual models. For element i in EXPRESS and element j in OWL 2, two semantic similarity values comparing their headings are computed by Eq. (5). $sim_{\rm RW}$ and $sim_{\rm GloVe}$ denote the values from the RW model and the GloVe model, respectively.

    (5)$$\eqalign{& sim_{_{{\rm RW}} }^{ij} = {\rm cosin}( vec_i^{{\rm RW}} , \;vec_j^{{\rm RW}} ), \cr & sim_{_{{\rm GloVe}} }^{ij} = {\rm cosin}( vec_i^{{\rm GloVe}} , \;vec_j^{{\rm GloVe}} ), } $$

where cosin(vi,vj) is the function calculating the cosine similarity between the two vectors vi and vj.

  • Step 3: compute the final heading similarity by averaging the two semantic similarity values from step 2.

    (6)$${ HSim}_{ij} = \displaystyle{1 \over 2}( {sim}_{{\rm RW}}^{ij} + {sim}_{_{{\rm GloVe}} }^{ij} ). $$

When there are multiple words in the headings, the word vectors in step 1 are the averaged vectors of all the words.
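Steps 1-3, including the averaging of word vectors for multi-word headings, can be sketched as follows; `heading_similarity` and the dictionary-based embedding lookup are illustrative assumptions:

```python
import numpy as np

def heading_similarity(heading_i, heading_j, W_rw, W_glove):
    """Eqs. (4)-(6): average of RW and GloVe cosine similarities of two headings.

    W_rw, W_glove: dicts mapping a word to its embedding vector; multi-word
    headings are represented by the mean of their word vectors.
    """
    def embed(W, words):
        return np.mean([W[w] for w in words], axis=0)     # Step 1 (+ averaging)

    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    wi, wj = heading_i.lower().split(), heading_j.lower().split()
    sim_rw = cos(embed(W_rw, wi), embed(W_rw, wj))        # Step 2, Eq. (5)
    sim_glove = cos(embed(W_glove, wi), embed(W_glove, wj))
    return 0.5 * (sim_rw + sim_glove)                     # Step 3, Eq. (6)
```

For example, "Data types" from the EXPRESS manual and "Datatype Maps" from the OWL 2 manual would each be reduced to one averaged vector per embedding space before the cosine comparison.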

Text similarity analysis based on DeBERTa

The text under the headings explains the language elements in terms of their definitions, usage, and characteristics. After preprocessing the documents with the Lead-3 algorithm, the length of the text in each section/subsection is within 100 words. Similarity analysis on the sentence level is carried out with a DeBERTa model. The DeBERTa model performs well in classification and regression tasks but tends to produce overconfident predictions when handling samples out of the distribution. To overcome this problem and produce accurate text similarity results, the DeBERTa ensemble proposed by Jian and Liu (Reference Jian and Liu2021) is applied, which adds a prediction layer to the original DeBERTa. This additional layer is set up with two neurons, whose outputs are the mean of the predicted values and the variance. To enhance the discrimination of the learned results, the negative log-likelihood loss function shown as Eq. (7) is employed, where σ2 is the variance, μ is the mean of the predicted values, and y is the actual labeled value.

(7)$${\rm NLL} = \displaystyle{{\log \sigma ^2} \over 2} + \displaystyle{{( y-\mu ) ^2} \over {2\sigma ^2}}.$$

The models with the added layer are then gathered into an ensemble to improve reliability; they are trained with the same data but set with different initial weights. As illustrated in the framework (see Fig. 3), the ensemble consists of five models of the same structure, balancing the consumed resources against the expected performance. In this case, the text similarity is the aggregated result of the five models, as shown in Eq. (8), where $TSim_t^{ij}$ is the similarity score of the two sentences i and j from the t-th model.

(8)$${TSim}_{ij} = \displaystyle{1 \over 5}\sum\limits_{t = 1}^5 {{TSim}_t^{ij} }. $$
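Eqs. (7) and (8) translate directly into code; `nll` and `ensemble_similarity` are illustrative names for this sketch:

```python
import math

def nll(mu, sigma2, y):
    """Eq. (7): negative log-likelihood for predicted mean mu and variance sigma2."""
    return math.log(sigma2) / 2 + (y - mu) ** 2 / (2 * sigma2)

def ensemble_similarity(scores):
    """Eq. (8): average the similarity scores of the ensemble members."""
    return sum(scores) / len(scores)
```

The variance term in Eq. (7) penalizes a model that is confidently wrong, which is what counteracts the overconfidence of a single DeBERTa model.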

Heading hierarchical similarity analysis

The heading hierarchy reflects the granularity of language elements. According to the examination of the documents, if the functions of a pair of elements from the two languages are similar, their headings are usually at similar levels. In this case, this paper suggests taking the heading hierarchical similarity into consideration for the mapping between EXPRESS and OWL 2. The analysis follows three steps.

  • Step 1: obtain the heading hierarchy sets. From the structure of the official documents, two sets (denoted $headinglevel_{EXPRESS}$ for EXPRESS and $headinglevel_{OWL}$ for OWL 2, respectively) are established, which consist of the heading levels of all the sections/subsections.

  • Step 2: compute the heading level difference. Given element i in EXPRESS and element j in OWL 2, the difference is computed by Eq. (9), where headinglevelEXPRESS[i] and headinglevelOWL[j] are the heading levels of the sections that describe element i and j, respectively.

    (9)$${diff}_{ij} = \vert {headinglevel_{EXPRESS}[ i] -headinglevel_{OWL}[ j] } \vert. $$
  • Step 3: calculate the heading hierarchical similarity. Linear normalization is used to limit the value to [0,1]. The similarity is calculated by Eq. (10), where $d^+$ is the maximum level difference and $d^-$ is the minimum level difference.

    (10)$${LSim}_{ij} = \displaystyle{{d^ + {-}{diff}_{ij}} \over {d^ + {-}d^-}}.$$
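The three steps reduce to a one-line computation; `level_similarity` is an illustrative name:

```python
def level_similarity(level_express, level_owl, d_max, d_min=0):
    """Eqs. (9)-(10): absolute heading-level difference, linearly normalised to [0, 1]."""
    diff = abs(level_express - level_owl)  # Eq. (9)
    return (d_max - diff) / (d_max - d_min)  # Eq. (10)
```

For instance, two elements described at the same heading level score 1, and a pair at the maximum observed level difference scores 0.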

Similarity aggregation based on fuzzy AHP-SAW

The three types of similarity values contribute to the overall similarity between the language elements, but their contributions are not equally important. Thus, their weights are determined by the Analytic Hierarchy Process (AHP) (Saaty, 1980) first and then aggregated by a simple additive weighting (SAW) method. Both AHP and SAW are multiple criteria decision-making (MCDM) methods. AHP derives the weights from pairwise comparisons, which makes the relative importance easier to elicit consistently than assigning weights directly. SAW produces an aggregated result by combining the weights of the criteria with the values under the criteria. It is a simple and practical method, offering a transparent and understandable calculation process for the ranking results (Kaliszewski and Podkopaev, 2016; Wang, 2019). The following steps calculate the weights of the three types of similarities and the overall similarity values of the language elements. Steps 1 and 2 come from AHP, while step 3 draws on SAW.

  • Step 1: establish the similarity pairwise comparison matrix F = [cst]n × n. n is the number of similarity types (n = 3 in our case), and cst is a value from 1 to 9 representing the relative importance of type s over type t.

  • Step 2: obtain the weights of the similarity types. The weight of similarity type s is computed by Eq. (11), where the geometric mean is applied.

    (11)$$w_s = \displaystyle{{\root n \of {\prod\nolimits_{t = 1}^n {c_{st}} } } \over {\sum\nolimits_{s = 1}^n {\root n \of {\prod\nolimits_{t = 1}^n {c_{st}} } } }}.$$
  • Step 3: calculate the overall similarity values. Given element i in EXPRESS and element j in OWL 2, their similarity value is calculated by the SAW method shown in Eq. (12). With the three types of similarity in this research, Eq. (12) instantiates to Eq. (13), where w_head, w_text, and w_level are the weights of the heading, text, and hierarchy similarities, respectively.

    (12)$${OSim}_{ij} = \sum\limits_{s = 1}^n {w_s \times {sim}_s^{ij} }, $$
    (13)$${OSim}_{ij} = w_{{\rm head}}{HSim}_{ij} + w_{{\rm text}}{TSim}_{ij} + w_{{\rm level}}{LSim}_{ij}.$$
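The weighting and aggregation steps can be sketched as follows. The comparison matrix below is illustrative (a perfectly consistent example, not the matrix used in the paper); the weights follow the geometric-mean formula of Eq. (11) and the aggregation follows Eq. (12):

```python
import math

def ahp_weights(C):
    """Eq. (11): geometric-mean weights from a pairwise comparison matrix."""
    n = len(C)
    gm = [math.prod(row) ** (1.0 / n) for row in C]
    total = sum(gm)
    return [g / total for g in gm]

def overall_similarity(weights, sims):
    """Eq. (12): simple additive weighting of the similarity values."""
    return sum(w * s for w, s in zip(weights, sims))

# Illustrative matrix: heading judged twice as important as text and
# four times as important as hierarchy.
C = [[1, 2, 4],
     [1/2, 1, 2],
     [1/4, 1/2, 1]]
w_head, w_text, w_level = ahp_weights(C)  # ≈ 4/7, 2/7, 1/7
print(overall_similarity([w_head, w_text, w_level], [0.9, 0.7, 0.5]))
```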

Evaluation and mapping results

To test the feasibility of element mapping, the two semantic analysis models are evaluated first. This section then presents the mapping results from the proposed method and compares them with those from existing research. The approach is also applied to the data model of a linear bearing.

The combined RW and GloVe models

As the headings in the language manuals consist of nouns, three benchmark datasets targeting noun similarity were used to verify the performance of the model.

Tables 2 and 3 show the Pearson and Spearman correlation coefficients obtained on the three datasets; the two indicators are commonly used to measure how well NLP models estimate similarity. The last column lists the mean value. With a mean Pearson correlation of 0.854 and a mean Spearman correlation of 0.880, the combined model performs better than the individual RW and GloVe models on the three noun datasets.
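Both indicators follow standard formulas. A stdlib-only sketch of the Pearson coefficient (the Spearman coefficient is the Pearson coefficient applied to the rank-transformed scores); the data points are illustrative:

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Perfectly linearly related scores give a coefficient of 1.0.
print(round(pearson([0.1, 0.4, 0.7], [0.2, 0.8, 1.4]), 6))  # → 1.0
```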

Table 2. Pearson correlation results

Table 3. Spearman correlation results

The DeBERTa ensemble

The efficiency of the DeBERTa ensemble for text similarity has been demonstrated by comparison with the BERT, DeBERTa, and M-MaxLSTM-CNN (Tien et al., 2019) models on the STSBenchmark dataset. This dataset contains 8628 labeled sentence pairs extracted from image captions, news headlines, and user forums (Cer et al., 2017). Each sentence pair is labeled with a score from 0 to 5, denoting how similar the two sentences are in semantic meaning.

Table 4 shows the Spearman and Pearson correlation coefficient scores. The performance of the proposed model is close to DeBERTa and better than the other two models.

Table 4. The scores of the DeBERTa ensemble

The mapping results of language elements and discussion

Table 5 shows the mapping results of EXPRESS to OWL 2 obtained by applying the proposed multiple similarity-based method. The first column lists the language elements of EXPRESS. The second column presents the OWL 2 mappings from Krima et al. (2009) and Barbau et al. (2012); their work provides the most mappings among the reviewed research (see Table 1) and was therefore chosen to validate the method. The third column presents the four OWL sections with the top similarity scores as the mapping candidates. The remaining columns show the similarity scores of the headings (HSim), text (TSim), heading hierarchies (LSim), and the overall score (OSim), with their weights below. Values are shown to three decimal places in Table 5; more digits were used in the calculations.
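Selecting the four candidate sections with the top overall scores is a straightforward ranking step. A sketch with hypothetical section labels and OSim values:

```python
def top_candidates(osim_scores, k=4):
    """Return the k OWL sections with the highest overall similarity.

    osim_scores: mapping from OWL section label to its OSim value.
    """
    ranked = sorted(osim_scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]

# Hypothetical OSim values for one EXPRESS element:
scores = {"Section A": 0.62, "Section B": 0.71, "Section C": 0.55,
          "Section D": 0.48, "Section E": 0.67}
print(top_candidates(scores))  # four highest-scoring sections, best first
```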

Table 5. Mapping results for simple elements

For some EXPRESS elements that express complex information, for example bag and set, no one-to-one mapping to OWL exists, and a combination of several OWL elements is required. The availability of multiple candidates makes such combinations feasible. The results are shown in Table 6.

Table 6. Mapping results for complex elements

It can be seen from the tables that most of the OWL 2 elements mapped in Krima's work are covered by the top four candidates identified by this research (highlighted in bold), except for array, select, and AND, whose mapping candidate sections rank lower. The method also finds potential mapping sections for elements not included in Krima's work, as highlighted in italics in the tables. For example, the top candidate for binary data is Section "4.5 Binary Data", where the language element xsd:hexBinary is located; the mappings of INVERSE and CONSTANT are two further examples. For the logical data type, although no feasible mapping appears in the first four candidate sections, the follow-up candidate "4.4 Boolean Values" provides a solution: both have TRUE and FALSE in their domains, and because OWL 2 follows the Open World Assumption, under which every undefined item is considered unknown, this corresponds to the third domain value of the logical data type in EXPRESS, that is, UNKNOWN.

Application to the mapping of data models

The language elements are used to establish data models. ISO 10303 provides a representation of product information with EXPRESS declarations to enable data exchange. For example, ISO 10303-41: Integrated generic resources: Fundamentals of product description and support (Part41 for short hereafter) specifies the generic product description resources, generic management resources, and support resources. The following entity is extracted from Part41; it collects the definitions of a product.

ENTITY product_definition_formation;
  id : identifier;
  description : OPTIONAL text;
  of_product : product;
UNIQUE
  UR1 : id, of_product;
END_ENTITY;

OWL has no standard for defining product data. This allows users to define their own models but also makes it difficult to map an EXPRESS data model to an OWL data model. A solution is to extract from OWL files the parts whose semantics are similar to the EXPRESS declarations. This research applies the proposed model to compare the similarities in an example of a linear bearing with a stamping outer ring (see the upper right part of Fig. 5). The STEP file was downloaded from an online CAD part database, LinkAble PARTcommunity (https://linkable.partcommunity.com/3d-cad-models/sso/khm-%E5%86%B2%E5%8E%8B%E5%A4%96%E5%9C%88%E5%9E%8B%E7%9B%B4%E7%BA%BF%E8%BD%B4%E6%89%BF-%E4%B8%8A%E9%9A%86samlo?info=samlo%2Flinear_bushings_ball_retainers%2Fkhm.prj&cwid=7526), and describes the category and geometry of a linear bearing. Its description conforms to the predefined data models provided by the ISO 10303 series of standards. The file is encoded in the format of ISO 10303-21: Implementation methods: Clear text encoding of the exchange structure (Part 21 for short), according to ISO 10303-203: Application protocol: Configuration controlled design. ISO 10303-203 specifies an application protocol for exchanging configuration-controlled 3D product design data of mechanical parts and assemblies; other parts of ISO 10303, such as Part41, constitute provisions of this standard. Figure 5 shows a segment of the file. This research uses the EXPRESS declarations in the example file, for example entities such as product_definition_formation, as inputs on the EXPRESS side.

Figure 5. A segment of the STEP file of the example product bearing.

The ontologies of the part were defined in OWL. Since there is no fixed format for describing a product, different descriptions can exist. For example, the property id : identifier can be defined in OWL either as an object property linking to a class named identifier (Fig. 6a) or as a data property of string type (Fig. 6b). This research takes the two types of definition as the inputs from the OWL side.

Figure 6. Example OWL expressions for entity product_definition_formation: (a) using an object property and (b) using a data property.

The proposed model was run to match the segments in OWL to the EXPRESS declarations. In the first round, the OWL file defined in the way of Figure 6a was considered. Table 7 shows the OWL segments with the top four similarities for entity product_definition_formation and entity product_definition_formation_with_specified_source. Appendix A presents the mapping results for 12 EXPRESS declarations about high-level product definition. In the second round, the OWL file defined as in Figure 6b was considered (the results are listed in Appendix B). The results indicate that the best-matched OWL segments have the highest similarities. In most cases, the parent class or subclass (e.g., product_definition_formation to product_definition_formation_with_specified_source or vice versa) or a sibling class (product_definition_context and product_concept_context) is identified as the secondary mapping. These are followed by classes with similar properties; for example, product_definition_formation and product both have the property id, and the former is also linked to the latter via the property of_product.

Table 7. Mapping result of OWL segments to two EXPRESS entities in the example

Conclusion

Converting EXPRESS-based models to OWL-described models helps transfer product data with semantic information. The conversion is usually based on manually mapping the language elements of EXPRESS and OWL 2. To our knowledge, this paper is the first attempt to locate potential language elements automatically by applying NLP techniques. The proposed method first analyzes the semantic similarities of section headings in the two language reference manuals using a combination of RW and GloVe models, then calculates the semantic similarities of the text in each section/subsection with the DeBERTa ensemble. The heading hierarchical similarity is also considered. The three types of similarity are aggregated into an overall score by the AHP-SAW method, which combines the similarity values with their weights. The results not only cover the currently mapped language elements, but also identify mappings of elements that have not previously been included.

The proposed method narrows down the look-up scope by providing four potential OWL language element candidates for each EXPRESS language element, which reduces the effort of finding the mapping manually and the potential mistakes arising from different human understandings. The ideal solution would be a one-to-one mapping; however, this is hard to achieve due to the discrepancies between the inheritance mechanisms of EXPRESS and OWL. The discrepancies also appear when translating the properties of an EXPRESS entity at the data model level: OWL can handle the properties either as object properties by adding new classes or as data properties linking to appropriate built-in datatypes. More accurate and appropriate mappings might be achieved by training the models with more datasets and refining the NLP models to diminish the discrepancies. The underlying principle of the proposed models is to train them first by showing which items (words, phrases, sentences, and paragraphs) are similar and to what extent. For example, the SemEval-2014 Task 3 dataset can be used for semantic similarity across lexical levels, including paragraph to sentence, sentence to phrase, phrase to word, and word to sense (Jurgens et al., 2014). The more example datasets the models learn from, the better the results they can produce, especially when the models are fed with various representation formats of similar expressions.

Though targeted at the mapping between EXPRESS and OWL, the two NLP models for semantic similarity analysis were trained on general English datasets. This indicates the potential for generalized application, so the models might also be used for converting other descriptive modeling languages, for example XML to OWL. This could also consolidate product information expressed in different modeling languages. This paper focuses on finding the corresponding language elements, which is the fundamental step in converting a STEP model to an ontology model; the full conversion requires replacing the statements defined by the language elements in a particular format. The proposed model was also applied at the data model level, showing the potential of identifying OWL expressions for EXPRESS declarations. This could help experts extract mapping patterns and then design an inference engine for the translation. It could also enable the use of machine learning techniques in which the program learns the conversion rules from the identified EXPRESS declarations and OWL expressions, and further translates the data models or even the populations of data models, such as a physical STEP file. In this research, only the EXPRESS declarations in the example STEP file are mapped; more declarations will be extracted and tested in future work.

Another limitation is that this research has not been applied to the population of data models. Theoretically, the model would work in the same way at this level, because the free-form text strings consist of words and sentences, whose semantics the NLP models are capable of analyzing. However, when translating the EXPRESS-based models to OWL models of Layer 3, these free-form text strings should be kept as they are. For example, the following is from the Part 21 file of the illustrative example in the paper.

#171 = APPLICATION_CONTEXT('configuration controlled 3D designs of mechanical parts and assemblies');

The EXPRESS declaration of APPLICATION_CONTEXT is:

ENTITY application_context;
  application : text;
INVERSE
  context_elements : SET [1:?] OF application_context_element FOR frame_of_reference;
END_ENTITY;

The OWL expression is:

  • Declaration(Class(:application_context))

  • Declaration(DataProperty(:application_context_hasApplication))

  • DataPropertyDomain(:application_context_hasApplication :application_context)

  • DataPropertyRange(:application_context_hasApplication xsd:string)

  • Declaration(ObjectProperty(:application_context_hasContextElement))

  • ObjectPropertyDomain(:application_context_hasContextElement :application_context)

  • ObjectPropertyRange(:application_context_hasContextElement ObjectMinCardinality(1 :application_context_hasContextElement :application_context_element))

  • InverseObjectProperties(:application_context_element_hasFrameOfReference :application_context_hasContextElement)

When translating this "#171" instance of APPLICATION_CONTEXT to OWL, an individual should be created first; the text "configuration controlled 3D designs of mechanical parts and assemblies" should then be kept and filled in as a data property value.

  • Declaration(NamedIndividual(<khm-14:application_context>))

  • ClassAssertion(<application_context> <khm-14:application_context>)

  • DataPropertyAssertion(<application_context_hasApplication> <khm-14:application_context> "configuration controlled 3D designs of mechanical parts and assemblies"^^xsd:string)
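The instance translation illustrated above amounts to simple templating. The following sketch reuses the names and namespace from the example; it is an illustration of the idea, not the paper's implementation:

```python
def instance_to_owl(cls, data_property, literal, ns="khm-14"):
    """Emit OWL functional-syntax assertions for a Part 21 instance whose
    single text attribute becomes a data property value (illustrative)."""
    ind = f"<{ns}:{cls}>"  # the named individual created for the instance
    return [
        f"Declaration(NamedIndividual({ind}))",
        f"ClassAssertion(<{cls}> {ind})",
        f'DataPropertyAssertion(<{data_property}> {ind} "{literal}"^^xsd:string)',
    ]

for line in instance_to_owl(
        "application_context",
        "application_context_hasApplication",
        "configuration controlled 3D designs of mechanical parts and assemblies"):
    print(line)
```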

The mappings on the population of data models would also be an interesting piece of future work.

Acknowledgements

The authors would like to thank the anonymous reviewers and the editors for the valuable comments that helped them in improving the quality of this article.

Funding

This work was supported by the National Natural Science Foundation of China (Grant number: 62002031) and Scientific Research Start-up Fund of Shantou University (Grant number: NTF21042).

Competing interests

The authors declare none.

Yan Liu received the bachelor degree in software engineering from Harbin Institute of Technology, China; master degree in software engineering from Shanghai Jiaotong University, China; and the PhD degree from the School of Engineering and Innovation, The Open University, U.K. She is an associate professor at Shantou University, Guangdong, China. Her current research interests include intelligent decision-making, knowledge representation and acquisition, and product development process modelling.

Qingquan Jian received his master degree from the School of Computer Science and Technology, Changchun University of Science and Technology, China. He is working as an engineer in a technology company, Guangdong, China. His research work focuses on natural language processing techniques and their applications.

Claudia M. Eckert received the Ph.D. degree in design from The Open University, Milton Keynes, U.K., in 1997, and the M.Sc. degree in applied artificial intelligence from the University of Aberdeen, Aberdeen, U.K., in 1990. She is a professor of design at The Open University, Milton Keynes, U.K. Her research interests include understanding and supporting design processes, and in particular, engineering change and processes planning. She is also working on comparisons between design domains.

Appendix A

This section presents the mapping results for 12 EXPRESS declarations about high-level product definition, where the OWL models are defined in the way of Figure 6a, that is, the properties of the EXPRESS entity are defined as object properties in OWL. For clear presentation, the mapped OWL segments are labeled in Table A1 and the results are listed in Table A2.

Table A1. The label of OWL segments (in the format of Fig. 6a)

Table A2. The mapping results for 12 EXPRESS declarations

Appendix B

This section presents the mapping results for 12 EXPRESS declarations about high-level product definition, where the OWL models are defined in the way of Figure 6b, that is, the properties of the EXPRESS entity are defined as data properties in OWL. For clear presentation, the mapped OWL segments are labeled in Table B1 and the results are listed in Table B2.

Table B1. The label of OWL segments (in the format of Fig. 6b)

Table B2. The mapping results for 12 EXPRESS declarations

References

Agirre, E, Alfonseca, E, Hall, K, Kravalova, J, Pasca, M and Soroa, A (2009) A study on similarity and relatedness using distributional and WordNet-based approaches. Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the ACL, Boulder, Colorado, pp. 19–27.
Agostinho, C, Dutra, M, Jardim-Gonçalves, R, Ghodous, P and Steiger-Garção, A (2007) EXPRESS to OWL Morphism: Making Possible to Enrich ISO10303 Modules. London: Springer London.
Alkahtani, M, Choudhary, A, De, A and Harding, JA (2019) A decision support system based on ontology and data mining to improve design using warranty data. Computers & Industrial Engineering 128, 1027–1039.
Andres, B, Poler, R and Sanchis, R (2021) A data model for collaborative manufacturing environments. Computers in Industry 126, 103398.
Barbau, R, Krima, S, Rachuri, S, Narayanan, A, Fiorentini, X, Foufou, S and Sriram, RD (2012) OntoSTEP: enriching product model data using ontologies. Computer-Aided Design 44, 575–590.
Beetz, J, van Leeuwen, J and de Vries, B (2008) IfcOWL: a case of transforming EXPRESS schemas into ontologies. Artificial Intelligence for Engineering Design, Analysis and Manufacturing 23, 89–101.
Cer, D, Diab, M, Agirre, E and Specia, L (2017) SemEval-2017 Task 1: semantic textual similarity multilingual and cross-lingual focused evaluation.
Devlin, J, Chang, M-W, Lee, K and Toutanova, K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
Duchi, J, Hazan, E and Singer, Y (2011) Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12, 2121–2159.
Eslami, MH, Lakemond, N and Brusoni, S (2018) The dynamics of knowledge integration in collaborative product development: evidence from the capital goods industry. Industrial Marketing Management 75, 146–159.
Farias, TMd, Roxin, A and Nicolle, C (2018) A rule-based methodology to extract building model views. Automation in Construction 92, 214–229.
Fraga, AL, Vegetti, M and Leone, HP (2020) Ontology-based solutions for interoperability among product lifecycle management systems: a systematic literature review. Journal of Industrial Information Integration 20, 100176.
Goikoetxea, J, Soroa, A and Agirre, E (2015) Random walks and neural network language models on knowledge bases. Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1434–1439.
Gong, H, Shi, L, Liu, D, Qian, J and Zhang, Z (2021) Construction and implementation of extraction rules for assembly hierarchy information of a product based on OntoSTEP. Procedia CIRP 97, 514–519.
González, E, Piñeiro, JD, Toledo, J, Arnay, R and Acosta, L (2021) An approach based on the ifcOWL ontology to support indoor navigation. Egyptian Informatics Journal 22, 1–13.
He, P, Liu, X, Gao, J and Chen, W (2021) DeBERTa: decoding-enhanced BERT with disentangled attention. 2021 International Conference on Learning Representations.
Jian, Q and Liu, Y (2021) A mapping method between EXPRESS and OWL based on text similarity analysis. 2021 International Conference on Electronic Information Engineering and Computer Science, IEEE.
Jurgens, D, Pilehvar, MT and Navigli, R (2014) SemEval-2014 Task 3: cross-level semantic similarity. International Conference on Computational Linguistics.
Kaliszewski, I and Podkopaev, D (2016) Simple additive weighting—A metamodel for multiple criteria decision analysis methods. Expert Systems with Applications 54, 155–161.
Krima, S, Barbau, R, Fiorentini, X, Sudarsan, R and Sriram, RD (2009) OntoSTEP: OWL-DL ontology for STEP. Gaithersburg, MD 20899, USA, National Institute of Standards and Technology, NISTIR 7561.
Kwon, S, Monnier, LV, Barbau, R and Bernstein, WZ (2020) Enriching standards-based digital thread by fusing as-designed and as-inspected data using knowledge graphs. Advanced Engineering Informatics 46, 101102.
Mikolov, T, Chen, K, Corrado, G and Dean, J (2013) Efficient estimation of word representations in vector space. arXiv:1301.3781.
Miller, GA (1995) WordNet: a lexical database for English. Communications of the ACM 38, 39–41.
Miller, GA and Charles, WG (1991) Contextual correlates of semantic similarity. Language and Cognitive Processes 6, 1–28.
Nallapati, R, Zhai, F and Zhou, B (2017) SummaRuNNer: a recurrent neural network based sequence model for extractive summarization of documents. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence. San Francisco, California, USA: AAAI Press, pp. 3075–3081.
Pauwels, P and Terkaj, W (2016) EXPRESS to OWL for construction industry: towards a recommendable and usable ifcOWL ontology. Automation in Construction 63, 100–133.
Pauwels, P, Krijnen, T, Terkaj, W and Beetz, J (2017) Enhancing the ifcOWL ontology with an alternative representation for geometric data. Automation in Construction 80, 77–94.
Pennington, J, Socher, R and Manning, CD (2014) GloVe: Global Vectors for Word Representation. 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar.
Qin, Y, Lu, W, Qi, Q, Liu, X, Zhong, Y, Scott, PJ and Jiang, X (2017) Status, comparison, and issues of computer-aided design model data exchange methods based on standardized neutral files and web ontology language file. Journal of Computing and Information Science in Engineering 17, 010801.
Ramos, L (2015) Semantic web for manufacturing, trends and open issues. Computers & Industrial Engineering 90, 444–460.
Rubenstein, H and Goodenough, JB (1965) Contextual correlates of synonymy. Communications of the ACM 8, 627–633.
Saaty, TL (1980) The Analytic Hierarchy Process: Planning, Priority Setting, Resource Allocation. New York: McGraw-Hill.
Schevers, H and Drogemuller, R (2005) Converting the industry foundation classes to the Web Ontology Language. 2005 First International Conference on Semantics, Knowledge and Grid.
Terkaj, W and Šojić, A (2015) Ontology-based representation of IFC EXPRESS rules: an enhancement of the ifcOWL ontology. Automation in Construction 57, 188–201.
Tien, NH, Le, NM, Tomohiro, Y and Tatsuya, I (2019) Sentence modeling via multiple word embeddings and multi-level comparison for semantic textual similarity. Information Processing & Management 56, 102090.
Wagner, A, Sprenger, W, Maurer, C, Kuhn, TE and Rüppel, U (2022) Building product ontology: core ontology for linked building product data. Automation in Construction 133, 103927.
Wang, Y-J (2019) Interval-valued fuzzy multi-criteria decision-making based on simple additive weighting and relative preference relation. Information Sciences 503, 319–335.
Zheng, C, Xing, J, Wang, Z, Qin, X, Eynard, B, Li, J, Bai, J and Zhang, Y (2022) Knowledge-based program generation approach for robotic manufacturing systems. Robotics and Computer-Integrated Manufacturing 73, 102242.