
Automated Subject Identification Using the Universal Decimal Classification: The ANN Approach


Affiliations
1 Research Scholar, Department of Library and Information Science, University of North Bengal, Raja Rammohunpur – 734014, West Bengal, India
2 Professor, Department of Library and Information Science, University of North Bengal, Raja Rammohunpur – 734014, West Bengal, India
     



Abstract

The Universal Decimal Classification (UDC) is a widely used controlled vocabulary for representing the subjects of documents. Text categorization assigns a text to a category, and the UDC's pairing of a notation with a text label fits that task directly. Using machine learning techniques together with the UDC, the present work aims to develop a recommender system for end users (library professionals) that classifies documents automatically under the UDC scheme. The work is designed to determine and construct complex class numbers using UDC syntax. A corpus of documents classified with the UDC scheme serves as the training dataset; the documents were classified by humans proficient in classificatory approaches. The study uses the BERT model and the KNIME software: the classified dataset is used to fine-tune a pre-trained BERT model, which yields a semi-automatic classification model. The results show that the model was constructed with high accuracy and a high Area Under the Curve (AUC) value, although its predictions had a low accuracy rate. The study suggests that if the model is trained explicitly with each concept annotated, and if the full licensed version of the UDC class numbers becomes available, there is greater potential to develop an automated, freely faceted classification scheme for practical use.

Keywords

Automatic Classification, BERT Model, KNIME, Multi-Label Classification, UDC (Universal Decimal Classification).
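
The abstract describes a multi-label setting in which predicted UDC notations are combined into a complex class number. The following is a minimal sketch of that final assembly step only, not the authors' implementation: the function name `compose_udc_number`, the per-label scores, and the 0.5 threshold are all hypothetical. It relies on one fact about UDC syntax, namely that the colon is the connecting sign for a simple relation between two subjects (for example, 17:7 for ethics in relation to art).

```python
# Illustrative sketch only. The helper name, the scores, and the threshold
# are assumptions made for this example; none of them come from the study.

def compose_udc_number(scores: dict[str, float], threshold: float = 0.5) -> str:
    """Join every UDC notation scoring at or above the threshold with ':'.

    The colon is the UDC connecting sign for a simple relation between
    subjects. Returns an empty string when no notation reaches the threshold.
    """
    selected = [notation for notation, score in scores.items() if score >= threshold]
    # Plain string sort keeps the result independent of dictionary order.
    # Real UDC filing order is more involved than this.
    return ":".join(sorted(selected))


# Hypothetical per-label scores, such as a multi-label classifier might emit.
example_scores = {"17": 0.91, "7": 0.78, "59": 0.12}
print(compose_udc_number(example_scores))
```

In a recommender setting such as the one the abstract outlines, a string built this way would be a suggestion for the library professional to review rather than a final class number.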
About The Authors

Aditi Roy
Research Scholar, Department of Library and Information Science, University of North Bengal, Raja Rammohunpur – 734014, West Bengal, India

Saptarshi Ghosh
Professor, Department of Library and Information Science, University of North Bengal, Raja Rammohunpur – 734014, West Bengal, India









DOI: https://doi.org/10.17821/srels%2F2023%2Fv60i2%2F170963