
Identifying Stylometric Characteristics of Domain Specific Texts Using Classification Algorithms: A Study of Library Science Articles Published in 2020


Affiliations
1 Department of Library and Information Science, University of North Bengal, Raja Rammohunpur – 734013, West Bengal, India
     



Abstract

Academic writing plays an essential role in communicating the cognitive aspects of the human mind, and Natural Language Processing (NLP) tools make it possible to examine the linguistic knowledge encoded in such writing. However, writing patterns and linguistic characteristics differ geographically. The primary purpose of this study is to understand the global writing patterns and linguistic diversity of research articles in the Library and Information Science (LIS) domain. The corpus was drawn from four Scopus-indexed, open-access library and information science journals, published in India and outside India, and covers articles published in 2020. Syntactic complexity in 147 text documents was measured using the Tool for the Automatic Analysis of Syntactic Sophistication and Complexity (TAASSC). The corpus was further examined using a Structural Equation Model (SEM) to determine the causal relationships among independent variables such as syntactic features and readability scores. The results show differences in the patterning of syntactic features at both the global and national levels. The study also shows how linguistic diversity is underplayed in research writing and supports an understanding of writing patterns through cross-country comparisons. Finally, the paper employs model-based reasoning to identify global and national latent variables.

Keywords

Corpus Linguistics, Noun Phrase Complexity, Readability, Structural Equation Model, Syntactic Sophistication.
About The Authors

Mousumi Saha
Department of Library and Information Science, University of North Bengal, Raja Rammohunpur – 734013, West Bengal
India

Saptarshi Ghosh
Department of Library and Information Science, University of North Bengal, Raja Rammohunpur – 734013, West Bengal
India











DOI: https://doi.org/10.17821/srels%2F2023%2Fv60i3%2F171027