Clustering of Scientific Publications Based on Field of Expertise Using Latent Dirichlet Allocation and Normalized PSO-K-means
Abstract
Validating a lecturer's claimed field of expertise usually means examining their scholarly publications, a task that demands substantial domain knowledge and time because many documents must be assessed. To address this challenge, this study builds a model that groups documents by field of expertise. The K-means clustering algorithm is used to group the documents according to the lecturers' fields of expertise. To make this process more efficient, Latent Dirichlet Allocation (LDA) is used to reduce the dimensionality of the data, and Particle Swarm Optimization (PSO) is used to determine the optimal initial cluster centers for K-means. The model clustered the scholarly publications with a silhouette coefficient of 0.42, and using PSO to identify the initial cluster centers improved the silhouette coefficient by 5.56%. The model's performance was evaluated by comparing the resulting clusters with the claimed fields of expertise, yielding a 75% matching rate and a 25% non-matching rate.
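The PSO-seeded K-means step and the silhouette evaluation described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the `docs` vectors are hypothetical topic proportions standing in for LDA output, the PSO fitness (within-cluster sum of squared errors) and the hyperparameters `w`, `c1`, `c2`, `n_particles`, and `iters` are assumptions, and the normalization step named in the title is omitted because the abstract does not describe it.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: dist2(p, centers[j]))
            for p in points]

def sse(points, centers):
    """Assumed PSO fitness: sum of squared distances to the nearest center."""
    return sum(min(dist2(p, c) for c in centers) for p in points)

def pso_init(points, k, n_particles=10, iters=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for k initial centers; each particle encodes k centers as one flat vector."""
    rng = random.Random(seed)
    dim = len(points[0])
    unflat = lambda v: [v[i * dim:(i + 1) * dim] for i in range(k)]
    # Start every particle from k randomly chosen data points.
    pos = [[x for c in rng.sample(points, k) for x in c] for _ in range(n_particles)]
    vel = [[0.0] * (k * dim) for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [sse(points, unflat(p)) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(k * dim):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + cognitive pull + social pull.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            cost = sse(points, unflat(pos[i]))
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[i][:], cost
    return unflat(gbest)

def kmeans(points, centers, iters=50):
    """Lloyd's algorithm started from the given centers."""
    for _ in range(iters):
        labels = assign(points, centers)
        new = []
        for j, c in enumerate(centers):
            members = [p for p, l in zip(points, labels) if l == j]
            # Keep the old center if a cluster ends up empty.
            new.append([sum(col) / len(members) for col in zip(*members)] if members else c)
        if new == centers:
            break
        centers = new
    return assign(points, centers), centers

def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters score 0."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist2(p, q) ** 0.5)
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                                   # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in by_cluster.values())     # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical document-topic proportions (toy stand-in for LDA output).
docs = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.8, 0.1],
        [0.2, 0.7, 0.1], [0.1, 0.1, 0.8], [0.1, 0.2, 0.7]]
labels, centers = kmeans(docs, pso_init(docs, k=3))
score = silhouette(docs, labels)
print(labels, round(score, 3))
```

Running the same `kmeans` call with centers drawn at random instead of from `pso_init` gives a baseline silhouette value to compare against, which is the kind of comparison behind the reported improvement. The toy vectors are far better separated than real publication data, so the score printed here says nothing about the study's figures.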
Authors
Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License (CC BY 4.0) that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.