AOBTM - Adaptive Online Biterm Topic Modeling for Version Sensitive Short-texts Analysis

AOBTM Model

Abstract

Analysis of mobile app reviews plays an important role in requirements engineering, software maintenance, and the evolution of mobile apps. Mobile app developers check their users’ reviews frequently to clarify the issues experienced by users or to capture new issues introduced by a recent app update. App reviews are dynamic, and the topics they discuss change over time. Changes in the topics among reviews collected for different versions of an app can reveal important issues about an app update. A main technique in this analysis is topic modeling. However, app reviews are short texts, and it is challenging to unveil their latent topics over time. Conventional topic models such as Latent Dirichlet Allocation (LDA) and Probabilistic Latent Semantic Analysis (PLSA) suffer from the sparsity of word co-occurrence patterns when inferring topics for short texts. Furthermore, these algorithms cannot capture topics over numerous consecutive time-slices (or versions). Online topic modeling algorithms such as Online LDA (OLDA) and Online Biterm Topic Model (OBTM) speed up the inference of topic models for the texts collected in the latest time-slice by saving a fraction of the data from the previous time-slice. But these algorithms do not analyze the statistical data of all the previous time-slices, which can contribute to the topic distribution of the current time-slice. In this paper, we propose the Adaptive Online Biterm Topic Model (AOBTM) to model topics in short texts adaptively. AOBTM alleviates the sparsity problem in short texts and considers the statistical data for an optimal number of previous time-slices. We also propose parallel algorithms to automatically determine the optimal number of topics and the best number of previous versions that should be considered in the topic inference phase.
Automatic evaluation on collections of app reviews and real-world short text datasets confirms that AOBTM can find more coherent topics and outperforms the state-of-the-art baselines. For reproducibility of the results, we open-source all scripts.
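
To make the two ideas in the abstract more concrete, the following is a minimal illustrative sketch, not the AOBTM implementation. It shows how biterms (unordered word pairs that co-occur in one short text) can be counted per app version, and how counts from several previous time-slices could be blended into the current one. The function names, the whitespace tokenizer, and the fixed weights are all assumptions made for illustration; the paper determines the contribution of previous time-slices adaptively.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return the unordered word pairs (biterms) found in one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def biterm_counts(reviews):
    """Count biterms over all reviews of one time-slice (app version).

    Assumption: reviews are tokenized by lowercasing and splitting on
    whitespace; a real pipeline would also remove stop words, etc.
    """
    counts = Counter()
    for review in reviews:
        counts.update(extract_biterms(review.lower().split()))
    return counts


def weighted_history(slice_counts, weights):
    """Blend per-slice biterm counts using one weight per time-slice.

    The weights here are hypothetical constants supplied by the caller,
    standing in for the adaptive weighting described in the abstract.
    """
    blended = Counter()
    for counts, weight in zip(slice_counts, weights):
        for biterm, n in counts.items():
            blended[biterm] += weight * n
    return blended


# Two hypothetical versions of an app, each with its own reviews.
v1 = biterm_counts(["login fails", "app crashes"])
v2 = biterm_counts(["app crashes often"])

# The older version contributes half as much as the current one.
blended = weighted_history([v1, v2], [0.5, 1.0])
print(blended[("app", "crashes")])  # 1.5
```

Counting word pairs rather than single words per document is what lets biterm-based models cope with the sparse co-occurrence patterns of short texts, while the per-slice weights show where statistics from earlier versions would enter the estimate for the current version.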

Publication
In IEEE International Conference on Software Maintenance and Evolution (ICSME)
Mohammad Abdul Hadi
MSc Student

My research focuses on the implementation of advanced Machine Learning approaches (i.e., Transfer Learning, Unsupervised Learning, and Online Learning) to solve critical Software Engineering problems.

Fatemeh H Fard
Assistant Professor

I am interested in the applications of data science and machine learning for software engineering. Specifically, I am working on the detection and prediction of defects and anomalous behaviour in software. This also requires using big data analysis in practice.
