AOBTM

AOBTM Process Overview

Analysis of mobile app reviews plays an important role in requirements engineering, software maintenance, and the evolution of mobile apps. Mobile app developers check their users’ reviews frequently to clarify the issues users experience or to capture new issues introduced by a recent app update. App reviews are dynamic, and the topics they discuss change over time. Changes in topics across the reviews collected for different versions of an app can reveal important issues with an update. Topic modeling is a main technique for this analysis. However, app reviews are short texts, and it is challenging to unveil their latent topics over time. Conventional topic models such as Latent Dirichlet Allocation (LDA) and Probabilistic Latent Semantic Analysis (PLSA) suffer from the sparsity of word co-occurrence patterns when inferring topics for short texts. Furthermore, these algorithms cannot capture topics over numerous consecutive time-slices (or versions). Online topic modeling algorithms such as Online LDA (OLDA) and Online Biterm Topic Model (OBTM) speed up inference for the texts collected in the latest time-slice by saving a fraction of data from the previous time-slice. But these algorithms do not analyze the statistical data of all the previous time-slices, which can contribute to the topic distribution of the current time-slice.

In this paper, we propose the Adaptive Online Biterm Topic Model (AOBTM) to model topics in short texts adaptively. AOBTM alleviates the sparsity problem in short texts and considers the statistical data for an optimal number of previous time-slices. We also propose parallel algorithms to automatically determine the optimal number of topics and the best number of previous versions that should be considered in the topic inference phase. Automatic evaluation on collections of app reviews and real-world short text datasets confirms that AOBTM finds more coherent topics and outperforms the state-of-the-art baselines. For reproducibility of the results, we open-source all scripts.
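To illustrate why biterm-based models cope better with sparse co-occurrence in short texts, the following minimal sketch extracts biterms (unordered word pairs) from each review and counts them over the whole collection instead of per document. The function names `extract_biterms` and `biterm_counts` and the toy reviews are illustrative assumptions, not part of the released AOBTM scripts.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) in one short document."""
    # Sorting each pair makes ("login", "app") and ("app", "login") identical.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def biterm_counts(documents):
    """Count biterms over the whole corpus rather than per document."""
    counts = Counter()
    for tokens in documents:
        counts.update(extract_biterms(tokens))
    return counts


# Toy, already-tokenized reviews (hypothetical data for illustration only).
reviews = [
    ["app", "crashes", "login"],
    ["login", "fails", "update"],
    ["app", "crashes", "update"],
]

counts = biterm_counts(reviews)
for biterm, n in counts.most_common(3):
    print(biterm, n)
```

Each review alone yields only a handful of pairs, but pooling them across the corpus gives the model denser co-occurrence statistics to learn topics from.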

In this paper, we propose a new adaptive online topic model for short texts that takes the varying contributions of previous versions into account. We refer to this novel model as the Adaptive Online Biterm Topic Model (AOBTM). AOBTM inherits the characteristics of BTM to deal with the data sparsity issue. It is an online algorithm that scales with the growing volume of frequently generated data. AOBTM also lets the statistics of each previous version contribute differently to the topic distributions of the current version of the dataset. In addition, we employ a preprocessing technique that yields better top contributing key-terms, which helps the manual investigation of the inferred topics. Our contributions are listed below:

  • We propose a novel method called AOBTM for version-sensitive content analysis of short texts. This method adaptively combines the topic distributions of a selected number of prior versions to generate the topic distributions of the current version.

  • We propose two parallel algorithms: the first identifies an optimal number of topics to derive in the latest version, and the second identifies the optimal number of previous versions to consider for adaptive aggregation of statistical data.

  • To encourage replicability, we make all scripts, code, and graphs available to the community.
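The adaptive combination described in the first contribution can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the AOBTM implementation: the function `aggregate_prior`, the `base_beta` smoothing value, and the per-version weights are all hypothetical, and the text above does not specify how AOBTM derives the weights.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [w / total for w in weights]


def aggregate_prior(previous_counts, weights, base_beta=0.01):
    """Blend topic-word counts of previous versions into one prior matrix.

    previous_counts: list of K x W count matrices, one per previous version.
    weights: one non-negative weight per previous version (assumed given;
             how AOBTM computes them is not stated in the text above).
    base_beta: small smoothing constant added to every entry (assumption).
    """
    if len(previous_counts) != len(weights):
        raise ValueError("need exactly one weight per previous version")
    norm = normalize(weights)
    n_topics = len(previous_counts[0])
    n_words = len(previous_counts[0][0])
    prior = [[base_beta] * n_words for _ in range(n_topics)]
    for counts, w in zip(previous_counts, norm):
        for k in range(n_topics):
            for v in range(n_words):
                prior[k][v] += w * counts[k][v]
    return prior


# Two previous versions, 2 topics, 3 vocabulary words (toy numbers).
version_1 = [[4, 0, 0], [0, 2, 2]]
version_2 = [[0, 4, 0], [2, 0, 2]]

# The more recent version is given three times the influence of the older one.
prior = aggregate_prior([version_1, version_2], weights=[1, 3])
print(prior)
```

The resulting matrix could serve as the starting prior when inferring topics for the current version, so that more relevant earlier versions shape the new topics more strongly than less relevant ones.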

We conducted experiments on app review datasets and a Twitter dataset with a large number of records to evaluate the performance of AOBTM against five baseline algorithms. We also integrated AOBTM into IDEA, a state-of-the-art online app-review analysis framework, for comparison. Our results show that the topics captured by AOBTM are more coherent than those extracted by the baseline methods.

Mohammad Abdul Hadi
MSc Student

My research focuses on the implementation of advanced Machine Learning approaches (i.e., Transfer Learning, Unsupervised Learning, and Online Learning) to solve critical Software Engineering problems.
