Empirical Study on PTMs for App Review Classification

Empirical Study Process Overview

Context:

Mobile app reviews written by users on app stores or social media are a significant resource for app developers. Analyzing app reviews has proven useful in many areas of software engineering (e.g., requirements engineering, testing). Automatic classification of app reviews requires extensive effort to manually curate a labeled dataset. When the classification purpose changes (e.g., identifying bugs versus usability issues or sentiment), new datasets must be labeled, which limits the extensibility of the developed models to new classes/tasks in practice. Recent pre-trained neural language models (PTMs) are trained on large corpora in an unsupervised manner and have found success in solving similar Natural Language Processing problems. However, the applicability of PTMs has not been explored for app review classification.

Objective:

We investigate the benefits of PTMs for app review classification compared to existing models, as well as the transferability of PTMs across multiple settings.

Method:

We empirically study the accuracy and time efficiency of PTMs compared to prior approaches, using six datasets from the literature. In addition, we investigate the performance of PTMs trained on app reviews (i.e., domain-specific PTMs). We set up different studies to evaluate PTMs in multiple settings: binary vs. multi-class classification, zero-shot classification (when new labels are introduced to the model), a multi-task setting, and classification of reviews from different resources. The datasets are manually labeled app review datasets from the Google Play Store, the Apple App Store, and Twitter. In all cases, Micro and Macro Precision, Recall, and F1-scores will be used, and we will report the time required for training and prediction with the models.
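Because class distributions in app review datasets are often imbalanced, micro- and macro-averaged scores can diverge noticeably, which is why we report both. The following minimal, self-contained sketch illustrates the difference; the labels and predictions are invented for illustration and are not drawn from the study's datasets:

```python
# Illustrative sketch (not the study's tooling): computing micro- and
# macro-averaged F1 by hand for an imbalanced three-class review labeling.

# Hypothetical gold labels and predictions:
# 0 = bug report, 1 = feature request, 2 = other
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]

def per_class_f1(y_true, y_pred, label):
    """F1 = 2*TP / (2*TP + FP + FN) for a single class label."""
    tp = sum(t == p == label for t, p in zip(y_true, y_pred))
    fp = sum(p == label and t != label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

labels = sorted(set(y_true))
# Macro-F1: unweighted mean of per-class F1 (sensitive to rare classes).
macro = sum(per_class_f1(y_true, y_pred, c) for c in labels) / len(labels)
# Micro-F1: pools all decisions; for single-label tasks it equals accuracy.
micro = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

Here the poorly predicted minority class (feature requests) drags the macro score below the micro score, which is exactly the effect reporting both averages is meant to surface.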

In this study, we aim to explore the benefits of PTMs compared to existing approaches for app review analysis, specifically app issue-classification tasks. We define app issue classification as the task of extracting useful information from users’ feedback, which can be related to requirements, release planning, or software maintenance. The extracted information helps identify different aspects of app reviews, including feature requests, aspect evaluations (e.g., feature strength, feature shortcoming, application performance), problem reports, usability, portability, reliability, privacy and security, energy-related issues, appraisals, and inquiries about the application. Our goal is to investigate the accuracy and time efficiency of PTMs for the classification of different datasets with various labels and multiple tasks (i.e., issue classification and sentiment analysis of app reviews). Therefore, experiments will be conducted in different settings. These experiments will provide baselines on the applicability of PTMs for app review analysis, including the cost of using them (in terms of the time required for predictions) and their capability to reduce the manual effort required for labeling large datasets. We expect that PTMs can achieve at least the same performance as current approaches, while also proving beneficial for multiple classification tasks.

The contributions of this study are:

  • This is the first study that explores the applicability of PTMs for automatic app issue-classification tasks compared to existing tools.

  • We will conduct an extensive comparison between four PTMs and four existing tools/approaches on six different app review datasets with different sizes and labels.

  • We are the first to explore the performance of general versus domain-specific PTMs for app review classification.

  • This is the first empirical study to examine the accuracy and efficiency of PTMs in four different settings: binary vs. multi-class classification, a zero-shot setting, a multi-task setting, and a cross-platform setting in which training data comes from one resource (e.g., an app store) and the model is tested on data from another platform (e.g., Twitter).
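To make the binary vs. multi-class setting concrete, one common way to derive a binary task from a multi-class review dataset is one-vs-rest relabeling. The snippet below is a hypothetical sketch (the review texts and label names are invented for illustration), not the study's actual preprocessing:

```python
# Hypothetical sketch: deriving a one-vs-rest binary dataset from a
# multi-class app review dataset. Texts and labels are invented examples.
reviews = [
    ("App crashes on startup", "bug"),
    ("Please add dark mode", "feature_request"),
    ("Love this app!", "praise"),
    ("Freezes when I upload a photo", "bug"),
]

def one_vs_rest(dataset, positive_label):
    """Relabel a multi-class dataset as binary: positive_label vs. 'other'."""
    return [(text, label if label == positive_label else "other")
            for text, label in dataset]

# Binary task: is this review a bug report or not?
bug_binary = one_vs_rest(reviews, "bug")
# Each example keeps its text; only the labels collapse to two classes.
```

The same multi-class dataset can thus feed both settings, letting binary and multi-class results be compared on identical review texts.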

Mohammad Abdul Hadi
MSc Student

My research focuses on the implementation of advanced Machine Learning approaches (i.e., Transfer Learning, Unsupervised Learning, and Online Learning) to solve critical Software Engineering problems.
