  • Franz-Xaver Geiger
  • Ivano Malavolta
  • Luca Pascarella
  • Fabio Palomba
  • Dario Di Nucci
  • Alberto Bacchelli
Empirical studies on the engineering of Android apps need to be based on open datasets and tools to allow comparisons, improve generalizability, and enable replicability. However, obtaining a suitable dataset is difficult, which slows down empirical research on this topic.
In this paper, we help overcome this challenge by presenting the first self-contained, publicly available dataset weaving together spread-out data sources about real-world, open-source Android apps. Our dataset is encoded as a graph-based database and contains the following information about 8,431 real open-source Android apps: (i) metadata about their GitHub projects, (ii) Git repositories with full commit history, and (iii) metadata extracted from the Google Play store, such as app ratings and permissions. The dataset is available in Docker images to ease adoption.
Original language: English
Title of host publication: Proceedings of the 15th ACM/IEEE International Conference on Mining Software Repositories, Data Showcase Track
Publisher: ACM / IEEE
Number of pages: 4
ISBN (Electronic): 978-1-4503-5716-6
State: Published - 2 Mar 2018
Event: 15th International Conference on Mining Software Repositories - Gothenburg, Sweden
Duration: 28 May 2018 to 29 May 2018
Conference number: 15


Conference: 15th International Conference on Mining Software Repositories
Abbreviated title: MSR

    Research areas

  • Android, Mining Software Repositories, Dataset

ID: 36663690