  • Franz-Xaver Geiger
  • Ivano Malavolta
  • Luca Pascarella
  • Fabio Palomba
  • Dario Di Nucci
  • Alberto Bacchelli
Empirical studies on the engineering of Android apps need to be based on open datasets and tools to allow comparisons, improve generalizability, and enable replicability. However, obtaining a suitable dataset is difficult, which slows down empirical research on this topic.
In this paper, we help overcome this challenge by presenting the first self-contained, publicly available dataset that weaves together scattered data sources about real-world, open-source Android apps. Our dataset is encoded as a graph-based database and contains the following information about 8,431 real open-source Android apps: (i) metadata about their GitHub projects, (ii) Git repositories with full commit history, and (iii) metadata extracted from the Google Play store, such as app ratings and permissions. The dataset is available as Docker images to ease adoption.
Original language: English
Title of host publication: Proceedings - 2018 ACM/IEEE 15th International Conference on Mining Software Repositories, MSR 2018
Publisher: ACM / IEEE
Number of pages: 4
ISBN (Electronic): 978-1-4503-5716-6
ISBN (Print): 9781450357166
Publication status: Published - 28 May 2018
Event: 15th International Conference on Mining Software Repositories - Gothenburg, Sweden
Duration: 28 May 2018 to 29 May 2018
Conference number: 15

Publication series

Name: Proceedings of the 15th International Conference on Mining Software Repositories - MSR '18


Conference: 15th International Conference on Mining Software Repositories
Abbreviated title: MSR

Research areas

  • Android, dataset, mining software repositories

ID: 36663690