In this study, we investigate the impact of data sampling on the stability
of uplift models. The aim of uplift modelling is to predict individuals' reactions to a campaign that offers incentives to take a particular action, such as buying a product or not churning from a service.
In the literature, it has been reported that uplift models can be unstable in terms of performance (for example, as measured by the Qini coefficient) and may consequently be deemed unreliable. We investigate whether
model stability can be improved by modifying the class ratio using
data sampling techniques, without worsening the performance in the
process. Specifically, we use both under- and oversampling, as well as the SMOTE and ROSE methods. Whereas the first two remove or duplicate records, the latter two artificially generate new records based on their neighbourhood, while potentially also removing records.
The uplift models are built using uplift random forests and the methods of Lo and Generalized Lai. In our experimental design, we consider different class ratios and use a number of direct-marketing datasets. Furthermore, we check whether these new models show increased performance, studying whether there is a trade-off between stability and performance. In some cases, we observe that
sampling methods have a positive impact on both the stability and performance
of uplift models, but this depends on the method and dataset
used.
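As a rough illustration of the kind of pipeline studied here, the sketch below shows (i) simple random undersampling to adjust the class ratio and (ii) a basic computation of the Qini coefficient from predicted uplift scores. It is a minimal sketch in Python with assumed column names (`outcome`, `treatment`) and an assumed array of uplift scores; it does not reproduce the study's actual methods (SMOTE, ROSE, uplift random forests, or the Lo and Generalized Lai approaches).

```python
import numpy as np
import pandas as pd

def undersample_to_ratio(df, outcome_col="outcome", ratio=0.5, seed=0):
    """Randomly drop majority-class (non-responder) records until the
    positive class makes up `ratio` of the data. This stands in for the
    sampling step; SMOTE and ROSE would instead generate synthetic records."""
    rng = np.random.default_rng(seed)
    pos = df[df[outcome_col] == 1]
    neg = df[df[outcome_col] == 0]
    # keep enough negatives so that |pos| / (|pos| + |neg_kept|) == ratio
    n_neg = min(int(len(pos) * (1 - ratio) / ratio), len(neg))
    kept = neg.iloc[rng.choice(len(neg), size=n_neg, replace=False)]
    return pd.concat([pos, kept]).sample(frac=1, random_state=seed)

def qini_coefficient(uplift_scores, outcome, treatment):
    """Qini coefficient: area between the cumulative incremental-gains
    curve of the model's ranking and that of random targeting."""
    order = np.argsort(-np.asarray(uplift_scores))
    y = np.asarray(outcome)[order]
    t = np.asarray(treatment)[order]
    n = len(y)
    cum_t = np.cumsum(t)              # treated records seen so far
    cum_c = np.cumsum(1 - t)          # control records seen so far
    cum_yt = np.cumsum(y * t)         # responders among the treated
    cum_yc = np.cumsum(y * (1 - t))   # responders among the controls
    rate = np.divide(cum_t, cum_c, out=np.zeros(n, dtype=float), where=cum_c > 0)
    qini_curve = cum_yt - cum_yc * rate
    random_curve = np.linspace(qini_curve[-1] / n, qini_curve[-1], n)
    return np.trapz(qini_curve - random_curve) / n
```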
Original language: English
Title of host publication: EURO 2018: 29th European Conference on Operational Research
Publisher: EURO: The Association of European Operational Research Societies
Pages: 193
Number of pages: 1
Publication status: Published - Jul 2018
Event: EURO 2018: 29th European Conference on Operational Research, Valencia, Spain
Duration: 8 Jul 2018 – 11 Jul 2018
Conference number: 29
Internet address: http://euro2018valencia.com/
