Liang Tang, Romer Rosales, Ajit P. Singh, Deepak Agarwal
In Proceedings of the 22nd ACM international conference on Conference on information & knowledge management (CIKM 2013)
Visual design plays an important role in online display advertising: changing the layout of an online ad can increase or decrease its effectiveness, measured in terms of click-through rate (CTR) or total revenue. The decision of which lay- out to use for an ad involves a trade-off: using a layout provides feedback about its effectiveness (exploration), but collecting that feedback requires sacrificing the immediate reward of using a layout we already know is effective (exploitation). To balance exploration with exploitation, we pose automatic layout selection as a contextual bandit problem. There are many bandit algorithms, each generating a policy which must be evaluated. It is impractical to test each policy on live traffic. However, we have found that offline replay (a.k.a. exploration scavenging) can be adapted to provide an accurate estimator for the performance of ad layout policies at Linkedin, using only historical data about the effectiveness of layouts. We describe the development of our offline replayer, and benchmark a number of common bandit algorithms.