Engineering Adventure: May 2014

Note: This essay continues from questions raised by the previous essay on A/B testing in the App Store.

In my last essay, I explored some ideas on how to improve results in the App Store by experimenting with ads. I ran Facebook Ads to A/B test the click-through rates (CTR**) of two different images.

Based on the differing CTRs, I jumped to the conclusion that the composition of the images made the difference. In one ad, an entire iPad was shown. In the other ad, most of an iPhone was cut off, except for the screen. In other words, I believed that the way I depicted the iPad or iPhone resulted in more clicks. I didn't doubt that I was one step closer to buying a tropical island.

Manton Reece read my article and immediately wondered if maybe the iPad vs. iPhone split could account for the difference. In other words, maybe folks were more likely to click on an ad that depicted the device they were holding in their hands. I immediately slapped my forehead. My app island slipped below the waves. I vowed to figure out what really happened with my last experiment.

TLDR: Manton was correct. Also, implementing the changes suggested by Manton’s hypothesis improved my CTRs quite a bit.

A New Experiment

To start examining the Manton Hypothesis, I first tried sifting through the ad data Facebook provides. Perhaps I could see which clicks belonged to iPads and which belonged to iPhones or iPod Touches*. Unfortunately, I couldn't find that information. It didn't seem like I could examine Manton’s theory with the ad data I had already collected.

No worries, I can find out with another experiment! The Facebook Mobile App Ads for Installs (!) allows each ad to target a different device. Keeping the other parameters the same, I set the underperforming ad (the one with a photo of an iPad) to only target iPads. Even one day into the experiment, it seemed like Manton was correct. The CTR for the iPad photo jumped up nicely.

After a week and a half of targeting the iPad ad to the iPad, the CTR rose from 1.5% to 3.08%. Double!

I was so impressed, I decided to try the same thing for the ad with the cropped iPhone. I targeted it to only display on iPhones or iPod Touches.

This time I didn't expect to see a huge jump in performance. Why? Remember, in the previous article, the iPhone ad was already out-performing the iPad ad. If the Manton hypothesis was the only explanation for the differences in CTR, that implies that there were a lot more iPhone or iPod Touch users getting my ads. The diagram below shows how the ad with the iPhone Photo gets a higher CTR when both ads get the same mix of iPhone and iPad users.

As the diagram shows, the iPad photo (right side) gets a lower CTR because the viewers as a whole are dominated by iPhone users. According to the Manton Hypothesis, iPhone users aren't as likely to click on a photo of an iPad. That’s why the pie is smaller on the right, and why that pie is mostly iPad clicks.

But the iPhone photo has the reverse situation: the iPhone users still dominate, but this time they are seeing an iPhone photo. So in this case the majority of the users see a photo which matches the device in their hands -- a favorable situation. This leads to the bigger pie on the left. This time the less interested party is the iPad users. They are the smaller percentage of the folks seeing the ad, so their diminished CTR has a smaller impact on the aggregate CTR.

So, what happens when you target the iPhone ad at only the iPhone users? Just as the Manton Hypothesis predicts, the CTR improved, but the impact of targeting the iPhone ad to only iPhone and iPod Touch users wasn’t as large as it was for the iPad ad. The CTR for the iPhone photo was 2.686% the week before the change and 3.128% after it.

Ad Description	Combined CTR	CTR Targeting only the pictured device
iPad Photo	1.5%	3.08%
iPhone Photo	2.686%	3.128%

Noise

But here is an important point: the lifetime Combined CTR when I was indiscriminately showing an iPhone photo to both iPhones and iPads was 2.931%. The baseline I picked for the numbers above was the week before my change. The improvement looks pretty small when compared to the larger baseline. 2.931% vs. 3.128% doesn't seem as exciting as 2.686% vs. 3.128%.

Why was the week-prior CTR lower than the all-time CTR? I don't know for sure. The numbers I’m dealing with here are small. I’m not spending hundreds of dollars a day on ads, and I’m not getting a huge number of installs. I don't have a $50,000 advertising budget. These tests are being done for $5 a day.

My experiments here come from thousands of clicks, not tens of thousands or millions. The sample size may not be large enough. What looks like an insight might just be noise. So take these results with a grain of salt. Or, even better, run your own experiments using your own money.
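To get a feel for how much a CTR can wobble at this scale, here is a rough sketch that puts an approximate 95% confidence interval around a measured CTR using the normal approximation. The function name `ctr_interval` and the click and impression counts are hypothetical, chosen only for illustration; they are not numbers from this campaign.

```python
import math

def ctr_interval(clicks, impressions, z=1.96):
    """Approximate 95% confidence interval for a CTR.

    Uses the normal approximation to the binomial, which is only
    reasonable when both clicks and non-clicks number in the dozens or more.
    """
    p = clicks / impressions
    half_width = z * math.sqrt(p * (1 - p) / impressions)
    return p - half_width, p + half_width

# Hypothetical counts for illustration only, NOT real campaign data.
low, high = ctr_interval(clicks=300, impressions=10_000)
print(f"CTR is probably between {low:.2%} and {high:.2%}")
```

With counts on that order, the interval is roughly two-thirds of a percentage point wide, so a difference of a few tenths of a percent between two weeks could easily be chance.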

If you have a small budget, as this project does, the best cure for uncertainty is to hedge your bets and keep re-checking your assumptions. If you have real money riding on an outcome, it makes sense to double-check your work. At the very least, be prepared to revert your changes!

Making a Model

I did have another idea for examining our results. What if we could mathematically model our hypothesis and see how well it fits the data? I would feel more confident in the hypothesis if I could build a model that makes a decent prediction about a different data set.

One cool thing about these Ads is that we can see the size of the potential audience for each ad. The audience for the iPad Ad is 182,000 people. The audience for the iPhone Ad is 620,000 people. The only difference between these audiences is that one targets the iPad and the other the iPhone / iPod Touch.

So, let's make lots of assumptions about the size of the audience and the probability of getting an iPad versus an iPhone user. Let's assume that if we don't target the iPad or iPhone specifically, the probability the ad will be shown on either device is proportional to the size of the audience. For instance:

Probability(iPad) = 182,000 / (182,000 + 620,000) ≈ 23%
Probability(iPhone) = 620,000 / (182,000 + 620,000) ≈ 77%
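The same arithmetic as a minimal Python sketch. The function name `audience_share` is made up for this example, and it bakes in the assumption stated above: impressions split in proportion to the audience sizes Facebook reports.

```python
def audience_share(ipad_audience, iphone_audience):
    """Return (P(iPad), P(iPhone)), assuming untargeted impressions are
    split in proportion to the two audience sizes."""
    total = ipad_audience + iphone_audience
    return ipad_audience / total, iphone_audience / total

p_ipad, p_iphone = audience_share(182_000, 620_000)
print(f"P(iPad) = {p_ipad:.1%}, P(iPhone) = {p_iphone:.1%}")
# prints: P(iPad) = 22.7%, P(iPhone) = 77.3%
```

Rounded to whole percentages, those are the 23% and 77% used in the rest of the model.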

So, now we can make some assumptions and create a model for the iPhone ad targeting both iPads and iPhones:

77% * sameDeviceCTR + 23% * differentDeviceCTR = combinedCTR

Or visually:

Now we can do algebra:

23% * differentDeviceCTR = combinedCTR - (77% * sameDeviceCTR)

differentDeviceCTR = (combinedCTR - (77% * sameDeviceCTR) ) / 23%

Now assume combinedCTR = 2.686% (the iPhone image CTR when targeting both devices) and sameDeviceCTR = 3.128% (the iPhone image CTR after targeting only iPhones).

Then differentDeviceCTR ≈ 1.2%
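A small sketch of that rearrangement in Python, for anyone who wants to check the arithmetic. The function name `solve_different_device_ctr` is hypothetical; CTRs are passed as percentages and the audience share as a fraction.

```python
def solve_different_device_ctr(combined_ctr, same_device_ctr, same_share):
    """Solve  same_share * same + (1 - same_share) * different = combined
    for the unknown `different` CTR."""
    return (combined_ctr - same_share * same_device_ctr) / (1 - same_share)

different = solve_different_device_ctr(
    combined_ctr=2.686,     # iPhone photo shown to both devices (week before)
    same_device_ctr=3.128,  # iPhone photo shown to iPhones only
    same_share=0.77,        # assumed share of impressions on iPhones
)
print(f"differentDeviceCTR = {different:.2f}%")
# prints: differentDeviceCTR = 1.21%
```

The unrounded result is about 1.21%, which matches the 1.2% above once rounded to one decimal place.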

Now let's take the differentDeviceCTR we just calculated from the iPhone ad and see if it predicts the outcome for the iPad Photo ad in the same situation: targeting both iPhones and iPads.

In this case, the equation looks a little different because we're flipping sameDevice (now iPad, because we're considering the iPad image) and differentDevice (now iPhone):

77% * differentDeviceCTR + 23% * sameDeviceCTR = combinedCTR

Now we plug in the same numbers from before:

77% * 1.2% + 23% * 3.128% = combinedCTR

combinedCTR = 1.64%

This model doesn't seem too horrible! It predicted a 1.64% CTR for the iPad photo ad when targeted at both iPads and iPhones. The reality was 1.5%. I'm pleasantly surprised. Again, the Manton theory seems quite reasonable. I'll leave it to you to see what happens if we use the iPhone photo's lifetime combinedCTR of 2.931% as the baseline instead -- the model doesn't agree as well.
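For readers who would rather not redo that exercise by hand, here is a sketch of the whole model as a short Python script. The constant and function names are made up for this example, and it keeps the model's assumptions: a 77/23 impression split, and the same sameDeviceCTR (3.128%) for both photos.

```python
IPHONE_SHARE = 0.77       # assumed share of untargeted impressions on iPhones
IPAD_SHARE = 0.23         # assumed share on iPads
SAME_DEVICE_CTR = 3.128   # %, iPhone photo shown to iPhones only
OBSERVED_IPAD_PHOTO_CTR = 1.5  # %, iPad photo shown to both devices

def predict_ipad_photo_ctr(iphone_photo_combined_ctr):
    """Predict the iPad photo's combined CTR from the iPhone photo's numbers."""
    # Step 1: back out the CTR when the photo does not match the device.
    different = (iphone_photo_combined_ctr
                 - IPHONE_SHARE * SAME_DEVICE_CTR) / IPAD_SHARE
    # Step 2: flip the roles. For the iPad photo, iPhones are the mismatch.
    return IPHONE_SHARE * different + IPAD_SHARE * SAME_DEVICE_CTR

for label, baseline in [("week before", 2.686), ("lifetime", 2.931)]:
    predicted = predict_ipad_photo_ctr(baseline)
    print(f"{label} baseline {baseline}%: predicted {predicted:.2f}%, "
          f"observed {OBSERVED_IPAD_PHOTO_CTR}%")
```

Without rounding the intermediate value, the week-before baseline predicts roughly 1.65% and the lifetime baseline roughly 2.47%, against an observed 1.5%. The choice of baseline clearly matters, which is one more reason to treat the model as a sanity check rather than proof.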

Conclusions

So what do we conclude from this exercise? For one, targeting your ads specifically to the iPad or iPhone user could be worth your time. I doubled the CTR of my under-performing ad with two clicks.

Even more importantly, running experiments on your ads can really pay off. And the steps aren't difficult: form a hypothesis, run an experiment using ads, collect the data, and make a simple model. With a model, you can try to predict the impact of your change.

With this first bit of new knowledge, maybe I'll save tons of money on ad spend. And then maybe I can apply what I learned to other areas of the sales funnel. And then I can try another experiment, learn, and implement. Maybe my tropical app island isn't entirely out of reach.

I’m really glad that Manton commented on my last article. Thanks! An extra set of eyes is invaluable!

*No, it’s not called the iTouch! Also, be aware that Facebook lumps together the iPhone with the iPod Touch. For certain questions that could be important.

** Yes, technically it should be TTR, not CTR, since you tap on an iOS device.

Tuesday, May 27, 2014

More App Store Ad Experiments: Platform Targeting
