## Sunday, September 26, 2010

### I Can Haz Estimation?

If you’re like me, there’s nothing you like better after a long work day than kicking back and looking at some grammatically incorrect cats.  And who better to provide all your LOLcat needs than The Cheezburger Network?  In honor of my favorite intertubes diversion, let’s ask: how many pictures of cats are on the entire Internet?

This is a difficult problem.  That being the case, we expect our estimate will likely be very different from the actual number.  In cases like these, it is helpful to determine what the wrong answers are.  To do this, you need to calculate upper and lower bounds.  To start, let’s estimate how many people are cat owners.  You might guess that 10% of people own cats.  Is this reasonable?  Well, the actual number is certainly less than 100% of people, and very likely greater than 1% of people.  If you asked 100 of your friends and family, at least one of them would probably have a cat.  The next question you could reasonably ask is how many cats a cat owner typically has.  Some people have 10 cats, while others only have one.  A reasonable guess is 2 cats per owner.

Now comes the tricky part.  What fraction of people put pictures of their cat on the web?  Many of you are tech savvy and have accounts on Facebook, Flickr, etc.  If you have a cat, you very likely have at least one picture of it on the web somewhere.  But what about your parents, grandparents, and friends who may not be as tech savvy as you?  What about people in developing nations who may not have Internet access?  If you average over everyone, including people in other countries, what percentage of cat owners will post pictures on the web?  To be safe, let’s say 10% again.  Is this reasonable?  As before, the actual number will certainly be less than 100%.  Will it be greater than 1%?  Possibly, but there are a lot of people who don’t like putting their information up on the Internet, and the ones who do might not put their cat on the Internet.  To be safe, let’s put the lower bound at 0.1%.  The last ingredient is how many pictures each of those cats gets; the chart uses a guess of 10, with bounds of 1 and 100.  We can summarize what we know in a chart like the one below:

| | Upper | Guess | Lower |
|---|---|---|---|
| % of people that own cats | 100% | 10% | 1% |
| Cats per owner | 10 | 2 | 1 |
| % of owners that put cat on web | 100% | 10% | 0.1% |
| Pictures per cat | 100 | 10 | 1 |
| World population | 6.7x10^9 | 6.7x10^9 | 6.7x10^9 |
| # of cat pictures | 6.7x10^13 | 1.3x10^9 | 6.0x10^4 |

Our upper and lower bounds are very far apart (i.e. 9 orders of magnitude).  The answer certainly lies between them, but is there any way we can tighten these bounds?
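If you want to check the arithmetic, here is a minimal Python sketch that multiplies the chart's rows together, one column at a time. The variable and function names are made up for illustration. One caveat: multiplying the rows exactly as listed gives roughly 6.7x10^12 for the upper column and 6.7x10^4 for the lower one, which is a bit different from the totals quoted in the chart, so treat this as a check on the method rather than a reproduction of the exact figures.

```python
import math

WORLD_POPULATION = 6.7e9  # world population used in the chart

# Each factor from the chart as (upper, guess, lower).
# The dictionary keys are illustrative names, not from the original post.
factors = {
    "fraction_of_people_owning_cats": (1.0, 0.10, 0.01),
    "cats_per_owner": (10, 2, 1),
    "fraction_of_owners_posting": (1.0, 0.10, 0.001),
    "pictures_per_cat": (100, 10, 1),
}


def estimate(column):
    """Multiply the population by every factor in one chart column.

    column 0 = upper, 1 = guess, 2 = lower.
    """
    total = WORLD_POPULATION
    for values in factors.values():
        total *= values[column]
    return total


upper, guess, lower = (estimate(i) for i in range(3))

# Distance between the bounds, in orders of magnitude.
spread = math.log10(upper / lower)

print(f"upper bound: {upper:.1e}")
print(f"guess:       {guess:.1e}")
print(f"lower bound: {lower:.1e}")
print(f"spread:      {spread:.1f} orders of magnitude")
```

Either way, the bounds are many orders of magnitude apart, which is the point of the exercise: the guess is only as good as the range around it.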

Let’s try calculating the answer in a different way.  According to Netcraft, there were 2.5x10^10 web pages as of a few years ago.  As an upper bound, we might say that, at most, every page has 10 cat pictures on it.  Even in this extreme case, there would only be 2.5x10^11 cat pictures, so we’ve effectively reduced our upper bound by a factor of 100.
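The page-count version of the upper bound is a one-line multiplication, sketched below. The constant names are illustrative, and the old upper bound is simply the 6.7x10^13 quoted in the chart.

```python
import math

WEB_PAGES = 2.5e10               # page count attributed to Netcraft in the post
MAX_CAT_PICTURES_PER_PAGE = 10   # deliberately generous assumption
CHART_UPPER_BOUND = 6.7e13       # upper bound quoted in the chart

# New upper bound: every page on the web carries the maximum number of cat pictures.
page_based_upper = WEB_PAGES * MAX_CAT_PICTURES_PER_PAGE

# How much tighter the new bound is than the chart's.
reduction_factor = CHART_UPPER_BOUND / page_based_upper

print(f"page-based upper bound: {page_based_upper:.1e}")
print(f"reduction: about {reduction_factor:.0f}x "
      f"({math.log10(reduction_factor):.1f} orders of magnitude)")
```

The reduction comes out in the low hundreds, which is "a factor of 100" in the order-of-magnitude sense we care about here.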

Now let’s see if we can adjust our lower bound.  A quick Google image search for the word “cat” pulls up about 1.5x10^8 results.  Looking at the pictures on the first page, we can see that the vast majority of the images are of cats, but occasionally you get one that’s not a cat.  This is very rare.  In fact, it seems much more likely that Google is missing cat pictures, since it doesn’t have access to private photos on sites like Facebook and Flickr.  For this reason, we could probably take the Google result as a lower bound.  Just to be extra careful, let’s say the lower bound is one tenth of the number of “cat” images found by Google.
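The same kind of sketch works for the lower bound. Again the names are illustrative; the search count is the one quoted above, and the old lower bound is the 6.0x10^4 from the chart.

```python
import math

GOOGLE_CAT_IMAGE_RESULTS = 1.5e8  # image search count quoted in the post
SAFETY_FACTOR = 10                # "just to be extra careful" divisor
CHART_LOWER_BOUND = 6.0e4         # lower bound quoted in the chart

# New lower bound: a tenth of what the image search reports.
search_based_lower = GOOGLE_CAT_IMAGE_RESULTS / SAFETY_FACTOR

# How far the lower bound moved up compared with the chart.
increase_factor = search_based_lower / CHART_LOWER_BOUND

print(f"search-based lower bound: {search_based_lower:.1e}")
print(f"increase: about {increase_factor:.0f}x "
      f"({math.log10(increase_factor):.1f} orders of magnitude)")
```

Even after throwing away 90% of the search results, the lower bound moves up by more than two orders of magnitude.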

Our final results:

Upper Bound -- 2.5x10^11 cat pictures
Actual Estimate -- 1.3x10^9 cat pictures
Lower Bound -- 1.5x10^7 cat pictures
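
As a last sanity check, here is a short sketch (names are illustrative) confirming that the estimate sits between the final bounds and measuring how far apart those bounds still are.

```python
import math

UPPER_BOUND = 2.5e11  # from the web page count
ESTIMATE = 1.3e9      # from the chart's "Guess" column
LOWER_BOUND = 1.5e7   # a tenth of the image search count

# The estimate should fall inside the bounds.
estimate_is_bracketed = LOWER_BOUND < ESTIMATE < UPPER_BOUND

# Remaining gap between the bounds, in orders of magnitude.
final_spread = math.log10(UPPER_BOUND / LOWER_BOUND)

print(f"estimate between bounds: {estimate_is_bracketed}")
print(f"final spread: {final_spread:.1f} orders of magnitude")
```

The bounds are still a few orders of magnitude apart, but that is a big improvement over the roughly 9 we started with.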