Buckets of Rain
The Story
This site tracks my Information Visualization project. I'm designing a visualization to convey people's movie-watching experience. Anyone who watches movies is a target user; I'm specifically targeting Netflix users, since it is easy to get their data through the Netflix API.
It all started one fine day when I met Bob Dylan in a local restaurant. He was chilling, taking a break from his Never Ending Tour. Being a Dylan fan, naturally I wanted to walk up to him. But wait... what do I ask him? He is not some girl sitting alone in a bar. He is Bob Dylan: an undisputed icon, songwriter, storyteller, and repeated Nobel Prize nominee. I mustered the courage, walked to his table, and asked him, "Hey, Mr. Dylan, what kind of movies do you like?"
Looking at me, he asked, "What do you mean, son?"
"Well, each of us has a unique taste in movies: genres, time periods, languages. So what is yours?"
"We all have our own definitions of all those words. So you won't understand my cowboy horror movie."
Classic Dylan reply, I murmured. Interesting, though: maybe visualizing a person's movie experience is more expressive than describing it in words. I started daydreaming right there... What if there were a tool to visually communicate someone's movie experience? It could be an image that people use in their email or forum signature. They could even tattoo it on their back, but then they would have to stop watching movies, since the signature would change as they watch more. Hey, he is leaving! I snapped out of my dream and asked him, "Any advice for someone like me?"
"Crawl out of your window."
We all have interesting stories tied to our movie experiences. I want to bring out those stories by visualizing people's movie experience.
Netflix is a movie rental service with an online interface. Netflix recently opened its API, so we can write applications that read Netflix customers' movie history (with their permission). This ability is why I'm targeting Netflix customers. More generally, as long as I can get a user's movie-watching data, I should be able to visualize it.
Another advantage of building this project on Netflix is the publicly available dataset of Netflix users' ratings. Netflix released a huge dataset of user movie ratings, with timestamps, as part of the Netflix Prize competition to develop a recommendation algorithm. The dataset is a good representation of the Netflix customer population:
- 480,189 user IDs
- 17,770 movies
- 100,480,507 ratings, collected from October 1998 to December 2005
This is a huge dataset; crunching these numbers would be a stats project in itself. Moreover, the dataset has already been analyzed, a lot. Here are some numbers from Ilya Grigorik's blog:
- The number of movies produced yearly has increased exponentially since the 1990s: from ~300 movies in 1990 to ~1,400 in 2002.
- The average number of ratings per movie is ~200; blockbuster movies have around 250,000.
- The average of all ratings is 3.8.
- 50% of users have rated fewer than 100 movies; the maximum is ~1,000 movies.
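Figures like these can be recomputed in a single pass over the raw ratings by tallying counts per user and per movie. The sketch below is a minimal illustration, assuming the Netflix Prize training-set layout (one `mv_*.txt` file per movie, with a `MovieID:` header line followed by `CustomerID,Rating,Date` lines); `parse_movie_file`, `iter_dataset` and `summarize` are hypothetical helper names, not part of any Netflix tool. As a quick arithmetic check on the totals listed earlier, 100,480,507 ratings over 17,770 movies is about 5,650 ratings per movie on average, and over 480,189 users about 209 ratings per user, so the sketch reports both means to make the quoted statistics easy to cross-check.

```python
from collections import Counter
from pathlib import Path
from statistics import median
from typing import Iterable, Iterator, Tuple

# One rating record: (movie_id, customer_id, rating, date).
Rating = Tuple[int, int, int, str]


def parse_movie_file(lines: Iterable[str]) -> Iterator[Rating]:
    """Parse lines in the ASSUMED Netflix Prize layout: a 'MovieID:' header
    line followed by 'CustomerID,Rating,Date' lines for that movie."""
    movie_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.endswith(":"):
            movie_id = int(line[:-1])  # a new movie section begins
            continue
        customer, rating, date = line.split(",")
        yield movie_id, int(customer), int(rating), date


def iter_dataset(root: str) -> Iterator[Rating]:
    """Stream every rating from a directory of per-movie files.
    ASSUMPTION: files are named mv_*.txt, one per movie."""
    for path in sorted(Path(root).glob("mv_*.txt")):
        with path.open() as f:
            yield from parse_movie_file(f)


def summarize(ratings: Iterable[Rating]) -> dict:
    """Tally per-user and per-movie counts and the overall mean rating."""
    per_user: Counter = Counter()
    per_movie: Counter = Counter()
    total = 0
    n = 0
    for movie_id, customer_id, rating, _date in ratings:
        per_user[customer_id] += 1
        per_movie[movie_id] += 1
        total += rating
        n += 1
    if n == 0:
        raise ValueError("no ratings found")
    counts = list(per_user.values())
    return {
        "mean_rating": total / n,
        "mean_ratings_per_movie": n / len(per_movie),
        "mean_ratings_per_user": n / len(per_user),
        "median_ratings_per_user": median(counts),
        "share_users_under_100": sum(c < 100 for c in counts) / len(counts),
        "max_ratings_per_user": max(counts),
    }


if __name__ == "__main__":
    # Tiny hand-made sample in the assumed layout (two movies, two users).
    sample = [
        "1:",
        "1488844,3,2005-09-06",
        "822109,5,2005-05-13",
        "2:",
        "1488844,4,2005-01-01",
    ]
    print(summarize(parse_movie_file(sample)))
```

For the full dataset, `summarize(iter_dataset("training_set"))` would stream every file once (the directory name is an assumption). Keying a `Counter` on customer ID keeps memory proportional to the number of users, about 480k here. The `Date` field is carried through but unused; it would be the natural starting point for any time-based view of a user's history.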



