MapReduce usage at Google
Via the High Scalability blog (a great addition to any RSS reader), there’s a link to Jeffrey Dean’s presentation on MapReduce usage at Google. His presentation actually touches on several parts of Google’s infrastructure, such as GFS and BigTable, so the video covers more than MapReduce alone. What caught my eye is the growth of MapReduce inside Google: 2.2 million jobs were run in September 2007.
In the table above, note the drastic growth in input data analyzed and output data generated. The number of distinct MapReduce programs has also grown significantly, reaching 10,000 in September 2007.
Dean also presented an interesting graph of how often new MapReduce jobs are committed to the repository. As you can see, there are months when the number of new projects goes through the roof.
The reason? Summer interns.
The complete set of slides is available from Yahoo! Research, which organized the Data-Intensive Computing Symposium.