MapReduce usage at Google

Via the High Scalability blog (a great addition to any RSS reader), there’s a link to Jeffrey Dean’s presentation on MapReduce usage at Google. His presentation actually touches on several pieces of Google infrastructure, such as GFS and BigTable, so there’s more to the video than MapReduce. What caught my eye is the growth of MapReduce inside Google: 2.2 million jobs were run in September 2007.

image

In the table above, note the drastic growth in input data analyzed and output data generated. The number of distinct MapReduce programs (as opposed to job runs) has also grown significantly, reaching 10,000 in September 2007.

image

Dean also presented an interesting graph showing how often new MapReduce jobs are committed to the repository. As you can see, there are months when the number of new projects spikes through the roof and then drops back.

image

The reason? Summer interns.

image

The complete set of slides is available from Yahoo! Research, which organized the Data-Intensive Computing Symposium.
