Designing scalability for personalization

Greg Linden has posted another short excerpt about his work on Findory, a personalized news search engine that he designed and built. Personalization tends to defeat caching strategies: serving a different page to each user all but guarantees a low cache hit rate. Yet the page still has to be generated quickly, especially when it is the first page the user sees. How does Findory deal with that? By moving most of the front page computation to an offline batch process that pre-computes the data to be shown to the user, replicating that read-only data out to the web servers, and keeping only read-write data such as reader history in MySQL:

The way Findory does this is that it pre-computes as much of the expensive personalization as it can. Much of the task of matching interests to content is moved to an offline batch process. The online task of personalization, the part while the user is waiting, is reduced to a few thousand data lookups. Even a few thousand database accesses could be prohibitive given the time constraints. However, much of the content and pre-computed data is effectively read-only data. Findory replicates the read-only data out to its webservers, making these thousands of lookups lightning fast local accesses. Read-write data, such as each reader’s history on Findory, is in MySQL. MyISAM works well for this task since the data is not critical and speed is more important than transaction support.
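
To make the split concrete, here is a minimal Python sketch of the pattern described in the excerpt. It is not Findory's code: the sample articles, the topic-overlap scoring, and every function name are assumptions made for illustration, and an in-memory SQLite database stands in for the MySQL history table. The offline step builds a read-only index, and the online step does nothing but cheap lookups against that local index plus one read of the user's history.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data; the post does not describe Findory's data model.
ARTICLES = {
    "a1": {"title": "MySQL replication tips", "topics": {"databases", "scaling"}},
    "a2": {"title": "Election results", "topics": {"politics"}},
    "a3": {"title": "Caching at scale", "topics": {"scaling", "caching"}},
}


def build_topic_index(articles):
    """Offline batch step: pre-compute a read-only topic -> article ids map.

    In the setup described above, a structure like this would be
    replicated to every web server so lookups stay local.
    """
    index = defaultdict(list)
    for article_id, article in articles.items():
        for topic in article["topics"]:
            index[topic].append(article_id)
    # Tuples signal that the online code should treat the index as read-only.
    return {topic: tuple(sorted(ids)) for topic, ids in index.items()}


def open_history_store():
    """Read-write store for reader history (SQLite here as a stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE history (user_id TEXT, article_id TEXT)")
    return conn


def record_click(conn, user_id, article_id):
    conn.execute("INSERT INTO history VALUES (?, ?)", (user_id, article_id))
    conn.commit()


def front_page(conn, user_id, topic_index, articles, limit=10):
    """Online step: one history read plus lookups in the local index."""
    rows = conn.execute(
        "SELECT article_id FROM history WHERE user_id = ?", (user_id,)
    ).fetchall()
    seen = {row[0] for row in rows}

    # Score unseen articles by how many topics they share with read ones.
    scores = defaultdict(int)
    for article_id in seen:
        for topic in articles[article_id]["topics"]:
            for candidate in topic_index.get(topic, ()):
                if candidate not in seen:
                    scores[candidate] += 1

    # Highest score first; ties broken by article id for stable output.
    return sorted(scores, key=lambda a: (-scores[a], a))[:limit]
```

A usage example under the same assumptions: build the index once with `build_topic_index(ARTICLES)`, call `record_click(conn, "u1", "a1")` when a reader opens an article, then call `front_page(conn, "u1", index, ARTICLES)` on the next request. Only the history table is ever written at request time, which is why the excerpt can trade transaction support for speed there.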

Posted in General on September 30th, 2007.
