ASM-XML - XML parser written in Assembly
When searching for Nintendo Wii games, Amazon currently provides only 42 titles. This is the master list from Nintendo, which keeps track of existing and upcoming games for Wii.
Peter Zaitsev’s and Vadim Tkachenko’s talk on MySQL full-text searching solutions at Euro OSCON 2006 is a great read for anybody who has explored MySQL fulltext searching but wondered whether anything more advanced is available. The authors experimented with a full dump of the Wikipedia database.
So what does built-in MySQL fulltext search deliver? According to the presentation: support for MyISAM tables, natural language search, boolean operators, the ft_min_word_len setting that lets the Webmaster specify the minimum indexed word length, a stop word list, and frequency-based ranking.
How can you optimize fulltext searches in MySQL? The fulltext index should ideally fit into the available RAM. The stop word list should contain the most widely used noise words, but make sure that users don’t search for them. Keep ft_min_word_len reasonably high: the default is 4, and any decrease would dramatically hurt performance, as particles and other noise words are suddenly indexed. Avoid counting the total number of matches, although hiding the total usually hurts the user experience. Avoid sorting. Be careful with WHERE clauses. Large LIMIT offsets, such as LIMIT 1000, 10, cause significant slowness. Avoid GROUP BY, as it usually waits for all results to be fetched and then regroups them. Boolean phrase searches are notoriously slow in MySQL fulltext. The MySQL fulltext index is BTREE-based, and every indexable word is a node in that BTREE. So whenever you add a new paragraph of text with 1,000 new indexable words, 1,000 new entries have to be added to the fulltext index.
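To see why the minimum word length and the stop word list matter so much for index size, here is a minimal Python sketch of a word index. It is a toy model, not MySQL's implementation: the names `indexable_words` and `add_document`, the tiny stop word set, and the sample sentence are all made up for illustration.

```python
import re
from collections import defaultdict

# Assumed values for illustration only; MySQL's real stop word list is much longer.
MIN_WORD_LEN = 4
STOP_WORDS = {"about", "from", "that", "this", "with"}


def indexable_words(text, min_len=MIN_WORD_LEN, stop_words=STOP_WORDS):
    """Return the lowercase words that survive the length and stop word filters."""
    words = re.findall(r"[a-z0-9_']+", text.lower())
    return [w for w in words if len(w) >= min_len and w not in stop_words]


def add_document(index, doc_id, text, **filters):
    """Add one document to a word -> set of doc ids index; return entries added."""
    added = 0
    for word in set(indexable_words(text, **filters)):
        if doc_id not in index[word]:
            index[word].add(doc_id)
            added += 1
    return added


text = "This is a talk about the full text search index that MySQL builds from a table"

# Filters on: only the longer, non-noise words reach the index.
index = defaultdict(set)
entries_default = add_document(index, 1, text)

# Shorter minimum length and no stop words: noise words are indexed too.
short_index = defaultdict(set)
entries_short = add_document(short_index, 1, text, min_len=2, stop_words=set())

print(entries_default, entries_short)
```

With this one sentence the relaxed settings already produce noticeably more index entries than the filtered ones; on a Wikipedia-sized corpus the same effect is multiplied across every row.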
There are a number of homebrew solutions trying to do a more efficient job with fulltext searches. Senna for MySQL is a fulltext search engine for MySQL tables. SQLSearch is another engine. Lucene is a Java-based search engine. mnoGoSearch is a Web search engine using MySQL as its backend. Sphinx Search is a high-performance, easy-to-use MySQL fulltext search engine with support for snippets. TBGSearch is a vector-based search engine that the presentation recommends for medium-size data sets.
Patrick Galbraith from Grazr describes MySQL multi-master replication, where two nodes replicate each other’s updates. This is useful when you’re running the product out of multiple data centers and cannot predict where the writes will occur, i.e. both are hot MySQL servers.
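One practical wrinkle with two hot masters is that both may hand out the same AUTO_INCREMENT key before replication catches up. MySQL provides the auto_increment_increment and auto_increment_offset server variables for this case. The Python sketch below only simulates the resulting id sequences; the function name `auto_increment_ids` and the two-node setup are assumptions for illustration, not something taken from the linked article.

```python
from itertools import islice


def auto_increment_ids(offset, increment):
    """Yield the ids a node would hand out: offset, offset + increment, ..."""
    value = offset
    while True:
        yield value
        value += increment


# Two hot masters: the increment equals the number of masters,
# and each node gets its own offset.
node_a = list(islice(auto_increment_ids(offset=1, increment=2), 5))
node_b = list(islice(auto_increment_ids(offset=2, increment=2), 5))

print(node_a)
print(node_b)
```

Because the two sequences never overlap, rows inserted independently on either node can be replicated to the other without primary key collisions.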
IBM DeveloperWorks introduces us to XCache on PHP:
XCache is a relative newcomer, but many sites are reporting good results with it. In addition, it is easy to build, install, and configure because it’s implemented as a PHP extension. Recompiling Apache and PHP isn’t required. This article is based on XCache V1.2.0. It reliably supports PHP V4.3.11 to V4.4.4, PHP V5.1.x to V5.2.x, and early versions of PHP V6. (XCache doesn’t support PHP V5.0.x.) XCache works with mod_php and FastCGI, but not with the Common Gateway Interface (CGI) or the command-line PHP interpreter. The XCache source code builds on a variety of systems, including FreeBSD, Sun Solaris, Linux®, and (as shown here) on Mac OS X. XCache can be built on Microsoft® Windows®, as well, using the Cygwin UNIX® emulation environment or Visual C. You can build XCache for Cygwin or for native Win32. The latter target is compatible with the official Win32 release of PHP.
Dr. Dobb’s Journal runs a lengthy introduction to Linden Scripting Language, the language behind avatars and their interaction in Second Life:
LSL is a scripting language that runs server-side, on a piece of software called the simulator. The simulator does just what its name implies: it simulates the virtual world of Second Life. Each simulator runs everything for 16 acres of virtual land, including buildings, physics, and of course, scripts. While you manipulate the script text in a form that is somewhat easy to read, the actual code that runs on the simulator is compiled. A compiler is a piece of software that takes the text version of the script and converts it into something that can actually run. In the case of LSL, the compiler exists within the Second Life viewer itself. In the future, it is likely that the compiler will move from the viewer into the Second Life simulators, but where the code is compiled isn’t very important. What matters is that the text is converted into a form that can run on the simulators.
There’s a tech talk, How Open Source Projects Survive Poisonous People (And You Can Too), on Google Video, where the guys who developed Subversion and currently work at Google (the company tends to hire a bunch of open source developers, including the creator of VIM) share their tips on running an open source project.
Here’s a synopsis of their tips, although the video is worth watching for real-life stories and anecdotes they tell.
I was reading a presentation by Ronny Kohavi and Matt Round from Amazon.com on Amazon using analytics for deciding on what the site looks like, what the top-level navigation looks like, and what the front page of Amazon.com looks like. Hat tip to Greg Linden for linking to it.
Amazon has been widely known for throwing down the gauntlet to all the widely accepted Web 2.0 maxims. Standards support? Nah. Using simple URLs? Even the front page redirects to something like http://www.amazon.com/ref=topnav_gw_gw/104-6121055-7037564, although, granted, it doesn’t have the obidos links with a bunch of hyphenated nouns that it used to.
What’s interesting in that set of slides is that it defies another maxim: simple is beautiful. Granted, the implementation of simple might differ depending on what function you’re trying to accomplish. There are two slides discussing experiments with the Amazon front page in order to make it simpler.
So why is the front page of Amazon such a hodge-podge of suggestions, recommendations, related items, new additions, shakers and movers and other recently viewed items? The answer is simple - it sells better.
Simple design in Amazon’s case generated higher cart abandonment and statistically significant decreases in customer conversions, which is all that matters in a data-driven e-commerce company. So it looks like Amazon’s customers not only do not appreciate simplicity thrown upon them, they actually enjoy and celebrate complexity, partying with their dollars when the front page is complex.
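For readers wondering what a "statistically significant" drop in conversions looks like in practice, here is a minimal sketch of a two-proportion z-test in Python. The slides do not say which test Amazon used, so this is only one common choice, and the visitor and conversion counts below are hypothetical numbers, not Amazon's data.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both designs convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution, via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: A is the busy front page, B is the simplified one.
z, p = two_proportion_z_test(conv_a=1000, n_a=20000, conv_b=900, n_b=20000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A negative z with a small p-value would mean the simplified page converts worse and that the gap is unlikely to be noise, which is the kind of result the slides describe.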
Fresh from the VIM talk, I was curious to see Andrei Zmievski post the VIM script files from his VIM for PHP programmers presentation. It’s not one of those 3-page presentations ending with “Read the VIM manual” either; it’s a 77-page guide to optimizing one’s VIM experience when writing PHP.