Effective MySQL fulltext searching

Peter Zaitsev’s and Vadim Tkachenko’s talk on MySQL full-text searching solutions at Euro OSCON 2006 is a great read for anybody who explored MySQL fulltext searching but wondered whether anything more advanced is available. The authors experimented with a full dump of Wikipedia database.

So what does built-in MySQL fulltext search deliver? Support of MyISAM, natural language search, support for boolean operators, ft_min_word_len allowing Webmaster to specify minimal word length, stop word list and frequency-based ranking, the presentation tells us.

How can you optimize fulltext searches in MySQL? The index for the fulltext search should ideally easily fit into the available RAM. Stop words should contain the most widely used noise words, but make sure that users don’t search for them. Keep ft_min_word_len reasonably high – the default is 3, and any decrease would dramatically impact the performance, as particles and other noise words are all of a sudden indexed. Avoid counting the number of matches – although lack of total results is usually poor on user experience. Avoid sorting. Be careful with the WHERE clauses. Large LIMIT offsets, such as 1000, 10, cause significant slowness. Avoid GROUP BY, as usually it waits for all results to be fetched, and then regroups them. Boolean phrase searches are notoriously slow in MySQL fulltext. The MySQL fulltext search index is BTREE-based and every indexable word is a node in that BTREE. So whenever you add a new paragraph of text with a 1,000 new indexable words, there are 1,000 new entries to be added to the fulltext search index.

There are a number of homebrew solution trying to do a more efficient job with fulltext searches. Senna for MySQL is a fulltext search engine for MySQL tables. SQLSearch is another engine. Lucene is a Java-based search engine. mnoGoSearch is a Web search engine using MySQL as its backend. Sphinx Search is a high-performance easy-to-use MySQL fulltext search engine with support for snippets. TBGSearch is a vector-based search engine that the presentation advises for medium-size data sets.

Posted Sunday, February 25th, 2007 under MySQL, Programming.

Leave a Reply