Blog archives for February, 2007

ASM-XML - XML parser written in Assembly

Marc Kerbiquet, probably disappointed at the speed of generic XML parsers, wrote an XML parser in Assembly, promising 200 MBps of processing bandwidth.
clipped from mkerbiquet.free.fr

AsmXml is a very fast XML parser and decoder for x86 platforms.
It achieves high speed by using the following features:

To give an idea of the relative speed of AsmXml, the fastest open source
XML parsers process between 10 and 30 MB of XML per second while AsmXml
processes around 200 MB per second (tested on an Athlon XP 1800+).

This parser is intended for applications that need intensive processing of XML.
This project will likely appeal to you if XML parsing is a bottleneck in your data-flow.
It is especially designed for bulk loads into databases.

This is not an all-purpose library, it is not designed to be used with DOM, SAX,
XPath and so on. Here, XML is just considered as an interchange format,
not as a working format.

History of Minesweeper

GameSetWatch runs a long article on the history of Minesweeper.
clipped from www.gamesetwatch.com

Unlike the previous minefield games, Minesweeper has no avatar. You can check any location on the board without having to find a path there. And because there’s no avatar, the goal is no longer safe passage; instead, you must clear every non-mined square of the grid before you have succeeded. There is a timer and a best-score high-score list. (The high scores are easily manipulable, though; Vista’s Minesweeper looks to be more resistant to such techniques.) Using a two-button mouse to quickly reveal and flag squares, the game moves much faster than its predecessors. And in addition, there was a pleasant smiley face at the top of the game.

Master list of all Nintendo Wii games

When searching for Nintendo Wii games, Amazon currently provides only 42 titles. This is the master list from Nintendo, which keeps track of existing and upcoming games for Wii.

Sony adds an Internet link to Bravia lineup of LCD HDTVs

Marrying Internet content to your television set has always been kinda an issue for the industry. Should it be done through TiVo? A cable set-top box? A Microsoft Media Center you just hook up to a high-def TV? Nintendo Wii? An Akimbo box? Each company has an incentive to say it has the final solution for delivering content to the television screen, but now Sony is just attaching an Internet connectivity module to its LCD HDTVs.
clipped from news.sel.sony.com
Previously, Sony announced the first of its kind BRAVIA Internet Video Link module that allows for direct television access to Internet video content, including high-definition programming, from providers like AOL, Yahoo! and Grouper, as well as Sony Pictures Entertainment and Sony BMG Music.

The module mounts on the back of the compatible Sony televisions announced today, and connects to the Internet via an existing broadband Ethernet connection without the use of a computer. It provides access to such content as Internet video programs, music videos, movie trailers, user generated videos and RSS feeds – all without any additional charges.

Effective MySQL fulltext searching

Peter Zaitsev’s and Vadim Tkachenko’s talk on MySQL full-text searching solutions at Euro OSCON 2006 is a great read for anybody who has explored MySQL fulltext searching but wondered whether anything more advanced is available. The authors experimented with a full dump of the Wikipedia database.

So what does built-in MySQL fulltext search deliver? Support for MyISAM tables, natural language search, boolean operators, the ft_min_word_len setting that lets the webmaster specify the minimum indexed word length, a stop word list and frequency-based ranking, the presentation tells us.
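
To make the two search modes concrete, here is a minimal sketch of how such queries could be assembled from Python. The MATCH ... AGAINST syntax and the IN BOOLEAN MODE modifier are MySQL's own; the table name articles, the columns id, title and body, and the helper function itself are assumptions made up for illustration, and the %s placeholders follow the style used by common Python MySQL drivers.

```python
def build_fulltext_query(table, columns, terms, boolean_mode=False,
                         limit=10, offset=0):
    """Build a parameterized MySQL fulltext query (hypothetical helper).

    Returns (sql, params); nothing is executed here.
    """
    # Table and column names cannot be bound as parameters,
    # so only plain identifiers are accepted.
    for name in [table, *columns]:
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")

    # Natural language search is the default mode;
    # boolean mode enables operators such as + and -.
    modifier = " IN BOOLEAN MODE" if boolean_mode else ""
    column_list = ", ".join(columns)

    # "id" is an assumed primary key column.
    sql = (
        f"SELECT id FROM {table} "
        f"WHERE MATCH ({column_list}) AGAINST (%s{modifier}) "
        f"LIMIT %s, %s"
    )
    return sql, (terms, offset, limit)


if __name__ == "__main__":
    print(build_fulltext_query("articles", ["title", "body"],
                               "mysql replication"))
    print(build_fulltext_query("articles", ["title", "body"],
                               "+mysql -oracle", boolean_mode=True))
```

The columns passed in would have to be covered by a FULLTEXT index on the table for MySQL to accept the query.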

How can you optimize fulltext searches in MySQL? The index for the fulltext search should ideally fit easily into the available RAM. Stop words should contain the most widely used noise words, but make sure that users don’t search for them. Keep ft_min_word_len reasonably high - the default is 4, and any decrease would dramatically impact the performance, as particles and other noise words are all of a sudden indexed. Avoid counting the number of matches - although leaving out the total result count usually hurts the user experience. Avoid sorting. Be careful with the WHERE clauses. Large LIMIT offsets, such as LIMIT 1000, 10, cause significant slowness. Avoid GROUP BY, as it usually waits for all results to be fetched, and then regroups them. Boolean phrase searches are notoriously slow in MySQL fulltext. The MySQL fulltext search index is BTREE-based and every indexable word is a node in that BTREE. So whenever you add a new paragraph of text with 1,000 new indexable words, there are 1,000 new entries to be added to the fulltext search index.
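
The effect of the minimum word length and the stop word list on index size can be sketched with a toy index. This is only an illustration under loud assumptions: a Python dict stands in for the BTREE, the tokenizer is a simple regular expression, the stop word list is a tiny made-up one, and min_word_len merely plays the role of ft_min_word_len. It is not how MySQL stores its index.

```python
import re

# Tiny, made-up stop word list for illustration only.
STOP_WORDS = {"the", "and", "for", "with"}


def indexable_words(text, min_word_len, stop_words=STOP_WORDS):
    """Split text into lowercase words, dropping short words and stop words."""
    words = re.findall(r"[a-z0-9_]+", text.lower())
    return [w for w in words if len(w) >= min_word_len and w not in stop_words]


def add_document(index, doc_id, text, min_word_len):
    """Add one document to the toy index.

    Returns the number of new (word, document) entries created.
    """
    added = 0
    for word in indexable_words(text, min_word_len):
        postings = index.setdefault(word, set())
        if doc_id not in postings:
            postings.add(doc_id)
            added += 1
    return added


if __name__ == "__main__":
    text = "The cat and the dog sat with a parser for XML data"
    print(add_document({}, 1, text, min_word_len=4))
    print(add_document({}, 1, text, min_word_len=3))
```

For this one sentence, dropping the minimum length from 4 to 3 triples the number of entries written (6 instead of 2), which is the kind of growth the talk warns about.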

There are a number of homebrew solutions that try to do a more efficient job with fulltext searches. Senna for MySQL is a fulltext search engine for MySQL tables. SQLSearch is another engine. Lucene is a Java-based search engine. mnoGoSearch is a Web search engine using MySQL as its backend. Sphinx Search is a high-performance easy-to-use MySQL fulltext search engine with support for snippets. TBGSearch is a vector-based search engine that the presentation advises for medium-size data sets.

PHP + MySQL links of the day

Patrick Galbraith from Grazr describes MySQL multi-master replication, where two nodes replicate each other’s updates. Useful for the case when you’re running the product out of multiple data centers and can’t predict where the writes will occur, i.e. both are hot MySQL servers.

IBM DeveloperWorks introduces us to XCache on PHP:

XCache is a relative newcomer, but many sites are reporting good results with it. In addition, it is easy to build, install, and configure because it’s implemented as a PHP extension. Recompiling Apache and PHP isn’t required. This article is based on XCache V1.2.0. It reliably supports PHP V4.3.11 to V4.4.4, PHP V5.1.x to V5.2.x, and early versions of PHP V6. (XCache doesn’t support PHP V5.0.x.) XCache works with mod_php and FastCGI, but not with the Common Gateway Interface (CGI) or the command-line PHP interpreter. The XCache source code builds on a variety of systems, including FreeBSD, Sun Solaris, Linux®, and (as shown here) on Mac OS X. XCache can be built on Microsoft® Windows®, as well, using the Cygwin UNIX® emulation environment or Visual C. You can build XCache for Cygwin or for native Win32. The latter target is compatible with the official Win32 release of PHP.

Linden Scripting Language intro

Dr. Dobb’s Journal runs a lengthy introduction to Linden Scripting Language, the language behind avatars and their interaction in Second Life:

LSL is a scripting language that runs server-side, on a piece of software called the simulator. The simulator does just what its name implies: it simulates the virtual world of Second Life. Each simulator runs everything for 16 acres of virtual land: buildings, physics, and of course, scripts. While you manipulate the script text in a form that is somewhat easy to read, the actual code that runs on the simulator is compiled. A compiler is a piece of software that takes the text version of the script and converts it into something that can actually run. In the case of LSL, the compiler exists within the Second Life viewer itself. In the future, it is likely that the compiler will move from the viewer into the Second Life simulators, but where the code is compiled isn’t very important. What matters is that the text is converted into a form that can run on the simulators.

12 rules for running a successful open source project

There’s a tech talk, How Open Source Projects Survive Poisonous People (And You Can Too), on Google Video, where the guys who developed Subversion and currently work at Google (the company tends to hire a bunch of open source developers, including the creator of VIM) share their tips on running an open source project.

Here’s a synopsis of their tips, although the video is worth watching for real-life stories and anecdotes they tell.

  1. Don’t strive for perfection - you will end up polishing and improving and adding on instead of releasing actual software.
  2. Don’t “paint the bikeshed” - that is, don’t spend 1 day discussing the nuclear power plant plans and then 1 week discussing which color to paint the bike shed for the workers who ride bikes to work.
  3. Don’t get obsessed with the process - you’re building a product, not an optimal process.
  4. Have a direction - don’t just try to build the best software ever, and wind up with feature creep. Pick a direction and stick with it.
  5. Send people to mail archives for a discussion that already happened in the past.
  6. Keep documentation on design decisions, bug fixes, mistakes.
  7. Have developers write consistent log messages.
  8. Send commit e-mails - this allows everybody to be in the loop on what’s being worked on and who does what.
  9. Don’t let people submit mega-projects they quietly worked on for the past few months - no one can review it, no one can test it, there should be incremental commits and branches.
  10. Don’t let people put their names at the top of the source code files - this discourages additions and bug fixes, since people feel the file is being “owned” by the author.
  11. “Patches welcome” should be the reply to users who request a variety of new features that primarily suit their own specific use case.
  12. Try to avoid people who genuinely would like to code, but cannot follow a self-paced learning schedule, and require hand-holding and explanations on almost every issue they face.

Amazon: screw the simplicity, stick to what sells

I was reading a presentation by Ronny Kohavi and Matt Round from Amazon.com on Amazon using analytics for deciding on what the site looks like, what the top-level navigation looks like, and what the front page of Amazon.com looks like. Hat tip to Greg Linden for linking to it.

Amazon has been widely known for throwing down the gauntlet to all the widely accepted Web 2.0 maxims. Standards support? Nah. Using simple URLs? Even the front page directs to something like http://www.amazon.com/ref=topnav_gw_gw/104-6121055-7037564, although, granted, it doesn’t have the obidos links with a bunch of hyphenated nouns that it used to.

What’s interesting in that set of slides is the defiance of another maxim - simple is beautiful. Granted, the implementation of simple might differ depending on what function you’re trying to accomplish. There are two slides discussing experiments with the Amazon front page in order to make it simpler. So why is the front page of Amazon such a hodge-podge of suggestions, recommendations, related items, new additions, shakers and movers and other recently viewed items? The answer is simple - it sells better.

Simple design in Amazon’s case generated higher cart abandonment and statistically significant decreases in customer conversions. Which is all that matters in a data-driven e-commerce company. So it looks like in Amazon’s case its customers not only do not appreciate simplicity thrown upon them, they actually enjoy and celebrate complexity, partying with their dollars when the front page is complex.
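
For readers wondering what “statistically significant” means for a conversion experiment like this, here is a minimal sketch of a two-proportion z-test, one common way to compare two page variants. The slides do not say which test Amazon used, so the method is an assumption, and the visitor and order counts in the example are entirely hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B.

    conv_* are conversion counts and n_* are visitor counts.
    Returns (z, two_sided_p_value). Raises ZeroDivisionError if the
    pooled rate is exactly 0 or 1, where the test is undefined.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical numbers: busy page (A) vs. simplified page (B).
    z, p = two_proportion_z_test(500, 10_000, 430, 10_000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up counts the p-value lands below the conventional 0.05 threshold, so the drop in conversions on the simplified page would be called statistically significant.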

VIM tips for PHP developers

Fresh from the VIM talk, I was curious to see Andrei Zmievski post the VIM script files from his VIM for PHP programmers presentation. It’s not one of those 3-page presentations ending with “Read the VIM manual” either; it’s a 77-page guide to optimizing one’s VIM experience when writing PHP.