I spent two days last week at the Percona Performance Conference – a free event that ran in parallel with the MySQL User Conference & Expo at the Santa Clara Convention Center. The content was good, and the organizers packed an impressive 51 presentations, a session of lightning talks, and two open Q&A sessions towards the end of each day into 24+ hours over 2 days.
In light of Oracle's acquisition of Sun, there was a lot of conversation about where MySQL is headed. Michael (Monty) Widenius outlined three directions Oracle could take with MySQL (abandon, sell, or embrace), and at the Percona conference he gave more details on the Maria storage engine for MySQL, which looks a lot like the InnoDB storage engine, with a new type of index.
Here’s a rundown of the sessions I attended, with links to slides where available (many slides are in PDF):
- Return of Gearman – by Eric Day – Gearman is a “manager” server that assigns work to available boxes and coordinates the work between clients and workers (servers), so that clients can be unaware of the server pool and simply send their requests to Gearman. Introducing this extra tier makes it possible to scale out the server pool. Gearman also has a variety of native clients and fast protocols (a few more presentations from Eric Day).
- Object-Oriented CSS – Nicole Sullivan presented an overview of the project. Contrary to the advertised name, it’s not a project to make CSS behave as expected, but more of a framework to ensure the same behavior of various grids, templates and modules across different browsers.
- Fighting MySQL replication lag – by Peter Zaitsev of Percona. Some pretty useful tips, such as running heavy-duty jobs (ALTER TABLE) directly on the slaves, and reconsidering hardware options for slaves.
- Scaling with Postgres – from Robert Treat – I was surprised at the quantity and quality of Postgres presentations at Percona conference. Having never dealt with Postgres in production, I don’t really have any good opinions on the presentations, but Robert’s slides provide a good deal of gotchas when scaling Postgres.
- Balanced Patricia Tries – Moshe Shadmon from ScaleDB – a good overview of ScaleDB, where it’s useful, and how it uses tries. It’s an interesting proprietary storage engine.
- Working with disk arrays – by Paul Tuckfield – the first DBA at YouTube, and also the first DBA at PayPal.
- Using proxy architectures – by Robert Hodges – was a pretty in-depth look at proxy architectures and some pitfalls.
- EMT Performance Monitoring – from my former colleague at Yahoo! Eric Bergen, now at Proven Scaling – EMT is a script for data collection and aggregation that Proven Scaling found useful to have on their clients’ boxes to get an up-to-the-minute view of what exactly happened to a machine right before it collapsed.
- Proactive Operational Measures – from Nicklas Westerlund and Augusto Bott – more like an overview of what could potentially go wrong when you’re tasked with the operations side of things.
I didn’t go to the presentations about CouchDB, Amazon cloud recipes, and exploring new hardware (flash memory, multi-core processors) for database servers, but I’m glad to see the authors posting the content of those.
Second day of Percona Conference:
- Disruptive innovations in open source – or where do we go from here in regards to MySQL from Baron Schwartz of Percona
- Performance instrumentation – from Cary Millsap – has some pretty good stories even if you know nothing about databases. For example: don’t ever ask people what the most common performance problems are, since you’re likely to be led astray. Manuals are very likely to suggest One True Solution for any performance problem, but if you flip through the pages, you will find there are many One True Solutions to similar-sounding problems. Optimizing subsystems might still be useless when the process on top is broken and un-optimizable.
- Pushing the envelope – by Don MacAskill of SmugMug – Don has to store pretty large amounts of data (raw photos in the tens of megapixels) for paying clients, and the presentation covered SmugMug’s experience.
- Internals of InnoDB disk I/O by Mark Callaghan of Google (slides are posted on his blog)
- Hive – distributed data warehousing at Facebook – Hive has been open-sourced by my employer, and it’s a pretty useful layer if your data lives in Hadoop but you cannot get everyone at the company to run map/reduce jobs. Ashish’s and Prasad’s presentations provide an overview of what Hive is. It’s also one of the few Facebook projects written in Java.
- Multi-terabyte install of Postgres – pretty impressive from Theo Schlossnagle
- Efficient pagination from Surat Bhati of Yahoo! – every Web developer probably dreads the moment when the pagination code generates something to the effect of SELECT * FROM images LIMIT 1000000,10. Yes, it’s very unlikely the user will actually browse past the first million images, and Surat’s presentation primarily dealt with ways of building interfaces that avoid such queries. Spoiler alert: use Previous and Next, don’t link to pages directly; most users don’t care about exact counts and are perfectly fine with seeing “thousands” and “millions”, not “Comments 13,300 – 13,400 of 15,635,611”
- High performance MySQL on a limited hardware budget from Percona
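To make the pagination point above concrete, here is a minimal sketch of Previous/Next paging that seeks by the last seen id instead of using a large offset. It is my own illustration, not code from Surat’s slides: it uses an in-memory SQLite database so it runs anywhere, and the images table layout and the helper names (make_db, page_by_offset, next_page, prev_page) are assumptions made up for the example.

```python
import sqlite3


def make_db(n=1000):
    """Build a throwaway in-memory table (hypothetical schema) with n rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO images (id, title) VALUES (?, ?)",
        [(i, f"image {i}") for i in range(1, n + 1)],
    )
    return conn


def page_by_offset(conn, offset, size):
    """The dreaded form: the database must walk past `offset` rows first."""
    return conn.execute(
        "SELECT id, title FROM images ORDER BY id LIMIT ? OFFSET ?",
        (size, offset),
    ).fetchall()


def next_page(conn, last_seen_id, size):
    """'Next' link: seek straight to rows after the last id on the current page."""
    return conn.execute(
        "SELECT id, title FROM images WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, size),
    ).fetchall()


def prev_page(conn, first_seen_id, size):
    """'Previous' link: read backwards from the first id, then restore order."""
    rows = conn.execute(
        "SELECT id, title FROM images WHERE id < ? ORDER BY id DESC LIMIT ?",
        (first_seen_id, size),
    ).fetchall()
    return rows[::-1]


if __name__ == "__main__":
    conn = make_db()
    page = next_page(conn, last_seen_id=990, size=10)
    print([row[0] for row in page])
```

The trade-off is the one from the talk: you give up direct links to page 100,000, and in return every page is an index seek that costs about the same no matter how deep the user browses.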
I didn’t see: Hypertable, PostgreSQL trees, Zawodny’s search at Craigslist, InnoDB tuning, common mistakes, non-disruptive backups, high-performance Erlang, and tuning MySQL replication, but thankfully those are online.
A few more cool things: the MySQL User Conference has videos of selected presentations posted at blip.tv. I didn’t watch them all, but started watching Don MacAskill’s SmugMug Tale, and since he didn’t post the slides, that’s the best way to get his presentation. There’s also a full range of presentations from MySQL Conference 2009 available at the conference site.
Jeremy Zawodny posted MySQL and Search at Craigslist on SlideShare. They’re a Sphinx shop, going from 25 MySQL MyISAM FULLTEXT boxes to 10 Sphinx boxes:
Giuseppe Maxia posted a presentation on MySQL 5.1 partitions:
Robert Hodges posted slides on Tungsten SQL Router, Tungsten replication and using Tungsten with RightScale.
Kazuho Oku shared experience on building a real-time stats service on top of MySQL (Pathtraq is one of the largest in Japan):
as well as slides on building a reliable message queue service, Q4M:
At MySQL Conference, Anders Karlsson did a talk on using libmysqld inside your application:
MySQL High Availability presentation from MySQL and bwin games:
MySQL 5.1 Event Scheduler:
Running multiple MySQL servers on one box:
