I finished reading Scott Rosenberg’s Dreaming in Code the other day. The book tells the story of the Chandler project, run by the Open Source Applications Foundation and Mitch Kapor. Written by a non-engineer, it tries to give some perspective on the software development world and to help the reader understand why software projects fail. Through some anecdotal evidence, we mainly find out about large, publicized government software project failures. Most of the private failures remain undisclosed, since within large corporations projects never fail, it’s just their resources that get “re-allocated to higher priority projects.”
While Chandler is not a spectacular failure by itself, the book does talk about a few steps that were perhaps taken in the wrong direction. So can this knowledge be summarized into distinct no-nos for anybody participating in a software project?
1. From development project to integration project. Engineers like the concept of reusability, and so do their managers, for whom the choice between developing an in-house product and using a third-party (sometimes open source) solution is usually straightforward. Unfortunately, few pieces of software integrate smoothly, and if you decide to go with a third-party or open source solution, most likely you do not have people with complete expertise in it. Not wanting to reinvent the wheel, the Chandler engineers were discussing the Zope Object Database (ZODB), but as the product requirements changed in the planning stage, so did the requirements for ZODB. No one on the team was a ZODB expert, so most of the discussions required further “looking into” ZODB internals.
2. No agreement on final architecture, or inability to settle on one. This seemed to be the case with the Chandler team, which consisted entirely of superstars. Having a dream team on a project has one disadvantage - everybody on the team respects the opinions of the others, and when dissenting views are presented, it’s occasionally hard to move forward without hurt feelings. Sometimes people in the room are thinking “I disagree, but what do I know”, leaving the discussion up in the air with no decisions made.
3. Feature creep. One of the problems that Chandler had outright was its need to be revolutionary, which required a redesign every now and then. As more things were abstracted (an “item” could be an e-mail, a note, a task on the to-do list, an entry in the address book, etc.), there were more features to add into the product making the next release unattainable. In the software industry, the term feature creep refers to more requirements entering the scene, making a release a highly unlikely event.
4. Poor reusability. Most engineers who have been working in the field for a while probably keep a personal library of routines they reuse over and over. So code reusability on a personal level is pretty good. Code reusability within an organization is sometimes pretty decent, depending on how the CTO or VP of Engineering sets up the code repository and what practices are followed. Code reusability in the open source world is pretty poor. Even though there are quite a few well-written libraries out there that integrate well with others’ projects, most of the stuff requires heavy customization, if you’re lucky enough to get it in the language that you need. However, Dreaming in Code maintains that perhaps the reason you started your own project was to develop some new technologies, not to combine the existing ones into a single package. Therefore integration pain is the price that has to be paid for a final product to be worthwhile.
The recent Slashdot discussion points out how hard it is to learn about system scalability, unless you already are working for a company that’s in the business of building large scalable systems. However, Google’s recent Seattle conference on scalability brought videos of some pretty interesting talks to the Web.
System Abstractions for Handling Large Datasets by Jeff Dean, Google
Building a Scalable Resource Mgmt System for Grid Computing by Khalid Ahmed, Platform Computing Corp.
YouTube Scalability by Cuong Do Cuong
Lessons In Building Scalable Systems by Reza Behforooz, Google Talk
Using MapReduce on Large Geographic Datasets by Barry Brumitt, Google, Inc.
Scaling Google for Every User by Marissa Mayer, Google
SCTP’s Reliability and Fault Tolerance
VeriSign’s Global DNS Infrastructure
Lustre File System by Peter Braam, Cluster File Systems
Video conversion tools still remain one of the Windows shareware strongholds, where open source tools have not completely taken over. As a result, a regular search engine query for video conversion is filled with “optimized” commercial results, and it’s pretty hard to download anything that doesn’t stick a watermark in your video, or constrain you to converting (or ripping) just the first few minutes until you pay up $29.95 (which also seems to be the going rate). I am not against commercial software, but that makes some free links worth saving. DVDVideoSoft provides 10 free Windows tools, mostly video converters.
“Say (consumers) have a budget to spend $1,000 on a TV. They could probably buy a premium brand 32- or 37-inch, or they could buy a Vizio 47-inch for the same amount,” Patel said. “The low-price strategy is what’s driving consumers to them.”
Recently it seems everybody is experimenting with non-traditional ways to sell international and long-distance VOIP. Perhaps this was triggered by Vonage getting sued, which is going to change the way residential VOIP is offered. Jajah was the first (or among the first) to offer Web-based call origination without being a Web-based VOIP provider like Yahoo! Messenger with Voice, where you need a broadband Internet connection to call. Raketu is among the latest companies to do the same. You enter your phone number and the destination phone number, your phone rings, and then you get connected to the destination number. No speakers or PC microphone involved - you only need to be connected to the Web to initiate the call. Raketu is running some kind of promotion right now:
Starting Monday, August 20, 2007, customers who purchase credits for Raketu’s prepaid voice over Internet protocol (VoIP) dial out calling service will receive up to 1200 minutes of free calling per month for three months to locations in 40 countries. To be eligible for the promotion, Raketu customers must purchase credits of either $9.95 or $24.95 to use towards Raketu’s ultra-low global calling rates.
I haven’t tried Raketu, but some time ago I tried out Jajah and was reasonably happy with the quality, so for a $9.95 commitment it’s worth checking out.
Earlier today TechCrunch posted an item regarding Facebook servers exposing raw PHP code, and the story has been making the rounds of the blogosphere echo chamber, getting more negative each time around.
There are two important things that need to be addressed. First. No matter how sexy a theory about a disgruntled employee or a cunning attacker may sound, the story posted by Brandee in the TechCrunch comments is somewhat duller - sometimes those .php files end up being served raw, not interpreted by PHP, on an Apache server.
Second. Source code is not user data. Not to go into Web Page Building 101 here (the course might be available at a local friendly community college), but data is stored in the databases, which are then accessed by some code (PHP in this case), and displayed to the user. What’s displayed is always visible to the user (View Source in your browser), the code is sometimes open (Wordpress, Joomla, Drupal) and sometimes not (pretty much any non-standard Web site out there), while DB is always locked down from outside peeks, unless you have developers do some stupid things, like leave username and password in the PHP code, and allow outside access. Generally speaking, even if I have all the source code for a certain Web site, it’s still impossible for me to take a peek at the data.
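To make the "stupid things" part concrete, here is a minimal sketch in Python (the same idea applies to PHP) of keeping database credentials out of the source tree, so that a leaked source file does not also leak a password. The variable names DB_HOST, DB_USER and DB_PASSWORD and the function load_db_settings are my own assumptions for illustration, not anything Facebook is known to use.

```python
import os


def load_db_settings(env=None):
    """Read database settings from the environment instead of the source code.

    The variable names (DB_HOST, DB_USER, DB_PASSWORD) are hypothetical.
    """
    env = os.environ if env is None else env
    missing = [key for key in ("DB_USER", "DB_PASSWORD") if key not in env]
    if missing:
        # Fail loudly rather than falling back to a password baked into the code.
        raise RuntimeError("missing settings: " + ", ".join(missing))
    return {
        # Default to a local-only address so the database is not reachable from outside.
        "host": env.get("DB_HOST", "127.0.0.1"),
        "user": env["DB_USER"],
        "password": env["DB_PASSWORD"],
    }


# Usage with an explicit mapping standing in for the real environment.
settings = load_db_settings({"DB_USER": "webapp", "DB_PASSWORD": "not-in-source"})
```

With something like this in place, the code that would get served raw contains only the names of the settings, not their values.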
But most of you didn’t come here for the lesson in basic Web building. Judging by the title, you wanted to get Facebook source. The more the better. So here it is.
Facebook Thrift - developed, supported and actually used by Facebook, this is a set of libraries and code generators to allow for maximum throughput data transfers between a client and a server. If you’ve got some server that speaks C++ or Java, and some client that speaks Python or PHP, you can have those two living in perfect harmony, clients issuing the client requests in whatever language they prefer, and servers responding back with the data structures in their preferred language. Read the whitepaper here or join the group here. And guess what, you can download the source.
phpsh - another product written by Facebook engineers and used throughout the company. Ever wished PHP had an interactive shell, just like the one you get when you download Python? Facebook’s phpsh is written (get this) in Python, but offers some of the best interactive shell features to a PHP developer. Ever need to execute a single function just to see what the output will be? Just type the function name with parameters and see it run. Curious to see where a certain function lives? Just do d function_name to get the definition of that function together with its location in the codebase. e function_name opens up emacs, and gets you to the exact location of that function in the code. It’s downloadable here with source available.
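For those who haven't seen this kind of lookup before, here is a rough Python analogue of what the d command is described as doing: given a function, report its definition and where it lives. This is only a sketch using Python's standard inspect module on Python code; it is not how phpsh itself is implemented, and the helper name describe is made up for the example.

```python
import inspect
import json


def describe(func):
    """Return the file, first line number, and source text of a Python function."""
    source_lines, first_line = inspect.getsourcelines(func)
    return {
        "file": inspect.getsourcefile(func),
        "line": first_line,
        "source": "".join(source_lines),
    }


# Look up a function from the standard library, which is written in plain Python.
info = describe(json.dumps)
print(info["file"], info["line"])
```

An editor shortcut in the spirit of the e command would then only need to open info["file"] at info["line"].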
Facebook toolbar for Firefox is also open source, since that’s the way Firefox extensions are distributed. Ever wanted to build a Firefox toolbar of your own incorporating some features of Facebook into it? By installing the toolbar, you get the sources for it placed in your Firefox extensions directory.
Facebook’s APC - what would you give for a copy of Facebook’s APC configuration? Don’t answer yet, as Facebook engineer Brian Shire provides it for free in his APC@Facebook talk he’s given at PHP conferences. It talks about optimal configuration and trade-offs one needs to consider when optimizing a large number of servers running PHP.
And finally, the PHP scripting language. Not developed by Facebook, but actively used, with some contributions to the codebase as well. In fact, a quick search of the mailing lists lets you know what those contributions are. PHP is downloadable, with source, naturally, available to anyone who cares to peruse it.
Hopefully this will satiate any hunger for Facebook code, and when you feel very comfortable with everything described above (or maybe none of that was news to you), feel free to drop me a line with a resume attached, if you so desire. The name is alex, what follows after @ should probably be obvious.
I spent this weekend in Las Vegas attending DefCon 15. This year there weren’t as many announcements of 0-day exploits as last year, but the three days were nevertheless information-packed, with 5 tracks on Friday and Saturday starting at 10 am and ending at 10 pm. On Sunday they did a half-day that ended at 4 pm, with 3 tracks of presentations. There were, of course, some pretty cool events, like Dateline NBC undercover reporter Michelle Madigan being outed. Below are some of the memorable talks.
Q&A with Bruce Schneier. The secret to leading an effective life (Schneier is the top US cryptography expert, a frequent blogger, quote generator, and restaurant reviewer) is not to watch television. He was also able to travel to Vegas without an ID. Apparently if you just tell the airline agent that you’ve lost your ID (which could happen, and you don’t need to be stuck somewhere else waiting for your state to send you back your ID), they will give you a ticket with NO ID stamp on it, which gives you access to the boarding area. Also, Schneier doesn’t think the encryption algorithms will need to change a great deal in the future, because what we have right now seems to be more or less sufficient. When Feds need to get the information off somebody’s computer, they don’t ask NSA to break into 1024-bit encryption on his PC, they just install the keylogger and get the passphrase.
Steve Dunker enlightened the public on facts and myths about police arrests. In case a hypothetical arrest happens, they don’t necessarily need to read you your Miranda rights, contrary to what you might have seen in the latest action movie. They only need to do it if they intend to use what you said in court, which is not necessary in the case of an obvious crime or when there are witnesses.
Founder of the Shmoo Group Bruce Potter spoke about the “dirty secrets” of the security industry in a packed room. Bruce is a pretty popular guy, and generally attracts crowds to his speeches. He had some good points about the security industry lacking fundamentals. We spend billions on firewalls, intrusion detection systems, authentication systems, etc., without realizing that the underlying problem behind all the security concerns is crappy code. The reason you need that expensive firewall or IDS is that you cannot trust the applications running within your organization to correctly deal with weird data inputs, network connections, injections, etc. If your app was golden, none of this additional spending would need to happen. Instead the responsibility for dealing with data securely is offloaded to a third party.
Founder of DefCon Dark Tangent told the story of CiscoGate, speaking at his conference for the first time (unfair advantage being the reason he avoids speaking at DefCon). That was a pretty intense talk, complete with lawyers from both ISS and Cisco calling Jeff Moss regarding Mike Lynn’s talk at Black Hat conference, as well as Cisco hiring a brigade of temp workers to rip out the pages of Mike Lynn’s presentation from the book of papers that’s provided to the conference attendees.
Johnny Long’s talks are usually a highlight of any day he presents, and this time he didn’t disappoint. His low-tech hacking presentation was all about figuring out important information without doing anything high tech: dumpster-diving, getting important information from people’s parking badges, taking pictures of badged employees and then reproducing the badges, or even misrepresenting yourself as an AT&T employee who’s here to check the integrity of the phone network, with a laminated AT&T badge and all. Apparently, the whole process of lamination convinces any representative of the human race that you are now an official employee of the organization whose logo you display on your badge.
Broward Horne presented an interesting technique for analyzing click fraud through some unorthodox means. IAmFacingForeclosure.com managed to generate tons of negative press, being, as the Web site claimed, the blog of someone who had invested too much money in subprime real estate and now could not make his payments, waiting on the government to kick in and bail him out. As one can imagine, this strategy generated quite a few resentful readers, and the traffic to the blog rose. Broward Horne was doing two things at the same time - measuring the blog’s traffic through Alexa, and also linking to it with the right terms, so that his site would show up in Google results in proximity to IAmFacingForeclosure, and therefore he’d get some portion of the traffic, a bone off the master’s table. Strangely, none of this happened. The Alexa graph, unreliable as it is, stayed the same, and even though IAmFacingForeclosure’s traffic was supposedly skyrocketing, the site placed close to it in the search engines received no traffic whatsoever. When both Google and Yahoo! kicked IAmFacingForeclosure off their AdSense and Publisher Network programs, it was obvious that the author was engaged in click fraud - generating high-priced real-estate and mortgage-related content, placing Google and Yahoo! text ads, and then relying on an army of bots to click through the ads, thereby generating substantial revenue for the site. Of course, once the idea is out there, it’s relatively easy to train the bots to ping Alexa or Compete whenever they’re visiting a site, but the analysis via third-party means was quite interesting nonetheless.
Dan Kaminsky’s talk is usually oversubscribed, and the same happened this time - the gigantic conference room was packed, with people sitting on the floor, and with goons shooing them away due to the notorious Fire Marshal concerns. Kaminsky was talking about a current IT security myth that claims that outside attackers cannot get to your internal network due to firewalls and whatnot. They can, however, present a Web site to the user, suggest a Java applet or Flash application on the Web site, have those applications be granted sufficient permissions by a user on your network, and then access pretty much anything the user has access to. The highlight of the presentation was rebinding the DNS for some popular domains out there. You don’t need to completely divert the DNS, you only need to insert one additional A record specifying that, for example, paypal.com lives not only on the IP addresses its owners publish, but also on your own server. Since a browser presented with several IP addresses in a DNS response picks one of them at random, every once in a while it takes the user to your server. Create an invisible iframe with your code and a visible frame with Paypal’s official Web site, and because both appear to come from the same domain, JavaScript’s same-origin policy no longer keeps them apart: your code can read and write DOM data in the other frame. The same trick applies to pretty much any Web site out there.
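To show why one extra A record is enough, here is a small Python sketch of the two pieces as I understood them from the talk. It is a simplified model, not browser code: I am assuming an origin is just the (scheme, host, port) triple, and that the address is picked uniformly at random, as described above. The host example.com and the addresses are placeholders from the ranges reserved for documentation.

```python
import random
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Reduce a URL to the (scheme, host, port) triple that defines its origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(url_a, url_b):
    """The check compares names, never the IP address that served the page."""
    return origin(url_a) == origin(url_b)


def attacker_hit_rate(addresses, attacker_address, trials=10000, seed=0):
    """Estimate how often a uniformly random pick lands on the attacker's address."""
    rng = random.Random(seed)
    hits = sum(rng.choice(addresses) == attacker_address for _ in range(trials))
    return hits / trials


# Three legitimate addresses plus one inserted by the attacker (all placeholders).
records = ["192.0.2.10", "192.0.2.11", "192.0.2.12", "203.0.113.66"]

# Both frames carry the same host name, so they count as the same origin
# no matter which of the four addresses actually served each of them.
print(same_origin("http://example.com/account", "http://example.com/hidden-frame"))
print(attacker_hit_rate(records, "203.0.113.66"))
```

With one bad record out of four, roughly a quarter of the simulated page loads land on the attacker's server, and the origin check has no way to tell the two frames apart because the IP address never enters the comparison.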
Gadi Evron spoke about botnets. A Google search for C99Shell returns 5,700 results, and while some of those are discussing C99Shell, some of the results are sites that have been compromised, frequently through their upload tools, to host a shell that has access to pretty much anything the Web server can access. Even when it doesn’t maliciously harm the host, it can be used to generate spam, host files, etc. The Register also reported on the session dedicated to the malware marketplace. Gadi Evron also spoke the same evening on the “cyber-war” between Russia and Estonia, which according to him looked more like vigilante activity than an organized government vs. government attack. Estonia is essentially leading the world in e-government initiatives, with a bunch of their government and financial transactions happening exclusively online. A political scandal related to the removal of a Russian monument riled up Russians, who passed messages through the blogosphere (mostly Livejournal and forums), instructing everyone who felt insulted by Estonians to run a ping on major Estonian servers. Gadi did not go into the details of the attack, as he was interested mostly in defending. It’s also very alarming that the country was not prepared for attacks of such a level, and there was essentially no emergency plan. There’s a little bit more information on Gadi’s blog.
Steve Topletz from Hacktivismo Project announced the release of XB Machine, a completely anonymous virtual machine that can live on a Mini-CD or USB drive and operates via Tor network. Perry also discussed the current architecture of XeroBank (formerly known as TorPark) and reasons behind commercial services that it offers - XeroBank runs its own network in countries with the right privacy legislation, and completely encrypts all browsing transactions, making it impossible even for them to identify you properly. There were other future-looking announcements, but since each one was preceded by “I am not supposed to talk about this”, I won’t go into much detail - XeroBank will release the news when they’re ready.
Daniel Peck & Ben Feinstein introduced CaffeineMonkey, a tool to identify and explore potentially malicious JavaScript. From the tool Web site: “One of the key components of this tool is that it is behavior based, not signature based. It identifies specific behaviors that are indicative of malicious code. Building on the work of several existing client honeypot implementations, their goal is to largely automate the painstaking work of malicious software collection. The focus is on attacks using JavaScript for obfuscation or exploitation.”
Rick Deacon this morning talked about flaws discovered on the MySpace.com site; specifically, the MYUSERINFO cookie can be stolen and then used to authenticate against MySpace. There are tons of reports on Rick Deacon’s presentation on the news wires today, even though apparently disclosure to MySpace has been made, and the trick only works in older versions of Firefox. Even though at the beginning of the presentation he claimed that it might impact quite a few people, you’ve got to respect Firefox’s upgrade model - you basically have no choice but to upgrade when they tell you to. The new version is downloaded and installed, and then just waits for you to restart the browser, bugging you in the meantime.
Aviv Raff & Iftach Ian Amit this morning were able to inject a malicious JavaScript widget into an iGoogle homepage. If that widget is located on the same page as the Gmail widget, it can read the data on the page, which is currently limited to Gmail senders and subject lines. They also discussed a vulnerability in the Live.com RSS reader, which Microsoft fixed upon disclosure, and a Yahoo! widget vulnerability, which Yahoo! fixed as well. As a side note, most of the fixes came down to changing one or two lines of code. I asked Raff and Amit about the exploitability of the Facebook profile code; they were generally unfamiliar with the site, but said that external JavaScript was the underlying platform for all of the security exploits, so FBML code pushed by the app developer to the profile is safe. ComputerWorld also attended a session on AJAX exploits.
Brendan O’Connor spends his time studying the underlying security of the banking industry, specifically the online banking and bill payment services. One security error in that field and a customer’s information is completely exposed, which, combined with the e-statements, tax forms, and electronic copies of checks that current online banking services keep, could have a rather dire impact on the customer’s finances. Discoveries from the talk? All those images the banks display to you to prevent phishing are sourced from a single database, with the primary key into that database displayed in the ALT field. Get an account with an online bank, go to online banking sign-up, start choosing your images by moving through their gallery, and within a few minutes or hours, depending on your skills, you should have a complete database of the images that supposedly verify that the site is not a phishing site. Also, the challenges built from public sources are not consistent. If somebody tries to sign up as you at a banking site, the first time they will be asked to verify the car purchased in 1995 - was it a Toyota, Honda, Ford or none of the above. Choose to decline the challenge, come back a few days later, and the challenge question will remain the same, with the answers now being Mitsubishi, Ford, General Motors or none of the above. Notice anything interesting? Ford is present in both of them, which gives a potential attacker the right answer about your personal information.
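The Ford observation boils down to a set intersection, which can be sketched in a few lines of Python. The two rounds below are the ones from the talk as I noted them; the function name narrow_answers is made up for this example, and I am assuming the real answer is among the named options rather than “none of the above”.

```python
def narrow_answers(rounds, none_option="none of the above"):
    """Intersect the named options across repeated challenges.

    Decoys change between rounds while the true answer has to stay,
    so whatever survives the intersection is the likely true answer.
    """
    candidates = None
    for options in rounds:
        named = {option for option in options if option != none_option}
        candidates = named if candidates is None else candidates & named
    return candidates if candidates is not None else set()


rounds = [
    ["Toyota", "Honda", "Ford", "none of the above"],
    ["Mitsubishi", "Ford", "General Motors", "none of the above"],
]
print(narrow_answers(rounds))  # {'Ford'}
```

An empty result would suggest that “none of the above” is the right answer. The obvious countermeasure, as far as I can tell, would be to keep the same decoys for a given customer on every attempt, so that repeating the challenge reveals nothing new.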
I missed the lockpicking presentation this year, since it was time for me to head for the airport. There was also an interesting WiFi presentation compromising Gmail addresses (but from the description, looks like some other Webmail providers could be vulnerable as well), which I missed.