Hacking Category

From the audience at Velocity2008: David Slays Goliath

The parade of morning keynotes and product introductions included some heavy hitters, namely Keynote and Google, and some lightweights, WhitePages.com. Taking a back seat to these guys is usually fine with us as we stay out of the line of fire and can make mistakes with very few people actually noticing. We also learn a lot from the big boys and tend to follow their lead. After all, these are the industry heavies and can attract and hire more smart people and buy more infrastructure than we could ever hope to acquire.

Keynote demoed a free product today called KITE (Keynote Internet Testing Environment) which allows developers to instrument a workflow or click-stream in a web application and benchmark it from 3 locations in the US to gauge performance and user experience. For a free service it is pretty good, and the audience seemed somewhat receptive to the idea that it’s okay to send your performance data to Keynote, but someone behind me muttered, “how much does it cost for more than 3 datacenters, I wonder?”

Then Scott took the stage and announced Jiffy, which is basically the same product except you get performance telemetry from all the streams you care to implement from all of your users a few seconds after they happen. Oh, ours is free too and we give you the source.

A little guy with a slingshot (a good idea) and a few round rocks (some wicked free code) can do more than break a few windows …

Scott Ruthfield Announces Jiffy

WANTED: Mascot, no experience necessary

The WP contingent to Velocity2008 is barricaded in my hotel room at the Marriott. With only 10 short hours to go before this team releases Jiffy to the show, tension in the room is thicker than the chocolate sauce that came with our brownie and ice cream dessert. It’s like Dog Day Afternoon only without the profuse sweating and swearing, but it’s still early in the evening.

Velocity Conference Cramming Sessions

Our demands are simple: We want a Jiffy mascot. It’s got to be lovable like the BSD Daemon, on message like the Maytag repair man, and memorable like the Pets.com sock puppet. Image searches on google.com have not yielded anything useful.

So we’re asking, no begging, for someone to take pity on me and send us a Jiffy Mascot meeting the criteria above.

More about the Whitepages Developer API

Now that you’ve read Scott’s big picture posting about the new WhitePages.com public API offering, let me tell you a little about the down and dirty of developing our new API. Our data covers 180 million people and provides approximately 80% coverage of the US: when the opportunity came across my desk to build the API that would allow us to share that data, I was elated.

Let me start by giving you an overview of how we deal with the hard problems of searching those 180 million listings in under a quarter of a second and delivering them to the front-end website.

Some people think we have ‘just a database’ or ‘it’s just a website’. But what we do is hard work. We have multiple data vendors, some onsite and some offsite via their own API calls, all with differing data formats and the merge issues that result. Our onsite data takes up 3 terabytes of storage (with indexes), is rebuilt monthly with no identifiers to tie the data together, and handles billions of queries per year.

We use Oracle, Postgres, MySQL, and BerkeleyDB, depending on which has the strengths we need for any given job. We handle residential, nicknames, households, business listings and work number listings. Our data can be bizarre with fractional streets, decimal house numbers and misleading names like streets named “North”.

Yes, “it’s just a website” that happens to power 1300 affiliate sites, does over 100 million searches and has 34 million unique users per month. We have tiered, redundant systems with strict privacy controls that allow for non-published numbers and our own opt-out list.

All of this is built using Linux, Apache mod_perl, and our own special sauce, and it runs on just 60 boxes (16 run our backend code). Our work includes an internal search API that is strong on speed, comprehensive with its searches and absolutely inappropriate to turn loose on the world (some of our return keys have bizarre names). What we needed was an extensible platform that would allow us to wrap our own API and make it palatable and easier to work with, and to provide multiple output formats.

Back in October, Colin (one of our Architects) and I sat down to sketch out what this would look like. We decided that we would leverage our known strengths and use Apache mod_perl, a YAML file for config, Oracle for user preferences and an on-disk cache of those preferences to ensure reliability. It would have to be extensible to allow for new search types and versions and we would have to allow for small developers, large partners who could send millions of queries/day, and internal use. We considered SOAP but decided that a RESTful interface was easier for more people to interact with. We would provide an XSD for people to validate the XML against and JSON output for those who were doing JavaScript. We would only rev the version number when the output format changed, but everyone would get additional data entries and data fields as they became available.

We looked at writing our own user management system, but decided the way to go was to partner with Mashery and leave that and the community site up to their infrastructure while we focused on building the actual API.

We build OO Perl here so our first order of business after sketching out the rough requirements was to determine what classes would need to be built. We would need an Apache response handler which would handle the overall logistics, something to clean up and validate input, a class to take the output from the search and process it, and an output transformation class that would take that output and deliver it in whatever output format was requested. All of these factory classes would need to be versionable to allow for changes within our internal API as well as updates to our XSD as we build more functionality into our public API.

Once we got the generic framework worked out and determined that we could leverage it to handle the Mashery integration as well, it allowed us to bring Ewa onboard (giving her a good view of development from the other side of the fence). Ewa has been with WhitePages QA for just over three years and she is my go-to person if I need any question answered about testing our internal API. She stepped up and took point for Mashery integration without missing a beat in addition to her duties doing end to end testing of our final product.

All through November and into December, Colin and I worked out the details and wrote the code, taking the blueprints and making them real. By January 10th we had a rough and ready working version showcasing our three main search types and just in time, as Hack Week was looming and many members of our engineering team were chomping at the bit to get their hands on this API. Some of the products of Hack Week you can see showcased in our sample applications section over at developer.whitepages.com. It also allowed Zine, a new member of our QA team, to hit the ground running and start devising new and intricate ways to torture our poor code before we gave the final stamp of approval.

Hack Week was a lot of fun for me: I babysat the code as it was being really used for the first time and I found out for myself how easy it was to extend the framework to deliver data that isn’t accessed from our internal API. How easy? Well in two days I had two totally different methods built, both of them accessing raw data directly and serving it out. It’s always a pleasure to find out that your design decisions really do work out the way you plan.

So what have we been doing since then? Writing the Technical Documentation that you see at developer.whitepages.com/docs, fixing the bugs we’ve found during the QA testing phase, writing and testing the Mashery integration code, working with business to allow us to expose nearly all of our data to you, and generally wondering when the other shoe would drop. This project has gone way too smoothly and we really couldn’t have done it without the help of our full team. I was reflecting on the number of people who have had a hand in this process and while I won’t name them here, the number exceeds 30 and spans nearly every functional group within the 20% of the company that it represents.

It’s been a fun couple of months and I can’t wait to see what else we come up with for the API. I’ve got my list but I’m even more excited to see what other people will come up with in their own wish lists.

Cheers!

Dan Sabath, api lead dev

IE6’s :hover Padding Bug

It’s the browser that keeps on giving. Bugs that is. If you’re a web developer, you are probably familiar with the large number of quirks that are present in Internet Explorer. Especially in IE6. Seven years later and we are still finding bugs. Many of them are documented well on the web and some are not. Like the one I ran into recently. I can understand why this hasn’t been written about before. It is a bit hard to flush out. In order to reproduce the bug, the conditions need to be just right.

Here is the markup for this example:

<fieldset class="container">
  <div class="innerfloat">
    <a href="#">Link</a>
    <div class="floatright">Right</div>
  </div>
</fieldset>

and the stylesheet to accompany it.

fieldset.container{
  padding-top:20px;
}
.container div.innerfloat{
  float:left;
}
.container div.innerfloat .floatright{
  float:right;
}
.container div.innerfloat a:hover{
  background:red;
}

Take a look at the result in IE here and mouse over the link. What happens here is that IE recalculates the layout of the box when you mouse over the link and in this case, gets it wrong. IE seems to get confused and thinks it needs to apply any padding that’s on the top of the container to the bottom of the container as well. ARRG! It seems that anything that makes IE rethink the container layout will make the bug go away. As far as I can tell, these are the requirements to repro this bug:

  • A container with the ‘hasLayout’ property set to true. That includes elements that have ‘hasLayout’ by default listed here or any element that has any of these CSS properties. This element must have top padding applied to it.
  • Within the container, an element that is floated either left or right.
  • Within that element, a link with a hover style other than ‘color’ and an element that is floated right.

What is the solution? Unfortunately this bug isn’t one that can be fixed using the Holly Hack. If anything, it is part of the cause of the bug in the first place. The best thing to do is to simply avoid it altogether by taking any of the items above out of the equation. You can reset the ‘hasLayout’ of the container or remove the top padding on the container and use top margin on the innerfloat instead. Maybe one of those floated elements doesn’t need to be floated. There are many ways to go about squashing this bug. Out of curiosity, I contacted John at positioniseverything.net to see if this was a known bug. He said that he has seen variants of this bug before, so it is possible that you might find this again in other circumstances. When you do, please let me know!
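Here is a rough sketch of the padding-to-margin workaround mentioned above, applied to the example stylesheet. The 20px value is simply carried over from the example, and the other rules are assumed to stay as they were. Treat it as a starting point to verify in IE6, not a guaranteed fix for every variant of the bug.

```css
/* Assumption: same markup as the example above. */

/* Take the top padding off the hasLayout container... */
fieldset.container{
  padding-top:0;
}

/* ...and recreate the same 20px offset as a top margin on the inner float. */
.container div.innerfloat{
  float:left;
  margin-top:20px;
}

/* Unchanged from the original example. */
.container div.innerfloat .floatright{
  float:right;
}
.container div.innerfloat a:hover{
  background:red;
}
```

With the top padding gone from the container, one of the three repro conditions listed above no longer holds, which is the whole point of the workaround.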

Chips and Dip and Dorks and Nerds

Like many small tech companies, our dreams are bigger than our staff: we’re always looking for those holes in the work week to try out new technologies, learn a new skill, or build a great prototype. But there’s always one more feature to write or bug to fix, and it’s hard to find the time.

Last week, we found the time. Jan 14-18 was our inaugural Hack Week, where our engineering, IT operations, data, and product design teams dropped their normal workload to build interesting things. Work began on Monday (or earlier), and we went pencils-down on Friday 2pm, for a series of 5-minute talks on each project, plus beer, carrots (not in the beer), and some weird chip-like things with a sour-cream-heavy dip.

We ended up with 30 projects, with teams ranging from 1 to 7 in size, including:

  • Porting features of the Smalltalk debugger to Perl
  • Using our geographic data to draw different kinds of maps - too bad the Zillow neighborhood data didn’t become available until midweek
  • Trying out managing our code base with Subversion, Fisheye, and Crucible (yes, we’re still a CVS shop…)
  • Building bootable Windows CDs with core backup, recovery, and virus scanning tools for our install base (until we get everyone on MacBooks, anyway)

And a host of things we can’t talk about just yet. We have at least three projects which will make it out of Hack Week to the Interwebs - more to come on that later.

We had an esteemed judging panel which chose three winners - one from QA, one from Development, one from IT, so a nice mix - and each one received a chumby. I dropped one of the chumbys - seems like it should keep working, but we gave that one to the IT guy just in case.

The overall feedback was quite positive and we’ll be doing it regularly. One engineer did mention that it was “creativity with a gun to your head”: he’s planning his vacation for the same week. Can’t win ‘em all.