I work on products at Facebook and get a lot of questions from friends and family about how we choose what to build. After 6+ years at Facebook focusing on continuous iteration and improvement, I'd like to share my thoughts on how we think about testing and launching products.

Why we test

I can say without reservation that we take our mission very seriously. So, though it may sound cliché, we really do think about how to make the world more open and connected with everything we do. We believe that people want to connect with each other in a way that is personal and meaningful and that modern technology can provide opportunities for doing this in ways and at scales that were previously not possible. While this vision unifies us, it is sufficiently broad that the exact path to realizing it is unclear.

The most obvious approach might be to imagine the future you want and build it. Unfortunately, that doesn't work very well because technology co-evolves with people. It's a two-step: technology pushes people forward, and then people move past the technology and it has to catch up. The way we see the future is constantly evolving, and the path you take to get there matters.

Simply plotting the next step may sound easier than planning the entire course, but even that has its own unique challenges. Out of all the possible directions we could go, which do we choose first? How should a new feature behave? How can we transition from what exists today to what will exist in the future? Once we build something, how can we know if it is really a step in the right direction?

Sometimes the answer to these questions is intuitive. Sometimes we do user research. Sometimes we build prototypes and see how they feel. Often, however, we’re working on products that have no analog for comparison in research and whose merits are difficult to gauge in the abstract or at small scale.

To keep improving, we must constantly test different versions of Facebook with real people to even have a chance at creating the best possible experience.

How we test

Every day, we run hundreds of tests on Facebook, most of which are rolled out to a random sample of people so we can measure their impact. For example, you may have seen a small test for saving news feed stories last week.

Other products might require network effects to be properly tested, so in those cases we launch to everyone in a specific market, like a whole country.

That’s why we’ve developed a sophisticated and flexible tool called gatekeeper to make sure tests don't collide with one another and that they provide statistically meaningful results. This allows us to roll things out slowly and make improvements as we go. Not every test we run ends up being integrated into the product, but even failed tests help us understand how to make Facebook easier to use, faster and more engaging.
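To make that idea concrete, here is a minimal, hypothetical sketch (in Python, and not Facebook's actual gatekeeper code) of how a gating tool might deterministically assign people to test groups. The experiment name, rollout percentage, and function names below are made up purely for illustration.

    import hashlib

    def assign_variant(user_id: int, experiment: str, rollout_pct: float) -> str:
        """Deterministically bucket a person into an experiment (illustrative only).

        Hashing the experiment name together with the user ID keeps each
        person's assignment stable for the life of the test, and buckets
        different experiments independently so they don't collide.
        """
        digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
        bucket = (int(digest, 16) % 10_000) / 10_000  # map the hash to [0, 1)
        return "test" if bucket < rollout_pct else "control"

    # Example: roll a hypothetical "wider_chat_bar" test out to 1% of people.
    print(assign_variant(user_id=12345, experiment="wider_chat_bar", rollout_pct=0.01))

A scheme like this lets a rollout percentage grow gradually without reshuffling who is in the test, which is what makes slow, measured launches possible.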

We are sensitive to the fact that testing in this manner has a real cost. It means that people sometimes experience Facebook in a way that is inconsistent or less polished than they expect. In fact, the odds are good that everyone on Facebook has, at some point, been part of a test. We know this can be annoying, so we do our best to minimize the impact of these tests, but the fact that we are willing to incur this cost at all should make it clear how deeply we believe these tests are one of our best opportunities to make the product better.

It is worth noting that the vast majority of the tests we run are small. For example: when people go to find friends, we used to show as many names and faces as we could fit on the screen to minimize scrolling. We ran a test that instead reduced the number of people we showed per page by 60% but gave each one more space and a larger button to engage with, and we saw a 70% increase in friend requests. Given a more consumable interface, people were better able to find the people they wanted to connect with. This may sound obvious but, in the words of growth engineering manager Mike Curtis, "It's always obvious why the winning test won… after you've run the test."

Getting it wrong

I'd like to give you an example that is rather uncharacteristic for Facebook: a product where we didn't follow our normal testing procedures or even our normal development practices. It was a feature I worked on a little over a year ago called the chat bar.

In our research, we found that the vast majority of chats from any individual on Facebook are to a relatively small group of their friends. The previous implementation of the buddy list required people to click, scroll through an alphabetized list of all their friends, and then click again to start the chat. It was pretty cumbersome.

Meanwhile, the average computer or laptop monitor had gotten much wider and many people browsing Facebook had a lot of whitespace on the margins that wasn't being put to good use. So we decided that for people with wide enough monitors, we would just show the friends they usually chat with on the right-hand side of Facebook.

If they wanted to chat with someone who didn’t appear, they could simply type their name into a search bar at the bottom of the window. The goal was to help people chat more quickly and easily with the people they cared about most.

Sounds easy enough, right? But after about two weeks of development we launched it internally to employees and lots of people hated it. Some people, it turns out, really liked to scroll through their buddy list just to see who was online, and we had inadvertently removed that functionality. At the same time, we were reluctant to add a scroll bar as the friend list was against the edge of the screen and there would then be two scroll bars immediately adjacent. What ensued was a month-long debate about which approach was correct. At the end of that month we were no closer to making a decision. I finally urged the team to just ship what we had built and measure the impact.

The result was clear: this was not an improvement. After the novelty wore off, we saw the number of chats initiated on Facebook drop by 9%. In a system processing billions of chats per day, that's a significant drop-off, and it made for a very bad product. Thankfully, our internal debate had given us a useful perspective on the problem, so the team quickly built a version that had a scroll bar. We launched it eight days after the initial product went out. With this revision, the product drove chats 4% above the original baseline.

At the end of the day, the month we spent debating the product internally was a waste of time. We understood both sides of the debate within a day but had no data to resolve it. We would have been better off, as would the people using Facebook, if we had just launched the wrong product on day one and fixed it quickly. Even better, instead of launching the product to everyone on Facebook, we should have tested it with a small group first. We would have realized that our solution didn’t work and would have fixed it. Only a handful of our users would have had a bad experience.

Ship early, ship often

This is what we mean when we say ship early, ship often. It means that when we're able to iterate, people on Facebook get better experiences sooner than they would otherwise.

When you see such dramatic results from the smallest tweaks, you realize how much opportunity there is to improve things—and we feel a constant sense of urgency to do so.

When a test goes out we look at the data immediately and adapt the products quickly. We do this on a daily basis. This cycle of iteration is the engine of progress and the people who use Facebook are not just the beneficiaries but are also intimately a part of the process. We don’t just develop this product for them, we develop it with them.
