The Beauty of Data

Carrie Fowle, Data Scientist

Human-Centered Search Rankings

Improving results in a Hotel Search App 

This fall, I had the opportunity to work on a team with one other data scientist and two Sloan MBA students to develop a new algorithm for a travel-booking app. The firm we worked with has a strong track record in flight bookings, but recently created a new line of business in hotels.

For our project, the firm asked us to explore how best to order hotel search results to maximize the number of bookings made through the app. Currently, they use a simple but effective approach: a logistic regression estimates the probability that each hotel will be booked, and hotels are then ordered from most to least probable. Their data science team gave us access to around eleven million rows of user data capturing search and booking behavior.

Better performance in the predictive step

The first major improvement we made in the model came from thinking about which features would drive a user's interest in a particular hotel. We developed two sets of features (one pertaining to timing and the other to location) and added them to the existing model, finding that both sets improved performance. Some of the features added included:

  • Days from Search to Check-in
  • Length of Stay
  • Distance from search location to city center
  • Distance from search location to hotel
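As a rough illustration of how the features listed above could be derived from a single search record, here is a minimal sketch. The function names, the dictionary keys, the choice of great-circle (haversine) distance, and the dates and coordinates in the example are all assumptions made for illustration; the actual schema and distance measure used in the project are not described here.

```python
from datetime import date
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def build_features(search_date, check_in, check_out, search_loc, city_center, hotel_loc):
    """Return the four timing and location features for one (search, hotel) pair.

    Locations are (latitude, longitude) tuples; field names are hypothetical.
    """
    return {
        "days_to_check_in": (check_in - search_date).days,
        "length_of_stay": (check_out - check_in).days,
        "dist_search_to_center_km": haversine_km(*search_loc, *city_center),
        "dist_search_to_hotel_km": haversine_km(*search_loc, *hotel_loc),
    }


# Made-up example record: dates and coordinates are illustrative only.
features = build_features(
    search_date=date(2019, 10, 1),
    check_in=date(2019, 10, 15),
    check_out=date(2019, 10, 18),
    search_loc=(42.3601, -71.0942),
    city_center=(42.3601, -71.0589),
    hotel_loc=(42.3467, -71.0972),
)
```

Each resulting dictionary could become one row of model input alongside whatever features the existing model already used.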

Additionally, since the stakeholders were more interested in model performance than interpretability, we were able to use a more powerful model, XGBoost. In our results, this shift alone had a large impact on model performance as measured by area under the curve (AUC).
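For readers unfamiliar with the metric, the sketch below shows one way to compute AUC by hand, assuming it refers to the area under the ROC curve: the probability that a randomly chosen booked hotel receives a higher score than a randomly chosen unbooked one, with ties counted as half. The function name and the toy labels and scores are invented for illustration and are not results from the project.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 (1 = booked); scores: predicted probabilities.
    Runs in O(P * N) time, so it is only suitable for small examples.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy data: two booked hotels, three unbooked ones.
booked = [1, 0, 0, 1, 0]
predicted = [0.9, 0.4, 0.35, 0.3, 0.1]
auc = roc_auc(booked, predicted)  # 4 of 6 positive/negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ordering and 1.0 to a perfect ordering. On data at the scale of millions of rows, a library implementation would be the practical choice over this quadratic loop.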

The problem with prediction

We used our improved models to predict the probability that a user would book the hotel, but found that while this mostly worked well, it overlooked an aspect of human psychology:

In the mind of an algorithm, two identical hotels would have the same probability of being booked and therefore should be shown one after another. However, if a person were to book one of the identical hotels, they would book the first one and never see the second; if they weren't going to book the first, they aren't going to book the second either, so the second just eats up valuable real estate.

We see this play out in the existing data. A user is shown two nearly identical hotels within a quarter of a mile of one another, but ends up booking a very different hotel much farther down the list.

Thinking like people, not computers

Models like ours and the one already in the app come with a certain level of error baked in, so we wanted to order search results in a way that not only used our prediction but also acknowledged its uncertainty. To do this, we ordered results as follows:

  • Add the most probable hotel to the results
  • Discount the booking probability of the remaining hotels by a measure of similarity to the listed hotels
  • Add the hotel with the highest discounted likelihood
  • Update the discounted probabilities of the remaining hotels to account for the newly listed hotel
  • Continue adding hotels by their discounted probability, updating the discounts after each addition, until every hotel is listed
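The steps above can be sketched as a greedy loop. This is a minimal illustration under stated assumptions, not the project's implementation: the discount rule (multiplying each remaining hotel's probability by one minus its highest similarity to any hotel already listed), the similarity scale from 0 to 1, and all names and numbers are hypothetical, since the exact similarity measure and discount are not specified here.

```python
def rerank(hotels, probabilities, similarity):
    """Order hotels greedily by similarity-discounted booking probability.

    hotels: list of hotel ids
    probabilities: dict mapping hotel id -> predicted booking probability
    similarity: function (hotel_a, hotel_b) -> value in [0, 1]  (assumed scale)
    """
    remaining = list(hotels)
    discounted = dict(probabilities)  # nothing is listed yet, so no discount
    ordered = []

    while remaining:
        # Add the hotel with the highest (discounted) probability.
        best = max(remaining, key=lambda h: discounted[h])
        ordered.append(best)
        remaining.remove(best)

        # Update the discount for every hotel not yet listed.
        # Assumed rule: p * (1 - max similarity to any listed hotel).
        for h in remaining:
            max_sim = max(similarity(h, listed) for listed in ordered)
            discounted[h] = probabilities[h] * (1.0 - max_sim)

    return ordered


# Toy example: A and B are near-duplicates, C is different but less probable.
probs = {"A": 0.30, "B": 0.29, "C": 0.20}
pair_similarity = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.1,
    frozenset(("B", "C")): 0.1,
}


def toy_similarity(x, y):
    return pair_similarity[frozenset((x, y))]


plain_order = sorted(probs, key=probs.get, reverse=True)   # ['A', 'B', 'C']
diverse_order = rerank(list(probs), probs, toy_similarity)  # ['A', 'C', 'B']
```

In the toy example, sorting by raw probability shows the two near-identical hotels back to back, while the discounted ordering moves the distinct hotel C into the second slot and pushes the duplicate B down.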

In Summary

To improve the search results of a hotel search app, we:

  • Created intuitive features pertaining to time and location
  • Utilized a more complex and better performing model
  • Considered how users search for hotels to better utilize our prediction