Travel Data 101 (part 2): Data Scrubbing

(I originally posted this article on SupplyExcellence as part of a series on using data in travel procurement)

Let’s assume that you’ve collected all your travel data – now what?  Just like with your other categories, you now have to scrub the data before it’s ready for your sourcing project. These two generic rules shouldn’t surprise you:

  • Discard abnormal values.  Your sourcing data is the basis for representing your future purchase patterns. Throw out the $35,000 plane tickets, the $300 per day car rentals, and the $1,800 per night hotel stays. While these may be fine audit items, they don’t belong in your sourcing data.
  • Embrace the 80/20 rule.  Travel data sets have very long tails…lots and lots of very small purchases in obscure places.  These purchases are of no value to your negotiations, so trim the small stuff from your spend files.  It makes the analysis much easier and the negotiations more relevant.

Here’s what you need to know about scrubbing Airline, Hotel and Rental Car data:

Airline Data

Air spend data needs to be grouped at this level: Point of Sale-City Pair-Carrier-Booking Class.  Data reported at this level are One-way equivalent Segments and Spend.  An ideal data record looks like: USA-LAXORD-AA-M, 25 segments, $60,000 spend.

Points of Sale (POS) indicates the country in which the ticket was sold.  POS has a big impact on the discounts offered by airlines, so get this right and standardize the spelling of each country, or use ISO Country Codes.

City Pairs indicate the origination and destination (aka markets) where your travelers flew.  You need these city pairs to be written in non-directional (aka bi-directional) format using airport codes (LAX, JFK, ORD, LHR) – not city codes (LOS, NYC, CHI, LON); not city names and not airport names.  Note that non-directional format always lists the two airport codes in each city pair in alphabetical order, regardless of the direction the traveler flew. It makes analyzing the data simpler.

Carrier: Convert any airline names to their standard IATA two-character airline codes, so AA, not American Airlines.

Every airline ticket has an inventory class in which each flight segment was booked.  Your agency captures this Booking Class code (a single letter, A-Z) for every ticket it books.  These codes are essential for airline negotiations. Don’t settle for text, such as First, Coach, Economy, Discount Economy or for a narrow set of booking classes, such as simply F (for First Class) and Y (for all Coach and Economy seats).

Segments count the number of one-way equivalents (OWEs) purchased in a city pair.  OWEs disregard any connection that the traveler may have used to get from his origination airport to his destination airport.  So, a round-trip from LAX to JFK, with connections in ORD (O’Hare) counts as two segments, the same as a non-stop round-trip between JFK and LAX.  You have to get these segment counts right, otherwise you’ll get screwy Average Segment Prices.

Travel agencies can report air spend either inclusive or exclusive of taxes.  Just be sure you know how taxes and fees, such as Passenger Facility Charges, fuel surcharges, extra baggage fees, etc., were handled in your travel agency data.  Best practice is to include taxes and exclude all other fees.

Your travel agency should be able to provide your data in the proper format.  If not, seriously consider getting a better travel agency.


Hotel data is just downright messy. Why? Hotel properties change brand names (aka “re-flag”) when they are sold or change franchises, so your Hilton spend last month may be Hyatt spend next month; there is no universal standard database of hotel names, so Marriot Tyson’s Corner is probably the same hotel as Tyson’s Corner Marriott; and while you may know your hotel spend from your corporate credit card reports, you probably won’t know the room rates or the splits between room rates, meals and other incidentals.

The main goal is to know the number of room nights (RNs) produced at each hotel property.  First, merge your travel agency’s hotel bookings with your company’s credit card hotel spend records.  The goal is to get your hotel spend data into a format close to this:

Hotel Name, Address, City, State, Country, Phone Number, $ Charged, $ Booked, # RNs Booked

Next, you need to consolidate the records that are for (or are likely for) the same hotel.  There’s no easy way to do this – it takes a lot of sorting and inspecting.

Once you have the data grouped by Hotel Name, your records will have Booked Room Nights and Booked Spend (these come from your travel agency), and/or an amount for Charged Spend (this comes from your credit card data).  You need to apply some rules for imputing the number of room nights if your Booked Spend is a lot less than your Charged Spend. Use your agency’s average Booked Rate for the property or a nearby competitor as a guide.

Finally, you’ll want to roll up each hotel’s spend and room nights to its brand and chain levels.  Brands are the names chains use for their different types of hotel quality and service offerings – Courtyard by Marriott, Residence Inn and SpringHill Suites are all brands under the Marriott chain.  Not all hotels are part of a brand or chain, but you need to consolidate the spend and room nights of those that are for maximum negotiating leverage.

Rental Cars

The best source for your rental car data is your contracted rental car supplier.  The key fields you need for a basic car rental RFP are:

Station, Car Class, Rentals, Rental Days, Mileage, Time Charges, Mileage Charges

Station means the location where the car was rented; usually an airport code, e.g., SFO or LAX.   Getting the station codes right is probably the hardest part…you want to be able to identify rentals that are in competitive markets, similar to hotels – so within a few miles, at most, of each other.

Fortunately, there’s not much more data scrubbing needed if you get your data from one of the major rental car firms.  You may want to get some additional details about your rental patterns so that you can more accurately model the bids.  These elements include number of rentals by mileage bracket (driven under 50 miles, between 50 and 75 miles, 75-100, etc.), number of inter-city or one-way rentals, number of weekly and month-long rentals, and rentals by day of week.

Data scrubbing is never fun – but it is always worthwhile.  Up next:  Travel Data Reporting

1 thought on “Travel Data 101 (part 2): Data Scrubbing

  1. Pingback: Review: AirPlus Data Reporting Tool « Gillespie's Guide to Travel Procurement

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.