From 176fc215131a1c8b4c44878407f131237a116c5c Mon Sep 17 00:00:00 2001 From: Susan Li Date: Fri, 12 Oct 2018 02:26:21 -0400 Subject: [PATCH] Add file --- Fuzzy String Matching.ipynb | 1142 +++++++++++++++++++++++++++++++++++ room_type.csv | 104 ++++ 2 files changed, 1246 insertions(+) create mode 100644 Fuzzy String Matching.ipynb create mode 100644 room_type.csv diff --git a/Fuzzy String Matching.ipynb b/Fuzzy String Matching.ipynb new file mode 100644 index 0000000..fb8fd7d --- /dev/null +++ b/Fuzzy String Matching.ipynb @@ -0,0 +1,1142 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To ensure high quality in analytical modeling or analysis, data must be validated and cleansed. In our scenario, we are given two sets of similar room types: one is sourced from Expedia, the other from Booking.com. We will normalize both sets to have a common record. Fuzzy matching is the technique I am using. It works with matches that may be less than 100% perfect. Fuzzy matching is blind to obvious synonyms.\n", + "\n", + "In this exercise, I take each room type from Expedia and match it to its associated room type in Booking.com. In other words, we match records between two data sources.\n", + "\n", + "I have defined a match as something more like “a human with some experience would have guessed these rooms were the same thing”. " + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "\n", + "df = pd.read_csv('room_type.csv')" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
ExpediaBooking.com
0Deluxe Room, 1 King BedDeluxe King Room
1Standard Room, 1 King Bed, AccessibleStandard King Roll-in Shower Accessible
2Grand Corner King Room, 1 King BedGrand Corner King Room
3Suite, 1 King Bed (Parlor)King Parlor Suite
4High-Floor Premium Room, 1 King BedHigh-Floor Premium King Room
5Traditional Double Room, 2 Double BedsDouble Room with Two Double Beds
6Room, 1 King Bed, AccessibleKing Room - Disability Access
7Deluxe Room, 1 King BedDeluxe King Room
8Deluxe RoomDeluxe Room (Non Refundable)
9Room, 2 Double Beds (19th to 25th Floors)Two Double Beds - Location Room (19th to 25th ...
\n", + "
" + ], + "text/plain": [ + " Expedia \\\n", + "0 Deluxe Room, 1 King Bed \n", + "1 Standard Room, 1 King Bed, Accessible \n", + "2 Grand Corner King Room, 1 King Bed \n", + "3 Suite, 1 King Bed (Parlor) \n", + "4 High-Floor Premium Room, 1 King Bed \n", + "5 Traditional Double Room, 2 Double Beds \n", + "6 Room, 1 King Bed, Accessible \n", + "7 Deluxe Room, 1 King Bed \n", + "8 Deluxe Room \n", + "9 Room, 2 Double Beds (19th to 25th Floors) \n", + "\n", + " Booking.com \n", + "0 Deluxe King Room \n", + "1 Standard King Roll-in Shower Accessible \n", + "2 Grand Corner King Room \n", + "3 King Parlor Suite \n", + "4 High-Floor Premium King Room \n", + "5 Double Room with Two Double Beds \n", + "6 King Room - Disability Access \n", + "7 Deluxe King Room \n", + "8 Deluxe Room (Non Refundable) \n", + "9 Two Double Beds - Location Room (19th to 25th ... " + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head(10)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "I created the data set, so, it is very clean" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### FuzzyWuzzy\n", + "\n", + "Let's give a try, compare and match three pairs of the data.\n", + "\n", + "1). Ratio, - Compares the entire string similarity, in order." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "62" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from fuzzywuzzy import fuzz\n", + "fuzz.ratio('Deluxe Room, 1 King Bed', 'Deluxe King Room')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This is telling us that the \"Deluxe Room, 1 King Bed\" and \"Deluxe King Room\" pair are about 62% the same." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "69" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fuzz.ratio('Traditional Double Room, 2 Double Beds', 'Double Room with Two Double Beds')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The \"Traditional Double Room, 2 Double Beds\" and \"Double Room with Two Double Beds\" pair are about 69% the same." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "74" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fuzz.ratio('Room, 2 Double Beds (19th to 25th Floors)', 'Two Double Beds - Location Room (19th to 25th Floors)')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The \"Room, 2 Double Beds (19th to 25th Floors)\" and \"Two Double Beds - Location Room (19th to 25th Floors)\" pair are about 74% the same." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "I am a little disappointed with these. It turns out that the naive approach is far too sensitive to minor differences in word order, missing or extra words, and other such issues.\n", + "\n", + "2). Partial ratio, - Compares partial string similarity.\n", + "\n", + "We are still using the same data pairs." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "69" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fuzz.partial_ratio('Deluxe Room, 1 King Bed', 'Deluxe King Room')" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "83" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fuzz.partial_ratio('Traditional Double Room, 2 Double Beds', 'Double Room with Two Double Beds')" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "63" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fuzz.partial_ratio('Room, 2 Double Beds (19th to 25th Floors)', 'Two Double Beds - Location Room (19th to 25th Floors)')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Comparing partial strings gives slightly better results for some pairs.\n", + "\n", + "3). Token sort ratio, - Ignores word order." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "84" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fuzz.token_sort_ratio('Deluxe Room, 1 King Bed', 'Deluxe King Room')" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "78" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fuzz.token_sort_ratio('Traditional Double Room, 2 Double Beds', 'Double Room with Two Double Beds')" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "83" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fuzz.token_sort_ratio('Room, 2 Double Beds (19th to 25th Floors)', 'Two Double Beds - Location Room (19th to 25th Floors)')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The best so far.\n", + "\n", + "4). Token set ratio, - Ignores duplicated words. It is similar to token sort ratio, but a little more flexible." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "100" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fuzz.token_set_ratio('Deluxe Room, 1 King Bed', 'Deluxe King Room')" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "78" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fuzz.token_set_ratio('Traditional Double Room, 2 Double Beds', 'Double Room with Two Double Beds')" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "97" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fuzz.token_set_ratio('Room, 2 Double Beds (19th to 25th Floors)', 'Two Double Beds - Location Room (19th to 25th Floors)')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Token set ratio seems to be the best fit for my data. Based on this finding, I decided to apply token set ratio to my entire data set." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First, keep the pairs with a ratio > 70." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
ExpediaBooking.com
0Deluxe Room, 1 King BedDeluxe King Room
1Standard Room, 1 King Bed, AccessibleStandard King Roll-in Shower Accessible
2Grand Corner King Room, 1 King BedGrand Corner King Room
3Suite, 1 King Bed (Parlor)King Parlor Suite
4High-Floor Premium Room, 1 King BedHigh-Floor Premium King Room
5Traditional Double Room, 2 Double BedsDouble Room with Two Double Beds
6Room, 1 King Bed, AccessibleKing Room - Disability Access
7Deluxe Room, 1 King BedDeluxe King Room
8Deluxe RoomDeluxe Room (Non Refundable)
9Room, 2 Double Beds (19th to 25th Floors)Two Double Beds - Location Room (19th to 25th ...
\n", + "
" + ], + "text/plain": [ + " Expedia \\\n", + "0 Deluxe Room, 1 King Bed \n", + "1 Standard Room, 1 King Bed, Accessible \n", + "2 Grand Corner King Room, 1 King Bed \n", + "3 Suite, 1 King Bed (Parlor) \n", + "4 High-Floor Premium Room, 1 King Bed \n", + "5 Traditional Double Room, 2 Double Beds \n", + "6 Room, 1 King Bed, Accessible \n", + "7 Deluxe Room, 1 King Bed \n", + "8 Deluxe Room \n", + "9 Room, 2 Double Beds (19th to 25th Floors) \n", + "\n", + " Booking.com \n", + "0 Deluxe King Room \n", + "1 Standard King Roll-in Shower Accessible \n", + "2 Grand Corner King Room \n", + "3 King Parlor Suite \n", + "4 High-Floor Premium King Room \n", + "5 Double Room with Two Double Beds \n", + "6 King Room - Disability Access \n", + "7 Deluxe King Room \n", + "8 Deluxe Room (Non Refundable) \n", + "9 Two Double Beds - Location Room (19th to 25th ... " + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def get_ratio(row):\n", + " name = row['Expedia']\n", + " name1 = row['Booking.com']\n", + " return fuzz.token_set_ratio(name, name1)\n", + "\n", + "df[df.apply(get_ratio, axis=1) > 70].head(10)" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.9029126213592233" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(df[df.apply(get_ratio, axis=1) > 70]) / len(df)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Over 90% of the pairs exceed a match score of 70." + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
ExpediaBooking.com
0Deluxe Room, 1 King BedDeluxe King Room
1Standard Room, 1 King Bed, AccessibleStandard King Roll-in Shower Accessible
2Grand Corner King Room, 1 King BedGrand Corner King Room
3Suite, 1 King Bed (Parlor)King Parlor Suite
4High-Floor Premium Room, 1 King BedHigh-Floor Premium King Room
5Traditional Double Room, 2 Double BedsDouble Room with Two Double Beds
6Room, 1 King Bed, AccessibleKing Room - Disability Access
7Deluxe Room, 1 King BedDeluxe King Room
8Deluxe RoomDeluxe Room (Non Refundable)
9Room, 2 Double Beds (19th to 25th Floors)Two Double Beds - Location Room (19th to 25th ...
10Room, 1 King Bed (19th to 25 Floors)King Bed - Location Room (19th to 25th Floors)
11Deluxe RoomDeluxe Double Room
12Junior Suite, 1 King Bed with Sofa BedJunior Suite
13Signature Room, 2 Queen BedsSignature Two Queen
14Signature Room, 1 King BedSignature One King
15Premium Room, 2 Queen BedsPremium Two Queen
16Studio, 1 King Bed with Sofa bed, CornerCorner King Studio
17Club Room, 2 Queen BedsClub Two Queen
18Club Room, 1 King BedClub One King
19Club Room, Premium 2 Queen BedsClub Premium Two Queen
20Suite, 1 BedroomOne Bedroom Suite
21Deluxe Room, City ViewDeluxe King Or Queen Room
22Deluxe Room, Lake ViewDeluxe King Or Queen Room with Lake View
23Club Room, City View (Club Lounge Access for 2...Club Level King Or Queen Room with City View
25Deluxe Room, 1 King BedDeluxe Room - One King Bed
26Deluxe Room, 2 Queen BedsDeluxe Room - Two Queen Beds
27Premier Room, 1 King Bed (Royal Club)Royal Club Premier Room - One King Bed
28Room, 2 Double Beds, Non SmokingDouble Room with Two Double Beds
29Room, 1 King Bed, Non Smoking (LEISURE)King Room
30Executive Room, 1 King Bed, Non SmokingExecutive King Room
.........
72Standard Room, Lagoon ViewStandard Room Dolphin Lagoon View
73Standard Room, Ocean ViewStandard Room With Ocean View
74Standard Room, OceanfrontStandard Room With Ocean Front View
75City Room, City ViewRoom With City View
76Room, Partial Ocean ViewPartical Ocean View Room
77Deluxe Room, CornerCorner Deluxe Studio
78Room, Ocean ViewRoom With Ocean View
79Room, 1 King Bed, City ViewCity View With One King Bed
80Room, 2 Double Beds, City ViewCity View With Two Double Beds
81Room, 2 Double Beds, Partial Ocean ViewPartial Ocean View With Two Double Beds
82Room, 2 Double Beds, Accessible, Partial Ocean...Accessible Partial Ocean View With Two Double ...
83Room, 2 Double Beds, Ocean View (Diamond Head)Club Diamond Head Ocean View With Two Double B...
84Club Room, 1 King Bed, OceanfrontClub Ocean Front With One King Bed
85Club Suite, 1 King Bed, Accessible, Ocean ViewAccessible Club Ocean View Suite With One King...
86Club Suite, 1 King Bed, Oceanfront (No Resort ...Club Ocean Front Suite King Bed
87Standard Room, Mountain View (City View - Kona...Kona Tower City / Mountain View
88Standard Room, Partial Ocean View (Kona Tower)...Kona Tower Partial Ocean View
89Standard Room, Partial Ocean View (Waikiki Tow...Waikiki Tower Partial Ocean View
90Standard Room, Ocean View (Waikiki Tower) - No...Waikiki Tower Ocean View
91Room, 2 Queen Beds (Waikiki View)Queen Room With Two Queen Beds and Waikiki View
92Room, Ocean ViewKing Or Two Queen Room With Ocean View
94Regency Club, Mountain ViewRegency Club Mountain View
95Regency Club, Ocean ViewRegency Club Ocean View
96Room, 1 King Bed, Ocean ViewOcean View Room With King Bed
97Room, 1 King Bed, Resort View (Alii)Alii Tower Resort View With King Bed
98Room, 1 King Bed, Accessible, Resort View (Ali...Alii Tower Resort View With King Bed - Mobilit...
99Room, 1 King Bed, Accessible, View (Rainbow, B...Rainbow Tower Ocean View With King Bed - Mobil...
100Room, 1 King Bed, Ocean View (Alii)Alii Tower Ocean View With King Bed
101Room, 1 King Bed, Oceanfront (Rainbow)Rainbow Tower Ocean Front with King Bed
102Junior Suite, 1 King Bed, Accessible (Roll-in ...Junior Suite - Accessible Roll-in Shower
\n", + "

101 rows × 2 columns

\n", + "
" + ], + "text/plain": [ + " Expedia \\\n", + "0 Deluxe Room, 1 King Bed \n", + "1 Standard Room, 1 King Bed, Accessible \n", + "2 Grand Corner King Room, 1 King Bed \n", + "3 Suite, 1 King Bed (Parlor) \n", + "4 High-Floor Premium Room, 1 King Bed \n", + "5 Traditional Double Room, 2 Double Beds \n", + "6 Room, 1 King Bed, Accessible \n", + "7 Deluxe Room, 1 King Bed \n", + "8 Deluxe Room \n", + "9 Room, 2 Double Beds (19th to 25th Floors) \n", + "10 Room, 1 King Bed (19th to 25 Floors) \n", + "11 Deluxe Room \n", + "12 Junior Suite, 1 King Bed with Sofa Bed \n", + "13 Signature Room, 2 Queen Beds \n", + "14 Signature Room, 1 King Bed \n", + "15 Premium Room, 2 Queen Beds \n", + "16 Studio, 1 King Bed with Sofa bed, Corner \n", + "17 Club Room, 2 Queen Beds \n", + "18 Club Room, 1 King Bed \n", + "19 Club Room, Premium 2 Queen Beds \n", + "20 Suite, 1 Bedroom \n", + "21 Deluxe Room, City View \n", + "22 Deluxe Room, Lake View \n", + "23 Club Room, City View (Club Lounge Access for 2... \n", + "25 Deluxe Room, 1 King Bed \n", + "26 Deluxe Room, 2 Queen Beds \n", + "27 Premier Room, 1 King Bed (Royal Club) \n", + "28 Room, 2 Double Beds, Non Smoking \n", + "29 Room, 1 King Bed, Non Smoking (LEISURE) \n", + "30 Executive Room, 1 King Bed, Non Smoking \n", + ".. ... \n", + "72 Standard Room, Lagoon View \n", + "73 Standard Room, Ocean View \n", + "74 Standard Room, Oceanfront \n", + "75 City Room, City View \n", + "76 Room, Partial Ocean View \n", + "77 Deluxe Room, Corner \n", + "78 Room, Ocean View \n", + "79 Room, 1 King Bed, City View \n", + "80 Room, 2 Double Beds, City View \n", + "81 Room, 2 Double Beds, Partial Ocean View \n", + "82 Room, 2 Double Beds, Accessible, Partial Ocean... \n", + "83 Room, 2 Double Beds, Ocean View (Diamond Head) \n", + "84 Club Room, 1 King Bed, Oceanfront \n", + "85 Club Suite, 1 King Bed, Accessible, Ocean View \n", + "86 Club Suite, 1 King Bed, Oceanfront (No Resort ... \n", + "87 Standard Room, Mountain View (City View - Kona... 
\n", + "88 Standard Room, Partial Ocean View (Kona Tower)... \n", + "89 Standard Room, Partial Ocean View (Waikiki Tow... \n", + "90 Standard Room, Ocean View (Waikiki Tower) - No... \n", + "91 Room, 2 Queen Beds (Waikiki View) \n", + "92 Room, Ocean View \n", + "94 Regency Club, Mountain View \n", + "95 Regency Club, Ocean View \n", + "96 Room, 1 King Bed, Ocean View \n", + "97 Room, 1 King Bed, Resort View (Alii) \n", + "98 Room, 1 King Bed, Accessible, Resort View (Ali... \n", + "99 Room, 1 King Bed, Accessible, View (Rainbow, B... \n", + "100 Room, 1 King Bed, Ocean View (Alii) \n", + "101 Room, 1 King Bed, Oceanfront (Rainbow) \n", + "102 Junior Suite, 1 King Bed, Accessible (Roll-in ... \n", + "\n", + " Booking.com \n", + "0 Deluxe King Room \n", + "1 Standard King Roll-in Shower Accessible \n", + "2 Grand Corner King Room \n", + "3 King Parlor Suite \n", + "4 High-Floor Premium King Room \n", + "5 Double Room with Two Double Beds \n", + "6 King Room - Disability Access \n", + "7 Deluxe King Room \n", + "8 Deluxe Room (Non Refundable) \n", + "9 Two Double Beds - Location Room (19th to 25th ... \n", + "10 King Bed - Location Room (19th to 25th Floors) \n", + "11 Deluxe Double Room \n", + "12 Junior Suite \n", + "13 Signature Two Queen \n", + "14 Signature One King \n", + "15 Premium Two Queen \n", + "16 Corner King Studio \n", + "17 Club Two Queen \n", + "18 Club One King \n", + "19 Club Premium Two Queen \n", + "20 One Bedroom Suite \n", + "21 Deluxe King Or Queen Room \n", + "22 Deluxe King Or Queen Room with Lake View \n", + "23 Club Level King Or Queen Room with City View \n", + "25 Deluxe Room - One King Bed \n", + "26 Deluxe Room - Two Queen Beds \n", + "27 Royal Club Premier Room - One King Bed \n", + "28 Double Room with Two Double Beds \n", + "29 King Room \n", + "30 Executive King Room \n", + ".. ... 
\n", + "72 Standard Room Dolphin Lagoon View \n", + "73 Standard Room With Ocean View \n", + "74 Standard Room With Ocean Front View \n", + "75 Room With City View \n", + "76 Partical Ocean View Room \n", + "77 Corner Deluxe Studio \n", + "78 Room With Ocean View \n", + "79 City View With One King Bed \n", + "80 City View With Two Double Beds \n", + "81 Partial Ocean View With Two Double Beds \n", + "82 Accessible Partial Ocean View With Two Double ... \n", + "83 Club Diamond Head Ocean View With Two Double B... \n", + "84 Club Ocean Front With One King Bed \n", + "85 Accessible Club Ocean View Suite With One King... \n", + "86 Club Ocean Front Suite King Bed \n", + "87 Kona Tower City / Mountain View \n", + "88 Kona Tower Partial Ocean View \n", + "89 Waikiki Tower Partial Ocean View \n", + "90 Waikiki Tower Ocean View \n", + "91 Queen Room With Two Queen Beds and Waikiki View \n", + "92 King Or Two Queen Room With Ocean View \n", + "94 Regency Club Mountain View \n", + "95 Regency Club Ocean View \n", + "96 Ocean View Room With King Bed \n", + "97 Alii Tower Resort View With King Bed \n", + "98 Alii Tower Resort View With King Bed - Mobilit... \n", + "99 Rainbow Tower Ocean View With King Bed - Mobil... 
\n", + "100 Alii Tower Ocean View With King Bed \n", + "101 Rainbow Tower Ocean Front with King Bed \n", + "102 Junior Suite - Accessible Roll-in Shower \n", + "\n", + "[101 rows x 2 columns]" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df.apply(lambda row: fuzz.token_set_ratio(row['Expedia'], row['Booking.com']), axis=1) > 60]" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.9805825242718447" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(df[df.apply(lambda row: fuzz.token_set_ratio(row['Expedia'], row['Booking.com']), axis=1) > 60]) / len(df)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.4" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/room_type.csv b/room_type.csv new file mode 100644 index 0000000..cfca570 --- /dev/null +++ b/room_type.csv @@ -0,0 +1,104 @@ +Expedia,Booking.com +"Deluxe Room, 1 King Bed",Deluxe King Room +"Standard Room, 1 King Bed, Accessible",Standard King Roll-in Shower Accessible +"Grand Corner King Room, 1 King Bed",Grand Corner King Room +"Suite, 1 King Bed (Parlor)",King Parlor Suite +"High-Floor Premium Room, 1 King Bed",High-Floor Premium King Room +"Traditional Double Room, 2 Double Beds",Double Room with Two Double Beds +"Room, 1 King Bed, Accessible",King Room - Disability Access +"Deluxe Room, 1 King Bed",Deluxe King Room +Deluxe Room,Deluxe Room (Non Refundable) +"Room, 2 Double Beds (19th to 25th Floors)",Two Double Beds - Location Room (19th to 25th Floors) +"Room, 1 
King Bed (19th to 25 Floors)",King Bed - Location Room (19th to 25th Floors) +Deluxe Room,Deluxe Double Room +"Junior Suite, 1 King Bed with Sofa Bed",Junior Suite +"Signature Room, 2 Queen Beds",Signature Two Queen +"Signature Room, 1 King Bed",Signature One King +"Premium Room, 2 Queen Beds",Premium Two Queen +"Studio, 1 King Bed with Sofa bed, Corner",Corner King Studio +"Club Room, 2 Queen Beds",Club Two Queen +"Club Room, 1 King Bed",Club One King +"Club Room, Premium 2 Queen Beds",Club Premium Two Queen +"Suite, 1 Bedroom",One Bedroom Suite +"Deluxe Room, City View",Deluxe King Or Queen Room +"Deluxe Room, Lake View",Deluxe King Or Queen Room with Lake View +"Club Room, City View (Club Lounge Access for 2 guests)",Club Level King Or Queen Room with City View +"Club Room, Lake View (Club Lounge Access for 2 guests)",Club Level King Or Queen Room with Water View +"Deluxe Room, 1 King Bed",Deluxe Room - One King Bed +"Deluxe Room, 2 Queen Beds",Deluxe Room - Two Queen Beds +"Premier Room, 1 King Bed (Royal Club)",Royal Club Premier Room - One King Bed +"Room, 2 Double Beds, Non Smoking",Double Room with Two Double Beds +"Room, 1 King Bed, Non Smoking (LEISURE)",King Room +"Executive Room, 1 King Bed, Non Smoking",Executive King Room +"Suite, 1 King Bed, Non Smoking",King Suite +"Room, 1 Queen Bed, Non Smoking (Fairmont Room)",Queen Room +"Luxury Room, 1 Queen Bed, Non Smoking",Luxury Queen +"Luxury Room, 1 King Bed, Non Smoking",Luxury King +"Luxury Room, 2 Double Beds, Non Smoking",Luxury Double Room +"Deluxe Room, 1 King Bed, Non Smoking",Deluxe King Room +"Signature Room, 2 Double Beds, Non Smoking",Signature Double +"Signature Room, 1 King Bed, Non Smoking",Signature King +"Fairmont Gold, Suite, 1 King Bed, Non Smoking",Gold King Suite +"Room, 1 King Bed, Non Smoking, Business Lounge Access (Fairmont Gold)",Business King Room - Exclusive access to Gold Floor Lounge +"Room, 1 King Bed",King Room +Double Room,Double Room with Two Double Beds +"Business Plan, 1 
King Bed",Business King Room +"Room, 1 Queen Bed, City View",Queen Room With City View +"Business Double Room, 2 Double Beds",Business Double Room With Two Double Beds +"Room, 2 Queen Beds, City View",Queen Room With Two Queen Beds and City View +Deluxe Suite,Deluxe Suite +"Room, 1 Queen Bed, Accessible, Bathtub",Queen Room - Disability Access +"Room, 1 King Bed",King Room +"Room, 2 Double Beds",Double Room with Two Double Beds +"King Room, Suite, 1 King Bed with Sofa bed",Deluxe King Suite With Sofa Bed +"Deluxe Suite, 1 King Bed, Non Smoking, Kitchen",Deluxe King Suite With Kitchenette +"Deluxe Room, 1 King Bed",Deluxe King Room +"Premier Room, 1 King Bed",Premier King Room +Premier Twin Room,Premier Queen Room With Two Queen Beds +"Suite, 1 Bedroom",One - Bedroom Suite +"Deluxe Suite, 1 Bedroom",Deluxe One - Bedroom Suite +"Classic Room, 1 King Bed",Classic King Room +"Luxury Room, 2 Queen Beds (Prestige)",Luxury Queen Room With Two Queen Beds +"Signature Suite, 1 Bedroom",One-Bedroom Suite +"Traditional Room, 1 King Bed",Standard King Room +"Room, 1 Queen Bed",Standard Queen Room With One Queen Bed +"Room, Accessible",Queen Room - Disability Access +"Deluxe Room, 1 Queen Bed (High Floor)",Deluxe Queen Room - High Floor With Free Wi-Fi +"Premium Room, 1 King Bed",Premium King Room With Free Wi-Fi +"Premium Room, 1 Queen Bed",Premium Queen Room With Free Wi-Fi +"Room, 1 King Bed, Pool View",King Room With Pool View +"Room, 2 Queen Beds, Garden View",Queen Room With Two Queen Beds and Garden View +"Club Room, 1 King Bed",Club King Room With Free Wi-Fi +"Club Room, 2 Queen Beds",Club Queen Room With Two Queen Beds and Free Wi-Fi +"Standard Room, Mountain View (Scenic)",Standard Room With Mountain View +"Standard Room, Lagoon View",Standard Room Dolphin Lagoon View +"Standard Room, Ocean View",Standard Room With Ocean View +"Standard Room, Oceanfront",Standard Room With Ocean Front View +"City Room, City View",Room With City View +"Room, Partial Ocean View",Partical 
Ocean View Room +"Deluxe Room, Corner",Corner Deluxe Studio +"Room, Ocean View ",Room With Ocean View +"Room, 1 King Bed, City View",City View With One King Bed +"Room, 2 Double Beds, City View",City View With Two Double Beds +"Room, 2 Double Beds, Partial Ocean View",Partial Ocean View With Two Double Beds +"Room, 2 Double Beds, Accessible, Partial Ocean View",Accessible Partial Ocean View With Two Double Beds +"Room, 2 Double Beds, Ocean View (Diamond Head)",Club Diamond Head Ocean View With Two Double Beds +"Club Room, 1 King Bed, Oceanfront",Club Ocean Front With One King Bed +"Club Suite, 1 King Bed, Accessible, Ocean View",Accessible Club Ocean View Suite With One King Bed +"Club Suite, 1 King Bed, Oceanfront (No Resort Charge)",Club Ocean Front Suite King Bed +"Standard Room, Mountain View (City View - Kona Tower) - No Resort Fee",Kona Tower City / Mountain View +"Standard Room, Partial Ocean View (Kona Tower) - No Resort Fee",Kona Tower Partial Ocean View +"Standard Room, Partial Ocean View (Waikiki Tower) - No Resort Fee",Waikiki Tower Partial Ocean View +"Standard Room, Ocean View (Waikiki Tower) - No Resort Fee",Waikiki Tower Ocean View +"Room, 2 Queen Beds (Waikiki View)",Queen Room With Two Queen Beds and Waikiki View +"Room, Ocean View",King Or Two Queen Room With Ocean View +"Room, Oceanfront",One King Or Two Queens - Ocean Front +"Regency Club, Mountain View",Regency Club Mountain View +"Regency Club, Ocean View",Regency Club Ocean View +"Room, 1 King Bed, Ocean View",Ocean View Room With King Bed +"Room, 1 King Bed, Resort View (Alii)",Alii Tower Resort View With King Bed +"Room, 1 King Bed, Accessible, Resort View (Alii, Bathtub)",Alii Tower Resort View With King Bed - Mobility Accessible Tub +"Room, 1 King Bed, Accessible, View (Rainbow, Bathtub)",Rainbow Tower Ocean View With King Bed - Mobility Accessible Tub +"Room, 1 King Bed, Ocean View (Alii)",Alii Tower Ocean View With King Bed +"Room, 1 King Bed, Oceanfront (Rainbow)",Rainbow Tower Ocean 
Front with King Bed +"Junior Suite, 1 King Bed, Accessible (Roll-in Shower)",Junior Suite - Accessible Roll-in Shower