The number of restaurants in New York is increasing day by day. Lots of students and busy professionals rely on those restaurants due to their hectic lifestyles. Online food delivery service is a great option for them. It provides them with good food from their favorite restaurants. A food aggregator company FoodHub offers access to multiple restaurants through a single smartphone app.
The app allows the restaurants to receive a direct online order from a customer. The app assigns a delivery person from the company to pick up the order after it is confirmed by the restaurant. The delivery person then uses the map to reach the restaurant and waits for the food package. Once the food package is handed over to the delivery person, he/she confirms the pick-up in the app and travels to the customer's location to deliver the food. The delivery person confirms the drop-off in the app after delivering the food package to the customer. The customer can rate the order in the app. The food aggregator earns money by collecting a fixed margin of the delivery order from the restaurants.
The food aggregator company has stored the data of the different orders made by the registered customers in their online portal. They want to analyze the data to get a fair idea about the demand of different restaurants which will help them in enhancing their customer experience. Suppose you are hired as a Data Scientist in this company and the Data Science team has shared some of the key questions that need to be answered. Perform the data analysis to find answers to these questions that will help the company to improve the business.
The dataset contains several fields describing each food order. The detailed data dictionary is given below.
# import libraries for data manipulation
import numpy as np
import pandas as pd
# import libraries for data visualization
import matplotlib.pyplot as plt
import seaborn as sns
#remove warnings
import warnings
warnings.filterwarnings('ignore')
# read the data
df = pd.read_csv('foodhub_order.csv')
# returns the first 5 rows
df.head()
| | order_id | customer_id | restaurant_name | cuisine_type | cost_of_the_order | day_of_the_week | rating | food_preparation_time | delivery_time |
---|---|---|---|---|---|---|---|---|---|
0 | 1477147 | 337525 | Hangawi | Korean | 30.75 | Weekend | Not given | 25 | 20 |
1 | 1477685 | 358141 | Blue Ribbon Sushi Izakaya | Japanese | 12.08 | Weekend | Not given | 25 | 23 |
2 | 1477070 | 66393 | Cafe Habana | Mexican | 12.23 | Weekday | 5 | 23 | 28 |
3 | 1477334 | 106968 | Blue Ribbon Fried Chicken | American | 29.20 | Weekend | 3 | 25 | 15 |
4 | 1478249 | 76942 | Dirty Bird to Go | American | 11.59 | Weekday | 4 | 25 | 24 |
The DataFrame has 9 columns as mentioned in the Data Dictionary. Data in each row corresponds to the order placed by a customer.
df.shape
(1898, 9)
The dataset contains 1898 orders of food, and collects data on nine variables for each order. Those variables are: the order ID number, customer ID number, restaurant name, cuisine type, cost of the order, day of the week, rating, how long the food took to prepare, and the time it took to deliver the order.
# use info() to print a concise summary of the DataFrame
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1898 entries, 0 to 1897
Data columns (total 9 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   order_id               1898 non-null   int64
 1   customer_id            1898 non-null   int64
 2   restaurant_name        1898 non-null   object
 3   cuisine_type           1898 non-null   object
 4   cost_of_the_order      1898 non-null   float64
 5   day_of_the_week        1898 non-null   object
 6   rating                 1898 non-null   object
 7   food_preparation_time  1898 non-null   int64
 8   delivery_time          1898 non-null   int64
dtypes: float64(1), int64(4), object(4)
memory usage: 133.6+ KB
The dataset contains three data types: floating-point numbers (float64), integers (int64), and object columns. The object columns hold text such as restaurant names, as well as the rating column, where numeric ratings are mixed with the string 'Not given'. No column contains null values.
# converting "object" columns to "category" reduces the memory required to store the dataframe
# convert 'restaurant_name', 'cuisine_type', and 'day_of_the_week' into categorical data
df["restaurant_name"]=df["restaurant_name"].astype("category")
df["cuisine_type"]=df["cuisine_type"].astype("category")
df["day_of_the_week"]=df["day_of_the_week"].astype("category")
# use info() to print a concise summary of the DataFrame
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1898 entries, 0 to 1897
Data columns (total 9 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   order_id               1898 non-null   int64
 1   customer_id            1898 non-null   int64
 2   restaurant_name        1898 non-null   category
 3   cuisine_type           1898 non-null   category
 4   cost_of_the_order      1898 non-null   float64
 5   day_of_the_week        1898 non-null   category
 6   rating                 1898 non-null   object
 7   food_preparation_time  1898 non-null   int64
 8   delivery_time          1898 non-null   int64
dtypes: category(3), float64(1), int64(4), object(1)
memory usage: 102.7+ KB
Converting the restaurant name, cuisine type, and day of the week to the category data type stores each distinct value once and represents every row by a small integer code. Because these values repeat often, the reported memory usage drops from 133.6+ KB to 102.7+ KB. Plotting and grouping functions can also treat these columns explicitly as categories.
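The memory effect of the category data type can be checked on a small example. The sketch below is illustrative only: the `cuisines` series is a made-up toy column, not the FoodHub data, and the exact byte counts depend on the pandas version.

```python
import pandas as pd

# Hypothetical toy column: four cuisine labels repeated many times
cuisines = pd.Series(["American", "Japanese", "Italian", "Chinese"] * 500)

as_object = cuisines.astype("object")
as_category = cuisines.astype("category")

# deep=True also counts the Python string objects, not just the references to them
object_bytes = as_object.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(f"object:   {object_bytes} bytes")
print(f"category: {category_bytes} bytes")

# Each distinct label is stored once; rows hold integer codes pointing at it
print(list(as_category.cat.categories))
print(as_category.cat.codes.head().tolist())
```

The saving grows with the number of repeats per label, which is why columns such as cuisine type benefit more than columns with mostly unique values.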
df.describe()
| | order_id | customer_id | cost_of_the_order | food_preparation_time | delivery_time |
---|---|---|---|---|---|
count | 1.898000e+03 | 1898.000000 | 1898.000000 | 1898.000000 | 1898.000000 |
mean | 1.477496e+06 | 171168.478398 | 16.498851 | 27.371970 | 24.161749 |
std | 5.480497e+02 | 113698.139743 | 7.483812 | 4.632481 | 4.972637 |
min | 1.476547e+06 | 1311.000000 | 4.470000 | 20.000000 | 15.000000 |
25% | 1.477021e+06 | 77787.750000 | 12.080000 | 23.000000 | 20.000000 |
50% | 1.477496e+06 | 128600.000000 | 14.140000 | 27.000000 | 25.000000 |
75% | 1.477970e+06 | 270525.000000 | 22.297500 | 31.000000 | 28.000000 |
max | 1.478444e+06 | 405334.000000 | 35.410000 | 35.000000 | 33.000000 |
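# distplot is deprecated in recent seaborn releases; histplot with kde=True is its replacement
for col in ["cost_of_the_order", "food_preparation_time", "delivery_time"]:
    sns.histplot(data=df, x=col, kde=True)
    plt.show()  # draw each distribution on its own figure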
For the numerical data given:
The orders in this sample cost 16.50 dollars on average, with a standard deviation of 7.48 dollars.
On average, orders take about 27 minutes to prepare, with a standard deviation of 4.63 minutes.
On average, orders take about 24 minutes to deliver, with a standard deviation of 4.97 minutes.
df["rating"]=df["rating"].replace('Not given', np.nan)
#replace all "Not given" ratings with NaN so that pandas missing-value methods such as isna() recognize them
df["rating"]
0 NaN 1 NaN 2 5 3 3 4 4 ... 1893 5 1894 5 1895 NaN 1896 5 1897 NaN Name: rating, Length: 1898, dtype: object
df["rating"]=df["rating"].astype(float)
#convert rating from object to float so that the ratings can be treated as numbers; NaN values are preserved
df["rating"]
0 NaN 1 NaN 2 5.0 3 3.0 4 4.0 ... 1893 5.0 1894 5.0 1895 NaN 1896 5.0 1897 NaN Name: rating, Length: 1898, dtype: float64
df["rating"].isna().sum()
736
736 orders, or about 39% of all orders, were not rated.
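The replace-then-cast steps above can also be done in one call. The sketch below is an alternative shown on a hypothetical `ratings` series, not the approach used in this notebook; it relies on `pd.to_numeric` with `errors="coerce"`, which turns any non-numeric entry into NaN.

```python
import pandas as pd

# Hypothetical ratings column in the same format as the raw dataset
ratings = pd.Series(["Not given", "5", "3", "Not given", "4"])

# errors="coerce" converts every non-numeric entry (here "Not given") to NaN
numeric_ratings = pd.to_numeric(ratings, errors="coerce")

missing_count = numeric_ratings.isna().sum()
missing_share = numeric_ratings.isna().mean()  # fraction of missing values

print(missing_count, f"{missing_share:.1%}")  # 2 40.0%

# The same arithmetic with the counts reported above
print(f"{736 / 1898:.1%}")  # 38.8%
```

Taking the mean of the boolean mask gives the missing share directly, without hard-coding the number of rows.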
sns.countplot(y="restaurant_name",data=df, order=df["restaurant_name"].value_counts().iloc[:10].index);
#### Observations:
#The top ten restaurants with the most orders include:
#Shake Shack, the Meatball Shop, Blue Ribbon Sushi, Blue Ribbon Fried Chicken, Parm, RedFarm Broadway,
#RedFarm Hudson, TAO, Han Dynasty, and Blue Ribbon Sushi Bar & Grill.
sns.countplot(y="cuisine_type",data=df,order=(df["cuisine_type"].value_counts().index));
#Observation: The top 5 most ordered types of cuisine are: American, Japanese, Italian, Chinese, and Mexican.
sns.histplot(df["cost_of_the_order"], kde=True);#histplot replaces the deprecated distplot
###Observation:
#With a right-skewed distribution, a greater number of orders cost less than the mean of $16.50
#than orders that cost more than the mean.
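A right skew can be confirmed numerically as well as visually. The sketch below uses a small made-up `costs` series (an assumption for illustration, not the FoodHub orders) to show the three checks: the median sits below the mean, the skewness is positive, and more than half of the values fall below the mean.

```python
import pandas as pd

# Hypothetical right-skewed sample: mostly cheap orders plus a few expensive ones
costs = pd.Series([8.0, 9.5, 10.0, 12.0, 12.5, 14.0, 16.0, 29.0, 33.0])

mean_cost = costs.mean()
median_cost = costs.median()
skewness = costs.skew()  # a positive value indicates a longer right tail

# Fraction of orders that cost less than the mean
share_below_mean = (costs < mean_cost).mean()

print(f"mean={mean_cost:.2f} median={median_cost:.2f} skew={skewness:.2f}")
print(f"share below mean: {share_below_mean:.1%}")
```

On the real data the same pattern appears in the summary table: the median order cost (14.14) is below the mean (16.50).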
sns.boxplot(x=df["cost_of_the_order"]);#pass the column as x explicitly to get a horizontal boxplot
sns.countplot(y="day_of_the_week", data=df);
df.day_of_the_week.value_counts()
Weekend 1351 Weekday 547 Name: day_of_the_week, dtype: int64
#Observation:
#Most orders are placed on weekends.
#Of this sample, 1351 orders (or 71% of total orders) were placed on the weekend.
sns.countplot(y="rating", data=df);
df.rating.value_counts()
5.0 588 4.0 386 3.0 188 Name: rating, dtype: int64
#Observation: Aside from the 736 orders with no rating, the remaining 61% of orders received ratings between 3 and 5 stars.
#588 orders received 5 stars
#386 orders received 4 stars
#188 orders received 3 stars
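The shares behind this observation follow directly from the three counts above. The short check below uses only those counts and the total of 1898 orders; the variable names are illustrative.

```python
# Rating counts copied from the value_counts() output above
rating_counts = {5: 588, 4: 386, 3: 188}
total_orders = 1898

rated_orders = sum(rating_counts.values())
rated_share = rated_orders / total_orders

# Share of each star level among the rated orders only
star_shares = {
    stars: round(100 * count / rated_orders, 1)
    for stars, count in rating_counts.items()
}

print(rated_orders, f"{rated_share:.1%}")  # 1162 61.2%
print(star_shares)  # {5: 50.6, 4: 33.2, 3: 16.2}
```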
sns.displot(data=df,x="food_preparation_time");
#Observation:
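#The histogram peaks at 20-21 minutes, 25-26 minutes, 30-31 minutes, and 33-34 minutes.
#Because preparation times are recorded in whole minutes, these peaks may partly reflect how the bins group the values.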
sns.histplot(df["delivery_time"], kde=True);#histplot replaces the deprecated distplot
#Observation:
#The delivery time has high variability and is slightly left-skewed, indicating that slightly more orders
#take longer than the average (~24 minutes) to arrive than take less than 24 minutes.
sns.countplot(y="restaurant_name",data=df, order=df["restaurant_name"].value_counts().iloc[:5].index);
df.restaurant_name.value_counts().iloc[:5]
Shake Shack 219 The Meatball Shop 132 Blue Ribbon Sushi 119 Blue Ribbon Fried Chicken 96 Parm 68 Name: restaurant_name, dtype: int64
Shake Shack, the Meatball Shop, Blue Ribbon Sushi, Blue Ribbon Fried Chicken, and Parm are the top 5 restaurants with the greatest number of orders, and account for 33.4% of all orders.
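The 33.4% figure is the sum of the five counts divided by the total number of orders. The check below reuses the counts from the output above; the comment shows one way the same value could be computed directly from `df`.

```python
# Order counts for the five busiest restaurants, copied from the output above
top5_counts = {
    "Shake Shack": 219,
    "The Meatball Shop": 132,
    "Blue Ribbon Sushi": 119,
    "Blue Ribbon Fried Chicken": 96,
    "Parm": 68,
}
total_orders = 1898

top5_total = sum(top5_counts.values())
top5_share = top5_total / total_orders

print(top5_total, f"{top5_share:.1%}")  # 634 33.4%

# Equivalent on the full DataFrame:
# df["restaurant_name"].value_counts().head(5).sum() / len(df)
```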
df[["cuisine_type","day_of_the_week"]].value_counts(sort=True)
cuisine_type    day_of_the_week
American        Weekend            415
Japanese        Weekend            335
Italian         Weekend            207
American        Weekday            169
Chinese         Weekend            163
Japanese        Weekday            135
Italian         Weekday             91
Mexican         Weekend             53
Chinese         Weekday             52
Indian          Weekend             49
Middle Eastern  Weekend             32
Mediterranean   Weekend             32
Mexican         Weekday             24
Indian          Weekday             24
Middle Eastern  Weekday             17
Thai            Weekend             15
Mediterranean   Weekday             14
French          Weekend             13
Korean          Weekend             11
Southern        Weekend             11
Spanish         Weekend             11
Southern        Weekday              6
French          Weekday              5
Thai            Weekday              4
Vietnamese      Weekend              4
                Weekday              3
Korean          Weekday              2
Spanish         Weekday              1
dtype: int64
American cuisine is the most popular cuisine to order on weekends, with 415 orders.
np.sum(df["cost_of_the_order"]>20)
555
twenty=(df["cost_of_the_order"]>20)
twenty.value_counts(normalize=True)
False 0.707587 True 0.292413 Name: cost_of_the_order, dtype: float64
555 orders, or 29.2% of all orders, cost more than $20.
df["delivery_time"].mean()
24.161749209694417
For all orders in the sample, the average delivery time is 24.16 minutes.
max_cost=df["cost_of_the_order"].max()#find the maximum cost of a single order
df1=df[df["cost_of_the_order"]==max_cost]#select the row containing the maximum cost
df1 #return the maximum row details
| | order_id | customer_id | restaurant_name | cuisine_type | cost_of_the_order | day_of_the_week | rating | food_preparation_time | delivery_time |
---|---|---|---|---|---|---|---|---|---|
573 | 1477814 | 62359 | Pylos | Mediterranean | 35.41 | Weekday | 4.0 | 21 | 29 |
For the order that had the maximum cost on a single order, the Customer ID number is 62359. The customer ordered Mediterranean food from the restaurant Pylos during a weekday, which cost $35.41, took 21 minutes to prepare, took 29 minutes to deliver, and was given a 4-star rating.
df1 = df.dropna().reset_index(drop=True) #drop rows with missing values (unrated orders)
df1=df1.drop(["customer_id"], axis=1) #drop customer ID as numerically meaningless
df1=df1.drop(["order_id"], axis=1) #drop order ID as numerically meaningless
sns.pairplot(df1, hue="rating");#plot bivariate relationships between all numeric variables
#Observation: None of the bivariate pairplot analysis (with ratings noted in color) shows significant linear patterns between:
#- delivery time & cost of the order, delivery time & food prep time, delivery time & rating
#- cost of the order & food prep time, cost of the order & rating
#- food prep time & rating
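#numeric_only=True skips the categorical columns, which newer pandas versions no longer drop automatically
sns.heatmap(df1.corr(numeric_only=True), annot=True);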
###Observations:
#There are no significant correlations between the following numerical variables: cost of the order,
#rating, food preparation time, and delivery time.
#However, cost of the order has a slightly positive correlation with both rating and food prep time, indicating:
#As cost of the order increases, rating may increase slightly.
#And as the time to prepare food increases, the cost of the order increases slightly.
#Cost of the order has a slightly negative correlation with delivery time, indicating:
#As cost of the order increases, the delivery time decreases slightly.
sns.boxplot(x="day_of_the_week", y="cost_of_the_order", data=df);
####Observation:
#The range of the cost of orders is slightly lower on weekends. The middle 50% of orders
#cost almost exactly the same on weekends and weekdays. Both weekend and weekday order costs tend
#to skew above their medians (both around $14), with maximums between $30 and $36.
sns.boxplot(x="day_of_the_week", y="delivery_time", data=df);
#Delivery times tend to be higher on weekdays than on weekends.
#This is shown by the higher minimum, quartiles, and maximum for weekday delivery times
#compared with the same measures for weekend delivery times.
#The range and IQR are wider for weekend orders, indicating that
#weekend delivery times may have a higher variance than weekday delivery times.
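The quantities read off a boxplot (minimum, median, maximum, IQR, range) can be computed per group to back up a visual comparison. The sketch below uses a made-up `orders` table, so its numbers are illustrative and are not the FoodHub results.

```python
import pandas as pd

# Hypothetical delivery times in minutes, five orders per group
orders = pd.DataFrame({
    "day_of_the_week": ["Weekday"] * 5 + ["Weekend"] * 5,
    "delivery_time": [24, 26, 28, 30, 33, 15, 18, 22, 26, 30],
})

grouped = orders.groupby("day_of_the_week")["delivery_time"]

# Location measures that a boxplot displays
stats = grouped.agg(["min", "median", "max"])

# Spread measures: interquartile range (box height) and full range (whisker span)
stats["iqr"] = grouped.quantile(0.75) - grouped.quantile(0.25)
stats["range"] = stats["max"] - stats["min"]

print(stats)
```

Applying the same aggregation to `df` would give exact values for the weekday and weekend comparison instead of estimates read from the plot.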
sns.boxplot(x="day_of_the_week", y="food_preparation_time", data=df);
sns.boxplot(x="cost_of_the_order", y="cuisine_type", data=df);#create a boxplot of order costs for each cuisine
#Observations:
#Korean and Vietnamese food orders tend to be lower in price than other cuisines,
#with a few outliers for both types of foods' order cost.
#Grouped count plot showing the top five restaurants and the counts of each rating.
sns.countplot(y="restaurant_name",hue="rating", data=df, order=df["restaurant_name"].value_counts().iloc[:5].index);
#Observations:
#Of the Top 5 restaurants with the largest number of orders, Shake Shack has the highest number of 4 and 5 star ratings.
#The Meatball Shop also has over 50 5-star ratings.
#Blue Ribbon Sushi and Blue Ribbon Fried Chicken have a mix of 3, 4, and 5 star reviews, although total reviews are
#fewer than Shake Shack and the Meatball Shop.
#Parm has fewer reviews in total than the other four Top 5 restaurants, and has slightly more
#4-star than 5-star reviews.
#Based on this analysis, a restaurant's popularity seems to be strongly correlated with the number of reviews it has received.
#However, without 1 and 2 star ratings and the missing ratings data, we cannot test this correlation nor assume that
#a restaurant's number of reviews causes more frequent FoodHub orders.
df_valid1=df.groupby("restaurant_name").agg({"rating":["count","mean"]}).dropna()
df_valid1 #create new dataframe aggregating rating counts and average ratings by restaurant and dropping missing
# ratings
restaurant_name | rating count | rating mean |
---|---|---|
'wichcraft | 1 | 5.000000 |
12 Chairs | 2 | 4.500000 |
5 Napkin Burger | 2 | 4.000000 |
67 Burger | 1 | 5.000000 |
Amma | 2 | 4.500000 |
... | ... | ... |
Zero Otto Nove | 1 | 4.000000 |
brgr | 1 | 3.000000 |
da Umberto | 1 | 5.000000 |
ilili Restaurant | 13 | 4.153846 |
indikitch | 2 | 4.500000 |
156 rows × 2 columns
df_valid1.isna().sum()#check that missing values are dropped
rating count 0 mean 0 dtype: int64
df_valid2=df_valid1.loc[(df_valid1["rating"]["count"]>50)&(df_valid1["rating"]["mean"]>4.0)]
df_valid2 #filter new dataframe for restaurants with over 50 ratings and an average rating larger than 4.0
restaurant_name | rating count | rating mean |
---|---|---|
Blue Ribbon Fried Chicken | 64 | 4.328125 |
Blue Ribbon Sushi | 73 | 4.219178 |
Shake Shack | 133 | 4.278195 |
The Meatball Shop | 84 | 4.511905 |
The Blue Ribbon Fried Chicken, Blue Ribbon Sushi, Shake Shack, and the Meatball Shop all have a mean rating above 4.0 and over 50 reviews. Therefore, each of those restaurants should be advertised in the promotional offer.
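The same filter can be written with named aggregation, which produces flat column names instead of the two-level `("rating", "count")` header. This is a sketch on a hypothetical `orders` table with scaled-down thresholds, not a replacement for the result above.

```python
import pandas as pd

# Hypothetical mini order table; None marks an unrated order
orders = pd.DataFrame({
    "restaurant_name": ["A", "A", "A", "B", "B", "C"],
    "rating": [5.0, 4.0, None, 3.0, 4.0, None],
})

# Named aggregation: output_column=(input_column, function)
summary = orders.groupby("restaurant_name").agg(
    rating_count=("rating", "count"),  # count() ignores missing ratings
    rating_mean=("rating", "mean"),
)

# Thresholds scaled down for the toy data; the analysis above uses
# more than 50 ratings and a mean rating above 4.0
min_count, min_mean = 1, 4.0
eligible = summary[
    (summary["rating_count"] > min_count) & (summary["rating_mean"] > min_mean)
]

print(eligible)
```

Restaurants with no ratings get a count of 0 and a mean of NaN, so they fail both conditions without a separate `dropna()` step.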
df_25=df[df["cost_of_the_order"]>20]#create new dataframe containing orders over 20 dollars
df_25
| | order_id | customer_id | restaurant_name | cuisine_type | cost_of_the_order | day_of_the_week | rating | food_preparation_time | delivery_time |
---|---|---|---|---|---|---|---|---|---|
0 | 1477147 | 337525 | Hangawi | Korean | 30.75 | Weekend | NaN | 25 | 20 |
3 | 1477334 | 106968 | Blue Ribbon Fried Chicken | American | 29.20 | Weekend | 3.0 | 25 | 15 |
5 | 1477224 | 147468 | Tamarind TriBeCa | Indian | 25.22 | Weekday | 3.0 | 20 | 24 |
12 | 1476966 | 129969 | Blue Ribbon Fried Chicken | American | 24.30 | Weekend | 5.0 | 23 | 17 |
17 | 1477373 | 139885 | Blue Ribbon Sushi Izakaya | Japanese | 33.03 | Weekend | NaN | 21 | 22 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1884 | 1477437 | 304993 | Shake Shack | American | 31.43 | Weekend | 3.0 | 31 | 24 |
1885 | 1477550 | 97324 | Shake Shack | American | 29.05 | Weekday | 4.0 | 27 | 29 |
1892 | 1477473 | 97838 | Han Dynasty | Chinese | 29.15 | Weekend | NaN | 29 | 21 |
1893 | 1476701 | 292602 | Chipotle Mexican Grill $1.99 Delivery | Mexican | 22.31 | Weekend | 5.0 | 31 | 17 |
1895 | 1477819 | 35309 | Blue Ribbon Sushi | Japanese | 25.22 | Weekday | NaN | 31 | 24 |
555 rows × 9 columns
total_cost_25=df_25["cost_of_the_order"].sum()#sum of cost of all orders over 20 dollars
revenue_25=(total_cost_25*0.25)#generate revenue FoodHub made from orders over 20 dollars
revenue_25
3688.7275
df_5=df[(df["cost_of_the_order"]<=20) &
(df["cost_of_the_order"]>5)]#create new dataframe containing orders between 5 and 20 dollars
df_5
| | order_id | customer_id | restaurant_name | cuisine_type | cost_of_the_order | day_of_the_week | rating | food_preparation_time | delivery_time |
---|---|---|---|---|---|---|---|---|---|
1 | 1477685 | 358141 | Blue Ribbon Sushi Izakaya | Japanese | 12.08 | Weekend | NaN | 25 | 23 |
2 | 1477070 | 66393 | Cafe Habana | Mexican | 12.23 | Weekday | 5.0 | 23 | 28 |
4 | 1478249 | 76942 | Dirty Bird to Go | American | 11.59 | Weekday | 4.0 | 25 | 24 |
6 | 1477894 | 157711 | The Meatball Shop | Italian | 6.07 | Weekend | NaN | 28 | 21 |
7 | 1477859 | 89574 | Barbounia | Mediterranean | 5.97 | Weekday | 3.0 | 33 | 30 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1890 | 1477316 | 164776 | TAO | Japanese | 15.67 | Weekend | 5.0 | 20 | 22 |
1891 | 1476981 | 138586 | Shake Shack | American | 5.82 | Weekend | NaN | 22 | 28 |
1894 | 1477421 | 397537 | The Smile | American | 12.18 | Weekend | 5.0 | 31 | 19 |
1896 | 1477513 | 64151 | Jack's Wife Freda | Mediterranean | 12.18 | Weekday | 5.0 | 23 | 31 |
1897 | 1478056 | 120353 | Blue Ribbon Sushi | Japanese | 19.45 | Weekend | NaN | 28 | 24 |
1334 rows × 9 columns
total_cost_5=df_5["cost_of_the_order"].sum()#sum of cost of all orders over 5 dollars and up to 20 dollars
revenue_5=(total_cost_5*0.15)#revenue from all orders over 5 dollars and up to 20 dollars
revenue_5
2477.5755000000004
total_rev=revenue_25+revenue_5 #total revenue from orders over 20 dollars and orders over 5 dollars and up to 20 dollars
total_rev
6166.303
Total revenue for the sample of orders is $6166.30.
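The two-step calculation above can also be expressed as a single commission rate per order. The sketch below assumes the rates implied by the code (25% for orders over 20 dollars, 15% for orders over 5 dollars and up to 20 dollars) and further assumes that orders of 5 dollars or less earn no commission; the `costs` values are a made-up sample.

```python
import numpy as np
import pandas as pd

# Hypothetical order costs covering every tier and both boundaries
costs = pd.Series([4.47, 5.00, 12.08, 20.00, 30.75])

# Conditions are evaluated in order; the first match wins
rate = np.select(
    [costs > 20, costs > 5],
    [0.25, 0.15],
    default=0.0,  # assumption: no commission on orders of 5 dollars or less
)

commission = costs * rate
total_revenue = commission.sum()

print(rate.tolist())
print(round(total_revenue, 4))
```

Keeping the rate as a per-order value makes it easy to store as a column and to check the boundary cases (exactly 5 and exactly 20 dollars) explicitly.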
Note: The total delivery time is the summation of the food preparation time and delivery time.
df["total_time"]=df["delivery_time"]+df["food_preparation_time"]#create new column that adds two times together
df
| | order_id | customer_id | restaurant_name | cuisine_type | cost_of_the_order | day_of_the_week | rating | food_preparation_time | delivery_time | total_time |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1477147 | 337525 | Hangawi | Korean | 30.75 | Weekend | NaN | 25 | 20 | 45 |
1 | 1477685 | 358141 | Blue Ribbon Sushi Izakaya | Japanese | 12.08 | Weekend | NaN | 25 | 23 | 48 |
2 | 1477070 | 66393 | Cafe Habana | Mexican | 12.23 | Weekday | 5.0 | 23 | 28 | 51 |
3 | 1477334 | 106968 | Blue Ribbon Fried Chicken | American | 29.20 | Weekend | 3.0 | 25 | 15 | 40 |
4 | 1478249 | 76942 | Dirty Bird to Go | American | 11.59 | Weekday | 4.0 | 25 | 24 | 49 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1893 | 1476701 | 292602 | Chipotle Mexican Grill $1.99 Delivery | Mexican | 22.31 | Weekend | 5.0 | 31 | 17 | 48 |
1894 | 1477421 | 397537 | The Smile | American | 12.18 | Weekend | 5.0 | 31 | 19 | 50 |
1895 | 1477819 | 35309 | Blue Ribbon Sushi | Japanese | 25.22 | Weekday | NaN | 31 | 24 | 55 |
1896 | 1477513 | 64151 | Jack's Wife Freda | Mediterranean | 12.18 | Weekday | 5.0 | 23 | 31 | 54 |
1897 | 1478056 | 120353 | Blue Ribbon Sushi | Japanese | 19.45 | Weekend | NaN | 28 | 24 | 52 |
1898 rows × 10 columns
df_time=df[df["total_time"]>60]#create new dataframe only containing orders with total times greater than 60
df_time
| | order_id | customer_id | restaurant_name | cuisine_type | cost_of_the_order | day_of_the_week | rating | food_preparation_time | delivery_time | total_time |
---|---|---|---|---|---|---|---|---|---|---|
7 | 1477859 | 89574 | Barbounia | Mediterranean | 5.97 | Weekday | 3.0 | 33 | 30 | 63 |
10 | 1477895 | 143926 | Big Wong Restaurant _¤¾Ñ¼ | Chinese | 5.92 | Weekday | NaN | 34 | 28 | 62 |
19 | 1477354 | 67487 | Blue Ribbon Sushi | Japanese | 16.20 | Weekend | 4.0 | 35 | 26 | 61 |
24 | 1476714 | 363783 | Cafe Mogador | Middle Eastern | 15.86 | Weekday | NaN | 32 | 29 | 61 |
54 | 1477760 | 130507 | Jack's Wife Freda | Mediterranean | 22.75 | Weekend | 3.0 | 35 | 29 | 64 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1869 | 1476923 | 50199 | J. G. Melon | American | 19.40 | Weekday | 4.0 | 35 | 26 | 61 |
1873 | 1478148 | 261371 | Shake Shack | American | 22.31 | Weekend | NaN | 35 | 28 | 63 |
1875 | 1478039 | 292343 | Amy Ruth's | Southern | 12.23 | Weekday | NaN | 32 | 33 | 65 |
1880 | 1477466 | 222734 | Shake Shack | American | 13.97 | Weekend | 5.0 | 35 | 27 | 62 |
1889 | 1478190 | 94152 | RedFarm Broadway | Chinese | 8.68 | Weekday | 3.0 | 33 | 30 | 63 |
200 rows × 10 columns
(200/1898)*100 #200 orders took over 60 minutes to prepare and deliver. There were 1898 orders total.
10.537407797681771
200 orders took over 60 minutes in total to prepare and deliver, or about 10.5% of all orders in this sample.
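The percentage can be computed without hard-coding the row counts by taking the mean of a boolean mask. The sketch below uses a hypothetical four-order table with the same column names as the dataset.

```python
import pandas as pd

# Hypothetical orders; times are in minutes
orders = pd.DataFrame({
    "food_preparation_time": [25, 33, 35, 23],
    "delivery_time": [20, 30, 26, 28],
})

# Total time is preparation time plus delivery time
orders["total_time"] = orders["food_preparation_time"] + orders["delivery_time"]

# True where the order took more than 60 minutes in total
over_60 = orders["total_time"] > 60

count_over_60 = over_60.sum()
pct_over_60 = over_60.mean() * 100  # mean of booleans = fraction of True values

print(count_over_60, f"{pct_over_60:.1f}%")  # 2 50.0%
```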
df_weekend=df[df["day_of_the_week"]=="Weekend"]#create new dataframe that only contains orders placed on the weekend
df_weekend["delivery_time"].mean()#for orders placed on the weekend, calculate the mean delivery time
22.4700222057735
df_weekday=df[df["day_of_the_week"]=="Weekday"]#create new dataframe that only contains orders placed on weekdays
df_weekday["delivery_time"].mean()#for orders placed on weekdays, calculate the mean delivery time
28.340036563071298
For orders placed on weekends, the average delivery time is 22.47 minutes. For orders placed on weekdays, the average delivery time is 28.34 minutes, about 5.9 minutes longer. Weekday deliveries could take longer for a number of reasons, such as heavier weekday traffic, but the data does not record the cause.
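Both means can be obtained in one step with `groupby`, which avoids building a separate DataFrame for each day type. The sketch below runs on a hypothetical four-order table; the column names match the dataset, the values do not.

```python
import pandas as pd

# Hypothetical orders; delivery times are in minutes
orders = pd.DataFrame({
    "day_of_the_week": ["Weekend", "Weekend", "Weekday", "Weekday"],
    "delivery_time": [20, 24, 28, 30],
})

# One mean per day type, indexed by the day_of_the_week label
mean_by_day = orders.groupby("day_of_the_week")["delivery_time"].mean()

# Difference between the weekday and weekend averages
gap = mean_by_day["Weekday"] - mean_by_day["Weekend"]

print(mean_by_day)
print(f"weekday deliveries take {gap:.1f} minutes longer on average")
```

With the figures reported above, the same subtraction gives 28.34 - 22.47 = 5.87 minutes.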
Restaurant Popularity: The top five restaurants with the most orders (Shake Shack, the Meatball Shop, Blue Ribbon Sushi, Blue Ribbon Fried Chicken, and Parm) account for 33.4% of all 1898 orders in the sample data. Offering promotional advertisements for these restaurants on the FoodHub app has the potential to drive demand for the restaurants and therefore revenue for FoodHub.
- Shake Shack has the highest number of 4 and 5 star ratings.
- The Meatball Shop also has over 50 5-star ratings.
- Blue Ribbon Sushi and Blue Ribbon Fried Chicken have a mix of 3, 4, and 5 star reviews, although total reviews are fewer than Shake Shack and the Meatball Shop.
- Parm has fewer reviews in total than the other four Top 5 restaurants, and has slightly more 4-star than 5-star reviews.
- 736 orders, or 39% of all orders, were not rated.
- Only neutral (3 star) and positive (4 and 5 star) reviews are given. 50.6% of reviews given were 5-star reviews, 33.2% were 4-star reviews, and 16.2% were 3-star reviews. Customers with negative FoodHub experiences may not bother to rate the experience, or the values were left out.
In order to gain a better understanding of negative customer experiences, and to gain a holistic view of customer experiences with FoodHub for each restaurant, FoodHub should incentivize customers to leave honest reviews.
Cuisine Preferences: American, Japanese, and Italian food were most popular on both weekends and weekdays. To generate customer demand, FoodHub may want to offer promotions for customers who buy American, Japanese, and Italian food. FoodHub could also consider adding more American, Japanese, and Italian restaurants to their app.
Most cuisines tended to cost between 5 dollars and 35 dollars; however, Korean and Vietnamese food orders tended to be lower and less variable in price than other cuisines.
Order Costs: All orders cost between 4.47 dollars and 35.41 dollars. The average cost of an order is 16.50 dollars, but because the distribution of costs is right-skewed, most orders cost less than the average (the median is 14.14 dollars).
Day of the Week: The day of the week (weekday vs. weekend) doesn't seem to affect the cost of the order (weekdays and weekends have similar order cost distributions) or food preparation times. Delivery times, however, tend to be higher on weekdays (28 minutes on average) than on weekends (22 minutes on average), although delivery times vary more on weekends.
The average delivery time for all orders is 24 minutes, but due to the left skew of the data, most orders took more than 24 minutes to deliver.
In order to improve customer delivery times, FoodHub could recruit more delivery people during weekday rush hour, or optimize traffic routing on their app to avoid areas with heavy traffic on weekdays.
Preparation Time:
Preparation times vary greatly, with most orders taking between 20-21 minutes, 25-26 minutes, 30-31 minutes, and 33-34 minutes to prepare. The average time to prepare an order is 27 minutes.
In order to shorten the times customers have to wait for food to be prepared, FoodHub could consider incentivizing restaurants that take a longer time to prepare food to prioritize FoodHub orders over other orders. They could achieve this through a promotional offering to the restaurants, or consider writing this condition of prioritization into existing contracts with the restaurants.