Purpose: To use SQL queries on a fictional company’s database to offer insights for their business strategy.
Due to a global health crisis, sales for the fictional company ‘Classic Models’, a model cars wholesaler, have slowed down. In this exercise, we will be using SQL queries to find cost optimizations and ways to improve sales.
The database is a sample MySQL database obtained from https://www.mysqltutorial.org/. It was downloaded on 9/28/2020, and the data was loaded into MySQL. It is based on a company that wholesales classic model cars. The company has seven branch offices worldwide. The company makes sales all around the world. The dates for the records are from 2003-2005.
First, we should examine what the population of the database looks like. How many rows are in each table in the database?
SELECT 'Customers' AS 'Table', COUNT(*) AS Num_Rows FROM customers
UNION
SELECT 'Employees' AS 'Table', COUNT(*) AS Num_Rows FROM employees
UNION
SELECT 'Offices' AS 'Table', COUNT(*) AS Num_Rows FROM offices
UNION
SELECT 'Order Details' AS 'Table', COUNT(*) AS Num_Rows FROM orderdetails
UNION
SELECT 'Orders' AS 'Table', COUNT(*) AS Num_Rows FROM orders
UNION
SELECT 'Payments' AS 'Table', COUNT(*) AS Num_Rows FROM payments
UNION
SELECT 'Product Lines' AS 'Table', COUNT(*) AS Num_Rows FROM productlines
UNION
SELECT 'Products' AS 'Table', COUNT(*) AS Num_Rows FROM products;
Table | Num_Rows |
---|---|
Customers | 122 |
Employees | 23 |
Offices | 7 |
Order Details | 2996 |
Orders | 326 |
Payments | 273 |
Product Lines | 7 |
Products | 110 |
Now, we will examine each of the table in the database to understand the operations of ‘Classic Models’ for the years 2003-2005. Each table will have at least one question to be answered with a query. The results from these queries will help us make recommendations to the company.
Who are the highest spending customers?
SELECT customerName,
contactLastName,
contactFirstname,
city,
state,
SUM(quantityOrdered*priceEach) AS totalSpent,
MAX(orderDate) AS LastOrder
FROM orderdetails JOIN
orders USING (orderNumber) JOIN
customers USING (customerNumber)
GROUP BY customerNumber
ORDER BY totalSpent DESC;
customerName | contactLastName | contactFirstname | city | state | totalSpent | LastOrder |
---|---|---|---|---|---|---|
Euro+ Shopping Channel | Freyre | Diego | Madrid | NA | 820689.5 | 2005-05-31 |
Mini Gifts Distributors Ltd. | Nelson | Susan | San Rafael | CA | 591827.3 | 2005-05-29 |
Australian Collectors, Co. | Ferguson | Peter | Melbourne | Victoria | 180585.1 | 2004-11-29 |
Muscle Machine Inc | Young | Jeff | NYC | NY | 177914.0 | 2004-12-01 |
La Rochelle Gifts | Labrune | Janine | Nantes | NA | 158573.1 | 2005-05-31 |
Dragon Souveniers, Ltd. | Natividad | Eric | Singapore | NA | 156251.0 | 2005-03-02 |
Down Under Souveniers, Inc | Graham | Mike | Auckland | NA | 154622.1 | 2005-04-08 |
Land of Toys Inc. | Lee | Kwai | NYC | NY | 149085.1 | 2004-11-15 |
AV Stores, Co. | Ashworth | Rachel | Manchester | NA | 148410.1 | 2004-11-17 |
The Sharp Gifts Warehouse | Frick | Sue | San Jose | CA | 143536.3 | 2005-04-22 |
The highest spending customer is ‘Euro + Shopping Channel’ located in Madrid. If sales are slow, it might be a good idea to offer discounts to the top customers to incentivize purchases. We can also see in the ‘LastOrder’ column the date of the last order. Customers who have not ordered in some time could be targeted with promotions to remind them to re-stock. Another approach would be to focus on the lesser spending customers. The company could try to incentivize them to spend more.
Which employees have the highest total sales volumes?
SELECT salesRepEmployeeNumber,
employees.lastName,
employees.firstName,
employees.email,
SUM(quantityOrdered*priceEach) AS totalSales
FROM orderdetails JOIN orders USING (orderNumber)
JOIN customers USING (customerNumber)
JOIN employees ON
customers.salesRepEmployeeNumber = employees.employeeNumber
GROUP BY salesRepEmployeeNumber
ORDER BY totalSales DESC;
salesRepEmployeeNumber | lastName | firstName | totalSales | |
---|---|---|---|---|
1370 | Hernandez | Gerard | ghernande@classicmodelcars.com | 1258577.8 |
1165 | Jennings | Leslie | ljennings@classicmodelcars.com | 1081530.5 |
1401 | Castillo | Pamela | pcastillo@classicmodelcars.com | 868220.6 |
1501 | Bott | Larry | lbott@classicmodelcars.com | 732096.8 |
1504 | Jones | Barry | bjones@classicmodelcars.com | 704853.9 |
1323 | Vanauf | George | gvanauf@classicmodelcars.com | 669377.1 |
1612 | Marsh | Peter | pmarsh@classicmodelcars.com | 584593.8 |
1337 | Bondur | Loui | lbondur@classicmodelcars.com | 569485.8 |
1611 | Fixter | Andy | afixter@classicmodelcars.com | 562582.6 |
1216 | Patterson | Steve | spatterson@classicmodelcars.com | 505875.4 |
It could be wise to offer performance bonuses to the highest selling employees. That would motivate employees to pursue more sales. The company could also have tiers of performance bonuses to incentivize employees to reach the next bonus level of sales. If sales are slow, the company may need to layoff the lowest performance employees.
Which office had the highest sales volumes?
SELECT officeCode,
CONCAT(
COALESCE(CONCAT(o.addressLine2,' - '), ''),
COALESCE(CONCAT(o.addressLine1, ', '), ''),
COALESCE(CONCAT(o.city), ''),
COALESCE(CONCAT(', ', o.state), ''),
COALESCE(CONCAT(', ', o.country), '')
) AS Address,
o.phone,
SUM(quantityOrdered*priceEach) AS totalSales
FROM orderdetails JOIN orders USING (orderNumber)
JOIN customers USING (customerNumber)
JOIN employees ON
customers.salesRepEmployeeNumber = employees.employeeNumber
JOIN offices o USING (officeCode)
GROUP BY officeCode
ORDER BY SUM(quantityOrdered*priceEach) DESC;
officeCode | Address | phone | totalSales |
---|---|---|---|
4 | 43 Rue Jouffroy D’abbans, Paris, France | +33 14 723 4404 | 3083761.6 |
7 | Level 7 - 25 Old Broad Street, London, UK | +44 20 7877 2041 | 1436950.7 |
1 | Suite 300 - 100 Market Street, San Francisco, CA, USA | +1 650 219 4782 | 1429063.6 |
3 | apt. 5A - 523 East 53rd Street, NYC, NY, USA | +1 212 555 3000 | 1157589.7 |
6 | Floor #2 - 5-11 Wentworth Avenue, Sydney, Australia | +61 2 9264 2451 | 1147176.4 |
2 | Suite 102 - 1550 Court Place, Boston, MA, USA | +1 215 837 0825 | 892538.6 |
5 | 4-1 Kioicho, Tokyo, Chiyoda-Ku, Japan | +81 33 224 5000 | 457110.1 |
Since the Paris office has the highest revenue, the company may invest more heavily in that office. If sales are slow, it might be necessary to close the Tokyo office, since it collects the lowest revenue. The Asian business would be handled by the Australian office.
Let’s look at the office with the highest total sales in Paris. For this office, calculate the 31-day average number of orders.
SELECT fulldate,
day,
month,
year,
IFNULL(NumOrders,0) AS NumberOfOrders,
AVG(IFNULL(NumOrders,0)) OVER (ORDER BY year, month, day
ROWS BETWEEN 15 PRECEDING AND 15 FOLLOWING) AS 31_day_avg
FROM calendars LEFT JOIN
(
SELECT orderDate,
COUNT(*) AS NumOrders
FROM orders
JOIN customers USING (customerNumber)
JOIN employees ON customers.salesRepEmployeeNumber = employees.employeeNumber
WHERE officeCode = 4
GROUP BY orderDate
ORDER BY COUNT(*) DESC
) AS ord
ON calendars.fulldate = ord.orderDate
ORDER BY 31_day_avg DESC;
fulldate | day | month | year | NumberOfOrders | 31_day_avg |
---|---|---|---|---|---|
2004-11-25 | 25 | 11 | 2004 | 2 | 0.4194 |
2003-11-18 | 18 | 11 | 2003 | 0 | 0.3871 |
2003-11-19 | 19 | 11 | 2003 | 1 | 0.3871 |
2004-11-24 | 24 | 11 | 2004 | 3 | 0.3871 |
2004-11-26 | 26 | 11 | 2004 | 0 | 0.3871 |
2004-11-27 | 27 | 11 | 2004 | 0 | 0.3871 |
2004-11-30 | 30 | 11 | 2004 | 0 | 0.3871 |
2004-12-01 | 1 | 12 | 2004 | 0 | 0.3871 |
2004-12-02 | 2 | 12 | 2004 | 1 | 0.3871 |
2004-12-03 | 3 | 12 | 2004 | 0 | 0.3871 |
For the Paris office, we can see that the days with the highest 31-day average of orders are all between mid-November and early-December. Clearly, this office is busiest with orders just prior to the holiday season. This information can be used to prepare staffing and resources for those busy times. In contrast, January, March, and September have the lowest 31-day average of orders. In order to reduce costs, the company can operate under minimal staffing during the slow periods.
How can the client decide how to make promotions to incentivise purchasing greater quantities?
SELECT IF (quantityOrdered < 35.2190, 'few', 'many') AS few_many,
AVG(priceEach) AS avg_price
FROM orderdetails
GROUP BY few_many;
few_many | avg_price |
---|---|
few | 90.49135 |
many | 91.07370 |
According to this data, the average price is higher when the quantities are above average, and the average price is lower when the quantities are below average. It may not be wise to offer discounts for greater quantities without further research.
Which month has the most total items ordered?
SELECT MONTH(ord.orderDate),
SUM(quantityOrdered) AS TotalQuantityOrdered
FROM (
SELECT orderNumber,
orderDate,
quantityOrdered
FROM orders
JOIN orderdetails USING (orderNumber)
) AS ord
GROUP BY MONTH(ord.orderDate)
ORDER BY TotalQuantityOrdered DESC;
MONTH(ord.orderDate) | TotalQuantityOrdered |
---|---|
11 | 21540 |
10 | 11214 |
5 | 9464 |
3 | 8294 |
1 | 7997 |
2 | 7959 |
4 | 7906 |
12 | 7769 |
8 | 6538 |
7 | 5721 |
November has the most items ordered, followed by October and May. Customers are likely stocking up for the holiday shopping season. The company should prepare stock and staffing as needed for each month. From a marketing perspective, Classic Models should continue to capitalize on the holiday shopping season.
Over the three years where data was collected, how many orders were there per year?
SELECT COUNT(*) numOrders,
YEAR(orderDate) AS orderYear
FROM orders
GROUP BY orderYear;
numOrders | orderYear |
---|---|
111 | 2003 |
151 | 2004 |
64 | 2005 |
2004 had the highest number of orders, followed by 2003 and 2005. The company may review their sales strategies from the three years to see if a particular strategy led to the increase in sales in 2004.
Which month has the most orders?
SELECT MONTH(orderDate) AS Month,
COUNT(*) AS Orders
FROM orders
GROUP BY Month
ORDER BY Orders DESC;
Month | Orders |
---|---|
11 | 63 |
10 | 31 |
4 | 29 |
5 | 29 |
3 | 27 |
2 | 26 |
1 | 25 |
12 | 22 |
9 | 20 |
6 | 19 |
By far, November has the most orders. This is likely due to holiday shopping. It is followed by October, April and May. The company should be aware of the yearly trends in order to prepare stock and staffing.
Calculate the 31-day averages of total units shipped for each day.
SELECT fulldate,
day,
month,
year,
IFNULL(ship.QtyShipped,0) AS QtyShipped,
AVG(IFNULL(ship.QtyShipped,0)) OVER (ORDER BY year, month, day
ROWS BETWEEN 15 PRECEDING AND 15 FOLLOWING) AS QtyShipped_31_day_avg
FROM calendars LEFT JOIN
(
SELECT shippedDate,
SUM(quantityOrdered) AS QtyShipped
FROM orders
JOIN orderdetails USING (orderNumber)
GROUP BY shippedDate
) AS ship
ON calendars.fulldate = ship.shippedDate
ORDER BY QtyShipped_31_day_avg DESC;
fulldate | day | month | year | QtyShipped | QtyShipped_31_day_avg |
---|---|---|---|---|---|
2003-11-23 | 23 | 11 | 2003 | 0 | 416.8065 |
2003-11-24 | 24 | 11 | 2003 | 646 | 416.8065 |
2003-11-22 | 22 | 11 | 2003 | 0 | 414.6129 |
2003-11-21 | 21 | 11 | 2003 | 0 | 396.1935 |
2004-11-20 | 20 | 11 | 2004 | 261 | 390.8710 |
2004-11-21 | 21 | 11 | 2004 | 158 | 381.0323 |
2003-11-19 | 19 | 11 | 2003 | 0 | 377.2581 |
2003-11-20 | 20 | 11 | 2003 | 1209 | 377.2581 |
2004-11-22 | 22 | 11 | 2004 | 0 | 375.2581 |
2003-11-25 | 25 | 11 | 2003 | 585 | 373.0645 |
The top-10 highest 31-day average quantities shipped are all in mid to late November of 2003 and 2004. It is clear that the warehouse will be busiest between mid-November and early-December. This information will be important for making decisions about resource allocation for the staff, materials, and transportation required by the warehouse. It should also factor into when to stock up the warehouse. On the other hand, January, April and June have low 31-day average quantities shipped. The company could save money by reducing resources during those months.
Over the three years where data was collected, what was the total revenue each year?
SELECT YEAR(paymentDate) AS paymentYear,
FORMAT(SUM(amount), 2) AS totalPaymentsReceived
FROM payments
GROUP BY paymentYear
ORDER BY paymentYear;
paymentYear | totalPaymentsReceived |
---|---|
2003 | 3,250,217.70 |
2004 | 4,313,328.25 |
2005 | 1,290,293.28 |
The total revenue was highest in 2004, followed by 2003 and 2005. The company may review their marketing strategies from the three years to see if there is something that worked very well in 2004. They should attempt to replicate it.
Which product line has the highest sales volume?
SELECT productLine,
SUM(quantityOrdered*priceEach) AS TotalSalesVolume
FROM productlines
JOIN products USING (productLine)
JOIN orderdetails USING (productCode)
GROUP BY productLine
ORDER BY TotalSalesVolume DESC;
productLine | TotalSalesVolume |
---|---|
Classic Cars | 3853922.5 |
Vintage Cars | 1797559.6 |
Motorcycles | 1121426.1 |
Trucks and Buses | 1024113.6 |
Planes | 954637.5 |
Ships | 663998.3 |
Trains | 188532.9 |
Classic Cars has the highest sales volume, followed by Vintage Cars and Motorcycles. The company could seek to further grow their high-selling product lines. If there is an opportunity to grow the lesser-performing product lines, the company should consider it.
Which product line has the highest quantity in stock?
SELECT productLine,
SUM(quantityInStock) AS TotalQuantityInStock
FROM productlines
JOIN products USING (productLine)
GROUP BY productLine
ORDER BY TotalQuantityInStock DESC;
productLine | TotalQuantityInStock |
---|---|
Classic Cars | 219183 |
Vintage Cars | 124880 |
Motorcycles | 69401 |
Planes | 62287 |
Trucks and Buses | 35851 |
Ships | 26833 |
Trains | 16696 |
Classic Cars has the highest quantity in stock, followed by Vintage Cars and Motorcycles. Planes has a higher quantity in stock even though it has a lower total sales volume than Trucks and Buses. The company could potentially have a sale on Planes to lower the stock.
What are the top three selling products in each product line?
SELECT prod.productLine,
prod.productCode,
prod.productName,
prod.totalQtySold,
RANK() OVER (PARTITION BY productLine ORDER BY totalQtySold DESC) AS totalQtySold_rank
FROM (
SELECT productLine,
productCode,
productName,
SUM(quantityOrdered) AS totalQtySold
FROM orderdetails JOIN products USING (productCode)
GROUP BY productCode
ORDER BY totalQtySold DESC
) AS prod
ORDER BY totalQtySold_rank, productLine
LIMIT 21;
productLine | productCode | productName | totalQtySold | totalQtySold_rank |
---|---|---|---|---|
Classic Cars | S18_3232 | 1992 Ferrari 360 Spider red | 1808 | 1 |
Motorcycles | S10_1678 | 1969 Harley Davidson Ultimate Chopper | 1057 | 1 |
Planes | S700_4002 | American Airlines: MD-11S | 1085 | 1 |
Ships | S700_2610 | The USS Constitution Ship | 1020 | 1 |
Trains | S50_1514 | 1962 City of Detroit Streetcar | 966 | 1 |
Trucks and Buses | S18_4600 | 1940s Ford truck | 1061 | 1 |
Vintage Cars | S18_1342 | 1937 Lincoln Berline | 1111 | 1 |
Classic Cars | S24_3856 | 1956 Porsche 356A Coupe | 1052 | 2 |
Motorcycles | S24_1578 | 1997 BMW R 1100 S | 1033 | 2 |
Planes | S24_3949 | Corsair F4U ( Bird Cage) | 1051 | 2 |
Since there are seven product lines, the top three selling items from each product line. The company could recommend these items as popular in their respective product lines. Customers who are interested in trying new product lines would be interested in these recommended items. These items could also be recommended or promoted for new customers looking to get their first models.
Let’s take a look at the products table. Which products contributed to the highest total revenue for the company?
SELECT productCode,
productName,
SUM(quantityOrdered*priceEach) AS totalRevenueFromProduct,
SUM(quantityOrdered) AS totalQuantitySold
FROM orderdetails JOIN products USING (productCode)
GROUP BY productCode
ORDER BY totalRevenueFromProduct DESC;
productCode | productName | totalRevenueFromProduct | totalQuantitySold |
---|---|---|---|
S18_3232 | 1992 Ferrari 360 Spider red | 276840.0 | 1808 |
S12_1108 | 2001 Ferrari Enzo | 190755.9 | 1019 |
S10_1949 | 1952 Alpine Renault 1300 | 190018.0 | 961 |
S10_4698 | 2003 Harley-Davidson Eagle Drag Bike | 170686.0 | 985 |
S12_1099 | 1968 Ford Mustang | 161531.5 | 933 |
S12_3891 | 1969 Ford Falcon | 152543.0 | 965 |
S18_1662 | 1980s Black Hawk Helicopter | 144959.9 | 1040 |
S18_2238 | 1998 Chrysler Plymouth Prowler | 142530.6 | 986 |
S18_1749 | 1917 Grand Touring Sedan | 140535.6 | 918 |
S12_2823 | 2002 Suzuki XREO | 135767.0 | 1028 |
The ‘1992 Ferrari 360 Spider red’ has the highest total revenue. The company should use these products in their promotional material since they are popular with customers. It is likely that these products are desirable to collectors. The company should also make sure they have adequate stock of these products.
Which products have the highest quantity in stock?
SELECT productCode,
productName,
quantityInStock,
SUM(quantityOrdered*priceEach) AS totalSales
FROM products JOIN
orderdetails USING (productCode)
GROUP BY productCode
ORDER BY quantityInStock DESC;
productCode | productName | quantityInStock | totalSales |
---|---|---|---|
S12_2823 | 2002 Suzuki XREO | 9997 | 135767.03 |
S18_1984 | 1995 Honda Civic | 9772 | 119050.95 |
S700_2466 | America West Airlines B757-200 | 9653 | 89347.80 |
S24_3432 | 2002 Chevy Corvette | 9446 | 87404.81 |
S18_2325 | 1932 Model A Ford J-Coupe | 9354 | 109992.01 |
S32_2206 | 1982 Ducati 996 R | 9241 | 33268.76 |
S24_3151 | 1912 Ford Model T Delivery Wagon | 9173 | 77239.92 |
S18_3482 | 1976 Ford Gran Torino | 9127 | 121890.60 |
S12_3380 | 1968 Dodge Charger | 9123 | 98718.76 |
S18_1589 | 1965 Aston Martin DB5 | 9042 | 101778.13 |
The ‘2002 Suzuki XREO’ has the highest quantity in stock. This model is also 10th in total revenue. The most concerning product is the ‘1982 Ducati 996 R’. It has a high quantity in stock, and has only made 33,268.76 in revenue. It may be beneficial to have a sale on the ‘1982 Ducati 996 R’ to reduce the quantity in stock and generate some revenue.
The pareto principal states that 20% of the work for a piece of software will require 80% of the time. This concept can be applied to retail. Use a cumulative distribution to calculate what percentage of models make up 80% of total sales volume.
SELECT prod.productCode,
prod.productName,
prod.TotalSales,
PERCENT_RANK() OVER (ORDER BY TotalSales DESC) AS TotalSales_percent_rank,
CUME_DIST() OVER (ORDER BY TotalSales DESC) AS TotalSales_cume_dist
FROM (
SELECT productCode,
productName,
SUM(quantityOrdered*priceEach) AS TotalSales
FROM products
JOIN orderdetails USING (productCode)
GROUP BY productCode
ORDER BY TotalSales DESC
) AS prod;
productCode | productName | TotalSales | TotalSales_percent_rank | TotalSales_cume_dist |
---|---|---|---|---|
S18_3232 | 1992 Ferrari 360 Spider red | 276840.0 | 0.0000000 | 0.0091743 |
S12_1108 | 2001 Ferrari Enzo | 190755.9 | 0.0092593 | 0.0183486 |
S10_1949 | 1952 Alpine Renault 1300 | 190018.0 | 0.0185185 | 0.0275229 |
S10_4698 | 2003 Harley-Davidson Eagle Drag Bike | 170686.0 | 0.0277778 | 0.0366972 |
S12_1099 | 1968 Ford Mustang | 161531.5 | 0.0370370 | 0.0458716 |
S12_3891 | 1969 Ford Falcon | 152543.0 | 0.0462963 | 0.0550459 |
S18_1662 | 1980s Black Hawk Helicopter | 144959.9 | 0.0555556 | 0.0642202 |
S18_2238 | 1998 Chrysler Plymouth Prowler | 142530.6 | 0.0648148 | 0.0733945 |
S18_1749 | 1917 Grand Touring Sedan | 140535.6 | 0.0740741 | 0.0825688 |
S12_2823 | 2002 Suzuki XREO | 135767.0 | 0.0833333 | 0.0917431 |
Using the cumulative distribution function shows that the distribution of sales throughout the products is balanced. For some companies, a large percentage of their total sales can be attributed to a small number of products. For Classic Models, the sale of products is well balanced. The highest revenue model, ‘1992 Ferrari 360 Spider red’, accounts for 0.92% of all sales. To reach 80% of all sales, we need the top 88 selling products out of 109. It would be advisable for this company to maintain the diversity in its product line.
With seven tables, this database had many areas to explore. There are many places to find insights regarding the operations of the ‘Classic Models’ company. While some knowledge was gained from this analysis, plenty more can be found in a more in-depth project. SQL queries are crucial to most companies. However, to conduct a further statistical analysis, it would be recommended to analyze the dataset using Python libraries or R. That is beyond the scope of this project. This document uses R Markdown. The SQL queries are run on a MySQL database.
The recommendations are explained in the SQL queries section above. To summarize, the main recommendations are: