Background

Purpose: To use SQL queries on a fictional company’s database to offer insights for their business strategy.

Due to a global health crisis, sales for the fictional company ‘Classic Models’, a model cars wholesaler, have slowed down. In this exercise, we will be using SQL queries to find cost optimizations and ways to improve sales.

The Dataset

The database is a sample MySQL database obtained from https://www.mysqltutorial.org/. It was downloaded on 9/28/2020, and the data was loaded into MySQL. It is based on a company that wholesales classic model cars. The company has seven branch offices worldwide. The company makes sales all around the world. The dates for the records are from 2003-2005.

SQL Queries

First, we should examine what the population of the database looks like. How many rows are in each table in the database?

SELECT 'Customers' AS 'Table', COUNT(*) AS Num_Rows FROM customers
UNION
SELECT 'Employees' AS 'Table', COUNT(*) AS Num_Rows FROM employees
UNION
SELECT 'Offices' AS 'Table', COUNT(*) AS Num_Rows FROM offices
UNION
SELECT 'Order Details' AS 'Table', COUNT(*) AS Num_Rows FROM orderdetails
UNION
SELECT 'Orders' AS 'Table', COUNT(*) AS Num_Rows FROM orders
UNION
SELECT 'Payments' AS 'Table', COUNT(*) AS Num_Rows FROM payments
UNION
SELECT 'Product Lines' AS 'Table', COUNT(*) AS Num_Rows FROM productlines
UNION
SELECT 'Products' AS 'Table', COUNT(*) AS Num_Rows FROM products;
8 records
Table Num_Rows
Customers 122
Employees 23
Offices 7
Order Details 2996
Orders 326
Payments 273
Product Lines 7
Products 110

Now, we will examine each of the table in the database to understand the operations of ‘Classic Models’ for the years 2003-2005. Each table will have at least one question to be answered with a query. The results from these queries will help us make recommendations to the company.


Customers:

Who are the highest spending customers?

SELECT  customerName,
        contactLastName,
        contactFirstname,
        city,
        state,
        SUM(quantityOrdered*priceEach) AS totalSpent,
        MAX(orderDate) AS LastOrder
FROM    orderdetails JOIN
        orders USING (orderNumber) JOIN
        customers USING (customerNumber)
GROUP BY    customerNumber
ORDER BY    totalSpent DESC;
Displaying records 1 - 10
customerName contactLastName contactFirstname city state totalSpent LastOrder
Euro+ Shopping Channel Freyre Diego Madrid NA 820689.5 2005-05-31
Mini Gifts Distributors Ltd. Nelson Susan San Rafael CA 591827.3 2005-05-29
Australian Collectors, Co. Ferguson Peter Melbourne Victoria 180585.1 2004-11-29
Muscle Machine Inc Young Jeff NYC NY 177914.0 2004-12-01
La Rochelle Gifts Labrune Janine Nantes NA 158573.1 2005-05-31
Dragon Souveniers, Ltd. Natividad Eric Singapore NA 156251.0 2005-03-02
Down Under Souveniers, Inc Graham Mike Auckland NA 154622.1 2005-04-08
Land of Toys Inc. Lee Kwai NYC NY 149085.1 2004-11-15
AV Stores, Co. Ashworth Rachel Manchester NA 148410.1 2004-11-17
The Sharp Gifts Warehouse Frick Sue San Jose CA 143536.3 2005-04-22

The highest spending customer is ‘Euro + Shopping Channel’ located in Madrid. If sales are slow, it might be a good idea to offer discounts to the top customers to incentivize purchases. We can also see in the ‘LastOrder’ column the date of the last order. Customers who have not ordered in some time could be targeted with promotions to remind them to re-stock. Another approach would be to focus on the lesser spending customers. The company could try to incentivize them to spend more.


Employees:

Which employees have the highest total sales volumes?

SELECT  salesRepEmployeeNumber,
        employees.lastName,
        employees.firstName,
        employees.email,
        SUM(quantityOrdered*priceEach) AS totalSales
FROM    orderdetails JOIN orders USING (orderNumber)
        JOIN customers USING (customerNumber)
        JOIN employees ON
        customers.salesRepEmployeeNumber = employees.employeeNumber
GROUP BY    salesRepEmployeeNumber
ORDER BY    totalSales DESC;
Displaying records 1 - 10
salesRepEmployeeNumber lastName firstName email totalSales
1370 Hernandez Gerard 1258577.8
1165 Jennings Leslie 1081530.5
1401 Castillo Pamela 868220.6
1501 Bott Larry 732096.8
1504 Jones Barry 704853.9
1323 Vanauf George 669377.1
1612 Marsh Peter 584593.8
1337 Bondur Loui 569485.8
1611 Fixter Andy 562582.6
1216 Patterson Steve 505875.4

It could be wise to offer performance bonuses to the highest selling employees. That would motivate employees to pursue more sales. The company could also have tiers of performance bonuses to incentivize employees to reach the next bonus level of sales. If sales are slow, the company may need to layoff the lowest performance employees.


Offices:

Which office had the highest sales volumes?

SELECT  officeCode,
        CONCAT(
                 COALESCE(CONCAT(o.addressLine2,' - '), ''), 
                 COALESCE(CONCAT(o.addressLine1, ', '), ''), 
                 COALESCE(CONCAT(o.city), ''), 
                 COALESCE(CONCAT(', ', o.state), ''),
                 COALESCE(CONCAT(', ', o.country), '')
              ) AS Address,
        o.phone,
        SUM(quantityOrdered*priceEach) AS totalSales
FROM    orderdetails JOIN orders USING (orderNumber)
        JOIN customers USING (customerNumber)
        JOIN employees ON
        customers.salesRepEmployeeNumber = employees.employeeNumber
        JOIN offices o USING (officeCode)
GROUP BY    officeCode
ORDER BY    SUM(quantityOrdered*priceEach) DESC;
7 records
officeCode Address phone totalSales
4 43 Rue Jouffroy D’abbans, Paris, France +33 14 723 4404 3083761.6
7 Level 7 - 25 Old Broad Street, London, UK +44 20 7877 2041 1436950.7
1 Suite 300 - 100 Market Street, San Francisco, CA, USA +1 650 219 4782 1429063.6
3 apt. 5A - 523 East 53rd Street, NYC, NY, USA +1 212 555 3000 1157589.7
6 Floor #2 - 5-11 Wentworth Avenue, Sydney, Australia +61 2 9264 2451 1147176.4
2 Suite 102 - 1550 Court Place, Boston, MA, USA +1 215 837 0825 892538.6
5 4-1 Kioicho, Tokyo, Chiyoda-Ku, Japan +81 33 224 5000 457110.1

Since the Paris office has the highest revenue, the company may invest more heavily in that office. If sales are slow, it might be necessary to close the Tokyo office, since it collects the lowest revenue. The Asian business would be handled by the Australian office.

Let’s look at the office with the highest total sales in Paris. For this office, calculate the 31-day average number of orders.

SELECT  fulldate,
        day,
        month,
        year,
        IFNULL(NumOrders,0) AS NumberOfOrders,
        AVG(IFNULL(NumOrders,0)) OVER (ORDER BY year, month, day
                                       ROWS BETWEEN 15 PRECEDING AND 15 FOLLOWING) AS 31_day_avg
FROM    calendars LEFT JOIN
        (
            SELECT  orderDate,
                    COUNT(*) AS NumOrders
            FROM    orders
                    JOIN customers USING (customerNumber)
                    JOIN employees ON customers.salesRepEmployeeNumber = employees.employeeNumber
            WHERE   officeCode = 4
            GROUP BY    orderDate
            ORDER BY    COUNT(*) DESC
        ) AS ord
        ON calendars.fulldate = ord.orderDate
ORDER BY 31_day_avg DESC;
Displaying records 1 - 10
fulldate day month year NumberOfOrders 31_day_avg
2004-11-25 25 11 2004 2 0.4194
2003-11-18 18 11 2003 0 0.3871
2003-11-19 19 11 2003 1 0.3871
2004-11-24 24 11 2004 3 0.3871
2004-11-26 26 11 2004 0 0.3871
2004-11-27 27 11 2004 0 0.3871
2004-11-30 30 11 2004 0 0.3871
2004-12-01 1 12 2004 0 0.3871
2004-12-02 2 12 2004 1 0.3871
2004-12-03 3 12 2004 0 0.3871

For the Paris office, we can see that the days with the highest 31-day average of orders are all between mid-November and early-December. Clearly, this office is busiest with orders just prior to the holiday season. This information can be used to prepare staffing and resources for those busy times. In contrast, January, March, and September have the lowest 31-day average of orders. In order to reduce costs, the company can operate under minimal staffing during the slow periods.


Order Details:

How can the client decide how to make promotions to incentivise purchasing greater quantities?

SELECT  IF (quantityOrdered < 35.2190, 'few', 'many') AS few_many,
        AVG(priceEach) AS avg_price
FROM    orderdetails
GROUP BY    few_many;
2 records
few_many avg_price
few 90.49135
many 91.07370

According to this data, the average price is higher when the quantities are above average, and the average price is lower when the quantities are below average. It may not be wise to offer discounts for greater quantities without further research.

Which month has the most total items ordered?

SELECT  MONTH(ord.orderDate),
        SUM(quantityOrdered) AS TotalQuantityOrdered
FROM    (
            SELECT  orderNumber,
                    orderDate,
                    quantityOrdered
            FROM    orders
                    JOIN orderdetails USING (orderNumber)
        ) AS ord
GROUP BY    MONTH(ord.orderDate)
ORDER BY  TotalQuantityOrdered DESC;
Displaying records 1 - 10
MONTH(ord.orderDate) TotalQuantityOrdered
11 21540
10 11214
5 9464
3 8294
1 7997
2 7959
4 7906
12 7769
8 6538
7 5721

November has the most items ordered, followed by October and May. Customers are likely stocking up for the holiday shopping season. The company should prepare stock and staffing as needed for each month. From a marketing perspective, Classic Models should continue to capitalize on the holiday shopping season.


Orders:

Over the three years where data was collected, how many orders were there per year?

SELECT  COUNT(*) numOrders,
        YEAR(orderDate) AS orderYear
FROM    orders 
GROUP BY orderYear;
3 records
numOrders orderYear
111 2003
151 2004
64 2005

2004 had the highest number of orders, followed by 2003 and 2005. The company may review their sales strategies from the three years to see if a particular strategy led to the increase in sales in 2004.

Which month has the most orders?

SELECT  MONTH(orderDate) AS Month,
        COUNT(*) AS Orders
FROM    orders
GROUP BY    Month
ORDER BY Orders DESC;
Displaying records 1 - 10
Month Orders
11 63
10 31
4 29
5 29
3 27
2 26
1 25
12 22
9 20
6 19

By far, November has the most orders. This is likely due to holiday shopping. It is followed by October, April and May. The company should be aware of the yearly trends in order to prepare stock and staffing.

Calculate the 31-day averages of total units shipped for each day.

SELECT  fulldate,
        day,
        month,
        year,
        IFNULL(ship.QtyShipped,0) AS QtyShipped,
        AVG(IFNULL(ship.QtyShipped,0)) OVER (ORDER BY year, month, day
                                             ROWS BETWEEN 15 PRECEDING AND 15 FOLLOWING) AS QtyShipped_31_day_avg
FROM    calendars LEFT JOIN
        (
            SELECT  shippedDate,
                    SUM(quantityOrdered) AS QtyShipped
            FROM    orders
                    JOIN orderdetails USING (orderNumber)
            GROUP BY    shippedDate
        ) AS ship
        ON calendars.fulldate = ship.shippedDate
ORDER BY    QtyShipped_31_day_avg DESC;
Displaying records 1 - 10
fulldate day month year QtyShipped QtyShipped_31_day_avg
2003-11-23 23 11 2003 0 416.8065
2003-11-24 24 11 2003 646 416.8065
2003-11-22 22 11 2003 0 414.6129
2003-11-21 21 11 2003 0 396.1935
2004-11-20 20 11 2004 261 390.8710
2004-11-21 21 11 2004 158 381.0323
2003-11-19 19 11 2003 0 377.2581
2003-11-20 20 11 2003 1209 377.2581
2004-11-22 22 11 2004 0 375.2581
2003-11-25 25 11 2003 585 373.0645

The top-10 highest 31-day average quantities shipped are all in mid to late November of 2003 and 2004. It is clear that the warehouse will be busiest between mid-November and early-December. This information will be important for making decisions about resource allocation for the staff, materials, and transportation required by the warehouse. It should also factor into when to stock up the warehouse. On the other hand, January, April and June have low 31-day average quantities shipped. The company could save money by reducing resources during those months.


Payments:

Over the three years where data was collected, what was the total revenue each year?

SELECT  YEAR(paymentDate) AS paymentYear,
        FORMAT(SUM(amount), 2) AS totalPaymentsReceived
FROM    payments
GROUP BY    paymentYear
ORDER BY paymentYear;
3 records
paymentYear totalPaymentsReceived
2003 3,250,217.70
2004 4,313,328.25
2005 1,290,293.28

The total revenue was highest in 2004, followed by 2003 and 2005. The company may review their marketing strategies from the three years to see if there is something that worked very well in 2004. They should attempt to replicate it.


Product Lines:

Which product line has the highest sales volume?

SELECT  productLine,
        SUM(quantityOrdered*priceEach) AS TotalSalesVolume
FROM    productlines
        JOIN products USING (productLine)
        JOIN orderdetails USING (productCode)
GROUP BY    productLine
ORDER BY    TotalSalesVolume DESC;
7 records
productLine TotalSalesVolume
Classic Cars 3853922.5
Vintage Cars 1797559.6
Motorcycles 1121426.1
Trucks and Buses 1024113.6
Planes 954637.5
Ships 663998.3
Trains 188532.9

Classic Cars has the highest sales volume, followed by Vintage Cars and Motorcycles. The company could seek to further grow their high-selling product lines. If there is an opportunity to grow the lesser-performing product lines, the company should consider it.

Which product line has the highest quantity in stock?

SELECT  productLine,
        SUM(quantityInStock) AS TotalQuantityInStock
FROM    productlines
        JOIN products USING (productLine)
GROUP BY    productLine
ORDER BY    TotalQuantityInStock DESC;
7 records
productLine TotalQuantityInStock
Classic Cars 219183
Vintage Cars 124880
Motorcycles 69401
Planes 62287
Trucks and Buses 35851
Ships 26833
Trains 16696

Classic Cars has the highest quantity in stock, followed by Vintage Cars and Motorcycles. Planes has a higher quantity in stock even though it has a lower total sales volume than Trucks and Buses. The company could potentially have a sale on Planes to lower the stock.

What are the top three selling products in each product line?

SELECT  prod.productLine,
        prod.productCode,
        prod.productName,
        prod.totalQtySold,
        RANK() OVER (PARTITION BY productLine ORDER BY totalQtySold DESC) AS totalQtySold_rank
FROM    (
            SELECT  productLine,
                    productCode,
                    productName,
                    SUM(quantityOrdered) AS totalQtySold
            FROM    orderdetails JOIN products USING (productCode)
            GROUP BY    productCode
            ORDER BY    totalQtySold DESC
        ) AS prod
ORDER BY    totalQtySold_rank, productLine
LIMIT   21;
Displaying records 1 - 10
productLine productCode productName totalQtySold totalQtySold_rank
Classic Cars S18_3232 1992 Ferrari 360 Spider red 1808 1
Motorcycles S10_1678 1969 Harley Davidson Ultimate Chopper 1057 1
Planes S700_4002 American Airlines: MD-11S 1085 1
Ships S700_2610 The USS Constitution Ship 1020 1
Trains S50_1514 1962 City of Detroit Streetcar 966 1
Trucks and Buses S18_4600 1940s Ford truck 1061 1
Vintage Cars S18_1342 1937 Lincoln Berline 1111 1
Classic Cars S24_3856 1956 Porsche 356A Coupe 1052 2
Motorcycles S24_1578 1997 BMW R 1100 S 1033 2
Planes S24_3949 Corsair F4U ( Bird Cage) 1051 2

Since there are seven product lines, the top three selling items from each product line. The company could recommend these items as popular in their respective product lines. Customers who are interested in trying new product lines would be interested in these recommended items. These items could also be recommended or promoted for new customers looking to get their first models.


Products:

Let’s take a look at the products table. Which products contributed to the highest total revenue for the company?

SELECT  productCode,
        productName,
        SUM(quantityOrdered*priceEach) AS totalRevenueFromProduct,
        SUM(quantityOrdered) AS totalQuantitySold
FROM    orderdetails JOIN products USING (productCode)
GROUP BY    productCode
ORDER BY    totalRevenueFromProduct DESC;
Displaying records 1 - 10
productCode productName totalRevenueFromProduct totalQuantitySold
S18_3232 1992 Ferrari 360 Spider red 276840.0 1808
S12_1108 2001 Ferrari Enzo 190755.9 1019
S10_1949 1952 Alpine Renault 1300 190018.0 961
S10_4698 2003 Harley-Davidson Eagle Drag Bike 170686.0 985
S12_1099 1968 Ford Mustang 161531.5 933
S12_3891 1969 Ford Falcon 152543.0 965
S18_1662 1980s Black Hawk Helicopter 144959.9 1040
S18_2238 1998 Chrysler Plymouth Prowler 142530.6 986
S18_1749 1917 Grand Touring Sedan 140535.6 918
S12_2823 2002 Suzuki XREO 135767.0 1028

The ‘1992 Ferrari 360 Spider red’ has the highest total revenue. The company should use these products in their promotional material since they are popular with customers. It is likely that these products are desirable to collectors. The company should also make sure they have adequate stock of these products.

Which products have the highest quantity in stock?

SELECT  productCode,
        productName,
        quantityInStock,
        SUM(quantityOrdered*priceEach) AS totalSales
FROM    products JOIN
        orderdetails USING (productCode)
GROUP BY    productCode
ORDER BY    quantityInStock DESC;
Displaying records 1 - 10
productCode productName quantityInStock totalSales
S12_2823 2002 Suzuki XREO 9997 135767.03
S18_1984 1995 Honda Civic 9772 119050.95
S700_2466 America West Airlines B757-200 9653 89347.80
S24_3432 2002 Chevy Corvette 9446 87404.81
S18_2325 1932 Model A Ford J-Coupe 9354 109992.01
S32_2206 1982 Ducati 996 R 9241 33268.76
S24_3151 1912 Ford Model T Delivery Wagon 9173 77239.92
S18_3482 1976 Ford Gran Torino 9127 121890.60
S12_3380 1968 Dodge Charger 9123 98718.76
S18_1589 1965 Aston Martin DB5 9042 101778.13

The ‘2002 Suzuki XREO’ has the highest quantity in stock. This model is also 10th in total revenue. The most concerning product is the ‘1982 Ducati 996 R’. It has a high quantity in stock, and has only made 33,268.76 in revenue. It may be beneficial to have a sale on the ‘1982 Ducati 996 R’ to reduce the quantity in stock and generate some revenue.

The pareto principal states that 20% of the work for a piece of software will require 80% of the time. This concept can be applied to retail. Use a cumulative distribution to calculate what percentage of models make up 80% of total sales volume.

SELECT  prod.productCode,
        prod.productName,
        prod.TotalSales,
        PERCENT_RANK() OVER (ORDER BY TotalSales DESC) AS TotalSales_percent_rank,
        CUME_DIST() OVER (ORDER BY TotalSales DESC) AS TotalSales_cume_dist
FROM    (
            SELECT  productCode,
                    productName,
                    SUM(quantityOrdered*priceEach) AS TotalSales
            FROM    products
                    JOIN orderdetails USING (productCode)
            GROUP BY    productCode
            ORDER BY    TotalSales DESC
        ) AS prod;
Displaying records 1 - 10
productCode productName TotalSales TotalSales_percent_rank TotalSales_cume_dist
S18_3232 1992 Ferrari 360 Spider red 276840.0 0.0000000 0.0091743
S12_1108 2001 Ferrari Enzo 190755.9 0.0092593 0.0183486
S10_1949 1952 Alpine Renault 1300 190018.0 0.0185185 0.0275229
S10_4698 2003 Harley-Davidson Eagle Drag Bike 170686.0 0.0277778 0.0366972
S12_1099 1968 Ford Mustang 161531.5 0.0370370 0.0458716
S12_3891 1969 Ford Falcon 152543.0 0.0462963 0.0550459
S18_1662 1980s Black Hawk Helicopter 144959.9 0.0555556 0.0642202
S18_2238 1998 Chrysler Plymouth Prowler 142530.6 0.0648148 0.0733945
S18_1749 1917 Grand Touring Sedan 140535.6 0.0740741 0.0825688
S12_2823 2002 Suzuki XREO 135767.0 0.0833333 0.0917431

Using the cumulative distribution function shows that the distribution of sales throughout the products is balanced. For some companies, a large percentage of their total sales can be attributed to a small number of products. For Classic Models, the sale of products is well balanced. The highest revenue model, ‘1992 Ferrari 360 Spider red’, accounts for 0.92% of all sales. To reach 80% of all sales, we need the top 88 selling products out of 109. It would be advisable for this company to maintain the diversity in its product line.


Summary & Recommendations

With seven tables, this database had many areas to explore. There are many places to find insights regarding the operations of the ‘Classic Models’ company. While some knowledge was gained from this analysis, plenty more can be found in a more in-depth project. SQL queries are crucial to most companies. However, to conduct a further statistical analysis, it would be recommended to analyze the dataset using Python libraries or R. That is beyond the scope of this project. This document uses R Markdown. The SQL queries are run on a MySQL database.

The recommendations are explained in the SQL queries section above. To summarize, the main recommendations are: