PLEASE READ ATTACHMENT FIRST FOR DETAILED INSTRUCTIONS AND SCREENSHOTS.

The details for this assignment are attached. Here are the basics:

Attachments:

· 2017_product_data_students-final.csv

· 2018_product_data_students-final.csv

· 2019_product_data_students-final.csv

· Candy_part_1_skeleton_for_students.SQL

Your company wants to merge its old product order data into a new data mart to facilitate analysis. You have been tasked with writing an ETL (extract, transform, and load) code sequence, and executing it on three years’ worth of order data. 

In this assignment, you will produce SQL code which scrubs and imports each of the three years’ worth of data, and produces an output table called stagingTable.

Along with these instructions, there is another document, ‘Additional Clarification on the Week 6 Candy Assignment’. Please read that document carefully.

You should also read the ‘Data Notes’ in Part C of this document. It is very important that you understand the data and how it changes over the three years, so you can create a ‘stagingTable’ that effectively combines data that might have been captured in different ways over the years.

Let’s get started!

Part A: Upload all the files you will need to SQLite:

  

1. Import the file called “2017_product_data_students.csv” to SQLiteonline.com.  When you import it, give it the table name “pd2017” (no quotes) and set the column name to “First line.”

2. Import “2018_product_data_students.csv” as “pd2018”

3. Import “2019_product_data_students.csv” as “pd2019”

4. If you SELECT * FROM pd2017, you should see something like the screenshot in the attachment. All three imported tables should be listed on the left, and the pd2017 data should match what is shown as selected.

  

Part B: Extract and Transform your data

Your job is to use SQL to perform an ETL which will accomplish the following:  INSTRUCTIONS IN ATTACHMENT.

1. Start with the skeleton starter script we give you, attached to this assignment. Modify the CREATE TABLE command so the schema is as follows: SEE ATTACHMENT

2. Get the 2017 bit of the script working. SEE ATTACHMENT

3. Get the 2018 part of the script working. SEE ATTACHMENT

4. Get the 2019 part of the script working. SEE ATTACHMENT

5. The script will load it into one final table and call it stagingTable

6. Run the checksum script to verify you have the stagingTable calculated correctly.

7. Export your final output table under the name “XX_output_final.csv” where XX are your initials.  To export this, you can just use the Export button on the SQLite menu (it’s right next to the Import button.)

You should do this all in SQLite. You should not export to Excel and do your manipulations in Excel.
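As a rough illustration of steps 1 to 5, the sketch below wraps the SQL in Python's built-in sqlite3 module only so it can run on its own; the SQL strings are what you would paste into SQLite. Everything named here is an assumption: the stagingTable columns are guessed from the column names used in the Part D queries (the authoritative schema is in the attachment), the source column names (month, region, state, product, order_total) are hypothetical, and the month, region and state values in the sample row are made up.

```python
import sqlite3

# ASSUMED staging schema: column names are taken from the Part D queries.
# The real schema is the one given in the attachment.
CREATE_STAGING = """
CREATE TABLE stagingTable (
    yearInt      INTEGER,
    monthInt     INTEGER,
    region       TEXT,
    state        TEXT,
    customer_id  INTEGER,
    product_name TEXT,
    orderTotal   INTEGER
);
"""

# One INSERT ... SELECT per source year. The year is a constant because the
# field definitions list no year column; customer_id is NULL because the
# 2017 field definitions do not include it.
# Source column names are HYPOTHETICAL; use whatever your import produced.
LOAD_2017 = """
INSERT INTO stagingTable
    (yearInt, monthInt, region, state, customer_id, product_name, orderTotal)
SELECT 2017, month, region, state, NULL, product, order_total
FROM pd2017;
"""

staging_conn = sqlite3.connect(":memory:")
staging_conn.execute(
    "CREATE TABLE pd2017 (month INTEGER, region TEXT, state TEXT, "
    "product TEXT, order_total INTEGER)"
)
# Product and total come from the text; month, region and state are made up.
staging_conn.execute(
    "INSERT INTO pd2017 VALUES (5, 'Region A', 'FL', 'Orange Creepies', 14700)"
)
staging_conn.execute(CREATE_STAGING)
staging_conn.execute(LOAD_2017)

staged_rows = staging_conn.execute(
    "SELECT yearInt, monthInt, customer_id, product_name, orderTotal "
    "FROM stagingTable"
).fetchall()
print(staged_rows)
```

The 2018 and 2019 loads would follow the same INSERT ... SELECT shape, each with its own constant year and its own way of deriving orderTotal.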

Part C: 2017 Data Notes

Your 2017 order data is contained in the attached file, “2017_product_data_students.csv”, and you should have imported it as “pd2017.” A sample of this file’s data appears in Table 1, Sample of order data from 2017. (Note your file may or may not have the same data in it.)

Your field definitions follow:

· Month: integer, corresponds to the month of the sale. For example, 5 = May.

· Country: text, should all be USA. (All data in this exercise should be USA.)

· Region: text, represents the regions within the country.

· State: text, USPS state abbreviations. Each state is within one region.

· Product: text. This is the name of a packaged food product.

· Per-unit price: integer. This represents the per-unit price in cents; for example, 300 indicates that Orange Creepies sell for $3.00 per package. (For the purposes of this exercise, disregard all currency formatting and just use 300 to represent $3.00.)

· Quantity: integer. This represents how many items were in that particular order. The first order here was for 49 packages of Orange Creepies.

· Order Total: integer. This is the per-unit price x the quantity. The first line here indicates that 300 x 49 = 14700 (or $147.00) was the price of the first order.

Table 1 Sample of order data from 2017 – SEE ATTACHMENT
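Since the 2017 Order Total is defined as per-unit price x quantity, one possible scrubbing check is to list rows where the stored total disagrees with that product. The sketch below is only an illustration: the column names (per_unit_price, quantity, order_total) are hypothetical, the month is made up, and the second row is deliberately inconsistent so the query has something to find.

```python
import sqlite3

conn_2017 = sqlite3.connect(":memory:")
# HYPOTHETICAL column names; your imported pd2017 may name them differently.
conn_2017.execute(
    "CREATE TABLE pd2017 (month INTEGER, product TEXT, "
    "per_unit_price INTEGER, quantity INTEGER, order_total INTEGER)"
)
conn_2017.executemany(
    "INSERT INTO pd2017 VALUES (?, ?, ?, ?, ?)",
    [
        (5, "Orange Creepies", 300, 49, 14700),  # 300 x 49 = 14700, consistent
        (5, "Orange Creepies", 300, 10, 9999),   # made-up inconsistent row
    ],
)

# Rows whose stored total is not per-unit price x quantity.
MISMATCH_SQL = """
SELECT product, per_unit_price, quantity, order_total
FROM pd2017
WHERE order_total <> per_unit_price * quantity;
"""
mismatches_2017 = conn_2017.execute(MISMATCH_SQL).fetchall()
print(mismatches_2017)
```

An empty result on the real data would suggest the 2017 Order Total column can be loaded as-is.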

2018 Data Notes:

Your 2018 order data is contained in the attached file, “2018_product_data_students.csv.”

A sample of this file’s data is contained below as Table 2 Sample of order data from 2018. (Note your file may or may not have the same data in it.)

Your field definitions follow:

· Month: integer, corresponds to the month of the sale. For example, 5 = May.

· Region: text, represents the regions within the country.

· Customer_ID: integer, represents the customer’s unique Customer ID number.

· Product: text. This is the name of a packaged food product. 

· Per-unit price: integer. This represents the per-unit price in cents; for example, 363 indicates that PearApple sells for $3.63  per package. (For the purposes of this exercise, you should disregard all currency formatting and just use 363 to represent $3.63.)

· Quantity_1: integer. This represents how many items were in the first shipment of that particular order. This year we had shipping problems, and could often not ship the entire order all at once. Orders were split into two shipments where necessary, and Quantity_1 reflects how many units were shipped first. (Assume all shipments were completed in the month listed, and that no shipments had the first shipment in one month and the second shipment in the subsequent month.) 

· Quantity_2: integer. This represents how many items were in the second shipment of that particular order. A 0 indicates a second shipment was not necessary. To get the total number of items shipped, you need to add Quantity_1 and Quantity_2.

· The first line here reflects that PearApple has a first shipment of 25 units and a second shipment of 92 units, all within the month of January, for a total of 25 + 92 = 117 units.
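The 2018 file has no order total field, so one plausible reading is that the total must be derived as per-unit price x (Quantity_1 + Quantity_2). The sketch below works through the PearApple line under that assumption. The column names are hypothetical, and the region and customer ID values are made up; only the product, price, quantities and month come from the text.

```python
import sqlite3

conn_2018 = sqlite3.connect(":memory:")
# HYPOTHETICAL column names; adjust to match your imported pd2018.
conn_2018.execute(
    "CREATE TABLE pd2018 (month INTEGER, region TEXT, customer_id INTEGER, "
    "product TEXT, per_unit_price INTEGER, quantity_1 INTEGER, "
    "quantity_2 INTEGER)"
)
# January, PearApple, 363 cents, shipments of 25 and 92 units (from the text).
# Region and customer_id are made-up placeholders.
conn_2018.execute(
    "INSERT INTO pd2018 VALUES (1, 'Region A', 1001, 'PearApple', 363, 25, 92)"
)

# Combine the two shipments, then price the combined quantity.
TOTAL_2018_SQL = """
SELECT product,
       quantity_1 + quantity_2                    AS quantity,
       per_unit_price * (quantity_1 + quantity_2) AS orderTotal
FROM pd2018;
"""
rows_2018 = conn_2018.execute(TOTAL_2018_SQL).fetchall()
print(rows_2018)  # [('PearApple', 117, 42471)]
```

Here 363 x 117 = 42471, that is, $424.71 for the whole order.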

Table 2 Sample of order data from 2018 – SEE ATTACHED

  

2019 Data Notes:

Your 2019 order data is contained in the attached file, “2019_product_data_students.csv.”

A sample of this file’s data is contained below as Table 3 Sample of order data from 2019. (Note your file may or may not have the same data in it.)

Your field definitions follow:

· Month: integer, corresponds to the month of the sale. For example, 5 = May.

· Country: text, represents the country of the customer. Should all be USA.

· Region: text, represents the regions within the country.

· State: USPS code for the 50 United States.

· Product: text. Same as previous years.

· Per-unit price: integer. This represents the per-unit price in cents; same as previous years.

· Quantity: This represents how many items were in that particular order. The first order here was for 95 packages of Only Pancakes.

· Order Subtotal: This represents the order subtotal, calculated as per-unit price x quantity. For example, the first order here reflects a per-unit price of 413 cents x 95 units, for a subtotal of 39,235 (or $392.35). 

· Quantity Discount: This represents the new policy (effective January 1, 2019) that all orders of 90 units or more automatically earn a 10% discount. An order of 89 units does not earn the discount; an order of 90 units does. All order discounts have been rounded to the nearest penny, so you can assume this field has no decimals in it. In the data below:

o Order 0, on the first line, of 95 Only Pancakes to Florida, did qualify for the Quantity Discount, because an order quantity of 95 met the 90-unit threshold. The Quantity Discount has been computed as 3924, or 10% of 39235 (3923.5, rounded to the nearest cent). In this case, the final order total would be 39,235 – 3,924 = 35,311 (or $353.11).

o Order 4, on the fifth line, of 31 Future Toasts to North Carolina, did not qualify for the Quantity Discount. Therefore, the Order total would simply be the Order subtotal.
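The two cases above can be sketched as a single expression: order total = Order Subtotal minus Quantity Discount. This is a minimal sketch under stated assumptions: the column names are hypothetical, the Future Toasts price and subtotal are made up (the text gives only its quantity of 31), and it is assumed that a non-qualifying order stores 0 in the discount field. COALESCE is included only as a guard in case the field turns out to be NULL instead.

```python
import sqlite3

conn_2019 = sqlite3.connect(":memory:")
# HYPOTHETICAL column names; adjust to match your imported pd2019.
conn_2019.execute(
    "CREATE TABLE pd2019 (state TEXT, product TEXT, per_unit_price INTEGER, "
    "quantity INTEGER, order_subtotal INTEGER, quantity_discount INTEGER)"
)
conn_2019.executemany(
    "INSERT INTO pd2019 VALUES (?, ?, ?, ?, ?, ?)",
    [
        # Order 0 from the text: 413 x 95 = 39235, discount 3924.
        ("FL", "Only Pancakes", 413, 95, 39235, 3924),
        # Order 4: quantity 31 is from the text; price and subtotal are made up.
        ("NC", "Future Toasts", 200, 31, 6200, 0),
    ],
)

# Final total = subtotal minus discount (a NULL discount is treated as 0).
TOTAL_2019_SQL = """
SELECT product,
       order_subtotal - COALESCE(quantity_discount, 0) AS orderTotal
FROM pd2019
ORDER BY product;
"""
totals_2019 = conn_2019.execute(TOTAL_2019_SQL).fetchall()
print(totals_2019)  # [('Future Toasts', 6200), ('Only Pancakes', 35311)]
```

If the import leaves blank discounts as empty strings instead of 0 or NULL, the expression would need an extra step to handle that case.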

Table 3 Sample of order data from 2019 – SEE ATTACHED

  

Part D: Check Your Own Work

1.  You can run the following SQL code on your staging table. There is nothing to turn in from this bit. It should yield the following first few rows:

Select region, yearint, monthInt, count(*) from stagingTable where monthInt = 5 group by region, yearInt, monthInt;

2. You can also run the following code to debug. You should get the following rows:

Select yearInt, monthInt, state, customer_id, product_name, orderTotal from stagingTable
where product_name = 'Big Waffle' and monthInt = 4
order by product_name, yearInt, monthInt, state, customer_id, orderTotal;

  

Now that you’ve debugged your code, it’s time to get a checksum! Run the following code to get a checksum. The checksum will be a number. Put this checksum number on the top of your homework. See table below for help with your CHECKSUM result. 

select sum(yearInt * monthInt * orderTotal)%2341 as checksum from stagingTable;

3. Once you get the result of your CHECKSUM look at table below for ways to troubleshoot any issues with your ETL statements. SEE ATTACHMENT
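To see what the checksum query computes, the sketch below runs it on a tiny made-up stagingTable and compares the result with the same arithmetic done directly in Python. The two rows reuse order totals from the Data Notes, but the months are made up and the table holds only the three columns the checksum needs, so the number it prints says nothing about the checksum of the real data.

```python
import sqlite3

# Made-up sample rows: (yearInt, monthInt, orderTotal).
sample_rows = [(2017, 5, 14700), (2019, 1, 35311)]

conn_check = sqlite3.connect(":memory:")
conn_check.execute(
    "CREATE TABLE stagingTable "
    "(yearInt INTEGER, monthInt INTEGER, orderTotal INTEGER)"
)
conn_check.executemany("INSERT INTO stagingTable VALUES (?, ?, ?)", sample_rows)

# The checksum query from the assignment, unchanged.
CHECKSUM_SQL = (
    "select sum(yearInt * monthInt * orderTotal)%2341 as checksum "
    "from stagingTable;"
)
checksum = conn_check.execute(CHECKSUM_SQL).fetchone()[0]

# Same arithmetic in plain Python: sum the per-row products, then take the
# remainder after dividing by 2341.
expected = sum(y * m * t for y, m, t in sample_rows) % 2341

print(checksum, expected)
```

Because every row contributes yearInt x monthInt x orderTotal to the sum, a single wrong year, month or total anywhere in the staging table will usually change the checksum.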

TURN IN:

1. Your output file, called “XX_output_final.csv” where XX are your initials.

2. All the SQL code you used to execute this.

3. A document that contains

a. CHECKSUM: XXX where XXX is the checksum number produced. Put this in big font right on the top.

b. A one-page outline of your ETL process. Which functions did you use, and what logic did you follow? This should be at the level that your boss, who has an MBA but not an IT/database background, can follow it. Do not use “computer-ese” here; use regular business English.
