COMM 392 - ASSIGNMENT 2: RELATIONAL DATABASES AND SQL
Assignment Overview and Deliverables
The assignment has four major parts (Parts A,B, C and D) for a total of 40 points. The full grading rubric can be found in Appendix 1.
This is an individual assignment.
The data set to be used for this assignment is Chinook.Sqlite.sqlite
Here are the deliverables for this assignment:
Deliverable
|
Submission Mode
|
Deadline
|
Exported KNIME workflow containing the workflow model and the SQL script(s). The project can be submitted following these steps:
1. Once your KNIME workflow and SQL scripts are final, open the file menu at the upper-left corner of the KNIME interface.
2. Select “Export KNIME Workflow …”.
3. Make sure you have selected the right workflow by clicking on “Select …”.
4. Choose the appropriate destination on your local machine for the workflow to be exported using “Browse …”.
5. Make sure to uncheck “Reset Workflow(s) before export”. When exporting a workflow without resetting, the data used in the workflow are exported as well, which will enable us to grade your assignment properly.
6. Click “Finish” .
7. Upload the .knar or .knwf file to the assignment folder on the course website.
|
Submit the .knar or .knwf file containing your KNIME
workflow to the
assignment folder on the course website
|
See
syllabus
|
Answers File
• Your answers to the assignment questions (use the template provided).
• Upload the answer file to the assignment folder along with the KNIME workflow on the course website.
|
Submit an MSWord
file (DOC or
DOCX) to the
assignment folder on the course
website.
|
Each student is expected to submit these deliverables (the .knar/knwf and the MS Word
document) as two separate files that you upload to the assignments folder on the course website.
Assignment Instructions
Use the KNIME analytics platform. and the Chinook_Sqlite.sqlite database to answer the following questions. Please use a different “DB Query Reader” node for each question that requires SQL codesowe can grade each one separately.
To complete the assignment, we provide you with the Chinook database’s entity-relationship diagram (ERD):
This assignment has three graded required parts,A through D.
Part A: Entity Relation Diagrams (ERD), Primary Keys, and Foreign Keys (10 Points)
|
1. For a column to be set as a PRIMARY KEY … (1 point)
a. …the data values in that column can never be .
b. …the data values in that column must always be .
2. TRUE or FALSE questions (5 points)
|
TRUE
|
FALSE
|
a. A primary key must be an integer
|
o
|
o
|
b. A database with primary keys could have no foreign keys.
|
o
|
o
|
c. A table’s primary key must always be a single column.
|
o
|
o
|
d. A table can have two or more primary keys
|
o
|
o
|
e. A table can have two or more foreign keys
|
o
|
o
|
f. A foreign key column can contain NULLs
|
o
|
o
|
g. A foreign key can reference multiple primary keys
|
o
|
o
|
h. A primary key can be referenced by multiple foreign keys
|
o
|
o
|
i. A table must always contain a primary key.
|
o
|
o
|
j. Two rows cannot contain the same value in a column
|
o
|
o
|
that is a foreign key to another table.
|
|
|
The next few questions refer specifically to the Chinook database and its ERD.
3. Please look at the following part of the Chinook ERD. There is a dashed line connecting Album and Artist with a darkened circle and an open rectangle.
a. Which table among Album and Artist has the foreign key? (0.5 points)
b. What is the relationship cardinality, here: one-to-one, many-to-many, or one-to-many (and, if one-to-many, in which direction: Album 1:M Artist or Artist 1:M Album)? (0.5 points)
4. There is a relation shown between the Track table and the PlaylistTrack table but also notice that the line there is solid rather than a dashed line as it is elsewhere in the ERD.
a. What does the solid line imply about the foreign key? (0.5 points)
b. First, on your favorite music app check whether you can define a music playlist which contains the same track twice.
Now, according to Chinook’s data ERD, can a playlist contain the same track twice? Please specify YES or NO and then also explain: which relational constraint ensures this behavior, here? (0.5 points)
5. The PlaylistTrack table is unlike other tables in the Chinook database in a few other ways:
a. PlaylistTrack contains no attributes other than its primary key. If there had been attributes specific to neither Track (that is, a song) nor Playlist but which were specific to a given Playlist and song then they would be listed there in the ERD.
What is an attribute you can think of that if added to the database would be best placed here on PlaylistTrack rather than on either of Track or Playlist tables? (1 point)
b. Track and Playlist don’t have an explicit relationship between them, but if one did exist it would bea M:M (many-to-many) relationship in that a playlist can contain many tracks and a track can exist on multiple playlists. However, M:M relationships are highly undesirable in relational data design.
Fortunately, any single M:M relationship between two tables can be - and almost always is - broken down into two “1:M” relationships of a new, third table between the original two tables. This is in fact the purpose of the PlaylistTrack table in the Chinook database.
What’s the name given to a table like PlaylistTrack (or table “AB”, below) that serves in this capacity between two other tables? (1 point)
Part B: Flat files and relational databases (4 Points)
|
6. Using the Chinook ERD as a basis, explain and illustrate how relational databases are more efficient than flat files to store and manage organizational data. Meaning, explain how a
Chinook relational database is more efficient than a Chinook flat file. (2 points)
7. Consider the table shown below.
a. Is there aprimary key within this relational table? (1 point)
b. Support/explain your answer from (a) by stating the attribute(s) composing the primary key or by explaining why there is no primary key. (1 point)
Table: customer_contact
column name
|
data type
|
customer_ name
|
char(40)
|
customer_birthday
|
char(8)
|
customer_phone_number
|
char(15)
|
home_address_line1
|
char(30)
|
home_address_line2
|
char(30)
|
home_address_city
|
char(30)
|
work_address_line1
|
char(30)
|
work_address_line2
|
char(30)
|
work_address_city
|
char(30)
|
Part C: SQL Queries using Chinook’s Database (16 Points)
|
Albums, Tracks, Composers
8. There are 114 tracks whose name contain the word “love.” How many unique albums have at least one track that has a name containing the word “love”? (2 points)
9. Write a SQL query that lists all albums which “U2” is cited as the Artist. Do not hardcode the ArtistID in your query. Instead, you must use either an INNER JOIN OR use a subquery to the Artist table and then hardcode “ …Name = 'U2' …”. It is okay to hardcode the artist’s name since it has an interpretable, real word meaning whereas the ArtistID does not. (3points)
Hint:
An inner join would be written this way: SELECT {…}
FROM table1 t1
INNER JOIN table2 t2
ON t1.{t1_column} = t2.{t2_column}
WHERE {t1...}
|
A subquery would be written this way: SELECT {…}
FROM table1 t1
WHERE t1.{t1_column} IN ( SELECT t2.{t2_column}
FROM table2 t2
WHERE {t2…} )
|
Employees and Managers
10. Write a SQL query that counts all employees (including all managers) from the Employee table. (2 points)
11. Write a SQL query that returns a list of each manager’s FirstName, LastName, and Title. Identify managers by using the ReportsTo column. (2 points)
Hint: Your query should identify managers by the fact that other employees report to them and not by their titles since hypothetically some managers may not have people reporting to them (e.g. “Inventory Manager”) and some managers don’t have “manager” in their name (eg “CEO”, “Director”). Therefore, you will need to use a common-table expression and/or join and use the ReportsTo column.
Customers, Sales
12. What are Chinook’s top five (5) selling countries in terms of total tracks sold? Your result should have two columns: Country, and SUM(InvoiceLine.Quantity)AS SumQuantity (2 points)
Hint: You will need to join Customer through to the invoiceLine table. Use the “ORDER BY” and “LIMIT” SQL clauses to first sort in descending order and then chop the list for top results.
13. Which single customer has bought the most from Chinook, in total invoiced dollars? Your query should provide their name, country, and total dollars spent. (2 points)
Genres, Tracks
14. List the top 3 most "popular" genres along with the least popular genre. Also show the track count (popularity) for each listed genre and the popularity rank. "Popularity" here is defined as the track count, or the number of distinct tracks, so, nothing to do with sales of tracks. (3 points)
Hint #1: Be patient and build your query one part at a time. Don't proceed to the next part until you have the last aspect added in. Use common-table expressions to build one step at a time.
Hint #2: The ROW_NUMBER function is a type of "window function" that can be useful, because it assigns a sequential integer to each row of a query’s result table. See this link for a definition and examples: https://www.sqlitetutorial.net/sqlite-window-functions/sqlite- row_number/. Check whether arguments are required or optional.
Part D: Key Performance Indicators and New Data (10 points)
|
15. Based on the existing database of tables, identify and define one key performance indicator (KPI) for Chinook’s management. Assume Chinook is a digital media store selling pay-per- use music for listeners (i.e. like Spotify today or iTunes back in the mid-2000s) (2 points)
16. Explain why the KPI chosen is significant and why management should monitor this KPI to evaluate the performance of the business. (2 points)
17. Calculate the KPI chosen by writing the related SQL code and interpret the number generated.
(2 points)
18. Identify one other KPI that is not possible to calculate based on the data that exists. In other words, what important metric in the business is worth monitoring but is NOT possible to track based on the existing data? Why is this KPI worth measuring and monitoring? (2 points)
19. Define the new table, adjustments to existing tables or series of new tables necessary to calculate this other KPI from Question #18. Use the template below to generate the schema for the table(s). (2 points)