Data Management in SQL
Leonid Shpaner
Databases
Database | Description |
---|---|
FaceX | Fabricated dataset composed of a variety of tables meant to mimic the membership profiles of users on a social networking site like Facebook. |
Billboard | This is a small dataset consisting of Billboard rankings of songs, information about artists, their songs, and an extract of their tweets from Twitter. This data can be used to answer different kinds of questions related to the music industry. |
Movies | This is a small dataset that comprises data on gross revenue collection for a few movies and Oscar nominations. It can be used to answer different kinds of questions regarding filtering, summarizing, and joining of data tables. |
Exploring Data
Code: Extracting Data Using SELECT DISTINCT
Database: FaceX
-
Write the query to get the unique concentrations from the
concentration
table (Using the concentration table, your output table should have one column: Concentration). - Modify your query above to sort the output in increasing alphabetical order (Using the concentration table, your output table should have one column: Concentration).
-
Write a query to get all the distinct combinations of Status, Sex, and Political Views from the
profiles
table in the FaceX database (Using the profiles table, your output table should have three columns: Status, Sex, and PoliticalViews). - Modify your query above to sort the output by Political Views, Status, and Sex (Using the profiles table, your output table should have three columns: Status, Sex, and PoliticalViews).
First, let’s query the entire table to get a better visual sense of what is in it:
Click here to view the entire concentration
table.
1 2 3 4 |
SELECT DISTINCT Concentration FROM concentration |
Result: Click here to view the table for the executed query.
1 2 3 4 5 6 |
SELECT DISTINCT Concentration FROM concentration ORDER BY Concentration; |
Result: Click here to view the table for the executed query.
First, let’s query the entire table to get a better visual sense of what is in it:
Click here to view the entire profiles
table.
1 2 3 4 5 6 |
SELECT DISTINCT Status, Sex, PoliticalViews FROM profiles; |
Result: Click here to view the table for the executed query.
1 2 3 4 5 6 7 8 9 10 |
SELECT DISTINCT Status, Sex, PoliticalViews FROM profiles ORDER BY PoliticalViews, Status, Sex; |
Result: Click here to view the table for the executed query.
Module 1 Assignment: Extracting Information From Raw Data
Database: Billboard
Instructions:
In this module assignment, you will apply concepts covered over the course of this module to a series of questions involving extraction. Using the Billboard dataset, respond to each question.
- Write a query to print the song titles from the Songs table and limit your output to 5 rows only (Using the Songs table, your output table should have one column: Song).
- Write a query to print the names of all artists from the Songs table. Your query should not contain any other columns from this table (Using the Songs table, your output table should have one column: Artist).
- Modify the query above so that the names of artists do not repeat. Your query output should contain the name of each distinct artist only once (Using the Songs table, your output table should have one column: Artist).
- Write an aggregation query to count the total number of rows in the Artists table (Using the Artists table, your output table should have one column: count()*).
First, let’s query the entire table to get a better visual sense of what is in it:
Click here to view the entire songs
table.
1 2 3 4 5 6 |
SELECT Song FROM Songs LIMIT 5; |
Result: Click here to view the table from the query.
1 2 3 4 |
SELECT Artist FROM Songs; |
Result: Click here to view the table from the query.
1 2 3 4 |
SELECT DISTINCT Artist FROM Songs; |
Result: Click here to view the table from the query.
First, let’s query the entire table to get a better visual sense of what is in it:
Click here to view the entire artists
table.
1 2 3 4 |
SELECT COUNT(*) FROM Artists; |
Code: Filter Data Using the WHERE
Statement Part 1
Database: FaceX
Activity Questions:
-
Write a query to print the names of all FaceX members in the
profiles
table who were born in the year 1985 (Using the profiles table, your output table should have one column: Name). -
Write a query to print the ProfileIds of all FaceX members who list “Harry Potter” as a favorite book in the
favoritebooks
table (Using the favoritebooks table, your output table should have one column: ProfileID).
First, let’s query the entire table to get a better visual sense of what is in it:
Click here to view the entire profiles
table.
1 2 3 4 5 6 |
SELECT Name FROM profiles WHERE Birthday = 1985; |
Result: Click here to view the table from the query.
First, let’s query the entire table to get a better visual sense of what is in it:
Click here to view the entire favoritebooks
table.
1 2 3 4 5 6 |
SELECT ProfileID FROM favoritebooks WHERE FavoriteBook = "Harry Potter"; |
Result: Click here to view the table from the query.
Code: Filter Data Using the WHERE
Statement Part 2
Database: FaceX
Activity Questions:
-
Using the
favoritemovies
table in the FaceX dataset, write a query to output all of the favorite movies (repeats are fine) that include “harry” anywhere in the title. Your query should not be case-sensitive, i.e., its output should include all movies regardless of whether their names include ‘Harry’, ‘harry’, ‘HARRY’, etc (Using the favoritemovies table, your output table should have one column: FavoriteMovie). -
Write a query on the
profiles
table in the FaceX database to output the names of all users whose name starts in ‘David’. Your query should output only the names of the users (Using the profiles table, your output table should have one column: Name). -
Write a query on the
favoritemovies
table to output all the records where ‘star war’ is included anywhere in the FavoriteMovie column. Your query should not be case-sensitive (Using the favoritemovies table, your output table should include all columns from the original table).
First, let’s query the entire table to get a better visual sense of what is in it:
Click here to view the entire favoritemovies
table.
1 2 3 4 5 6 |
SELECT FavoriteMovie FROM favoritemovies WHERE FavoriteMovie LIKE '%harry%'; |
Result: Click here to view the table from the query.
First, let’s query the entire table to get a better visual sense of what is in it:
Click here to view the entire profiles
table.
1 2 3 4 5 6 |
SELECT Name FROM profiles WHERE Name LIKE 'David %'; |
Result: Click here to view the table from the query.
First, let’s query the entire table to get a better visual sense of what is in it:
Click here to view the entire favoritemovies
table
1 2 3 4 5 6 |
SELECT * FROM favoritemovies WHERE FavoriteMovie LIKE '%star war%'; |
Result: Click here to view the table from the query.
Code: Combine Filtration with Aggregation Part 1
Database: FaceX
Activity Questions:
- Write a query on the concentration table in the FaceX dataset to count the number of users whose concentration is ‘Computer Science’ (Using the concentration table, your output table should have one column: count()*).
-
Write a query on the
profiles
table to count the number of FaceX users who are ‘Female’ and whose birthday is 1985 and whose political views are ‘Conservative’ (Using the profiles table, your output table should have one column: count()*). -
Write a query on the
profiles
table to count the number of FaceX users whose Sex is ‘Male’ and who were not born in 1985 or 1986. Use ‘<>’ in the conditions for birthday (Using the profiles table, your output table should have one column: count()*).
First, let’s query the entire table to get a better visual sense of what is in it:
Click here to view the entire concentration
table.
1 2 3 4 5 6 |
SELECT COUNT(*) FROM concentration WHERE Concentration = 'Computer Science'; |
First, let’s query the entire table to get a better visual sense of what is in it:
Click here to view the entire profiles
table.
1 2 3 4 5 6 7 8 |
SELECT COUNT(*) FROM profiles WHERE Sex = 'Female' AND Birthday = '1985' AND PoliticalViews = 'Conservative'; |
1 2 3 4 5 6 7 8 |
SELECT COUNT(*) FROM profiles WHERE Sex = 'Male' AND Birthday <> '1985' AND Birthday <> '1986'; |
Module 2 Assignment: Filtering Relevant Data
Database: Billboard
Instructions:
In this module assignment, you will apply concepts covered over the course of this module to a series of questions involving filtration. Using the Billboard dataset, respond to each question.
- Write a query on the Songs table to print Taylor Swift’s songs along with the name of the Artist (Using the Songs table, your output table should have two columns: Song and Artist).
-
Write a query on the
Rankings
table to print all songs that made it into the top 3 (Using the Rankings table, your output table should have one column: Song). - Note that the output of the query in Question 2 contains repetitions when a song occurs more than once in the top 3. Modify the query so that each such song is listed only once (Using the Rankings table, your output table should have one column: Song).
-
Let’s switch to the
tweets
table. Write a query to output all the tweets by Bruno Mars (Using the Tweets table, your output table should include all columns from the original table). - Modify the above query to compute the average number of retweet counts for Bruno Mars (Using the Tweets table, your output table should have one column: avg(retweet_count)).
- It is thought that tweets that contain a hashtag get retweeted more than those that don’t. To test this hypothesis, modify the query in Question 5 to compute the average of retweet_count for only those tweets of Bruno Mars that contain a ‘#’ anywhere in the text of the tweet (Using the Tweets table, your output table should have one column: avg(retweet_count)).
First, let’s query the entire table to get a better visual sense of what is in it:
Click here to view the entire songs
table.
1 2 3 4 5 6 7 |
SELECT Song, Artist FROM Songs WHERE Artist = 'Taylor Swift'; |
First, let’s query the entire table to get a better visual sense of what is in it:
Click here to view the entire rankings
table.
1 2 3 4 5 6 |
SELECT Song FROM Rankings WHERE Rank <= 3; |
Result: Click here to view the table from the query.
1 2 3 4 5 6 |
SELECT DISTINCT Song FROM Rankings WHERE Rank <= 3; |
Result: Click here to view the table from the query.
First, let’s query the entire table to get a better visual sense of what is in it:
Click here to view the entire tweets
table.
1 2 3 4 5 6 |
SELECT * FROM Tweets WHERE user_screen_name = 'BrunoMars'; |
Result: Click here to view the table from the query.
1 2 3 4 5 6 |
SELECT AVG(retweet_count) FROM Tweets WHERE user_screen_name = 'BrunoMars'; |
1 2 3 4 5 6 7 |
SELECT AVG(retweet_count) FROM Tweets WHERE user_screen_name = 'BrunoMars' AND text LIKE '%#%'; |
Code: Combine Filtration with Aggregation Part 2
Database: Movies
Activity Questions:
-
List all distinct persons in the
OscarNoms
table in the MOVIES database who received a nomination for the movie Boyhood (Using the OscarNoms table, your output table should have one column: Person). - Modify the above query to find the number of nominations for each person in the OscarNoms table. Each such person’s name should appear only once in your list (Using the OscarNoms table, your output table should have two columns: Person and freq).
First, let’s query the entire table to get a better visual sense of what is in it:
Click here to view the entire OscarNoms
table.
1 2 3 4 5 6 |
SELECT DISTINCT Person FROM OscarNoms WHERE Movie = 'Boyhood'; |
1 2 3 4 5 6 7 |
Select Person, COUNT(*) AS TotalNominations FROM OscarNoms GROUP BY Person; |
Result: Click here to view the table from the query.