labs:namethatmovie

Differences

This shows you the differences between two versions of the page.

labs:namethatmovie [2016/04/22 10:20]
lpeer created
labs:namethatmovie [2020/08/31 21:03] (current)
Line 11: Line 11:
  
   - Extract all relevant entries from the academy award nominees table on the linked website.
-  - Design a database schema and create a sqlite database with the collected data. Make sure you give your database the columns: ** year **, ** event **,  **movie**, **actor**, **role** and **won**.
+  - Design a database schema and create a sqlite database with the collected data. Make sure you give your database the columns: ** year **, ** event **,  **movie**, **actor**, **role** and **won**. Be careful to [[https://docs.python.org/2/library/sqlite3.html|escape]] all strings when adding rows to the database.
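
One way to satisfy the escaping requirement in the added sentence is to let the sqlite3 module bind the values through `?` placeholders instead of building SQL strings by hand. The following is a minimal sketch, not the lab's reference solution: the table name `nominations`, the column types, and the sample row are assumptions made up for illustration.

```python
import sqlite3


def create_awards_db(path=":memory:"):
    """Open a database and create a table with the six required columns."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS nominations (
               year  INTEGER,
               event TEXT,
               movie TEXT,
               actor TEXT,
               role  TEXT,
               won   INTEGER
           )"""
    )
    return conn


def add_nominations(conn, rows):
    """Insert rows using ? placeholders so quotes in strings are handled safely."""
    conn.executemany(
        "INSERT INTO nominations (year, event, movie, actor, role, won) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()


conn = create_awards_db()
# Placeholder row (not real award data); it contains quotes on purpose.
add_nominations(
    conn,
    [(2000, "Hypothetical Event", "It's a Placeholder", 'Jane "JD" Doe', "O'Neil", 1)],
)
stored = conn.execute("SELECT movie, actor, role FROM nominations").fetchall()
```

With placeholders, the apostrophes and double quotes are stored unchanged and never interpreted as SQL.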
  
  
Line 18: Line 18:
 Answer the following questions by querying the database you've created in Exercise 1. If you need help and cannot google a solution, feel free to ask the assistants.
  
-  - Which actor and which actress have won the first ever Academy Award for Best Actor/Actress?
+  - Which actor and which actress have won the first ever Academy Award for Best Actor/Actress (in our dataset)?
   - Which actor and which actress have won the most Academy Awards?
   - Which actor and which actress have been nominated for the most Academy Awards?
Line 25: Line 25:
 ===== Exercise 3 - Crawl Rottentomatoes =====
  
-To keep web traffic low and reduce the risk of being blacklisted, we have cloned some rottentomatoes pages and are hosting them locally. You can access the detail page through a unique url. Combine the year and movie title like this: http://10.0.0.2/m/year/title to access the local clone of the movie detail page. ++ Hint  | Transform the movie title to lower case. Remove any apostrophe characters (') and replace spaces and forward slashes (/) with underscores (_)  ++
+To keep web traffic low and reduce the risk of being blacklisted, we have cloned some rottentomatoes pages and are hosting them locally. You can access the detail page through a unique url. Combine the year and movie title like this: http://10.0.0.1/m/year/title to access the local clone of the movie detail page. (Transform the movie title to lower case. Remove any apostrophe characters (') and replace spaces and forward slashes (/) with underscores (_)).
  
     - Visit any of the local movie sites. Which element contains the [[https://en.wikipedia.org/wiki/Rotten_Tomatoes#Tomatometer_critic_aggregate_score|tomatometer]] score of the movie? Which element contains the audience score?
-    - Access each of the cloned websites, extract the tomatometer and the audience score and insert them into the previously created Academy Awards database as additional columns **tomatometer** and ** audience_score **. You may not necessarily find every movie on our local server. Also, occasionally, you'll see movies that don't have a tomatometer score. How should you handle the situation when you encounter such a missing page/tomatometer?
+    - Access each of the cloned websites, extract the tomatometer and the audience score and insert them into the previously created Academy Awards database as additional columns **tomatometer** and ** audience_score **. Some movies are missing on our local server. Also, occasionally, you'll see movies that don't have a tomatometer score. Think about how you want to handle such a missing movie page or tomatometer score.
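
The title transformation described in the hint (lower case, drop apostrophes, replace spaces and the `/` character with `_`) can be sketched as a small helper. The function names `local_movie_url` and `parse_score` are hypothetical, and the assumption that a score appears on the page as text such as `85%` has not been checked against the cloned pages. Returning `None` for a missing score is one possible way to handle the gaps mentioned above, since it maps to NULL in sqlite.

```python
def local_movie_url(year, title, host="10.0.0.1"):
    """Build the local clone URL from a year and a movie title."""
    slug = title.lower().replace("'", "")            # lower case, no apostrophes
    slug = slug.replace(" ", "_").replace("/", "_")  # spaces and "/" become "_"
    return "http://{}/m/{}/{}".format(host, year, slug)


def parse_score(text):
    """Turn scraped text like '85%' into an int; return None if it is missing.

    The '85%' format is an assumption about the page, not a confirmed detail.
    """
    if text is None:
        return None
    digits = text.strip().rstrip("%")
    return int(digits) if digits.isdigit() else None


# Placeholder title, chosen only to exercise every rule of the hint.
example_url = local_movie_url(2000, "Someone's Movie/Part Two")
```

A missing page would be handled the same way at the request level: catch the failed download and store `None` for both scores instead of aborting the crawl.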
  
  
 ===== Exercise 4 - Query the IMDB Database =====
  
-We've collected some information about movies from various sources and have compiled a database with a number of tables. You can find the sqlite [[http://pc-10129.ethz.ch/sqlquery/images/moviedb.sqlite|here]]. Familiarize yourself with the [[http://pc-10129.ethz.ch/sqlquery/schema|schema]] and contents and answer the following questions. If you need help and cannot google a solution, feel free to ask the assistants.
+We've collected some information about movies from various sources and have compiled a database with a number of tables. You can find the sqlite [[http://10.0.0.1/download/moviedb.sqlite|here]]. Familiarize yourself with the [[http://pc-10129.ethz.ch/sqlquery/schema|schema]] and contents and answer the following questions. If you need help and cannot google a solution, feel free to ask the assistants.
   - The creator of the database was sloppy and accidentally entered some movies twice. How can you find out which? Remove them from the database! Make sure to also remove the dependent foreign key constraints from other tables.
   - How many of the movies you crawled in the first exercise are already in the IMDB db? Which are missing?
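
Duplicate entries can usually be found by grouping on the columns that should be unique and keeping the groups with more than one row. The sketch below uses a hypothetical `movies(id, title, year)` table with placeholder rows; the real table and column names have to be taken from the linked schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
    INSERT INTO movies (title, year) VALUES
        ('Movie A', 2000), ('Movie B', 2001), ('Movie A', 2000);
    """
)

# Groups with more than one row are duplicates; keep the lowest id of each.
duplicates = conn.execute(
    """SELECT title, year, COUNT(*) AS copies, MIN(id) AS keep_id
       FROM movies
       GROUP BY title, year
       HAVING COUNT(*) > 1"""
).fetchall()

# Delete every row that is not the first occurrence of its (title, year) pair.
conn.execute(
    "DELETE FROM movies "
    "WHERE id NOT IN (SELECT MIN(id) FROM movies GROUP BY title, year)"
)
conn.commit()
remaining = conn.execute("SELECT id, title FROM movies ORDER BY id").fetchall()
```

Rows in other tables that reference a deleted movie id need to be deleted or repointed to the kept id before or together with this step; which tables are affected depends on the actual schema.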
Line 52: Line 52:
   - Use a plot to decide upon the truth of the following statement: "The movie title length is a direct indicator of the tomatometer score of the movie"
   - Plot Robert De Niro's career trajectory visualizing the year on the x-axis and the average tomatometer score on the y-axis.
-  - Plot the average tomatometer score over all movies by year. How do you explain the downward trend?
+  - Plot the average tomatometer score over all movies by year.
-  - Plot the average tomatometer score per year of the top five actors and the top five actresses. ++ Hint  | Rank the actors according to their overall average tomatometer score. Add a legend to the plot with the names of the actors.  ++
+  - Plot the average tomatometer score per year of the top five actors and the top five actresses. (Rank the actors according to their overall average tomatometer score. Add a legend to the plot with the names of the actors)
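
The data behind the "average tomatometer score by year" plot can be prepared with a single grouped query. This sketch assumes a `nominations` table that already has the `tomatometer` column from Exercise 3 and that missing scores were stored as NULL; the rows are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nominations (year INTEGER, movie TEXT, tomatometer INTEGER)"
)
# Placeholder rows; None stands for a movie without a tomatometer score.
conn.executemany(
    "INSERT INTO nominations VALUES (?, ?, ?)",
    [
        (2000, "Movie A", 80),
        (2000, "Movie B", 90),
        (2001, "Movie C", 70),
        (2001, "Movie D", None),
    ],
)

# Average per year, skipping movies whose score is unknown.
rows = conn.execute(
    """SELECT year, AVG(tomatometer)
       FROM nominations
       WHERE tomatometer IS NOT NULL
       GROUP BY year
       ORDER BY year"""
).fetchall()

years = [year for year, _ in rows]        # x-axis values
averages = [avg for _, avg in rows]       # y-axis values
```

The two lists can then be handed to a plotting library, for example `matplotlib.pyplot.plot(years, averages)`. For the per-actor plots, the same query with an additional filter on the actor gives one line per person.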
 ===== Bonus - Crawl Again =====
  
 Scrape any information from a website of your choosing.
  
labs/namethatmovie.1461313238.txt.gz · Last modified: 2020/08/31 21:03 (external edit)