labs:crawling

Differences

This shows you the differences between two versions of the page.

labs:crawling [2017/04/04 12:10]
sitanne [Exercise 2 - Crawl Rotten Tomatoes]
labs:crawling [2020/08/31 21:03] (current)
Line 1:
-====== Crawling ======
+====== Web Crawling ======
 Today we're going to learn how to crawl the web.
 The goal of today's lab is that you learn which elements are contained within websites and how to extract this structured information.
Line 15:
   - Generate a text file with the information from the website, each entry on a new line.
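
The last step above (one entry per line in a text file) could be sketched in plain Java as follows. This is only a minimal sketch: the class name ''EntryWriter'', the file name ''entries.txt'' and the sample entries are placeholders chosen for illustration and are not part of the lab material.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class EntryWriter {

    // Writes every entry on its own line; an existing file is overwritten.
    static void writeEntries(Path target, List<String> entries) throws IOException {
        Files.write(target, entries, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder entries; in the lab these would come from the crawled page.
        List<String> entries = Arrays.asList("first entry", "second entry", "third entry");
        Path target = Paths.get("entries.txt"); // hypothetical output file name
        writeEntries(target, entries);
        System.out.println("Wrote " + entries.size() + " lines to " + target);
    }
}
```

''Files.write'' appends a line separator after each element, so no manual newline handling is needed.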
  
+Hints:
+  * Press ''Ctrl'' + ''Shift'' + ''C'' in Firefox to open the inspector.
+  * Have a look at the available methods in jsoup to select elements: ''getElementById'', ''getElementsByTag'', ''children'', ''select'', etc.
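
The jsoup methods named in the hints can be tried out on a small inline document before pointing them at a real site. The sketch below assumes the jsoup library is on the classpath; the HTML snippet, the class name ''SelectorSketch'' and the helper ''titlesByCss'' are made up for illustration and do not reflect the structure of any particular website.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

class SelectorSketch {

    // Made-up markup standing in for a downloaded page.
    static final String HTML =
        "<html><body>"
        + "<ul id=\"movies\">"
        + "<li class=\"movie\"><a href=\"/m/1\">First Movie</a></li>"
        + "<li class=\"movie\"><a href=\"/m/2\">Second Movie</a></li>"
        + "</ul>"
        + "</body></html>";

    // Collects the link texts with a single CSS query via select().
    static List<String> titlesByCss(Document doc) {
        List<String> titles = new ArrayList<>();
        for (Element link : doc.select("ul#movies li.movie a")) {
            titles.add(link.text());
        }
        return titles;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);

        // getElementById: look up one element by its id attribute.
        Element list = doc.getElementById("movies");

        // children: the direct child elements (here the <li> items).
        Elements items = list.children();
        System.out.println("items: " + items.size());

        // getElementsByTag: all descendants with the given tag name.
        for (Element link : list.getElementsByTag("a")) {
            System.out.println(link.attr("href") + " -> " + link.text());
        }

        // select: the same titles via a CSS selector.
        System.out.println(titlesByCss(doc));
    }
}
```

For a live page, ''Jsoup.connect(url).get()'' returns a ''Document'' that can be queried in the same way; the selectors then have to match whatever the inspector shows for that page.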
 ===== Exercise 2 - Crawl Rotten Tomatoes =====
  
labs/crawling.1491300644.txt.gz · Last modified: 2020/08/31 21:03 (external edit)