| | labs:crawling [2017/04/04 12:10] sitanne [Exercise 2 - Crawl Rotten Tomatoes] | | labs:crawling [2020/08/31 21:03] (current) |
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | ====== Crawling ====== | + | ====== Web Crawling ====== |
| Today we're going to learn how to crawl the web. | Today we're going to learn how to crawl the web. | ||
| The goal of today's lab is to learn which elements web pages are made of and how to extract structured information from them. | The goal of today's lab is to learn which elements web pages are made of and how to extract structured information from them. | ||
| Line 15: | Line 15: | ||
| - Generate a text file with the information from the website, each entry on a new line. | - Generate a text file with the information from the website, each entry on a new line. | ||
| + | Hints: | ||
| + | * Press ''Ctrl'' + ''Shift'' + ''C'' in Firefox to open the inspector. | ||
| + | * Have a look at the available methods in jsoup to select elements: ''getElementById'', ''getElementsByTag'', ''children'', ''select'', etc. | ||
| ===== Exercise 2 - Crawl Rotten Tomatoes ===== | ===== Exercise 2 - Crawl Rotten Tomatoes ===== | ||
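
The step "generate a text file ... each entry on a new line" from the first exercise applies here as well and needs only the Java standard library. The following is a minimal sketch, assuming the entries have already been extracted (for example with jsoup's ''select'') into a list of strings. The class name ''EntryWriter'', the method ''writeEntries'', the sample entries and the file name ''entries.txt'' are hypothetical and not part of the lab material.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class EntryWriter {

    // Writes each entry on its own line; the file is created if missing
    // and overwritten if it already exists (default behaviour of Files.write).
    static void writeEntries(List<String> entries, Path target) throws IOException {
        Files.write(target, entries, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder data (assumption): in the lab, this list would hold
        // the text of the elements selected from the crawled page.
        List<String> entries = Arrays.asList("First entry", "Second entry", "Third entry");

        Path target = Paths.get("entries.txt"); // hypothetical output file
        writeEntries(entries, target);
        System.out.println("Wrote " + entries.size() + " entries to " + target.toAbsolutePath());
    }
}
```

Passing the whole list to ''Files.write'' avoids manual handling of line separators and closes the file automatically. Appending to an existing file instead would require passing ''StandardOpenOption.APPEND'' as an extra argument.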