[go: nahoru, domu]

跳至主要內容
前往資訊主頁
不確定該從哪裡著手嗎?歡迎進行簡短測驗,以便取得個人化建議。
課程 9 共 13 堂
Google Sheets: Scraping data from the internet
Data Journalism
Permissions: Source Google Data.
Dataset Search Quickstart Guide
Public Data Explorer: Access a world of data.
Google Trends: Understanding the data.
Google Data GIF Maker
Google Sheets: Visualizing data
Data Source: Global Forest Watch.
Google Sheets: Cleaning data
Data Studio: Make interactive data visualizations
Visualizing Data: Introduction to Tilegrams.
Visualizing Data: Advanced Tilegrams.
課程
0% 已完成
5 分鐘 以完成

Google Sheets: Scraping data from the internet

DataJournalism_GoogleSheetsScrapingDatafromtheInternet_lessonoverview_UEMWXbb.jpg

Build your own data sets using Google Sheets.

DataJournalism_GoogleSheetsScrapingDatafromtheInternet_lessonoverview_UEMWXbb.jpg

Learn to build your own data sets using Google Sheets.

DataJournalism_GoogleSheetsScrapingDatafromtheInternet_lessonoverview.jpg

There is a massive amount of data available on the internet that you can use to research and visualize stories. Finding the data, and getting it into a format you can work with is the first step.

  1. Starting a new spreadsheet.
  2. Finding reliable data.
  3. Importing data to Google Sheets.
  4. Troubleshooting and error messages. 
  5. Displaying your data.

For more Data Journalism lessons, visit:

https://newsinitiative.withgoogle.com/training/course/data-journalism


DataJournalism_GoogleSheetsScrapingDatafromtheInternet_lessonoverview.jpg

Starting a new spreadsheet.

Starting a new spreadsheet.


First, you need to create a blank spreadsheet. Go to sheets.google.com. Under Start a new spreadsheet, click the + icon.


To name your spreadsheet, click the text in the top left corner. Let's name this one "Highest Grossing Movies."

Finding reliable data.

By sourcing data from government sites, scientific publications, Wikipedia, Google Public Data Explorer and more, you can tell data stories on almost any topic. In this lesson, we’ll practice with data about movies.



Go to google.com and search highest grossing films. One of the first links should be a Wikipedia entry with multiple tables. One list, called “the top 50 highest-grossing films of all time” cites multiple references, so we will use that one. Always check to make sure you’re scraping data from reliable sources.


To import this table to Google Sheets, copy the address of the Wikipedia page by highlighting the URL, right clicking on it, and selecting copy.

Importing data to Google Sheets.

We’ll use importHTML to import the table from Wikipedia to our spreadsheet. This powerful formula is built into Google Sheets to help you import tables or lists from web pages. To learn more about how importHTML works and see examples, read the Google Sheets documentation pages.



The importHTML tool needs three parameters to work: 1) a URL2) the type of data we’re collecting, either a table or list3) the number representing the position of the table or list in the HTML code. In this example, the first instance of a table would be numbered as one, as the table we want is the first one that shows up in the HTML. You can use trial and error to find what the position of the table is (1, 2, 3, etc.) or right click the webpage, select Inspect > Find to locate the table in the code.


Go  to the blank sheet you created and navigate to cell A1. Type:=importHTML("https://en.wikipedia.org/wiki/List_of_highest-grossing_films", "table", 1)


Notice that the URL and the element type (in our case, table) go between quotes — this will make the parameters green. The last parameter is a number not within quotes and it will be colored blue.

Troubleshooting and error messages.

ScrapingData_Troubleshooting_and_error_messages.jpg

If you get an ERROR! Message, check to make sure the quotes are double quotes as shown in the example. 


If you get a VALUE! error, check to make sure you don’t have extra parentheses or quotation marks in the cell.

ScrapingData_Troubleshooting_and_error_messages.jpg

Displaying your data.

ScrapingData_Displaying_your_data_mcss7kz.jpg

Once your ImportHTML formula is correct, press enter and give Google Sheets a couple of seconds. The table should load with all the rows and columns formatted. 


Notice that there are some elements we need to remove so that we can visualize this data. We will learn this in the next lesson, “Google Sheets: Cleaning data.”

ScrapingData_Displaying_your_data_mcss7kz.jpg

Congratulations!

CleaningData_Overview_9zSutWO.jpg

You completed “Google Sheets: Scraping data from the internet.”


To continue building your digital journalism skills and work toward Google News Initiative certification, go to our Training Center website and take another lesson:

For more Data Journalism lessons, visit:

newsinitiative.withgoogle.com/training/course/data-journalism


CleaningData_Overview_9zSutWO.jpg
恭喜!您已完成 Google Sheets: Scraping data from the internet 是,正在進行中
為你推薦
你對這個課程的評價為何?
你的意見可協助我們持續改進我們的課程!
要離開並失去進度嗎?
如果離開這個頁面,你將失去當前所完成的所有課程進度。確定要繼續並失去已完成的課程進度嗎?