
HS116/Covid-Webscraper-with-Twitter-Bot


Scrapes various COVID-19 data to create comparisons and tweets the findings

Bot in action

https://twitter.com/CovidTracker22

There are 5 components to this project:

1.) Beautiful Soup web scraper of country-specific COVID data

Used Beautiful Soup to obtain the "Total cases per 1M Population" figures from https://www.worldometers.info/coronavirus/ for the following countries: Hong Kong, Germany, India, USA
Searched for elements with the class attribute "mt_a", as they represent the country names
Filtered for the countries I was interested in by checking the .text attribute
Used find_parent() to get to the country's row, and find_next_siblings() to get all of its statistics cells
Index 8 of the siblings was the "Total cases per 1M Population" figure
Stored the information in a dictionary and returned it from the function (a minimal sketch follows this list)
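A rough sketch of that flow, assuming the table layout described above; the requests call, the td-based traversal, and the index-8 offset are illustrative and may break if the site changes:

```python
import requests
from bs4 import BeautifulSoup

COUNTRIES = {"Hong Kong", "Germany", "India", "USA"}

def scrape_cases_per_million() -> dict:
    """Return {country: "Total cases per 1M Population" string} from worldometers."""
    html = requests.get("https://www.worldometers.info/coronavirus/").text
    soup = BeautifulSoup(html, "html.parser")

    results = {}
    for link in soup.find_all(class_="mt_a"):          # country names carry class "mt_a"
        if link.text in COUNTRIES:
            cell = link.find_parent("td")               # cell holding the country name
            stats = cell.find_next_siblings("td")       # the rest of the row's statistics
            results[link.text] = stats[8].text.strip()  # index 8 ~ cases per 1M population
    return results
```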

2.) Selenium web scraper of Indian city-specific COVID data

Used Selenium and the Chrome WebDriver to obtain information from a dynamic website (https://www.incovid19.org/) about total confirmed cases and the daily change in cases for the following cities/states: Chennai, Bangalore (Urban), Delhi
WARNING: the chromedriver included in this repo is built for Linux
Found all relevant elements to click and scrape data from using the full XPath
General process
1.) click on relevant element to sort the states by alphabetical order
2.) click on the state
3.) click on relevant element to sort the districts/cities of the state by alphabetical order
4.) scrape the relevant data from the particular district/city
5.) store this data in a dictionary
Returned a dictionary of total confirmed cases from one function and a dictionary of the daily change in cases from another (a minimal sketch follows this list)
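A rough sketch of the click-and-scrape flow; the XPaths and the fixed sleep are placeholders, since the real script uses full XPaths recorded from the page:

```python
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

def scrape_city_cases() -> dict:
    """Return {city: confirmed-cases text} from the dynamic incovid19 site."""
    driver = webdriver.Chrome()  # expects a compatible chromedriver (the one in this repo is for Linux)
    results = {}
    try:
        driver.get("https://www.incovid19.org/")
        time.sleep(5)  # crude wait for the dynamic content to render

        driver.find_element(By.XPATH, "/placeholder/sort-states").click()     # 1. sort states alphabetically
        driver.find_element(By.XPATH, "/placeholder/state-row").click()       # 2. click on the state
        driver.find_element(By.XPATH, "/placeholder/sort-districts").click()  # 3. sort districts alphabetically
        cell = driver.find_element(By.XPATH, "/placeholder/district-cell")    # 4. scrape the district's figure
        results["Chennai"] = cell.text                                        # 5. store it in a dictionary
    finally:
        driver.quit()
    return results
```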

3.) Storing data in JSON files

Since the values of the dictionaries were still strings, I first had to remove any special characters/punctuation such as commas or up arrows, and then convert the values to integers. I did this with a list comprehension over the items() of the original dictionary, applying my changes, and then built the new dictionary with dict(). I also used Python context managers for file handling and json.dump() to write the JSON object to the file.
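A small sketch of that cleaning-and-saving step; the exact characters stripped (commas, the "↑" arrow) and the file name are illustrative:

```python
import json

def clean_and_save(raw: dict, filepath: str = "covid_data.json") -> dict:
    """Strip punctuation/arrows from string values, convert them to int, and save as JSON."""
    cleaned = dict(
        [(key, int(value.replace(",", "").replace("↑", "").strip()))
         for key, value in raw.items()]
    )
    with open(filepath, "w") as f:   # context manager closes the file for us
        json.dump(cleaned, f, indent=4)
    return cleaned
```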

4.) Created graphs

Used Matplotlib to make graphs, saved them locally, and returned the filepath of each graph image for later use (sketch below).
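A minimal sketch of the plotting step; the chart type, labels, and file name are illustrative:

```python
import matplotlib.pyplot as plt

def plot_cases(data: dict, filepath: str = "cases_per_million.png") -> str:
    """Bar-chart the scraped figures, save the image locally, and return its filepath."""
    plt.figure(figsize=(8, 5))
    plt.bar(list(data.keys()), list(data.values()))
    plt.title("Total COVID-19 cases per 1M population")
    plt.ylabel("Cases per 1M population")
    plt.tight_layout()
    plt.savefig(filepath)
    plt.close()
    return filepath
```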

5.) Tweeted results

Retrieved my Twitter developer keys/tokens from a local JSON file (kept off GitHub via .gitignore)
Used the Tweepy library with my keys/tokens to create the auth object via OAuthHandler, set its access token, and finally create my API object
Used the result dictionaries from the web scraping functions above to tweet the relevant data via status updates
Similarly tweeted the Matplotlib-generated graph images via status updates (a minimal sketch follows this list)
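A rough sketch of the tweeting step; the key file name, its field names, and the media-upload call are assumptions on my part rather than exactly what the script does:

```python
import json

import tweepy

def tweet_results(stats: dict, graph_path: str) -> None:
    """Authenticate with Tweepy and tweet the scraped stats plus a graph image."""
    with open("twitter_keys.json") as f:   # local file, excluded from GitHub via .gitignore
        keys = json.load(f)

    auth = tweepy.OAuthHandler(keys["api_key"], keys["api_key_secret"])
    auth.set_access_token(keys["access_token"], keys["access_token_secret"])
    api = tweepy.API(auth)

    text = "Total cases per 1M population -- " + ", ".join(
        f"{country}: {value}" for country, value in stats.items()
    )
    api.update_status(status=text)         # tweet the numbers

    media = api.media_upload(graph_path)   # upload the Matplotlib image
    api.update_status(status="Comparison graph", media_ids=[media.media_id])
```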

Extra

Added decorators to measure the time taken by each data scraping function (sketch below)
Added some Python unit tests (12 April 2022) for one of the data scraping functions
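A minimal sketch of such a timing decorator; the function it wraps here is just a stand-in:

```python
import functools
import time

def timed(func):
    """Decorator that reports how long the wrapped scraping function takes."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.perf_counter() - start:.2f}s")
        return result
    return wrapper

@timed
def scrape_cases_per_million():
    ...  # stand-in for one of the scraping functions above
```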

Future plans for project:

1.) Deploy the Python script with cron on a local Linux virtual machine, or in the cloud with AWS Lambda, so it runs without me pressing run
2.) Add type annotations for better documentation
3.) Break down the large functions into smaller ones and use better variable/function names
