Wikipedia Crawler

Objective

This package crawls Wikipedia articles to find a path of hyperlinks leading from a given source page to a given destination page.

Overview

The scraper first classifies the destination page into one of ten categories: Business, Entertainment, Food, Graphics, Historical, Medical, Politics, Space, Sport, and Technology. The classification is performed using a Gaussian Naive Bayes classifier. The scraper then evaluates every page linked from the source page and estimates how relevant each one is to the destination's category. Pages are visited in decreasing order of relevance, and the process repeats from each visited page until the destination is reached.
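The search loop described above can be sketched as a greedy best-first search over a priority queue. This is a minimal illustration, not the project's actual code: the function name `greedy_best_first_path`, the `get_links` and `relevance` callables, the `max_visits` cap, and the toy link graph are all assumptions made for the example. In the real crawler, `get_links` would fetch a page's outgoing links and `relevance` would come from the classifier.

```python
import heapq
from typing import Callable, Dict, Iterable, List, Optional


def greedy_best_first_path(
    source: str,
    destination: str,
    get_links: Callable[[str], Iterable[str]],   # hypothetical: page -> linked pages
    relevance: Callable[[str], float],           # hypothetical: page -> score for the target category
    max_visits: int = 1000,                      # assumed safety cap on pages visited
) -> Optional[List[str]]:
    """Return the pages on a path from source to destination, or None."""
    # heapq is a min-heap, so relevance is negated to pop the most relevant page first.
    frontier = [(0.0, source)]
    parent: Dict[str, Optional[str]] = {source: None}
    visits = 0
    while frontier and visits < max_visits:
        _, page = heapq.heappop(frontier)
        visits += 1
        if page == destination:
            # Walk the parent pointers back to the source to rebuild the path.
            path = []
            node: Optional[str] = page
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for link in get_links(page):
            if link not in parent:               # skip pages already queued or visited
                parent[link] = page
                heapq.heappush(frontier, (-relevance(link), link))
    return None


# Toy link graph and relevance scores, invented for illustration only.
toy_links = {
    "Start": ["Cooking", "Computing"],
    "Cooking": ["Recipe"],
    "Computing": ["Algorithm", "Hardware"],
    "Algorithm": ["Goal"],
}
toy_scores = {"Cooking": 0.1, "Computing": 0.9, "Algorithm": 0.8,
              "Hardware": 0.4, "Recipe": 0.05, "Goal": 1.0}

path = greedy_best_first_path(
    "Start", "Goal",
    get_links=lambda page: toy_links.get(page, []),
    relevance=lambda page: toy_scores.get(page, 0.0),
)
print(path)
```

Because the search is greedy, it always expands the most relevant page seen so far. The path it returns is therefore not guaranteed to be the shortest one.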

Dataset Details

Dataset name: Dataset Text Document Classification
Uploader: Jensen Baxter
Source: Kaggle

Link to dataset: Dataset-text-document-classification

Technologies Used

Model            - Gaussian Naive Bayes
Search Algorithm - Greedy Best-First Search
Frontend         - HTML with Bootstrap, jQuery
Backend          - Django
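
To illustrate the model listed above, here is a from-scratch sketch of Gaussian Naive Bayes in plain Python. The README does not say how article text is turned into numeric features or which library the project uses, so the class name `TinyGaussianNB`, the two-feature vectors, and the training rows below are all made up for the example.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


class TinyGaussianNB:
    """Minimal Gaussian Naive Bayes: one independent normal per feature per class."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "TinyGaussianNB":
        grouped: Dict[str, List[Sequence[float]]] = defaultdict(list)
        for row, label in zip(X, y):
            grouped[label].append(row)
        self.priors: Dict[str, float] = {}
        self.means: Dict[str, List[float]] = {}
        self.variances: Dict[str, List[float]] = {}
        for label, rows in grouped.items():
            self.priors[label] = len(rows) / len(X)
            columns = list(zip(*rows))
            self.means[label] = [sum(col) / len(col) for col in columns]
            # A small constant keeps the variance strictly positive (assumed smoothing value).
            self.variances[label] = [
                sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                for col, m in zip(columns, self.means[label])
            ]
        return self

    def log_scores(self, row: Sequence[float]) -> Dict[str, float]:
        """Unnormalized log posterior for each class: log prior + sum of log likelihoods."""
        scores = {}
        for label, prior in self.priors.items():
            total = math.log(prior)
            for v, m, var in zip(row, self.means[label], self.variances[label]):
                total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            scores[label] = total
        return scores

    def predict(self, row: Sequence[float]) -> str:
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


# Invented features: [count of tech-related words, count of food-related words].
X_train = [[5, 0], [4, 1], [0, 5], [1, 4]]
y_train = ["Technology", "Technology", "Food", "Food"]

model = TinyGaussianNB().fit(X_train, y_train)
print(model.predict([4, 0]))
print(model.predict([0, 6]))
```

The per-class log scores could also serve as a relevance estimate for linked pages, though the README does not state how the project computes relevance.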

Website

Team

Saketh Raman KS - 19PW26
Sanjay T - 19PW28

Project Motive

Final Project for Artificial Intelligence
