
Wikipedia Crawler

Objective

The objective of the package is to crawl Wikipedia articles in order to find a path from a given source page to a destination page through hyperlinks.

Overview

The web scraper first classifies the destination page into one of ten categories: Business, Entertainment, Food, Graphics, Historical, Medical, Politics, Space, Sport, and Technology. The classification is performed using a Gaussian Naive Bayes classifier. The scraper then evaluates every page hyperlinked from the source page and estimates how relevant each one is to the category of the destination page. Pages are visited in decreasing order of relevance to that category, and the process is repeated from each visited page until the destination is reached.
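The classification step above can be sketched roughly as follows. This is an illustrative example, not the project's actual code: it assumes scikit-learn's `GaussianNB` and `TfidfVectorizer`, and uses a tiny toy corpus in place of the Kaggle dataset; all names are hypothetical.

```python
# Hypothetical sketch of the category classifier described above.
# Assumes scikit-learn; the toy corpus stands in for the real training data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import GaussianNB

CATEGORIES = ["Business", "Entertainment", "Food", "Graphics", "Historical",
              "Medical", "Politics", "Space", "Sport", "Technology"]

# Toy training documents (the real project trains on the Kaggle dataset).
train_texts = [
    "stock market shares profit revenue",       # Business
    "rocket orbit satellite launch astronaut",  # Space
    "football goal match tournament player",    # Sport
]
train_labels = ["Business", "Space", "Sport"]

vectorizer = TfidfVectorizer()
# GaussianNB expects dense feature arrays, hence .toarray().
X = vectorizer.fit_transform(train_texts).toarray()
clf = GaussianNB().fit(X, train_labels)

def categorize(page_text: str) -> str:
    """Predict which category a page's text belongs to."""
    features = vectorizer.transform([page_text]).toarray()
    return clf.predict(features)[0]

print(categorize("the satellite reached orbit after launch"))  # predicts "Space"
```

In the crawler, the same model that categorizes the destination page can also score candidate hyperlinked pages, giving the relevance estimate used to order the crawl.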

Dataset Details

Dataset name: Dataset Text Document Classification
Uploader: Jensen Baxter
Source: Kaggle

Link to dataset: Dataset-text-document-classification

Technologies Used

Model            - Gaussian Naive Bayes
Search Algorithm - Greedy Best First Search
Frontend         - HTML with Bootstrap, jQuery
Backend          - Django    
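A Greedy Best First Search over hyperlinks can be sketched as below. This is a minimal illustration under assumed names, not the project's implementation: the link graph and relevance scores are toy stand-ins for fetched Wikipedia pages and classifier output.

```python
# Illustrative Greedy Best First Search over a toy link graph.
# The real crawler would fetch Wikipedia pages and score them with the classifier.
import heapq

def greedy_best_first(links, relevance, source, destination):
    """Expand the most relevant unvisited page first until the destination is found."""
    visited = set()
    # Max-heap via negated scores: the most relevant page is popped first.
    frontier = [(-relevance(source), source, [source])]
    while frontier:
        _, page, path = heapq.heappop(frontier)
        if page == destination:
            return path
        if page in visited:
            continue
        visited.add(page)
        for linked in links.get(page, []):
            if linked not in visited:
                heapq.heappush(frontier, (-relevance(linked), linked, path + [linked]))
    return None  # no path found

# Toy graph and stand-in relevance scores (higher = closer to the target category).
graph = {
    "Python": ["Guido van Rossum", "Programming language", "Snake"],
    "Programming language": ["Computer science", "Compiler"],
    "Computer science": ["Artificial intelligence"],
    "Snake": ["Reptile"],
}
scores = {"Python": 0.5, "Guido van Rossum": 0.4, "Programming language": 0.9,
          "Snake": 0.1, "Reptile": 0.1, "Compiler": 0.5,
          "Computer science": 0.8, "Artificial intelligence": 1.0}

path = greedy_best_first(graph, scores.get, "Python", "Artificial intelligence")
print(" -> ".join(path))
```

Note that greedy best-first expands only the single most promising frontier page at each step, so it finds a path quickly but does not guarantee the shortest one.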

Website


Team

Saketh Raman KS - 19PW26
Sanjay T - 19PW28

Project Motive

Final Project for Artificial Intelligence
