Analyzing the MovieLens 25M and MovieLens 10M datasets and building a movie recommendation system using the Pearson correlation coefficient.

silentkinght25/MovieLens_DataSet

MovieLens_DataSet

The main objective of this project is to analyse the data and create a movie recommender system.
In more detail, this project walks through importing Python libraries, loading the data into a dataframe, optimising the dataframe, and manipulating the data.
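
As a rough illustration of the loading and optimising steps, the sketch below reads a ratings file with compact dtypes and compares memory use against the pandas defaults. The inline sample stands in for `ratings.csv`; the `DTYPES` mapping and the `load_ratings` helper are assumptions made for this example, not code taken from the project's notebooks.

```python
import io

import pandas as pd

# Tiny inline stand-in for ratings.csv, using the same column names.
SAMPLE_CSV = """userId,movieId,rating,timestamp
1,296,5.0,1147880044
1,306,3.5,1147868817
2,296,4.0,1141415820
2,1,3.5,1141415790
3,1,4.0,1439472215
"""

# Assumed compact dtypes: IDs fit in 32-bit ints, ratings in 32-bit floats.
DTYPES = {"userId": "int32", "movieId": "int32", "rating": "float32"}


def load_ratings(source):
    """Read only the columns needed later, with compact dtypes (hypothetical helper)."""
    return pd.read_csv(source, usecols=list(DTYPES), dtype=DTYPES)


default = pd.read_csv(io.StringIO(SAMPLE_CSV))
compact = load_ratings(io.StringIO(SAMPLE_CSV))

print(default.memory_usage(deep=True).sum(), "bytes with default dtypes")
print(compact.memory_usage(deep=True).sum(), "bytes with compact dtypes")
print(compact["rating"].describe())
```

With the real file, the same function could be called as `load_ratings("ratings.csv")`, assuming the file sits in the working directory. Dropping the unused `timestamp` column and halving the width of the remaining columns is one plausible way to shrink a 25-million-row dataframe; other choices, such as categorical IDs, may suit other workloads better.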

We will divide our work into the following categories:

  1. Data Analysis
    • Descriptive statistics: provide ground knowledge about the features and relations within the dataset
    • Visualization: gives an overview and helps in understanding the underlying relations in the data, using dynamic plots from libraries such as plotly and seaborn, and a word cloud.

  2. Building Movie Recommendation System
    • Loading the raw data in a separate notebook
    • Creating a pivot table in batches and appending the results to one dataframe to keep memory use manageable, and analysing the errors one can encounter when running huge batches
    • Computing the correlation between columns of the data
    • Cleaning up the final movie suggestions
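
The recommendation steps above can be sketched roughly as follows. This is a minimal example on made-up ratings, not the project's actual notebook code: the helper names `pivot_in_batches` and `similar_movies`, the `batch_size` value, and the `min_common` threshold are all assumptions chosen for illustration. Each batch is pivoted on its own because a dense user-by-movie table for too many users at once may not fit in memory.

```python
import pandas as pd

# Made-up ratings: four users, three movies.
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "movieId": [10, 20, 30] * 4,
    "rating":  [5.0, 4.0, 1.0, 4.0, 5.0, 2.0, 1.0, 2.0, 5.0, 2.0, 1.0, 4.0],
})


def pivot_in_batches(ratings, batch_size):
    """Build the user x movie matrix one block of users at a time."""
    users = ratings["userId"].unique()
    parts = []
    for start in range(0, len(users), batch_size):
        batch_users = users[start:start + batch_size]
        batch = ratings[ratings["userId"].isin(batch_users)]
        parts.append(
            batch.pivot_table(index="userId", columns="movieId", values="rating")
        )
    # concat aligns the columns; movies unseen in a batch become NaN.
    return pd.concat(parts).sort_index()


def similar_movies(matrix, movie_id, min_common=2):
    """Rank other movies by Pearson correlation with movie_id."""
    target = matrix[movie_id]
    corr = matrix.corrwith(target)  # Pearson is the pandas default
    # Number of users who rated both the target and each other movie.
    common = matrix.loc[target.notna()].count()
    # Clean-up: drop thinly supported movies, the target itself, and NaNs.
    corr = corr[common >= min_common].drop(movie_id, errors="ignore").dropna()
    return corr.sort_values(ascending=False)


matrix = pivot_in_batches(ratings, batch_size=2)
suggestions = similar_movies(matrix, movie_id=10)
print(suggestions)
```

On real data, `min_common` would likely need to be much larger than 2, since a correlation computed from a handful of shared raters is easily dominated by chance.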

GroupLens Research has collected and made available rating data sets from the MovieLens web site (https://movielens.org). The dataset I’m downloading and using is the “MovieLens 25M Dataset”, which includes 25 million ratings. The data sets were collected over various periods of time, with the most recent data from 2019.
