
BestsellerLLM

This project shows how to scrape raw data from an e-commerce website using Scrapy and save it to MongoDB through a pymongo-based item pipeline. I wrote a series of blog posts that explain the project's elements and how to implement them.
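As a rough illustration of the pymongo item-pipeline approach, here is a minimal sketch of what such a pipeline can look like. It is not the repository's actual code: the class name `MongoPipeline`, the setting names `MONGO_URI` and `MONGO_DATABASE`, and the default connection string are assumptions. The database and collection names (`amazon`, `bestsellers`) follow the Scraping section of this README.

```python
class MongoPipeline:
    """Hypothetical Scrapy item pipeline that stores every item in MongoDB."""

    collection_name = "bestsellers"  # collection name used in this README

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db
        self.client = None
        self.db = None

    @classmethod
    def from_crawler(cls, crawler):
        # Setting names and defaults are assumptions for this sketch.
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI", "mongodb://localhost:27017"),
            mongo_db=crawler.settings.get("MONGO_DATABASE", "amazon"),
        )

    def open_spider(self, spider):
        # Imported lazily so the class can be loaded without pymongo installed.
        import pymongo

        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        # Convert the item to a plain dict and insert it as one document.
        self.db[self.collection_name].insert_one(dict(item))
        return item
```

A pipeline like this is enabled by registering its import path under Scrapy's `ITEM_PIPELINES` setting in the project's settings.py.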

Test Cases:

T1.mp4
T2.mp4
T3.mp4

Setup:

Create a Python environment. I will be using venv. Open your OS command line and run:

python -m venv bestsellerLLM

With this you have set up your virtual env. Now activate it by running the following command from the same directory:

  • For Windows users:
    bestsellerLLM\Scripts\activate
    
  • For OSX or Linux users:
    source bestsellerLLM/bin/activate
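
If you are unsure whether the activation worked, a quick check from Python can help. The sketch below relies only on the standard library; the function name `in_virtualenv` is made up for this example.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual env active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

When the environment is active, the printed prefix should end in the bestsellerLLM directory created above.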
    

Clone this repo:

git clone https://github.com/AmbiTyga/BestsellerLLM.git
cd BestsellerLLM

Next, install the project's dependencies into your virtual env from requirements.txt by running:

pip install -r requirements.txt

Scraping

First set up MongoDB by following these steps:

  • Download the MongoDB Compass GUI and install it on your local system.
  • Open MongoDB Compass and connect to your database server, or create one if none exists.
  • On the left-hand panel, right next to the ↻ symbol, there's a ➕ symbol. Click it to create a new database and name it amazon.
  • Click on the database and you'll see another ➕ symbol. Click it to create a new collection and name it bestsellers.

Return to the project directory in your command line interface.

  • Move to the ./scraper directory:
cd scraper
  • Start the spider with the following command:
scrapy crawl amzSpider

Each extracted item is written to the MongoDB database as soon as it is scraped.
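
To confirm that documents are arriving, you can query the collection with pymongo while the spider runs or after it finishes. This is a minimal sketch: the helper name `summarize_collection` is hypothetical, and the connection string assumes a MongoDB server on the default local port, which may differ from your setup.

```python
def summarize_collection(client, db_name="amazon", coll_name="bestsellers"):
    """Return (document count, one sample document or None) for the collection."""
    coll = client[db_name][coll_name]
    return coll.count_documents({}), coll.find_one()


if __name__ == "__main__":
    try:
        import pymongo
        from pymongo.errors import PyMongoError
    except ImportError:
        print("pymongo is not installed in this environment")
    else:
        # Assumed URI: a local server on MongoDB's default port.
        client = pymongo.MongoClient(
            "mongodb://localhost:27017", serverSelectionTimeoutMS=2000
        )
        try:
            count, sample = summarize_collection(client)
            print("documents stored:", count)
            print("sample document:", sample)
        except PyMongoError as exc:
            print("could not query MongoDB:", exc)
        finally:
            client.close()
```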

Loading LLM on UI

Note: This step assumes you have at least 10 GB of VRAM on your system, or that you are using a dedicated server with a GPU runtime.
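
One way to check the VRAM requirement before loading the model is to ask `nvidia-smi` for the total memory of each GPU. The sketch below assumes an NVIDIA GPU with `nvidia-smi` on the PATH; the function names and the MiB threshold are choices made for this example, not part of the project.

```python
import shutil
import subprocess

REQUIRED_MIB = 10 * 1024  # the 10 GB from the note, expressed in MiB


def parse_total_memory(output):
    """Parse nvidia-smi CSV output: one MiB value per line, one line per GPU."""
    values = []
    for line in output.splitlines():
        line = line.strip()
        if line.isdigit():  # skip blank lines and non-numeric entries
            values.append(int(line))
    return values


def gpu_memory_mib():
    """Return total memory per GPU in MiB, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_total_memory(result.stdout)


if __name__ == "__main__":
    totals = gpu_memory_mib()
    if not totals:
        print("No NVIDIA GPU detected via nvidia-smi")
    for index, mib in enumerate(totals):
        status = "ok" if mib >= REQUIRED_MIB else "below 10 GB"
        print(f"GPU {index}: {mib} MiB ({status})")
```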

From the project root, move to the ./Indexer directory using the command line interface:

cd Indexer

Run the Gradio app:

python app.py

Then open localhost:8888 in your browser.
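
If the page does not load, it can help to check whether anything is listening on that port yet, since loading the model may take a while. This is a small standard-library sketch; the function name `is_listening` is made up for this example.

```python
import socket


def is_listening(host="localhost", port=8888, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    print("app reachable on localhost:8888:", is_listening())
```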
