weaviate-tutorials/DEMO-text-search-video-captions

Caption Search

This project's origin is here.

This demo shows how to perform caption search using Weaviate.
We first fetch all the captions of a video and store them in Weaviate. We then map each index range in the caption text to the timestamp at which that part of the caption occurs. The captions are fetched with youtube-transcript-api.
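
The mapping from caption text indexes to timestamps can be sketched as follows. This is a minimal illustration, not the project's actual code: the buildCaptionIndex helper is hypothetical, and it assumes each caption segment is an object with a text field and a start time in seconds. The startIndex, endIndex, and time names mirror the fields used in the Timestamps query later in this document.

```javascript
// Hypothetical helper (not from the project): joins caption segments into one
// string and records which character range of that string each segment covers.
// Assumes each segment looks like { text: string, start: number }.
function buildCaptionIndex(segments) {
  let text = '';
  const timestamps = [];
  for (const segment of segments) {
    // Separate consecutive segments with a single space.
    if (text.length > 0) {
      text += ' ';
    }
    const startIndex = text.length;
    text += segment.text;
    timestamps.push({ startIndex, endIndex: text.length, time: segment.start });
  }
  return { text, timestamps };
}

// Example with made-up caption data.
const captionIndex = buildCaptionIndex([
  { text: 'hello world', start: 0 },
  { text: 'second line', start: 2.5 },
]);
console.log(captionIndex.text); // hello world second line
```

In a setup like this, the joined text would be stored as a Caption object and each entry of timestamps as a Timestamps object.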

[Demo video: caption.search.mp4]

This example uses HTML, CSS, and JavaScript for the frontend and Node.js for the backend.

Prerequisites

  1. Install Docker and Docker Compose
  2. Install Node and npm
  3. Install Python and pip (needed for the pip install step below)

Setup instructions

Follow these steps to reproduce the example:

  1. Run the following command to start Weaviate from the Docker Compose file:
sudo docker-compose up -d
  2. Run the following command in the project directory to install the required Python packages:
pip install -r requirements.txt
  3. After installing the Python packages, run the following command to install the required Node modules:
npm install
  4. After adding data and installing the modules, run the following command and navigate to http://localhost:3000/. Once there, enter the video URL and start asking questions about that video.
npm run start

Usage instructions

Notes on the queries:
This demo mainly uses two queries.

  1. Query to fetch the answer and its start and end indexes (returned as startPosition and endPosition) for a searched question: This query adds an ask {} parameter to the Weaviate Get query. It returns at most one answer, which is available in the _additional {} field of the results. The answer with the highest certainty is returned. More information on this query can be found here.
client.graphql
    .get()
    .withClassName('Caption')
    .withAsk({
      question: searched_question,
      properties: ["text"],
    })
    .withFields('_additional { answer { hasAnswer certainty property result startPosition endPosition } }')
    .withLimit(1)
    .do()
    .then(info => {
      return info
    })
    .catch(err => {
      console.error(err)
    });
  2. Query to fetch the timestamp for a particular starting index: This query uses the Where filter provided by Weaviate, which supports comparison operators on property values. More information about the Where filter can be found here. For this example we used the GreaterThan operator, which keeps only the Timestamps objects whose endIndex is greater than the answer's start index. Because the results are sorted by startIndex in ascending order and limited to one, the object returned is the earliest caption segment that ends after the answer's start position. More information on the GreaterThan operator can be found here.
client.graphql
    .get()
    .withClassName('Timestamps')
    .withFields(['startIndex', 'time'])
    .withLimit(1)
    .withWhere({
      operator: 'GreaterThan',
      path: ['endIndex'],
      valueNumber: parseInt(start_index, 10) // explicit radix so the index is always parsed as decimal
    })
    .withSort([{ path: ['startIndex'], order: 'asc' }])
    .do()
    .then(info => {
      return info;
    })
    .catch(err => {
      console.error(err)
    })
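
The two queries are meant to be used together: the start position from the first query's answer is fed into the second query to find the matching timestamp. The sketch below illustrates that flow on plain objects, without a running Weaviate instance. It is not the project's code: extractAnswer and findTimestamp are hypothetical helpers, the response shape (data.Get.Caption[0]._additional.answer) is an assumption based on the fields requested above, and findTimestamp only mimics the filter, sort, and limit of the Timestamps query in memory.

```javascript
// Sketch only: helper names and the response shape are assumptions.

// Pull the answer object out of a Get response for the Caption class.
// Returns null when the response holds no answer.
function extractAnswer(response) {
  const captions = response?.data?.Get?.Caption ?? [];
  const answer = captions[0]?._additional?.answer;
  return answer && answer.hasAnswer ? answer : null;
}

// In-memory equivalent of the Timestamps query: keep entries whose endIndex is
// greater than startPosition, sort by startIndex ascending, take the first.
function findTimestamp(timestamps, startPosition) {
  const matches = timestamps
    .filter(entry => entry.endIndex > startPosition)
    .sort((a, b) => a.startIndex - b.startIndex);
  return matches.length > 0 ? matches[0] : null;
}

// Example with made-up data.
const sampleTimestamps = [
  { startIndex: 0, endIndex: 11, time: 0 },
  { startIndex: 12, endIndex: 23, time: 2.5 },
];
const sampleResponse = {
  data: {
    Get: {
      Caption: [
        {
          _additional: {
            answer: { hasAnswer: true, result: 'second', startPosition: 12, endPosition: 18 },
          },
        },
      ],
    },
  },
};

const sampleAnswer = extractAnswer(sampleResponse);
const sampleHit = sampleAnswer
  ? findTimestamp(sampleTimestamps, sampleAnswer.startPosition)
  : null;
console.log(sampleHit); // { startIndex: 12, endIndex: 23, time: 2.5 }
```

The time value of the returned entry is the point in the video where the answer's caption segment begins.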

Dataset license

(TO DO)