Using metadata to boost the performance of ExtractiveReader #5640
Comments
Hey @sjrl, could this be a feature of If so, let's change the title and mark this as a Haystack 2.x feature request. If not, let's figure out why 🙂
Yes, definitely. This could be a feature for ExtractiveReader.
I don't understand why it should be a meta field. Can't this info be added to documents during preprocessing? In any case, if it is urgent for any of the clients, feel free to open a lightweight PR. I would prefer, though, to handle it outside of the Reader.
I think often we will not want this additional information to be allowed to be returned as an answer by the reader. So this point from my original description:
That's why just directly adding it to the preprocessed document would not work.
Given that we would preferably not allow this additional text to be returned as an answer, I think it would be better to integrate it within the ExtractiveReader. What do you think?
Mh, still not sure about this. In the prompt, users can check what was passed to the model. With extractive QA we want to ensure even more that the user can check the predictions properly. Without the additional_context this might not be possible. What I like about this idea is that it is designed similarly to embed_meta_fields of embedders. Feel free to open a lightweight PR for this feature.
I would say that
However, this is a really good point. Maybe a compromise could be that we add the additional_context to the document in the returned Haystack Answer so the user can see it, but we still restrict the model from returning the additional_context as part of the answer?
Is your feature request related to a problem? Please describe.
I would like to be able to use meta information to provide context to the TransformerReader or the FARMReader to boost question-answering performance, similar to how we can use embed_meta_fields to boost the performance of EmbeddingRetrievers. Sometimes meta information is needed to distinguish between similar documents. We have had multiple clients face this exact problem because they are retrieving info from lots of legal PDF files, which have a lot of boilerplate text and often define things like the company name only once, at the beginning of a 60-page PDF.
Describe the solution you'd like
As motivation I'd like to walk through an example where being able to add meta information from a document to the Reader at query time would be beneficial. Pretend I have two docs that have a similar structure and contain similar information, but about two different companies:
Document 1 (comes from pear_llc_contract.pdf)
Document 2 (comes from rainforest_contract.pdf)
I would like to ask the question "What is the company ID of Pear LLC?" However, nowhere in the content of the documents does it specify the names of the companies involved in the deal. So if I provide these two documents to a FARMReader, I should get about a 50/50 chance of getting the correct answer.
However, if I could specify a new variable (e.g. embed_meta_fields, like we can for EmbeddingRetrievers), then the FARMReader would have the necessary context to answer the question.
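To make the idea concrete, here is a minimal sketch of how selected meta fields could be prepended to the document text before it is handed to a reader. Everything in it is an assumption for illustration: `Doc` is a hypothetical stand-in rather than the Haystack Document class, `build_reader_input` and its `meta_fields` parameter are made-up names, and the sample content is a placeholder, not the text of the real contracts.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a document with content and meta."""
    content: str
    meta: dict = field(default_factory=dict)


def build_reader_input(doc, meta_fields, separator="\n"):
    """Prepend the selected meta values to the document content.

    Returns the combined text and the character offset at which the
    original content starts, so answer spans can be mapped back later.
    """
    parts = [str(doc.meta[key]) for key in meta_fields if doc.meta.get(key)]
    prefix = separator.join(parts)
    if prefix:
        prefix += separator
    return prefix + doc.content, len(prefix)


# Placeholder content; only the file name comes from the example above.
doc = Doc(
    content="The company ID is 12345.",
    meta={"file_name": "pear_llc_contract.pdf"},
)
text, content_offset = build_reader_input(doc, ["file_name"])
```

With the file name in front of the content, the reader sees "pear_llc_contract.pdf" next to the company ID, which is the context the question about Pear LLC needs. The returned offset records where the real content begins.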
Additional context
additional_context as an answer, since the additional_context will not be present in the Document returned to the user.
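One way to enforce that restriction is to filter candidate answer spans by character offset after the reader has scored them. The sketch below assumes the additional context was prepended to the content, and that the reader produces candidates as (start, end, score) tuples over the combined text. The function name, the tuple layout, and the sample text are all hypothetical, not an existing ExtractiveReader interface.

```python
def filter_answer_spans(candidates, content_offset):
    """Drop spans that touch the prepended context and re-base the rest.

    candidates: list of (start, end, score) character spans over the
    combined text (additional context followed by document content).
    content_offset: index in the combined text where the content begins.
    Returns spans relative to the original content, best score first.
    """
    kept = []
    for start, end, score in candidates:
        if start < content_offset:
            # The span begins inside the additional context, so it must
            # not be returned as an answer.
            continue
        kept.append((start - content_offset, end - content_offset, score))
    return sorted(kept, key=lambda span: span[2], reverse=True)


# Placeholder texts for illustration only.
additional_context = "pear_llc_contract.pdf\n"
content = "The company ID is 12345."
combined = additional_context + content

id_start = combined.index("12345")
candidates = [
    (0, 8, 0.90),                       # "pear_llc", inside the context
    (id_start, id_start + 5, 0.70),     # "12345", inside the content
]
answers = filter_answer_spans(candidates, len(additional_context))
best_start, best_end, best_score = answers[0]
answer_text = content[best_start:best_end]
```

The high-scoring span from the file name is discarded, and the remaining span is expressed in offsets of the original content, so it still lines up with the Document that the user gets back.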