[go: nahoru, domu]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PgSQLLoader PR 1 Base Commit #11

Closed
wants to merge 3 commits into from

Conversation

ram-searce
Copy link
Contributor

Files Added:
pgsql_loader.py: Document Loader Class file
pgsql_engine.py: Temporary (To be updated post Vectorstore PR Merge)

Implemented Methods:
load() - Partial Implementation
alazy_load()
load_and_split() - Partial Implementation

Completed Scope:
[P0] Integrate with AlloyDBEngine & PgSQLEngine
[P0] Load documents via default table
[P0] Load documents via custom table/metadata
[P0] Load documents via custom page content columns [P0] Load documents via custom metadata columns
[P0] Load documents via query
If a JSON column is listed as a metadata column with the name, “langchain_metadata”, it will be used as the base dictionary. Other column data will be added and may overwrite the original value

To Be Done:
Integration Testing
[P0] Support text splitter
[P1] Set page content format
[P1] Read only query protection
[P2] Use custom page content formaer
[P3] Set timeout for query
Code Comments
Testing of optional and mandatory params and their behavior load() returns an async generator instead of a list. Need to understand what would be the right approach AlloyDBDocumentSaver Class
Update pgsql_engine once VectorStore PR is approved

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

Files Added:
pgsql_loader.py: Document Loader Class file
pgsql_engine.py: Temporary (To be updated post Vectorstore PR Merge)

Implemented Methods:
load() - Partial Implementation
alazy_load()
load_and_split() - Partial Implementation

Completed Scope:
[P0] Integrate with AlloyDBEngine & PgSQLEngine
[P0] Load documents via default table
[P0] Load documents via custom table/metadata
[P0] Load documents via custom page content columns
[P0] Load documents via custom metadata columns
[P0] Load documents via query
If a JSON column is listed as a metadata column with the name, “langchain_metadata”, it will be used as the base dictionary. Other column data will be added and may overwrite the original value

To Be Done:
Integration Testing
[P0] Support text splitter
[P1] Set page content format
[P1] Read only query protection
[P2] Use custom page content formaer
[P3] Set timeout for query
Code Comments
Testing of optional and mandatory params and their behavior
load() returns an async generator instead of a list. Need to understand what would be the right approach
AlloyDBDocumentSaver Class
Update pgsql_engine once VectorStore PR is approved
@ram-searce ram-searce requested a review from a team as a code owner February 6, 2024 15:21
@product-auto-label product-auto-label bot added the api: cloudsql-postgres Issues related to the googleapis/langchain-google-cloud-sql-pg-python API. label Feb 6, 2024
Copy link
Collaborator
@averikitsch averikitsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is headed in the right direction. googleapis/langchain-google-cloud-sql-mysql-python#16 can be used as an example

return self.alazy_load()

# Partially Implemented
def load_and_split(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be inherited from the interface we just need "load()" defined

self.metadata_columns = metadata_columns
self.format = format
self.read_only = read_only
self.time_out = time_out
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# Partially Implemented
def load(self) -> List[Document]:
"""Load CloudSQL Postgres data into Document objects."""
return self.alazy_load()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return self.alazy_load()
return list(self.alazy_load())

list() will be needed to convert the iterator to a list

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: cloudsql-postgres Issues related to the googleapis/langchain-google-cloud-sql-pg-python API. tests: run
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants