PgSQLLoader PR 1 Base Commit #11

ram-searce · 2024-02-06T15:21:38Z

Files Added:
pgsql_loader.py: Document Loader Class file
pgsql_engine.py: Temporary (To be updated post Vectorstore PR Merge)

Implemented Methods:
load() - Partial Implementation
alazy_load()
load_and_split() - Partial Implementation

Completed Scope:
[P0] Integrate with AlloyDBEngine & PgSQLEngine
[P0] Load documents via default table
[P0] Load documents via custom table/metadata
[P0] Load documents via custom page content columns
[P0] Load documents via custom metadata columns
[P0] Load documents via query
If a JSON column named “langchain_metadata” is listed as a metadata column, it is used as the base dictionary. Data from the other metadata columns is then added and may overwrite the original values.
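
The merge rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in pgsql_loader.py: the helper name `build_metadata`, the dict-shaped `row`, and the sample column names are assumptions made for the example.

```python
from typing import Any, Dict, List


def build_metadata(
    row: Dict[str, Any],
    metadata_columns: List[str],
    json_column: str = "langchain_metadata",
) -> Dict[str, Any]:
    """Hypothetical helper: merge metadata columns over a JSON base dict."""
    base = row.get(json_column)
    # Use the JSON column as the base only if it was requested and is a dict.
    use_base = json_column in metadata_columns and isinstance(base, dict)
    metadata: Dict[str, Any] = dict(base) if use_base else {}

    for column in metadata_columns:
        if use_base and column == json_column:
            continue  # already unpacked as the base dictionary
        if column in row:
            # Later column values overwrite keys that came from the JSON base.
            metadata[column] = row[column]
    return metadata


row = {
    "content": "hello",
    "source": "table",
    "langchain_metadata": {"source": "json", "page": 1},
}
result = build_metadata(row, ["langchain_metadata", "source"])
```

Here the `source` key from the JSON column is overwritten by the `source` column, while `page` survives from the base dictionary.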

To Be Done:
Integration Testing
[P0] Support text splitter
[P1] Set page content format
[P1] Read only query protection
[P2] Use custom page content formatter
[P3] Set timeout for query
Code Comments
Testing of optional and mandatory params and their behavior
load() returns an async generator instead of a list. Need to understand what would be the right approach
AlloyDBDocumentSaver Class
Update pgsql_engine once VectorStore PR is approved



averikitsch

This is headed in the right direction. googleapis/langchain-google-cloud-sql-mysql-python#16 can be used as an example

averikitsch · 2024-02-06T22:43:13Z

src/langchain_google_cloud_sql_pg/pgsql_loader.py

+        return self.alazy_load()
+
+    # Partially Implemented
+    def load_and_split(


This should be inherited from the interface; we just need load() defined.
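
As a rough illustration of the point, the sketch below shows how a base class can provide load_and_split() in terms of load(), so a subclass only has to define load(). The classes `BaseLoaderSketch` and `LoaderSketch` are stand-ins invented for this example, plain strings stand in for Document objects, and the splitter is a hypothetical callable rather than a real text splitter.

```python
from typing import Callable, Iterable, List, Optional


class BaseLoaderSketch:
    """Stand-in for a loader interface (not the real LangChain class)."""

    def load(self) -> List[str]:
        raise NotImplementedError

    def load_and_split(
        self, splitter: Optional[Callable[[str], List[str]]] = None
    ) -> List[str]:
        # Default splitter for the sketch: break on blank lines.
        if splitter is None:
            splitter = lambda text: text.split("\n\n")
        chunks: List[str] = []
        for doc in self.load():
            chunks.extend(splitter(doc))
        return chunks


class LoaderSketch(BaseLoaderSketch):
    """Subclass that only defines load(); load_and_split() is inherited."""

    def __init__(self, rows: Iterable[str]) -> None:
        self.rows = rows

    def load(self) -> List[str]:
        return list(self.rows)


chunks = LoaderSketch(["a\n\nb", "c"]).load_and_split()
```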

averikitsch · 2024-02-06T22:45:13Z

src/langchain_google_cloud_sql_pg/pgsql_loader.py

+        self.metadata_columns = metadata_columns
+        self.format = format
+        self.read_only = read_only
+        self.time_out = time_out


Here's an example of some of the checks needed: https://github.com/googleapis/langchain-google-cloud-sql-mysql-python/pull/16/files#diff-b2d76ce581e196ff223982e658332b500382105a001bf6808ac73a36486cfeb5R94
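
The linked diff is not reproduced here. Purely as an illustration of what constructor argument checks of this kind might look like, the sketch below validates a few loader options up front. Everything in it is an assumption for the example: the function name `validate_loader_args`, the set of allowed formats, and the specific rules are hypothetical and may differ from what the MySQL PR actually does.

```python
from typing import Optional

# Hypothetical set of supported page content formats.
ALLOWED_FORMATS = {"text", "JSON", "YAML", "CSV"}


def validate_loader_args(
    query: Optional[str] = None,
    table_name: Optional[str] = None,
    format: str = "text",
    time_out: Optional[float] = None,
) -> None:
    """Hypothetical checks; raise ValueError on inconsistent arguments."""
    # Assumed rule: exactly one data source may be given.
    if query and table_name:
        raise ValueError("Specify either 'query' or 'table_name', not both.")
    if not query and not table_name:
        raise ValueError("One of 'query' or 'table_name' is required.")
    if format not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}.")
    # Assumed rule: a timeout, when set, must be positive.
    if time_out is not None and time_out <= 0:
        raise ValueError("time_out must be a positive number.")
```

Failing early in the constructor keeps invalid combinations from surfacing later as database errors during load().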

averikitsch · 2024-02-06T22:46:24Z

src/langchain_google_cloud_sql_pg/pgsql_loader.py

+    # Partially Implemented
+    def load(self) -> List[Document]:
+        """Load CloudSQL Postgres data into Document objects."""
+        return self.alazy_load()


Suggested change:

-        return self.alazy_load()
+        return list(self.alazy_load())

list() will be needed to convert the iterator to a list.
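
One caveat worth checking against the actual implementation: the PR description notes that load() currently returns an async generator. If alazy_load() is an async generator (an assumption based on that note), list() on its own raises a TypeError, and the items have to be collected with `async for` instead. The sketch below contrasts the two cases using stand-in generators; `lazy_rows`, `alazy_rows`, and `collect` are hypothetical names, not functions from this PR.

```python
import asyncio
from typing import AsyncIterator, Iterator, List


def lazy_rows() -> Iterator[str]:
    """Stand-in for a synchronous lazy loader."""
    yield "a"
    yield "b"


async def alazy_rows() -> AsyncIterator[str]:
    """Stand-in for an asynchronous lazy loader."""
    yield "a"
    yield "b"


async def collect(source: AsyncIterator[str]) -> List[str]:
    # Drain an async iterator into a list.
    return [item async for item in source]


# list() works directly on a synchronous iterator.
sync_result = list(lazy_rows())

# An async generator must be consumed inside an event loop.
async_result = asyncio.run(collect(alazy_rows()))

# list() applied to an async generator raises TypeError.
try:
    list(alazy_rows())
    list_on_async_ok = True
except TypeError:
    list_on_async_ok = False
```

Note that asyncio.run() cannot be called from code that is already running inside an event loop, so a synchronous load() built this way would need a different strategy in that setting.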

ram-searce requested a review from a team as a code owner February 6, 2024 15:21

product-auto-label bot added the api: cloudsql-postgres Issues related to the googleapis/langchain-google-cloud-sql-pg-python API. label Feb 6, 2024

averikitsch added 2 commits February 6, 2024 13:28

Merge branch 'main' into feat-add-pgsql-loader

4e7fe96

Merge branch 'main' into feat-add-pgsql-loader

b757f6b

averikitsch added the tests: run label Feb 6, 2024

averikitsch reviewed Feb 6, 2024


averikitsch closed this Feb 14, 2024
