[go: nahoru, domu]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(Airbyte-ci): add command generate-erd-schema #43310

Merged
merged 24 commits into from
Aug 20, 2024

Conversation

artem1205
Copy link
Collaborator
@artem1205 artem1205 commented Aug 5, 2024

What

add new command in airbyte-ci connectors generate-erd and addresses https://github.com/airbytehq/airbyte-internal-issues/issues/9172

How

report /steps example
image

Review guide

  1. airbyte-ci/connectors/erd/relationships.py: how we manage relationships
  2. airbyte-ci/connectors/erd/dbml_assembler.py: how we generate the dbml file
  3. airbyte-ci/connectors/pipelines/pipelines/airbyte_ci/connectors/generate_erd: the command logic
  4. the rest

User Impact

Given a couple of pre-requisite, users will now be able to generate ERDs in dbdocs using airbyte-ci. For more information, read the README

Can this PR be safely reverted and rolled back?

  • YES 💚
  • NO ❌

Signed-off-by: Artem Inzhyyants <artem.inzhyyants@gmail.com>
[skip ci]

Signed-off-by: Artem Inzhyyants <artem.inzhyyants@gmail.com>
Copy link
vercel bot commented Aug 5, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Aug 20, 2024 3:06pm

artem1205 and others added 9 commits August 6, 2024 12:22
[skip ci]

Signed-off-by: Artem Inzhyyants <artem.inzhyyants@gmail.com>
add parsers for discover, gemini md output;
[skip ci]

Signed-off-by: Artem Inzhyyants <artem.inzhyyants@gmail.com>
[skip ci]

Signed-off-by: Artem Inzhyyants <artem.inzhyyants@gmail.com>
[skip ci]

Signed-off-by: Artem Inzhyyants <artem.inzhyyants@gmail.com>
[skip ci]

Signed-off-by: Artem Inzhyyants <artem.inzhyyants@gmail.com>
[skip ci]

Signed-off-by: Artem Inzhyyants <artem.inzhyyants@gmail.com>
[skip ci]

Signed-off-by: Artem Inzhyyants <artem.inzhyyants@gmail.com>
@maxi297 maxi297 marked this pull request as ready for review August 15, 2024 18:36
@maxi297 maxi297 requested a review from a team as a code owner August 15, 2024 18:36
@@ -18,6 +18,9 @@ more-itertools = "^8.11.0"
docker = "^7"
semver = "^3"
airbyte-protocol-models = "*"
# In an ideal world, we would import the version of airbyte-ci for the connector airbyte-ci will interact with
# Also note that the version that this will currently import is <2 because of the version clash with pydantic
airbyte-cdk = "*"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment to highlight this comment

Copy link
Contributor
@maxi297 maxi297 Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been moved outside of the pipelines and therefore is not relevant anymore. See #43310 (review) for more details

Copy link
Contributor
@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recreating comments as GitHub has flagged them as outdated

.with_exec(["npm", "install", "-g", "dbdocs"])
.with_env_variable("DBDOCS_TOKEN", DBDOCS_TOKEN)
.with_workdir("/airbyte_dbdocs")
.with_new_file("/airbyte_dbdocs/dbdocs.dbml", contents=source_dbml_content)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen some weird cases where the dbdocs diagram wasn't updated as part of the airbyte-ci command but running the same command locally would update the dbdocs project. Can we trust that if source_dbml_content is different, the cache won't be triggered or is it safer to add .with_env_variable("CACHEBUSTER", str(uuid.uuid4()))?

Copy link
Collaborator Author
@artem1205 artem1205 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor
@alafanechere alafanechere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm requesting change to ask you to follow the airbyte-ci is just an orchestrator principle

I suggest declaring ERD generation logic in an independent python package which would take inputs passed from airbyte-ci.
airbyte-ci would consume your package output and export back the ERDs to the connector container:

Your airbyte-ci step would:

  • Run discover on the connector and capture the catalog
  • Mount the catalog to the container in which your package will be installed
  • Run the erd generation command in your package
  • Export the generated ERD files back to the host in the connector folder

Copy link
Contributor
@alafanechere alafanechere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disclaimer: I did not review the internal logic of your ERD packages, I focused on airbyte-ci.

I've a lot of minor suggestion but nothing blocking! Thanks for following my refactoring original suggest, you got this!

I'm not sure if you want to version control the ERD.
If you want: the current logic is 🆗 .

If you don't: I think it's better to pass the generated file from the erd container to your UploadToDbDocs step. It will avoid an intermediate "write to host FS / repo" operation.

If you still want to keep the same flow but don't plan on committing generated files (as it could be helpful for debugging to have the generated file on your FS), please add a .gitignore rule.

@maxi297
Copy link
Contributor
maxi297 commented Aug 19, 2024

If you still want to keep the same flow but don't plan on committing generated files (as it could be helpful for debugging to have the generated file on your FS), please add a .gitignore rule.

It has been discussed that we should have to ability to commit it so that we can see the different versions and debug. I'm not sure if we will do this at first but I want us to have the ability to do so.

@maxi297 maxi297 merged commit 9603c3c into master Aug 20, 2024
33 checks passed
@maxi297 maxi297 deleted the artem1205/airbyte_ci_erd_gen branch August 20, 2024 16:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants