[go: nahoru, domu]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(python-sources): add unit integration testing utilities for simplification #43338

Merged
merged 28 commits into from
Aug 14, 2024

Conversation

strosek
Copy link
Contributor
@strosek strosek commented Aug 6, 2024

What

This PR tackles a few challenges we have when writing integration tests:

  1. Too much duplicated code
  2. Difficult to match request urls
  3. Brittle assertions
  4. Test data is too disorganized

Here is provided a new, more generalized way of creating integration tests that reduces size of tests by up to 75%, enhances thoroughness, makes mocking simpler and powerful, and makes matching requests much easier. The suite of woocommerce tests was migrated to show this in action.

How

  1. Configure tests using functions and data, not classes
  2. Use a single test to test all streams with similar structure, goal is one test per family of test cross-streams
  3. Make test data easy to spot, don't hide behind a lot of functions/classes
  4. Use standard mocker, which is more flexible
  5. Don't repeat common code, place it in a common location
  6. Provide more thorough and flexible, common assertions

Review guide

  1. test_read.py - implements 22 test cases covering basic reads and a multiple reads test.
  2. conftest.py - implements specific test configurations.
  3. common.py - implements common helpers for a source.
  4. utils.py - implements reusable test utilities.
  5. assertion.py - implements reusable, thorough assertion functions.
  6. data.py - implements utility function to load test JSON data files.

User Impact

None, just easier testing improves probability of quality.

Can this PR be safely reverted and rolled back?

  • YES 💚
  • NO ❌

Copy link
vercel bot commented Aug 6, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Aug 14, 2024 0:51am

@octavia-squidington-iii octavia-squidington-iii added area/connectors Connector related issues CDK Connector Development Kit connectors/source/woocommerce labels Aug 6, 2024
from airbyte_cdk.test.entrypoint_wrapper import EntrypointOutput, read


def catalog(stream_name: str, sync_mode: SyncMode) -> ConfiguredAirbyteCatalog:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes a ton of sense to me to bring these two methods in the CDK since they're reimplemented by all connectors!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am personally skeptic about this one because of the "brittleness" I mentioned here. I feel like the builder is as readable as this method and I don't know what is the benefit of this method

Copy link
Contributor
@girarda girarda Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be fine with a builder. what I like about the current approach is we get to remove the duplicated _read static methods which is maybe not related to this specific function, so my bad

Copy link
Contributor Author
@strosek strosek Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These functions are what many tests use, I just moved them. These are provided in a common location for people to use them when they don't need the full builder or an __init__() with default arguments. It's difficult to generalize everything, but these little functions come handy in many cases, if they are not handy, people can fall back to the builder.

Copy link
Contributor
@pnilan pnilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks clean. Nice work.

Do you think you'll add State assertions, too?

Erick Corona added 4 commits August 6, 2024 18:47
# Conflicts:
#	airbyte-integrations/connectors/source-woocommerce/poetry.lock
#	airbyte-integrations/connectors/source-woocommerce/pyproject.toml
#	airbyte-integrations/connectors/source-woocommerce/unit_tests/integration/test_coupons.py
@strosek strosek self-assigned this Aug 8, 2024
@strosek strosek changed the title Draft basic read integration test feat(python-sources): make woocommerce integration tests simple, add test utilities Aug 8, 2024
@strosek strosek changed the title feat(python-sources): make woocommerce integration tests simple, add test utilities feat(python-sources): make integration tests simple, add test utilities, migrate woocommerce tests Aug 9, 2024
Copy link
Contributor
@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple of comments. What I like most about this PR is pushing more validation logic in the CDK

airbyte-cdk/python/airbyte_cdk/test/utils/assertions.py Outdated Show resolved Hide resolved
airbyte-cdk/python/airbyte_cdk/test/utils/assertions.py Outdated Show resolved Hide resolved
airbyte-cdk/python/airbyte_cdk/test/utils/assertions.py Outdated Show resolved Hide resolved
airbyte-cdk/python/airbyte_cdk/test/utils/data.py Outdated Show resolved Hide resolved
return CatalogBuilder().with_stream(stream_name, sync_mode).build()


def read_records(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this interface brittle. If at some point we need more than just the stream_name, sync_mode to describe the catalog, we will have to update this interface. Why not pass the catalog directly? We have a builder that should ease the process of instantiating it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as the other comment above in reading.py

"stream_name, num_records, http_calls",
[("orders", 2, orders_empty_last_page())]
)
def test_read_with_multiple_pages_with_empty_last_page_successfully(stream_name, num_records, http_calls) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What ensures that we trigger pagination here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The freeze date that goes beyond the first page of one month, and creates second http call. Do you think that is not reliable?

Copy link
Contributor
@maxi297 maxi297 Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the benefit from the HttpMocker was that it validated that all the HTTP requests that were mocked were called (at least when it is used as an annotation but we have also started to use it through the unittest.TestCase so we should probably add the same logic to the exit

Copy link
Contributor Author
@strosek strosek Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. However, the intention of this modeling of tests is not to provide all the features but provide a simpler/flexible alternative. For now I added a call-count assertion. Is it OK if we add such features in subsequent improvements?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the HttpMocker if it's already implemented there? Which makes me think that I don't see from airbyte_cdk.test.utils.http_mocking import register_mock_responses available on this branch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry about the missing file. I'm not using that mocker because it requires the specific Request type objects and I wanted to make it simpler. The vanilla http_mocking.Mocker can use simple requests and regexes that make it easier to match things.

@strosek strosek requested a review from maxi297 August 9, 2024 17:49
@octavia-squidington-iii octavia-squidington-iii removed the area/connectors Connector related issues label Aug 12, 2024
@strosek strosek changed the title feat(python-sources): make integration tests simple, add test utilities, migrate woocommerce tests feat(python-sources): add unit integration testing utilities for simplification Aug 12, 2024
Copy link
Contributor
@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well... the consequence of using private interfaces!

@@ -11,7 +11,7 @@
from airbyte_cdk.test.catalog_builder import CatalogBuilder
from airbyte_cdk.test.entrypoint_wrapper import EntrypointOutput, read
from airbyte_cdk.test.mock_http import HttpMocker
from airbyte_cdk.test.mock_http.response_builder import _get_unit_test_folder
from airbyte_cdk.test.mock_http.response_builder import get_unit_test_folder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, this is unfortunate! We will need to update the CDK version before doing that. It uses CDK version 0.90.0 which means a lot of breaking changes.

The options I see are:

  • Pay the debt of importing private methods now by updating the CDK (this seems to increase the scope way too much)
  • Keep _get_unit_test_folder and have it just call get_unit_test_folder internally with a big red comment in the CDK and source-amazon-seller-partner that says why we have this and when we can clean this. In that case, if you want to move get_unit_test_folder to data, that works too

I prefer the second option personally but will let you choose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, going with the second option.

@strosek strosek requested a review from maxi297 August 12, 2024 22:05
Copy link
Contributor
@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for tackling the comments diligently

@octavia-approvington
Copy link
Contributor

Lets merge it
sheep thumbs up

@octavia-approvington octavia-approvington merged commit 198f83e into master Aug 14, 2024
33 of 40 checks passed
@octavia-approvington octavia-approvington deleted the strosek/test_utils branch August 14, 2024 17:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants