Support retentionCount for dataset version database #2423
Conversation
Force-pushed from c18a3e4 to 5e5d8f5
Codecov Report

@@             Coverage Diff              @@
##             master    #2423      +/-   ##
=============================================
+ Coverage     14.44%   73.43%   +58.98%
- Complexity     2357    10862     +8505
=============================================
  Files           773      773
  Lines         60726    60849      +123
  Branches       7440     7453       +13
=============================================
+ Hits           8771    44683    +35912
+ Misses        51442    13718    -37724
- Partials        513     2448     +1935

... and 604 files with indirect coverage changes. View full report in Codecov by Sentry.
Force-pushed from 5e5d8f5 to 336f1a6
Resolved review threads:
- ambry-account/src/main/java/com/github/ambry/account/mysql/AccountDao.java (outdated)
- ambry-frontend/src/main/java/com/github/ambry/frontend/NamedBlobPutHandler.java
for (DatasetVersionRecord record : datasetVersionRecordList) {
  String version = record.getVersion();
  RequestPath requestPath = getRequestPath(restRequest);
  RequestPath newRequestPath = new RequestPath(requestPath.getPrefix(), requestPath.getClusterName(),
      requestPath.getPathAfterPrefixes(),
      NAMED_BLOB_PREFIX + SLASH + accountName + SLASH + containerName + SLASH + datasetName + SLASH + version,
      requestPath.getSubResource(), requestPath.getBlobSegmentIdx());
  // Replace RequestPath in the RestRequest and call DeleteBlobHandler.handle.
  restRequest.setArg(InternalKeys.REQUEST_PATH, newRequestPath);
  restRequest.setArg(InternalKeys.TARGET_ACCOUNT_KEY, null);
  restRequest.setArg(InternalKeys.TARGET_CONTAINER_KEY, null);
  try {
    deleteBlobHandler.handle(restRequest, restResponseChannel, null);
  } catch (Exception e) {
    // Do not rethrow or handle the exception, since we don't care whether the delete succeeded.
    LOGGER.error("Failed to delete the datasetVersionRecord: " + record);
  }
}
Oh, this is probably not going to work.
The delete handler is an async method, so it will not have finished executing before handle returns. And if we have multiple versions to delete, we would keep overriding the RequestPath in this for loop, since all iterations share the same restRequest.
For example, say we have version1 and version2 to delete:
For version1, we replace the RequestPath in restRequest with version1 and call deleteBlobHandler.handle(restRequest).
Before it finishes, we move on to version2, replace the RequestPath in restRequest again, and call deleteBlobHandler.handle(restRequest) a second time.
But since the delete of version1 isn't done yet, any logic still running for version1 would now see version2.
We might have to create a new restRequest for each version.
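The hazard described above can be sketched without any Ambry types. In this hypothetical simulation, two "async handlers" are started inside a loop that mutates a single shared path between iterations; because both handlers finish only after the loop is done, both observe the last value written (all class and variable names here are illustrative, not from the PR):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

public class SharedRequestRace {
  // Returns the path each "async handler" observed when it finally ran.
  static List<String> run() throws Exception {
    AtomicReference<String> requestPath = new AtomicReference<>();  // shared mutable state, like the reused restRequest
    CountDownLatch handlersMayFinish = new CountDownLatch(1);
    ConcurrentLinkedQueue<String> pathsSeen = new ConcurrentLinkedQueue<>();
    List<CompletableFuture<Void>> handlers = new ArrayList<>();

    for (String version : new String[]{"version1", "version2"}) {
      requestPath.set(version);  // overwrite the single shared path
      handlers.add(CompletableFuture.runAsync(() -> {
        try {
          handlersMayFinish.await();  // the async delete completes only later
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
        pathsSeen.add(requestPath.get());  // reads whatever the path is NOW
      }));
    }
    handlersMayFinish.countDown();
    handlers.forEach(CompletableFuture::join);
    return new ArrayList<>(pathsSeen);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(run());  // both handlers observe "version2"
  }
}
```

Both handlers report "version2": the delete intended for version1 never sees its own path, which is exactly why a fresh request per version (or sequential execution) is needed.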
Yeah, I feel it's hard to clone a restRequest, so I decided to use a recursive callback to delete dataset versions sequentially, and ignore the delete failures. Normally there would be only one version that needs to be deleted.
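The recursive-callback idea can be sketched generically: each delete's completion callback (success or failure) kicks off the next one, so only one delete is in flight at a time. This is a hypothetical reduction, not the PR's code; `deleteAsync`, `deleteFrom`, and `"badVersion"` are made-up names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class SequentialDelete {
  // Simulated async delete: completes via callback, failing for "badVersion".
  static void deleteAsync(String version, List<String> deleted, Consumer<Exception> callback) {
    if (version.equals("badVersion")) {
      callback.accept(new RuntimeException("delete failed"));
    } else {
      deleted.add(version);
      callback.accept(null);
    }
  }

  // Recursive callback: only when versions[idx] completes (success or not)
  // do we move on to versions[idx + 1], so deletes never overlap.
  static void deleteFrom(List<String> versions, int idx, List<String> deleted) {
    if (idx == versions.size()) {
      return;  // base case: every version has been attempted
    }
    deleteAsync(versions.get(idx), deleted, error -> {
      // Ignore failures (the real handler would just log) and continue.
      deleteFrom(versions, idx + 1, deleted);
    });
  }

  public static void main(String[] args) {
    List<String> deleted = new ArrayList<>();
    deleteFrom(List.of("v1", "badVersion", "v3"), 0, deleted);
    System.out.println(deleted);  // [v1, v3]
  }
}
```

Note how the failed delete of "badVersion" does not stop the chain: the callback still recurses to the next index, matching the "ignore delete failures" behavior described above.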
String containerName = dataset.getContainerName();
String datasetName = dataset.getDatasetName();
List<DatasetVersionRecord> datasetVersionRecordList =
    accountService.getAllValidVersionsOutOfRetentionCount(accountName, containerName, datasetName);
I'm trying to think through whether we have a thread race condition when we try to delete the old versions.
My understanding is that this function runs in the PutBlob handler. Let's say the retention count is 3, and we already have version1 and version2.
- thread1 adds version3.
- thread2 adds version4.
- thread1 checks the version count. It's 4, so it tries to delete version1.
- thread2 checks the version count. It's 4, so it also tries to delete version1.
One thread will be successful and the other will get NOT_FOUND. Looks OK; we still have 3 versions retained.
I don't see a case where we accidentally delete too many versions.
Just bringing this topic up to see if I missed any case.
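The benign outcome of this race hinges on the delete being atomic: both threads target the same oldest version, exactly one delete wins, and the loser gets NOT_FOUND. A hypothetical stand-in using `ConcurrentHashMap.remove` (which atomically returns null when the key is already gone) illustrates it; none of these names come from the PR:

```java
import java.util.concurrent.ConcurrentHashMap;

public class RetentionRace {
  // Both callers try to delete the same version; remove() is atomic,
  // so exactly one wins and the other observes "not found" (null).
  static boolean tryDelete(ConcurrentHashMap<String, String> versions, String version) {
    return versions.remove(version) != null;
  }

  public static void main(String[] args) {
    ConcurrentHashMap<String, String> versions = new ConcurrentHashMap<>();
    versions.put("version1", "blob1");
    versions.put("version2", "blob2");
    versions.put("version3", "blob3");
    versions.put("version4", "blob4");  // retention count 3: one version over

    boolean thread1Won = tryDelete(versions, "version1");  // succeeds
    boolean thread2Won = tryDelete(versions, "version1");  // NOT_FOUND
    System.out.println(thread1Won + " " + thread2Won + " kept=" + versions.size());
    // true false kept=3
  }
}
```

Whichever interleaving occurs, only one removal of version1 can succeed, so exactly 3 versions remain.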
That's a good concern, but I think we should normally be fine. We always delete right after we successfully put a new version into the database, and we always use the same primary DB for both the get and the delete. There is one edge case: if a user's put fails and that version has not yet been deleted from the DB when the user successfully issues another put, the failed put will be counted as a version and the oldest one will still be deleted. But we never promise the user that every recorded version is a valid version. Since we already tell users that we don't guarantee rollback of a failed put version and it may need manual deletion, I think this is acceptable (this is how wormhole supports it as well). In the future we could add a new field to the database that is set only when the put completes successfully, which might be better. Let me know if you have any concerns about it.
Overall, looks good to me.
 */
private Callback<Void> recursiveCallback(List<DatasetVersionRecord> datasetVersionRecordList, int idx,
    String accountName, String containerName, String datasetName) {
  if (idx == datasetVersionRecordList.size()) {
Can we add a log message (or metric) to record the status of the background deletion?
Sure, I'll add it in a follow up pr.
* Support retentionCount for dataset version database
* Address comments

Co-authored-by: Sophie Guo <sopguo@sopguo-mn1.linkedin.biz>
This PR supports the retentionCount logic for our dataset version DB.
When a user issues a put request to add a new version of a dataset, we check whether the number of existing versions of that dataset exceeds the retention count configured at the dataset level.
If it does, we issue deletes for the dataset versions that fall outside the retention count.
Since this happens on every request, we handle it asynchronously, only log an error message if a delete fails, and retry on the next put, so it doesn't affect the availability of puts.
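The selection step described above, picking which versions fall outside the retention count, reduces to a simple list computation when versions are ordered oldest-first. This is a hypothetical helper, not code from the PR; `versionsOutOfRetention` is an invented name:

```java
import java.util.List;

public class RetentionPrune {
  // Given versions ordered oldest-first and a dataset-level retention count,
  // return the versions that should be deleted (the oldest "excess" entries).
  static List<String> versionsOutOfRetention(List<String> oldestFirst, int retentionCount) {
    int excess = Math.max(0, oldestFirst.size() - retentionCount);
    return oldestFirst.subList(0, excess);
  }

  public static void main(String[] args) {
    // With 4 versions and a retention count of 3, only the oldest is pruned.
    System.out.println(versionsOutOfRetention(List.of("v1", "v2", "v3", "v4"), 3));  // [v1]
    // Fewer versions than the retention count: nothing to delete.
    System.out.println(versionsOutOfRetention(List.of("v1", "v2"), 3));  // []
  }
}
```

Each successful put would run this over the dataset's current versions and hand the result to the asynchronous delete path.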