Episode on data access and parallelization #86
This is looking awesome!
Accessing Data Episode
I like the Objectives of this lesson. I think we can potentially split out the
import rioxarray
# ... or we can open them directly (and stream content only when necessary)
blue_band_href = assets["B02"].href
blue_band = rioxarray.open_rasterio(blue_band_href)
blue_band
and a separate Parallelizing Raster computation with Dask lesson. I think the final cell for the Access Data Episode could be saving out the raster with rioxarray. This would involve reassigning the CRS to the mosaicked xarray DataArray we produced with stackstac and then using the
Parallelizing Raster computation with Dask
I love that you already cover guidelines on how to set the chunk size! An additional topic to cover here could be how to tell if your code is running faster with dask or without dask. For this we could cover using
For the Raster calculations portion, instead of
I think it would be valuable to show that solution and for a median composite.
Setup instructions will also need to be updated with new dependencies. I've seen the most success with not pinning specific versions to allow a more flexible solve for different machines: https://carpentries-incubator.github.io/geospatial-python/setup.html
A third episode focused on working with a cool looking mosaic could focus on xarray-spatial's raster calc funcs. One idea: computing spectral indices, thresholding them, and polygonizing the result (maybe areas with especially high NDVI): https://github.com/makepath/xarray-spatial
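On the question above of how to tell whether code runs faster with or without dask, one possible approach is to time both variants on the same input and compare the best of a few runs. The following is only a minimal sketch using the standard library: `best_time`, `compute_eager` and `compute_chunked` are hypothetical names, and the two workloads are plain-Python stand-ins rather than the lesson's raster computation. With dask arrays, the timed callable would also need to trigger the computation (for example with `.compute()`), because dask evaluates lazily.

```python
import time


def best_time(func, repeats=3):
    """Return the shortest wall-clock time (in seconds) over `repeats` calls of func()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Hypothetical stand-ins for "the same calculation, done two ways".
def compute_eager():
    # One pass over the whole range
    return sum(x * x for x in range(200_000))


def compute_chunked():
    # Same sum, evaluated chunk by chunk and then combined
    bounds = [0, 50_000, 100_000, 150_000, 200_000]
    return sum(
        sum(x * x for x in range(lo, hi)) for lo, hi in zip(bounds, bounds[1:])
    )


t_eager = best_time(compute_eager)
t_chunked = best_time(compute_chunked)
print(f"eager: {t_eager:.4f} s, chunked: {t_chunked:.4f} s")
```

Taking the minimum over several runs reduces the effect of one-off delays such as caching or background load; it does not replace checking that both variants return the same result.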
I also like the inclusion of the Dask task graph image. Other images of intermediate results, such as plots of the blue band, could be good to include prior to the final challenge. Also, when this gets formatted to the lesson markdown, I think we can create a set of tooltips that refer to other sources for folks to read up on COG, STAC, and Dask, while also briefly summarizing their utility for geospatial.
Hi @rbavery, I have created a first version of a full data access episode. Basically, I have converted the Jupyter notebook that you already had a look at into a .md file and I have added some explanatory text in between the code blocks. Whenever you have time to review it, I would be happy to have any kind of feedback - thanks in advance! I have also added a first exercise following up on your idea to have participants exploring a STAC catalog even before having the search tool introduced - what do you think about having it formulated in this way? Still working on the second episode (on parallel raster computation with Dask).
@fnattino thanks, I'll give this a review this evening
thanks for all your work on this @fnattino. I think it's close to being able to merge. I'm meeting with NASA DEVELOP folks next week and I think this is already in great shape to teach if there's time.
_episodes/XX-access-data.md
Outdated
> ## Exercise: Discover a STAC catalog
> Open the following STAC API link using your web browser: https://earth-search.aws.element84.com/v0.
> Navigate through the links to find out which collections are available and how many scenes are indexed. Where may one
> find information on how to query the API for the desired scenes? Can you find out which parameters can be provided
> in the queries?
{: .challenge}
I think we might want to show learners a graphical tool to browse STAC Catalogs. This one shows the spatial extent and summarizes the information about any STAC catalog url you paste into it.
Sounds great, and thanks for the tips - I hadn't yet seen the new STAC browser! Unfortunately the filtering tools do not seem to work with the Earth Search STAC API (maybe because this is an older STAC API version, 0.9), but I have read this is still a "demo" version, so things might be fixed soon. Anyway, for the purpose of the exercise, i.e. browsing through the items, it works very well!
_episodes/XX-access-data.md
Outdated
# save processed image to disk
visual_clip.rio.to_raster("amsterdam_tci.tif", driver="COG")
~~~
{: .language-python}
this is awesome. I think we should end with a challenge so they can reproduce these steps and build some muscle memory for how interacting with a STAC API via pystac and then working with the result in rioxarray feels.
I think a good option would be to direct them to this STAC catalog and have them download data that intersects a specific lat, lon and date (specified in the challenge text): https://radiantearth.github.io/stac-browser/#/external/earth-search.aws.element84.com/v0/collections/landsat-8-l1-c1
the solution to the challenge could be to save a single band at that location and date to disk with rioxarray
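As a rough illustration of what "intersects a specific lat, lon and date" could translate to, the sketch below builds the JSON body of a STAC API item search. It only assembles and prints the request and does not contact the server. The field names (`collections`, `intersects`, `datetime`, `limit`) follow the STAC API item search specification; whether the 0.9 Earth Search endpoint accepts them in exactly this form is an assumption. `build_search_body` is a hypothetical helper, and the coordinates and date are placeholder values, not the ones intended for the challenge.

```python
import json


def build_search_body(collection, lon, lat, date, limit=10):
    """Assemble a STAC item-search body for one point and one calendar day (UTC)."""
    return {
        "collections": [collection],
        # GeoJSON point geometry: note the (longitude, latitude) order
        "intersects": {"type": "Point", "coordinates": [lon, lat]},
        # Closed RFC 3339 interval covering the whole day
        "datetime": f"{date}T00:00:00Z/{date}T23:59:59Z",
        "limit": limit,
    }


# Placeholder point and date (assumptions, for illustration only)
body = build_search_body("landsat-8-l1-c1", lon=4.89, lat=52.37, date="2020-03-28")
print(json.dumps(body, indent=2))
```

A client library would normally build this request from keyword arguments; writing it out by hand mainly shows which parameters the challenge text needs to specify.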
Good point! I have added a challenge using the Landsat 8 dataset. This collection unfortunately seems not to be continuously updated here (and at a certain point might be dropped?) so we might have to find new sources in future!
_episodes/XX-access-data.md
Outdated
~~~
{: .language-python}

> ## Exercise: Discover a STAC catalog
Before this exercise I think we can show an image of the Radiant Earth STAC browser to give people a visual of what information a STAC catalog contains. Looking at the lesson webpage, it's dense in the amount of text before the first image, so I think this will make the first part of the lesson more engaging for someone who is browsing the lesson material or seeking out guidance on STAC.
I have added a figure - the best "composition" I could come up with.. If you have suggestions for improvements, let me know!
@fnattino thanks for addressing these reviews! Once this data access episode is finished, can we merge that PR and finish the parallelization episode in a separate PR? Feel free to merge this as is now, I or somebody could add a challenge later unless you are already working on it.
Hi @rbavery - thanks a lot for having already a look. I am finishing up the last challenge, I'll ping you as soon as I have pushed it!
Hi @rbavery, this is it - I have added the final challenge. I have also updated the setup instructions and the
Merging this first and opening a second PR for the parallelisation episode sounds good - I have removed the corresponding notebook from this branch. One last thing: should this become episode 19? I could set the number and merge if this is alright with you. Really thanks a lot for all the feedback and suggestions!
Fantastic!!! Yes let's make this episode 19 for now. Really looking forward to teaching this! Lgtm feel free to merge.
…bench-intro Update episodes 1-4
This is work-in-progress to address #82.
@rbavery: I have added a notebook with a first sketch of what the episode on data access/parallelization could look like, any feedback is more than welcome!