Problem with Component selection for dimensionality reduction #23

Closed
IAGO1215 opened this issue Oct 3, 2023 · 5 comments
IAGO1215 commented Oct 3, 2023

Not related to the code itself, but how should the principal components be selected correctly?

I was following the tutorial and after obtaining the images of 8 bands generated by PCA, I find it confusing to figure out which bands should be discarded and which bands should be chosen and written in "Selected_Components.txt".

I used QGIS to render the image of each band, so the images on my end do not use the same style as the results shown on your tutorial website. I am not sure whether the style matters when choosing the desired principal components.

Edit: Thanks in advance. If this thread is not appropriate here because it is not about the code, feel free to delete it.

Owner
jbferet commented Oct 4, 2023

Thank you for reporting this difficulty.
Yes indeed, the feature selection is somewhat subjective, and requires a certain expertise on the type of system used to map diversity metrics. This is one of the weaknesses of the method.
When using high spatial resolution images over forested areas, this selection is fairly intuitive, as selected features should i) display contrasted values among tree crowns, and ii) display minimal noise or irrelevant signal related to low SNR, sensor artifacts, BRDF and so on.
However, some RS data types or ecosystems are not so easy to handle, and the guidelines that are valid for forested ecosystems should be adjusted. The relevance of the method should also be questioned: pixels are assumed to capture information related to a level of organization of interest, so it seems inappropriate to try to estimate the alpha diversity of grasslands from metre-scale data, as the pixel information will not match the hypothesis of species-level information (a higher level of organization may be more appropriate).

An alternative possibility to selection based on a visual criterion is to perform supervised feature selection (using forward feature selection for example), if the amount of ground observations available allows such procedure.
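As a rough illustration of what forward feature selection means here, the following is a minimal sketch, not part of biodivMapR. It assumes a hypothetical table `X` with one row per field plot and one column per candidate component, and a hypothetical vector `y` of field-measured diversity. The score is a leave-one-out R² of a simple linear fit, used only as a stand-in: in a real workflow the score would more likely be the agreement between the diversity mapped from the selected components and the ground observations. The names `loo_r2`, `forward_select` and `min_gain` are invented for this example.

```python
import numpy as np


def loo_r2(X, y):
    """Leave-one-out cross-validated R^2 of an ordinary least squares fit."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # prepend an intercept column
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # hold out plot i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        preds[i] = A[i] @ coef
    return 1.0 - np.sum((y - preds) ** 2) / np.sum((y - y.mean()) ** 2)


def forward_select(X, y, max_features=None, min_gain=0.01):
    """Greedy forward selection.

    Starts from an empty set and repeatedly adds the single feature that
    improves the score most, stopping when the gain drops below min_gain.
    """
    n_features = X.shape[1]
    max_features = min(max_features or n_features, n_features)
    selected, best_score = [], -np.inf
    while len(selected) < max_features:
        candidates = [j for j in range(n_features) if j not in selected]
        scores = {j: loo_r2(X[:, selected + [j]], y) for j in candidates}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best_score < min_gain:
            break  # no remaining feature brings a meaningful improvement
        selected.append(j_best)
        best_score = scores[j_best]
    return selected, best_score


# Synthetic demonstration: 40 plots, 8 candidate components,
# of which only columns 1 and 4 carry signal.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 8))
y_demo = 2.0 * X_demo[:, 1] - 1.5 * X_demo[:, 4] + rng.normal(scale=0.1, size=40)
selected_demo, score_demo = forward_select(X_demo, y_demo)
print("selected columns:", selected_demo, "score:", round(score_demo, 3))
```

With few field plots the cross-validated score becomes unstable, which is why this kind of procedure is only reasonable when enough ground observations are available.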

If using QGIS for visual feature selection, you are free to use whatever visualization scale suits you. Some minimum expert knowledge is still needed for this selection, to judge whether the patterns visible in the candidate features are relevant for the dimension of biodiversity being explored.
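To see why the display style is a matter of preference, here is a small sketch of a percentile contrast stretch, similar in spirit to the cumulative-cut stretch that GIS viewers commonly offer. The 2% and 98% cut-offs and the function name `percentile_stretch` are assumptions made for this example. The stretch only rescales and clips values for display, so the ordering of pixels, and therefore the spatial pattern you inspect, is unchanged.

```python
import numpy as np


def percentile_stretch(band, low=2.0, high=98.0):
    """Linearly rescale a band to [0, 1] between two percentiles.

    Values below the low percentile map to 0 and values above the high
    percentile map to 1. NaN pixels are ignored when computing the
    percentiles and stay NaN in the output.
    """
    band = np.asarray(band, dtype=float)
    valid = band[np.isfinite(band)]
    lo, hi = np.percentile(valid, [low, high])
    if hi == lo:
        # Constant band: nothing to stretch.
        return np.zeros_like(band)
    return np.clip((band - lo) / (hi - lo), 0.0, 1.0)


# Synthetic component image standing in for one PCA band.
rng = np.random.default_rng(1)
component = rng.normal(loc=0.0, scale=3.0, size=(50, 50))
display = percentile_stretch(component)
print(display.min(), display.max())
```

Changing `low` and `high` changes how contrasted the image looks, not which pixels are brighter than which.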

I hope this helps.

Author
IAGO1215 commented Oct 4, 2023

Thanks for your suggestions. I will try to understand them and apply them to our project data (also forest), which will probably take some trial and error.

One last note: I have seen that you are already working on automatic selection of these components, and I hope that function will be available soon.

Owner
jbferet commented Oct 4, 2023

One possibility for you, which is implemented and documented but not yet validated (for lack of ground data and time), is described in this tutorial.

The principle is as described in my comment on issue #24: instead of relying on a data transformation (such as PCA, SPCA or MNF) to produce relevant features, you assume that a carefully chosen set of spectral indices carries meaningful information for producing RS diversity metrics that relate to the diversity metrics you want to assess.
Then you use a stack of spectral indices instead of the PCA file, and 'trick' the function map_spectral_species by providing the stack of spectral indices as the PCA_file defined in the SpectralSpace_Output variable.
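For readers unfamiliar with the idea of a "stack of spectral indices", here is a minimal sketch of building one from reflectance bands held as arrays. The choice of indices (NDVI, GNDVI and a normalized green-red difference) is only an assumption for the example; which indices are relevant depends on the ecosystem and the diversity metric of interest. The function names are hypothetical, and writing the stack to a raster file and passing it to map_spectral_species is not shown, since that part should follow the tutorial.

```python
import numpy as np


def normalized_difference(a, b):
    """Return (a - b) / (a + b), with NaN where the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.full(a.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


def build_index_stack(green, red, nir):
    """Stack a few normalized-difference indices as layers (layer, row, col)."""
    layers = {
        "NDVI": normalized_difference(nir, red),     # (NIR - Red) / (NIR + Red)
        "GNDVI": normalized_difference(nir, green),  # (NIR - Green) / (NIR + Green)
        "NGRDI": normalized_difference(green, red),  # (Green - Red) / (Green + Red)
    }
    names = list(layers)
    stack = np.stack([layers[name] for name in names], axis=0)
    return stack, names


# Synthetic reflectance bands standing in for a real image.
rng = np.random.default_rng(2)
green_demo = rng.uniform(0.03, 0.12, size=(20, 20))
red_demo = rng.uniform(0.02, 0.10, size=(20, 20))
nir_demo = rng.uniform(0.20, 0.50, size=(20, 20))
stack_demo, names_demo = build_index_stack(green_demo, red_demo, nir_demo)
print(names_demo, stack_demo.shape)
```

Each layer of the stack then plays the role that a selected component would play in the PCA-based workflow.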

This would require validation, and if possible prior selection of spectral indices in order to identify the most relevant ones for the assessment of the diversity metrics you are interested in.

For future developments, this strategy may be preferred to automated feature selection, as the performance of automated feature selection may strongly depend on the type of ecosystem and imagery used with biodivMapR.

Cheers,

jb

Author
IAGO1215 commented Oct 5, 2023

Thanks for these insights and I will give this method a try.

Author
IAGO1215 commented Oct 5, 2023

So if the only spectral index useful in my case is a vegetation index, I can bypass the PCA step and create a raster combining the blue, green, red, and NIR bands, then use this raster in place of the PCA result and continue with the following steps (spectral species, alpha and beta indices, etc.).

I am not sure I have understood this alternative to the PCA step correctly. Thanks again.

BTW, we do have sufficient ground-truth field plot data, so we can validate the result of this method, if I understand it correctly.

Edit: I had missed your link to the tutorial. I have now read it and understand how to bypass PCA and use spectral indices instead. Thanks for the link; I will follow it and eventually test the result against ground truth.

@jbferet jbferet closed this as completed Oct 25, 2023