Runtime warning by rank_genes_groups #653
Comments
Hey! Just something that crossed my mind... could it be that after subsetting, you have 0 variance in some genes? You may have to rerun |
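The zero-variance idea can be checked directly. The following is only a minimal sketch: it assumes the expression matrix is a dense NumPy array with cells as rows and genes as columns (for an AnnData object this would be something like `adata.X`, converted to dense if it is sparse), and the helper name `zero_variance_genes` is made up for illustration.

```python
import numpy as np

def zero_variance_genes(X):
    """Return the column indices of genes whose value is identical in every cell."""
    X = np.asarray(X, dtype=float)
    return np.flatnonzero(X.var(axis=0) == 0)

# Toy matrix: 3 cells x 3 genes; the middle gene is constant across cells.
X = np.array([[1.0, 2.0, 0.5],
              [3.0, 2.0, 0.1],
              [0.0, 2.0, 0.7]])

constant = zero_variance_genes(X)
print(constant)  # indices of constant genes
```

If this returns any indices for the subset, those genes carry no information within the subset and are candidates for filtering before re-running the analysis.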
@LuckyMD Hi, thanks a lot for your reply. The thing is, I had already filtered my cells, and after preprocessing I imputed my data with MAGIC; the first clustering ran with no warning. I wonder how my values could have changed, because I only subset some clusters. If something were wrong with some gene values, it should have shown up before too, since I am using the same genes for my subset analysis. That is very confusing to me. As long as it doesn't affect my results I wouldn't mind, but I want to be sure about that and also find the reason behind it. So far I am clueless.
So my idea was the following: |
@LuckyMD Dear Malter, Thanks, now I got your point. I just filtered my cells and genes again using |
In that case my idea was wrong and that was not the warning. The invalid value encountered does sound like a |
@LuckyMD The thing is, after imputation I definitely do get some negative values, and I have observed them, but this warning was not popping up before. I am doubtful about NaN, because otherwise I would have seen a warning message when I ran the imputation on all of my genes. P.S. This is how I made my subset
I think you are using a view of the anndata object, rather than the object with that method of subsetting. That shouldn't be related to the issue, but if you want to work with the subset, I would use So I think the issue is the It is likely that this only pops up now, as the average expression value for the e.g., "not cluster 3" data is now negative, where before it wasn't as there was different data to average over. If this is the issue, I'm not entirely sure what to do about this... fold changes aren't defined for such a case. I would either:
You could take the code in the |
@LuckyMD Dear Malte, Thanks a lot for your hint and reply. |
So the reason you didn't get this before, would be that if you do a
Turning negative values to 0 doesn't mean you lose the data. You have some expression space, in which 0 is a valid number. The question is really: what does a negative expression value mean after MAGIC? Is it just a measure of confidence that the gene is not expressed? Then setting it to zero makes sense. Again, if you ignore this, you will simply miss particular genes that are likely differentially expressed, because MAGIC has rescaled the expression values in the "rest of the dataset" to a negative value.
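Setting negative values to zero can be sketched as follows. This is a toy illustration on a plain NumPy array, not anything MAGIC or scanpy does internally; the matrix values are invented, and whether clipping is appropriate depends on what negative values mean for the imputation method in question.

```python
import numpy as np

# Toy imputed matrix: 3 cells x 2 genes, with a few small negative values.
X = np.array([[-0.2, 1.5],
              [0.4, -0.1],
              [0.0, 2.0]])

# Negative entries become 0; non-negative entries are left unchanged.
X_clipped = np.clip(X, 0, None)
print(X_clipped)
```

Only the negative entries change, so the positive part of each gene's distribution is preserved exactly.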
This is of course another theory, but it makes sense to me at least. Going through the function step by step will allow you to find out if this is actually the case. |
@LuckyMD by rescaling to 10 I meant this
And regarding the negative values in MAGIC, this is what one of the creators has mentioned about it
TBH I am not sure that their being an artifact means it's OK to set them to zero. Would that still be your suggestion?
I guess if they regard the relative values as accurate, then scaling is the way forward. Does |
@LuckyMD Yes, one still gets negative values after scaling.
Maybe rescale yourself then... just subtract the min expression value per gene from each gene (i.e. add its absolute value when the min is negative)... then you'll at least get a rebasing of expression values. Relative expression within genes will still be okay.
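A minimal sketch of that rebasing, assuming a dense cells-by-genes NumPy array (the toy values are invented for illustration):

```python
import numpy as np

# Toy matrix: 3 cells x 2 genes; the first gene has a negative minimum.
X = np.array([[-0.5, 1.0],
              [0.5, 3.0],
              [1.5, 2.0]])

# Subtract each gene's minimum so that every gene's lowest value becomes 0.
X_shifted = X - X.min(axis=0)
print(X_shifted)
```

Differences between cells within a gene are unchanged by the shift, which is why relative expression within genes stays intact.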
I will try it out and see what will come up ! Thanks a lot for all of your suggestions ! |
good luck! |
@LuckyMD OK, so I basically found the reason. It's simply the scaling! What I used to do was: after imputation and selecting HVGs, I scaled my data, then computed the clusters, subset them, and ran the HVG selection again. Apparently the scaled data cause the warning, but I am not sure why, because scaling doesn't turn any of my expression values into NaN (I checked my adata.X), and negative values shouldn't be the source of the warning either, because I already had negative values after imputation and they didn't cause any warning. One thing I should also mention is that this warning appears when I compute the HVGs too, so it's not really cluster-specific.
I believe this is the reason why it happens. If one of those two averages are negative, then your fold change is negative, and you get an error when feeding that into |
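The mechanism can be reproduced in isolation. The sketch below is a simplified stand-in, not scanpy's actual implementation: it takes a log2 fold change as the log of a plain ratio of two per-gene means, with invented values, and shows that a negative mean on one side yields NaN together with NumPy's "invalid value encountered" RuntimeWarning.

```python
import warnings
import numpy as np

# Hypothetical per-gene mean expression in one cluster and in the rest.
mean_group = np.array([1.2, 0.8, 0.5])
mean_rest = np.array([0.6, -0.4, 0.5])  # second gene: negative mean in "rest"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The ratio is negative for the second gene, and log2 of a negative
    # number is undefined, so NumPy returns NaN there.
    log2fc = np.log2(mean_group / mean_rest)

print(log2fc)
print([str(w.message) for w in caught])
```

The other genes still get ordinary fold changes, which matches the observation that the command completes and returns results despite the warning.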
@LuckyMD Yeah, after some further investigation I do agree with you! Do you already know a method I can use to scale my data to non-negative values? Then I could scale my data only once, right after imputation, and that should be fine for the rest of the downstream analysis, including the sub-clustering.
I guess negative values can mean different things across imputation methods. So having one standard way to scale is maybe not the best approach. That being said, I would probably simply do this: Here, you would also put expression values to 0 if all expression values are +ve and non-zero. Otherwise, you should only do this for genes where the min() is -ve. The above scaling solution would keep the relative scale between the genes. If you however prefer to scale to values between 0 and 1 (which I usually don't do, but others advocate; this would ensure equal weighting between genes for PCA), you can also rescale by expression range like this: Overall though, I'm not a big fan of imputation... especially after this paper |
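The two rescaling options described above might look roughly like the following. The snippets originally posted in this comment are not preserved here, so this is a reconstruction under assumptions: a dense cells-by-genes NumPy array, and two helper names (`shift_negative_genes`, `scale_to_unit_range`) invented for illustration.

```python
import numpy as np

def shift_negative_genes(X):
    """Shift only genes with a negative minimum so that their minimum becomes 0."""
    X = np.asarray(X, dtype=float)
    gene_min = X.min(axis=0)
    # np.minimum(gene_min, 0) is 0 for genes that are already non-negative,
    # so those genes are left untouched.
    return X - np.minimum(gene_min, 0)

def scale_to_unit_range(X):
    """Rescale each gene to the range [0, 1] using its own min and max."""
    X = np.asarray(X, dtype=float)
    gene_min = X.min(axis=0)
    gene_range = X.max(axis=0) - gene_min
    gene_range[gene_range == 0] = 1.0  # constant genes: avoid dividing by zero
    return (X - gene_min) / gene_range

# Toy matrix: 3 cells x 2 genes; only the first gene has negative values.
X = np.array([[-1.0, 2.0],
              [1.0, 4.0],
              [3.0, 6.0]])

X_shifted = shift_negative_genes(X)
X_unit = scale_to_unit_range(X)
print(X_shifted)
print(X_unit)
```

The shift keeps the relative scale between genes, while the range scaling gives every gene the same span and therefore equal weight in methods such as PCA.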
Interesting paper... the question is which is worse: false signals from the experiment or from the imputation.
After imputation you don't have your initial variability in your data. Ideally you have only the biologically relevant variability, but that's another question. Also, imputation methods take different data as input. Our DCA method takes count data, but MAGIC takes pre-processed data I think. |
Dear all,
I am receiving the following runtime warning when I search for markers within my clusters using
sc.tl.rank_genes_groups
The warning only appeared after I subset my initial clustering, kept a few clusters, and then ran PCA and HVG analysis on them again and re-clustered. The command still runs, though, and I get the results. Does anyone know why this happens only when I analyze my subset? And is it something I should worry about?
Thanks