
How to assess if biological measurements follow a normal or a log-normal distribution

I am using a dataset composed of $m$ samples and $n$ features (genes). Each data point is a real number. I want to understand how to preprocess the data before analysis. In particular: do the data points follow a normal or a log-normal distribution? I thought about using Q-Q plots and looking for tests that assess the form of the distribution, but I have a doubt. Do I have to assess the form of:

* each sample's distribution
* each feature's (gene's) distribution
* the whole dataset ($m$ samples x $n$ features (genes))?

From personal experience, nearly all count data, whether from microarrays or reads from some kind of RNAseq, require a log transformation of the counts. A small constant is usually added to all values first to protect against taking the log of zero, e.g. log2(counts + 0.5) or something similar. This is independent of the treatments: if you log-transform one sample, you do the same for all samples. A simple way to check for normality is to look at the histogram of counts (across all samples or for each sample) before and after transformation. If it is roughly bell-shaped, proceed.

The pictures below are from my data. Although the data are from RNAseq, microarray data should look similar.

R code here:


    # t is assumed to be a list or data frame with a numeric `counts` element
    hist(t$counts, breaks = 100, main = "Histogram of Raw Counts from RNAseq")
    hist(log2(t$counts + 0.5), breaks = 100,
         main = "Histogram of Log2-Transformed Counts from RNAseq")
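
Since the question also mentions Q-Q plots and asks whether to look per sample, here is a rough sketch of the same before/after check done per sample. It runs on simulated counts, not my data: the `counts` matrix, its dimensions, the simulation parameters, and the `skew` helper are all made up for illustration. Skewness is used only as a crude numeric companion to the plots (close to 0 for a symmetric, bell-shaped distribution).

```r
# Simulated stand-in for a genes x samples count matrix (illustrative only).
set.seed(1)
n_genes   <- 2000
n_samples <- 4
mu <- rlnorm(n_genes, meanlog = 4, sdlog = 1.5)   # assumed per-gene mean expression
counts <- matrix(rpois(n_genes * n_samples, lambda = rep(mu, n_samples)),
                 nrow = n_genes,
                 dimnames = list(NULL, paste0("sample", seq_len(n_samples))))

# Same transformation for every sample, with 0.5 added to protect zeros
log_counts <- log2(counts + 0.5)

# Q-Q plots against the normal distribution for one sample, raw vs. log2
op <- par(mfrow = c(1, 2))
qqnorm(counts[, 1], main = "sample1, raw counts")
qqline(counts[, 1])
qqnorm(log_counts[, 1], main = "sample1, log2(counts + 0.5)")
qqline(log_counts[, 1])
par(op)

# Hypothetical helper: sample skewness (third standardized moment)
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Skewness of each sample (column) before and after the transformation
raw_skew <- apply(counts, 2, skew)
log_skew <- apply(log_counts, 2, skew)
print(round(rbind(raw = raw_skew, log2 = log_skew), 2))
```

With real data you would replace the simulated `counts` with your own genes x samples matrix. If the points in the log2 panel fall roughly on the line while the raw panel bends sharply upward, that is the same "roughly bell-shaped" signal as in the histograms.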


[Image not available: histograms of the raw and log2-transformed counts]
