Suppose I have a data frame like this:
> A B C D E
> TGFBI 0.027442647 9.756301e-03 0.0056374607 0.0248263371 0.0056703467
> OLFM4 0.022665292 -1.906351e-03 -0.0135277027 0.0551336843 0.0001602728
> CD177 0.029256398 2.259310e-03 -0.0218761784 0.0008816893 -0.0138302621
> LCN2 0.024944813 1.838820e-02 -0.0058928266 0.0440654781 -0.0108800098
> CEACAM8 0.029996651 3.432132e-02 -0.0251011180 0.0370074902 -0.0138822167
> HLA-DPB1 0.028016101 3.483277e-02 -0.0081639565 0.0223873901 0.0103236673
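> DEFA3 -0.031190483 4.124520e-02 -0.0410158867 0.0607274629 0.0158699504
For each row, I would like to compute the IQR and then count how many abs(value)s fall outside that previously computed IQR (>=, <=), writing the counts to a final table.
In other words, I want to count how many extreme values each row contains.
The data.frame has 174 columns and 8000 rows.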
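Posted on 2019-09-03 22:15:46
As a commenter (乔戈 in this copy) pointed out, asking for the points outside the first and third quartiles does not make much sense: by construction, roughly half of the values in every row lie outside that range. If you instead define an 'outlier' as a point lying more than 1.5 * IQR beyond the quartiles (the usual boxplot rule), you can use the following code, with some modifications if you need a different threshold: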
# sample data
df <- read.table(text = " A B C D E
TGFBI 0.027442647 9.756301e-03 0.0056374607 0.0248263371 0.0056703467
OLFM4 0.022665292 -1.906351e-03 -0.0135277027 0.0551336843 0.0001602728
CD177 0.029256398 2.259310e-03 -0.0218761784 0.0008816893 -0.0138302621
LCN2 0.024944813 1.838820e-02 -0.0058928266 0.0440654781 -0.0108800098
CEACAM8 0.029996651 3.432132e-02 -0.0251011180 0.0370074902 -0.0138822167
HLA-DPB1 0.028016101 3.483277e-02 -0.0081639565 0.0223873901 0.0103236673
DEFA3 -0.031190483 4.124520e-02 -0.0410158867 0.0607274629 0.0158699504",
header = TRUE, stringsAsFactors = FALSE)
# apply a custom function to rows of the data frame
apply(df, 1, function(x) {
  qrt <- quantile(x, c(0.25, 0.75))     # first and third quartile of the row
  iqr <- qrt[2] - qrt[1]
  out1 <- sum(x < qrt[1] - 1.5 * iqr)   # or use other value than 1.5*iqr
  out2 <- sum(x > qrt[2] + 1.5 * iqr)
  return(out1 + out2)                   # returns just the number of outliers
})
#   TGFBI    OLFM4    CD177     LCN2  CEACAM8 HLA-DPB1    DEFA3
#       0        0        1        0        0        0        0
https://stackoverflow.com/questions/57773284
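For a table of the size mentioned in the question (174 columns, 8000 rows), the same rule can also be written so that the comparisons run on the whole matrix at once and the result comes back as a table. The following is only a sketch under assumptions: `count_row_outliers`, its `k` argument, the output column names, and the two-row `demo` data frame are hypothetical names introduced here, not part of the original answer. Only base R functions (`as.matrix`, `apply`, `quantile`, `rowSums`) are used.

```r
# Hypothetical helper (not from the original answer): for each row, count the
# values outside [Q1 - k * IQR, Q3 + k * IQR] and return the counts as a table.
count_row_outliers <- function(df, k = 1.5) {
  m <- as.matrix(df)                  # numeric matrix, row names preserved
  # 2 x nrow(m) matrix: first row holds Q1, second row holds Q3 of each data row
  q <- apply(m, 1, quantile, probs = c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[2, ] - q[1, ]
  lower <- q[1, ] - k * iqr
  upper <- q[2, ] + k * iqr
  # A vector of length nrow(m) is recycled down the columns of m, so every
  # row is compared against its own fences.
  n_out <- rowSums(m < lower | m > upper, na.rm = TRUE)
  data.frame(q1 = q[1, ], q3 = q[2, ], iqr = iqr, n_outliers = n_out,
             row.names = rownames(m))
}

# Two rows taken from the sample data in the question
demo <- data.frame(
  A = c(0.027442647, 0.029256398),
  B = c(9.756301e-03, 2.259310e-03),
  C = c(0.0056374607, -0.0218761784),
  D = c(0.0248263371, 0.0008816893),
  E = c(0.0056703467, -0.0138302621),
  row.names = c("TGFBI", "CD177")
)

res <- count_row_outliers(demo)          # n_outliers: TGFBI 0, CD177 1
res_q <- count_row_outliers(demo, k = 0) # strictly outside the quartiles
```

Setting `k = 0` counts the values strictly below the first or above the third quartile, which is closer to the literal wording of the question; values equal to a quartile are not counted, so `<` and `>` would need to become `<=` and `>=` if those should be included.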