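I've been searching for (and thinking about) a way to extract the nth value (e.g., the 2nd, 5th, 7th, etc.) from each row of my data frame.
For example, I have the following columns:
ID Q1-2013 Q2-2013 Q3-2013 Q4-2013 Q1-2014 Q2-2014 Q3-2014 Q4-2014
Each column holds a value for every row. What I want is to pull the nth value of each row from the quarterly columns (Q1-2013 through Q4-2014, i.e. columns 2 through 9). So if I'm looking for the 2nd value, the formula/function should return, for every row, the 2nd value found in those quarterly columns. The formula/function should also skip blank/NA values in each row.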
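To make the requirement concrete, here is a minimal sketch in base R. It assumes the quarterly columns are numeric, so blank cells were read in as NA, and it uses a small hypothetical frame quarters_df with only four quarters. Both quarters_df and nth_per_row are illustrative names, not part of the question.

```r
# Hypothetical data in the layout described above: an ID column followed by
# quarterly columns; NA marks a missing/blank quarter.
quarters_df <- data.frame(
  ID        = c("a", "b", "c"),
  `Q1-2013` = c(1.5, NA, 4.0),
  `Q2-2013` = c(NA, NA, 5.0),
  `Q3-2013` = c(2.5, 3.0, NA),
  `Q4-2013` = c(3.5, 7.0, NA),
  check.names = FALSE              # keep the "Q1-2013" style names as-is
)

# Return the n-th non-NA value of each row, looking only at `cols`.
# Rows with fewer than n non-NA values give NA.
nth_per_row <- function(df, cols, n) {
  m <- as.matrix(df[cols])         # numeric matrix of the quarterly columns only
  vapply(seq_len(nrow(m)), function(i) {
    x <- m[i, ]
    x[!is.na(x)][n]                # indexing past the end yields NA
  }, numeric(1))
}

# Select the quarterly columns by name rather than by position.
q_cols <- grep("^Q[1-4]-[0-9]{4}$", names(quarters_df))
nth_per_row(quarters_df, q_cols, 2)
```

Selecting the columns by a name pattern means the ID column never enters the matrix, so the result stays numeric.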
Posted 2014-12-16 19:50:23
Maybe this is what you're looking for.
First, I modified the iris dataset by putting some NAs into each column:
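# Set 30 randomly chosen entries of every column to NA (this creates a modified copy named iris in the workspace)
iris[] <- lapply(iris, function(x) { x[sample(150, 30, replace = FALSE)] <- NA; x })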
head(iris)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 5.1 3.5 1.4 NA setosa
#2 NA NA 1.4 NA setosa
#3 NA NA 1.3 0.2 setosa
#4 4.6 3.1 1.5 NA setosa
#5 5.0 3.6 1.4 0.2 setosa
#6 5.4 NA 1.7 0.4 setosa
Then, to extract the second non-blank, non-NA item of each row, you can use apply (I know it isn't recommended on data frames, but it does the dirty work):
apply(iris, 1, function(x) x[which(!is.na(x) & x != "")[2]])
# [1] "3.5" "setosa" "0.2" "3.1" "3.6" "1.7" "3.4" "3.4" "2.9" "3.1" "setosa"
#[12] "3.4" "1.4" "1.1" "1.2" "4.4" "3.9" "3.5" "3.8" "3.8" "0.2" "3.7"
#[23] "3.6" "1.7" "1.9" "3.0" "3.4" "1.5" "3.4" "3.2" "3.1" "3.4" "4.1"
#[34] "4.2" "3.1" "3.2" "3.5" "3.6" "setosa" "1.5" "1.3" "2.3" "1.3" "0.6"
#[45] "0.4" "3.0" "3.8" "3.2" "3.7" "3.3" "3.2" "3.2" "1.5" "2.3" "2.8"
#[56] "2.8" "3.3" "2.4" "4.6" "1.4" "2.0" "3.0" "1.0" "2.9" "2.9" "3.1"
#[67] "3.0" "2.7" "4.5" "3.9" "3.2" "4.0" "2.5" "4.7" "4.3" "3.0" "2.8"
#[78] "5.0" "2.9" "3.5" "3.8" "2.4" "2.7" "2.7" "3.0" "3.4" "3.1" "1.3"
#[89] "4.1" "1.3" "2.6" "3.0" "2.6" "2.3" "4.2" "3.0" "2.9" "2.9" "2.5"
#[100] "2.8" "3.3" "2.7" "3.0" "2.9" "3.0" "3.0" "4.5" "2.9" "5.8" "3.6"
#[111] "3.2" "1.9" "5.5" "2.0" "5.1" "3.2" "5.5" "3.8" "virginica" "1.5" "3.2"
#[122] "2.8" "2.8" "2.7" "2.1" "6.0" "2.8" "3.0" "2.8" "5.8" "2.8" "3.8"
#[133] "5.6" "1.5" "2.6" "3.0" "5.6" "5.5" "4.8" "3.1" "5.6" "5.1" "2.7"
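#[144] "3.2" "3.3" "3.0" "2.5" "5.2" "5.4" "3.0"
Because apply first converts the data frame to a matrix, all columns are coerced to a single common type, in this case character. You can convert the result afterwards to whatever you need, but note that here the output vector cannot be converted directly to numeric, because it contains strings such as "setosa".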
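Since the coercion comes from apply turning the whole data frame (including Species) into a character matrix, one way around it is to keep only the numeric columns and locate the nth non-NA entry with a running count across columns instead of a row-wise apply. This is a sketch, not part of the answer above; the copy iris_na, the fixed seed, and the helper nth_non_na are assumptions made for illustration.

```r
# Assumption: the same NA injection as above, but with a fixed seed and on a
# copy (iris_na) so the built-in iris data set is left untouched.
set.seed(1)
iris_na <- iris
iris_na[] <- lapply(iris_na, function(x) { x[sample(150, 30)] <- NA; x })

# n-th non-NA value of each row of a matrix, found column-by-column
# instead of row-by-row. Rows with fewer than n non-NA values give NA.
nth_non_na <- function(m, n) {
  ok <- !is.na(m)
  count <- ok * 1L                          # integer 0/1 matrix
  if (ncol(m) > 1L) {
    # turn it into a running count of non-NA values along each row
    for (j in 2:ncol(m)) count[, j] <- count[, j - 1L] + count[, j]
  }
  hit <- ok & count == n                    # at most one TRUE per row
  pos <- max.col(hit * 1, ties.method = "first")
  pos[rowSums(hit) == 0] <- NA              # row never reaches n non-NA values
  m[cbind(seq_len(nrow(m)), pos)]           # an NA index row gives NA
}

num <- as.matrix(iris_na[sapply(iris_na, is.numeric)])  # drop Species
second <- nth_non_na(num, 2)
str(second)  # a numeric vector, one value per row
```

The running count equals n at exactly one non-NA position in a row that has at least n values, so max.col can pick that column out. Rows that never reach n are set to NA explicitly, because max.col would otherwise return column 1 for an all-zero row.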
Posted 2014-12-17 17:42:44
You could also use the naLast function from the SOfun package:
library(SOfun)
dat[dat==''] <- NA #convert all `blank` cells to `NA`
n <- 2 # the row/column index that needs to be extracted
naLast(dat, by='col')[n,] #get the 2nd non-empty/nonNA element for each columns
#V1 V2 V3 V4 V5
#"G" "B" "B" "B" "C"
The same result with apply:
apply(dat, 2, function(x) x[which(!is.na(x) & x!='')[2]])
#V1 V2 V3 V4 V5
#"G" "B" "B" "B" "C"
You can also specify by='row':
naLast(dat, by='row')[,n] #get the 2nd non-empty/nonNA element for each row
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
#"G" "D" "B" "G" "E" "B" "J" "F" "F" "A" "H" "C" "A" "D" "H" "D" "J" "C" "A" "A"
Data
set.seed(25)
dat <- as.data.frame(matrix(sample(c(NA,'',LETTERS[1:10]),
20*5, replace=TRUE), ncol=5), stringsAsFactors=FALSE)
You can install the package with:
library(devtools)
install_github("mrdwab/SOfun")
https://stackoverflow.com/questions/27511547
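The way naLast is used in the answer above can be illustrated in plain base R: move the NAs to the end of each column (or row) while keeping the order of the remaining values, and the nth row (or column) then holds the nth non-NA element. push_na_last below is a hypothetical helper written only for this illustration. It is not SOfun's implementation, and its output may differ in details such as names.

```r
# Hypothetical base-R helper (NOT SOfun::naLast): within each column
# (by = "col") or each row (by = "row"), keep the order of the non-NA values
# and move the NAs to the end.
push_na_last <- function(m, by = c("col", "row")) {
  by <- match.arg(by)
  m <- as.matrix(m)
  shift <- function(x) c(x[!is.na(x)], x[is.na(x)])
  if (by == "col") {
    # apply(, 2, ) returns one shifted column per matrix column
    out <- matrix(apply(m, 2, shift), nrow = nrow(m))
  } else {
    # apply(, 1, ) returns shifted rows as columns, so refill by row
    out <- matrix(apply(m, 1, shift), nrow = nrow(m), byrow = TRUE)
  }
  dimnames(out) <- dimnames(m)
  out
}

# Same data as in the answer, with blanks converted to NA first.
set.seed(25)
dat <- as.data.frame(matrix(sample(c(NA, '', LETTERS[1:10]), 20 * 5, replace = TRUE),
                            ncol = 5), stringsAsFactors = FALSE)
dat[dat == ''] <- NA

n <- 2
by_col <- push_na_last(dat, "col")[n, ]   # 2nd non-NA value of each column
by_row <- push_na_last(dat, "row")[, n]   # 2nd non-NA value of each row
```

Wrapping the apply result in matrix() keeps the shape correct even when the input has a single row or column, where apply would otherwise return a plain vector.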