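I have a data matrix (900 columns and 5000 rows) that I would like to run a PCA on.
The matrix looks fine in Excel (meaning all the values are quantitative), but after I read the file into R and try to run the PCA code, I get an error saying "The following variables are not quantitative", followed by a list of the non-quantitative variables.
So, in general, some variables are quantitative and some are not. See the example below. When I check variable1, it is correct and quantitative (some variables in the file are quantitative, seemingly at random). When I check variable2, it is incorrect and non-quantitative (some variables like this one are non-quantitative, again seemingly at random).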
> data$variable1[1:5]
[1] -0.7617504 -0.9740939 -0.5089303 -0.1032487 -0.1245882
> data$variable2[1:5]
[1] -0.183546332959017 -0.179283451229594 -0.191165669598284 -0.187060515423038
[5] -0.184409474669824
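731 Levels: -0.001841783473108 -0.001855956210119 ... -1,97E+05
So my question is: how can I turn all the non-quantitative variables into quantitative ones?
Making the file shorter does not help, because the values themselves are quantitative. I don't know what is going on. Here is the link to my original file <- https://docs.google.com/file/d/0BzP-YLnUNCdwakc4dnhYdEpudjQ/edit
I also tried the answers given below, but they still did not help.
So let me show exactly what I did: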
> data <- read.delim("file.txt", header=T)
> res.pca = PCA(data, quali.sup=1, graph=T)
Error in PCA(data, quali.sup = 1, graph = T) :
The following variables are not quantitative: batch
The following variables are not quantitative: target79
The following variables are not quantitative: target148
The following variables are not quantitative: target151
The following variables are not quantitative: target217
The following variables are not quantitative: target266
The following variables are not quantitative: target515
The following variables are not quantitative: target530
The following variables are not quantitative: target587
The following variables are not quantitative: target620
The following variables are not quantitative: target730
The following variables are not quantitative: target739
The following variables are not quantitative: target801
The following variables are not quantitative: target803
The following variables are not quantitative: target809
The following variables are not quantitative: target819
The following variables are not quantitative: target868
The following variables a
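In addition: There were 50 or more warnings (use warnings() to see the first 50)
Posted on 2013-02-28 19:07:21
R is treating your variables as factors, as Arun mentioned, so reading the file gives you a data.frame (which is really a list) with factor columns. There are a number of ways to solve this; one of them is to convert it to a numeric matrix, as follows: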
matrix <- as.numeric(as.matrix(data))
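dim(matrix) <- dim(data)
Now you can run the PCA on the matrix.
Edit:
To expand on the example a little: the second part of Charlie's suggestion will not work. Copy the following session to see how it behaves: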
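The conversion above can be tried on a small toy data frame. This is only a sketch under stated assumptions: the column names and values are made up, and it assumes the `batch` column reported in the error is genuinely qualitative and should be left out of the numeric matrix (it would otherwise turn into NA).

```r
# Toy data standing in for the real file (hypothetical names and values).
d <- data.frame(
  batch   = factor(c("A", "B", "A", "B")),             # genuinely qualitative
  target1 = c(0.5, -1.2, 0.3, 2.0),                    # already numeric
  target2 = factor(c("0.11", "-0.25", "0.40", "1.5"))  # numbers stored as a factor
)

# Keep only the columns meant to be quantitative (everything except batch).
quant <- d[, setdiff(names(d), "batch"), drop = FALSE]

# as.matrix() on a data frame that contains a factor gives a character matrix;
# as.numeric() then parses the text but drops the dimensions.
m <- as.numeric(as.matrix(quant))
dim(m) <- dim(quant)          # restore the rows x columns shape
colnames(m) <- names(quant)   # restore the column names lost in the conversion

str(m)
```

Restoring the column names is optional, but it makes the PCA output easier to relate back to the original variables.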
d <- data.frame(
a = factor(runif(2000)),
b = factor(runif(2000)),
c = factor(runif(2000)))
try(as.numeric(d)) # does not work on a list (a data frame is a list)
as.numeric(d$a) # does work, because d$a is a vector, but this is not what you are
# after: R converts the factor level codes to numeric instead of the actual values.
(m <- as.numeric(as.matrix(d))) # this does the right thing
dim(m) # but m loses the dimensions and is now a vector
dim(m) <- dim(d) # assign the dimensions of d to m
svd(m) # you can run the PCA function of your liking on m
Posted on 2013-02-28 19:18:56
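By default, R coerces strings to factors. This can lead to unexpected behaviour. Turn this default off with:
read.csv(x, stringsAsFactors=FALSE)
Alternatively, you can coerce a factor to numeric with:
newVar<-as.numeric(oldVar)
Posted on 2021-04-05 02:06:12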
as.numeric(as.character(data$variable2[1:5])): first use as.character to get the string representation of the factor's labels, then convert that with as.numeric.
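One detail in the question is worth a sketch: the last factor level printed for variable2 is "-1,97E+05", written with a comma. If that comma is a decimal separator (an assumption, it is not confirmed in the thread), as.numeric(as.character(...)) returns NA for such entries. The example below uses made-up data and a hypothetical helper named factor_to_numeric to show one way to handle this.

```r
# Hypothetical column mimicking the levels shown in the question:
# most values use a decimal point, one uses a decimal comma.
variable2 <- factor(c("-0.183546332959017", "-0.179283451229594", "-1,97E+05"))

as.numeric(variable2)                                  # level codes, not the values
suppressWarnings(as.numeric(as.character(variable2)))  # real values, but NA for "-1,97E+05"

# Hypothetical helper: convert a factor with numeric-looking labels to numbers,
# treating a comma as a decimal separator (assumption).
factor_to_numeric <- function(f) {
  txt <- as.character(f)                   # labels, not level codes
  txt <- sub(",", ".", txt, fixed = TRUE)  # "-1,97E+05" becomes "-1.97E+05"
  as.numeric(txt)
}

x <- factor_to_numeric(variable2)

# Apply the helper to every factor column, leaving other columns untouched.
d <- data.frame(batch = c("A", "B", "A"), variable2 = variable2,
                stringsAsFactors = FALSE)
is_fac <- vapply(d, is.factor, logical(1))
d[is_fac] <- lapply(d[is_fac], factor_to_numeric)
str(d)
```

A genuinely qualitative column that was read as a factor would become NA under this helper, so such columns should be excluded from is_fac first. Here batch is kept as a character column and is therefore skipped.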
https://stackoverflow.com/questions/15132358