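I have a data frame of counts by region over time. One row of the data frame contains the count totals for each column. I want to convert the data frame from counts to proportions by dividing each cell by its column's count total. Some columns contain missing observations. I have done this below with nested for-loops, but I suspect there is a simpler way, perhaps using lapply. I also had trouble extracting the row of count totals.
I am posting partly because it is past time I learned to use the apply family of functions, which I suspect would be useful here, and partly because I had so much trouble creating the vector of count totals that I suspect [[ might have helped. Thank you for any suggestions on coding the example below more efficiently.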
my.data = read.table(text = "
state y1970 y1980 y1990 y2000
Alaska 4 6 NA 7
Iowa 10 20 30 40
Nevada 100 100 100 100
Ohio 50 60 NA 80
total 172 195 215 238
Wyoming 8 9 10 11
", sep = "", header = TRUE)
desired.result = read.table(text = "
state y1970 y1980 y1990 y2000
Alaska 0.02325581 0.03076923 NA 0.02941176
Iowa 0.05813953 0.10256410 0.13953488 0.16806723
Nevada 0.58139535 0.51282051 0.46511628 0.42016807
Ohio 0.29069767 0.30769231 NA 0.33613445
total 1.00000000 1.00000000 1.00000000 1.00000000
Wyoming 0.04651163 0.04615385 0.04651163 0.04621849
", sep = "", header = TRUE)
state <- as.vector(unlist(my.data[, 1]))
my.totals <- as.vector(unlist(my.data[ my.data$state=='total', 2:5]))
proportions <- matrix(NA, nrow=nrow(my.data), ncol=ncol(my.data))
proportions <- as.data.frame(proportions)
for(i in 1:nrow(my.data)) {
  for(j in 1:ncol(my.data)) {
    if(j==1) proportions[i,1] <- state[i]
    if(j> 1) proportions[i,j] <- my.data[i,j] / my.totals[j-1]
  }
}
colnames(proportions) <- names(my.data)
proportions
# state y1970 y1980 y1990 y2000
# 1 Alaska 0.02325581 0.03076923 NA 0.02941176
# 2 Iowa 0.05813953 0.10256410 0.13953488 0.16806723
# 3 Nevada 0.58139535 0.51282051 0.46511628 0.42016807
# 4 Ohio 0.29069767 0.30769231 NA 0.33613445
# 5 total 1.00000000 1.00000000 1.00000000 1.00000000
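# 6 Wyoming 0.04651163 0.04615385 0.04651163 0.04621849

Posted on 2012-11-22 07:16:10
Perhaps something like this:
df[, -1] <- lapply( df[ , -1], function(x) x/sum(x, na.rm=TRUE) )
If it were a matrix, you could use prop.table(mat, 2) directly (margin 2 gives column proportions). In this case, however, you need to restrict the operation to the numeric columns by excluding the first column.
Also, I think you need to exclude the "total" row: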
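The matrix route mentioned above can be sketched as follows, assuming the my.data frame from the question. This is only one possible way to do it: the names counts, totals, props and result are introduced here for illustration, and sweep() is used in place of prop.table() because prop.table(mat, 2) returns NA for every cell of a column that contains an NA.

```r
# Rebuild the question's data so the sketch runs on its own.
my.data <- read.table(text = "
state y1970 y1980 y1990 y2000
Alaska 4 6 NA 7
Iowa 10 20 30 40
Nevada 100 100 100 100
Ohio 50 60 NA 80
total 172 195 215 238
Wyoming 8 9 10 11
", header = TRUE)

# Keep only the numeric columns and move the state names into row names.
counts <- as.matrix(my.data[, -1])
rownames(counts) <- as.character(my.data$state)

# Look up the totals by row name instead of by position.
totals <- counts["total", ]

# Divide every column by its own total; NA cells stay NA.
props <- sweep(counts, 2, totals, "/")

# Put the result back into a data frame with the original layout.
result <- data.frame(state = my.data$state, props, row.names = NULL)
result
```

Selecting the row with counts["total", ] avoids hard-coding its position, which may help with the difficulty the question describes in extracting the totals.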
my.data[-5, -1] <- lapply( my.data[ -5 , -1], function(x){ x/sum(x, na.rm=TRUE)} )
my.data[ -5 , ]
state y1970 y1980 y1990 y2000
1 Alaska 0.02325581 0.03076923 NA 0.02941176
2 Iowa 0.05813953 0.10256410 0.21428571 0.16806723
3 Nevada 0.58139535 0.51282051 0.71428571 0.42016807
4 Ohio 0.29069767 0.30769231 NA 0.33613445
6 Wyoming 0.04651163 0.04615385 0.07142857 0.04621849

Another approach:
> my.data[,-1] <-lapply( my.data[ , -1], function(x){ x/x[5] } )
> my.data
state y1970 y1980 y1990 y2000
1 Alaska 0.02325581 0.03076923 NA 0.02941176
2 Iowa 0.05813953 0.10256410 0.13953488 0.16806723
3 Nevada 0.58139535 0.51282051 0.46511628 0.42016807
4 Ohio 0.29069767 0.30769231 NA 0.33613445
5 total 1.00000000 1.00000000 1.00000000 1.00000000
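6 Wyoming 0.04651163 0.04615385 0.04651163 0.04621849

This shows what prop.table returns when values are missing, using a very simple matrix: first over the whole matrix, then by rows, then by columns: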
> prop.table( matrix( c( 1,2,NA, 3),2) )
[,1] [,2]
[1,] NA NA
[2,] NA NA
> prop.table( matrix( c( 1,2,NA, 3),2), 1 )
[,1] [,2]
[1,] NA NA
[2,] 0.4 0.6
> prop.table( matrix( c( 1,2,NA, 3),2), 2 )
[,1] [,2]
[1,] 0.3333333 NA
[2,] 0.6666667 NA

Posted on 2021-10-13 20:39:12
Alternatively, you can do this:
library(tidyverse)
my.data = read.table(text = "
state y1970 y1980 y1990 y2000
Alaska 4 6 NA 7
Iowa 10 20 30 40
Nevada 100 100 100 100
Ohio 50 60 NA 80
total 172 195 215 238
Wyoming 8 9 10 11
", sep = "", header = TRUE)
my.data %>%
  # Convert table into long format
  pivot_longer(cols = -state, names_to = "year") %>%
  # (Optional) Convert year to numeric:
  mutate(year = as.numeric(gsub("^y", "", year))) %>%
  # Convert data frame to a table
  xtabs(formula = value ~ state + year) %>%
  # Calculate proportions:
  prop.table
#> year
#> state 1970 1980 1990 2000
#> Alaska 0.002555911 0.003833866 0.000000000 0.004472843
#> Iowa 0.006389776 0.012779553 0.019169329 0.025559105
#> Nevada 0.063897764 0.063897764 0.063897764 0.063897764
#> Ohio 0.031948882 0.038338658 0.000000000 0.051118211
#> total 0.109904153 0.124600639 0.137380192 0.152076677
#> Wyoming 0.005111821 0.005750799 0.006389776 0.007028754

Source: https://stackoverflow.com/questions/13503580
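
The tidyverse output above expresses each cell as a share of the grand total and shows the NA cells as 0. For column-wise proportions relative to the "total" row, which is what desired.result in the question contains, a dplyr variant might look like the sketch below. It is not the answer author's method; it assumes dplyr 1.0 or later for across(), and is_total and result are names introduced here for illustration.

```r
library(dplyr)

# Rebuild the question's data so the sketch runs on its own.
my.data <- read.table(text = "
state y1970 y1980 y1990 y2000
Alaska 4 6 NA 7
Iowa 10 20 30 40
Nevada 100 100 100 100
Ohio 50 60 NA 80
total 172 195 215 238
Wyoming 8 9 10 11
", header = TRUE)

# Logical index of the row holding the column totals (hypothetical helper).
is_total <- my.data$state == "total"

# Divide every non-state column by its own total; NA cells stay NA.
result <- my.data %>%
  mutate(across(-state, ~ .x / .x[is_total]))

result
```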