
Computing rmse using ddply

Stack Overflow user
Asked 2015-08-04 20:04:05
Answers: 1 · Views: 115 · Followers: 0 · Votes: 1

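I am using ddply to compute the rmse, along with other summary statistics, for every (id, criteria) combination in a large data frame. The structure of the data frame is: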

'data.frame':   107955 obs. of  11 variables:
 $ date         : Factor w/ 1077 levels "2012-08-17","2012-08-18",..: 487 488 489 490 491 492 493 494 495 496 ...
 $ value        : num  
 $ mean         : num  
 $ accuracy     : num  
 $ id           : int  
 $ criteria     : Factor w/ 5 levels 

I tried the following:

ddply(foo, .(id, criteria), summarize, mean=mean(accuracy, na.rm=T), median=median(accuracy, na.rm=T), rmse=sqrt(sum((mean - value)^2 , na.rm = TRUE ) / nrow(foo)))

nrow(foo) gives the number of rows in the whole data frame, not the number of rows in the (id, criteria) slice.

I also tried nrow(.(id, criteria)), which is obviously wrong.
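To make the difference between the two row counts concrete, here is a minimal base R sketch. The data frame toy and its values are made up for illustration and are not the data from this question.

```r
# Toy data (hypothetical values, not the question's data)
toy <- data.frame(
  id       = c(1L, 1L, 1L, 2L, 2L),
  criteria = c("a", "a", "b", "a", "a"),
  value    = c(10, 20, 30, 40, 50)
)

# nrow(toy) is the size of the whole frame, whichever group is being summarized
total_rows <- nrow(toy)

# Rows per (id, criteria) slice: split on both keys, then count each piece
slices <- split(toy, list(toy$id, toy$criteria), drop = TRUE)
rows_per_slice <- sapply(slices, nrow)

print(total_rows)
print(rows_per_slice)
```

Dividing by total_rows inside a grouped summary would therefore use the same denominator for every slice.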

Sample data: http://pastebin.com/8m0vD5Bq

ddply(foo, .(id, criteria), summarize, mean=mean(accuracy, na.rm=T), median=median(accuracy, na.rm=T), rmse=sqrt(sum((mean - value)^2 , na.rm = TRUE ) / n()))

   id criteria   mean median   rmse
1  49        g 123.00  123.0 101.00
2  49        h 115.25   72.0  80.31
3  49        I 196.00  110.0 173.75
4  50        f 191.75  204.5 168.59
5  50        g 649.00  275.0 634.92
6  51        d 180.00  180.0 160.00
7  51        e 378.67  137.5 359.19
8  51        f 247.00  247.0 227.08
9  52        a 109.00  107.0  74.18
10 52        b  76.33   45.0  46.31
11 52        d  98.67  100.0  64.56

Computing the rmse for id = 50 and criteria = 'g':

 sub_foo <- foo[foo$id == 50 & foo$criteria=='g',]

R> sub_foo
         date value mean accuracy id criteria
23 2014-01-08     2   37     1850 50        g
24 2014-01-09    12   33      275 50        g
25 2014-01-10    19   48      253 50        g
26 2014-01-11    35   35      100 50        g
27 2014-01-12     3   23      767 50        g

R> sqrt(sum((sub_foo$mean -sub_foo$value)^2 , na.rm = TRUE ) / nrow(sub_foo))
[1] 24.11

The expected rmse is 24.11, but ddply gives 634.92, which is wrong.
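One plausible explanation for the 634.92, offered as an assumption rather than something stated in the post: plyr's summarize appears to evaluate its arguments in order, so once mean=mean(accuracy, ...) has been computed, the name mean in the rmse expression refers to that scalar instead of the data column mean. The base R sketch below reproduces both numbers for the id = 50, criteria = 'g' slice. The variable names rmse_expected, shadowed_mean and rmse_shadowed are illustrative.

```r
# Rows for id == 50, criteria == "g", copied from the sub_foo printout
sub_foo <- data.frame(
  value    = c(2, 12, 19, 35, 3),
  mean     = c(37, 33, 48, 35, 23),
  accuracy = c(1850, 275, 253, 100, 767)
)

# Expected RMSE: uses the data column `mean`
rmse_expected <- sqrt(sum((sub_foo$mean - sub_foo$value)^2) / nrow(sub_foo))

# What the summarize call seems to compute: `mean` is the scalar mean(accuracy) = 649
shadowed_mean <- mean(sub_foo$accuracy)
rmse_shadowed <- sqrt(sum((shadowed_mean - sub_foo$value)^2) / nrow(sub_foo))

print(round(rmse_expected, 2))  # 24.11
print(round(rmse_shadowed, 2))  # 634.92
```

The same check works for id = 49, criteria = 'g', which has a single row (value 22, accuracy 123): |123 - 22| = 101, matching the 101.00 in the ddply output.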

Edit: added the dput of the data frame

R>dput(foo)
structure(list(date = structure(1:36, .Label = c("2013-12-17", 
"2013-12-18", "2013-12-19", "2013-12-20", "2013-12-21", "2013-12-22", 
"2013-12-23", "2013-12-24", "2013-12-25", "2013-12-26", "2013-12-27", 
"2013-12-28", "2013-12-29", "2013-12-30", "2013-12-31", "2014-01-01", 
"2014-01-02", "2014-01-03", "2014-01-04", "2014-01-05", "2014-01-06", 
"2014-01-07", "2014-01-08", "2014-01-09", "2014-01-10", "2014-01-11", 
"2014-01-12", "2014-01-13", "2014-01-14", "2014-01-15", "2014-01-16", 
"2014-01-17", "2014-01-18", "2014-01-19", "2014-01-20", "2014-01-21"
), class = "factor"), value = c(33L, 30L, 42L, 15L, 36L, 44L, 
31L, 30L, 42L, 20L, 25L, 9L, 25L, 17L, 3L, 39L, 14L, 26L, 14L, 
41L, 23L, 16L, 2L, 12L, 19L, 35L, 3L, 22L, 8L, 50L, 48L, 41L, 
30L, 40L, 6L, 15L), mean = c(33L, 36L, 45L, 25L, 6L, 20L, 34L, 
30L, 36L, 36L, 19L, 49L, 11L, 32L, 40L, 34L, 47L, 41L, 45L, 15L, 
25L, 48L, 37L, 33L, 48L, 35L, 23L, 27L, 24L, 28L, 42L, 7L, 14L, 
37L, 31L, 19L), accuracy = c(100L, 120L, 107L, 167L, 17L, 45L, 
110L, 100L, 86L, 180L, 76L, 544L, 44L, 188L, 1333L, 87L, 336L, 
158L, 321L, 37L, 109L, 300L, 1850L, 275L, 253L, 100L, 767L, 123L, 
300L, 56L, 88L, 17L, 47L, 93L, 517L, 127L), id = c(52L, 52L, 
52L, 52L, 52L, 52L, 52L, 52L, 52L, 51L, 51L, 51L, 51L, 51L, 51L, 
51L, 51L, 51L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 49L, 
49L, 49L, 49L, 49L, 49L, 49L, 49L, 49L), criteria = structure(c(1L, 
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 
5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 
8L, 8L, 8L), .Label = c("a", "b", "d", "e", "f", "g", "h", "I"
), class = "factor")), .Names = c("date", "value", "mean", "accuracy", 
"id", "criteria"), class = "data.frame", row.names = c(NA, -36L
))

1 Answer

Stack Overflow user

Answered 2015-08-05 15:01:32

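The solution that worked for me was to use a custom function instead of summarize. Inside that function I can call nrow() to get the number of rows in the slice.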

Solution:

metrics <- ddply(foo, c("id", "criteria"), function(df) {
  data.frame(
    mean   = mean(df$accuracy, na.rm = TRUE),
    median = median(df$accuracy, na.rm = TRUE),
    rmse   = sqrt(sum((df$mean - df$value)^2, na.rm = TRUE) / nrow(df))
  )
})

Thanks for the pointers.
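A possible alternative that keeps summarize, sketched under the assumption that plyr's summarize evaluates its arguments in order, so a summary column named mean would replace the data column mean in later expressions. The idea is to give the summary columns names that do not collide with existing columns and to use length(value) as the per-slice row count. The names mean_accuracy and median_accuracy are illustrative, and foo here is a six-row excerpt of the posted data (rows 23 to 28), not the full frame.

```r
library(plyr)  # assumes the plyr package is installed

# Six-row excerpt of the posted data: id 50 / g (five rows) and id 49 / g (one row)
foo <- data.frame(
  value    = c(2L, 12L, 19L, 35L, 3L, 22L),
  mean     = c(37L, 33L, 48L, 35L, 23L, 27L),
  accuracy = c(1850L, 275L, 253L, 100L, 767L, 123L),
  id       = c(50L, 50L, 50L, 50L, 50L, 49L),
  criteria = factor(rep("g", 6))
)

metrics <- ddply(foo, .(id, criteria), summarize,
  mean_accuracy   = mean(accuracy, na.rm = TRUE),
  median_accuracy = median(accuracy, na.rm = TRUE),
  # `mean` is still the data column here, because no summary column reuses that name
  rmse = sqrt(sum((mean - value)^2, na.rm = TRUE) / length(value))
)

print(metrics)
```

Note that length(value) also counts rows where mean or value is NA. If the data contains missing values, sum(!is.na(mean - value)) may be a more appropriate denominator.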

Votes: 0
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/31808862
