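I have data with four columns: the first holds the county name, the second the period, the third the observed value (IPC class), and the fourth the forecast value (Forecast). Both the observed and the forecast values range from 1 to 5. Here are 32 rows of the data, sorted by county.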
structure(list(County = c("Baringo", "Baringo", "Baringo", "Baringo",
"Baringo", "Baringo", "Baringo", "Baringo", "Baringo", "Baringo",
"Baringo", "Baringo", "Baringo", "Baringo", "Baringo", "Baringo",
"Baringo", "Baringo", "Baringo", "Baringo", "Baringo", "Baringo",
"Baringo", "Baringo", "Baringo", "Baringo", "Baringo", "Baringo",
"Baringo", "Baringo", "Baringo", "Baringo"), `Period of measurement Kenya` = c("2011-01",
"2011-04", "2011-07", "2011-10", "2012-01", "2012-04", "2012-07",
"2012-10", "2013-01", "2013-04", "2013-07", "2013-10", "2014-01",
"2014-04", "2014-07", "2014-10", "2015-01", "2015-04", "2015-07",
"2015-10", "2016-02", "2016-06", "2016-10", "2017-02", "2017-06",
"2017-10", "2018-02", "2018-06", "2018-10", "2018-12", "2019-02",
"2019-06"), `IPC class` = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 3, 2, 1, 1, 1, 1, 1, 2
), Forecast = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 1, 1, 2, 2, 1, 1, 2, 1, 2, 3, 1, 1, 1, 1, 2, 1)), row.names = c(1L,
48L, 95L, 142L, 189L, 236L, 283L, 330L, 377L, 424L, 471L, 518L,
565L, 612L, 659L, 706L, 753L, 800L, 847L, 894L, 941L, 988L, 1035L,
1082L, 1129L, 1176L, 1223L, 1270L, 1317L, 1364L, 1411L, 1458L
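), class = "data.frame")

For my report, I need to know how many crisis transitions, and how many mis-forecast crisis transitions, occurred during the period I am studying. A crisis transition is a change in the observed value column from 1 or 2 to 3, 4, or 5. In the data above you can see that Baringo county had one crisis transition. To count these, I used the following code: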
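SUB_count_cristrans_KE <- long.SUB_dfCSKE_tot %>%
  mutate(crisis = ifelse(`IPC class` %in% 3:5, 1, 0)) %>%
  arrange(County, `Period of measurement Kenya`) %>%
  group_by(County) %>%
  summarize(SUB_crisis_trans_count = sum(diff(crisis) > 0))

A mis-forecast crisis transition is one where, at the moment a crisis transition occurs, the Forecast column does not show the same value as the IPC class column. As you can see in the data above, Baringo's crisis transition was mis-forecast, because the value in the Forecast column is not 3, 4, or 5. So my question is: what is the correct condition in the ifelse function to extract the mis-forecast crisis periods per county? In other words, it first has to check whether a period is a crisis transition, i.e. a move from 1 or 2 to 3, 4, or 5. If so, is the value in the Forecast column a 3, 4, or 5? If it is not, this is a mis-forecast crisis transition. My current code is: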
SUB_count_crismiss_KE <- long.SUB_dfCSKE_tot %>%
  mutate(crisis_miss = ifelse(`IPC class` %in% 3:5 & (!Forecast %in% 3:5), 1, 0)) %>%
  arrange(County, `Period of measurement Kenya`) %>%
  group_by(County) %>%
  summarize(SUB_crisis_miss_count_KE = sum(diff(crisis_miss) > 0))

Let me know if I need to add or clarify anything! Thanks in advance.
Below, I have singled out Garissa county to make clearer what problem I want to solve and what I want to achieve. ;)
> subset(sorted_long.SUB_dfCSKE_tot, County=="Garissa")
      County Period of measurement Kenya IPC class Forecast
7    Garissa                     2011-01         2        3
54   Garissa                     2011-04         2        2
101  Garissa                     2011-07         3        3
148  Garissa                     2011-10         3        2
195  Garissa                     2012-01         2        2
242  Garissa                     2012-04         2        2
289  Garissa                     2012-07         3        3
336  Garissa                     2012-10         3        2
383  Garissa                     2013-01         2        2
430  Garissa                     2013-04         2        2
477  Garissa                     2013-07         2        2
524  Garissa                     2013-10         2        2
571  Garissa                     2014-01         2        2
618  Garissa                     2014-04         2        2
665  Garissa                     2014-07         2        2
712  Garissa                     2014-10         3        2
759  Garissa                     2015-01         3        2
806  Garissa                     2015-04         3        2
853  Garissa                     2015-07         2        2
900  Garissa                     2015-10         2        2
947  Garissa                     2016-02         2        2
994  Garissa                     2016-06         2        2
1041 Garissa                     2016-10         2        2
1088 Garissa                     2017-02         3        2
1135 Garissa                     2017-06         3        3
1182 Garissa                     2017-10         2        3
1229 Garissa                     2018-02         3        2
1276 Garissa                     2018-06         1        3
1323 Garissa                     2018-10         1        1
1370 Garissa                     2018-12         2        1
1417 Garissa                     2019-02         2        2
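1464 Garissa                     2019-06         2        2

Between 2011-04 and 2011-07 a crisis transition occurred: the IPC value moved from 2 to 3. Between 2011-07 and 2011-10, however, there was no crisis transition, because the IPC value stayed at 3. Now for the forecast part. The crisis transition between 2011-04 and 2011-07 was forecast correctly, since the forecast value was 3, 4, or 5. The forecast value for 2011-10 is wrong, but because there is no crisis transition in that period it should not be counted. So how can I skip the forecast values of periods without a crisis transition? I hope this is clearer now.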
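To make the difference concrete, here is a minimal base R sketch that compares the rows counted by the current `diff(crisis_miss) > 0` approach with the rows where a crisis transition actually happens. The IPC and Forecast values are copied from the Garissa data; the vector names `ipc`, `forecast`, `flagged_rows` and `transition_rows` are only illustrative and do not exist in the original data frame.

```r
# Garissa values, in period order (2011-01 ... 2019-06)
ipc <- c(2, 2, 3, 3, 2, 2, 3, 3, 2, 2, 2, 2, 2, 2, 2, 3,
         3, 3, 2, 2, 2, 2, 2, 3, 3, 2, 3, 1, 1, 2, 2, 2)
forecast <- c(3, 2, 3, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2,
              2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 2, 3, 1, 1, 2, 2)

crisis      <- ipc %in% 3:5                    # observed crisis period
crisis_miss <- crisis & !(forecast %in% 3:5)   # crisis period with a non-crisis forecast

# Rows counted by diff(crisis_miss) > 0, i.e. by the current code.
# "+ 1" because element i of diff() compares row i + 1 with row i.
flagged_rows <- which(diff(crisis_miss) > 0) + 1
flagged_rows
# 4 8 16 24 27

# Rows where the observed series really moves from non-crisis into crisis
transition_rows <- which(diff(crisis) > 0) + 1
transition_rows
# 3 7 16 24 27
```

Rows 4 and 8 (2011-10 and 2012-10) are flagged even though the IPC value simply stayed at 3, which is exactly the case that should be skipped.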
dput of the Garissa subset:
> copied_sorted_long <- dput(sorted_long.SUB_dfCSKE_tot[193:224,])
structure(list(County = c("Garissa", "Garissa", "Garissa", "Garissa",
"Garissa", "Garissa", "Garissa", "Garissa", "Garissa", "Garissa",
"Garissa", "Garissa", "Garissa", "Garissa", "Garissa", "Garissa",
"Garissa", "Garissa", "Garissa", "Garissa", "Garissa", "Garissa",
"Garissa", "Garissa", "Garissa", "Garissa", "Garissa", "Garissa",
"Garissa", "Garissa", "Garissa", "Garissa"), `Period of measurement Kenya` = c("2011-01",
"2011-04", "2011-07", "2011-10", "2012-01", "2012-04", "2012-07",
"2012-10", "2013-01", "2013-04", "2013-07", "2013-10", "2014-01",
"2014-04", "2014-07", "2014-10", "2015-01", "2015-04", "2015-07",
"2015-10", "2016-02", "2016-06", "2016-10", "2017-02", "2017-06",
"2017-10", "2018-02", "2018-06", "2018-10", "2018-12", "2019-02",
"2019-06"), `IPC class` = c(2, 2, 3, 3, 2, 2, 3, 3, 2, 2, 2,
2, 2, 2, 2, 3, 3, 3, 2, 2, 2, 2, 2, 3, 3, 2, 3, 1, 1, 2, 2, 2
), Forecast = c(3, 2, 3, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 2, 3, 1, 1, 2, 2)), row.names = c(7L,
54L, 101L, 148L, 195L, 242L, 289L, 336L, 383L, 430L, 477L, 524L,
571L, 618L, 665L, 712L, 759L, 806L, 853L, 900L, 947L, 994L, 1041L,
1088L, 1135L, 1182L, 1229L, 1276L, 1323L, 1370L, 1417L, 1464L
), class = "data.frame")

Posted on 2020-02-11 18:44:08
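I have created a variable data that contains the Garissa data (just to keep the names simple). If I understand you correctly, you only want to count a mis-forecast when there is an actual transition. If there is no transition, then by definition there is no mis-forecast (or we do not care about those cases). In that case, I think the code below does what you need (the intermediate data1 step and the summary can of course be merged into one long pipe). Again, for clarity: the data data frame below is identical to the Garissa subset you provided via dput.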
library(dplyr)

# Flag crisis periods (classes 3-5) in the observed and in the forecast series
data1 <- data %>%
  mutate(crisis = ifelse(`IPC class` %in% 3:5, 1, 0),
         crisis_f = ifelse(Forecast %in% 3:5, 1, 0)) %>%
  arrange(County, `Period of measurement Kenya`) %>%
  group_by(County) %>%
  # A transition is a rise from 0 to 1 relative to the previous period
  mutate(crisis_trans = (crisis - lag(crisis)) > 0,
         crisis_trans_f = (crisis_f - lag(crisis_f)) > 0,
         misforecast = case_when(
           crisis_trans & crisis_trans_f ~ FALSE,
           crisis_trans & !crisis_trans_f ~ TRUE,
           TRUE ~ FALSE
         ))

# Count transitions and mis-forecast transitions per county
summary <- data1 %>%
  group_by(County) %>%
  summarise(n_transitions = sum(crisis_trans, na.rm = TRUE),
            n_misforecast = sum(misforecast))
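> summary
# A tibble: 1 x 3
  County  n_transitions n_misforecast
  <chr>           <int>         <int>
1 Garissa             5             3

The logic is that we first create the observed transitions and the forecast transitions. Then, if and only if there is an observed transition, we classify the forecast as a mis-forecast when it did not forecast a transition. All other cases are classified as not mis-forecast. You do not necessarily need case_when, but I like it because it makes very clear what is happening.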
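As a hedged variation, the sketch below applies the question's literal definition instead: a transition counts as missed when the Forecast value in the transition period itself is not 3, 4, or 5. This differs from the forecast-transition rule above only when the forecast was already at 3 or higher in the previous period, and on the two counties shown in the question both rules give the same counts. The data frame `both`, the shortened column names `Period` and `IPC`, and the column `missed` are assumptions made for this illustration; the values are copied from the two dput outputs in the question.

```r
library(dplyr)

periods <- c("2011-01", "2011-04", "2011-07", "2011-10", "2012-01", "2012-04",
             "2012-07", "2012-10", "2013-01", "2013-04", "2013-07", "2013-10",
             "2014-01", "2014-04", "2014-07", "2014-10", "2015-01", "2015-04",
             "2015-07", "2015-10", "2016-02", "2016-06", "2016-10", "2017-02",
             "2017-06", "2017-10", "2018-02", "2018-06", "2018-10", "2018-12",
             "2019-02", "2019-06")

both <- data.frame(
  County = rep(c("Baringo", "Garissa"), each = 32),
  Period = rep(periods, times = 2),
  IPC = c(
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    2, 2, 2, 2, 1, 1, 1, 2, 3, 2, 1, 1, 1, 1, 1, 2,   # Baringo
    2, 2, 3, 3, 2, 2, 3, 3, 2, 2, 2, 2, 2, 2, 2, 3,
    3, 3, 2, 2, 2, 2, 2, 3, 3, 2, 3, 1, 1, 2, 2, 2),  # Garissa
  Forecast = c(
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    1, 1, 2, 2, 1, 1, 2, 1, 2, 3, 1, 1, 1, 1, 2, 1,   # Baringo
    3, 2, 3, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 2, 3, 1, 1, 2, 2)   # Garissa
)

counts <- both %>%
  arrange(County, Period) %>%
  group_by(County) %>%
  mutate(
    crisis = IPC %in% 3:5,
    # TRUE only where a county moves from non-crisis into crisis;
    # default = TRUE means the first row of a county is never a transition
    crisis_trans = crisis & !lag(crisis, default = TRUE),
    # Missed: a real transition whose own forecast is not a crisis class
    missed = crisis_trans & !(Forecast %in% 3:5)
  ) %>%
  summarise(n_transitions = sum(crisis_trans),
            n_missed = sum(missed))

counts
# Baringo: 1 transition, 1 missed; Garissa: 5 transitions, 3 missed
```

Because every intermediate column is a plain logical vector without NA values, the sums need no na.rm argument, and the same pipe works unchanged on a data frame with more counties.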
https://stackoverflow.com/questions/60150378