Trying to learn more R... I'm hoping to find a clean, easy-to-follow way to take an orders data frame (DF):
  customerID  Timestamp freq  lat
1          1 2017-01-01    2 31.0
2          2 2017-01-01    3 90.5
3          3 2017-01-01    1  NaN
4          4 2017-01-01    1  NaN
5          1 2017-02-01    2 31.0
6          2 2017-03-01    3 90.5
7          2 2017-07-01    3 90.5
and build a grid of counts over a set of buckets for lat and freq. Buckets, e.g.:
          lat
freq    61+  31-60  0-30
5+        0      0     0
2-4       3      2     0
1         0      0     2
dput:
> dput(orders)
structure(list(customerID = c(1L, 2L, 3L, 4L, 1L, 2L, 2L), Timestamp =
structure(c(17167,
17167, 17167, 17167, 17198, 17226, 17348), class = "Date"), freq = c(2L,
3L, 1L, 1L, 2L, 3L, 3L), lat = c(31, 90.5, NaN, NaN, 31, 90.5,
90.5)), .Names = c("customerID", "Timestamp", "freq", "lat"), row.names =
c(NA, 7L), class = "data.frame")
Update
I've been working on it... I used cut, but I'm not sure it's the best route, and I don't know how to finish the grid.
For example:
orders$freq_range <- cut(orders$freq, breaks=c(0,1,4,100000), labels=c("1","2-4","5+"))
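A minimal sketch of one way to finish the grid from the cut() call above, using the question's own dput data. The lat labels "61+" follow the desired grid, and the names `orders$lat_range` and `grid` are illustrative choices, not from the original. Rows whose lat is NaN become NA after cut() and are left out by table(), so the "1" row comes out as zeros rather than the 2 shown in the desired grid; if those rows really belong in 0-30, replacing the missing lat with 0 and passing include.lowest = TRUE to cut() would be one (assumed) way to get there.

```r
# Recreate the question's orders data (same values as the dput above)
orders <- data.frame(
  customerID = c(1L, 2L, 3L, 4L, 1L, 2L, 2L),
  Timestamp  = as.Date(c("2017-01-01", "2017-01-01", "2017-01-01", "2017-01-01",
                         "2017-02-01", "2017-03-01", "2017-07-01")),
  freq = c(2L, 3L, 1L, 1L, 2L, 3L, 3L),
  lat  = c(31, 90.5, NaN, NaN, 31, 90.5, 90.5)
)

# Bucket both columns with right-closed intervals, e.g. (1,4] -> "2-4"
orders$freq_range <- cut(orders$freq, breaks = c(0, 1, 4, 100000),
                         labels = c("1", "2-4", "5+"))
orders$lat_range <- cut(orders$lat, breaks = c(0, 30, 60, 100000),
                        labels = c("0-30", "31-60", "61+"))

# Cross-tabulate; the argument names become the dimension names
grid <- table(freq = orders$freq_range, lat = orders$lat_range)

# Reverse row and column order so the highest buckets come first,
# matching the layout of the desired grid
grid <- grid[rev(rownames(grid)), rev(colnames(grid))]
grid
```

Reindexing with rev() only changes the display order; the counts are the same ones table() produced.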
Posted on 2017-11-18 05:34:20
You can use table to get cross-tabulated output:
df
customerID  Timestamp freq   lat
         1 2017-01-01    2  31.0
         2 2017-01-01    3  90.5
         3 2017-01-01    1    NA
         4 2017-01-01    1    NA
         1 2017-02-01    2  31.0
         2 2017-03-01    3  90.5
         2 2017-07-01    3  90.5
         2 2017-07-01    5  90.5
         3 2017-07-01    6 100.5
df$a <- cut(df$freq, breaks=c(0,1,4,100000), labels=c("1","2-4","5+"))
df$b <- cut(df$lat, breaks=c(0,30,60,100000), labels=c("0-30","31-60","60+"))
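Because lat is continuous (31.0, 90.5), it helps to know exactly where cut() puts boundary values. A short check, assuming the same breaks and labels as the line above and a few made-up probe values: with the default right = TRUE each bucket is closed on the right, so "31-60" really means (30, 60], and a value of exactly 0 falls outside every bucket unless include.lowest = TRUE is set. The same applies to the freq breaks c(0, 1, 4, 100000), where a freq of 0 would become NA.

```r
# Same breaks and labels as the answer's lat bucketing
lat_breaks <- c(0, 30, 60, 100000)
lat_labels <- c("0-30", "31-60", "60+")

# Made-up probe values at and around each boundary
x <- c(0, 30, 30.5, 60, 60.1)

# Default right = TRUE gives (0,30], (30,60], (60,1e5]: 0 is outside every interval
default_buckets <- cut(x, breaks = lat_breaks, labels = lat_labels)
as.character(default_buckets)
# -> NA, "0-30", "31-60", "31-60", "60+"

# include.lowest = TRUE closes the first interval on the left, so 0 -> "0-30"
closed_buckets <- cut(x, breaks = lat_breaks, labels = lat_labels,
                      include.lowest = TRUE)
as.character(closed_buckets)
# -> "0-30", "0-30", "31-60", "31-60", "60+"
```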
df
customerID  Timestamp freq   lat   a     b
         1 2017-01-01    2  31.0 2-4 31-60
         2 2017-01-01    3  90.5 2-4   60+
         3 2017-01-01    1    NA   1  <NA>
         4 2017-01-01    1    NA   1  <NA>
         1 2017-02-01    2  31.0 2-4 31-60
         2 2017-03-01    3  90.5 2-4   60+
         2 2017-07-01    3  90.5 2-4   60+
         2 2017-07-01    5  90.5  5+   60+
         3 2017-07-01    6 100.5  5+   60+
table(df$a, df$b)
      0-30 31-60 60+
  1      0     0   0
  2-4    0     2   3
  5+     0     0   2
https://stackoverflow.com/questions/47361387
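In the table above, the two rows with a missing lat (customers 3 and 4) do not appear anywhere: cut() returns NA for them and table() drops NA by default, which is why the "1" row is all zeros. A sketch, recreating the answer's df by hand, of how table()'s useNA = "ifany" argument keeps those rows visible as an extra <NA> column, and how addmargins() appends row and column totals:

```r
# Recreate the answer's df (9 rows; lat is missing for customers 3 and 4)
df <- data.frame(
  customerID = c(1, 2, 3, 4, 1, 2, 2, 2, 3),
  Timestamp  = as.Date(c("2017-01-01", "2017-01-01", "2017-01-01", "2017-01-01",
                         "2017-02-01", "2017-03-01", "2017-07-01", "2017-07-01",
                         "2017-07-01")),
  freq = c(2, 3, 1, 1, 2, 3, 3, 5, 6),
  lat  = c(31, 90.5, NA, NA, 31, 90.5, 90.5, 90.5, 100.5)
)
df$a <- cut(df$freq, breaks = c(0, 1, 4, 100000), labels = c("1", "2-4", "5+"))
df$b <- cut(df$lat,  breaks = c(0, 30, 60, 100000), labels = c("0-30", "31-60", "60+"))

# Default: rows whose lat bucket is NA are silently left out of the counts
counts <- table(df$a, df$b)
sum(counts)   # 7 of the 9 rows

# useNA = "ifany" adds an <NA> column because b contains NA (a has none)
counts_na <- table(df$a, df$b, useNA = "ifany")
counts_na

# addmargins() appends a "Sum" row and column with the totals
addmargins(counts_na)
```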