I have been trying to merge these two data.tables, which look like this:
structure(list(orderDate = structure(c(18414, 18444, 18475, 18506,
18536, 18567, 18597, 18628, 18659, 18687, 18718, 18748, 18779
), class = "Date"), productName = c("A. De La Sota Lady", "A. De La Sota Lady",
"A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady",
"A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady",
"A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady",
"A. De La Sota Lady", "A. De La Sota Lady"), totalOrders = c(2L,
15L, 52L, 225L, 27L, 10L, 5L, 19L, 36L, 41L, 58L, 16L, 2L)), row.names = c(NA,
-13L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x0000024e1b7d1ef0>, sorted = "orderDate")
and
structure(list(returnDate = structure(c(18444, 18475, 18506,
18536, 18567, 18597, 18628, 18659, 18687, 18718, 18748, 18779
), class = "Date"), productName = c("A. De La Sota Lady", "A. De La Sota Lady",
"A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady",
"A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady",
"A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady",
"A. De La Sota Lady"), totalReturns = c(5L, 10L, 129L, 73L, 18L,
3L, 8L, 15L, 43L, 44L, 30L, 6L), orderDate = structure(c(18444,
18475, 18506, 18536, 18567, 18597, 18628, 18659, 18687, 18718,
18748, 18779), class = "Date")), row.names = c(NA, -12L), class = c("data.table",
"data.frame"), .internal.selfref = <pointer: 0x0000024e1b7d1ef0>, sorted = "orderDate")
The result is the merged data.table:
structure(list(orderDate = structure(c(18444, 18475, 18506, 18536,
18567, 18597, 18628, 18659, 18687, 18718, 18748, 18779), class = "Date"),
productName = c("A. De La Sota Lady", "A. De La Sota Lady",
"A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady",
"A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady",
"A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady",
"A. De La Sota Lady"), totalOrders = c(15L, 52L, 225L, 27L,
10L, 5L, 19L, 36L, 41L, 58L, 16L, 2L), totalReturns = c(5L,
10L, 129L, 73L, 18L, 3L, 8L, 15L, 43L, 44L, 30L, 6L)), sorted = "orderDate", class = c("data.table",
"data.frame"), row.names = c(NA, -12L), .internal.selfref = <pointer: 0x0000024e1b7d1ef0>)
However, the returnTest table is missing one date row.
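I tried merging with the productName column as the key column, but for some reason it kept giving me an error, so this is the only way I have been able to merge the two tables without one. Ultimately I want a single data.table for checking a product's return rate, but with this approach I always lose a month in which there are orders but no returns, or vice versa. Can anyone help? I have been trying to solve this for a week.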
test1 <- ordersByProductNameAndSize[`productName` == 'A. De La Sota Lady' ]
setkeyv(test1, 'orderDate')
test2 <- returnsByProductNameAndSize[`productName` == 'A. De La Sota Lady' ]
test2[, 'orderDate' := returnDate]
setkeyv(test2, 'orderDate')
returnTest <- merge(test1, test2[, c('orderDate', 'totalReturns'), all = TRUE, with = FALSE]) # , 'totalReturns'
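returnTest[, 'returnRate' := ((totalReturns / totalOrders) *100)]
Posted on 2021-07-01 13:41:57
Thanks for posting your data! If I understand correctly, your "missing" value is the items ordered on 2020-06-01, a date on which there were no returns, right? If so, merging with all = TRUE keeps that row and leaves totalReturns as NA: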
library(data.table)
t1 <- structure(list(
orderDate = structure(c(18414, 18444, 18475, 18506, 18536, 18567, 18597, 18628, 18659, 18687, 18718, 18748, 18779 ), class = "Date"),
productName = c("A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady"),
totalOrders = c(2L, 15L, 52L, 225L, 27L, 10L, 5L, 19L, 36L, 41L, 58L, 16L, 2L)),
row.names = c(NA, -13L),
class = c("data.table", "data.frame"))
t2 <- structure(list(
returnDate = structure(c(18444, 18475, 18506, 18536, 18567, 18597, 18628, 18659, 18687, 18718, 18748, 18779 ), class = "Date"),
productName = c("A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady"),
totalReturns = c(5L, 10L, 129L, 73L, 18L, 3L, 8L, 15L, 43L, 44L, 30L, 6L),
orderDate = structure(c(18444, 18475, 18506, 18536, 18567, 18597, 18628, 18659, 18687, 18718, 18748, 18779), class = "Date")),
row.names = c(NA, -12L),
class = c("data.table", "data.frame"))
rt <- merge(t1, t2, by = "orderDate", all = TRUE)
# calculate return rate
rt$returnRate <- (rt$totalReturns / rt$totalOrders) * 100
https://stackoverflow.com/questions/68211227
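For comparison with the code in the question: there, `all = TRUE` sits inside the square brackets of `test2[...]`, so it never reaches `merge()`, which then uses its default `all = FALSE` and drops any date present in only one table. Below is a minimal sketch of the same merge with the argument moved outside the brackets. It uses only the first three months of the posted data; the names `orders` and `returns` are made up for illustration, and treating a missing return count as zero is an assumption that may not suit every analysis.

```r
library(data.table)

# First three months of the posted data; 2020-06-01 has orders but no returns
orders <- data.table(
  orderDate   = as.Date(c("2020-06-01", "2020-07-01", "2020-08-01")),
  totalOrders = c(2L, 15L, 52L)
)
returns <- data.table(
  orderDate    = as.Date(c("2020-07-01", "2020-08-01")),
  totalReturns = c(5L, 10L)
)

# all = TRUE is an argument of merge(), so it goes outside the [ ] subset
rt <- merge(orders,
            returns[, c("orderDate", "totalReturns"), with = FALSE],
            by = "orderDate", all = TRUE)

# Assumption: no matching returns row means zero returns for that month
rt[is.na(totalReturns), totalReturns := 0L]

# Return rate in percent
rt[, returnRate := totalReturns / totalOrders * 100]
print(rt)
```

With the full outer join the 2020-06-01 row is kept, so `rt` has three rows instead of two. If a month could have returns but no orders, `totalOrders` would be NA for that row and the rate would need separate handling.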