首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >哪些产品最常在一起销售?-分析练习

哪些产品最常在一起销售?-分析练习
EN

Stack Overflow用户
提问于 2022-08-01 23:42:52
回答 3查看 111关注 0票数 -1

我一直试图在R中做一次关于销售的数据分析练习。数据文件就像:

代码语言:javascript
复制
   Order_ID Product                 
      <dbl> <chr>                   
 1   319631 34in Ultrawide Monitor  
 2   319631 Lightning Charging Cable
 3   319596 iPhone                  
 4   319596 Lightning Charging Cable
 5   319584 iPhone                  
 6   319584 Wired Headphones        
 7   319556 Google Phone            
 8   319556 Wired Headphones

我必须找出哪些产品是最常一起购买的,Order_ID有副本,即它们是由同一个人购买的。

我用Python做了这个节选,但我不能在R上做,我的Python代码是:

代码语言:javascript
复制
pares_compras[['Order ID', 'Product']]

>  Order ID Product
2   176560  Google Phone
3   176560  Wired Headphones
17  176574  Google Phone
18  176574  USB-C Charging Cable
29  176585  Bose SoundSport Headphones

pares_compras.groupby('Order ID')['Product'].transform(lambda x: ','.join(x))

>
2                             Google Phone,Wired Headphones
3                             Google Phone,Wired Headphones
17                        Google Phone,USB-C Charging Cable
18                        Google Phone,USB-C Charging Cable

pares_compras = pares_compras[['Order ID', 'Grouped Products']].drop_duplicates()
pares_compras

>   Order ID    Grouped Products
2   176560  Google Phone,Wired Headphones
17  176574  Google Phone,USB-C Charging Cable
29  176585  Bose SoundSport Headphones,Bose SoundSport Hea...
31  176586  AAA Batteries (4-pack),Google Phone
118 176672  Lightning Charging Cable,USB-C Charging Cable

count = Counter()

for row in pares_compras['Grouped Products']:
    row_list = row.split(',')
    count.update(Counter(combinations(row_list, 2)))
count

> Counter({('Google Phone', 'Wired Headphones'): 414,
         ('Google Phone', 'USB-C Charging Cable'): 987,
         ('Bose SoundSport Headphones', 'Bose SoundSport Headphones'): 27, ... )}

for key, num in count.most_common(5):
    print(key, num)
>
('iPhone', 'Lightning Charging Cable') 1005
('Google Phone', 'USB-C Charging Cable') 987
('iPhone', 'Wired Headphones') 447
('Google Phone', 'Wired Headphones') 414
('Vareebadd Phone', 'USB-C Charging Cable') 361

所以,我可以解决这个问题,但是,就像我之前说过的,我不能在R中做同样的事情,我找不到方法,我刚在R里开始,如果有人能帮我,我会很感激的,萨克斯。

EN

回答 3

Stack Overflow用户

发布于 2022-08-02 01:13:44

嗯,我认为共现矩阵实际上是一个很好的解决方案。

另一种方法是考虑产品配置文件有多不同或有多相似。

代码语言:javascript
复制
orders <- read.csv(header = TRUE, text ='
"row", "order", "product"
1,   319631, "34in Ultrawide Monitor"
2,   319631, "Lightning Charging Cable"
3,   319596, "iPhone"
4,   319596, "Lightning Charging Cable"
5,   319584, "iPhone"
6,   319584, "Wired Headphones"
7,   319556, "Google Phone"
8,   319556, "Wired Headphones"')  |>
  dplyr::mutate(product = trimws(product))

df <- tidyr::pivot_wider(orders,
                   values_from = product, 
                   names_from = product, 
                   id_cols = order) |>
  dplyr::mutate(across( `34in Ultrawide Monitor`:`Google Phone` ,
                        ~!is.na(.x))) |>
  select(-order) 
cor(df)

dist(t(df))
dist(t(df), method = "binary")
票数 0
EN

Stack Overflow用户

发布于 2022-08-02 02:03:25

我就把这个留给你,作为你的另一个选择。

在这里,我在两个data.frames中列出了唯一的组合,并使用嵌套的apply函数检查是否相同,并在cbinding之后用rowsums计算结果。

代码语言:javascript
复制
a <- expand.grid(a = df$Product,b = df$Product) |>
  rowwise() |> 
  mutate(c = list(sort(c(a, b))), a = c[[1]], b = c[[2]]) |> 
  distinct() |> 
  filter(a != b)
  
  b <- df |> 
  group_by(Order_ID) |> 
  summarise(Product = list(c(Product)))

     
a$count <- rowSums(do.call(cbind, 
lapply(b$Product, \(one) sapply(a$c, \(two) +(all(two %in% one))))))
代码语言:javascript
复制
   a                      b                      count
   <chr>                  <chr>                  <dbl>
 1 34inUltrawideMonitor   LightningChargingCable     1
 2 34inUltrawideMonitor   iPhone                     0
 3 34inUltrawideMonitor   WiredHeadphones            0
 4 34inUltrawideMonitor   GooglePhone                0
 5 iPhone                 LightningChargingCable     1
 6 LightningChargingCable WiredHeadphones            0
 7 GooglePhone            LightningChargingCable     0
 8 iPhone                 WiredHeadphones            1
 9 GooglePhone            iPhone                     0
10 GooglePhone            WiredHeadphones            1
票数 0
EN

Stack Overflow用户

发布于 2022-08-02 14:55:12

使用data.table连接而不是共生矩阵的解决方案。使用更大的数据集(~3M行),它在我的机器上的速度几乎是使用来自this answerthis answer的两倍。

代码语言:javascript
复制
library(data.table)
library(Matrix) # for comparison with a co-occurrence matrix solution

# Example dataset
n <- 1e6L
orderID <- rep.int(sample.int(n), rpois(n, 1) + 2L)
dt <- unique(data.table(orderID, product = stringi::stri_rand_strings(length(orderID), 2, pattern = "[a-z]")))

# solution using a data.table join
f1 <- function(dt) {
  dt2 <- dt[
    , x := .I
  ][
    dt,
    on = .(orderID = orderID, x > x),
    nomatch = 0
  ][
    product > i.product, c("product", "i.product") := list(i.product, product)
  ][
    , .(count = .N), .(product, i.product)
  ]
  dt[, x := NULL]
  setnames(dt2, c("product1", "product2", "count"))
  setorder(dt2, -count, product1, product2)
}

# co-occurrence matrix solution (slightly modified so the output of the two
# functions is the same)
f2 <- function(dt) {
  dt$product <- as.factor(dt$product)
  dt4 <- setDT(
    summary(
      crossprod(
        xtabs(~ orderID + product, dt, sparse = TRUE)
      )
    )
  )[
    i < j
  ][
    , `:=`(
      i = as.character(levels(dt$product)[i]),
      j = as.character(levels(dt$product)[j]),
      x = as.integer(x)
    )
  ]
  dt[, product := as.character(product)]
  attr(dt4, "header") <- NULL
  setnames(dt4, c("product1", "product2", "count"))
  setorder(dt4, -count, product1, product2)
}

Benchmarking:
    
#> Unit: seconds
#>  expr      min       lq     mean   median       uq      max neval
#>    f1 1.026762 1.128890 1.224291 1.261732 1.278617 1.362014    10
#>    f2 1.984068 2.295159 2.355434 2.403337 2.465651 2.589428    10
票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/73200582

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档