I have a purchase dataset with customers, products, and categories.
customer product category sales_value
A aerosol air_care 10
B aerosol air_care 12
C aerosol air_care 7
A perfume air_care 8
A perfume air_care 2
D perfume air_care 11
C burger food 13
D fries food 6
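C fries food 9
I would like, for each product, the ratio between the sales value spent on this product and the sales value spent on this product's category, by the customers who bought the product at least once.
Put another way: take the customers who bought fries at least once, and for them compute A) the sum of sales value spent on fries and B) the sum of sales value spent on food.
The intermediate table would have the following form: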
product category sum_spent_on_product sum_spent_on_category_by_people_buying_product ratio
aerosol air_care 29 39 0.74
perfume air_care 21 31 0.68
burger food 13 22 0.59
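fries food 15 28 0.54
Example (with figures unrelated to the sample data above): people who bought aerosol at least once spent a total of 1800 pounds on this product. Overall, the same people spent 3600 pounds on the air_care category (aerosol belongs to this category). The ratio for aerosol is therefore 0.5.
I tried to solve this with left join lateral, computing the intermediate result for each product, but I cannot figure out how to include the condition "only for customers who bought this specific product".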
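The definition above can be checked by hand on the sample rows. The following is only a minimal sketch in plain Python, not part of the original question; the function name `product_ratios` and the in-memory `rows` list are assumptions made for illustration.

```python
from collections import defaultdict

# Sample rows from the question: (customer, product, category, sales_value)
rows = [
    ("A", "aerosol", "air_care", 10),
    ("B", "aerosol", "air_care", 12),
    ("C", "aerosol", "air_care", 7),
    ("A", "perfume", "air_care", 8),
    ("A", "perfume", "air_care", 2),
    ("D", "perfume", "air_care", 11),
    ("C", "burger", "food", 13),
    ("D", "fries", "food", 6),
    ("C", "fries", "food", 9),
]


def product_ratios(rows):
    """Hypothetical helper: per product, spend on the product versus spend
    on its category, restricted to customers who bought the product."""
    spent_on_category = defaultdict(int)  # (customer, category) -> total
    spent_on_product = defaultdict(int)   # (product, category) -> total
    buyers = defaultdict(set)             # (product, category) -> customers
    for customer, product, category, value in rows:
        spent_on_category[(customer, category)] += value
        spent_on_product[(product, category)] += value
        buyers[(product, category)].add(customer)

    result = {}
    for (product, category), product_total in spent_on_product.items():
        # Each buyer's category total is counted once, even if the buyer
        # purchased the product several times.
        category_total = sum(
            spent_on_category[(customer, category)]
            for customer in buyers[(product, category)]
        )
        ratio = round(product_total / category_total, 2)
        result[product] = (product_total, category_total, ratio)
    return result


ratios = product_ratios(rows)
for product, values in ratios.items():
    print(product, *values)
```

For fries, the buyers are C and D: they spent 9 + 6 = 15 on fries and 22 + 6 = 28 on food. Ratios are rounded to two decimals here.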
select
distinct (product_id)
, category
, c.sales_category
from transactions t
left join lateral (
select
sum(sales_value) as sales_category
from transactions
where category = t.category
group by category
) c on true
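;
The query above lists, for each product, the total spent on the product's category, but without the required product-buyer condition.
Is left join lateral the right way to go? Is there another solution in plain SQL?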
Posted on 2019-02-24 20:31:17
I would use a window function to compute each customer's total spending in each category:
SELECT
customer, product, category, sales_value,
sum(sales_value) OVER (PARTITION BY customer, category) AS tot_cat
FROM transactions;
customer | product | category | sales_value | tot_cat
----------+---------+----------+-------------+---------
A | aerosol | air_care | 10.00 | 20.00
A | perfume | air_care | 8.00 | 20.00
A | perfume | air_care | 2.00 | 20.00
B | aerosol | air_care | 12.00 | 12.00
C | aerosol | air_care | 7.00 | 7.00
C | fries | food | 9.00 | 22.00
C | burger | food | 13.00 | 22.00
D | perfume | air_care | 11.00 | 11.00
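D | fries | food | 6.00 | 6.00
Then we only need to sum things up. A problem arises when a customer has bought the same product more than once, because that customer's tot_cat would then be counted once per purchase row. In your example, customer A bought perfume twice. To get around this, let's group by customer, product, and category at the same time (summing the sales_value column):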
SELECT
customer, product, category, SUM(sales_value) AS sales_value,
SUM(SUM(sales_value)) OVER (PARTITION BY customer, category) AS tot_cat
FROM transactions
GROUP BY customer, product, category;
customer | product | category | sales_value | tot_cat
----------+---------+----------+-------------+---------
A | aerosol | air_care | 10.00 | 20.00
A | perfume | air_care | 10.00 | 20.00 <-- this row summarizes rows 2 and 3 of previous result
B | aerosol | air_care | 12.00 | 12.00
C | aerosol | air_care | 7.00 | 7.00
C | burger | food | 13.00 | 22.00
C | fries | food | 9.00 | 22.00
D | perfume | air_care | 11.00 | 11.00
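D | fries | food | 6.00 | 6.00
Now we only need to sum sales_value and tot_cat per product to get the intermediate result table. I use a common table expression to make the previous result available under the name t: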
WITH t AS (
SELECT
customer, product, category, SUM(sales_value) AS sales_value,
SUM(SUM(sales_value)) OVER (PARTITION BY customer, category) AS tot_cat
FROM transactions
GROUP BY customer, product, category
)
SELECT
product, category,
sum(sales_value) AS sales_value, sum(tot_cat) AS tot_cat,
sum(sales_value) / sum(tot_cat) AS ratio
FROM t
GROUP BY product, category;
product | category | sales_value | tot_cat | ratio
---------+----------+-------------+---------+------------------------
aerosol | air_care | 29.00 | 39.00 | 0.74358974358974358974
fries | food | 15.00 | 28.00 | 0.53571428571428571429
burger | food | 13.00 | 22.00 | 0.59090909090909090909
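perfume | air_care | 21.00 | 31.00 | 0.67741935483870967742
Posted on 2019-02-24 13:29:16
I would like, for each product, the ratio between the sales value spent on this product and the sales value spent on this product's category, by the customers who bought the product at least once.
If I understand this correctly, you can aggregate sales by person and category to get the total sales for the category. In Postgres, you can keep an array of the products and use it for matching. So the query looks like this: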
select p.product, p.category,
sum(p.sales_value) as product_only_sales,
sum(pp.sales_value) as comparable_sales
from purchases p join
(select customer, category, array_agg(distinct product) as products, sum(sales_value) as sales_value
from purchases p
group by customer, category
) pp
on p.customer = pp.customer and p.category = pp.category and p.product = any (pp.products)
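group by p.product, p.category;
Here is a db<>fiddle.
EDIT:
The data allows duplicate rows for the same customer and product. That throws things off. A solution is to pre-aggregate by product for each customer: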
select p.product, p.category, sum(p.sales_value) as product_only_sales, sum(pp.sales_value) as comparable_sales
from (select customer, category, product, sum(sales_value) as sales_value
from purchases p
group by customer, category, product
) p join
(select customer, category, array_agg(distinct product) as products, sum(sales_value) as sales_value
from purchases p
group by customer, category
) pp
on p.customer = pp.customer and p.category = pp.category and p.product = any (pp.products)
group by p.product, p.category;
Here is a db<>fiddle for this example.
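To see the effect of the duplicates, here is a hedged sketch that runs both shapes of the query against the sample data. It is not the answer's exact code. It assumes SQLite (through Python's sqlite3 module) instead of Postgres, and it leaves out the `array_agg` / `any` match, which the join on customer and category already implies: every product a customer bought in a category is in that customer's product array. The names `NAIVE_QUERY`, `PORTABLE_QUERY`, and `run` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table purchases "
    "(customer text, product text, category text, sales_value integer)"
)
conn.executemany(
    "insert into purchases values (?, ?, ?, ?)",
    [
        ("A", "aerosol", "air_care", 10),
        ("B", "aerosol", "air_care", 12),
        ("C", "aerosol", "air_care", 7),
        ("A", "perfume", "air_care", 8),
        ("A", "perfume", "air_care", 2),
        ("D", "perfume", "air_care", 11),
        ("C", "burger", "food", 13),
        ("D", "fries", "food", 6),
        ("C", "fries", "food", 9),
    ],
)

# Joining raw purchase rows: a customer's category total is repeated
# once per purchase row of the product.
NAIVE_QUERY = """
select p.product, sum(p.sales_value), sum(pp.sales_value)
from purchases p
join (select customer, category, sum(sales_value) as sales_value
      from purchases
      group by customer, category
     ) pp
  on p.customer = pp.customer and p.category = pp.category
group by p.product, p.category
"""

# Pre-aggregating per customer and product first: one row per buyer,
# so each category total is counted exactly once.
PORTABLE_QUERY = """
select p.product, sum(p.sales_value), sum(pp.sales_value)
from (select customer, category, product, sum(sales_value) as sales_value
      from purchases
      group by customer, category, product
     ) p
join (select customer, category, sum(sales_value) as sales_value
      from purchases
      group by customer, category
     ) pp
  on p.customer = pp.customer and p.category = pp.category
group by p.product, p.category
"""


def run(query):
    """Return {product: (product_only_sales, comparable_sales)}."""
    return {row[0]: (row[1], row[2]) for row in conn.execute(query)}


naive = run(NAIVE_QUERY)
results = run(PORTABLE_QUERY)
print("naive perfume:", naive["perfume"])
print("pre-aggregated perfume:", results["perfume"])
```

In the naive version, customer A's air_care total of 20 is counted for each of A's two perfume rows, which inflates the perfume category sum. Pre-aggregating per customer and product removes the repetition.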
https://stackoverflow.com/questions/54852276