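I'm using Oracle Analytic Views (from 19c), as defined here: https://docs.oracle.com/en/database/oracle/oracle-database/19/dwhsg/overview-analytic-views.html
I can create measures with simple aggregations (e.g. SUM, COUNT, MIN), but I can't create a measure that is the equivalent of COUNT(DISTINCT). I have tried both calculated measures and base measures, and neither worked. So:
How do I define a measure equivalent to COUNT(DISTINCT X)?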
Here is some example SQL that shows my problem (I can run it on livesql.oracle.com):
create table sales_fact (
CUSTOMER_ID VARCHAR2(50 CHAR),
COUNTRY VARCHAR2(50 CHAR),
PRODUCT_ID VARCHAR2(50 CHAR),
ORDER_ID VARCHAR2(50 CHAR),
MONTH_ID VARCHAR2(30 CHAR)
);
insert into sales_fact(CUSTOMER_ID, COUNTRY, PRODUCT_ID, ORDER_ID, MONTH_ID) values ('A', 'France', '1', 'A1', 'Apr-11');
insert into sales_fact(CUSTOMER_ID, COUNTRY, PRODUCT_ID, ORDER_ID, MONTH_ID) values ('A', 'France', '2', 'A1', 'Apr-11');
insert into sales_fact(CUSTOMER_ID, COUNTRY, PRODUCT_ID, ORDER_ID, MONTH_ID) values ('A', 'France', '1', 'A2', 'Jun-11');
insert into sales_fact(CUSTOMER_ID, COUNTRY, PRODUCT_ID, ORDER_ID, MONTH_ID) values ('B', 'England', '2', 'B1', 'Jul-11');
insert into sales_fact(CUSTOMER_ID, COUNTRY, PRODUCT_ID, ORDER_ID, MONTH_ID) values ('C', 'Germany', '2', 'C1', 'Aug-11');
insert into sales_fact(CUSTOMER_ID, COUNTRY, PRODUCT_ID, ORDER_ID, MONTH_ID) values ('D', 'Germany', '3', 'D1', 'Apr-12');
create table time_dim (
MONTH_ID VARCHAR2(30),
MONTH_NAME VARCHAR2(40),
YEAR_ID VARCHAR2(30),
YEAR_NAME VARCHAR2(40)
);
insert into time_dim(MONTH_ID, MONTH_NAME, YEAR_ID, YEAR_NAME) values ('Apr-11', 'April', '11', 'CY2011');
insert into time_dim(MONTH_ID, MONTH_NAME, YEAR_ID, YEAR_NAME) values ('Jun-11', 'June', '11', 'CY2011');
insert into time_dim(MONTH_ID, MONTH_NAME, YEAR_ID, YEAR_NAME) values ('Jul-11', 'July', '11', 'CY2011');
insert into time_dim(MONTH_ID, MONTH_NAME, YEAR_ID, YEAR_NAME) values ('Aug-11', 'August', '11', 'CY2011');
insert into time_dim(MONTH_ID, MONTH_NAME, YEAR_ID, YEAR_NAME) values ('Apr-12', 'April', '12', 'CY2012');
CREATE OR REPLACE ATTRIBUTE DIMENSION time_attr_dim
USING time_dim
ATTRIBUTES
(year_id,
month_id)
LEVEL MONTH
KEY month_id
DETERMINES (year_id)
LEVEL YEAR
KEY year_id;
CREATE OR REPLACE HIERARCHY time_hier
USING time_attr_dim
(month CHILD OF year);
CREATE OR REPLACE ANALYTIC VIEW sales_av
USING sales_fact
DIMENSION BY
(time_attr_dim
KEY month_id REFERENCES month_id
HIERARCHIES (time_hier DEFAULT)
)
MEASURES
(salesCount FACT ORDER_ID AGGREGATE BY COUNT,
orderCount FACT ORDER_ID AGGREGATE BY COUNT -- How to make this count(distinct)?
)
;
-- 2011 should have salesCount = 5 and orderCount = 4 (if orderCount counted distinct orders)
SELECT YEAR_ID, salesCount, orderCount
FROM sales_av HIERARCHIES(time_hier)
WHERE time_hier.level_name = 'YEAR'
ORDER BY time_hier.hier_order;

Posted on 2021-12-19 21:53:53
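One answer I found is that AGGREGATE BY COUNT DISTINCT only works on numeric columns. So something like:

order_count FACT DBMS_UTILITY.GET_HASH_VALUE(ORDER_ID, 0, 65536) AGGREGATE BY COUNT DISTINCT

works, but the count is based on a hash, so it is inaccurate for large datasets: distinct order IDs that hash to the same value are counted only once. If I replace the fact-table column with an integer column instead, the values need to be less than 64991. Otherwise the bitmap operators that the analytic view uses raise an error at query time, which seems like a big limitation to me.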
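To see why a hash-based COUNT DISTINCT undercounts, here is a small Python sketch. It is an illustration under stated assumptions, not Oracle's behaviour. `zlib.crc32(...) % 65536` is a hypothetical stand-in for `DBMS_UTILITY.GET_HASH_VALUE(ORDER_ID, 0, 65536)` (Oracle's real hash function differs), and the order IDs are synthetic. The key point does not depend on the hash choice. With `hash_size = 65536` there are at most 65,536 distinct hash values, so any set with more distinct IDs than that must contain collisions, and every collision merges two orders into one.

```python
import zlib

HASH_SIZE = 65536  # same hash_size as in the measure definition above


def bucket(order_id: str) -> int:
    """Hypothetical stand-in for DBMS_UTILITY.GET_HASH_VALUE(order_id, 0, 65536).

    Uses CRC-32 reduced modulo HASH_SIZE; Oracle's real hash function differs.
    """
    return zlib.crc32(order_id.encode("utf-8")) % HASH_SIZE


def exact_distinct(order_ids) -> int:
    # What COUNT(DISTINCT ORDER_ID) would return
    return len(set(order_ids))


def hashed_distinct(order_ids) -> int:
    # What COUNT DISTINCT over the hashed values returns: colliding IDs merge
    return len({bucket(i) for i in order_ids})


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        ids = [f"ORD{i}" for i in range(n)]  # synthetic, all distinct
        print(f"{n:>7} distinct IDs -> {hashed_distinct(ids):>6} distinct hashes")
```

For the question's sample data (`A1, A1, A2, B1, C1` in 2011), both functions agree unless two of the four IDs happen to collide. At 100,000 distinct IDs, the hashed count can never exceed 65,536.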
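A better answer is to use AGGREGATE BY APPROX_COUNT_DISTINCT, which uses the HLL (HyperLogLog) algorithm. It is only accurate to within about 4%, but it avoids the need for a numeric column whose values are below 65k.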
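For intuition about how an HLL-based approximate count works, here is a minimal textbook HyperLogLog sketch in Python. It is not Oracle's implementation. The register count (2^12), the SHA-1-derived 64-bit hash and the bias constant are illustrative assumptions taken from the standard HyperLogLog estimator, and Oracle's internal parameters and error bounds may differ. The sketch shows why the result is approximate but needs no numeric key range. Every value, of any type, is hashed. Only the maximum "leading-zero rank" per register is kept, and the distinct count is estimated from those registers.

```python
import hashlib
import math

P = 12       # index bits (illustrative choice); m = 2**P registers
M = 1 << P   # 4096 registers
W = 64 - P   # bits left after the register index


def _hash64(value: str) -> int:
    # 64-bit hash from the first 8 bytes of SHA-1 (an assumption, not Oracle's hash)
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:8], "big")


def hll_estimate(values) -> float:
    """Estimate the number of distinct strings in `values` with HyperLogLog."""
    registers = [0] * M
    for v in values:
        h = _hash64(v)
        idx = h >> W                 # top P bits pick a register
        rest = h & ((1 << W) - 1)    # remaining W bits
        # rank = 1-based position of the leftmost 1-bit in `rest` (W + 1 if rest == 0)
        rank = W - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)

    alpha = 0.7213 / (1 + 1.079 / M)   # standard bias constant for large m
    raw = alpha * M * M / sum(2.0 ** -r for r in registers)

    zeros = registers.count(0)
    if raw <= 2.5 * M and zeros:
        # Small-range correction: linear counting on the empty registers
        return M * math.log(M / zeros)
    return raw
```

Duplicates never change the registers, so `hll_estimate(ids + ids)` equals `hll_estimate(ids)`. Memory stays fixed at 4,096 small registers however many rows are counted. For this particular sketch, the textbook standard error is about 1.04/√m ≈ 1.6% with m = 4096. Oracle's stated accuracy for its own implementation is a separate matter.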
https://stackoverflow.com/questions/70297176