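I know that Cassandra doesn't support joins, so to use Cassandra we need to denormalize our tables. I'd like to understand how that is done. Suppose I have two tables: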
<dl>
<dt>Publisher</dt>
<dd>Id : <i>Primary Key</i></dd>
<dd>Name</dd>
<dd>TimeStamp</dd>
<dd>Address</dd>
<dd>PhoneNo</dd>
<dt>Book</dt>
<dd>Id : <i>Primary Key</i></dd>
<dd>Name</dd>
<dd>ISBN</dd>
<dd>Year</dd>
<dd>PublisherId : <i>Foreign Key - References Publisher table's Id</i></dd>
<dd>Cost</dd>
</dl>
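Please tell me how to denormalize these tables so that the following operations can be performed efficiently.
I have read a few articles about Cassandra, but I couldn't work out how to denormalize for the operations above. Please help.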
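Posted 2015-07-29 01:31:16
Designing a whole schema is a rather large task for a single question, but in general terms, denormalization means you repeat the same data in multiple tables so that each type of query can get everything it needs by reading a single row.
So you would create one table for each type of query, something like this:
In each row, then, you repeat all of the data the query needs to return (that is, the union of all the columns in your example tables). When you insert a new book, you insert it into all of your tables.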
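To make the "insert it into all of your tables" step concrete, here is a minimal sketch. The two table names (`books_by_publisher`, `books_by_year`) and their columns are assumptions chosen for illustration, not a schema from the question; the sample values are borrowed from the data used elsewhere on this page.

```cql
-- Hypothetical query tables: names and columns are assumptions for illustration.
CREATE TABLE books_by_publisher (
    publisher_id uuid,
    isbn text,
    book_name text,
    book_year int,
    publisher_name text,
    PRIMARY KEY (publisher_id, isbn)
);

CREATE TABLE books_by_year (
    book_year int,
    isbn text,
    book_name text,
    publisher_id uuid,
    publisher_name text,
    PRIMARY KEY (book_year, isbn)
);

-- One logical "add a book" turns into one write per query table.
-- A logged batch means that if one of the writes is applied, the other eventually is too.
BEGIN BATCH
    INSERT INTO books_by_publisher (publisher_id, isbn, book_name, book_year, publisher_name)
    VALUES (b7b99ee9-f495-444b-b849-6cea82683d0b, '978-0804137256', 'Armada', 2015, 'Crown Publishing');
    INSERT INTO books_by_year (book_year, isbn, book_name, publisher_id, publisher_name)
    VALUES (2015, '978-0804137256', 'Armada', b7b99ee9-f495-444b-b849-6cea82683d0b, 'Crown Publishing');
APPLY BATCH;
```

A logged batch that spans partitions adds coordination overhead, so this is a trade of write cost for consistency between the copies; issuing the two inserts separately from the application is another option if that guarantee isn't needed.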
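Posted 2015-07-29 03:07:03
This sounds like it could get big, so I'll take the first scenario and walk through how I would approach it. You don't have to do it this way; it's just one approach. Note that you may need to create a query table for each of your 4 scenarios above. This table only serves the first one.
First, I'll create a type for the publisher's address.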
    CREATE TYPE address (
        street text,
        city text,
        state text,
        postalCode text
    );

Next, I'll create a table named booksByPublisher. I'll use my address type for publisherAddress. I'll build my primary key with publisherid as the partition key, clustering on bookYear and isbn.
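Since you want to be able to query all books by a specific publisher, it makes sense to designate publisherid as the partition key. Having the results sorted by year could be helpful, or at least being able to look at a specific year for a specific publisher, so I made bookYear the first clustering key. And of course, to create a unique CQL row for each book within a publisher, I added isbn to guarantee uniqueness.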
    CREATE TABLE booksByPublisher (
        publisherid UUID,
        publisherName text,
        publisherAddress frozen<address>,
        publisherPhoneNo text,
        bookName text,
        isbn text,
        bookYear bigint,
        bookCost bigint,
        bookAuthor text,
        PRIMARY KEY (publisherid, bookYear, isbn)
    );
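Because bookYear is the first clustering key, rows inside each publisher's partition are stored sorted by year. As a sketch of what that ordering allows once the table holds data, a range slice with the newest books first could look like this. The year bounds are arbitrary illustration values, and the publisherid is the Crown Publishing id from the sample data on this page.

```cql
-- Assumes the booksByPublisher table defined above.
-- The year bounds are arbitrary illustration values.
SELECT bookName, bookYear, isbn
FROM booksByPublisher
WHERE publisherid = b7b99ee9-f495-444b-b849-6cea82683d0b
  AND bookYear >= 2010 AND bookYear <= 2015
ORDER BY bookYear DESC;
```

The range restriction and the ORDER BY are both accepted here because the partition key is fixed with an equality and bookYear is the first clustering column.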
    INSERT INTO booksByPublisher (publisherid, publishername, publisheraddress, publisherphoneno, bookname, isbn, bookyear, bookcost, bookauthor)
    VALUES (b7b99ee9-f495-444b-b849-6cea82683d0b,'Crown Publishing',{ street: '1745 Broadway', city: 'New York', state:'NY', postalcode: '10019'},'212-782-9000','Ready Player One','978-0307887443',2005,812,'Ernest Cline');
    INSERT INTO booksByPublisher (publisherid, publishername, publisheraddress, publisherphoneno, bookname, isbn, bookyear, bookcost, bookauthor)
    VALUES (b7b99ee9-f495-444b-b849-6cea82683d0b,'Crown Publishing',{ street: '1745 Broadway', city: 'New York', state:'NY', postalcode: '10019'},'212-782-9000','Armada','978-0804137256',2015,1560,'Ernest Cline');
    INSERT INTO booksByPublisher (publisherid, publishername, publisheraddress, publisherphoneno, bookname, isbn, bookyear, bookcost, bookauthor)
    VALUES (uuid(),'The Berkley Publishing Group',{ street: '375 Hudson Street', city: 'New York', state:'NY', postalcode: '10014'},'212-333-2354','Rainbow Six','978-0425170342',1999,867,'Tom Clancy');

Now, out of my 3 rows, I can query all books published by Crown Publishing (publisherid=b7b99ee9-f495-444b-b849-6cea82683d0b) like this:
    aploetz@cqlsh:stackoverflow2> SELECT * FROM booksbypublisher
        WHERE publisherid=b7b99ee9-f495-444b-b849-6cea82683d0b;

     publisherid                          | bookyear | isbn           | bookauthor   | bookcost | bookname         | publisheraddress                                                              | publishername    | publisherphoneno
    --------------------------------------+----------+----------------+--------------+----------+------------------+-------------------------------------------------------------------------------+------------------+------------------
     b7b99ee9-f495-444b-b849-6cea82683d0b |     2005 | 978-0307887443 | Ernest Cline |      812 | Ready Player One | {street: '1745 Broadway', city: 'New York', state: 'NY', postalcode: '10019'} | Crown Publishing | 212-782-9000
     b7b99ee9-f495-444b-b849-6cea82683d0b |     2015 | 978-0804137256 | Ernest Cline |     1560 | Armada           | {street: '1745 Broadway', city: 'New York', state: 'NY', postalcode: '10019'} | Crown Publishing | 212-782-9000

    (2 rows)

If I want, I can also query for all of Crown Publishing's books from 2015:
    aploetz@cqlsh:stackoverflow2> SELECT * FROM booksbypublisher
        WHERE publisherid=b7b99ee9-f495-444b-b849-6cea82683d0b AND bookyear=2015;

     publisherid                          | bookyear | isbn           | bookauthor   | bookcost | bookname | publisheraddress                                                              | publishername    | publisherphoneno
    --------------------------------------+----------+----------------+--------------+----------+----------+-------------------------------------------------------------------------------+------------------+------------------
     b7b99ee9-f495-444b-b849-6cea82683d0b |     2015 | 978-0804137256 | Ernest Cline |     1560 | Armada   | {street: '1745 Broadway', city: 'New York', state: 'NY', postalcode: '10019'} | Crown Publishing | 212-782-9000

    (1 rows)

But I cannot query by bookyear alone:
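    aploetz@cqlsh:stackoverflow2> SELECT * FROM booksbypublisher WHERE bookyear=2015;
    InvalidRequest: code=2200 [Invalid query] message="Cannot execute this query as it might
    involve data filtering and thus may have unpredictable performance. If you want to execute
    this query despite the performance unpredictability, use ALLOW FILTERING"

Don't follow the error message's advice and add ALLOW FILTERING. That might work fine for a table with 3 rows (or even 300), but it won't work for a table with 3 million rows (you'll get a timeout). Cassandra works best when you query by a complete partition key. Because publisherid is our partition key, the queries above that filter on publisherid will perform well. But if you need to query by bookYear, then you should create a table that uses bookYear as its partition key.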
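As a sketch of that last suggestion, a year-partitioned table might look like the following. The name `booksByYear`, the subset of columns, and the choice of clustering columns are assumptions made for illustration; they are not part of the answer above.

```cql
-- Hypothetical table: the name booksByYear and its clustering columns are assumptions.
CREATE TABLE booksByYear (
    bookYear bigint,
    publisherid UUID,
    isbn text,
    publisherName text,
    bookName text,
    bookCost bigint,
    bookAuthor text,
    PRIMARY KEY (bookYear, publisherid, isbn)
);

-- bookYear is now the partition key, so this query needs no ALLOW FILTERING.
SELECT * FROM booksByYear WHERE bookYear=2015;
```

Every book then has to be written to both booksByPublisher and booksByYear. Note also that a single partition per year can grow very large for a big catalog, so a real design might need to split that partition further.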
https://stackoverflow.com/questions/31681512