I'm on cqlsh v5.0.1 with an 8-node Cassandra cluster containing several tables. I'm scanning one table with a simple rule: if a row is older than 6 months, I delete it; if it's younger than 6 months, I update the row's TTL. To do this I'm using the express-cassandra npm package and streaming the table's rows with its eachRow method, but I frequently get server timeout errors and my program dies because it never receives the next page it needs for further processing.
Below I've attached my table stats, schema, and code.
Keyspace: events
Read Count: 550349
Read Latency: 14.500334253355598 ms.
Write Count: 46644769
Write Latency: 0.2615331485294739 ms.
Pending Flushes: 0
Table: track
SSTable count: 18
Space used (live): 1.56 TB
Space used (total): 1.56 TB
Space used by snapshots (total): 0 bytes
Off heap memory used (total): 2.66 GB
SSTable Compression Ratio: 0.12156681850176397
Number of partitions (estimate): 222854730
Memtable cell count: 4092
Memtable data size: 8.04 MB
Memtable off heap memory used: 0 bytes
Memtable switch count: 1828
Local read count: 550349
Local read latency: 12.668 ms
Local write count: 46644784
Local write latency: 0.201 ms
Pending flushes: 0
Bloom filter false positives: 5
Bloom filter false ratio: 0.00000
Bloom filter space used: 417.49 MB
Bloom filter off heap memory used: 570.87 MB
Index summary off heap memory used: 211.54 MB
Compression metadata off heap memory used: 1.89 GB
Compacted partition minimum bytes: 43 bytes
Compacted partition maximum bytes: 765.03 MB
Compacted partition mean bytes: 44.5 KB
Average live cells per slice (last five minutes): 10.050420168067227
Maximum live cells per slice (last five minutes): 124
Average tombstones per slice (last five minutes): 9.004201680672269
Maximum tombstones per slice (last five minutes): 1597
Schema:
CREATE TABLE events.track (
"profileId" text,
"projectId" text,
"sessionId" bigint,
"anonymousId" text,
"appBuild" text,
"appName" text,
"appNamespace" text,
"appVersion" text,
attributes list<text>,
channels list<text>,
"deviceId" text,
"deviceManufacturer" text,
"deviceModel" text,
"deviceName" text,
"eventTypes" list<text>,
ip text,
"libraryName" text,
"libraryVersion" text,
locale text,
"networkCarrier" text,
"osName" text,
"osVersion" text,
"propertyIds" list<text>,
referrer text,
"screenDensity" int,
"screenHeight" int,
"screenWidth" int,
"sessionAttributes" map<text, text>,
texts list<text>,
timestamps list<timestamp>,
timezone text,
"userAgent" text,
"writeKey" text,
PRIMARY KEY (("profileId", "projectId"), "sessionId")
) WITH CLUSTERING ORDER BY ("sessionId" DESC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
Streaming code:
const eventsChunk = [];
EventModel.eachRow({}, { fetchSize: 10000 }, function (n, row) {
  // invoked once per row as each page is streamed in
  eventsChunk.push(row);
},
function (err, result) {
  // error handling and business logic here
});
Any help would be greatly appreciated.
Posted on 2022-05-27 06:03:54
You are getting timeouts because a full table scan is very expensive, particularly when there are over 200 million partitions per node.
You haven't posted your table schema or the query you are running, but my guess is that you are doing a range query with ALLOW FILTERING, which puts the nodes under load and causes them to become unresponsive.
Cassandra is designed for OLTP workloads where you want to retrieve a single partition very quickly. A full table scan is an OLAP workload, so you need an analytics solution such as Apache Spark with the Spark Cassandra Connector.
The connector optimises the queries against Cassandra: instead of performing a full table scan, it breaks the scan up into segments of token ranges and only requests a small chunk at a time. Cheers!
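The same token-range technique can be imitated by hand from a Node driver: split the ring into sub-ranges and scan each with `SELECT ... WHERE token("profileId","projectId") > ? AND token("profileId","projectId") <= ?`. A minimal sketch of the splitting step, assuming the default Murmur3Partitioner (token space [-2^63, 2^63 - 1]); the helper name `splitTokenRing` is mine:

```javascript
// Murmur3Partitioner token bounds.
const MIN_TOKEN = -(2n ** 63n);
const MAX_TOKEN = 2n ** 63n - 1n;

// Split the full token ring into n contiguous sub-ranges.
// Each {start, end} pair is scanned as: token(pk) > start AND token(pk) <= end,
// so a small, bounded slice of the table is fetched per query.
function splitTokenRing(n) {
  const total = BigInt(n);
  const span = MAX_TOKEN - MIN_TOKEN;
  const ranges = [];
  let start = MIN_TOKEN;
  for (let i = 1n; i <= total; i++) {
    const end = i === total ? MAX_TOKEN : MIN_TOKEN + (span * i) / total;
    ranges.push({ start, end });
    start = end;
  }
  return ranges;
}
```

A production scan would additionally align the splits with the cluster's actual token ownership (as the Spark connector does) so each sub-query hits a single replica.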
https://stackoverflow.com/questions/72401022