dsbulk unload is failing on large table

Stack Overflow user
Asked 2021-04-24 15:15:39
2 answers · 202 views · 0 followers · 1 vote

I am trying to unload data from a huge table. Below are the command I used and its output.

$ /home/cassandra/dsbulk-1.8.0/bin/dsbulk unload --driver.auth.provider PlainTextAuthProvider --driver.auth.username xxxx --driver.auth.password xxxx --datastax-java-driver.basic.contact-points 123.123.123.123 -query "select count(*) from sometable with where on clustering and partial pk -- allow filtering" --connector.name json --driver.protocol.compression LZ4 --connector.json.mode MULTI_DOCUMENT -maxConcurrentFiles 1 -maxRecords -1 -url dsbulk --executor.ContinousPaging.enable false --executor.maxpersecond 2500 --driver.socket.timeout 240000

Setting dsbulk.driver.protocol.compression is deprecated and will be removed in a future release; please configure the driver directly using --datastax-java-driver.advanced.protocol.compression instead.
Setting dsbulk.driver.auth.* is deprecated and will be removed in a future release; please configure the driver directly using --datastax-java-driver.advanced.auth-provider.* instead.
Operation directory: /home/cassandra/logs/COUNT_20210423-070104-108326
total | failed | rows/s |      p50ms |      p99ms |     p999ms
    1 |      1 |      0 | 109,790.10 | 110,058.54 | 110,058.54
Operation COUNT_20210423-070104-108326 completed with 1 errors in 1 minute and 50 seconds.

Below are the dsbulk logs:

cassandra@somehost> cd logs
cassandra@somehost> cd COUNT_20210423-070104-108326/
cassandra@somehost> ls
operation.log  unload-errors.log
cassandra@somehost> cat operation.log
2021-04-23 07:01:04 WARN  Setting dsbulk.driver.protocol.compression is deprecated and will be removed in a future release; please configure the driver directly using --datastax-java-driver.advanced.protocol.compression instead.
2021-04-23 07:01:04 WARN  Setting dsbulk.driver.auth.* is deprecated and will be removed in a future release; please configure the driver directly using --datastax-java-driver.advanced.auth-provider.* instead.
2021-04-23 07:01:04 INFO  Operation directory: /home/cassandra/logs/COUNT_20210423-070104-108326
2021-04-23 07:02:55 WARN  Operation COUNT_20210423-070104-108326 completed with 1 errors in 1 minute and 50 seconds.
2021-04-23 07:02:55 INFO  Records: total: 1, successful: 0, failed: 1
2021-04-23 07:02:55 INFO  Memory usage: used: 212 MB, free: 1,922 MB, allocated: 2,135 MB, available: 27,305 MB, total gc count: 4, total gc time: 98 ms
2021-04-23 07:02:55 INFO  Reads: total: 1, successful: 0, failed: 1, in-flight: 0
2021-04-23 07:02:55 INFO  Throughput: 0 reads/second
2021-04-23 07:02:55 INFO  Latencies: mean 109,790.10, 75p 110,058.54, 99p 110,058.54, 999p 110,058.54 milliseconds
2021-04-23 07:02:58 INFO  Final stats:
2021-04-23 07:02:58 INFO  Records: total: 1, successful: 0, failed: 1
2021-04-23 07:02:58 INFO  Memory usage: used: 251 MB, free: 1,883 MB, allocated: 2,135 MB, available: 27,305 MB, total gc count: 4, total gc time: 98 ms
2021-04-23 07:02:58 INFO  Reads: total: 1, successful: 0, failed: 1, in-flight: 0
2021-04-23 07:02:58 INFO  Throughput: 0 reads/second
2021-04-23 07:02:58 INFO  Latencies: mean 109,790.10, 75p 110,058.54, 99p 110,058.54, 999p 110,058.54 milliseconds

cassandra@somehost> cat unload-errors.log
Statement: com.datastax.oss.driver.internal.core.cql.DefaultBoundStatement@1083fef9 [0 values, idempotence: <UNSET>, CL: <UNSET>, serial CL: <UNSET>, timestamp: <UNSET>, timeout: <UNSET>]
SELECT batch_id from .... allow filtering (Cassandra timeout during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded))
        at com.datastax.oss.dsbulk.executor.api.subscription.ResultSubscription.toErrorPage(ResultSubscription.java:534)
        at com.datastax.oss.dsbulk.executor.api.subscription.ResultSubscription.lambda$fetchNextPage$1(ResultSubscription.java:372)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.setFinalError(CqlRequestHandler.java:447) [4 skipped]
        at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.access$700(CqlRequestHandler.java:94)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler$NodeResponseCallback.processRetryVerdict(CqlRequestHandler.java:859)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler$NodeResponseCallback.processErrorResponse(CqlRequestHandler.java:828)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler$NodeResponseCallback.onResponse(CqlRequestHandler.java:655)
        at com.datastax.oss.driver.internal.core.channel.InFlightHandler.channelRead(InFlightHandler.java:257)
        at java.lang.Thread.run(Thread.java:748) [24 skipped]
Caused by: com.datastax.oss.driver.api.core.servererrors.ReadTimeoutException: Cassandra timeout during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded)

Snippet from Cassandra's system.log:

DEBUG [ScheduledTasks:1] 2021-04-23 00:01:48,539  MonitoringTask.java:152 - 1 operations timed out in the last 5015 msecs:
<SELECT * FROM my query being run with limit - LIMIT 5000>, total time 10004 msec, timeout 10000 msec/cross-node
INFO  [ScheduledTasks:1] 2021-04-23 00:02:38,540  MessagingService.java:1302 - RANGE_SLICE messages were dropped in last 5000 ms: 0 internal and 1 cross node
. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 10299 ms
INFO  [ScheduledTasks:1] 2021-04-23 00:02:38,551  StatusLogger.java:114 -
Pool Name                    Active   Pending      Completed   Blocked  All Time Blocked
ReadStage                         1         0     1736872997         0                 0
ContinuousPagingStage             0         0            586         0                 0
RequestResponseStage              0         0     1483193130         0                 0
ReadRepairStage                   0         0        9079516         0                 0
CounterMutationStage              0         0              0         0                 0
MutationStage                     0         0      351841038         0                 0
ViewMutationStage                 0         0              0         0                 0
CommitLogArchiver                 0         0          32961         0                 0
MiscStage                         0         0              0         0                 0
CompactionExecutor                0         0       12034828         0                 0
MemtableReclaimMemory             0         0          68612         0                 0
PendingRangeCalculator            0         0              9         0                 0
AntiCompactionExecutor            0         0              0         0                 0
GossipStage                       0         0       20137208         0                 0
SecondaryIndexManagement          0         0              0         0                 0
HintsDispatcher                   0         0           3798         0                 0
MigrationStage                    0         0              8         0                 0
MemtablePostFlush                 0         0         338955         0                 0
PerDiskMemtableFlushWriter_0         0         0          66297         0                 0
ValidationExecutor                0         0         247600         0                 0
Sampler                           0         0              0         0                 0
MemtableFlushWriter               0         0          41757         0                 0
InternalResponseStage             0         0         525242         0                 0
AntiEntropyStage                  0         0         767527         0                 0
CacheCleanupExecutor              0         0              0         0                 0
Native-Transport-Requests         0         0      958717934         0                65
CompactionManager                 0         0
MessagingService                n/a       0/0
Cache Type                     Size                 Capacity               KeysToSave
KeyCache                  104857216                104857600                      all
RowCache                          0                        0                      all

2 Answers

Stack Overflow user

Accepted answer

Answered 2021-04-24 16:06:52

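Expand select count(*) from sometable with where on clustering column and partial pk -- allow filtering with an additional condition on the token ranges, for example: and partial pk token(full_pk) > :start and token(full_pk) <= :end. In this case, DSBulk will execute many queries, each against a specific token range and sent to multiple nodes, instead of putting the whole load on a single node as in your example.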

See the documentation for the -query option, and the fourth post in this series of blog posts about DSBulk, for more information and examples: 1, 2, 3, 4, 5, 6
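To make the token-range idea concrete, here is a minimal Python sketch of what binding :start and :end to one range at a time amounts to. It is an illustration only, not DSBulk's actual implementation: it assumes the Murmur3Partitioner token ring (from -2^63 to 2^63-1) and uses evenly sized splits, whereas DSBulk works from the cluster's real token ranges. The names ks.sometable, pk1, pk2, ck and batch_id's table layout are hypothetical.

```python
# Sketch: split the Murmur3 token ring into contiguous (start, end] ranges
# and render one restricted query per range.
MIN_TOKEN = -(2**63)       # assumed Murmur3Partitioner ring bounds
MAX_TOKEN = 2**63 - 1

# Hypothetical keyspace, table and column names, for illustration only.
QUERY_TEMPLATE = (
    "SELECT batch_id FROM ks.sometable "
    "WHERE token(pk1, pk2) > {start} AND token(pk1, pk2) <= {end} "
    "AND ck = 1 ALLOW FILTERING"
)


def split_ring(n):
    """Split the full token ring into n contiguous (start, end] ranges."""
    if n < 1:
        raise ValueError("n must be at least 1")
    width = (MAX_TOKEN - MIN_TOKEN) // n
    bounds = [MIN_TOKEN + i * width for i in range(n)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))


def range_queries(n):
    """Render one query per token range, with start/end filled in."""
    return [QUERY_TEMPLATE.format(start=s, end=e) for s, e in split_ring(n)]


if __name__ == "__main__":
    for query in range_queries(4):
        print(query)
```

Each rendered query scans only a slice of the ring, so no single request has to walk the whole table before the server-side read timeout expires.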

Votes: 0

Stack Overflow user

Answered 2021-04-24 22:12:44

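The issue is that you are running the unload command in DSBulk to perform a SELECT COUNT(*), which means it has to do a full table scan to return a single row.

In addition, ALLOW FILTERING is not recommended unless you restrict the query to a single partition. In any case, the performance of ALLOW FILTERING is very unpredictable, even in the best-case scenario.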

I suggest you use the DSBulk count command instead, which is optimised for counting rows or partitions in Cassandra. For details, see the Counting data with DSBulk example.
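As a rough sketch of what such an invocation could look like, the Python snippet below only assembles and prints the argument list for a count of one table; it does not run DSBulk. The keyspace name ks is hypothetical, the table name and contact point are reused from the question, and authentication options are left out for brevity.

```python
import shlex


def build_count_command(keyspace, table, contact_point):
    """Return the argv list for a DSBulk count of one table (not executed here)."""
    return [
        "dsbulk", "count",
        "-h", contact_point,   # contact point of the cluster
        "-k", keyspace,        # keyspace to count in (hypothetical name below)
        "-t", table,           # table to count
    ]


if __name__ == "__main__":
    argv = build_count_command("ks", "sometable", "123.123.123.123")
    # Print a shell-safe rendering of the command for copy and paste.
    print(shlex.join(argv))
```

Keeping the command as a list rather than a single string avoids quoting mistakes if it is later passed to a process launcher.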

There are additional examples in this DSBulk Counting blog post, which Alex Ott has already linked in his answer. Cheers!

Votes: 2
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/67240233
