
Dealing with unexpected disk capacity issues with an Elasticsearch cluster on EKS

Stack Overflow user
Asked on 2022-01-28 13:27:37
Answers: 1 · Views: 52 · Followers: 0 · Votes: 1

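I have an Elasticsearch cluster provisioned in a Kubernetes cluster (EKS). The Elasticsearch cluster has 3 nodes, and I have set up an 8E disk for the nodes to store data (thinking I would not have any space problems for the time being).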

[root@es-cluster-0 elasticsearch]# curl -s -XGET http://localhost:9200/_cat/allocation?v
shards disk.indices disk.used disk.avail disk.total disk.percent host         ip           node
    36       66.7gb   966.1gb   8191.9pb   8191.9pb            0 10.65.32.184 10.65.32.184 es-cluster-0
    33       82.6gb   966.1gb   8191.9pb   8191.9pb            0 10.65.32.202 10.65.32.202 es-cluster-2
    37         76gb   966.1gb   8191.9pb   8191.9pb            0 10.65.32.178 10.65.32.178 es-cluster-1
    14                                                                                     UNASSIGNED

The current health of the cluster is:

[root@es-cluster-0 elasticsearch]# curl -s -XGET http://localhost:9200/_cluster/health?pretty
{
  "cluster_name" : "k8s-logs",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 56,
  "active_shards" : 106,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 14,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 88.33333333333333
}

I can see that I have 14 "unassigned_shards", which exactly matches the last line of the /_cat/allocation output above.
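As a quick sanity check, the counts in the two outputs above can be reconciled by hand. The sketch below only reuses numbers already shown (the per-node shard counts from /_cat/allocation and the unassigned count from /_cluster/health); the variable names are illustrative.

```python
# Shard counts per node, copied from the /_cat/allocation output above.
shards_per_node = {"es-cluster-0": 36, "es-cluster-2": 33, "es-cluster-1": 37}
# Unassigned shards, from the UNASSIGNED row and from "unassigned_shards".
unassigned_shards = 14

# Sum of assigned shards; this should equal "active_shards" in the health output.
active_shards = sum(shards_per_node.values())
total_shards = active_shards + unassigned_shards
# This should equal "active_shards_percent_as_number" in the health output.
active_percent = 100 * active_shards / total_shards

print(active_shards, total_shards, round(active_percent, 2))
# prints: 106 120 88.33
```

The two endpoints agree with each other, so the open question is why those 14 shards cannot be allocated, not whether the count is right.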

When I started digging into what was going on, I found this:

[root@es-cluster-0 elasticsearch]# curl -s -XGET http://localhost:9200/_cluster/allocation/explain?pretty
{
  "index" : "logstash-2022.01.22",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2022-01-22T00:00:11.254Z",
    "failed_allocation_attempts" : 5,
    "details" : "failed shard on node [bf_GjmcUQGuCTk-_voh4Xw]: failed recovery, failure RecoveryFailedException[[logstash-2022.01.22][0]: Recovery failed from {es-cluster-0}{hYJ4ifx7R7yWJq6VFP3Drw}{jjAAtdcmQXeVpJXxj4DYcA}{10.65.32.184}{10.65.32.184:9300}{dilmrt}{ml.machine_memory=15878057984, ml.max_open_jobs=20, xpack.installed=true, transform.node=true} into {es-cluster-1}{bf_GjmcUQGuCTk-_voh4Xw}{QNp4DD51TQa716D4TjMFPg}{10.65.32.178}{10.65.32.178:9300}{dilmrt}{ml.machine_memory=15878057984, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[es-cluster-0][10.65.32.184:9300][internal:index/shard/recovery/start_recovery]]; nested: RemoteTransportException[[es-cluster-1][10.65.32.178:9300][internal:index/shard/recovery/clean_files]]; nested: UncategorizedExecutionException[Failed execution]; nested: NotSerializableExceptionWrapper[execution_exception: java.io.IOException: Disk quota exceeded]; nested: IOException[Disk quota exceeded]; ",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "7WHft5LVTYCEWvwKM64A-w",
      "node_name" : "es-cluster-2",
      "transport_address" : "10.65.32.202:9300",
      "node_attributes" : {
        "ml.machine_memory" : "15878057984",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "transform.node" : "true"
      },         
--- TRUNCATED ---

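I am not sure why it says Disk quota exceeded when the Elasticsearch cluster reports its available capacity correctly in /_cat/allocation. Is there any additional configuration I need to set to tell Elasticsearch that we have plenty of space to use?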
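To see where the error originates, it can help to split the "details" string into its chain of nested exceptions. The sketch below is a minimal illustration: the function name `exception_chain` and the regular expression are hypothetical helpers, and the input is the tail of the "details" value from the allocation explain output above.

```python
import re
from typing import List, Tuple

def exception_chain(details: str) -> List[Tuple[str, str]]:
    """Return (exception type, message) pairs for each 'nested:' entry, outermost first."""
    return re.findall(r"nested: (\w+)\[(.*?)\];", details)

# Tail of the "details" field, copied from the allocation explain output above.
details_tail = (
    "nested: UncategorizedExecutionException[Failed execution]; "
    "nested: NotSerializableExceptionWrapper[execution_exception: "
    "java.io.IOException: Disk quota exceeded]; "
    "nested: IOException[Disk quota exceeded]; "
)

chain = exception_chain(details_tail)
root_type, root_message = chain[-1]  # the innermost exception is listed last
print(root_type, "-", root_message)
```

The innermost entry is an IOException raised while the recovery was handling files on the target node, which suggests the message comes from the file system layer rather than from the free-space figures that /_cat/allocation reports.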


1 Answer

Stack Overflow user

Answered on 2022-01-28 15:39:46

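See the Amazon EFS limits for quotas that can cause a disk quota error; the error is not related to disk size. In general, EFS is not suited to a sizable ES stack: for example, Elasticsearch expects 64K file descriptors per data node instance, but EFS currently supports only 32K. If you look at your Elasticsearch logs, you may find traces of which limit has been breached.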
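One way to follow up on the file descriptor point is to compare each node's descriptor counts with the 32K figure mentioned above. The sketch below is an illustration under assumptions, not a confirmed diagnosis: it reads the nodes stats API (GET /_nodes/stats/process), the function names are hypothetical, the default URL is the localhost:9200 address used in the question, and 32768 is an assumed reading of "32K". Not every open descriptor refers to a file on EFS, so the ratio is only an upper-bound estimate.

```python
import json
from urllib.request import urlopen

def fetch_process_stats(base_url="http://localhost:9200"):
    """Fetch GET /_nodes/stats/process (requires a reachable Elasticsearch node)."""
    with urlopen(f"{base_url}/_nodes/stats/process") as resp:
        return json.load(resp)

def descriptor_usage(stats, efs_open_file_limit=32768):
    """Map node name -> (open descriptors, process maximum, open / assumed EFS limit)."""
    usage = {}
    for node in stats["nodes"].values():
        proc = node["process"]
        open_fds = proc["open_file_descriptors"]
        usage[node["name"]] = (
            open_fds,
            proc["max_file_descriptors"],
            open_fds / efs_open_file_limit,
        )
    return usage

def print_usage(base_url="http://localhost:9200"):
    """Print one line per node; call this from a host that can reach the cluster."""
    for name, (open_fds, max_fds, ratio) in descriptor_usage(fetch_process_stats(base_url)).items():
        print(f"{name}: open={open_fds} max={max_fds} share_of_assumed_efs_limit={ratio:.0%}")
```

Calling `print_usage()` from inside one of the pods would list the three nodes; a node whose open count approaches the assumed limit would be a candidate for the quota error, which the Elasticsearch logs should then confirm or rule out.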

Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/70894538
