
K8s pod priority & OutOfPods

Stack Overflow user
Asked on 2020-01-08 08:49:11
1 answer · 3.2K views · 0 followers · 0 votes

Our situation: after an update of the cluster (Kubernetes, or more specifically: ICP), the cluster ran out of pods, producing "OutOfPods" error messages. The cause was a too-low "podsPerCore" setting, which we corrected later. Until then, a pod with an assigned priorityClass (1000000) could not be scheduled, while others without a priorityClass (0) were scheduled. I expected different behavior: I thought the Kubernetes scheduler would kill the pods without priority so that the pod with priority could be scheduled. Was I wrong?

This is just a question for understanding, because in any case I want to guarantee that the pods with priority are running.

Thanks

Pod with priority:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: ibm-anyuid-hostpath-psp
  creationTimestamp: "2019-12-16T13:39:21Z"
  generateName: dms-config-server-555dfc56-
  labels:
    app: config-server
    pod-template-hash: 555dfc56
    release: dms-config-server
  name: dms-config-server-555dfc56-2ssxb
  namespace: dms
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: dms-config-server-555dfc56
    uid: c29c40e1-1da7-11ea-b646-005056a72568
  resourceVersion: "65065735"
  selfLink: /api/v1/namespaces/dms/pods/dms-config-server-555dfc56-2ssxb
  uid: 7758e138-2009-11ea-9ff4-005056a72568
spec:
  containers:
  - env:
    - name: CONFIG_SERVER_GIT_USERNAME
      valueFrom:
        secretKeyRef:
          key: username
          name: dms-config-server-git
    - name: CONFIG_SERVER_GIT_PASSWORD
      valueFrom:
        secretKeyRef:
          key: password
          name: dms-config-server-git
    envFrom:
    - configMapRef:
        name: dms-config-server-app-env
    - configMapRef:
        name: dms-config-server-git
    image: docker.repository..../infra/config-server:2.0.8
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /actuator/health
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 90
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: config-server
    ports:
    - containerPort: 8080
      name: http
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /actuator/health
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 20
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      limits:
        cpu: 250m
        memory: 600Mi
      requests:
        cpu: 10m
        memory: 300Mi
    securityContext:
      capabilities:
        drop:
        - MKNOD
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-v7tpv
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: kub-test-worker-02
  priority: 1000000
  priorityClassName: infrastructure
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default-token-v7tpv
    secret:
      defaultMode: 420
      secretName: default-token-v7tpv

Pod without priority (just an example from the same namespace):

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: ibm-anyuid-hostpath-psp
  creationTimestamp: "2019-09-10T09:09:28Z"
  generateName: produkt-service-57d448979d-
  labels:
    app: produkt-service
    pod-template-hash: 57d448979d
    release: dms-produkt-service
  name: produkt-service-57d448979d-4x5qs
  namespace: dms
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: produkt-service-57d448979d
    uid: 4096ab97-5cee-11e9-97a2-005056a72568
  resourceVersion: "65065755"
  selfLink: /api/v1/namespaces/dms/pods/produkt-service-57d448979d-4x5qs
  uid: b112c5f7-d3aa-11e9-9b1b-005056a72568
spec:
  containers:
  - image: docker-snapshot.repository..../dms/produkt-service:0b6e0ecc88a28d2a91ffb1db61f8ca99c09a9d92
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /actuator/health
        port: 8080
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: produkt-service
    ports:
    - containerPort: 8080
      name: http
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /actuator/health
        port: 8080
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources: {}
    securityContext:
      capabilities:
        drop:
        - MKNOD
      procMount: Default
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-v7tpv
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: kub-test-worker-02
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default-token-v7tpv
    secret:
      defaultMode: 420
      secretName: default-token-v7tpv

1 answer

Stack Overflow user

Accepted answer

Answered on 2020-01-10 11:58:54

Many things can change how the scheduler works. There is documentation about it: Pod Priority and Preemption.

Please note that these features were considered stable as of version 1.14.0.

From the IBM side, please keep in mind that version 1.13.9 will only be supported until February 19, 2020!

You are right: pods with lower priority should be replaced by pods with higher priority.
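As a side note, whether a high-priority pod actually evicts others can also depend on how its PriorityClass is defined: newer Kubernetes versions add a `preemptionPolicy` field (introduced as an alpha feature around 1.15, so it may not be available on the cluster versions discussed here), and a class with `preemptionPolicy: Never` jumps ahead in the scheduling queue but never evicts running pods. A hypothetical sketch:

```yaml
# Hypothetical example: high priority in the queue, but no preemption.
# Requires a Kubernetes version that supports the preemptionPolicy field.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-nonpreempting
value: 1000000
preemptionPolicy: Never   # the default is PreemptLowerPriority
globalDefault: false
description: "High priority without evicting lower-priority pods"
```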

Let me illustrate this with an example:

Example

Assume a Kubernetes cluster with 3 nodes (1 master and 2 worker nodes):

  • By default, you cannot schedule normal pods on the master node.
  • The first worker node is the only one on which pods can be scheduled; it has 8GB of RAM.
  • The second worker node has a taint that disables scheduling.

This example is based on RAM usage, but it can be applied to CPU time in the same way.

Priority classes

There are two priority classes:

  • zero priority (0)
  • high priority (1 000 000)

YAML definition of the zero-priority class:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: zero-priority
value: 0
globalDefault: false
description: "This is priority class for hello pod"

globalDefault: false means this class is not used as the default for pods that have no priority class assigned; if it were set to true, this class would be assigned to such pods by default.
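For contrast, a class intended to act as the cluster-wide default would set globalDefault: true (at most one PriorityClass in the cluster may do this). A hypothetical sketch, not used in the rest of this example:

```yaml
# Hypothetical example: pods without a priorityClassName get this class.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: cluster-default-priority
value: 0
globalDefault: true
description: "Default priority for pods that do not specify a class"
```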

YAML definition of the high-priority class:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "This is priority class for goodbye pod"

To apply a priority class, run: $ kubectl apply -f FILE.yaml

Deployments

With the objects above, you can create two deployments:

  • hello - deployment with low priority
  • goodbye - deployment with high priority

YAML definition of the hello deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
spec:
  selector:
    matchLabels:
      app: hello
      version: 1.0.0
  replicas: 10
  template:
    metadata:
      labels:
        app: hello
        version: 1.0.0
    spec:
      containers:
      - name: hello
        image: "gcr.io/google-samples/hello-app:1.0"
        env:
        - name: "PORT"
          value: "50001"
        resources:
          requests:
            memory: "128Mi"
      priorityClassName: zero-priority

Take a closer look at this snippet:

        resources:
          requests:
            memory: "128Mi"
      priorityClassName: zero-priority

The requested resources will limit the number of pods that fit on the node, and the low priority class is assigned to this deployment.
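The arithmetic behind this can be sketched quickly (the node's allocatable memory is an assumption inferred from the percentage shown by `kubectl describe node` further below, not a value stated anywhere in the manifests):

```python
# Back-of-the-envelope check for the hello deployment on the single
# schedulable worker node (8GB RAM, minus system reservations).
replicas = 10
request_mi = 128                 # memory request per hello pod

total_mi = replicas * request_mi
print(total_mi)                  # 1280 -> the "1280Mi (17%)" seen below

# 1280Mi at ~17% implies roughly 7.4GiB of allocatable memory:
allocatable_mi = total_mi / 0.17
print(round(allocatable_mi))     # 7529
```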

YAML definition of the goodbye deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: goodbye
spec:
  selector:
    matchLabels:
      app: goodbye
      version: 2.0.0
  replicas: 3
  template:
    metadata:
      labels:
        app: goodbye
        version: 2.0.0
    spec:
      containers:
      - name: goodbye
        image: "gcr.io/google-samples/hello-app:2.0"
        env:
        - name: "PORT"
          value: "50001"
        resources:
          requests:
            memory: "6144Mi"
      priorityClassName: high-priority

Also take a closer look at this snippet:

        resources:
          requests:
            memory: "6144Mi"
      priorityClassName: high-priority

These pods request much more RAM and have high priority.

Testing and troubleshooting

There is not enough information here to troubleshoot such an issue properly; that would require extensive logs from many components: kubelet, the pods, the nodes and the deployments themselves.

Apply the hello deployment and see what happens: $ kubectl apply -f hello.yaml

Get basic information about the deployment with the command:

$ kubectl get deployments hello

After a while, the output should look like this:

NAME    READY   UP-TO-DATE   AVAILABLE   AGE
hello   10/10   10           10          9s

As you can see, all pods are ready and available, and the requested resources were assigned to them.

For more details useful in troubleshooting, you can run:

  • $ kubectl describe deployment hello
  • $ kubectl describe node NAME_OF_THE_NODE

Example information about allocated resources from the commands above:

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                250m (12%)    0 (0%)
  memory             1280Mi (17%)  0 (0%)
  ephemeral-storage  0 (0%)        0 (0%)

Apply the goodbye deployment and see what happens: $ kubectl apply -f goodbye.yaml

Get basic information about the deployments with the command: $ kubectl get deployments

NAME      READY   UP-TO-DATE   AVAILABLE   AGE
goodbye   1/3     3            1           25s
hello     9/10    10           9           11m

As you can see, the goodbye deployment exists, but only one of its pods is available. Despite goodbye having the higher priority, the hello pods keep running.

Why is that?

$ kubectl describe node NAME_OF_THE_NODE

Non-terminated Pods:          (13 in total)
  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     goodbye-575968c8d6-bnrjc                    0 (0%)        0 (0%)      6Gi (83%)        0 (0%)         15m
  default                     hello-fdfb55c96-6hkwp                       0 (0%)        0 (0%)      128Mi (1%)       0 (0%)         27m
  default                     hello-fdfb55c96-djrwf                       0 (0%)        0 (0%)      128Mi (1%)       0 (0%)         27m

Take a look at the requested memory of the goodbye pod. As described above, it is 6Gi:

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                250m (12%)    0 (0%)
  memory             7296Mi (98%)  0 (0%)
  ephemeral-storage  0 (0%)        0 (0%)
Events:              <none>

Memory usage is close to 100%.
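These numbers add up as follows (a quick check of the allocation shown above):

```python
# One goodbye pod (6144Mi) is running next to the 9 remaining hello pods.
goodbye_mi = 6144
hello_mi = 128

used_mi = goodbye_mi + 9 * hello_mi
print(used_mi)        # 7296 -> matches "memory 7296Mi (98%)" above

# Another goodbye pod cannot fit: even evicting every remaining hello
# pod would only free 9 * 128Mi, far less than the 6144Mi it requests.
print(9 * hello_mi)   # 1152
```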

Describing the particular goodbye pod that is stuck in Pending state yields more information: $ kubectl describe pod NAME_OF_THE_POD_IN_PENDING_STATE

Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  38s (x3 over 53s)  default-scheduler  0/3 nodes are available: 1 Insufficient memory, 2 node(s) had taints that the pod didn't tolerate.

The remaining goodbye pods were not created because there were not enough resources to satisfy their requests, yet enough resources were still left for the hello pods.

Situations can occur in which pods with lower priority are killed so that pods with higher priority can be scheduled.

Change the requested memory of the goodbye pods to 2304Mi. This will allow the scheduler to place all of the required pods (3):

        resources:
          requests:
            memory: "2304Mi"

You can delete the previous deployment and apply a new one after changing the memory parameter.

Run the command: $ kubectl get deployments

NAME      READY   UP-TO-DATE   AVAILABLE   AGE
goodbye   3/3     3            3           5m59s
hello     3/10    10           3           48m

As you can see, all goodbye pods are available.

Hello was scaled down to make room for the pods with higher priority (goodbye).
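A final check of the numbers after the change:

```python
# Final state: goodbye 3/3 and hello 3/10 on the schedulable worker.
goodbye_mi = 3 * 2304    # three goodbye pods at the reduced request
hello_mi = 3 * 128       # three surviving hello pods
print(goodbye_mi + hello_mi)   # 7296 -> again ~98% of the node's memory
```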

0 votes
The original content of this page was provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/59642272
