Affinity in Kubernetes
Real-world applications place a number of affinity-related requirements on Kubernetes, which can be grouped into the following categories:
1. A Pod must be scheduled onto certain nodes
2. A Pod must not be scheduled onto certain nodes
3. Multiple replicas of a Pod should be scheduled onto the same node
4. Multiple replicas of a Pod should be scheduled onto different nodes
Practice
Below we walk through examples that show how to configure affinity in Kubernetes to meet the requirements above.
Scheduling Pods onto specific nodes
In the Pod definition, nodeSelector specifies a set of labels; the Pod will only be scheduled onto nodes that carry those labels.
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    disktype: ssd
In this example the Pod will only be scheduled onto nodes that carry the label disktype=ssd.
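For this to work, at least one node must already carry the disktype=ssd label. A minimal way to add the label and verify it (the node name kube-node-1 is only a placeholder):
$kubectl label nodes kube-node-1 disktype=ssd
$kubectl get nodes -l disktype=ssd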
Node affinity/anti-affinity
Node affinity/anti-affinity is more flexible and expressive than the nodeSelector mechanism:
- Expression syntax: the operators In, NotIn, Exists, DoesNotExist, Gt and Lt are supported.
- Both soft (preference) and hard (requirement) rules are supported. Hard means a Pod can only be scheduled onto a node if the affinity rule is satisfied; soft means the scheduler tries to satisfy the rule, but if no node matches, it will still pick a node that does not match.
- Under nodeAffinity you can list multiple nodeSelectorTerms; a node only has to satisfy one of the nodeSelectorTerms to match the nodeAffinity rule. Inside a nodeSelectorTerm you add matchExpressions; a node must satisfy all of the matchExpressions in that term to be schedulable (a small sketch of this OR/AND behaviour follows the example below).
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: k8s.gcr.io/pause:2.0
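As a minimal sketch of the OR/AND semantics described above (the label keys disktype and kubernetes.io/e2e-az-name are only illustrative): a node matches the rule below if it satisfies either of the two nodeSelectorTerms, and within a term it must satisfy every matchExpression.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:                  # term 1: an SSD node in az1 (both expressions must match)
        - key: disktype
          operator: In
          values:
          - ssd
        - key: kubernetes.io/e2e-az-name
          operator: In
          values:
          - e2e-az1
      - matchExpressions:                  # term 2: any node in az2
        - key: kubernetes.io/e2e-az-name
          operator: In
          values:
          - e2e-az2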
Inter-pod affinity and anti-affinity
Whether a newly created Pod can be scheduled onto a node is decided based on the labels of the Pods already running on that node. When configuring a rule you can specify which namespaces the matching Pods must come from, and topologyKey specifies the topology domain, which can be the node, a cloud provider zone, or a cloud provider region.
- Expression syntax: the operators In, NotIn, Exists and DoesNotExist are supported.
- Pod affinity and anti-affinity rules come in two forms:
requiredDuringSchedulingIgnoredDuringExecution # hard requirement
preferredDuringSchedulingIgnoredDuringExecution # soft requirement
Similar to the node affinity policies above, requiredDuringSchedulingIgnoredDuringExecution affinity can be used to constrain Pods of different services to nodes in the same topology domain, while preferredDuringSchedulingIgnoredDuringExecution anti-affinity can be used to spread a service's Pods across nodes in different topology domains.
- topologyKey can be set to values such as:
kubernetes.io/hostname #Node
failure-domain.beta.kubernetes.io/zone #Zone
failure-domain.beta.kubernetes.io/region #Region
Node labels can be set to describe the node's name, zone, region, and so on. Specifying topologyKey in a Pod's rule means the rule is evaluated against the Pods running on nodes within that topology domain.
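To inspect which topology labels a node already carries, or to add one manually (the node name kube-node-1 and the zone value az-1 are only placeholders):
$kubectl get nodes --show-labels
$kubectl label nodes kube-node-1 failure-domain.beta.kubernetes.io/zone=az-1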
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: failure-domain.beta.kubernetes.io/zone
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: kubernetes.io/hostname
  containers:
  - name: with-pod-affinity
    image: k8s.gcr.io/pause:2.0
The official community example above illustrates this further; it specifies both a pod affinity rule and a pod anti-affinity rule.
The requiredDuringSchedulingIgnoredDuringExecution affinity rule says the Pod can only be scheduled onto a node that satisfies the following condition:
- the node carries the label failure-domain.beta.kubernetes.io/zone, and some node with the same failure-domain.beta.kubernetes.io/zone value is already running a Pod labelled security=S1. The preferredDuringSchedulingIgnoredDuringExecution anti-affinity rule means the scheduler will try not to place the Pod on a node that is already running a Pod labelled security=S2. If we instead set topologyKey=failure-domain.beta.kubernetes.io/zone for the anti-affinity rule, the Pod will avoid nodes in any zone where a node is already running a Pod labelled security=S2.
Notice: the topologyKey field is subject to the following constraints:
1. For affinity rules and for requiredDuringScheduling anti-affinity rules, topologyKey must be specified.
2. For requiredDuringScheduling anti-affinity rules, the LimitPodHardAntiAffinityTopology admission controller restricts topologyKey to kubernetes.io/hostname; this restriction can be lifted by modifying or disabling the admission controller.
3. For preferredDuringScheduling anti-affinity rules, an empty topologyKey means the combination of kubernetes.io/hostname, failure-domain.beta.kubernetes.io/zone and failure-domain.beta.kubernetes.io/region.
4. Apart from these constraints, topologyKey can be set to any other legal label key.
A rule can also specify the namespaces whose Pods it should match against; if the namespaces field is defined but empty, it means Pods in all namespaces.
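A minimal sketch of the namespaces field inside a pod affinity term (the namespace name team-a is only a placeholder):
podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: security
        operator: In
        values:
        - S1
    namespaces:           # only Pods in these namespaces are considered
    - team-a
    topologyKey: kubernetes.io/hostname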
Common scenarios
Some of the most common scenarios are shown in the examples below.
Example 1
apiVersion: apps/v1beta1 # for versions before 1.6.0 use extensions/v1beta1
kind: Deployment
metadata:
  name: redis-cache
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: redis-server
        image: redis:3.2-alpine
This creates a Deployment with 3 replicas and the anti-affinity rule shown above. The Pods carry the label app: store, so a Pod will not be scheduled onto a node that is already running a Pod labelled app: store; as a result the three replicas of the Deployment land on three different nodes.
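To check how the replicas are spread across nodes (the label selector matches the Pod labels used above):
$kubectl get pods -l app=store -o wide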
Example 2
apiVersion: apps/v1beta1 # for versions before 1.6.0 use extensions/v1beta1
kind: Deployment
metadata:
  name: web-server
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: web-store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web-store
            topologyKey: "kubernetes.io/hostname"
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: web-app
        image: nginx:1.12-alpine
Building on Example 1, this adds a pod affinity requirement: the requiredDuringSchedulingIgnoredDuringExecution rule with topologyKey="kubernetes.io/hostname" demands that the target node is already running a Pod labelled app=store, while the anti-affinity rule keeps the web-store replicas on different nodes.
After running Example 1 and Example 2, the Pods are distributed as follows:
$kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
redis-cache-1450370735-6dzlj 1/1 Running 0 8m 10.192.4.2 kube-node-3
redis-cache-1450370735-j2j96 1/1 Running 0 8m 10.192.2.2 kube-node-1
redis-cache-1450370735-z73mh 1/1 Running 0 8m 10.192.3.1 kube-node-2
web-server-1287567482-5d4dz 1/1 Running 0 7m 10.192.2.3 kube-node-1
web-server-1287567482-6f7v5 1/1 Running 0 7m 10.192.4.3 kube-node-3
web-server-1287567482-s330j 1/1 Running 0 7m 10.192.3.2 kube-node-2
Example 3
apiVersion: apps/v1beta1 # for versions before 1.6.0 use extensions/v1beta1
kind: Deployment
metadata:
  name: web-server
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: web-store
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web-store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: web-app
        image: hub.easystack.io/library/nginx:1.9.0
In some applications the Pod replicas need to share a cache, so they must run on the same node; the self-affinity rule above achieves this, as the resulting distribution shows:
web-server-77bfb4575f-bhxvg 1/1 Running 0 11s 10.233.66.79 hzc-slave2 app=web-store,pod-template-hash=3369601319
web-server-77bfb4575f-mkfd9 1/1 Running 0 11s 10.233.66.80 hzc-slave2 app=web-store,pod-template-hash=3369601319
web-server-77bfb4575f-wgjq6 1/1 Running 0 11s 10.233.66.78 hzc-slave2 app=web-store,pod-template-hash=3369601319
Links:
https://github.com/davidkbainbridge/demo-affinity
https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#inter-pod-affinity-and-anti-affinity-beta-feature
https://medium.com/kokster/scheduling-in-kubernetes-part-2-pod-affinity-c2b217312ae1