As is well known, AWS imposes a per-node limit on pod IPs, but the Kubernetes scheduler does not take this into account: it schedules pods onto nodes that have no pod IPs left to allocate, and those pods get stuck in the ContainerCreating state, like this:
Normal Scheduled 114s default-scheduler Successfully assigned default/whoami-deployment-9f9c86c4f-r4flx to ip-192-168-15-248.ec2.internal
Warning FailedCreatepodSandBox 111s kubelet,ip-192-168-15-248.ec2.internal Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8d4b5f98f9b600ad9ec486f994fa2f9223d5224842df7f78802616f014b52970" network for pod "whoami-deployment-9f9c86c4f-r4flx": NetworkPlugin cni failed to set up pod "whoami-deployment-9f9c86c4f-r4flx_default" network: add cmd: failed to assign an IP address to container
Normal SandboxChanged 86s (x12 over 109s) kubelet,ip-192-168-15-248.ec2.internal Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatepodSandBox 61s (x4 over 76s) kubelet,ip-192-168-15-248.ec2.internal (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "e2a3c54ba7d9a33a45248f7c276f4a2d5b0c8ba6c3deb5184392156b35638553" network for pod "whoami-deployment-9f9c86c4f-r4flx": NetworkPlugin cni failed to set up pod "whoami-deployment-9f9c86c4f-r4flx_default" network: add cmd: failed to assign an IP address to container
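For context, this per-node limit comes from the ENI model of the AWS VPC CNI: each instance type supports a fixed number of ENIs, each with a fixed number of IPv4 addresses. The commonly cited formula can be sketched as follows (the t3.medium numbers are an illustrative assumption; check the ENI limits table for your actual instance type):

```shell
# Max pods per node under the AWS VPC CNI:
#   max_pods = ENIs * (IPv4 addresses per ENI - 1) + 2
# Illustrative numbers for a t3.medium (3 ENIs, 6 IPv4 addresses each):
enis=3
ips_per_eni=6
max_pods=$(( enis * (ips_per_eni - 1) + 2 ))
echo "$max_pods"
```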
To work around this, I tainted the affected nodes with key=value:NoSchedule so the default scheduler would not place pods on nodes that had already reached their pod IP limit, and I deleted all the pods stuck in ContainerCreating. I expected this to stop the scheduler from putting more pods on the tainted nodes, and that is indeed what happened. However, since the pods were now unschedulable, I also expected the cluster-autoscaler to scale up the ASG so my pods could run on a new node, and that is what did not happen.
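The tainting step looked roughly like this (a sketch: the node name is taken from the events above, and `key=value` is the literal placeholder taint key/value pair mentioned in the description):

```shell
# Taint the node that has exhausted its pod IPs so the default scheduler avoids it
kubectl taint nodes ip-192-168-15-248.ec2.internal key=value:NoSchedule

# Delete the pods stuck in ContainerCreating so the Deployment recreates them
kubectl delete pod whoami-deployment-9f9c86c4f-r4flx
```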
When I describe one of the pods, I see:
Warning FailedScheduling 40s (x5 over 58s) default-scheduler 0/5 nodes are available: 5 node(s) had taints that the pod didn't tolerate.
Normal NotTriggerScaleUp 5s (x6 over 56s) cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added): 1 node(s) had taints that the pod didn't tolerate
And when I look at the cluster-autoscaler logs, I see:
I1108 16:30:47.521026 1 event.go:209] Event(v1.ObjectReference{Kind:"pod",Namespace:"default",Name:"whoami-deployment-9f9c86c4f-x5h4d",UID:"158cc806-0245-11ea-a67a-0efb4254edc4",APIVersion:"v1",ResourceVersion:"2483839",FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added): 1 node(s) had taints that the pod didn't tolerate
Next, I tried an alternative approach: I removed the NoSchedule taint described above and instead marked the node unschedulable by patching it:
kubectl patch nodes node1.internal -p '{"spec": {"unschedulable": true}}'
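For reference, `kubectl cordon` sets the same `spec.unschedulable` field, so the patch above is equivalent to:

```shell
# Equivalent to patching spec.unschedulable to true
kubectl cordon node1.internal
# Revert with:
kubectl uncordon node1.internal
```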
These are the logs I see in cluster-autoscaler:
I1109 10:47:50.894680 1 static_autoscaler.go:138] Starting main loop
W1109 10:47:50.894719 1 static_autoscaler.go:562] Cluster has no ready nodes.
I1109 10:47:50.901157 1 event.go:209] Event(v1.ObjectReference{Kind:"ConfigMap",Namespace:"kube-system",Name:"cluster-autoscaler-status",UID:"7c949105-0153-11ea-9a39-12e5fc698b6e",ResourceVersion:"2629645",FieldPath:""}): type: 'Warning' reason: 'ClusterUnhealthy' Cluster has no ready nodes.
So neither of my ideas for working around this problem got me anywhere. How can I overcome this?
Kubernetes version: 1.14; cluster-autoscaler: 1.14.6
Let me know if you need more details.