```
Waiting for services and endpoints to be initialized from apiserver...
skydns: failure to forward request "read udp 10.240.0.18:47848->168.63.129.16:53: i/o timeout"
Timeout waiting for initialization
```
Wait for a while; if the issue does not resolve itself, check whether the Node's network is configured correctly, for example: whether the upstream DNS server is reachable, whether IP forwarding is enabled (both on the Node itself and on public-cloud virtual NICs), and whether a security group is blocking DNS requests.
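The first two checks can be run directly on the Node; a minimal sketch (168.63.129.16 is the upstream DNS server taken from the error message above):

```shell
# Query the upstream DNS server seen in the log; a timeout here points at
# the Node network or a security group rule rather than at kube-dns itself
if command -v dig >/dev/null; then
  dig @168.63.129.16 kubernetes.io +time=3 +tries=1 +short || echo "upstream DNS unreachable"
fi

# IP forwarding must be enabled for Pod traffic; this should print 1
cat /proc/sys/net/ipv4/ip_forward
# If it prints 0, enable it with:  sudo sysctl -w net.ipv4.ip_forward=1
```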
If the error log shows a timeout accessing kube-apiserver rather than a timeout forwarding DNS requests, for example
```
E0122 06:56:04.774977       1 reflector.go:199] k8s.io/dns/vendor/k8s.io/client-go/tools/cache/reflector.go:94: Failed to list *v1.Endpoints: Get https://10.0.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.0.0.1:443: i/o timeout
I0122 06:56:04.775358       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
E0122 06:56:04.775574       1 reflector.go:199] k8s.io/dns/vendor/k8s.io/client-go/tools/cache/reflector.go:94: Failed to list *v1.Service: Get https://10.0.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.0.0.1:443: i/o timeout
I0122 06:56:05.275295       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:05.775182       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:06.275288       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
```
then this indicates a Pod network connectivity problem (usually between hosts), covering Pod->Node, Node->Pod, and Node->Node traffic. There are many possible causes; see the network troubleshooting guide for detailed debugging steps.
Kubelet: failed to initialize top level QOS containers
Restarting kubelet fails with the error `Failed to start ContainerManager failed to initialise top level QOS containers` (see #43856), and the Node reports events such as:
```
$ kubectl describe node node1
Events:
  Type     Reason                            Age                  From                               Message
  ----     ------                            ----                 ----                               -------
  Warning  FailedNodeAllocatableEnforcement  2m (x1001 over 16h)  kubelet, aks-agentpool-22604214-0  Failed to update Node Allocatable Limits "": failed to set supported cgroup subsystems for cgroup : Failed to set config for supported subsystems : failed to write 7285047296 to memory.limit_in_bytes: write /var/lib/docker/overlay2/5650a1aadf9c758946073fefa1558446ab582148ddd3ee7e7cb9d269fab20f72/merged/sys/fs/cgroup/memory/memory.limit_in_bytes: invalid argument
```
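A temporary workaround reported in issue #43856 is to stop the leftover kubepods cgroup slices and then restart kubelet. This is a sketch assuming a systemd-based node; review what the pattern matches before running it:

```shell
# Find leftover kubepods .slice units and stop them (workaround from #43856)
for i in $(systemctl list-unit-files --no-legend --no-pager -l \
           | grep --color=never -o '.*\.slice' | grep kubepod | cut -d' ' -f1); do
  systemctl status "$i" | grep --color=never -o '/.*\.slice'
done | xargs sudo systemctl stop

# Then restart kubelet
sudo systemctl restart kubelet
```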
Kubernetes nodes can be scheduled to Capacity: by default, Pods can consume all the available capacity on a node. This is an issue because nodes typically run quite a few system daemons that power the OS and Kubernetes itself. Unless resources are set aside for these system daemons, Pods and system daemons compete for resources, leading to resource starvation on the node.

The kubelet exposes a feature named Node Allocatable that helps reserve compute resources for system daemons. Kubernetes recommends that cluster administrators configure Node Allocatable based on the workload density on each node.
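For example, Node Allocatable can be configured via kubelet flags; the values below are illustrative only and should be sized for your nodes:

```shell
# Reserve resources for OS daemons (--system-reserved) and Kubernetes
# daemons (--kube-reserved), and enforce limits on the pods cgroup only
kubelet \
  --system-reserved=cpu=500m,memory=512Mi \
  --kube-reserved=cpu=500m,memory=512Mi \
  --eviction-hard=memory.available<500Mi \
  --enforce-node-allocatable=pods
```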
```
kube-proxy[2241]: E0502 15:55:13.889842    2241 conntrack.go:42] conntrack returned error: error looking for path of conntrack: exec: "conntrack": executable file not found in $PATH
```
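This kube-proxy error means the `conntrack` binary is missing from the Node's `$PATH`. Installing the userspace conntrack tool resolves it; the package names below cover common distributions (adjust for yours):

```shell
# Debian/Ubuntu
sudo apt-get install -y conntrack

# RHEL/CentOS
sudo yum install -y conntrack-tools
```

Restart kube-proxy afterwards so it picks up the binary.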
```
$ kubectl describe hpa php-apache
Name:               php-apache
Namespace:          default
Labels:             <none>
Annotations:        <none>
CreationTimestamp:  Wed, 27 Dec 2017 14:36:38 +0800
Reference:          Deployment/php-apache
Metrics:            ( current / target )
  resource cpu on pods (as a percentage of request):  <unknown> / 50%
Min replicas:       1
Max replicas:       10
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: unable to get metrics for resource cpu: unable to fetch metrics from API: the server could not find the requested resource (get pods.metrics.k8s.io)
Events:
  Type     Reason                   Age                  From                       Message
  ----     ------                   ----                 ----                       -------
  Warning  FailedGetResourceMetric  3m (x2231 over 18h)  horizontal-pod-autoscaler  unable to get metrics for resource cpu: unable to fetch metrics from API: the server could not find the requested resource (get pods.metrics.k8s.io)
```
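The `get pods.metrics.k8s.io` failure above means the metrics API is not being served in the cluster, so the HPA cannot read CPU usage. Deploying metrics-server typically resolves this; the manifest URL below is the upstream release (pin a specific version in production):

```shell
# Deploy metrics-server so that pods.metrics.k8s.io is served
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify metrics are flowing before expecting the HPA to scale
kubectl top pods
```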