Installing a Highly Available Kubernetes 1.30.1 Cluster with Kubeadm
- Kubernetes
- 2024-05-20
Installing Highly Available Kubernetes with Kubeadm
High-Availability Kubernetes Cluster Plan
// Note: the host network, the K8s Service network, and the Pod network must not overlap
Hostname | IP address | Role | Version / Component |
---|---|---|---|
master-01 | 192.168.132.169 | master node 1 | CentOS 7.9 |
master-02 | 192.168.132.170 | master node 2 | CentOS 7.9 |
master-03 | 192.168.132.171 | master node 3 | CentOS 7.9 |
master-lb | 192.168.132.236 | VIP | Keepalived + HAProxy |
node-01 | 192.168.132.172 | worker node 1 | CentOS 7.9 |
node-02 | 192.168.132.173 | worker node 2 | CentOS 7.9 |
Software and Network Segment Plan
Software | Version | Network |
---|---|---|
Containerd | 1.6.x | |
Kubeadm | 1.30.* | |
Pod | | 172.16.0.0/16 |
Service | | 10.96.0.0/16 |
Installation Procedure (these steps are required on all 5 nodes; only master-01 is shown, so run the same commands on the remaining hosts yourself)
Initial Setup
Set the hostname on all nodes (change the other nodes according to the planning table)
[root@localhost ~]# hostnamectl set-hostname master-01
[root@localhost ~]# bash
[root@master-01 ~]#
Edit the /etc/hosts file on master-01
[root@master-01 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.132.169 master-01
192.168.132.170 master-02
192.168.132.171 master-03
# If this is not a highly available cluster, use the IP of master-01 here
192.168.132.236 master-lb
192.168.132.172 node-01
192.168.132.173 node-02
Run ping on master-01 to check connectivity to the other 4 hosts; since 192.168.132.236 is the VIP, that address is unreachable at this point
[root@master-01 ~]# for i in {master-01,master-02,master-03,master-lb,node-01,node-02}; do ping $i -c 1; done
PING master-01 (192.168.132.169) 56(84) bytes of data.
64 bytes from master-01 (192.168.132.169): icmp_seq=1 ttl=64 time=0.040 ms
--- master-01 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.040/0.040/0.040/0.000 ms
PING master-02 (192.168.132.170) 56(84) bytes of data.
64 bytes from master-02 (192.168.132.170): icmp_seq=1 ttl=64 time=0.651 ms
--- master-02 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.651/0.651/0.651/0.000 ms
PING master-03 (192.168.132.171) 56(84) bytes of data.
64 bytes from master-03 (192.168.132.171): icmp_seq=1 ttl=64 time=0.460 ms
--- master-03 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.460/0.460/0.460/0.000 ms
PING master-lb (192.168.132.236) 56(84) bytes of data.
From master-01 (192.168.132.169) icmp_seq=1 Destination Host Unreachable
--- master-lb ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
PING node-01 (192.168.132.172) 56(84) bytes of data.
64 bytes from node-01 (192.168.132.172): icmp_seq=1 ttl=64 time=0.453 ms
--- node-01 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.453/0.453/0.453/0.000 ms
PING node-02 (192.168.132.173) 56(84) bytes of data.
64 bytes from node-02 (192.168.132.173): icmp_seq=1 ttl=64 time=0.842 ms
--- node-02 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.842/0.842/0.842/0.000 ms
Configure passwordless SSH login from master-01 to the other hosts
[root@master-01 ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:y8j+HKRG7GQCAZWmdHpqM5rKK4FKbvTotHzyuR5+eXY root@master-01
The key's randomart image is:
+---[RSA 2048]----+
|.oo. |
| .oo |
|.o+ |
|.. o . |
|. o . = S |
|oB B = . |
|B+=. B + |
|O=ooo= + E |
|****o +.+ |
+----[SHA256]-----+
[root@master-01 ~]# for i in {master-01,master-02,master-03,node-01,node-02}; do ssh-copy-id $i; done
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'master-01 (192.168.132.169)' can't be established.
ECDSA key fingerprint is SHA256:3jDHkD+/2lQF89uZBEBLWtHSQLcv34cdY/oRhyo507I.
ECDSA key fingerprint is MD5:ca:ec:2f:b0:c6:65:d9:50:11:f6:b1:38:84:64:f1:c4.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@master-01's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'master-01'"
and check to make sure that only the key(s) you wanted were added.
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'master-02 (192.168.132.170)' can't be established.
ECDSA key fingerprint is SHA256:3jDHkD+/2lQF89uZBEBLWtHSQLcv34cdY/oRhyo507I.
ECDSA key fingerprint is MD5:ca:ec:2f:b0:c6:65:d9:50:11:f6:b1:38:84:64:f1:c4.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@master-02's password:
Number of key(s) added: 1
...output omitted...
Distribute the /etc/hosts file from master-01 to the other 4 hosts with scp
[root@master-01 ~]# for i in {master-01,master-02,master-03,node-01,node-02}; do scp /etc/hosts root@$i:/etc/hosts; done
hosts 100% 364 506.1KB/s 00:00
hosts 100% 364 150.4KB/s 00:00
hosts 100% 364 355.4KB/s 00:00
hosts 100% 364 397.2KB/s 00:00
hosts
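Since every step below marked "all nodes" has to be repeated on the other hosts, the passwordless SSH set up above can be used to run the same command everywhere from master-01. A minimal sketch (the host list matches the planning table; substitute whatever command the step requires):
# run the same command on the remaining 4 hosts from master-01 (sketch)
for i in master-02 master-03 node-01 node-02; do
    ssh root@$i "hostname; uptime"
done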
Configure the package repositories on all nodes
[root@master-01 ~]# curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
[root@master-01 ~]# yum install -y yum-utils device-mapper-persistent-data lvm2
[root@master-01 ~]# yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
[root@master-01 ~]# cat <<EOF | tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.30/rpm/
enabled=1
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.30/rpm/repodata/repomd.xml.key
EOF
[root@master-01 ~]# sed -i -e '/mirrors.cloud.aliyuncs.com/d' -e '/mirrors.aliyuncs.com/d' /etc/yum.repos.d/CentOS-Base.repo
Install basic tools on all nodes
[root@master-01 ~]# yum install wget jq psmisc vim net-tools telnet git bash-completion -y
[root@master-01 ~]# bash
Configure a static IP address on all nodes: edit the NIC configuration file and remove the UUID option; adjust the device name, gateway, and DNS addresses to match your own environment
[root@master-01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens33
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=static
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ens33
DEVICE=ens33
ONBOOT=yes
IPADDR=192.168.132.169
NETMASK=255.255.255.0
#PREFIX=24
GATEWAY=192.168.132.2
DNS1=192.168.132.2
Stop and disable the NetworkManager service on all nodes, restart the network, check that the IP address is assigned, and check that the host can reach the Internet. If the configuration looks correct but the node cannot obtain an IP or reach the Internet, check whether the IP address is already in use and whether the gateway and DNS settings are correct
[root@master-01 ~]# systemctl stop NetworkManager
[root@master-01 ~]# systemctl disable NetworkManager
[root@master-01 ~]# ip address show | grep ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
inet 192.168.132.169/24 brd 192.168.132.255 scope global ens33
[root@master-01 ~]# ping blog.caijxlinux.work
PING blog.caijxlinux.work (8.138.107.10) 56(84) bytes of data.
64 bytes from blog.caijxlinux.work (8.138.107.10): icmp_seq=1 ttl=128 time=45.7 ms
--- blog.caijxlinux.work ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 45.753/45.753/45.753/0.000 ms
Disable the firewall, SELinux, and swap on all nodes (a quick verification sketch follows the commands)
[root@master-01 ~]# systemctl stop firewalld.service
[root@master-01 ~]# systemctl disable firewalld.service
[root@master-01 ~]# setenforce 0
[root@master-01 ~]# sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/sysconfig/selinux
[root@master-01 ~]# sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config
[root@master-01 ~]# swapoff -a && sysctl -w vm.swappiness=0
vm.swappiness = 0
[root@master-01 ~]# sed -ri '/^[^#]*swap/s@^@#@' /etc/fstab
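A quick check, not part of the original steps, to confirm swap is really off and stays off after a reboot:
free -h | grep -i swap        # the Swap line should show 0
cat /proc/swaps               # should list no active swap devices
grep swap /etc/fstab          # the swap entry should now be commented out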
Install a time-synchronization service on all nodes, set the system time zone, synchronize against Alibaba Cloud's time server, and add a cron job
[root@master-01 ~]# rpm -ivh http://mirrors.wlnmp.com/centos/wlnmp-release-centos.noarch.rpm
[root@master-01 ~]# yum install ntpdate -y
[root@master-01 ~]# ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
[root@master-01 ~]# echo 'Asia/Shanghai' >/etc/timezone
[root@master-01 ~]# ntpdate time2.aliyun.com
19 May 00:51:05 ntpdate[3180]: adjust time server 203.107.6.88 offset 0.008514 sec
[root@master-01 ~]# crontab -e
*/5 * * * * /usr/sbin/ntpdate time2.aliyun.com
[root@master-01 ~]# date
Sun May 19 00:55:22 CST 2024
Raise the system file-descriptor limits (an alternative heredoc sketch follows the excerpt below)
[root@master-01 ~]# ulimit -SHn 65535
[root@master-01 ~]# vim /etc/security/limits.conf
#ftp hard nproc 0
#@student - maxlogins 4
# End of file
# append at the end of the configuration file
* soft nofile 65536
* hard nofile 131072
* soft nproc 65535
* hard nproc 655350
* soft memlock unlimited
* hard memlock unlimited
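Instead of editing limits.conf by hand with vim, the same lines can be appended with a heredoc; a sketch assuming the file has not already been modified:
cat <<EOF >> /etc/security/limits.conf
* soft nofile 65536
* hard nofile 131072
* soft nproc 65535
* hard nproc 655350
* soft memlock unlimited
* hard memlock unlimited
EOF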
Upgrade packages on all nodes
[root@master-01 ~]# yum update -y --exclude=kernel*
Download the K8s installation repository on master-01 (note the directory you clone into)
[root@master-01 ~]# git clone https://gitee.com/dukuan/k8s-ha-install.git
Cloning into 'k8s-ha-install'...
remote: Enumerating objects: 907, done.
remote: Counting objects: 100% (25/25), done.
remote: Compressing objects: 100% (21/21), done.
...output omitted...
Download the 4.19 kernel packages on master-01; you can download them from the links below, or contact the author for an archive
[root@master-01 ~]# wget http://193.49.22.109/elrepo/kernel/el7/x86_64/RPMS/kernel-ml-devel-4.19.12-1.el7.elrepo.x86_64.rpm
[root@master-01 ~]# wget http://193.49.22.109/elrepo/kernel/el7/x86_64/RPMS/kernel-ml-4.19.12-1.el7.elrepo.x86_64.rpm
[root@master-01 ~]# ls
anaconda-ks.cfg kernel-ml-4.19.12-1.el7.elrepo.x86_64.rpm kernel-ml-devel-4.19.12-1.el7.elrepo.x86_64.rpm
Distribute the kernel packages from master-01 to the other 4 nodes
[root@master-01 ~]# for i in {master-02,master-03,node-01,node-02} ; do scp kernel-ml-4.19.12-1.el7.elrepo.x86_64.rpm kernel-ml-devel-4.19.12-1.el7.elrepo.x86_64.rpm root@$i:/root/ ; done
kernel-ml-4.19.12-1.el7.elrepo.x86_64.rpm 100% 46MB 61.0MB/s 00:00
kernel-ml-devel-4.19.12-1.el7.elrepo.x86_64.rpm 100% 12MB 55.0MB/s 00:00
...output omitted...
Install the new kernel on all nodes
[root@master-01 ~]# yum localinstall -y kernel-ml*
Change the kernel boot order on all nodes
[root@master-01 ~]# grub2-set-default 0 && grub2-mkconfig -o /etc/grub2.cfg
[root@master-01 ~]# grubby --args="user_namespace.enable=1" --update-kernel="$(grubby --default-kernel)"
Check on all nodes that the default kernel is 4.19
[root@master-01 ~]# grubby --default-kernel
/boot/vmlinuz-4.19.12-1.el7.elrepo.x86_64
Install ipvsadm, ipset, and related management tools on all nodes, and load the ipvs kernel modules; /etc/modules-load.d/ipvs.conf should contain the modules listed below (a heredoc sketch for creating the file follows the list)
[root@master-01 ~]# yum -y install ipvsadm ipset sysstat conntrack libseccomp
[root@master-01 ~]# modprobe -- ip_vs
[root@master-01 ~]# modprobe -- ip_vs_rr
[root@master-01 ~]# modprobe -- ip_vs_wrr
[root@master-01 ~]# modprobe -- ip_vs_sh
[root@master-01 ~]# modprobe -- nf_conntrack
[root@master-01 ~]# cat /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_lc
ip_vs_wlc
ip_vs_rr
ip_vs_wrr
ip_vs_lblc
ip_vs_lblcr
ip_vs_dh
ip_vs_sh
ip_vs_fo
ip_vs_nq
ip_vs_sed
ip_vs_ftp
ip_vs_sh
nf_conntrack
ip_tables
ip_set
xt_set
ipt_set
ipt_rpfilter
ipt_REJECT
ipip
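The file shown above can be created in one step with a heredoc before enabling the module-loading service; a sketch with the same module list:
cat <<EOF > /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_lc
ip_vs_wlc
ip_vs_rr
ip_vs_wrr
ip_vs_lblc
ip_vs_lblcr
ip_vs_dh
ip_vs_sh
ip_vs_fo
ip_vs_nq
ip_vs_sed
ip_vs_ftp
nf_conntrack
ip_tables
ip_set
xt_set
ipt_set
ipt_rpfilter
ipt_REJECT
ipip
EOF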
Enable the module-loading service on all nodes so the modules are loaded now and at every boot
[root@master-01 ~]# systemctl enable --now systemd-modules-load.service
Configure the kernel parameters required by K8s on all nodes and reload them immediately
[root@master-01 ~]# cat <<EOF > /etc/sysctl.d/k8s.conf
> net.ipv4.ip_forward = 1
> net.bridge.bridge-nf-call-iptables = 1
> net.bridge.bridge-nf-call-ip6tables = 1
> fs.may_detach_mounts = 1
> net.ipv4.conf.all.route_localnet = 1
> vm.overcommit_memory=1
> vm.panic_on_oom=0
> fs.inotify.max_user_watches=89100
> fs.file-max=52706963
> fs.nr_open=52706963
> net.netfilter.nf_conntrack_max=2310720
> net.ipv4.tcp_keepalive_time = 600
> net.ipv4.tcp_keepalive_probes = 3
> net.ipv4.tcp_keepalive_intvl =15
> net.ipv4.tcp_max_tw_buckets = 36000
> net.ipv4.tcp_tw_reuse = 1
> net.ipv4.tcp_max_orphans = 327680
> net.ipv4.tcp_orphan_retries = 3
> net.ipv4.tcp_syncookies = 1
> net.ipv4.tcp_max_syn_backlog = 16384
> net.ipv4.ip_conntrack_max = 65536
> net.ipv4.tcp_max_syn_backlog = 16384
> net.ipv4.tcp_timestamps = 0
> net.core.somaxconn = 16384
> EOF
[root@master-01 ~]# sysctl --system
* Applying /usr/lib/sysctl.d/00-system.conf ...
* Applying /usr/lib/sysctl.d/10-default-yama-scope.conf ...
After all nodes are configured, reboot them to make sure the kernel parameters take effect (in a virtualized environment, taking a snapshot first is recommended)
[root@master-01 ~]# reboot
[root@master-01 ~]# lsmod | grep --color=auto -e ip_vs -e nf_conntrack
ip_vs_ftp 16384 0
nf_nat 32768 1 ip_vs_ftp
ip_vs_sed 16384 0
ip_vs_nq 16384 0
ip_vs_fo 16384 0
...output omitted...
[root@master-01 ~]# uname -a
Linux master-01 4.19.12-1.el7.elrepo.x86_64 #1 SMP Fri Dec 21 11:06:36 EST 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@master-01 ~]# getenforce
Disabled
High-Availability Configuration
Install HAProxy and Keepalived on all master nodes in preparation for the high-availability setup
[root@master-01 ~]# yum -y install keepalived haproxy
Configure HAProxy on all master nodes: clear the existing contents of /etc/haproxy/haproxy.cfg and add the following (detailed parameter documentation is available online and is not covered here; a config-check sketch follows the configuration)
[root@master-01 ~]# vim /etc/haproxy/haproxy.cfg
global
maxconn 2000
ulimit-n 16384
log 127.0.0.1 local0 err
stats timeout 30s
defaults
log global
mode http
option httplog
timeout connect 5000
timeout client 50000
timeout server 50000
timeout http-request 15s
timeout http-keep-alive 15s
frontend monitor-in
bind *:33305
mode http
option httplog
monitor-uri /monitor
frontend k8s-master
bind 0.0.0.0:16443
bind 127.0.0.1:16443
mode tcp
option tcplog
tcp-request inspect-delay 5s
default_backend k8s-master
backend k8s-master
mode tcp
option tcplog
option tcp-check
balance roundrobin
default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
server master-01 192.168.132.169:6443 check
server master-02 192.168.132.170:6443 check
server master-03 192.168.132.171:6443 check
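Before starting the service it can help to validate the file syntax; HAProxy has a built-in configuration check mode (not part of the original steps):
haproxy -c -f /etc/haproxy/haproxy.cfg   # prints "Configuration file is valid" on success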
Configure Keepalived on all master nodes: clear the existing contents of /etc/keepalived/keepalived.conf and add the following; adjust the IPs and interface names of the three master nodes to match your environment
[root@master-01 ~]# vim /etc/keepalived/keepalived.conf
global_defs {
router_id LVS_DEVEL
script_user root
enable_script_security
}
vrrp_script chk_apiserver {
script "/etc/keepalived/check_apiserver.sh"
interval 5
weight -5
fall 2
rise 1
}
vrrp_instance VI_1 {
state MASTER
interface ens33
mcast_src_ip 192.168.132.169
virtual_router_id 51
priority 101
advert_int 2
authentication {
auth_type PASS
auth_pass K8SHA_KA_AUTH
}
virtual_ipaddress {
192.168.132.236
}
track_script {
chk_apiserver
}
}
[root@master-02 ~]# vim /etc/keepalived/keepalived.conf
global_defs {
router_id LVS_DEVEL
script_user root
enable_script_security
}
vrrp_script chk_apiserver {
script "/etc/keepalived/check_apiserver.sh"
interval 5
weight -5
fall 2
rise 1
}
vrrp_instance VI_1 {
state BACKUP
interface ens33
mcast_src_ip 192.168.132.170
virtual_router_id 51
priority 100
advert_int 2
authentication {
auth_type PASS
auth_pass K8SHA_KA_AUTH
}
virtual_ipaddress {
192.168.132.236
}
track_script {
chk_apiserver
}
}
[root@master-03 ~]# vim /etc/keepalived/keepalived.conf
global_defs {
router_id LVS_DEVEL
script_user root
enable_script_security
}
vrrp_script chk_apiserver {
script "/etc/keepalived/check_apiserver.sh"
interval 5
weight -5
fall 2
rise 1
}
vrrp_instance VI_1 {
state BACKUP
interface ens33
mcast_src_ip 192.168.132.171
virtual_router_id 51
priority 100
advert_int 2
authentication {
auth_type PASS
auth_pass K8SHA_KA_AUTH
}
virtual_ipaddress {
192.168.132.236
}
track_script {
chk_apiserver
}
}
Create the health-check script on all master nodes and make it executable
[root@master-01 ~]# vim /etc/keepalived/check_apiserver.sh
#!/bin/bash
err=0
for k in $(seq 1 3)
do
check_code=$(pgrep haproxy)
if [[ $check_code == "" ]]; then
err=$(expr $err + 1)
sleep 1
continue
else
err=0
break
fi
done
if [[ $err != "0" ]]; then
echo "systemctl stop keepalived"
/usr/bin/systemctl stop keepalived
exit 1
else
exit 0
fi
[root@master-01 ~]# chmod +x /etc/keepalived/check_apiserver.sh
Start HAProxy and Keepalived on all master nodes
[root@master-01 ~]# systemctl daemon-reload
[root@master-01 ~]# systemctl enable --now haproxy.service
Created symlink from /etc/systemd/system/multi-user.target.wants/haproxy.service to /usr/lib/systemd/system/haproxy.service.
[root@master-01 ~]# systemctl enable --now keepalived.service
Created symlink from /etc/systemd/system/multi-user.target.wants/keepalived.service to /usr/lib/systemd/system/keepalived.service.
Check on master-01 that the VIP is bound
[root@master-01 ~]# ip address show | grep ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
inet 192.168.132.169/24 brd 192.168.132.255 scope global ens33
inet 192.168.132.236/32 scope global ens33
Test connectivity to the VIP from all nodes with ping and telnet and make sure it works (see the sketch below). If there is a problem, check whether the firewall and SELinux are disabled, whether the HAProxy and Keepalived configuration files are correct and the services are running, whether the host IP addresses and network segments are right, and, when deploying in a cloud environment, whether the virtual network actually allows the VIP traffic
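A minimal sketch of that test, run from each node; the telnet check should accept the TCP connection because HAProxy is already listening on 16443 even though the apiserver does not exist yet (press Ctrl+] and then quit to exit telnet):
ping 192.168.132.236 -c 2
telnet 192.168.132.236 16443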
Installing the K8s Components and the Runtime
Add the Alibaba Cloud Docker repository on all nodes and refresh the cache (the procedure may change over time; see the official Alibaba Cloud documentation, https://developer.aliyun.com/mirror/docker-ce/?spm=a2c6h.25603864.0.0.5bef61d5dsLmOy)
[root@master-01 ~]# yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
[root@master-01 ~]# sed -i 's+download.docker.com+mirrors.aliyun.com/docker-ce+' /etc/yum.repos.d/docker-ce.repo
[root@master-01 ~]# yum makecache fast
Install Containerd on all nodes (installing the docker-ce 23.x packages pulls in containerd.io as a dependency)
[root@master-01 ~]# yum -y install docker-ce-23.* docker-ce-cli-23.*
Configure the kernel modules required by Containerd on all nodes and load them
[root@master-01 ~]# cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
> overlay
> br_netfilter
> EOF
overlay
br_netfilter
[root@master-01 ~]# modprobe -- overlay
[root@master-01 ~]# modprobe -- br_netfilter
[root@master-01 ~]# cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
> net.bridge.bridge-nf-call-iptables = 1
> net.ipv4.ip_forward = 1
> net.bridge.bridge-nf-call-ip6tables = 1
> EOF
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
[root@master-01 ~]# sysctl --system
* Applying /usr/lib/sysctl.d/00-system.conf ...
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0
* Applying /usr/lib/sysctl.d/10-default-yama-scope.conf ...
* Applying /usr/lib/sysctl.d/50-default.conf ...
...output omitted...
Generate the default Containerd configuration on all nodes, then set the SystemdCgroup field in /etc/containerd/config.toml to true (use the systemd cgroup driver)
[root@master-01 ~]# containerd config default | tee /etc/containerd/config.toml
disabled_plugins = []
imports = []
oom_score = 0
plugin_dir = ""
required_plugins = []
root = "/var/lib/containerd"
...output omitted...
[root@master-01 ~]# vim /etc/containerd/config.toml
(change SystemdCgroup = false to SystemdCgroup = true, then save)
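If you prefer not to edit the file interactively, the same change can be made with sed; a sketch assuming the default generated config, where SystemdCgroup is initially false:
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
grep SystemdCgroup /etc/containerd/config.toml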
[root@master-01 ~]# cat /etc/containerd/config.toml | grep SystemdCgroup -A 5 -B 5
...output omitted...
IoUid = 0
NoNewKeyring = false
NoPivotRoot = false
Root = ""
ShimCgroup = ""
SystemdCgroup = true
[plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime]
base_runtime_spec = ""
cni_conf_dir = ""
cni_max_conf_num = 0
...output omitted...
On all nodes, change the sandbox_image field in /etc/containerd/config.toml so the pause image is pulled from a reachable registry (a sed sketch follows the excerpt below)
[root@master-01 ~]# cat /etc/containerd/config.toml | grep -A 2 -B 2 sandbox_image
...output omitted...
netns_mounts_under_state_dir = false
restrict_oom_score_adj = false
sandbox_image = "registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6"
selinux_category_range = 1024
stats_collect_period = 10
...output omitted...
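The sandbox_image line can also be rewritten with sed. A sketch that points it at the Alibaba Cloud mirror; the excerpt above shows tag 3.6, while pause:3.9 is what the new.yaml configuration later pulls, so using 3.9 here keeps containerd consistent with kubeadm 1.30:
sed -i 's#sandbox_image = ".*"#sandbox_image = "registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.9"#' /etc/containerd/config.toml
grep sandbox_image /etc/containerd/config.toml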
Start the Containerd service on all nodes, enable it at boot, and check its status (some nodes may log CNI-related errors; ignore them, they will go away once the network plugin is installed)
[root@master-01 ~]# systemctl daemon-reload
[root@master-01 ~]# systemctl enable --now containerd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/containerd.service to /usr/lib/systemd/system/containerd.service.
[root@master-01 ~]# systemctl status containerd.service
● containerd.service - containerd container runtime
Loaded: loaded (/usr/lib/systemd/system/containerd.service; enabled; vendor preset: disabled)
Active: active (running) since Sun 2024-05-19 20:49:16 CST; 5s ago
Docs: https://containerd.io
Process: 12753 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
Main PID: 12757 (containerd)
Tasks: 9
Memory: 16.1M
CGroup: /system.slice/containerd.service
└─12757 /usr/bin/containerd
May 19 20:49:16 master-01 containerd[12757]: time="2024-05-19T20:49:16.958294487+08:00" level=info msg=serv...trpc
May 19 20:49:16 master-01 containerd[12757]: time="2024-05-19T20:49:16.958855971+08:00" level=info msg=serv...sock
May 19 20:49:16 master-01 containerd[12757]: time="2024-05-19T20:49:16.966456119+08:00" level=info msg="Sta...ent"
May 19 20:49:16 master-01 containerd[12757]: time="2024-05-19T20:49:16.966664799+08:00" level=info msg="Sta...ate"
May 19 20:49:16 master-01 containerd[12757]: time="2024-05-19T20:49:16.966778189+08:00" level=info msg="Sta...tor"
May 19 20:49:16 master-01 containerd[12757]: time="2024-05-19T20:49:16.966835873+08:00" level=info msg="Sta...cer"
May 19 20:49:16 master-01 containerd[12757]: time="2024-05-19T20:49:16.966859864+08:00" level=info msg="Sta...ult"
May 19 20:49:16 master-01 containerd[12757]: time="2024-05-19T20:49:16.966879131+08:00" level=info msg="Sta...ver"
May 19 20:49:16 master-01 containerd[12757]: time="2024-05-19T20:49:16.967603120+08:00" level=info msg="con...25s"
May 19 20:49:16 master-01 systemd[1]: Started containerd container runtime.
Hint: Some lines were ellipsized, use -l to show in full.
Configure the runtime endpoint used by the crictl client on all nodes
[root@master-01 ~]# cat > /etc/crictl.yaml <<EOF
> runtime-endpoint: unix:///run/containerd/containerd.sock
> image-endpoint: unix:///run/containerd/containerd.sock
> timeout: 10
> debug: false
> EOF
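A quick way to confirm crictl can now talk to containerd (not part of the original output):
crictl version      # should report containerd as the runtime name plus its version
crictl info | head  # dumps the runtime status and config as JSON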
List the available K8s versions on master-01
[root@master-01 ~]# yum list kubeadm.x86_64 --showduplicates | sort -r
Loading mirror speeds from cached hostfile
Loaded plugins: fastestmirror
kubeadm.x86_64 1.30.1-150500.1.1 kubernetes
kubeadm.x86_64 1.30.0-150500.1.1 kubernetes
Available Packages
Install the K8s components of the version shown above (kubeadm, kubelet, and kubectl) on all nodes
- kubeadm: initializes and configures the Kubernetes cluster
- kubelet: runs on every node and manages the containers on that node
- kubectl: command-line tool for interacting with the cluster and managing resources
[root@master-01 ~]# yum -y install kubeadm-1.30* kubelet-1.30* kubectl-1.30*
Enable kubelet at boot on all nodes (the kubelet service will currently show auto-restart because the cluster has not been initialized yet and kubelet has no configuration file; this can be ignored)
[root@master-01 ~]# systemctl daemon-reload
[root@master-01 ~]# systemctl enable --now kubelet.service
Created symlink from /etc/systemd/system/multi-user.target.wants/kubelet.service to /usr/lib/systemd/system/kubelet.service.
Cluster initialization: create the kubeadm-config.yaml file on master-01 with the following content (if this is not a highly available cluster, change 192.168.132.236:16443 to the address of master-01 and 16443 to the apiserver port, 6443 by default; also make sure kubernetesVersion matches the kubeadm version on your servers, which you can check with kubeadm version)
[root@master-01 ~]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"30", GitVersion:"v1.30.1", GitCommit:"6911225c3f747e1cd9d109c305436d08b668f086", GitTreeState:"clean", BuildDate:"2024-05-14T10:49:05Z", GoVersion:"go1.22.2", Compiler:"gc", Platform:"linux/amd64"}
[root@master-01 ~]# vim kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: 7t2weq.bjbawausm0jaxury
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.132.169
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  name: master-01
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
---
apiServer:
  certSANs:
  - 192.168.132.236
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: 192.168.132.236:16443
controllerManager: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.30.1 # change this version to match the output of kubeadm version
networking:
  dnsDomain: cluster.local
  podSubnet: 172.16.0.0/16
  serviceSubnet: 10.96.0.0/16
scheduler: {}
Migrate the kubeadm configuration to the current schema on master-01 and distribute the resulting new.yaml to the other master nodes
[root@master-01 ~]# kubeadm config migrate --old-config kubeadm-config.yaml --new-config new.yaml
[root@master-01 ~]# for i in {master-02,master-03}; do scp new.yaml root@$i:/root/ ; done
new.yaml 100% 994 459.1KB/s 00:00
new.yaml
Pull the images listed in the configuration file on all master nodes (the pull may fail for network reasons; the fix is to add an Alibaba Cloud DNS server, 223.5.5.5 or 223.6.6.6, to /etc/resolv.conf). Both the failed and the successful pulls are shown below for reference
[root@master-01 ~]# kubeadm config images pull --config /root/new.yaml
failed to pull image "registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.30.1": output: E0519 21:14:33.568256 14988 remote_image.go:180] "PullImage from image service failed" err="rpc error: code = Unknown desc = failed to pull and unpack image \"registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.30.1\": failed to copy: httpReadSeeker: failed open: failed to do request: Get \"https://registry.cn-hangzhou.aliyuncs.com/v2/google_containers/kube-apiserver/manifests/sha256:fd55381fb07b1fbef20b58d2ad814510f01e3a204118b6a5a4695275dca19677\": dial tcp: lookup registry.cn-hangzhou.aliyuncs.com on 192.168.132.2:53: no such host" image="registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.30.1"
time="2024-05-19T21:14:33+08:00" level=fatal msg="pulling image: failed to pull and unpack image \"registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.30.1\": failed to copy: httpReadSeeker: failed open: failed to do request: Get \"https://registry.cn-hangzhou.aliyuncs.com/v2/google_containers/kube-apiserver/manifests/sha256:fd55381fb07b1fbef20b58d2ad814510f01e3a204118b6a5a4695275dca19677\": dial tcp: lookup registry.cn-hangzhou.aliyuncs.com on 192.168.132.2:53: no such host"
, error: exit status 1
To see the stack trace of this error execute with --v=5 or higher
[root@master-01 ~]# vim /etc/resolv.conf
nameserver 223.5.5.5
nameserver 192.168.132.2
[root@master-01 ~]# kubeadm config images pull --config /root/new.yaml
[config/images] Pulled registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.30.1
[config/images] Pulled registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.30.1
[config/images] Pulled registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.30.1
[config/images] Pulled registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.30.1
[config/images] Pulled registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:v1.11.1
[config/images] Pulled registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.9
[config/images] Pulled registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.5.12-0
Initialize the cluster on master-01; this generates the certificates and configuration files under /etc/kubernetes, after which the other master nodes simply join master-01
[root@master-01 ~]# kubeadm init --config /root/new.yaml --upload-certs
[init] Using Kubernetes version: v1.30.1
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
...output omitted...
After master-01 initializes successfully, kubeadm prints the token that the other nodes use to join, so record the token from the output below
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of the control-plane node running the following command on each as root:
kubeadm join 192.168.132.236:16443 --token 7t2weq.bjbawausm0jaxury \
--discovery-token-ca-cert-hash sha256:4fb824ea4f1a1707b3db3591c32b34a5bfa914275a44e94a098cac2223c6732c \
--control-plane --certificate-key c11a6e45ea484364fa0e34e9e25c361493ac45d2e05c210bb69b160c65c4aafb
Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.132.236:16443 --token 7t2weq.bjbawausm0jaxury \
--discovery-token-ca-cert-hash sha256:4fb824ea4f1a1707b3db3591c32b34a5bfa914275a44e94a098cac2223c6732c
Configure the environment variable on master-01 used to access the K8s cluster
[root@master-01 ~]# cat <<EOF >> /root/.bashrc
> export KUBECONFIG=/etc/kubernetes/admin.conf
> EOF
[root@master-01 ~]# source /root/.bashrc
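Since bash-completion was installed earlier, kubectl shell completion can optionally be enabled as well; a convenience sketch, not required for the installation:
kubectl completion bash > /etc/bash_completion.d/kubectl
source /etc/bash_completion.d/kubectl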
Check the node status on master-01; at this point there is only one node and it is NotReady, because the remaining configuration has not been done yet, so this can be ignored
[root@master-01 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
master-01 NotReady control-plane 3m57s v1.30.1
Check the Pod status on master-01; because the cluster was installed with kubeadm, all system components run as containers in the kube-system namespace
[root@master-01 ~]# kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-7c445c467-l8tm9 0/1 Pending 0 8m9s
coredns-7c445c467-nhvf4 0/1 Pending 0 8m9s
etcd-master-01 1/1 Running 0 8m12s
kube-apiserver-master-01 1/1 Running 0 8m12s
kube-controller-manager-master-01 1/1 Running 0 8m12s
kube-proxy-6m9k7 1/1 Running 0 8m9s
kube-scheduler-master-01 1/1 Running 0 8m12s
If initialization fails, reset and then initialize again with the following command (do not run it if initialization succeeded)
[root@master-01 ~]# kubeadm reset -f ; ipvsadm --clear ; rm -rf ~/.kube
If initialization keeps failing, check the system log (/var/log/messages on CentOS, /var/log/syslog on Ubuntu). Troubleshooting hints: CNI errors are normal because no network plugin is installed yet; next check new.yaml, for example a non-HA cluster where port 16443 was not changed to 6443, or an IP conflict between the three network segments; finally check that the Containerd configuration is correct and that the VIP is bound, the log will contain the corresponding hints
[root@master-01 ~]# tail -f /var/log/messages | grep -v "not found"
Run the control-plane join command on master-02 and master-03
[root@master-02 ~]# kubeadm join 192.168.132.236:16443 --token 7t2weq.bjbawausm0jaxury \
> --discovery-token-ca-cert-hash sha256:4fb824ea4f1a1707b3db3591c32b34a5bfa914275a44e94a098cac2223c6732c \
> --control-plane --certificate-key c11a6e45ea484364fa0e34e9e25c361493ac45d2e05c210bb69b160c65c4aafb
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
...output omitted...
[root@master-03 ~]# kubeadm join 192.168.132.236:16443 --token 7t2weq.bjbawausm0jaxury \
> --discovery-token-ca-cert-hash sha256:4fb824ea4f1a1707b3db3591c32b34a5bfa914275a44e94a098cac2223c6732c \
> --control-plane --certificate-key c11a6e45ea484364fa0e34e9e25c361493ac45d2e05c210bb69b160c65c4aafb
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
...output omitted...
Check the current node status on master-01
[root@master-01 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
master-01 NotReady control-plane 17m v1.30.1
master-02 NotReady control-plane 3m9s v1.30.1
master-03 NotReady control-plane 93s v1.30.1
If the token has expired and more master nodes need to join the cluster later, regenerate it by running the following commands on master-01 (see the note below on combining their output)
[root@master-01 ~]# kubeadm token create --print-join-command
[root@master-01 ~]# kubeadm init phase upload-certs --upload-certs
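kubeadm token create --print-join-command prints a worker join command, while the upload-certs phase prints a new certificate key; for a new control-plane node the two are combined, roughly like this (the token, hash, and key are placeholders to be taken from the two outputs above):
kubeadm join 192.168.132.236:16443 --token <new-token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane --certificate-key <new-certificate-key>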
Join node-01 and node-02 to the cluster
[root@node-01 ~]# kubeadm join 192.168.132.236:16443 --token 7t2weq.bjbawausm0jaxury \
> --discovery-token-ca-cert-hash sha256:4fb824ea4f1a1707b3db3591c32b34a5bfa914275a44e94a098cac2223c6732c
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
...output omitted...
[root@node-02 ~]# kubeadm join 192.168.132.236:16443 --token 7t2weq.bjbawausm0jaxury \
> --discovery-token-ca-cert-hash sha256:4fb824ea4f1a1707b3db3591c32b34a5bfa914275a44e94a098cac2223c6732c
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
...output omitted...
Check the current node status on master-01
[root@master-01 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
master-01 NotReady control-plane 20m v1.30.1
master-02 NotReady control-plane 5m37s v1.30.1
master-03 NotReady control-plane 4m1s v1.30.1
node-01 NotReady <none> 30s v1.30.1
node-02 NotReady <none> 26s v1.30.1
Network Plugin Configuration
On master-01, change into the repository and switch branches in preparation for installing the Calico plugin
[root@master-01 ~]# cd k8s-ha-install/
[root@master-01 k8s-ha-install]# git checkout manual-installation-v1.30.x
Branch manual-installation-v1.30.x set up to track remote branch manual-installation-v1.30.x from origin.
Switched to a new branch 'manual-installation-v1.30.x'
[root@master-01 k8s-ha-install]# cd calico/
[root@master-01 calico]#
Get the Pod network segment on master-01
[root@master-01 calico]# POD_SUBNET=`cat /etc/kubernetes/manifests/kube-controller-manager.yaml | grep cluster-cidr= | awk -F= '{print $NF}'`
You have new mail in /var/spool/mail/root
[root@master-01 calico]# echo $POD_SUBNET
172.16.0.0/16
Substitute the Pod network segment into calico.yaml on master-01
[root@master-01 calico]# sed -i "s#POD_CIDR#${POD_SUBNET}#g" calico.yaml
Deploy the Calico plugin from master-01
[root@master-01 calico]# kubectl apply -f calico.yaml
poddisruptionbudget.policy/calico-kube-controllers created
serviceaccount/calico-kube-controllers created
serviceaccount/calico-node created
serviceaccount/calico-cni-plugin created
configmap/calico-config created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgpfilters.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/caliconodestatuses.crd.projectcalico.org created
...output omitted...
At this point the author hit a problem: the worker nodes would not become ready because the Alibaba Cloud DNS server had not been added on them, so the images could not be pulled; after adding the nameserver entry the error went away
[root@master-01 calico]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master-01 Ready control-plane 44m v1.30.1
master-02 Ready control-plane 29m v1.30.1
master-03 Ready control-plane 28m v1.30.1
node-01 Ready <none> 24m v1.30.1
node-02 NotReady <none> 24m v1.30.1
[root@master-01 calico]# kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-744d658cd-lp4l8 1/1 Running 0 5m22s
calico-node-dzz85 1/1 Running 0 5m22s
calico-node-nbckj 1/1 Running 0 5m22s
calico-node-vt9n7 0/1 Init:0/3 0 5m22s
calico-node-wjfgz 0/1 Init:0/3 0 5m22s
calico-node-x2dkc 1/1 Running 0 5m22s
coredns-7c445c467-l8tm9 1/1 Running 0 32m
coredns-7c445c467-nhvf4 1/1 Running 0 32m
etcd-master-01 1/1 Running 0 32m
etcd-master-02 1/1 Running 0 18m
etcd-master-03 1/1 Running 0 16m
kube-apiserver-master-01 1/1 Running 0 32m
kube-apiserver-master-02 1/1 Running 0 18m
kube-apiserver-master-03 1/1 Running 0 16m
kube-controller-manager-master-01 1/1 Running 0 32m
kube-controller-manager-master-02 1/1 Running 0 18m
kube-controller-manager-master-03 1/1 Running 0 16m
kube-proxy-6m9k7 1/1 Running 0 32m
kube-proxy-k8hqh 0/1 ContainerCreating 0 13m
kube-proxy-pkzt2 1/1 Running 0 16m
kube-proxy-ptp8j 0/1 ContainerCreating 0 13m
kube-proxy-w6qvx 1/1 Running 0 18m
kube-scheduler-master-01 1/1 Running 0 32m
kube-scheduler-master-02 1/1 Running 0 18m
kube-scheduler-master-03 1/1 Running 0 16m
Check the container and node status on master-01; all nodes are now Ready
[root@master-01 calico]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-744d658cd-lp4l8 1/1 Running 0 22m
calico-node-dzz85 1/1 Running 0 22m
calico-node-nbckj 1/1 Running 0 22m
calico-node-vt9n7 1/1 Running 0 22m
calico-node-wjfgz 1/1 Running 0 22m
calico-node-x2dkc 1/1 Running 0 22m
coredns-7c445c467-l8tm9 1/1 Running 0 50m
coredns-7c445c467-nhvf4 1/1 Running 0 50m
etcd-master-01 1/1 Running 0 50m
etcd-master-02 1/1 Running 0 35m
etcd-master-03 1/1 Running 0 34m
kube-apiserver-master-01 1/1 Running 0 50m
kube-apiserver-master-02 1/1 Running 0 35m
kube-apiserver-master-03 1/1 Running 0 34m
kube-controller-manager-master-01 1/1 Running 0 50m
kube-controller-manager-master-02 1/1 Running 0 35m
kube-controller-manager-master-03 1/1 Running 0 34m
kube-proxy-6m9k7 1/1 Running 0 50m
kube-proxy-k8hqh 1/1 Running 0 30m
kube-proxy-pkzt2 1/1 Running 0 34m
kube-proxy-ptp8j 1/1 Running 0 30m
kube-proxy-w6qvx 1/1 Running 0 35m
kube-scheduler-master-01 1/1 Running 0 50m
kube-scheduler-master-02 1/1 Running 0 35m
kube-scheduler-master-03 1/1 Running 0 34m
[root@master-01 calico]# kubectl get node
NAME STATUS ROLES AGE VERSION
master-01 Ready control-plane 50m v1.30.1
master-02 Ready control-plane 36m v1.30.1
master-03 Ready control-plane 34m v1.30.1
node-01 Ready <none> 30m v1.30.1
node-02 Ready <none> 30m v1.30.1
Parameter Tuning
On master-01, change the kube-proxy mode to ipvs by setting the mode: "ipvs" field
[root@master-01 ~]# kubectl edit configmap kube-proxy -n kube-system
configmap/kube-proxy edited
On master-01, trigger a rolling update of the kube-proxy Pods by patching in a timestamp annotation
[root@master-01 ~]# kubectl patch daemonset kube-proxy -p "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"date\":\"`date +'%s'`\"}}}}}" -n kube-system
daemonset.apps/kube-proxy patched
Verify the kube-proxy mode on master-01
[root@master-01 ~]# curl 127.0.0.1:10249/proxyMode
ipvs
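The IPVS rules that kube-proxy programmed can also be inspected directly with the ipvsadm tool installed earlier (a quick sketch; the virtual servers should include the 10.96.0.0/16 Service addresses):
ipvsadm -Ln | head -n 20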
Metrics Collection Plugin Configuration
Distribute the certificate file from master-01 to all worker nodes
[root@master-01 ~]# for i in {node-01,node-02}; do scp /etc/kubernetes/pki/front-proxy-ca.crt root@$i:/etc/kubernetes/pki/front-proxy-ca.crt ;done
front-proxy-ca.crt 100% 1123 724.0KB/s 00:00
front-proxy-ca.crt 100% 1123 350.8KB/s 00:00
Create the metrics server from master-01. Here the author again hit an image-pull failure caused by the network; if that happens, change the image address in comp.yaml. Two addresses are given below to choose from; if the image still cannot be pulled, the current network itself is the problem and you can try pulling over a different network (for example a phone hotspot)
registry.cn-hangzhou.aliyuncs.com/rainux/metrics-server:v0.7.0
registry.cn-beijing.aliyuncs.com/dotbalo/metrics-server:v0.7.0
[root@master-01 ~]# cd k8s-ha-install/kubeadm-metrics-server/
[root@master-01 kubeadm-metrics-server]# kubectl create -f comp.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
Error excerpt
[root@master-01 kubeadm-metrics-server]# kubectl get pods -n kube-system
metrics-server-555f6755d5-hjp4f 0/1 ImagePullBackOff 0 5m52s
[root@master-01 kubeadm-metrics-server]# kubectl describe pod metrics-server-555f6755d5-hjp4f -n kube-system
...output omitted...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m21s default-scheduler Successfully assigned kube-system/metrics-server-555f6755d5-hjp4f to node-02
Warning Failed 7m20s kubelet Failed to pull image "registry.cn-beijing.aliyuncs.com/dotbalo/metrics-server:v0.7.0": failed to pull and unpack image "registry.cn-beijing.aliyuncs.com/dotbalo/metrics-server:v0.7.0": failed to copy: httpReadSeeker: failed open: failed to do request: Get "https://registry.cn-beijing.aliyuncs.com/v2/dotbalo/metrics-server/manifests/sha256:36bee91c79117a845bf5cbe51eae65fbccd647474e049b50e96602caec745342": dial tcp: lookup registry.cn-beijing.aliyuncs.com on 192.168.132.2:53: no such host
Normal Pulling 5m55s (x4 over 7m20s) kubelet Pulling image "registry.cn-beijing.aliyuncs.com/dotbalo/metrics-server:v0.7.0"
Warning Failed 5m55s (x4 over 7m20s) kubelet Error: ErrImagePull
...output omitted...
Check the metrics-server container status on master-01 (healthy)
[root@master-01 kubeadm-metrics-server]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
...output omitted...
metrics-server-555f6755d5-xcqmg 1/1 Running 0 49s
View cluster resource usage on master-01
[root@master-01 kubeadm-metrics-server]# cd
[root@master-01 ~]# kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
master-01 210m 10% 1074Mi 57%
master-02 273m 13% 994Mi 53%
master-03 273m 13% 1190Mi 63%
node-01 77m 3% 566Mi 30%
node-02 75m 3% 528Mi 28%
[root@master-01 ~]# kubectl top pod -A
NAMESPACE NAME CPU(cores) MEMORY(bytes)
kube-system calico-kube-controllers-744d658cd-lp4l8 2m 35Mi
kube-system calico-node-dzz85 26m 149Mi
kube-system calico-node-nbckj 31m 154Mi
kube-system calico-node-vt9n7 29m 162Mi
kube-system calico-node-wjfgz 30m 160Mi
kube-system calico-node-x2dkc 28m 142Mi
kube-system coredns-7c445c467-l8tm9 2m 17Mi
kube-system coredns-7c445c467-nhvf4 2m 41Mi
kube-system etcd-master-01 41m 107Mi
kube-system etcd-master-02 51m 102Mi
kube-system etcd-master-03 37m 108Mi
kube-system kube-apiserver-master-01 42m 254Mi
kube-system kube-apiserver-master-02 49m 248Mi
kube-system kube-apiserver-master-03 54m 275Mi
kube-system kube-controller-manager-master-01 2m 41Mi
kube-system kube-controller-manager-master-02 16m 80Mi
kube-system kube-controller-manager-master-03 2m 20Mi
kube-system kube-proxy-cx22c 13m 20Mi
kube-system kube-proxy-nsbft 6m 20Mi
kube-system kube-proxy-rg857 1m 20Mi
kube-system kube-proxy-sg68h 12m 22Mi
kube-system kube-proxy-tgbq6 1m 20Mi
kube-system kube-scheduler-master-01 3m 34Mi
kube-system kube-scheduler-master-02 2m 16Mi
kube-system kube-scheduler-master-03 3m 48Mi
kube-system metrics-server-555f6755d5-xcqmg 3m 24Mi
You have new mail in /var/spool/mail/root
Dashboard Plugin Configuration
Create the dashboard from master-01
[root@master-01 ~]# cd /root/k8s-ha-install/dashboard/
[root@master-01 dashboard]# ls
dashboard-user.yaml dashboard.yaml
[root@master-01 dashboard]# kubectl create -f .
serviceaccount/admin-user created
clusterrolebinding.rbac.authorization.k8s.io/admin-user created
namespace/kubernetes-dashboard created
serviceaccount/kubernetes-dashboard created
service/kubernetes-dashboard created
secret/kubernetes-dashboard-certs created
secret/kubernetes-dashboard-csrf created
secret/kubernetes-dashboard-key-holder created
configmap/kubernetes-dashboard-settings created
role.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrole.rbac.authorization.k8s.io/kubernetes-dashboard created
rolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
deployment.apps/kubernetes-dashboard created
service/dashboard-metrics-scraper created
deployment.apps/dashboard-metrics-scraper created
Check on master-01 that the containers and the other service endpoints were created properly
[root@master-01 dashboard]# kubectl get pod -n kubernetes-dashboard
NAME READY STATUS RESTARTS AGE
dashboard-metrics-scraper-56f84bbdbc-hw6tb 1/1 Running 0 75s
kubernetes-dashboard-5fb6564877-7zqs7 1/1 Running 0 75s
Check the dashboard service port on master-01
[root@master-01 dashboard]# kubectl get svc -n kubernetes-dashboard
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
dashboard-metrics-scraper ClusterIP 10.96.32.58 <none> 8000/TCP 81s
kubernetes-dashboard NodePort 10.96.135.153 <none> 443:30826/TCP 83s
The Dashboard UI can be reached via the IP of any host running kube-proxy plus the NodePort (see the example below)
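For example, with the NodePort shown above the URL would be the following (the browser will warn about the self-signed certificate; proceed anyway):
https://192.168.132.169:30826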
Create a Dashboard login token on master-01
[root@master-01 dashboard]# cd
[root@master-01 ~]# kubectl create token admin-user -n kube-system
eyJhbGciOiJSUzI1NiIsImtpZCI6IlBGVDNfLXY4ZmIwWVVlVU44THdWNm1ELUhrSGVlLWJ0VkVXWUQ2N1JIcUEifQ.eyJhdWQiOlsiaHR0cHM6Ly9rdWJlcm5ldGVzLmRlZmF1bHQuc3ZjLmNsdXN0ZXIubG9jYWwiXSwiZXhwIjoxNzE2MjE2NjcxLCJpYXQiOjE3MTYyMTMwNzEsImlzcyI6Imh0dHBzOi8va3ViZXJuZXRlcy5kZWZhdWx0LnN2Yy5jbHVzdGVyLmxvY2FsIiwianRpIjoiZGQ4ZGQ3YzAtZmRiZS00NGVhLThkMGMtMzllMjJkYTE4ODA2Iiwia3ViZXJuZXRlcy5pbyI6eyJuYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsInNlcnZpY2VhY2NvdW50Ijp7Im5hbWUiOiJhZG1pbi11c2VyIiwidWlkIjoiY2UwYzBmNWEtOTYxZC00Yjc5LTgwYWMtZDRiY2U1NmMwODczIn19LCJuYmYiOjE3MTYyMTMwNzEsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlLXN5c3RlbTphZG1pbi11c2VyIn0.mCs_euw660IguyeD4IebVwqPxYh0iWppODUD-5NfjO0iZ1vXPxytTWQNvn2QQZRfmv_VrTxchNcdHzHlWfxsx3TQtpC3Xlg-sMHeMDX87CaiJRayiNPC78SYSIngPaLbg22o6BTtM19GXeSAqMYZdA36zYxgtezcAiWZq_hI8YvYfpKI16UL9tyjWR8xvdwx0wQeGEjBFTdOeTvtZ616fZhmpTPshWrvvKyM7up9R8nlSTGF1oLAeXxyCaYFudZWRt0s5GPO1njKaJ2GBPg1sutDc7_aV7znf-_j6L43NaGiUJaUBsW34Zu3zQn4LBDhDzxNOUDoGOSiVcqInHMQ0A
Paste the token into the browser to log in to the Dashboard
Because the native UI is not very friendly for creating some resources, another UI developed by the repository author (KRM) is introduced here
On master-01, create the Namespace for KRM in the cluster and grant the permissions. Note: the steps below install KRM into the krm namespace; if you want a different Namespace, change every reference to the Namespace in the steps below accordingly, but keeping the default is recommended
[root@master-01 ~]# cd k8s-ha-install/
[root@master-01 k8s-ha-install]# kubectl create -f krm.yaml
namespace/krm created
serviceaccount/krm-backend created
rolebinding.rbac.authorization.k8s.io/krm-backend created
clusterrole.rbac.authorization.k8s.io/namespace-creater created
clusterrolebinding.rbac.authorization.k8s.io/krm-backend-ns-creater created
service/krm-backend created
deployment.apps/krm-backend created
service/krm-frontend created
deployment.apps/krm-frontend created
Check the port exposed by KRM on master-01
[root@master-01 ~]# kubectl get svc -n krm
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
krm-backend ClusterIP 10.96.212.82 <none> 8080/TCP 2m56s
krm-frontend NodePort 10.96.111.190 <none> 80:32549/TCP 2m5s
KRM can be accessed via the IP:NodePort of any Kubernetes worker node; the default username and password are admin / admin
In the cluster-management page, click the add button to add a cluster; the kubeconfig can be read with the following command
[root@master-01 ~]# cat /etc/kubernetes/admin.conf
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURCVENDQWUyZ0F3SUJBZ0lJSkJNSUxSVFI4MUl3RFFZSktvWklodmNOQVFFTEJRQXdGVEVUTUJFR0ExVUUKQXhNS2EzVmlaWEp1WlhSbGN6QWVGdzB5TkRBMU1Ua3hNekUyTWpoYUZ3MHpOREExTVRjeE16SXhNamhhTUJVeApFekFSQmdOVkJBTVRDbXQxWW1WeWJtVjBaWE13Z2dFaU1BMEdDU3FHU0liM0RRRUJBUVVBQTRJQkR3QXdnZ0VLCkFvSUJBUUM3dHdKRGRDZ2d4WjVSVUxBVUViMTA4bXNEVU5sTFZldGQySnpXOExjMW1uUmdLSWpIRDVwMDRlNi8KdzRPK3diQnkyZVdPZGFFR1RncEhJNHprMWRPV0NxdDR3L1FVaTM5L0hZUnRobkd6aDFrNXEwa2owVXcyek4wTgpzUVRvMCs3YkxNQ0c5aURaeVB6aiszVzBraFZ5SW9ROEVjZFQrWW1vMlJLOVRkMG52MERXMVAvQ1VIRzNwN3IwClJUSHpIeHZRZ1lCMk5RSUdYcGVoT
...output omitted...
The remaining features are left for the reader to explore
Cluster Availability Verification
All nodes are healthy
[root@master-01 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master-01 Ready control-plane 24h v1.30.1
master-02 Ready control-plane 24h v1.30.1
master-03 Ready control-plane 24h v1.30.1
node-01 Ready <none> 24h v1.30.1
node-02 Ready <none> 24h v1.30.1
All Pods are healthy
[root@master-01 ~]# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
krm krm-backend-56c547fc7d-jf5bm 1/1 Running 0 21m
krm krm-frontend-79697dc9dc-8hdgk 1/1 Running 0 21m
kube-system calico-kube-controllers-744d658cd-lp4l8 1/1 Running 2 (108m ago) 24h
kube-system calico-node-dzz85 1/1 Running 2 (108m ago) 24h
kube-system calico-node-nbckj 1/1 Running 2 (108m ago) 24h
kube-system calico-node-vt9n7 1/1 Running 2 (108m ago) 24h
kube-system calico-node-wjfgz 1/1 Running 2 (108m ago) 24h
kube-system calico-node-x2dkc 1/1 Running 2 (108m ago) 24h
kube-system coredns-7c445c467-l8tm9 1/1 Running 2 (108m ago) 24h
kube-system coredns-7c445c467-nhvf4 1/1 Running 2 (108m ago) 24h
kube-system etcd-master-01 1/1 Running 2 (108m ago) 24h
kube-system etcd-master-02 1/1 Running 2 (108m ago) 24h
kube-system etcd-master-03 1/1 Running 2 (108m ago) 24h
kube-system kube-apiserver-master-01 1/1 Running 2 (108m ago) 24h
kube-system kube-apiserver-master-02 1/1 Running 2 (108m ago) 24h
kube-system kube-apiserver-master-03 1/1 Running 2 (108m ago) 24h
kube-system kube-controller-manager-master-01 1/1 Running 2 (108m ago) 24h
kube-system kube-controller-manager-master-02 1/1 Running 3 (33m ago) 24h
kube-system kube-controller-manager-master-03 1/1 Running 3 (49m ago) 24h
kube-system kube-proxy-cx22c 1/1 Running 0 97m
kube-system kube-proxy-nsbft 1/1 Running 0 97m
kube-system kube-proxy-rg857 1/1 Running 0 97m
kube-system kube-proxy-sg68h 1/1 Running 0 97m
kube-system kube-proxy-tgbq6 1/1 Running 0 97m
kube-system kube-scheduler-master-01 1/1 Running 2 (108m ago) 24h
kube-system kube-scheduler-master-02 1/1 Running 3 (48m ago) 24h
kube-system kube-scheduler-master-03 1/1 Running 2 (108m ago) 24h
kube-system metrics-server-555f6755d5-xcqmg 1/1 Running 0 47m
kubernetes-dashboard dashboard-metrics-scraper-56f84bbdbc-hw6tb 1/1 Running 0 32m
kubernetes-dashboard kubernetes-dashboard-5fb6564877-7zqs7 1/1 Running 0 32m
The cluster network segments do not conflict
[root@master-01 ~]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 24h
[root@master-01 ~]# kubectl get pods -A -owide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
krm krm-backend-56c547fc7d-jf5bm 1/1 Running 0 22m 172.16.184.7 node-02 <none> <none>
krm krm-frontend-79697dc9dc-8hdgk 1/1 Running 0 21m 172.16.190.8 node-01 <none> <none>
kube-system calico-kube-controllers-744d658cd-lp4l8 1/1 Running 2 (109m ago) 24h 172.16.133.137 master-03 <none> <none>
kube-system calico-node-dzz85 1/1 Running 2 (109m ago) 24h 192.168.132.169 master-01 <none> <none>
kube-system calico-node-nbckj 1/1 Running 2 (109m ago) 24h 192.168.132.171 master-03 <none> <none>
kube-system calico-node-vt9n7 1/1 Running 2 (109m ago) 24h 192.168.132.173 node-02 <none> <none>
kube-system calico-node-wjfgz 1/1 Running 2 (109m ago) 24h 192.168.132.172 node-01 <none> <none>
kube-system calico-node-x2dkc 1/1 Running 2 (109m ago) 24h 192.168.132.170 master-02 <none> <none>
kube-system coredns-7c445c467-l8tm9 1/1 Running 2 (109m ago) 24h 172.16.133.136 master-03 <none> <none>
kube-system coredns-7c445c467-nhvf4 1/1 Running 2 (109m ago) 24h 172.16.133.135 master-03 <none> <none>
kube-system etcd-master-01 1/1 Running 2 (109m ago) 24h 192.168.132.169 master-01 <none> <none>
kube-system etcd-master-02 1/1 Running 2 (109m ago) 24h 192.168.132.170 master-02 <none> <none>
kube-system etcd-master-03 1/1 Running 2 (109m ago) 24h 192.168.132.171 master-03 <none> <none>
kube-system kube-apiserver-master-01 1/1 Running 2 (109m ago) 24h 192.168.132.169 master-01 <none> <none>
kube-system kube-apiserver-master-02 1/1 Running 2 (109m ago) 24h 192.168.132.170 master-02 <none> <none>
kube-system kube-apiserver-master-03 1/1 Running 2 (109m ago) 24h 192.168.132.171 master-03 <none> <none>
kube-system kube-controller-manager-master-01 1/1 Running 2 (109m ago) 24h 192.168.132.169 master-01 <none> <none>
kube-system kube-controller-manager-master-02 1/1 Running 3 (33m ago) 24h 192.168.132.170 master-02 <none> <none>
kube-system kube-controller-manager-master-03 1/1 Running 3 (49m ago) 24h 192.168.132.171 master-03 <none> <none>
kube-system kube-proxy-cx22c 1/1 Running 0 97m 192.168.132.173 node-02 <none> <none>
kube-system kube-proxy-nsbft 1/1 Running 0 97m 192.168.132.170 master-02 <none> <none>
kube-system kube-proxy-rg857 1/1 Running 0 98m 192.168.132.171 master-03 <none> <none>
kube-system kube-proxy-sg68h 1/1 Running 0 97m 192.168.132.172 node-01 <none> <none>
kube-system kube-proxy-tgbq6 1/1 Running 0 97m 192.168.132.169 master-01 <none> <none>
kube-system kube-scheduler-master-01 1/1 Running 2 (109m ago) 24h 192.168.132.169 master-01 <none> <none>
kube-system kube-scheduler-master-02 1/1 Running 3 (49m ago) 24h 192.168.132.170 master-02 <none> <none>
kube-system kube-scheduler-master-03 1/1 Running 2 (109m ago) 24h 192.168.132.171 master-03 <none> <none>
kube-system metrics-server-555f6755d5-xcqmg 1/1 Running 0 48m 172.16.190.5 node-01 <none> <none>
kubernetes-dashboard dashboard-metrics-scraper-56f84bbdbc-hw6tb 1/1 Running 0 33m 172.16.190.7 node-01 <none> <none>
kubernetes-dashboard kubernetes-dashboard-5fb6564877-7zqs7 1/1 Running 0 33m 172.16.184.6 node-02 <none> <none>
The cluster can create resources normally
[root@master-01 ~]# kubectl create deploy cluster-test --image=registry.cn-beijing.aliyuncs.com/dotbalo/debug-tools -- sleep 3600
deployment.apps/cluster-test created
[root@master-01 ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
cluster-test-665f554bcc-rsdkm 1/1 Running 0 4m27s
Pods can resolve Services, both in the same namespace and across namespaces
[root@master-01 ~]# kubectl exec -it cluster-test-665f554bcc-rsdkm bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
(14:27 cluster-test-665f554bcc-rsdkm:/) nslookup kubernetes
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
(14:27 cluster-test-665f554bcc-rsdkm:/) nslookup kubernetes-dns.kube-system
Server: 10.96.0.10
Address: 10.96.0.10#53
** server can't find kubernetes-dns.kube-system: NXDOMAIN
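The NXDOMAIN above is expected: the queried name contains a typo. The cluster DNS Service deployed by kubeadm is named kube-dns in the kube-system namespace, so the cross-namespace lookup should be run (inside the same test Pod) as:
nslookup kube-dns.kube-system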
Every node must be able to reach the kubernetes Service on port 443 and the kube-dns Service on port 53; the error on port 443 is only an authorization error (anonymous user) and can be ignored
[root@master-01 ~]# curl 10.96.0.10:53
curl: (52) Empty reply from server
[root@master-01 ~]# curl -k https://10.96.0.1:443
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
"reason": "Forbidden",
"details": {},
"code": 403
Pods can communicate with each other normally, both in the same namespace and across namespaces; pick two containers in different namespaces and test connectivity
[root@master-01 ~]# kubectl get pods -n default -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cluster-test-665f554bcc-rsdkm 1/1 Running 0 21m 172.16.184.8 node-02 <none> <none>
[root@master-01 ~]# kubectl get pods -n kube-system -owide | grep calico
calico-kube-controllers-744d658cd-lp4l8 1/1 Running 2 (133m ago) 24h 172.16.133.137 master-03 <none> <none>
calico-node-dzz85 1/1 Running 2 (133m ago) 24h 192.168.132.169 master-01 <none> <none>
calico-node-nbckj 1/1 Running 2 (133m ago) 24h 192.168.132.171 master-03 <none> <none>
calico-node-vt9n7 1/1 Running 2 (133m ago) 24h 192.168.132.173 node-02 <none> <none>
calico-node-wjfgz 1/1 Running 2 (133m ago) 24h 192.168.132.172 node-01 <none> <none>
calico-node-x2dkc 1/1 Running 2 (133m ago) 24h 192.168.132.170 master-02 <none> <none>
[root@master-01 ~]# kubectl exec -it cluster-test-665f554bcc-rsdkm -- ping 172.16.133.137 -c 2
PING 172.16.133.137 (172.16.133.137) 56(84) bytes of data.
64 bytes from 172.16.133.137: icmp_seq=1 ttl=62 time=1.56 ms
64 bytes from 172.16.133.137: icmp_seq=2 ttl=62 time=1.49 ms
--- 172.16.133.137 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 1.491/1.528/1.566/0.054 ms
Pods can communicate with each other whether they are on the same host or on different hosts; look up the address and host of the newly created Pod and access it from a different host to confirm connectivity
[root@master-01 ~]# kubectl get pods -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cluster-test-665f554bcc-rsdkm 1/1 Running 0 17m 172.16.184.8 node-02 <none> <none>
[root@master-01 ~]# ping 172.16.184.8 -c 2
PING 172.16.184.8 (172.16.184.8) 56(84) bytes of data.
64 bytes from 172.16.184.8: icmp_seq=1 ttl=63 time=0.456 ms
64 bytes from 172.16.184.8: icmp_seq=2 ttl=63 time=0.397 ms
--- 172.16.184.8 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1058ms
rtt min/avg/max/mdev = 0.397/0.426/0.456/0.036 ms
You have new mail in /var/spool/mail/root
Comments
Q: Hi, I added the nameserver but kubeadm config images pull --config /root/new.yaml still fails for me:
[root@master-01 ~]# kubeadm config images pull --config /root/new.yaml
failed to pull image "registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.30.5": output: time="2024-10-14T18:38:52+08:00" level=fatal msg="validate service connection: validate CRI v1 image API for endpoint \"unix:///var/run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.ImageService"
, error: exit status 1
To see the stack trace of this error execute with --v=5 or higher
A: That looks like the container runtime endpoint is not set up properly (the CRI image service is reported as unimplemented). Check the containerd configuration, then reboot the VM and run the command again.