K8s Installation Notes (Ubuntu 16.04) - Manual with kubeadm


These are my notes on installing Kubernetes, focused on kubeadm.

Four ways to install and manage a K8s cluster: manual (kubeadm), semi-automated (kops), automated (EKS), fully automated (GKE).


Prepare the Base Image

This base image is used for installing both the Master and Worker Nodes.

  • Virtual environment:
    • On VMware, set the guest OS network to Bridge Mode
    • On Proxmox, likewise set the guest OS network to Bridge Mode
  • Install Ubuntu 16.04 (ubuntu-16.04.6-server-amd64.iso)
    • Disable swap: swapoff -a
    • Comment out the swap entry in /etc/fstab (a shell sketch follows this list)
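A minimal sketch of the two swap steps above (the sed pattern simply comments out any fstab line containing a swap mount; adjust it if your fstab differs):

# turn swap off now, and keep it off after reboot
swapoff -a
sed -i '/\sswap\s/ s/^/#/' /etc/fstab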

Install the CRI: Docker CE

Kubernetes talks to container runtimes through the CRI (Container Runtime Interface), and runtimes such as containerd have recently graduated from CNCF, so there are choices besides Docker. These notes stick with Docker for now.

The following is based on: https://kubernetes.io/docs/setup/cri/

Well-known CRI options besides Docker include CRI-O, containerd, rkt, and Kata Containers.
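As an aside, if you ever switch to another runtime such as containerd, kubeadm can be pointed at its CRI socket. This is only a sketch; the socket path below is the usual containerd default, not something used later in these notes:

# example only: initialize against containerd instead of Docker
kubeadm init --cri-socket /run/containerd/containerd.sock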

# Install Docker CE
## Set up the repository:
### Update the apt package index
apt-get update

### Install packages to allow apt to use a repository over HTTPS
apt-get update && apt-get install apt-transport-https ca-certificates curl software-properties-common

### Add Docker’s official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -

### Add docker apt repository.
add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"

## Install docker ce.
apt-get update && apt-get install docker-ce=18.06.2~ce~3-0~ubuntu

# Setup daemon.
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF

mkdir -p /etc/systemd/system/docker.service.d

# Restart docker.
systemctl daemon-reload
systemctl restart docker

Check the installed version:

~# docker version
Client:
Version: 18.06.2-ce
API version: 1.38
Go version: go1.10.3
Git commit: 6d37f41
Built: Sun Feb 10 03:48:06 2019
OS/Arch: linux/amd64
Experimental: false

Server:
Engine:
Version: 18.06.2-ce
API version: 1.38 (minimum version 1.12)
Go version: go1.10.3
Git commit: 6d37f41
Built: Sun Feb 10 03:46:30 2019
OS/Arch: linux/amd64
Experimental: false

Install kubeadm, kubectl, kubelet

This mainly follows the official documentation; however, the latest kubeadm did not initialize successfully for me, so these notes use 1.11.3 as the example.

Set up the repository:

apt-get update && apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
apt-get update

Pin the kubeadm version:

apt-get install -y kubelet=1.11.3-00
apt-get install -y kubectl=1.11.3-00
apt-get install -y kubeadm=1.11.3-00
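To keep apt from upgrading these pinned packages later, they can also be put on hold (the same apt-mark hold used for the latest-version install below):

apt-mark hold kubelet kubectl kubeadm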

Installing the latest kubeadm would look like this, but it did not work in this lab:

apt-get install -y kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl

Verify the installed versions of kubeadm / kubectl.

With the pinned version installed:

~# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.3", GitCommit:"a4529464e4629c21224b3d52edfe0ea91b072862", GitTreeState:"clean", BuildDate:"2018-09-09T17:59:42Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

~# kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.3", GitCommit:"a4529464e4629c21224b3d52edfe0ea91b072862", GitTreeState:"clean", BuildDate:"2018-09-09T18:02:47Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

And with the latest version:

~# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-02-28T13:35:32Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

~# kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-02-28T13:37:52Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

Install the Kubernetes Cluster

  • Install the Master Node
  • Join the Worker Nodes

Install the Master Node

Rename the machine (a shell sketch follows the steps below):

  1. /etc/hostname: k8s-master01-u1604
  2. /etc/hosts: k8s-master01-u1604
  3. reboot
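A sketch of the same rename from the shell (hostnamectl is standard on Ubuntu 16.04; the sed line assumes the default 127.0.1.1 entry in /etc/hosts):

hostnamectl set-hostname k8s-master01-u1604
sed -i 's/^127\.0\.1\.1.*/127.0.1.1 k8s-master01-u1604/' /etc/hosts
reboot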

Initialize kubeadm

Create the kubeadm configuration file:

kubeadm.yaml
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
controllerManagerExtraArgs:
  horizontal-pod-autoscaler-use-rest-clients: "true"
  horizontal-pod-autoscaler-sync-period: "10s"
  node-monitor-grace-period: "10s"
apiServerExtraArgs:
  runtime-config: "api/all=true"
kubernetesVersion: "stable-1.11"

For the details of this configuration, see the source code.

Run the initialization; during the process kubeadm pulls images through the specified CRI (the default CRI is Docker).
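Optionally, the control-plane images can be pre-pulled before running init; kubeadm config images pull exists from v1.11 on and accepts the same config file:

# pre-pull the control-plane images with the same configuration
kubeadm config images pull --config kubeadm.yaml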

kubeadm init --config kubeadm.yaml

## If all goes well, this takes about 2-3 minutes and ends with the following messages:
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node
as root:

kubeadm join 192.168.2.16:6443 --token s8o9wi.dylbvs735sy53mmq --discovery-token-ca-cert-hash sha256:0c16a05978533ca8f44af6e779162a1c99516fa2a4acd81915f0379755a856bc

Check docker ps; a number of containers are already running:

~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ab574cbeb1f2 2ed65dca1a98 "/usr/local/bin/kube…" 33 seconds ago Up 33 seconds k8s_kube-proxy_kube-proxy-7cfkg_kube-system_83851117-48bf-11e9-b533-000c29d7e00b_0
5bb5811d11d4 k8s.gcr.io/pause:3.1 "/pause" 34 seconds ago Up 33 seconds k8s_POD_kube-proxy-7cfkg_kube-system_83851117-48bf-11e9-b533-000c29d7e00b_0
acc7433382dc b8df3b177be2 "etcd --advertise-cl…" 56 seconds ago Up 55 seconds k8s_etcd_etcd-k8s-master01-u1604_kube-system_f09a86c0e59bd660bdd359cf6d46e2be_0
bf812ade4168 14028d7dcbf9 "kube-scheduler --ad…" 56 seconds ago Up 55 seconds k8s_kube-scheduler_kube-scheduler-k8s-master01-u1604_kube-system_cbb979db2eb698a42e58c4ca7edd7b16_0
3951b23da250 abbc2fa179b7 "kube-controller-man…" 56 seconds ago Up 55 seconds k8s_kube-controller-manager_kube-controller-manager-k8s-master01-u1604_kube-system_fc391fbab6130026480db4a97e595c16_0
9c146a4aae4b 6de771eabf8c "kube-apiserver --au…" 56 seconds ago Up 55 seconds k8s_kube-apiserver_kube-apiserver-k8s-master01-u1604_kube-system_f09f833b5c32ac560364b59f58055df6_0

Similarly, docker images shows that quite a few images have been pulled.

~# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
k8s.gcr.io/kube-proxy-amd64 v1.11.8 2ed65dca1a98 2 weeks ago 98.1MB
k8s.gcr.io/kube-apiserver-amd64 v1.11.8 6de771eabf8c 2 weeks ago 187MB
k8s.gcr.io/kube-controller-manager-amd64 v1.11.8 abbc2fa179b7 2 weeks ago 155MB
k8s.gcr.io/kube-scheduler-amd64 v1.11.8 14028d7dcbf9 2 weeks ago 56.9MB
k8s.gcr.io/coredns 1.1.3 b3b94275d97c 9 months ago 45.6MB
k8s.gcr.io/etcd-amd64 3.2.18 b8df3b177be2 11 months ago 219MB
k8s.gcr.io/pause 3.1 da86e6ba6ca1 15 months ago 742kB

Keep an eye on the node's disk capacity.
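A quick sketch for checking disk usage on a node (/var/lib/docker is the default Docker data root):

# host-level view of the Docker data directory
df -h /var/lib/docker
# Docker's own breakdown of images, containers and volumes
docker system df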

Check the Master Node status

Get the node status; k8s-master01-u1604 is not Ready yet. The key message shows up in describe: network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized


~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master01-u1604 NotReady master 2m v1.11.3

~# kubectl describe node k8s-master01-u1604
... (omitted) ...

Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
... (omitted) ...
Ready False Sun, 17 Mar 2019 22:20:59 +0800 Sun, 17 Mar 2019 22:18:02 +0800 KubeletNotReady runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

... (omitted) ...

Get the kube-system pod status; coredns is not ready yet either:

~# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-78fcdf6894-j88px 0/1 Pending 0 4m
coredns-78fcdf6894-lvlf7 0/1 Pending 0 4m
etcd-k8s-master01-u1604 1/1 Running 0 3m
kube-apiserver-k8s-master01-u1604 1/1 Running 0 3m
kube-controller-manager-k8s-master01-u1604 1/1 Running 0 3m
kube-proxy-7cfkg 1/1 Running 0 4m
kube-scheduler-k8s-master01-u1604 1/1 Running 0 3m

Deploy the Network Overlay: Weave Net

Pods in K8s communicate with each other over a network overlay; K8s standardizes the plugin interface with CNI (Container Network Interface). The example below uses Weave Net.

# Deploy the network plugin
kubectl apply -f https://git.io/weave-kube-1.6

# Check the status again: weave is still deploying
~# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-78fcdf6894-j88px 0/1 Pending 0 8m
coredns-78fcdf6894-lvlf7 0/1 Pending 0 8m
etcd-k8s-master01-u1604 1/1 Running 0 7m
kube-apiserver-k8s-master01-u1604 1/1 Running 0 7m
kube-controller-manager-k8s-master01-u1604 1/1 Running 0 8m
kube-proxy-7cfkg 1/1 Running 0 8m
kube-scheduler-k8s-master01-u1604 1/1 Running 0 7m
weave-net-4gbxq 0/2 ContainerCreating 0 19s

## weave deployment complete
~# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-78fcdf6894-j88px 1/1 Running 0 9m
coredns-78fcdf6894-lvlf7 1/1 Running 0 9m
etcd-k8s-master01-u1604 1/1 Running 0 8m
kube-apiserver-k8s-master01-u1604 1/1 Running 0 8m
kube-controller-manager-k8s-master01-u1604 1/1 Running 0 8m
kube-proxy-7cfkg 1/1 Running 0 9m
kube-scheduler-k8s-master01-u1604 1/1 Running 0 8m
weave-net-4gbxq 2/2 Running 0 39s

## Get the node status again
~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master01-u1604 Ready master 10m v1.11.3

Besides Weave Net, two other common CNIs are flannel and calico:

## flannel
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
kubectl delete -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

### calico
kubectl apply -f \
https://docs.projectcalico.org/v3.6/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml
kubectl delete -f \
https://docs.projectcalico.org/v3.6/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml

Add a Worker Node

Make sure kubeadm, kubectl, and kubelet are installed, then rename the machine:

  1. /etc/hostname: k8s-worker01-u1604
  2. /etc/hosts: k8s-worker01-u1604
  3. reboot

After it boots, join this machine to the Kubernetes cluster:

~# kubeadm join 192.168.2.16:6443 --token s8o9wi.dylbvs735sy53mmq --discovery-token-ca-cert-hash sha256:0c16a05978533ca8f44af6e779162a1c99516fa2a4acd81915f0379755a856bc

If the token above has expired, run this on the master to generate a new one: kubeadm token create --print-join-command
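A small sketch for checking and regenerating tokens on the master (both are standard kubeadm subcommands):

# list existing bootstrap tokens and their TTLs
kubeadm token list
# mint a new token and print the full join command
kubeadm token create --print-join-command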

On k8s-master01-u1604, run kubectl get nodes and you will see the worker node has joined the cluster.

~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master01-u1604 Ready master 19m v1.11.3
k8s-worker01-u1604 Ready <none> 2m v1.11.3

Likewise, after the join you can log in to the worker node and check what is actually running with docker ps.
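Optionally, the worker's ROLES column (shown as <none> above) can be filled in by adding a node-role label from the master; a sketch:

# kubectl derives ROLES from node-role.kubernetes.io/<role> labels
kubectl label node k8s-worker01-u1604 node-role.kubernetes.io/worker=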


Install Add-ons

Deploy the Dashboard

Deploy kubernetes-dashboard, following the official instructions.

~# kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/aio/deploy/recommended/kubernetes-dashboard.yaml
kubectl apply -f kubernetes-dashboard.yaml
secret/kubernetes-dashboard-certs unchanged
secret/kubernetes-dashboard-csrf unchanged
serviceaccount/kubernetes-dashboard unchanged
role.rbac.authorization.k8s.io/kubernetes-dashboard-minimal unchanged
rolebinding.rbac.authorization.k8s.io/kubernetes-dashboard-minimal unchanged
deployment.apps/kubernetes-dashboard created
service/kubernetes-dashboard unchanged

~# kubectl get pods -n kube-system | grep dashboard
... (omitted) ...
kubernetes-dashboard-5dd89b9875-5mx9n 1/1 Running 0 16s
... (omitted) ...

## Start the proxy
~# kubectl proxy

## Open the following URL in a browser
# http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/
# Log in with a token obtained via RBAC

Using RBAC (Role-Based Access Control)

Reference: https://github.com/kubernetes/dashboard/wiki/Creating-sample-user

admin.yaml
# admin-user.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kube-system

---
# admin-role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kube-system

Then run kubectl apply -f admin.yaml, and once that is done retrieve the admin-user token:

~$ kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep admin-user | awk '{print $1}')
Name: admin-user-token-txkst
Namespace: kube-system
Labels: <none>
Annotations: kubernetes.io/service-account.name: admin-user
kubernetes.io/service-account.uid: fb1c2dcb-5ebf-11e9-8d1e-92cde7b04430

Type: kubernetes.io/service-account-token

Data
====
ca.crt: 1025 bytes
namespace: 11 bytes
token: eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5

Copy the token, go back to the dashboard, and choose the token login method.
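If you only want the raw token for pasting into the dashboard, a one-liner sketch that pulls it straight out of the secret:

kubectl -n kube-system get secret \
  $(kubectl -n kube-system get secret | grep admin-user | awk '{print $1}') \
  -o jsonpath='{.data.token}' | base64 --decode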

Deploy Persistent Volumes (PV) and Persistent Volume Claims (PVC)

This installs the storage pieces behind K8s PV / PVC; once they are in place, StatefulSets can run.

Here we install Rook, following the official documentation.

1. Create the namespace, CRDs, and RBAC-related resources

The first step creates the namespace used by Rook, then the RBAC-related resources, including the following:

  • Namespace: rook-ceph
  • CRD (Custom Resource Definition):
    • cephclusters
    • cephfilesystems
    • cephnfses
    • cephobjectstores
    • cephobjectstoreusers
    • cephblockpools
    • volumes
  • RBAC-related:
    • ClusterRole: rook-ceph-cluster-mgmt, rook-ceph-global, rook-ceph-mgr-cluster, rook-ceph-mgr-system
    • Role: rook-ceph-system, rook-ceph-osd, rook-ceph-mgr
    • ServiceAccount: rook-ceph-system, rook-ceph-osd, rook-ceph-mgr
    • RoleBinding: rook-ceph-system, rook-ceph-cluster-mgmt, rook-ceph-osd, rook-ceph-mgr, rook-ceph-mgr-system
    • ClusterRoleBinding: rook-ceph-global, rook-ceph-mgr-cluster
~# kubectl apply -f https://raw.githubusercontent.com/rook/rook/master/cluster/examples/kubernetes/ceph/common.yaml
namespace/rook-ceph created

customresourcedefinition.apiextensions.k8s.io/cephclusters.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephfilesystems.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephnfses.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephobjectstores.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephobjectstoreusers.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephblockpools.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/volumes.rook.io created

clusterrole.rbac.authorization.k8s.io/rook-ceph-cluster-mgmt created
role.rbac.authorization.k8s.io/rook-ceph-system created
clusterrole.rbac.authorization.k8s.io/rook-ceph-global created
clusterrole.rbac.authorization.k8s.io/rook-ceph-mgr-cluster created
serviceaccount/rook-ceph-system created
rolebinding.rbac.authorization.k8s.io/rook-ceph-system created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-global created
serviceaccount/rook-ceph-osd created
serviceaccount/rook-ceph-mgr created
role.rbac.authorization.k8s.io/rook-ceph-osd created
clusterrole.rbac.authorization.k8s.io/rook-ceph-mgr-system created
role.rbac.authorization.k8s.io/rook-ceph-mgr created
rolebinding.rbac.authorization.k8s.io/rook-ceph-cluster-mgmt created
rolebinding.rbac.authorization.k8s.io/rook-ceph-osd created
rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr created
rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr-system created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-mgr-cluster created

2. Create the deployment: rook

Create the related pods:

~# kubectl apply -f https://raw.githubusercontent.com/rook/rook/master/cluster/examples/kubernetes/ceph/operator.yaml

## The first run takes a while; the rook/ceph image is about 700MB
~# kubectl get po -n rook-ceph
NAME READY STATUS RESTARTS AGE
rook-ceph-agent-dcgb8 1/1 Running 0 2m
rook-ceph-agent-jcpmd 1/1 Running 0 2m
rook-ceph-operator-65b65fbd66-9m58m 1/1 Running 0 3m35s
rook-discover-88mf5 1/1 Running 0 2m
rook-discover-wggnl 1/1 Running 0 2m

This deployment includes the agent, operator, and discover pods. Once all the pods are in Running state, you can deploy the Ceph cluster.
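A sketch for waiting on those pods instead of polling by hand (kubectl wait has been available since v1.11):

# stream pod status changes (Ctrl-C to stop)
kubectl get pods -n rook-ceph -w

# or block until every pod in the namespace reports Ready
kubectl -n rook-ceph wait --for=condition=Ready pod --all --timeout=600s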

3. Deploy the Ceph Cluster

Run the following to deploy the Ceph cluster:

~# kubectl apply -f https://raw.githubusercontent.com/rook/rook/master/cluster/examples/kubernetes/ceph/cluster.yaml
cephcluster.ceph.rook.io/rook-ceph created

## Wait a bit and rook-ceph-mon-a will appear
~# kubectl get po -n rook-ceph
rook-ceph-agent-dcgb8 1/1 Running 0 127m
rook-ceph-agent-jcpmd 1/1 Running 2 127m
rook-ceph-mgr-a-77d8645896-8blh4 1/1 Running 0 118m
rook-ceph-mon-a-9cbbbf7b-2w6zk 1/1 Running 1 123m
rook-ceph-mon-b-775ff945c5-77vtv 1/1 Running 0 122m
rook-ceph-mon-c-59695fb97b-drnww 1/1 Running 1 122m
rook-ceph-operator-65b65fbd66-9m58m 1/1 Running 1 129m
rook-discover-88mf5 1/1 Running 0 127m
rook-discover-wggnl 1/1 Running 1 127m

After the deployment completes, you will have:

  • Ceph Monitors: three
  • Ceph Manager: one

Deployment

~$ kubectl get deploy -n rook-ceph
NAME READY UP-TO-DATE AVAILABLE AGE
rook-ceph-mgr-a 1/1 1 1 119m
rook-ceph-mon-a 1/1 1 1 124m
rook-ceph-mon-b 1/1 1 1 123m
rook-ceph-mon-c 1/1 1 1 123m
rook-ceph-operator 1/1 1 1 130m

Install Weave Scope (monitoring tool)

Weave Scope is a graphical interface for exploring the relationships and status of K8s resources; here it runs on top of the Weave Net CNI installed earlier.

# Installation method 1
kubectl apply -f "https://cloud.weave.works/k8s/scope.yaml?k8s-version=$(kubectl version | base64 | tr -d '\n')"

## Open a port forward
kubectl port-forward -n weave "$(kubectl get -n weave pod --selector=weave-scope-component=app -o jsonpath='{.items..metadata.name}')" 4040

# Install from source code
git clone https://github.com/weaveworks/scope
cd scope
kubectl apply -f examples/k8s

## Browse
kubectl port-forward svc/weave-scope-app -n weave 8040:80
# http://127.0.0.1:8040


Maintenance

Remove a Worker Node

  • Understand how to remove a worker node
  • Understand what happens during the removal

We plan to remove the worker node k8s14-worker02-u1604; first, check the current state:

# 1. Check the current nodes
~$ k get no
NAME STATUS ROLES AGE VERSION
k8s14-master01-u1604 Ready master 10d v1.14.0
k8s14-worker02-u1604 Ready <none> 4m31s v1.14.0
k8s14-worker03-u1604 Ready <none> 32m v1.14.0
k8s14-worker04-u1604 Ready <none> 24m v1.14.0

## Check which pods are running on worker02
~$ k get po -o wide | grep worker02
kube-proxy-zx6fn 1/1 Running 0 3m53s 192.168.2.16 k8s14-worker02-u1604
weave-net-fkb5w 2/2 Running 1 3m53s 192.168.2.16 k8s14-worker02-u1604

## 2. On worker02, check docker ps
root@k8s14-worker02-u1604:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
239c836beb6a 047e0878ff14 "/tini -- /usr/local…" 3 minutes ago Up 3 minutes k8s_rook-ceph-agent_rook-ceph-agent-bsgqk_rook-ceph_71d9fc74-605e-11e9-a972-92cde7b04430_1
fe68d8f88ed1 1f394ae9e226 "/home/weave/launch.…" 3 minutes ago Up 3 minutes k8s_weave_weave-net-fkb5w_kube-system_6bd4a2fa-605e-11e9-a972-92cde7b04430_1
ae00e498728f 789b7f496034 "/usr/bin/weave-npc" 4 minutes ago Up 4 minutes k8s_weave-npc_weave-net-fkb5w_kube-system_6bd4a2fa-605e-11e9-a972-92cde7b04430_0
... (omitted) ...

Now remove the worker node; the main steps are:

  1. Drain Node
  2. Delete Node
# 1. Drain the node, deleting some local data along the way
~$ kubectl drain k8s14-worker02-u1604 --delete-local-data --force --ignore-daemonsets
node/k8s14-worker02-u1604 cordoned
WARNING: Ignoring DaemonSet-managed pods: kube-proxy-zx6fn, weave-net-fkb5w, rook-ceph-agent-bsgqk, rook-discover-p5ptq, weave-scope-agent-wsmd2
pod/rook-ceph-mon-b-775ff945c5-rzzt6 evicted

## 1-1. Check the status
~$ kubectl get no
NAME STATUS ROLES AGE VERSION
k8s14-master01-u1604 Ready master 10d v1.14.0
k8s14-worker02-u1604 Ready,SchedulingDisabled <none> 8m36s v1.14.0
k8s14-worker03-u1604 Ready <none> 36m v1.14.0
k8s14-worker04-u1604 Ready <none> 28m v1.14.0

## 1-2. Begin of checking: not much has changed yet
### Check the pods running on worker02
~$ k get po -o wide | grep worker02
kube-proxy-zx6fn 1/1 Running 0 8m56s 192.168.2.16 k8s14-worker02-u1604
weave-net-fkb5w 2/2 Running 1 8m56s 192.168.2.16 k8s14-worker02-u1604

### On worker02, check docker ps
root@k8s14-worker02-u1604:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
239c836beb6a 047e0878ff14 "/tini -- /usr/local…" 3 minutes ago Up 3 minutes k8s_rook-ceph-agent_rook-ceph-agent-bsgqk_rook-ceph_71d9fc74-605e-11e9-a972-92cde7b04430_1
fe68d8f88ed1 1f394ae9e226 "/home/weave/launch.…" 3 minutes ago Up 3 minutes k8s_weave_weave-net-fkb5w_kube-system_6bd4a2fa-605e-11e9-a972-92cde7b04430_1
ae00e498728f 789b7f496034 "/usr/bin/weave-npc" 4 minutes ago Up 4 minutes k8s_weave-npc_weave-net-fkb5w_kube-system_6bd4a2fa-605e-11e9-a972-92cde7b04430_0
... (omitted) ...
## End of checking: not much has changed


# 2. Delete the worker02 node
~$ kubectl delete node k8s14-worker02-u1604
node "k8s14-worker02-u1604" deleted

## 2-1. Check the nodes
~$ k get no
NAME STATUS ROLES AGE VERSION
k8s14-master01-u1604 Ready master 10d v1.14.0
k8s14-worker03-u1604 Ready <none> 42m v1.14.0
k8s14-worker04-u1604 Ready <none> 33m v1.14.0


## 2-2. On worker02, run kubeadm reset to clean up, as follows:
root@k8s14-worker02-u1604:~# kubeadm reset
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0416 23:57:25.138327 18457 reset.go:234] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes]
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually.
For example:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your systems IPVS tables.


### (optional) On worker02, check docker ps; the containers are removed quickly and soon nothing is running
root@k8s14-worker02-u1604:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
bf7310c8d3c6 d9ece03f45e7 "/home/weave/entrypo…" 9 minutes ago Up 9 minutes k8s_scope-agent_weave-scope-agent-wsmd2_weave_6bd48236-605e-11e9-a972-92cde7b04430_0
79d7b777a525 k8s.gcr.io/pause:3.1 "/pause" 9 minutes ago Up 9 minutes k8s_POD_weave-scope-agent-wsmd2_weave_6bd48236-605e-11e9-a972-92cde7b04430_0

root@k8s14-worker02-u1604:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
root@k8s14-worker02-u1604:~#

### (optional) the docker images are still there
root@k8s14-worker02-u1604:~# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
rook/ceph master 047e0878ff14 3 days ago 698MB
wordpress latest 837092bc87de 5 days ago 421MB
istio/proxyv2 1.1.2 c7fb421f087e 12 days ago 378MB
... (omitted) ...

Troubleshooting

CoreDNS fails to start

After my machine boots, the CoreDNS pods stay in the Completed state:

~$ k get po
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-f4kcl 0/1 Completed 3 10d
coredns-fb8b8dccf-n5tj6 0/1 Completed 1 23h
etcd-k8s14-master01-u1604 1/1 Running 5 10d
kube-apiserver-k8s14-master01-u1604 1/1 Running 5 10d
kube-controller-manager-k8s14-master01-u1604 1/1 Running 5 10d
kube-proxy-2bjb8 1/1 Running 5 10d
kube-proxy-fcm2v 1/1 Running 8 10d
kube-proxy-hn44m 1/1 Running 6 10d
kube-scheduler-k8s14-master01-u1604 1/1 Running 5 10d
weave-net-gbjvt 2/2 Running 5 22h
weave-net-rjj5p 2/2 Running 7 22h
weave-net-rns74 2/2 Running 5 22h

This problem occurred when a worker node had failed; the eventual fix was to join a new node and drain the old one.
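If you hit this again and just want CoreDNS rescheduled, a sketch that deletes the pods so the Deployment recreates them (kubeadm labels the CoreDNS pods with k8s-app=kube-dns):

kubectl -n kube-system delete pod -l k8s-app=kube-dns
kubectl -n kube-system get pods -l k8s-app=kube-dns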

