A Pit-Filling Guide to Building a Unified Log Management Platform on Kubernetes with Fluentd + Elasticsearch + Kibana
Published: 2019-06-07


 

After standing up the basic Kubernetes cluster and adding a few monitoring components, we can already:

  • monitor the status and resource usage of every node and pod graphically
  • scale a ReplicaSet out and back in with kubectl scale
  • view the logs of each Pod with kubectl logs or the dashboard

In a distributed architecture, however, the number of nodes is usually large: a typical production environment may have dozens or even hundreds of minion nodes, so a centralized log monitoring and management system is needed. My initial idea was to mount a volume onto shared storage so that the WebLogic logs would land there, but this approach has problems:

  • We scale the WebLogic Docker containers using a single-domain model, which means every container writes logs to the same path and file name (/u01/oracle/user_projects/domains/base_domain/servers/AdminServer/logs/AdminServer.log). Mapping that path onto shared storage via a volumeMount would cause file conflicts.
  • Pod and container metadata is not captured.
  • Logs from other nodes in the cluster are not visible.

So a platform-level architecture is still needed. In the official Kubernetes documentation, https://kubernetes.io/docs/concepts/cluster-administration/logging/,

Kubernetes describes several logging approaches and gives a reference architecture for cluster-level logging:

In other words, the container processes inside our application Pods stream their logs onto the minion host; a separate pod running on the same host, the logging-agent pod, picks them up and forwards them to a backend. The backend can be one of several implementations, for example elasticsearch-logging, with the kibana platform on top for visualization.

 

Kubernetes recommends this node-level logging agent and packages two of them: one (Stackdriver Logging) for Google Cloud Platform, and the other Elasticsearch. Both use fluentd as the agent (log proxy) running on the node.

Using a node-level logging agent is the most common and encouraged approach for a Kubernetes cluster, because it creates only one agent per node, and it doesn’t require any changes to the applications running on the node. However, node-level logging only works for applications’ standard output and standard error.

Kubernetes doesn’t specify a logging agent, but two optional logging agents are packaged with the Kubernetes release: Stackdriver Logging for use with Google Cloud Platform, and Elasticsearch. You can find more information and instructions in the dedicated documents. Both use fluentd with custom configuration as an agent on the node.

With that, on to our pit-filling guide.

1. Preparation

  • First, the environment

OS: CentOS 7.3

Kubernetes version: 1.5

[root@k8s-master fluentd-elasticsearch]# kubectl version

Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"a55267932d501b9fbd6d73e5ded47d79b5763ce5", GitTreeState:"clean", BuildDate:"2017-04-14T13:36:25Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"a55267932d501b9fbd6d73e5ded47d79b5763ce5", GitTreeState:"clean", BuildDate:"2017-04-14T13:36:25Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

  • Clone the Kubernetes source from GitHub onto the master node.
git clone https://github.com/kubernetes/kubernetes
  • Configure a ServiceAccount. The fluentd image pulled later connects to the API Server over SSL, so unless you plan to modify and rebuild the image, this needs to be in place. Setup guide:
http://www.cnblogs.com/ericnie/p/6894688.html
  • Configure DNS. The kibana component needs DNS to resolve the elasticsearch-logging Service; without DNS you have to change the address in kibana-controller.yaml to the Service's fixed cluster IP. Setup guide:
http://www.cnblogs.com/ericnie/p/6897142.html

 

  • Download the images

Go into the /root/kubernetes/cluster/addons/fluentd-elasticsearch directory, where all the yaml files live.

fluentd-es-ds.yaml builds the fluentd DaemonSet that runs on every node and plays the logging-agent role; es-controller.yaml and es-service.yaml build elasticsearch logging, the backend that aggregates the logs; kibana-controller.yaml and kibana-service.yaml provide the UI.

Pull the images referenced in these controller yaml files onto every minion node:

docker pull gcr.io/google_containers/elasticsearch:v2.4.1-2
docker pull gcr.io/google_containers/fluentd-elasticsearch:1.22
docker pull gcr.io/google_containers/kibana:v4.6.1-1
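
Since the images have to be present on every minion, a small loop can save some typing. This is only a sketch: k8s-node-1 and k8s-node-2 are example node names, and it assumes password-less ssh as root to each minion.

# pull the three addon images on each minion in turn
for node in k8s-node-1 k8s-node-2; do
  ssh root@$node "docker pull gcr.io/google_containers/elasticsearch:v2.4.1-2 && \
                  docker pull gcr.io/google_containers/fluentd-elasticsearch:1.22 && \
                  docker pull gcr.io/google_containers/kibana:v4.6.1-1"
done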

 

2. Start the fluentd DaemonSet

  • Apply the node label (pit #1)

Fluentd is supposed to run on every minion node. Create it with

# kubectl create -f fluentd-es-ds.yaml
daemonset "fluentd-es-v1.22" created

Then I went to a minion node to check with tail -f /var/log/fluentd.log — and there was no fluentd.log file on the minion at all!

A quick check with

kubectl get pods -n kube-system

showed that there was no fluentd-related Pod running or even pending! :(

Then

kubectl get -f fluentd-es-ds.yaml
NAME               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE-SELECTOR                              AGE
fluentd-es-v1.22   0         0         0       0            0           beta.kubernetes.io/fluentd-ds-ready=true   2m

revealed that the DaemonSet has a NODE-SELECTOR: beta.kubernetes.io/fluentd-ds-ready=true.

 

Running

kubectl describe nodes k8s-node-1

showed that my minion node did not carry this label at all, so I applied it:

kubectl label node k8s-node-1 beta.kubernetes.io/fluentd-ds-ready=true

After re-creating the DaemonSet, /var/log/fluentd.log showed up on k8s-node-1.
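
To confirm the label is in place, and on which nodes, a quick check like the following works (a sketch):

# list only the nodes that carry the label the DaemonSet selects on
kubectl get nodes -l beta.kubernetes.io/fluentd-ds-ready=true
# or inspect all labels on a single node
kubectl get node k8s-node-1 --show-labels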

  • Create the ConfigMap (pit #2)
# tail -f /var/log/fluentd.log
2017-03-02 02:27:01 +0000 [info]: reading config file path="/etc/td-agent/td-agent.conf"
2017-03-02 02:27:01 +0000 [info]: starting fluentd-0.12.31
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-mixin-config-placeholders' version '0.4.0'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-mixin-plaintextformatter' version '0.2.6'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-docker_metadata_filter' version '0.1.3'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '1.5.0'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-kafka' version '0.4.1'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '0.24.0'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-mongo' version '0.7.16'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '1.5.5'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-s3' version '0.8.0'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-scribe' version '0.10.14'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-td' version '0.10.29'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-td-monitoring' version '0.2.2'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-webhdfs' version '0.4.2'
2017-03-02 02:27:01 +0000 [info]: gem 'fluentd' version '0.12.31'
2017-03-02 02:27:01 +0000 [info]: adding match pattern="fluent.**" type="null"
2017-03-02 02:27:01 +0000 [info]: adding filter pattern="kubernetes.**" type="kubernetes_metadata"
2017-03-02 02:27:02 +0000 [error]: config error file="/etc/td-agent/td-agent.conf" error="Invalid Kubernetes API v1 endpoint https://192.168.0.105:443/api: 401 Unauthorized"
2017-03-02 02:27:02 +0000 [info]: process finished code=256
2017-03-02 02:27:02 +0000 [warn]: process died within 1 second. exit.

So the fluentd image connects to my API Server on port 443, and since the API Server has its security mechanisms enabled, ca_file, client_cert, client_key and so on have to be configured. If you don't want to rebuild the image, Kubernetes offers a powerful tool for exactly this: turn the updated td-agent.conf into a ConfigMap and mount it into the fluentd pod at the right path, replacing the default td-agent.conf baked into the image.

td-agent.conf lives in

/root/kubernetes/cluster/addons/fluentd-elasticsearch/fluentd-es-image

After adding the ca and client settings it looks like this:

# td-agent.conf (excerpt)
... ...
<filter kubernetes.**>
  type kubernetes_metadata
  ca_file /srv/kubernetes/ca.crt
  client_cert /srv/kubernetes/kubecfg.crt
  client_key /srv/kubernetes/kubecfg.key
</filter>
... ...

Two things to note:

  • Before creating the ConfigMap from td-agent.conf, delete all the comment lines in it; back up the original first (the backup came in handy later). A sketch of this step follows the ConfigMap listing below.
  • The fluentd pod will be created in kube-system, so the ConfigMap must also be created in the kube-system namespace, otherwise the pod created by kubectl create will not find the corresponding ConfigMap.
# kubectl create configmap td-agent-config --from-file=./td-agent.conf -n kube-system
configmap "td-agent-config" created
# kubectl get configmaps td-agent-config -o yaml
apiVersion: v1
data:
  td-agent.conf: |
    <match fluent.**>
      type null
    </match>
    <source>
      type tail
      path /var/log/containers/*.log
      pos_file /var/log/es-containers.log.pos
      time_format %Y-%m-%dT%H:%M:%S.%NZ
      tag kubernetes.*
      format json
      read_from_head true
    </source>
    ... ...
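
As a sketch of the comment-stripping step mentioned in the first note above (it assumes GNU sed and that td-agent.conf only contains full-line comments you are willing to drop):

# keep a backup, then delete every full-line comment before building the ConfigMap
cp td-agent.conf td-agent.conf.bak
sed -i '/^\s*#/d' td-agent.conf
kubectl create configmap td-agent-config --from-file=./td-agent.conf -n kube-system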

fluentd-es-ds.yaml needs a couple of changes as well, mainly two extra mounts:

one mounts the td-agent-config ConfigMap created above,

and the other mounts the hostPath /srv/kubernetes so the pod can reach the client certificates:

[root@k8s-master fluentd-elasticsearch]# cat fluentd-es-ds.yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: fluentd-es-v1.22
  namespace: kube-system
  labels:
    k8s-app: fluentd-es
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v1.22
spec:
  template:
    metadata:
      labels:
        k8s-app: fluentd-es
        kubernetes.io/cluster-service: "true"
        version: v1.22
      # This annotation ensures that fluentd does not get evicted if the node
      # supports critical pod annotation based priority scheme.
      # Note that this does not guarantee admission on the nodes (#40573).
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      containers:
      - name: fluentd-es
        image: gcr.io/google_containers/fluentd-elasticsearch:1.22
        command:
          - '/bin/sh'
          - '-c'
          - '/usr/sbin/td-agent 2>&1 >> /var/log/fluentd.log'
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: td-agent-config
          mountPath: /etc/td-agent
        - name: tls-files
          mountPath: /srv/kubernetes
      nodeSelector:
        beta.kubernetes.io/fluentd-ds-ready: "true"
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: td-agent-config
        configMap:
          name: td-agent-config
      - name: tls-files
        hostPath:
          path: /srv/kubernetes
[root@k8s-master fluentd-elasticsearch]#

 

Re-create fluentd-es-ds.yaml and look at /var/log/fluentd.log on the minion again:

......
    client_cert /srv/kubernetes/kubecfg.crt
    client_key /srv/kubernetes/kubecfg.key
    <match **>
      type elasticsearch
      log_level info
      include_tag_key true
      host elasticsearch-logging
      port 9200
      logstash_format true
      buffer_chunk_limit 2M
      buffer_queue_limit 32
      flush_interval 5s
      max_retry_wait 30
      disable_retry_limit
      num_threads 8
    </match>

 

Seeing this output basically counts as success. It looks fine, but there is actually still a pit hiding here; for now, let's carry on and configure elasticsearch logging.

 

3. Configure elasticsearch

Create elasticsearch:

# kubectl create -f es-controller.yaml
replicationcontroller "elasticsearch-logging-v1" created
# kubectl create -f es-service.yaml
service "elasticsearch-logging" created

get pods:
kube-system   elasticsearch-logging-v1-3bzt6   1/1   Running   0   7s   172.16.57.8    10.46.181.146
kube-system   elasticsearch-logging-v1-nvbe1   1/1   Running   0   7s   172.16.99.10   10.47.136.60

 

Check the logs:

# kubectl logs -f elasticsearch-logging-v1-3bzt6 -n kube-system
F0302 03:59:41.036697       8 elasticsearch_logging_discovery.go:60] kube-system namespace doesn't exist: the server has asked for the client to provide credentials (get namespaces kube-system)
goroutine 1 [running]:
k8s.io/kubernetes/vendor/github.com/golang/glog.stacks(0x19a8100, 0xc400000000, 0xc2, 0x186)
... ...
main.main()
    elasticsearch_logging_discovery.go:60 +0xb53
[2017-03-02 03:59:42,587][INFO ][node                     ] [elasticsearch-logging-v1-3bzt6] version[2.4.1], pid[16], build[c67dc32/2016-09-27T18:57:55Z]
[2017-03-02 03:59:42,588][INFO ][node                     ] [elasticsearch-logging-v1-3bzt6] initializing ...
[2017-03-02 03:59:44,396][INFO ][plugins                  ] [elasticsearch-logging-v1-3bzt6] modules [reindex, lang-expression, lang-groovy], plugins [], sites []
... ...
[2017-03-02 03:59:44,441][INFO ][env                      ] [elasticsearch-logging-v1-3bzt6] heap size [1007.3mb], compressed ordinary object pointers [true]
[2017-03-02 03:59:48,355][INFO ][node                     ] [elasticsearch-logging-v1-3bzt6] initialized
[2017-03-02 03:59:48,355][INFO ][node                     ] [elasticsearch-logging-v1-3bzt6] starting ...
[2017-03-02 03:59:48,507][INFO ][transport                ] [elasticsearch-logging-v1-3bzt6] publish_address {172.16.57.8:9300}, bound_addresses {[::]:9300}
[2017-03-02 03:59:48,547][INFO ][discovery                ] [elasticsearch-logging-v1-3bzt6] kubernetes-logging/7_f_M2TKRZWOw4NhBc4EqA
[2017-03-02 04:00:18,552][WARN ][discovery                ] [elasticsearch-logging-v1-3bzt6] waited for 30s and no initial state was set by the discovery
[2017-03-02 04:00:18,562][INFO ][http                     ] [elasticsearch-logging-v1-3bzt6] publish_address {172.16.57.8:9200}, bound_addresses {[::]:9200}
[2017-03-02 04:00:18,562][INFO ][node                     ] [elasticsearch-logging-v1-3bzt6] started

There is an error: the pod cannot provide valid credentials. After consulting Tony Bai's technical write-up online, it turns out to be a problem with the default ServiceAccount; the underlying mechanism still deserves further study.

Let's get it running first. The fix:

Create a new ServiceAccount in the kube-system namespace:

# serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: k8s-efk

# kubectl create -f serviceaccount.yaml -n kube-system
serviceaccount "k8s-efk" created
# kubectl get serviceaccount -n kube-system
NAME      SECRETS   AGE
default   1         139d
k8s-efk   1         17s
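
To confirm that the new ServiceAccount really got a token secret attached (which is what the pods use to talk to the API Server), a quick check like this can help (a sketch):

kubectl describe serviceaccount k8s-efk -n kube-system
kubectl get secrets -n kube-system | grep k8s-efk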

 

Modify es-controller.yaml to use the service account "k8s-efk":

[root@k8s-master fluentd-elasticsearch]# cat es-controller.yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: elasticsearch-logging-v1
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    version: v1
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  replicas: 2
  selector:
    k8s-app: elasticsearch-logging
    version: v1
  template:
    metadata:
      labels:
        k8s-app: elasticsearch-logging
        version: v1
        kubernetes.io/cluster-service: "true"
    spec:
      serviceAccount: k8s-efk
      containers:
      - image: gcr.io/google_containers/elasticsearch:v2.4.1-2
        name: elasticsearch-logging
        resources:
          # need more cpu upon initialization, therefore burstable class
          limits:
            cpu: 1000m
          requests:
            cpu: 100m
        ports:
        - containerPort: 9200
          name: db
          protocol: TCP
        - containerPort: 9300
          name: transport
          protocol: TCP
        volumeMounts:
        - name: es-persistent-storage
          mountPath: /data
        env:
        - name: "NAMESPACE"
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
      volumes:
      - name: es-persistent-storage
        emptyDir: {}

 

After re-creating the elasticsearch logging service, look at the elasticsearch-logging pod's log again. It seems OK — but this is also a pit, more on that in a moment:

[2017-05-22 06:09:12,155][INFO ][node                     ] [elasticsearch-logging-v1-9jjf1] version[2.4.1], pid[1], build[c67dc32/2016-09-27T18:57:55Z]
[2017-05-22 06:09:12,156][INFO ][node                     ] [elasticsearch-logging-v1-9jjf1] initializing ...
[2017-05-22 06:09:13,657][INFO ][plugins                  ] [elasticsearch-logging-v1-9jjf1] modules [reindex, lang-expression, lang-groovy], plugins [], sites []
[2017-05-22 06:09:13,733][INFO ][env                      ] [elasticsearch-logging-v1-9jjf1] using [1] data paths, mounts [[/data (/dev/mapper/cl-root)]], net usable_space [25gb], net total_space [37.2gb], spins? [possibly], types [xfs]
[2017-05-22 06:09:13,738][INFO ][env                      ] [elasticsearch-logging-v1-9jjf1] heap size [1015.6mb], compressed ordinary object pointers [true]
[2017-05-22 06:09:21,946][INFO ][node                     ] [elasticsearch-logging-v1-9jjf1] initialized
[2017-05-22 06:09:21,980][INFO ][node                     ] [elasticsearch-logging-v1-9jjf1] starting ...
[2017-05-22 06:09:22,442][INFO ][transport                ] [elasticsearch-logging-v1-9jjf1] publish_address {192.168.10.6:9300}, bound_addresses {[::]:9300}
[2017-05-22 06:09:22,560][INFO ][discovery                ] [elasticsearch-logging-v1-9jjf1] kubernetes-logging/RY_IOcwSSSeuJNtC2E0W7A
[2017-05-22 06:09:30,446][INFO ][cluster.service          ] [elasticsearch-logging-v1-9jjf1] detected_master {elasticsearch-logging-v1-sbcgt}{9--uDYJOTqegj5ctbbCx_A}{192.168.10.8}{192.168.10.8:9300}{master=true}, added {{elasticsearch-logging-v1-sbcgt}{9--uDYJOTqegj5ctbbCx_A}{192.168.10.8}{192.168.10.8:9300}{master=true},}, reason: zen-disco-receive(from master [{elasticsearch-logging-v1-sbcgt}{9--uDYJOTqegj5ctbbCx_A}{192.168.10.8}{192.168.10.8:9300}{master=true}])
[2017-05-22 06:09:30,453][INFO ][http                     ] [elasticsearch-logging-v1-9jjf1] publish_address {192.168.10.6:9200}, bound_addresses {[::]:9200}
[2017-05-22 06:09:30,465][INFO ][node                     ] [elasticsearch-logging-v1-9jjf1] started

 

OK, moving on...

 

4. Configure kibana

 

Following the experience of those who went before, explicitly assign the newly created serviceaccount k8s-efk in kibana-controller.yaml as well:

[root@k8s-master fluentd-elasticsearch]# cat kibana-controller.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kibana-logging
  namespace: kube-system
  labels:
    k8s-app: kibana-logging
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: kibana-logging
  template:
    metadata:
      labels:
        k8s-app: kibana-logging
    spec:
      serviceAccount: k8s-efk
      containers:
      - name: kibana-logging
        image: gcr.io/google_containers/kibana:v4.6.1-1
        resources:
          # keep request = limit to keep this container in guaranteed class
          limits:
            cpu: 100m
          requests:
            cpu: 100m
        env:
          - name: "ELASTICSEARCH_URL"
            value: "http://elasticsearch-logging:9200"
          - name: "KIBANA_BASE_URL"
            value: "/api/v1/proxy/namespaces/kube-system/services/kibana-logging"
        ports:
        - containerPort: 5601
          name: ui
          protocol: TCP
[root@k8s-master fluentd-elasticsearch]#

Start kibana and watch the pod's log:

# kubectl logs -f kibana-logging-3604961973-jby53 -n kube-system
ELASTICSEARCH_URL=http://elasticsearch-logging:9200
server.basePath: /api/v1/proxy/namespaces/kube-system/services/kibana-logging
{"type":"log","@timestamp":"2017-03-02T08:30:15Z","tags":["info","optimize"],"pid":6,"message":"Optimizing and caching bundles for kibana and statusPage. This may take a few minutes"}

Kibana takes ten-odd minutes to start — sorry, I am running all of this in VirtualBox VMs on an 8 GB laptop. After that you will see logs like the following:

# kubectl logs -f kibana-logging-3604961973-jby53 -n kube-system
ELASTICSEARCH_URL=http://elasticsearch-logging:9200
server.basePath: /api/v1/proxy/namespaces/kube-system/services/kibana-logging
{"type":"log","@timestamp":"2017-03-02T08:30:15Z","tags":["info","optimize"],"pid":6,"message":"Optimizing and caching bundles for kibana and statusPage. This may take a few minutes"}
{"type":"log","@timestamp":"2017-03-02T08:40:04Z","tags":["info","optimize"],"pid":6,"message":"Optimization of bundles for kibana and statusPage complete in 588.60 seconds"}
{"type":"log","@timestamp":"2017-03-02T08:40:04Z","tags":["status","plugin:kibana@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:05Z","tags":["status","plugin:elasticsearch@1.0.0","info"],"pid":6,"state":"yellow","message":"Status changed from uninitialized to yellow - Waiting for Elasticsearch","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:05Z","tags":["status","plugin:kbn_vislib_vis_types@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:05Z","tags":["status","plugin:markdown_vis@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:05Z","tags":["status","plugin:metric_vis@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:06Z","tags":["status","plugin:spyModes@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:06Z","tags":["status","plugin:statusPage@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:06Z","tags":["status","plugin:table_vis@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:06Z","tags":["listening","info"],"pid":6,"message":"Server running at http://0.0.0.0:5601"}
{"type":"log","@timestamp":"2017-03-02T08:40:11Z","tags":["status","plugin:elasticsearch@1.0.0","info"],"pid":6,"state":"yellow","message":"Status changed from yellow to yellow - No existing Kibana index found","prevState":"yellow","prevMsg":"Waiting for Elasticsearch"}
{"type":"log","@timestamp":"2017-03-02T08:40:14Z","tags":["status","plugin:elasticsearch@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from yellow to green - Kibana index ready","prevState":"yellow","prevMsg":"No existing Kibana index found"}

Note the following (these are pits too):

  • DNS must be configured, otherwise kibana will fail to connect to http://elasticsearch-logging:9200.
  • If you do not configure DNS, the only alternative is to edit the controller file and replace elasticsearch-logging with the concrete cluster IP of the elasticsearch-logging Service (a sketch of looking it up is right below).
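
A sketch of how to look up that cluster IP (assuming the Service has already been created from es-service.yaml):

kubectl get svc elasticsearch-logging -n kube-system
# or just the IP itself
kubectl get svc elasticsearch-logging -n kube-system -o jsonpath='{.spec.clusterIP}'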

Running

kubectl cluster-info

gives the address of the kibana service, which is essentially

https://{API Server external IP}:{API Server secure port}/api/v1/proxy/namespaces/kube-system/services/kibana-logging/app/kibana#/settings/indices/

On the screen below, no matter what I tried, the Create button never did anything and no index could be added. The only thing that could be created was a literal *, but inside there was no pod information whatsoever. Big problem!!!!
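
One quick way to see whether elasticsearch has actually received any logstash-* indices (which is what kibana needs before it will let you create the index pattern) is to query it through the apiserver proxy. This is only a sketch: it assumes the insecure API port on the master, as shown later by kubectl cluster-info; if only the secure port is exposed, kubectl proxy can be used instead.

curl http://localhost:8080/api/v1/proxy/namespaces/kube-system/services/elasticsearch-logging/_cat/indices?v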

 

 

 

5. Tracking down the problem

I carefully re-checked Tony Bai's setup document and countless posts from countless predecessors, and even considered starting over on CentOS 6.5 — but I could not get a Kubernetes cluster installed on CentOS 6.5 either, so I gave that idea up.

Comparing the logs, the problem most likely lies in one of these places (a quick check for each is sketched after the list):

  • there are no logs at all
  • the fluentd service is not running properly
  • elasticsearch logging is not collecting the logs
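
A quick check for the first two suspects, assuming the file locations used throughout this post:

# on the master: are the agent and backend pods actually running?
kubectl get pods -n kube-system | grep -E 'fluentd|elasticsearch'
# on the minion: has fluentd made it past its configuration output and started tailing anything?
tail -n 50 /var/log/fluentd.log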

 

  • Replace the elasticsearch logging image (another pit)

Looking closely at the fluentd log /var/log/fluentd.log, there was simply no log output being forwarded at all, which rules out a mere failure of fluentd to connect to elasticsearch-logging:9200.

It felt like a problem on the elasticsearch logging side itself. Comparing with tonybai's elasticsearch logs, mine only had

[2017-05-22 06:09:30,446][INFO ][cluster.service          ] [elasticsearch-logging-v1-9jjf1] detected_master {elasticsearch-logging-v1-sbcgt}{9--uDYJOTqegj5ctbbCx_A}{192.168.10.8}{192.168.10.8:9300}{master=true}, added {{elasticsearch-logging-v1-sbcgt}{9--uDYJOTqegj5ctbbCx_A}{192.168.10.8}{192.168.10.8:9300}{master=true},}, reason: zen-disco-receive(from master [{elasticsearch-logging-v1-sbcgt}{9--uDYJOTqegj5ctbbCx_A}{192.168.10.8}{192.168.10.8:9300}{master=true}])
[2017-05-22 06:09:30,453][INFO ][http                     ] [elasticsearch-logging-v1-9jjf1] publish_address {192.168.10.6:9200}, bound_addresses {[::]:9200}
[2017-05-22 06:09:30,465][INFO ][node                     ] [elasticsearch-logging-v1-9jjf1] started

and then stopped, whereas tonybai's continued with:

[2017-03-02 08:26:56,955][INFO ][http                     ] [elasticsearch-logging-v1-dklui] publish_address {172.16.57.8:9200}, bound_addresses {[::]:9200}
[2017-03-02 08:26:56,956][INFO ][node                     ] [elasticsearch-logging-v1-dklui] started
[2017-03-02 08:26:57,157][INFO ][gateway                  ] [elasticsearch-logging-v1-dklui] recovered [0] indices into cluster_state
[2017-03-02 08:27:05,378][INFO ][cluster.metadata         ] [elasticsearch-logging-v1-dklui] [logstash-2017.03.02] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []
[2017-03-02 08:27:06,360][INFO ][cluster.metadata         ] [elasticsearch-logging-v1-dklui] [logstash-2017.03.01] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []
[2017-03-02 08:27:07,163][INFO ][cluster.routing.allocation] [elasticsearch-logging-v1-dklui] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[logstash-2017.03.01][3], [logstash-2017.03.01][3]] ...]).
[2017-03-02 08:27:07,354][INFO ][cluster.metadata         ] [elasticsearch-logging-v1-dklui] [logstash-2017.03.02] create_mapping [fluentd]
[2017-03-02 08:27:07,988][INFO ][cluster.metadata         ] [elasticsearch-logging-v1-dklui] [logstash-2017.03.01] create_mapping [fluentd]
[2017-03-02 08:27:09,578][INFO ][cluster.routing.allocation] [elasticsearch-logging-v1-dklui] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[logstash-2017.03.02][4]] ...]).

The differences:

  • tonybai's log has messages about recovering indices into the cluster state
  • tonybai's log has messages about logstash-* indices

This smelled like an image problem, so I changed the image to match tonybai's, replacing the official v2.4.1-2 in es-controller.yaml with

bigwhite/elasticsearch:v2.4.1-1
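
A sketch of swapping the image and re-creating the ReplicationController (the sed pattern assumes the image line looks exactly like the one in the es-controller.yaml listing above):

sed -i 's#gcr.io/google_containers/elasticsearch:v2.4.1-2#bigwhite/elasticsearch:v2.4.1-1#' es-controller.yaml
kubectl delete -f es-controller.yaml
kubectl create -f es-controller.yaml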

After restarting, the index-recovery messages did appear — but there were still no logstash messages.

 

  • Track down the missing log files (another big pit)

Which circles back to the first suspect — yet kubectl logs clearly shows screenfuls of output:

[root@k8s-master fluentd-elasticsearch]# kubectl logs helloworld-service-4d72j
..
JAVA Memory arguments: -Djava.security.egd=file:/dev/./urandom
.
CLASSPATH=/u01/oracle/wlserver/../oracle_common/modules/javax.persistence_2.1.jar:/u01/oracle/wlserver/../wlserver/modules/com.oracle.weblogic.jpa21support_1.0.0.0_2-1.jar:/usr/java/jdk1.8.0_101/lib/tools.jar:/u01/oracle/wlserver/server/lib/weblogic_sp.jar:/u01/oracle/wlserver/server/lib/weblogic.jar:/u01/oracle/wlserver/../oracle_common/modules/net.sf.antcontrib_1.1.0.0_1-0b3/lib/ant-contrib.jar:/u01/oracle/wlserver/modules/features/oracle.wls.common.nodemanager_2.0.0.0.jar:/u01/oracle/wlserver/../oracle_common/modules/com.oracle.cie.config-wls-online_8.1.0.0.jar:/u01/oracle/wlserver/common/derby/lib/derbyclient.jar:/u01/oracle/wlserver/common/derby/lib/derby.jar:/u01/oracle/wlserver/server/lib/xqrl.jar
.
PATH=/u01/oracle/wlserver/server/bin:/u01/oracle/wlserver/../oracle_common/modules/org.apache.ant_1.9.2/bin:/usr/java/jdk1.8.0_101/jre/bin:/usr/java/jdk1.8.0_101/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/java/default/bin:/u01/oracle/oracle_common/common/bin:/u01/oracle/oracle_common/common/bin:/u01/oracle/wlserver/common/bin:/u01/oracle/user_projects/domains/base_domain/bin:/u01/oracle
.
****************************************************
*  To start WebLogic Server, use a username and    *
*  password assigned to an admin-level user.  For  *
*  server administration, use the WebLogic Server  *
*  console at http://hostname:port/console         *
****************************************************
starting weblogic with Java version:
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
Starting WLS with line:
/usr/java/jdk1.8.0_101/bin/java -server   -Djava.security.egd=file:/dev/./urandom -Dweblogic.Name=AdminServer -Djava.security.policy=/u01/oracle/wlserver/server/lib/weblogic.policy  -Dweblogic.ProductionModeEnabled=true   -Djava.endorsed.dirs=/usr/java/jdk1.8.0_101/jre/lib/endorsed:/u01/oracle/wlserver/../oracle_common/modules/endorsed  -da -Dwls.home=/u01/oracle/wlserver/server -Dweblogic.home=/u01/oracle/wlserver/server     -Dweblogic.utils.cmm.lowertier.ServiceDisabled=true  weblogic.Server
May 24, 2017 2:28:31 AM weblogic.wsee.WseeCoreMessages logWseeServiceStarting
INFO: The Wsee Service is starting
[root@k8s-master fluentd-elasticsearch]#

 

And on the minion, docker logs <container id> shows screenfuls of output as well!

[root@k8s-node-1 ~]# docker logs bec3e02b2490
..
JAVA Memory arguments: -Djava.security.egd=file:/dev/./urandom
.
CLASSPATH=/u01/oracle/wlserver/../oracle_common/modules/javax.persistence_2.1.jar:/u01/oracle/wlserver/../wlserver/modules/com.oracle.weblogic.jpa21support_1.0.0.0_2-1.jar:/usr/java/jdk1.8.0_101/lib/tools.jar:/u01/oracle/wlserver/server/lib/weblogic_sp.jar:/u01/oracle/wlserver/server/lib/weblogic.jar:/u01/oracle/wlserver/../oracle_common/modules/net.sf.antcontrib_1.1.0.0_1-0b3/lib/ant-contrib.jar:/u01/oracle/wlserver/modules/features/oracle.wls.common.nodemanager_2.0.0.0.jar:/u01/oracle/wlserver/../oracle_common/modules/com.oracle.cie.config-wls-online_8.1.0.0.jar:/u01/oracle/wlserver/common/derby/lib/derbyclient.jar:/u01/oracle/wlserver/common/derby/lib/derby.jar:/u01/oracle/wlserver/server/lib/xqrl.jar
.
PATH=/u01/oracle/wlserver/server/bin:/u01/oracle/wlserver/../oracle_common/modules/org.apache.ant_1.9.2/bin:/usr/java/jdk1.8.0_101/jre/bin:/usr/java/jdk1.8.0_101/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/java/default/bin:/u01/oracle/oracle_common/common/bin:/u01/oracle/oracle_common/common/bin:/u01/oracle/wlserver/common/bin:/u01/oracle/user_projects/domains/base_domain/bin:/u01/oracle
.
****************************************************
*  To start WebLogic Server, use a username and    *
*  password assigned to an admin-level user.  For  *
*  server administration, use the WebLogic Server  *
*  console at http://hostname:port/console         *
****************************************************
starting weblogic with Java version:
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
Starting WLS with line:
/usr/java/jdk1.8.0_101/bin/java -server   -Djava.security.egd=file:/dev/./urandom -Dweblogic.Name=AdminServer -Djava.security.policy=/u01/oracle/wlserver/server/lib/weblogic.policy  -Dweblogic.ProductionModeEnabled=true   -Djava.endorsed.dirs=/usr/java/jdk1.8.0_101/jre/lib/endorsed:/u01/oracle/wlserver/../oracle_common/modules/endorsed  -da -Dwls.home=/u01/oracle/wlserver/server -Dweblogic.home=/u01/oracle/wlserver/server     -Dweblogic.utils.cmm.lowertier.ServiceDisabled=true  weblogic.Server
...........

 

Since /var/log/fluentd.log kept sitting at the configuration output without moving, I suspected the ConfigMap configuration and pulled out the backed-up td-agent.conf to look at it:

# The Kubernetes fluentd plugin is used to write the Kubernetes metadata to the log
# record & add labels to the log record if properly configured. This enables users
# to filter & search logs on any metadata.
# For example a Docker container's logs might be in the directory:
#
#  /var/lib/docker/containers/997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b
#
# and in the file:
#
#  997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b-json.log
#
# where 997599971ee6... is the Docker ID of the running container.
# The Kubernetes kubelet makes a symbolic link to this file on the host machine
# in the /var/log/containers directory which includes the pod name and the Kubernetes
# container name:
#
#    synthetic-logger-0.25lps-pod_default_synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log

Finally, the problem: fluentd looks for logs under /var/lib/docker/containers/, yet on my machines the container directories under docker contained no log files at all.
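
Two quick checks on the minion confirm this (a sketch):

# does docker write per-container *-json.log files at all?
ls /var/lib/docker/containers/*/*-json.log 2>/dev/null | head
# and what does the kubelet symlink into /var/log/containers for fluentd to tail?
ls -l /var/log/containers/ | head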

Digging into docker, it turned out all docker logs were being sent to journald and ending up in the system log /var/log/messages. Why? Because people often complain that docker logs make the container directories grow too fast, everything had been routed through the system journal for unified handling.

Edit the /etc/sysconfig/docker configuration file and switch the log driver from journald back to the json-file mode:

#OPTIONS='--selinux-enabled --log-driver=journald --signature-verification=false'
OPTIONS='--selinux-enabled --log-driver=json-file --signature-verification=false'
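
After the edit, docker needs to be restarted for the driver change to take effect (note that this restarts every container on the node), and the driver in effect can then be verified (a sketch):

systemctl restart docker
docker info | grep -i 'logging driver'    # should now report json-file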

After the change, plenty of log files appeared under the container directories.

Back in /var/log/fluentd.log, the log finally started scrolling and output looked normal:

 

2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/fluentd-es-v1.22-351lz_kube-system_POD-aca728523bc307598917d78b2526e718e6c7fdbb38b70c05900d2439399efa10.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/helloworld-service-n5f0s_default_POD-ca013e9ab31b825cd4b85ab4700fad2fcaafd5f39c572778d10d438012ea4435.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/fluentd-es-v1.22-351lz_kube-system_POD-2eb78ece8c2b5c222313ab4cfb53ea6ec32f54e1b7616f729daf48b01d393b65.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/helloworld-service-4d72j_default_POD-1dcbbc2ef71f7f542018069a1043a122117a97378c19f03ddb95b8a71dab4637.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/helloworld-service-n5f0s_default_weblogichelloworld-d7229e5c23c6bf7582ed6559417ba24d99e33e44a68a6079159b4792fe05a673.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/helloworld-service-4d72j_default_weblogichelloworld-71d1d7252dd7504fd45351d714d21c3c615facc5e2650553c68c0bf359e8434a.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/kube-dns-v11-x0vr3_kube-system_kube2sky-c77121c354459f22712b0a99623eff1590f4fdb1a5d3ad2db09db000755f9c2c.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/kube-dns-v11-x0vr3_kube-system_skydns-f3c0fbf4ea5cd840c968a807a40569042c90de08f7722e7344282845d5782a20.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/fluentd-es-v1.22-351lz_kube-system_fluentd-es-93795904ff4870758441dd7288972d4967ffac18f2f25272b12e99ea6b692d44.log
2017-05-24 05:45:03 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-05-24 05:44:24 +0000 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not reach Elasticsearch cluster ({:host=>\"elasticsearch-logging\", :port=>9200, :scheme=>\"http\"})!" plugin_id="object:3f986d0a5150"
  2017-05-24 05:45:03 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-elasticsearch-1.5.0/lib/fluent/plugin/out_elasticsearch.rb:122:in `client'

 

Start all the components:

[root@k8s-master fluentd-elasticsearch]# kubectl get pods -n kube-system
NAME                              READY     STATUS    RESTARTS   AGE
elasticsearch-logging-v1-1xwnq    1/1       Running   0          29s
elasticsearch-logging-v1-gx6lc    1/1       Running   0          29s
fluentd-es-v1.22-351lz            1/1       Running   1          3h
kibana-logging-3659310023-gcwrn   1/1       Running   0          15s
kube-dns-v11-x0vr3                4/4       Running   28         1d

 

Access kibana:

[root@k8s-master fluentd-elasticsearch]# kubectl cluster-info
Kubernetes master is running at http://localhost:8080
Elasticsearch is running at http://localhost:8080/api/v1/proxy/namespaces/kube-system/services/elasticsearch-logging
Kibana is running at http://localhost:8080/api/v1/proxy/namespaces/kube-system/services/kibana-logging
KubeDNS is running at http://localhost:8080/api/v1/proxy/namespaces/kube-system/services/kube-dns

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

 

The green Create button finally appears.

After creating the index pattern, go in.

 

Expand the kubernetes pod name field to see the logs of every pod in the cluster.

If we only care about helloworld-service, click the + next to it, and every WebLogic log line shows up.

And with that, the setup is complete.

 

6. Summary of the pits

  • To get EFK working I ended up walking through the DNS and ServiceAccount configuration as well, which counts as a gain.
  • I now understand the internal mechanisms of a Kubernetes cluster more deeply, which should gradually help us solve problems faster.

 

Finally, sincere thanks to the masters online whom I have never met:

http://tonybai.com/2017/03/03/implement-kubernetes-cluster-level-logging-with-fluentd-and-elasticsearch-stack/

http://rootsongjc.github.io/blogs/kubernetes-fluentd-elasticsearch-installation/

http://www.tothenew.com/blog/how-to-install-kubernetes-on-centos/

http://blog.csdn.net/wenwst/article/details/53908144

 

Reposted from: https://www.cnblogs.com/ericnie/p/6897348.html
