Introduction to the Prometheus Architecture

image-20230918144122475

1. Role Assignment

Hostname  IP          Applications
docker01  10.0.0.101  Prometheus, Grafana, redis_exporter, node-exporter, cadvisor
docker02  10.0.0.102  node-exporter, cadvisor

  • Prometheus collects the metrics data
  • Grafana displays the data as charts
  • redis_exporter collects Redis metrics
  • node-exporter collects operating system and hardware metrics
  • cadvisor collects Docker container metrics

2. Installing Docker

I usually install the latest version of Docker with the following command:

wget -O /etc/yum.repos.d/docker-ce.repo https://download.docker.com/linux/centos/docker-ce.repo && yum install -y docker-ce && systemctl enable docker.service --now

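2.1 Installing Docker-Compose

The following commands install docker-compose. Note that the URL pins version 1.24.1, so change the version in the URL if a different release is needed: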

curl -L https://github.com/docker/compose/releases/download/1.24.1/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose

If the download above is too slow, the binary can be downloaded in a browser from the following link:

https://github.com/docker/compose/releases/download/1.24.1/docker-compose-Linux-x86_64

After the download finishes, upload the file to the server and run the following commands:

mv docker-compose-Linux-x86_64 /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
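
Since the version number is embedded in the download URL, it can help to build that URL from a parameter. The following is a minimal sketch, not part of the original steps: the function name compose_url is hypothetical, and it assumes the v1-style asset naming used by the 1.24.1 link above.

```shell
# Build the download URL for a given docker-compose release.
# Assumption: the asset is named docker-compose-<OS>-<ARCH>, as in the 1.24.1 link above.
compose_url() {
    version="$1"
    os="${2:-$(uname -s)}"      # defaults to this machine's OS name, e.g. Linux
    arch="${3:-$(uname -m)}"    # defaults to this machine's architecture, e.g. x86_64
    printf 'https://github.com/docker/compose/releases/download/%s/docker-compose-%s-%s\n' \
        "$version" "$os" "$arch"
}

# Example usage (not executed here):
#   curl -L "$(compose_url 1.24.1)" -o /usr/local/bin/docker-compose
#   chmod +x /usr/local/bin/docker-compose
#   docker-compose --version    # confirms the binary is installed and executable
```

Running docker-compose --version afterwards is a quick way to confirm which release actually ended up on the server.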

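3. Deploying Prometheus and Grafana

3.1 Adding the Prometheus Configuration File

Configure the Prometheus scrape targets and alerting rules. Create the files on docker01 (that is, on whichever server runs Prometheus and Grafana).

First create the directory with mkdir -p /data/prometheus/, then create the file with vim /data/prometheus/prometheus.yml and fill in the following: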

global:
  scrape_interval:     15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
  - static_configs:
    - targets: ['10.0.0.101:9093']

rule_files:
  - "node_down.yml"

scrape_configs:

  - job_name: 'prometheus'
    static_configs:
    - targets: ['10.0.0.101:9090']

  - job_name: 'redis'
    static_configs:
    - targets: ['10.0.0.101:9121']
      labels:
        instance: redis

  - job_name: 'node'
    scrape_interval: 8s
    static_configs:
    - targets: ['10.0.0.101:9100', '10.0.0.102:9100']

  - job_name: 'cadvisor'
    scrape_interval: 8s
    static_configs:
    - targets: ['10.0.0.101:8088', '10.0.0.102:8088']

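Next, on 10.0.0.101, create the alerting rule group with vim /data/prometheus/node_down.yml and add the following. Only the Prometheus host needs this file, because it is mounted into the Prometheus container: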

groups:
- name: node_down
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 1m
    labels:
      user: test
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
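
Before starting the containers, the two files above can be checked for syntax errors. The following is a sketch under some assumptions: it relies on the promtool binary that ships inside the prom/prometheus image at /bin/promtool, the function name check_prometheus_config is hypothetical, and the DOCKER variable exists only so the command can be previewed without running Docker.

```shell
# Validate prometheus.yml (and the rule files it references) with promtool.
# Assumption: both files live in the same host directory, /data/prometheus by default.
check_prometheus_config() {
    config_dir="${1:-/data/prometheus}"
    # Mount the directory read-only at /etc/prometheus so the relative
    # rule_files entry "node_down.yml" resolves next to prometheus.yml.
    "${DOCKER:-docker}" run --rm \
        -v "${config_dir}:/etc/prometheus:ro" \
        --entrypoint /bin/promtool \
        prom/prometheus check config /etc/prometheus/prometheus.yml
}

# Example usage (not executed here):
#   check_prometheus_config                  # run the check with Docker
#   DOCKER=echo check_prometheus_config      # only print the arguments
```

promtool check config also loads the rule files named under rule_files, so a mistake in node_down.yml should be reported by the same command.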

4. Creating the docker-compose Files

On docker01, in the /data/prometheus/ directory, create the file with vim /data/prometheus/docker-compose-prometheus.yml and add the following:

version: '2'

networks:
    monitor:
        driver: bridge

services:
    prometheus:
        image: prom/prometheus
        container_name: prometheus
        hostname: prometheus
        restart: always
        volumes:
            - /data/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
            - /data/prometheus/node_down.yml:/etc/prometheus/node_down.yml
        ports:
            - "9090:9090"
        networks:
            - monitor

    grafana:
        image: grafana/grafana
        container_name: grafana
        hostname: grafana
        restart: always
        ports:
            - "3000:3000"
        networks:
            - monitor
    redis-exporter:
        image: oliver006/redis_exporter
        container_name: redis_exporter
        hostname: redis_exporter
        restart: always
        ports:
            - "9121:9121"
        networks:
            - monitor
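        command:
            # Assumption: Redis itself listens on 10.0.0.101:6379 (the Redis default port).
            # 9121 is the exporter's own metrics port, not the Redis port.
            - '--redis.addr=redis://10.0.0.101:6379'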
    node-exporter:
        image: quay.io/prometheus/node-exporter
        container_name: node-exporter
        hostname: node-exporter
        restart: always
        ports:
            - "9100:9100"
        networks:
            - monitor

    cadvisor:
        image: google/cadvisor:latest
        container_name: cadvisor
        hostname: cadvisor
        restart: always
        volumes:
            - /:/rootfs:ro
            - /var/run:/var/run:rw
            - /sys:/sys:ro
            - /var/lib/docker/:/var/lib/docker:ro
        ports:
            - "8088:8080"
        networks:
            - monitor

On docker02, in the /data/prometheus/ directory, create the file with vim /data/prometheus/docker-compose.yml and add the following:

version: '2'

networks:
    monitor:
        driver: bridge

services:
    node-exporter:
        image: quay.io/prometheus/node-exporter
        container_name: node-exporter
        hostname: node-exporter
        restart: always
        ports:
            - "9100:9100"
        networks:
            - monitor

    cadvisor:
        image: google/cadvisor:latest
        container_name: cadvisor
        hostname: cadvisor
        restart: always
        volumes:
            - /:/rootfs:ro
            - /var/run:/var/run:rw
            - /sys:/sys:ro
            - /var/lib/docker/:/var/lib/docker:ro
        ports:
            - "8088:8080"
        networks:
            - monitor

On docker01, add the corresponding ports to the firewall policy:

# Run on docker01
systemctl start firewalld
firewall-cmd --zone=public --add-port=9100/tcp --permanent
firewall-cmd --zone=public --add-port=8088/tcp --permanent
firewall-cmd --zone=public --add-port=9121/tcp --permanent
firewall-cmd --zone=public --add-port=3000/tcp --permanent
firewall-cmd --zone=public --add-port=9090/tcp --permanent
firewall-cmd --reload
# Check whether the port rules have taken effect
firewall-cmd --permanent --zone=public --list-ports
systemctl stop firewalld
# Restart the docker service
systemctl restart docker
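
The repeated firewall-cmd lines can be collapsed into a loop, which also makes it easy to open a different port list on another host. This is a minimal sketch: the function name open_ports is hypothetical, the FIREWALL_CMD variable exists only to allow a dry run, and the docker02 port list is an assumption based on the two exporters it runs.

```shell
# Add each given TCP port to the public zone as a permanent rule.
open_ports() {
    for port in "$@"; do
        "${FIREWALL_CMD:-firewall-cmd}" --zone=public --add-port="${port}/tcp" --permanent
    done
}

# Example usage (not executed here):
#   open_ports 9100 8088 9121 3000 9090 && firewall-cmd --reload   # docker01
#   open_ports 9100 8088 && firewall-cmd --reload                  # docker02 (assumed list)
#   FIREWALL_CMD=echo open_ports 9100 8088                         # print the arguments only
```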

4.1 Starting docker-compose

Start the containers defined in the docker-compose files with the following commands:

# docker01
docker-compose -f /data/prometheus/docker-compose-prometheus.yml up -d

# docker02 (passing -f means the command works from any directory)
docker-compose -f /data/prometheus/docker-compose.yml up -d

Output like the following (shown here for docker01) means the startup succeeded:

Creating network "prometheus_monitor" with driver "bridge"
Creating cadvisor       ... done
Creating prometheus     ... done
Creating node-exporter  ... done
Creating redis_exporter ... done
Creating grafana        ... done
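
To confirm that all five containers on docker01 are actually running, the container names can be compared against the expected list. This is a sketch only: the function name missing_containers is hypothetical, and the names are the container_name values from the compose file above.

```shell
# Read running container names on stdin (one per line) and print
# every expected docker01 container that is NOT in that list.
missing_containers() {
    running=$(cat)
    for name in prometheus grafana redis_exporter node-exporter cadvisor; do
        # -x matches the whole line, -F treats the name as a fixed string
        printf '%s\n' "$running" | grep -qxF "$name" || printf '%s\n' "$name"
    done
}

# Example usage (not executed here):
#   docker ps --format '{{.Names}}' | missing_containers
```

No output means every expected container was found in the docker ps listing.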

You can also run docker ps to check whether the containers started. To stop and remove the five containers above, run the following command:

docker-compose -f /data/prometheus/docker-compose-prometheus.yml down

This likewise prints a log like the following:

Stopping cadvisor       ... done
Stopping node-exporter  ... done
Stopping grafana        ... done
Stopping redis_exporter ... done
Stopping prometheus     ... done
Removing cadvisor       ... done
Removing node-exporter  ... done
Removing grafana        ... done
Removing redis_exporter ... done
Removing prometheus     ... done
Removing network prometheus_monitor
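
While the containers are running, open the following address in a browser to view the scrape targets:

http://10.0.0.101:9090/targets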
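
The same target status is available from the Prometheus HTTP API at /api/v1/targets, which is handy on a server without a browser. The sketch below is an approximation: the function name count_healthy is hypothetical, and it assumes the API returns compact JSON in which each healthy target appears as "health":"up".

```shell
# Count healthy targets in a /api/v1/targets response read from stdin.
# Assumption: the JSON is compact, so the field appears exactly as "health":"up".
count_healthy() {
    grep -o '"health":"up"' | wc -l | tr -d ' '
}

# Example usage (not executed here):
#   curl -s http://10.0.0.101:9090/api/v1/targets | count_healthy
```

With the configuration above there are six targets in total (prometheus, redis, two node targets, two cadvisor targets), so a lower count suggests checking the Targets page for the one that is down.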

image-20230916111113598

4.2 Configuring Grafana

Open http://10.0.0.101:3000, log in with the default credentials admin/admin, and change the password when prompted.

image-20230916111240142

Here the password is changed to 123456.

image-20230916111605897

The first page shown after login is the one for adding a data source. As shown in the figures below, select Prometheus.

image-20230916111752384

image-20230916111829154

After selecting it, a new page opens. In the URL field of the HTTP section, enter the Prometheus address http://10.0.0.101:9090, then click Save & test.
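
The same data source can also be created without the UI through Grafana's HTTP API (POST /api/datasources). The following is a sketch under assumptions: the function name datasource_payload is hypothetical, the data source name "Prometheus" is a free choice, and the credentials are whatever was set at first login.

```shell
# Print the JSON body for creating a Prometheus data source.
# $1 = data source name, $2 = Prometheus URL
datasource_payload() {
    printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' \
        "$1" "$2"
}

# Example usage (not executed here; replace <password> with the admin password):
#   curl -s -u 'admin:<password>' -H 'Content-Type: application/json' \
#        -X POST http://10.0.0.101:3000/api/datasources \
#        -d "$(datasource_payload Prometheus http://10.0.0.101:9090)"
```

The "proxy" access mode means the Grafana server, not the browser, connects to Prometheus, which matches the Server access option in the UI.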

image-20230916134333438

image-20230916134439140

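The data source is now connected, so all that remains is a good-looking dashboard. Pick a template from https://grafana.com/grafana/dashboards/ and download its JSON file.

Click Import dashboard from the Grafana search bar.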

image-20230916135259178

Dashboards 11277 and 12633 are recommended. This guide uses 12633 as the host dashboard and 11277 as the container dashboard.

image-20230916140126977

image-20230916135440516

Host information:

image-20230916140010857

Container information:

image-20230918151034185

When viewing the container monitoring dashboard, the following problem appears:

image-20230918151233467

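That is because the dashboard is a template built by someone else. Version differences mean some of its queries and labels do not match this environment, so they are not recognized.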

image-20230918151441187

image-20230918151505961

# Variable name    Variable value
# All values of the project label on container_last_seen
project   label_values(container_last_seen, project)
# All values of the server label for the selected project
server    label_values(container_last_seen{project=~"$project"}, server)
# All values of the instance label for the selected project and server
instance  label_values(container_last_seen{project=~"$project", server=~"$server"}, instance)

# container_last_seen: the key (metric name)

Copy the key used by the project variable (container_last_seen) into Prometheus and check whether it exists:

image-20230918152320725

image-20230918152710354

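As the figures above show, the series no longer has a server label, so that line can be deleted and the remaining variables switched to the job label.

# Change to the following
job       label_values(container_last_seen, job)
instance  label_values(container_last_seen{job=~"$job"}, instance)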
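
The label check described above can also be done from the command line with the Prometheus HTTP API, which lists the label names present on a metric. This is a minimal sketch: the function name has_label is hypothetical, and it assumes the /api/v1/labels endpoint of this Prometheus version accepts the match[] parameter.

```shell
# Succeed if the given label name appears in a /api/v1/labels response on stdin.
has_label() {
    # Match the name together with its JSON quotes so "name" does not match "__name__"
    grep -qF "\"$1\""
}

# Example usage (not executed here):
#   curl -s -G --data-urlencode 'match[]=container_last_seen' \
#        http://10.0.0.101:9090/api/v1/labels | has_label server \
#        || echo "container_last_seen has no server label"
```

If a label used by a dashboard variable is missing here, label_values() for that variable returns nothing, which is what produced the empty drop-downs.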

image-20230918161451111

image-20230918161626045

image-20230918161800894

image-20230918161941411

Click Save in the upper-right corner to save and exit.

image-20230918162127651


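Now nothing is displayed at all.

That is because the project variable was renamed but only two statements were updated. Every other place in the dashboard still references project, so the variable resolves to an empty value and the page renders incorrectly.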

image-20230918163428962

Looking at the dashboard again, the host selector has disappeared entirely.

image-20230918163502099

That is because the variable's Show on dashboard option is set to Nothing.

image-20230918164007883

Save and exit, then refresh the page.

image-20230918164108770