Series

  • Grafana series articles

AWS CloudWatch data source

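For AWS CloudWatch, the main decision is which authentication method to use. The options are:

  • AWS SDK Default
  • IAM Role
  • Access key and secret key (AK & SK)
  • Credentials file

IAM Role authentication is the currently recommended option, because no long-lived keys are stored in Grafana and so there is no key to leak.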



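Note in particular that to read CloudWatch metrics and EC2 tags, instances, regions, and alarms, you must grant Grafana permissions through IAM. You can attach these permissions to the IAM role or IAM user that you configured for AWS authentication.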

Example IAM policies:

Metrics-only:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowReadingMetricsFromCloudWatch",
      "Effect": "Allow",
      "Action": [
        "cloudwatch:DescribeAlarmsForMetric",
        "cloudwatch:DescribeAlarmHistory",
        "cloudwatch:DescribeAlarms",
        "cloudwatch:ListMetrics",
        "cloudwatch:GetMetricData",
        "cloudwatch:GetInsightRuleReport"
      ],
      "Resource": "*"
    },
    {
      "Sid": "AllowReadingTagsInstancesRegionsFromEC2",
      "Effect": "Allow",
      "Action": ["ec2:DescribeTags", "ec2:DescribeInstances", "ec2:DescribeRegions"],
      "Resource": "*"
    },
    {
      "Sid": "AllowReadingResourcesForTags",
      "Effect": "Allow",
      "Action": "tag:GetResources",
      "Resource": "*"
    }
  ]
}

Logs-only:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowReadingLogsFromCloudWatch",
      "Effect": "Allow",
      "Action": [
        "logs:DescribeLogGroups",
        "logs:GetLogGroupFields",
        "logs:StartQuery",
        "logs:StopQuery",
        "logs:GetQueryResults",
        "logs:GetLogEvents"
      ],
      "Resource": "*"
    },
    {
      "Sid": "AllowReadingTagsInstancesRegionsFromEC2",
      "Effect": "Allow",
      "Action": ["ec2:DescribeTags", "ec2:DescribeInstances", "ec2:DescribeRegions"],
      "Resource": "*"
    },
    {
      "Sid": "AllowReadingResourcesForTags",
      "Effect": "Allow",
      "Action": "tag:GetResources",
      "Resource": "*"
    }
  ]
}

Metrics and Logs:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowReadingMetricsFromCloudWatch",
      "Effect": "Allow",
      "Action": [
        "cloudwatch:DescribeAlarmsForMetric",
        "cloudwatch:DescribeAlarmHistory",
        "cloudwatch:DescribeAlarms",
        "cloudwatch:ListMetrics",
        "cloudwatch:GetMetricData",
        "cloudwatch:GetInsightRuleReport"
      ],
      "Resource": "*"
    },
    {
      "Sid": "AllowReadingLogsFromCloudWatch",
      "Effect": "Allow",
      "Action": [
        "logs:DescribeLogGroups",
        "logs:GetLogGroupFields",
        "logs:StartQuery",
        "logs:StopQuery",
        "logs:GetQueryResults",
        "logs:GetLogEvents"
      ],
      "Resource": "*"
    },
    {
      "Sid": "AllowReadingTagsInstancesRegionsFromEC2",
      "Effect": "Allow",
      "Action": ["ec2:DescribeTags", "ec2:DescribeInstances", "ec2:DescribeRegions"],
      "Resource": "*"
    },
    {
      "Sid": "AllowReadingResourcesForTags",
      "Effect": "Allow",
      "Action": "tag:GetResources",
      "Resource": "*"
    }
  ]
}
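The three policies above differ only in whether the metrics statement, the logs statement, or both are present; the EC2 and tag statements are shared. As a rough illustration, the Python sketch below assembles the same documents from reusable blocks. The helper name build_policy and its parameters are hypothetical, not part of Grafana or any AWS SDK; the Sid and Action values are copied from the policies above.

```python
import json

# Statement blocks copied from the example policies in this article.
METRICS = {
    "Sid": "AllowReadingMetricsFromCloudWatch",
    "Effect": "Allow",
    "Action": [
        "cloudwatch:DescribeAlarmsForMetric",
        "cloudwatch:DescribeAlarmHistory",
        "cloudwatch:DescribeAlarms",
        "cloudwatch:ListMetrics",
        "cloudwatch:GetMetricData",
        "cloudwatch:GetInsightRuleReport",
    ],
    "Resource": "*",
}
LOGS = {
    "Sid": "AllowReadingLogsFromCloudWatch",
    "Effect": "Allow",
    "Action": [
        "logs:DescribeLogGroups",
        "logs:GetLogGroupFields",
        "logs:StartQuery",
        "logs:StopQuery",
        "logs:GetQueryResults",
        "logs:GetLogEvents",
    ],
    "Resource": "*",
}
EC2 = {
    "Sid": "AllowReadingTagsInstancesRegionsFromEC2",
    "Effect": "Allow",
    "Action": ["ec2:DescribeTags", "ec2:DescribeInstances", "ec2:DescribeRegions"],
    "Resource": "*",
}
TAGS = {
    "Sid": "AllowReadingResourcesForTags",
    "Effect": "Allow",
    "Action": "tag:GetResources",
    "Resource": "*",
}


def build_policy(metrics=True, logs=True):
    """Hypothetical helper: assemble a policy document from the shared blocks."""
    if not (metrics or logs):
        raise ValueError("select at least one of metrics or logs")
    statements = []
    if metrics:
        statements.append(METRICS)
    if logs:
        statements.append(LOGS)
    # The EC2 and tag statements appear in all three example policies.
    statements += [EC2, TAGS]
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    # Print the metrics-only variant as JSON, ready to paste into IAM.
    print(json.dumps(build_policy(metrics=True, logs=False), indent=2))
```

Calling build_policy() with the defaults yields the "Metrics and Logs" variant; passing metrics=False yields the logs-only one.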

Cross-account observability:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": ["oam:ListSinks", "oam:ListAttachedLinks"],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}

AWS CloudWatch data source configuration examples

Example AWS CloudWatch provisioning configurations for the different authentication methods:

AWS SDK (default):

apiVersion: 1
datasources:
  - name: CloudWatch
    type: cloudwatch
    jsonData:
      authType: default
      defaultRegion: eu-west-2

Using a credentials file:

apiVersion: 1
datasources:
  - name: CloudWatch
    type: cloudwatch
    jsonData:
      authType: credentials
      defaultRegion: eu-west-2
      customMetricsNamespaces: "CWAgent,CustomNameSpace"
      profile: secondary

Using an access key and secret key (AK & SK):

apiVersion: 1
datasources:
  - name: CloudWatch
    type: cloudwatch
    jsonData:
      authType: keys
      defaultRegion: eu-west-2
    secureJsonData:
      accessKey: ""
      secretKey: ""

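Using AWS SDK Default together with the ARN of an IAM role to assume:

apiVersion: 1
datasources:
  - name: CloudWatch
    type: cloudwatch
    jsonData:
      authType: default
      # Placeholder value; an IAM role ARN normally has the form
      # arn:aws:iam::<account-id>:role/<role-name>
      assumeRoleArn: arn:aws:iam::123456789012:root
      defaultRegion: eu-west-2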
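Since the provisioning entries above differ only in a few fields, a small pre-flight check can catch slips such as a keys entry with empty credentials before Grafana loads the file. The Python sketch below is only an illustration: check_datasource is a hypothetical helper, the entry is assumed to be already parsed into a dict, and the rules are this sketch's own conventions derived from the examples in this article, not validation that Grafana itself performs. In particular, the authType set lists only the three values used above.

```python
# Only the authType values that appear in this article's examples (an assumption;
# Grafana may accept others).
ALLOWED_AUTH_TYPES = {"default", "credentials", "keys"}


def check_datasource(ds):
    """Hypothetical helper: return a list of problems in one provisioning entry."""
    problems = []
    if ds.get("type") != "cloudwatch":
        problems.append("type must be 'cloudwatch'")

    json_data = ds.get("jsonData") or {}
    secure = ds.get("secureJsonData") or {}

    auth = json_data.get("authType")
    if auth not in ALLOWED_AUTH_TYPES:
        problems.append(f"unexpected authType: {auth!r}")

    # Every example in the article sets a default region, so the sketch expects one.
    if not json_data.get("defaultRegion"):
        problems.append("defaultRegion is missing")

    # With authType 'keys', both secrets must be filled in.
    if auth == "keys":
        for field in ("accessKey", "secretKey"):
            if not secure.get(field):
                problems.append(f"secureJsonData.{field} is empty")
    return problems


if __name__ == "__main__":
    entry = {
        "name": "CloudWatch",
        "type": "cloudwatch",
        "jsonData": {"authType": "keys", "defaultRegion": "eu-west-2"},
        "secureJsonData": {"accessKey": "", "secretKey": ""},
    }
    for problem in check_datasource(entry):
        print(problem)
```

Run against the AK & SK example as written, with its empty key fields, the check reports both secrets as missing, which is a reminder to fill them in before provisioning.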

Bundled CloudWatch dashboards

The dashboards bundled with the CloudWatch data source are not very convenient to use; consider using monitoringartist/grafana-aws-cloudwatch-dashboards instead.

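Queries for alerting

Alerting requires queries that return numeric data, and CloudWatch Logs supports such queries. For example, you can make a logs query usable for alerting by aggregating with the stats command.

The following is a valid query for alerting on messages that contain the text "Exception":

filter @message like /Exception/
    | stats count(*) as exceptionCount by bin(1h)
    | sort exceptionCount desc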
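To make the shape of that result concrete, the Python sketch below reproduces the same three steps (filter, count per one-hour bin, sort by count descending) on a handful of in-memory log events. It is an offline illustration only: the function name and the sample events are made up, and it does not call CloudWatch.

```python
from collections import Counter
from datetime import datetime, timezone


def exception_counts_by_hour(events):
    """Count events whose message contains "Exception", grouped into 1-hour bins.

    events: iterable of (timestamp, message) pairs, with datetime timestamps.
    Returns (bin_start, count) pairs sorted by count, highest first.
    """
    counts = Counter()
    for timestamp, message in events:
        if "Exception" in message:  # filter @message like /Exception/
            # bin(1h): truncate the timestamp to the start of its hour
            bucket = timestamp.replace(minute=0, second=0, microsecond=0)
            counts[bucket] += 1  # stats count(*) as exceptionCount
    # sort exceptionCount desc
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    utc = timezone.utc
    sample = [  # made-up log events for illustration
        (datetime(2024, 1, 1, 10, 5, tzinfo=utc), "NullPointerException in handler"),
        (datetime(2024, 1, 1, 10, 40, tzinfo=utc), "request ok"),
        (datetime(2024, 1, 1, 11, 15, tzinfo=utc), "TimeoutException calling upstream"),
        (datetime(2024, 1, 1, 11, 20, tzinfo=utc), "IllegalStateException"),
    ]
    for bin_start, count in exception_counts_by_hour(sample):
        print(bin_start.isoformat(), count)
```

Each row is a timestamp paired with a number, which is the kind of numeric series an alert rule can compare against a threshold.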

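Cross-account observability

The CloudWatch plugin lets you monitor and troubleshoot applications that span multiple accounts within a region. With cross-account observability, you can search, visualize, and analyze metrics and logs without worrying about account boundaries.

To use this feature, configure a monitoring account and one or more source accounts under the CloudWatch settings in the AWS console, then add the necessary IAM permissions as described above.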
