AWS CloudWatch metric alarm does not fire again after the first time
I have an alarm that looks for error messages in the logs, and it does go into the ALARM state. But it never resets and stays in the ALARM state. The alarm action is an SNS topic, which in turn sends an email. So after the first error I do not see any subsequent emails. What is wrong with the template configuration below?
"AppErrorMetric": {
  "Type": "AWS::Logs::MetricFilter",
  "Properties": {
    "LogGroupName": {
      "Ref": "AppServerLG"
    },
    "FilterPattern": "[error]",
    "MetricTransformations": [
      {
        "MetricValue": "1",
        "MetricNamespace": {
          "Fn::Join": [
            "",
            [
              {
                "Ref": "ApplicationEndpoint"
              },
              "/metrics/AppError"
            ]
          ]
        },
        "MetricName": "AppError"
      }
    ]
  }
},
"AppErrorAlarm": {
  "Type": "AWS::CloudWatch::Alarm",
  "Properties": {
    "ActionsEnabled": "true",
    "AlarmName": {
      "Fn::Join": [
        "",
        [
          {
            "Ref": "AppId"
          },
          ",",
          {
            "Ref": "AppServerAG"
          },
          ":",
          "AppError",
          ",",
          "MINOR"
        ]
      ]
    },
    "AlarmDescription": {
      "Fn::Join": [
        "",
        [
          "service is throwing error. Please check logs.",
          {
            "Ref": "AppServerAG"
          },
          "-",
          {
            "Ref": "AppId"
          }
        ]
      ]
    },
    "MetricName": "AppError",
    "Namespace": {
      "Fn::Join": [
        "",
        [
          {
            "Ref": "ApplicationEndpoint"
          },
          "metrics/AppError"
        ]
      ]
    },
    "Statistic": "Sum",
    "Period": "300",
    "EvaluationPeriods": "1",
    "Threshold": "1",
    "AlarmActions": [{
      "Fn::GetAtt": [
        "VPCInfo",
        "SNSTopic"
      ]
    }],
    "ComparisonOperator": "GreaterThanOrEqualToThreshold"
  }
}
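Your problem is a combination of two factors:
- Your metric is only emitted when an error is found. It is a sparse metric: a 1 is published when an error occurs, but no 0 is published when there are no errors.
- By default, a CloudWatch alarm is configured with TreatMissingData set to missing.
The documentation says:
For each alarm, you can specify how CloudWatch treats missing data points, choosing one of the following:
- notBreaching – Missing data points are treated as "good" and within the threshold
- breaching – Missing data points are treated as "bad" and breaching the threshold
- ignore – The current alarm state is maintained
- missing – The alarm does not consider missing data points when evaluating whether to change state
Alarm actions run only on state transitions, so an alarm that stays in ALARM sends no further emails. Adding the "TreatMissingData": "notBreaching" property to the alarm configuration makes CloudWatch treat missing data points as not breaching and move the alarm back to OK, so the next error triggers it again: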
"AppErrorAlarm": {
  "Type": "AWS::CloudWatch::Alarm",
  "Properties": {
    "ActionsEnabled": "true",
    "AlarmName": {
      "Fn::Join": [
        "",
        [
          {
            "Ref": "AppId"
          },
          ",",
          {
            "Ref": "AppServerAG"
          },
          ":",
          "AppError",
          ",",
          "MINOR"
        ]
      ]
    },
    "AlarmDescription": {
      "Fn::Join": [
        "",
        [
          "service is throwing error. Please check logs.",
          {
            "Ref": "AppServerAG"
          },
          "-",
          {
            "Ref": "AppId"
          }
        ]
      ]
    },
    "MetricName": "AppError",
    "Namespace": {
      "Fn::Join": [
        "",
        [
          {
            "Ref": "ApplicationEndpoint"
          },
          "metrics/AppError"
        ]
      ]
    },
    "Statistic": "Sum",
    "Period": "300",
    "EvaluationPeriods": "1",
    "Threshold": "1",
    "TreatMissingData": "notBreaching",
    "AlarmActions": [{
      "Fn::GetAtt": [
        "VPCInfo",
        "SNSTopic"
      ]
    }],
    "ComparisonOperator": "GreaterThanOrEqualToThreshold"
  }
}
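The sparse-metric half of the problem can also be approached on the metric filter itself. The fragment below is a sketch and not part of the original answer. It assumes the DefaultValue property of MetricTransformations in AWS::Logs::MetricFilter, which publishes the given value for periods in which log events are ingested but none of them match the pattern. The outer "Resources" wrapper is added here only so the fragment is valid JSON on its own; AppServerLG and ApplicationEndpoint are the same references used in the question's template.

```json
{
  "Resources": {
    "AppErrorMetric": {
      "Type": "AWS::Logs::MetricFilter",
      "Properties": {
        "LogGroupName": {
          "Ref": "AppServerLG"
        },
        "FilterPattern": "[error]",
        "MetricTransformations": [
          {
            "MetricValue": "1",
            "DefaultValue": 0,
            "MetricNamespace": {
              "Fn::Join": [
                "",
                [
                  {
                    "Ref": "ApplicationEndpoint"
                  },
                  "/metrics/AppError"
                ]
              ]
            },
            "MetricName": "AppError"
          }
        ]
      }
    }
  }
}
```

If no log events arrive at all during a period, no data point should be published even with DefaultValue set, so keeping "TreatMissingData": "notBreaching" on the alarm is likely still worthwhile.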