Prometheus Alertmanager fails to send notifications due to "context deadline exceeded"

I have configured the prometheus-msteams chart with prometheus-operator for monitoring and alerting on a Kubernetes cluster.

However, the notifications are not all reaching the MS Teams channel. If I have 6 firing alerts, I can see all of them in the Alertmanager UI, but only one or two of them get delivered to the MS Teams channel.

I can see the following log in the Alertmanager pod:

C:\monitoring>kubectl logs alertmanager-monitor-prometheus-operato-alertmanager-0 -c alertmanager

level=info ts=2019-11-04T09:16:47.358Z caller=main.go:217 msg="Starting Alertmanager" version="(version=0.19.0,branch=HEAD,revision=7aa5d19fea3f58e3d27dbdeb0f2883037168914a)"
level=info ts=2019-11-04T09:16:47.358Z caller=main.go:218 build_context="(go=go1.12.8,user=root@587d0268f963,date=20190903-15:01:40)"
level=warn ts=2019-11-04T09:16:47.553Z caller=cluster.go:228 component=cluster msg="failed to join cluster" err="1 error occurred:\n\t* Failed to resolve alertmanager-monitor-prometheus-operato-alertmanager-0.alertmanager-operated.monitoring.svc:9094: lookup alertmanager-monitor-prometheus-operato-alertmanager-0.alertmanager-operated.monitoring.svc on 169.254.25.10:53: no such host\n\n"
level=info ts=2019-11-04T09:16:47.553Z caller=cluster.go:230 component=cluster msg="will retry joining cluster every 10s"
level=warn ts=2019-11-04T09:16:47.553Z caller=main.go:308 msg="unable to join gossip mesh" err="1 error occurred:\n\t* Failed to resolve alertmanager-monitor-prometheus-operato-alertmanager-0.alertmanager-operated.monitoring.svc:9094: lookup alertmanager-monitor-prometheus-operato-alertmanager-0.alertmanager-operated.monitoring.svc on 169.254.25.10:53: no such host\n\n"
level=info ts=2019-11-04T09:16:47.553Z caller=cluster.go:623 component=cluster msg="Waiting for gossip to settle..." interval=2s
level=info ts=2019-11-04T09:16:47.597Z caller=coordinator.go:119 component=configuration msg="Loading configuration file" file=/etc/alertmanager/config/alertmanager.yaml
level=info ts=2019-11-04T09:16:47.598Z caller=coordinator.go:131 component=configuration msg="Completed loading of configuration file" file=/etc/alertmanager/config/alertmanager.yaml
level=info ts=2019-11-04T09:16:47.601Z caller=main.go:466 msg=Listening address=:9093
level=info ts=2019-11-04T09:16:49.554Z caller=cluster.go:648 component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.000149822s
level=info ts=2019-11-04T09:16:57.555Z caller=cluster.go:640 component=cluster msg="gossip settled; proceeding" elapsed=10.001110685s

level=error ts=2019-11-04T09:38:02.472Z caller=notify.go:372 component=dispatcher msg="Error on notify" err="unexpected status code 500: http://prometheus-msteams:2000/alertmanager" context_err="context deadline exceeded"
level=error ts=2019-11-04T09:38:02.472Z caller=dispatch.go:266 component=dispatcher msg="Notify for alerts failed" num_alerts=4 err="unexpected status code 500: http://prometheus-msteams:2000/alertmanager"
level=error ts=2019-11-04T09:43:02.472Z caller=notify.go:372 component=dispatcher msg="Error on notify" err="unexpected status code 500: http://prometheus-msteams:2000/alertmanager" context_err="context deadline exceeded"
level=error ts=2019-11-04T09:43:02.472Z caller=dispatch.go:266 component=dispatcher msg="Notify for alerts failed" num_alerts=5 err="unexpected status code 500: http://prometheus-msteams:2000/alertmanager"
level=error ts=2019-11-04T09:48:02.473Z caller=notify.go:372 component=dispatcher msg="Error on notify" err="unexpected status code 500: http://prometheus-msteams:2000/alertmanager" context_err="context deadline exceeded"
level=error ts=2019-11-04T09:48:02.473Z caller=dispatch.go:266 component=dispatcher msg="Notify for alerts failed" num_alerts=5 err="unexpected status code 500: http://prometheus-msteams:2000/alertmanager"
level=error ts=2019-11-04T09:53:02.473Z caller=notify.go:372 component=dispatcher msg="Error on notify" err="unexpected status code 500: http://prometheus-msteams:2000/alertmanager" context_err="context deadline exceeded"
level=error ts=2019-11-04T09:53:02.473Z caller=dispatch.go:266 component=dispatcher msg="Notify for alerts failed" num_alerts=5 err="unexpected status code 500: http://prometheus-msteams:2000/alertmanager"

How can I fix this error?

Edit:

The setup uses prometheus-msteams as a webhook that forwards alert notifications from Alertmanager to an MS Teams channel.
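For reference, the relevant part of the Alertmanager configuration looks roughly like this. This is only a sketch: the receiver name and webhook URL match the logs in this post, but the routing and grouping values are illustrative, not the exact contents of the real alertmanager.yaml.

# alertmanager.yaml (sketch) -- receiver name and webhook URL taken from the logs in this post;
# the route/grouping values below are illustrative only.
route:
  receiver: prometheus-msteams
  group_by: ['namespace', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
receivers:
  - name: prometheus-msteams
    webhook_configs:
      - url: http://prometheus-msteams:2000/alertmanager
        send_resolved: true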

The prometheus-msteams container log also shows some errors:

C:\> kubectl logs prometheus-msteams-564bc7d99c-dpzsm

time="2019-11-06T06:45:14Z" level=info msg="Version: v1.1.4,Commit: d47a7ab,Branch: HEAD,Build Date: 2019-08-04T17:17:06+0000"
time="2019-11-06T06:45:14Z" level=info msg="Parsing the message card template file: /etc/template/card.tmpl"
time="2019-11-06T06:45:15Z" level=warning msg="If the 'config' flag is used,the 'webhook-url' and 'request-uri' flags will be ignored."
time="2019-11-06T06:45:15Z" level=info msg="Parsing the configuration file: /etc/config/connectors.yaml"
time="2019-11-06T06:45:15Z" level=info msg="Creating the server request path \"/alertmanager\" with webhook \"https://outlook.office.com/webhook/00ce0266-7013-4d53-a20f-115ece04042d@9afb1f8a-2192-45ba-b0a1-6b193c758e24/IncomingWebhook/43c3d745ff5e426282f1bc6b5e79bfea/8368b12d-8ac9-4832-b7b5-b337ac267220\""
time="2019-11-06T06:45:15Z" level=info msg="prometheus-msteams server started listening at 0.0.0.0:2000"

time="2019-11-06T07:01:07Z" level=info msg="/alertmanager received a request"
time="2019-11-06T07:01:07Z" level=debug msg="Prometheus Alert: {\"receiver\":\"prometheus-msteams\",\"status\":\"firing\",\"alerts\":[{\"status\":\"firing\",\"labels\":{\"alertname\":\"KubeDeploymentReplicasMismatch\",\"deployment\":\"storagesvc\",\"endpoint\":\"http\",\"instance\":\"10.233.108.72:8080\",\"job\":\"kube-state-metrics\",\"namespace\":\"fission\",\"pod\":\"monitor-kube-state-metrics-856bc9455b-7z5qx\",\"prometheus\":\"monitoring/monitor-prometheus-operato-prometheus\",\"service\":\"monitor-kube-state-metrics\",\"severity\":\"critical\"},\"annotations\":{\"message\":\"Deployment fission/storagesvc has not matched the expected number of replicas for longer than 15 minutes.\",\"runbook_url\":\"https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedeploymentreplicasmismatch\"},\"startsAt\":\"2019-11-06T07:00:32.453590324Z\",\"endsAt\":\"0001-01-01T00:00:00Z\",\"generatorURL\":\"http://monitor-prometheus-operato-prometheus.monitoring:9090/graph?g0.expr=kube_deployment_spec_replicas%7Bjob%3D%22kube-state-metrics%22%7D+%21%3D+kube_deployment_status_replicas_available%7Bjob%3D%22kube-state-metrics%22%7D\\u0026g0.tab=1\"},{\"status\":\"firing\",\"labels\":{\"alertname\":\"KubepodNotReady\",\"pod\":\"storagesvc-5bff46b69b-vfdrd\",\"annotations\":{\"message\":\"pod fission/storagesvc-5bff46b69b-vfdrd has been in a non-ready state for longer than 15 minutes.\",\"runbook_url\":\"https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodnotready\"},\"generatorURL\":\"http://monitor-prometheus-operato-prometheus.monitoring:9090/graph?g0.expr=sum+by%28namespace%2C+pod%29+%28kube_pod_status_phase%7Bjob%3D%22kube-state-metrics%22%2Cphase%3D~%22Failed%7CPending%7CUnknown%22%7D%29+%3E+0\\u0026g0.tab=1\"}],\"groupLabels\":{\"namespace\":\"fission\",\"commonLabels\":{\"namespace\":\"fission\",\"commonAnnotations\":{},\"externalURL\":\"http://monitor-prometheus-operato-alertmanager.monitoring:9093\",\"version\":\"4\",\"groupKey\":\"{}:{namespace=\\\"fission\\\",severity=\\\"critical\\\"}\"}"
time="2019-11-06T07:01:07Z" level=debug msg="Alert rendered in template file: \r\n{\r\n  \"@type\": \"MessageCard\",\r\n  \"@context\": \"http://schema.org/extensions\",\r\n  \"themeColor\": \"8C1A1A\",\r\n  \"summary\": \"\",\r\n  \"title\": \"Prometheus Alert (firing)\",\r\n  \"sections\": [ \r\n    {\r\n      \"activityTitle\": \"[](http://monitor-prometheus-operato-alertmanager.monitoring:9093)\",\r\n      \"facts\": [\r\n        {\r\n          \"name\": \"message\",\r\n          \"value\": \"Deployment fission/storagesvc has not matched the expected number of replicas for longer than 15 minutes.\"\r\n        },\r\n        {\r\n          \"name\": \"runbook\\\\_url\",\r\n          \"value\": \"https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedeploymentreplicasmismatch\"\r\n        },\r\n        {\r\n          \"name\": \"alertname\",\r\n          \"value\": \"KubeDeploymentReplicasMismatch\"\r\n        },\r\n        {\r\n          \"name\": \"deployment\",\r\n          \"value\": \"storagesvc\"\r\n        },\r\n        {\r\n          \"name\": \"endpoint\",\r\n          \"value\": \"http\"\r\n        },\r\n        {\r\n          \"name\": \"instance\",\r\n          \"value\": \"10.233.108.72:8080\"\r\n        },\r\n        {\r\n          \"name\": \"job\",\r\n          \"value\": \"kube-state-metrics\"\r\n        },\r\n        {\r\n          \"name\": \"namespace\",\r\n          \"value\": \"fission\"\r\n        },\r\n        {\r\n          \"name\": \"pod\",\r\n          \"value\": \"monitor-kube-state-metrics-856bc9455b-7z5qx\"\r\n        },\r\n        {\r\n          \"name\": \"prometheus\",\r\n          \"value\": \"monitoring/monitor-prometheus-operato-prometheus\"\r\n        },\r\n        {\r\n          \"name\": \"service\",\r\n          \"value\": \"monitor-kube-state-metrics\"\r\n        },\r\n        {\r\n          \"name\": \"severity\",\r\n          \"value\": \"critical\"\r\n        }\r\n      ],\r\n      \"markdown\": true\r\n    },\r\n    {\r\n      \"activityTitle\": \"[](http://monitor-prometheus-operato-alertmanager.monitoring:9093)\",\r\n          \"value\": \"pod fission/storagesvc-5bff46b69b-vfdrd has been in a non-ready state for longer than 15 minutes.\"\r\n        },\r\n          \"value\": \"https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodnotready\"\r\n        },\r\n          \"value\": \"KubepodNotReady\"\r\n        },\r\n          \"value\": \"storagesvc-5bff46b69b-vfdrd\"\r\n        },\r\n      \"markdown\": true\r\n    }\r\n  ]\r\n}\r\n"
time="2019-11-06T07:01:07Z" level=debug msg="Size of message is 1714 Bytes (~1 KB)"
time="2019-11-06T07:01:07Z" level=info msg="Created a card for microsoft Teams /alertmanager"
time="2019-11-06T07:01:07Z" level=debug msg="Teams message cards: [{\"@type\":\"MessageCard\",\"@context\":\"http://schema.org/extensions\",\"themeColor\":\"8C1A1A\",\"summary\":\"\",\"title\":\"Prometheus Alert (firing)\",\"sections\":[{\"activityTitle\":\"[](http://monitor-prometheus-operato-alertmanager.monitoring:9093)\",\"facts\":[{\"name\":\"message\",\"value\":\"Deployment fission/storagesvc has not matched the expected number of replicas for longer than 15 minutes.\"},{\"name\":\"runbook\\\\_url\",\"value\":\"https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedeploymentreplicasmismatch\"},{\"name\":\"alertname\",\"value\":\"KubeDeploymentReplicasMismatch\"},{\"name\":\"deployment\",\"value\":\"storagesvc\"},{\"name\":\"endpoint\",\"value\":\"http\"},{\"name\":\"instance\",\"value\":\"10.233.108.72:8080\"},{\"name\":\"job\",\"value\":\"kube-state-metrics\"},{\"name\":\"namespace\",\"value\":\"fission\"},{\"name\":\"pod\",\"value\":\"monitor-kube-state-metrics-856bc9455b-7z5qx\"},{\"name\":\"prometheus\",\"value\":\"monitoring/monitor-prometheus-operato-prometheus\"},{\"name\":\"service\",\"value\":\"monitor-kube-state-metrics\"},{\"name\":\"severity\",\"value\":\"critical\"}],\"markdown\":true},{\"activityTitle\":\"[](http://monitor-prometheus-operato-alertmanager.monitoring:9093)\",\"value\":\"pod fission/storagesvc-5bff46b69b-vfdrd has been in a non-ready state for longer than 15 minutes.\"},\"value\":\"https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodnotready\"},\"value\":\"KubepodNotReady\"},\"value\":\"storagesvc-5bff46b69b-vfdrd\"},\"markdown\":true}]}]"
time="2019-11-06T07:01:07Z" level=info msg="microsoft Teams response text: 1"
time="2019-11-06T07:01:07Z" level=info msg="A card was successfully sent to microsoft Teams Channel. Got http status: 200 OK"
time="2019-11-06T07:01:07Z" level=info msg="microsoft Teams response text: Summary or Text is required."
time="2019-11-06T07:01:07Z" level=error msg="Failed sending to the Teams Channel. Teams http response: 400 Bad Request"

time="2019-11-06T07:01:08Z" level=info msg="/alertmanager received a request"
time="2019-11-06T07:01:08Z" level=debug msg="Prometheus Alert: {\"receiver\":\"prometheus-msteams\",severity=\\\"critical\\\"}\"}"
time="2019-11-06T07:01:08Z" level=debug msg="Alert rendered in template file: \r\n{\r\n  \"@type\": \"MessageCard\",\r\n      \"markdown\": true\r\n    }\r\n  ]\r\n}\r\n"
time="2019-11-06T07:01:08Z" level=debug msg="Size of message is 1714 Bytes (~1 KB)"
time="2019-11-06T07:01:08Z" level=info msg="Created a card for microsoft Teams /alertmanager"
time="2019-11-06T07:01:08Z" level=debug msg="Teams message cards: [{\"@type\":\"MessageCard\",\"markdown\":true}]}]"
time="2019-11-06T07:01:08Z" level=info msg="microsoft Teams response text: Summary or Text is required."
time="2019-11-06T07:01:08Z" level=error msg="Failed sending to the Teams Channel. Teams http response: 400 Bad Request"

It may be that Alertmanager's "unexpected status code 500" is caused by the 400 Bad Request error coming back from prometheus-msteams.

Answer (by he85367243):

The errors were caused by an issue in the file https://github.com/bzon/prometheus-msteams/blob/master/chart/prometheus-msteams/card.tmpl.

The problem is that the summary field is rendered empty. As described in this tutorial, a small change to that file fixes the error, as sketched below.
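For illustration, here is the kind of change involved: give "summary" a non-empty fallback instead of leaving it blank when the alerts carry no summary annotation. Only the "summary" line differs from the upstream default; the field names (.Status, .CommonAnnotations.summary) come from the Alertmanager webhook payload, and the fallback text itself is just an example, not the tutorial's exact wording.

{{/* card.tmpl (fragment): only the "summary" line is changed; everything else stays as in the default template. */}}
{
  "@type": "MessageCard",
  "@context": "http://schema.org/extensions",
  {{/* Teams rejects cards whose summary and text are both empty ("Summary or Text is required."),
       so fall back to a fixed string when no summary annotation is present. */}}
  "summary": "{{- if eq .CommonAnnotations.summary "" -}}Prometheus Alert ({{ .Status }}){{- else -}}{{ .CommonAnnotations.summary }}{{- end -}}",
  "title": "Prometheus Alert ({{ .Status }})",
  {{/* themeColor, sections, facts, etc. continue unchanged from the default card.tmpl */}}

Any non-empty value works here; the point is simply that the rendered card must carry a summary (or text) so the Teams incoming webhook accepts it.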

You can use the modified card template by overriding the chart's default template.
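If you installed prometheus-msteams via its Helm chart, a values override along these lines could supply the modified template. The key name (customCardTemplate) is an assumption on my part, so verify it against the chart's values.yaml; the goal is simply to get your file mounted at /etc/template/card.tmpl, the path the container reads at startup (see its log above).

# values-msteams.yaml (sketch) -- the value key name is assumed; check the chart's values.yaml.
# Apply with e.g.: helm upgrade <release> <chart> -f values-msteams.yaml
customCardTemplate: |
  {{/* paste the modified card.tmpl contents here */}}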
