Setup Metrics Analyzer¶
Overview¶
Distributed cloud applications typically generate a large number of KPIs (multivariate time series data). Detecting anomalies in these KPIs is critical to meeting an application's service level objectives (SLOs).
Metrics Analyzer is an AI-powered managed service that surfaces anomalous KPIs from applications and provides actionable operational insights. It analyzes the data in near real time using machine learning (ML) and deep learning (DL) models and can detect errors, outliers, or other anomalous activity in user applications within minutes of their occurrence. It surfaces the top metrics that reflect the anomaly, which helps reduce MTTD (mean time to detect).
This topic describes how to set up AI-powered metrics analyzers for real-time anomaly detection.
Prerequisites¶
- The user must set up a metrics service for the application in question.
- The metrics service must be added to a service group.
To ensure that there is enough data for training the AI/ML model(s), we recommend creating the metrics analyzer a few (5-7) days after the corresponding metrics service is created. Please notify us if this data includes production incidents (failures) so that we can adjust the threshold appropriately.
Why?¶
- Modern cloud applications change on a regular (e.g. daily or weekly) basis. It is very hard to keep track of them using a static approach.
- Distributed cloud applications can produce a large volume of metrics per day. It is very hard to analyze so much data manually.
- It is very hard to model seasonality with static alerts. A system with online adaptive learning algorithms is required.
- Downtime is expensive. The ability to detect incidents in a timely manner saves enterprises money and reputation.
How it Works?¶
CloudAEye offers unsupervised models for detecting anomalies in metrics data. The models learn patterns from normal execution and detect an anomaly when a KPI pattern deviates from them.
Our models account for the temporal dependence and stochasticity of multivariate time series data to provide accurate results (as measured by F1 score). They work well regardless of how predictable the time series is, capture complicated data patterns, and identify anomalies accurately.
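To make the idea of "learn normal behavior, then flag deviations" concrete, here is a minimal Python sketch. It is not CloudAEye's model: it swaps in a simple univariate mean and standard deviation baseline for illustration only, and the function names, sample values, and the threshold of 3 standard deviations are all assumptions.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def fit_baseline(normal_values: Sequence[float]) -> Tuple[float, float]:
    """Learn a baseline (mean, standard deviation) from anomaly-free data."""
    return mean(normal_values), stdev(normal_values)


def find_anomalies(
    values: Sequence[float],
    baseline: Tuple[float, float],
    threshold: float = 3.0,  # assumption: flag points beyond 3 standard deviations
) -> List[int]:
    """Return the indexes of values that deviate strongly from the baseline."""
    mu, sigma = baseline
    return [i for i, v in enumerate(values) if abs(v - mu) > threshold * sigma]


# Hypothetical latency samples (milliseconds) recorded during normal execution.
normal_latency_ms = [100, 102, 98, 101, 99, 100, 103, 97]
baseline = fit_baseline(normal_latency_ms)

# The third sample is far outside the learned range, so its index is reported.
print(find_anomalies([101, 99, 250, 100], baseline))
```

A production model has to handle many correlated KPIs, seasonality, and noise at once, which is why a static rule like this one is only a teaching aid.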
Anomaly Scores¶
Our models rank detected anomalies by anomaly score. An anomaly score usually represents the model's confidence that the detected incident is an anomaly.
CloudAEye uses the following criteria to rank anomalies:
Percentile Range | Anomaly Score | Confidence Level |
---|---|---|
96-97 | 0-25 | low |
97-98 | 25-50 | medium |
98-99 | 50-75 | high |
99-100 | 75-100 | very high |
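The table can be read as a lookup from score to confidence level. The Python sketch below encodes it under two assumptions that the table does not state: a boundary score (25, 50, 75) belongs to the higher band, and scores scale linearly with the percentile inside each band. The function names are hypothetical.

```python
def percentile_to_score(percentile: float) -> float:
    """Map a percentile in [96, 100] onto the 0-100 anomaly score scale.

    Assumption: the mapping is linear, so each percentile point is worth
    25 score points, matching the band edges in the table.
    """
    if not 96.0 <= percentile <= 100.0:
        raise ValueError("only percentiles from 96 to 100 are ranked")
    return (percentile - 96.0) * 25.0


def confidence_level(score: float) -> str:
    """Return the confidence level for an anomaly score between 0 and 100."""
    if not 0.0 <= score <= 100.0:
        raise ValueError("anomaly score must be between 0 and 100")
    # Assumption: shared boundary values fall into the higher band.
    if score < 25:
        return "low"
    if score < 50:
        return "medium"
    if score < 75:
        return "high"
    return "very high"


score = percentile_to_score(98.5)
print(score, confidence_level(score))
```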
Setup¶
Create a New Metrics Analyzer¶
From the left navigation menu, select Services > Metrics Analyzer.
A list of the metrics analyzer services that have already been created is shown.
The table is empty if no metrics analyzer services have been created in the system.
To create a new Metrics Analyzer service, click CREATE in the top right corner. A new form appears under Metrics Analyzer > Create.
Provide the following information in the form:
- Name: Name of the metrics analyzer service. This is usually an alphanumeric string. For example, orders-app-metrics-analyzer.
- Data source (metrics): Pick the Metrics Service this service will analyze. A data pipeline will be created from the metrics service. All metrics data will then be analyzed by AI models.
Click SUBMIT to create the Metrics Analyzer service.
Alternatively, the user may use the command shown below to create a metrics analyzer service.
caeops metrics-analyzers create --data-sources=[{metrics=demo-metrics-service}] --name=demo-metrics-analyzer
--data-sources:
- metrics: points to a Prometheus-based metrics service
This initiates training of the AI/ML models and deploys them for real-time metrics analysis. A data pipeline is created between Prometheus and the AI/ML models.
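When the creation step is scripted, the `--data-sources` value needs care because it contains brackets and braces that some shells may try to interpret. The Python sketch below only builds the argument list for the command shown above and prints a shell-quoted form of it; it does not run anything. The helper name `build_create_command` is hypothetical, and the flags are taken verbatim from this page.

```python
import shlex
from typing import List


def build_create_command(name: str, metrics_service: str) -> List[str]:
    """Build the argument list for `caeops metrics-analyzers create`."""
    return [
        "caeops",
        "metrics-analyzers",
        "create",
        # Doubled braces produce literal { and } in an f-string.
        f"--data-sources=[{{metrics={metrics_service}}}]",
        f"--name={name}",
    ]


cmd = build_create_command("demo-metrics-analyzer", "demo-metrics-service")

# shlex.join quotes any argument a POSIX shell could otherwise misinterpret.
print(shlex.join(cmd))
```

Passing the list directly to `subprocess.run(cmd)` avoids the shell altogether, so no quoting is needed in that case.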
Output from the CLI command may look like the following:
{
  "serviceName": "demo-metrics-analyzer",
  "serviceType": "metrics-analyzer",
  "groupName": "demo-grp",
  "dataSources": {
    "metrics": "demo-metrics-service"
  },
  "createdAt": 1629949067277,
  "updatedAt": 1629949067277
}
Note that the metrics analyzer service created above is automatically added to the service group of the metrics service demo-metrics-service.
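The JSON output is straightforward to consume from a script. The Python sketch below parses the sample output shown above and converts `createdAt` into a readable timestamp. It assumes that `createdAt` and `updatedAt` are Unix epoch values in milliseconds, which this page does not state explicitly; the helper name `epoch_ms_to_utc` is hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone

# Sample output copied from the create command above.
create_output = """
{
  "serviceName": "demo-metrics-analyzer",
  "serviceType": "metrics-analyzer",
  "groupName": "demo-grp",
  "dataSources": {"metrics": "demo-metrics-service"},
  "createdAt": 1629949067277,
  "updatedAt": 1629949067277
}
"""

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_ms_to_utc(milliseconds: int) -> datetime:
    """Convert epoch milliseconds (assumed unit) to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=milliseconds)


service = json.loads(create_output)
created = epoch_ms_to_utc(service["createdAt"])

print(service["serviceName"], service["groupName"], created.isoformat())
```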
List All Metrics Analyzer(s)¶
From the left navigation menu, select Services > Metrics Analyzer.
A list of the metrics analyzer services that have already been created is shown.
The table is empty if no metrics analyzer services have been created in the system.
Click a service name link under the Service Name column to see the details of a Metrics Analyzer service.
The following information is shown in the details page:
- Name - Name of the metrics analyzer service.
- Date created - Date when the metrics analyzer service was created.
- Date updated - Date when the metrics analyzer service was last updated.
- Group name - Name of the Service Group this analyzer is analyzing.
- Data Source - Metrics service associated with this analyzer.
- Dashboard - Metrics analyzer dashboard. Click OPEN to see the dashboard. This shows the metrics anomalies associated with the metrics service.
Alternatively, the user may use the command shown below to list all the created metrics analyzers.
caeops metrics-analyzers list
The output from the command may look like the following:
[
  {
    "serviceName": "demo-metrics-analyzer",
    "serviceType": "metrics-analyzer",
    "groupName": "demo-group",
    "dataSources": {
      "metrics": "demo-metrics-service"
    },
    "createdAt": 1629949067277,
    "updatedAt": 1629949067277
  }
]
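Because the list output is a JSON array, a script can filter it to find the analyzers attached to a given metrics service. The Python sketch below works on the sample output shown above; the function name `analyzers_for_metrics_service` is hypothetical, and the field names are taken from that sample.

```python
import json
from typing import Any, Dict, List

# Sample output copied from the list command above.
list_output = """
[
  {
    "serviceName": "demo-metrics-analyzer",
    "serviceType": "metrics-analyzer",
    "groupName": "demo-group",
    "dataSources": {"metrics": "demo-metrics-service"},
    "createdAt": 1629949067277,
    "updatedAt": 1629949067277
  }
]
"""


def analyzers_for_metrics_service(
    listing: List[Dict[str, Any]], metrics_service: str
) -> List[str]:
    """Return the names of analyzers whose data source is the given service."""
    return [
        entry["serviceName"]
        for entry in listing
        # .get() keeps the filter tolerant of entries without a data source.
        if entry.get("dataSources", {}).get("metrics") == metrics_service
    ]


listing = json.loads(list_output)
print(analyzers_for_metrics_service(listing, "demo-metrics-service"))
```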
Delete a Metrics Analyzer¶
From the left navigation menu, select Services > Metrics Analyzer.
A list of the metrics analyzer services that have already been created is shown.
The table is empty if no metrics analyzer services have been created in the system.
Click the X button under the Actions column to delete a specific Metrics Analyzer.
A confirmation window is shown.
Click CONFIRM to delete the Metrics Analyzer.
Alternatively, the user may use the command shown below to delete a particular metrics analyzer.
caeops metrics-analyzers delete --name=demo-metrics-analyzer