Setup Metrics Analyzer


Overview

Distributed cloud applications typically generate a large volume of KPI (multivariate time series) data. Detecting anomalies in these KPIs is critical to meeting an application's service level objectives (SLOs).

Metrics Analyzer is an AI-powered managed service that surfaces anomalous KPIs from applications and provides actionable operational insights. It analyzes the data in near real time using machine learning (ML) and deep learning (DL) models and can detect errors, outliers, or other anomalous activity in user applications within minutes of occurrence. It also surfaces the top metrics that reflect the anomaly, which helps reduce mean time to detect (MTTD).

This topic describes how to set up an AI-powered metrics analyzer for real-time anomaly detection.

Prerequisites

To ensure that there is enough data to train the AI/ML models, we recommend creating the metrics analyzer a few (5-7) days after creating the corresponding metrics service. Please notify us if this data includes production incidents (failures) so that we can adjust the threshold appropriately.
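The waiting period above can be checked with a small script before creating the analyzer. The following is a minimal sketch, assuming the creation time of the metrics service is available as an epoch timestamp in milliseconds (the same shape as the createdAt field in the CLI output shown later in this topic). The function names and the MIN_TRAINING_DAYS constant are hypothetical and are not part of any CloudAEye tool.

```python
from datetime import datetime, timedelta, timezone

# Assumption: lower end of the recommended 5-7 day window.
MIN_TRAINING_DAYS = 5


def days_of_data(service_created_at_ms: int, now: datetime) -> float:
    """Return how many days have passed since the metrics service was created."""
    created = datetime.fromtimestamp(service_created_at_ms / 1000, tz=timezone.utc)
    return (now - created) / timedelta(days=1)


def ready_for_analyzer(service_created_at_ms: int, now: datetime,
                       min_days: float = MIN_TRAINING_DAYS) -> bool:
    """True once the metrics service has collected at least min_days of data."""
    return days_of_data(service_created_at_ms, now) >= min_days


if __name__ == "__main__":
    created_ms = 1629949067277  # illustrative value only
    print(ready_for_analyzer(created_ms, datetime.now(timezone.utc)))
```

Raising min_days to 7 gives the conservative end of the recommended range.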

Why?

  • Modern cloud applications change on a regular (e.g. daily or weekly) basis. It is very hard to keep track of such applications with a static approach.
  • Distributed cloud applications can produce a large amount of metrics data per day. It is very hard to analyze that much data manually.
  • It is very hard to model seasonality with static alerts. A system with online adaptive learning algorithms is required.
  • Downtime is expensive. The ability to detect incidents in a timely manner saves enterprises money and protects their reputation.

How It Works

CloudAEye offers unsupervised models for detecting anomalies in metrics data. A model learns patterns from normal execution and flags an anomaly when a KPI pattern deviates from them.

Our models account for the temporal dependence and stochasticity of multivariate time series data to provide accurate results (as measured by F1 score). They work well regardless of how predictable the time series is, capturing complicated data patterns and identifying anomalies accurately.

Anomaly Scores

Our models rank the detected anomalies by anomaly score. An anomaly score usually represents the model's confidence that the detected incident is an anomaly.

CloudAEye uses the following criteria to rank anomalies:

Percentile Range    Anomaly Score    Confidence Level
96-97               0-25             low
97-98               25-50            medium
98-99               50-75            high
99-100              75-100           very high
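The ranking criteria above can be expressed as a small lookup. This is a sketch for illustration only, not CloudAEye's implementation. It makes two assumptions that the table does not state: a value that falls exactly on a boundary (for example, a score of 25) belongs to the higher band, and scores scale linearly with the percentile inside each band. Both function names are hypothetical.

```python
def percentile_to_score(percentile: float) -> float:
    """Map a percentile in [96, 100] onto an anomaly score in [0, 100].

    Assumption: linear scaling, which matches the band edges in the table
    (96 -> 0, 97 -> 25, 98 -> 50, 99 -> 75, 100 -> 100).
    """
    if not 96 <= percentile <= 100:
        raise ValueError("percentile must be between 96 and 100")
    return (percentile - 96) * 25


def confidence_level(score: float) -> str:
    """Return the confidence label for an anomaly score in [0, 100].

    Assumption: a score on a band boundary belongs to the higher band.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score < 25:
        return "low"
    if score < 50:
        return "medium"
    if score < 75:
        return "high"
    return "very high"


if __name__ == "__main__":
    for p in (96.5, 97.5, 98.5, 99.5):
        print(p, confidence_level(percentile_to_score(p)))
```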

Setup

Create a New Metrics Analyzer

From the left navigation menu, select Services > Metrics Analyzer. A list of the metrics analyzer services that have already been created is shown. The table is empty if no metrics analyzer services have been created in the system.

To create a new Metrics Analyzer service, click CREATE in the top right corner. A new form appears under Metrics Analyzer > Create.

Provide the following information in the form:

  • Name: Name of the metrics analyzer service. This is usually an alpha-numeric string. For example, orders-app-metrics-analyzer.
  • Data source (metrics): Pick the Metrics Service that this service will analyze. A data pipeline will be created from the metrics service, and all metrics data will then be analyzed by the AI models.

Click SUBMIT to create the Metrics Analyzer service.

Alternatively, you can use the CLI command shown below to create a metrics analyzer service.

caeops metrics-analyzers create --data-sources=[{metrics=demo-metrics-service}] --name=demo-metrics-analyzer
where

  • --data-sources: The metrics key points to a Prometheus-based metrics service.

This initiates training of the AI/ML models and deploys them for real-time metrics analysis. A data pipeline is created between Prometheus and the AI/ML models.

Output from the CLI command may look like the following:

{
  "serviceName": "demo-metrics-analyzer",
  "serviceType": "metrics-analyzer",
  "groupName": "demo-grp",
  "dataSources": {
    "metrics": "demo-metrics-service"
  },
  "createdAt": 1629949067277,
  "updatedAt": 1629949067277
}
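The createdAt and updatedAt values in this output look like Unix epoch timestamps in milliseconds. Under that assumption (the source does not state the unit), the following sketch parses a saved copy of the output and converts the timestamps to readable UTC times. The to_utc helper is hypothetical.

```python
import json
from datetime import datetime, timezone

# Sample output copied from the CLI example above.
create_output = """
{
  "serviceName": "demo-metrics-analyzer",
  "serviceType": "metrics-analyzer",
  "groupName": "demo-grp",
  "dataSources": {
    "metrics": "demo-metrics-service"
  },
  "createdAt": 1629949067277,
  "updatedAt": 1629949067277
}
"""


def to_utc(epoch_ms: int) -> datetime:
    """Convert an epoch timestamp in milliseconds to a UTC datetime.

    Assumption: the CLI reports times as milliseconds since the Unix epoch.
    """
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


service = json.loads(create_output)
created = to_utc(service["createdAt"])
print(service["serviceName"], "created at", created.isoformat(timespec="seconds"))
```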

Note that the metrics analyzer service created above is automatically added to the service group of the metrics service demo-metrics-service.

List All Metrics Analyzers

From the left navigation menu, select Services > Metrics Analyzer. A list of the metrics analyzer services that have already been created is shown. The table is empty if no metrics analyzer services have been created in the system.

Click a service name link under the Service Name column to see the details of a Metrics Analyzer service.

The following information is shown on the details page:

  • Name - Name of the metrics analyzer service.
  • Date created - Date when the metrics analyzer service was created.
  • Date updated - Date when the metrics analyzer service was last updated.
  • Group name - Name of the service group that this analyzer analyzes.
  • Data Source - Metrics service associated with this analyzer.
  • Dashboard - Metrics analyzer dashboard. Click OPEN to see the dashboard. This shows the metrics anomalies associated with the metrics service.

Alternatively, you can use the CLI command shown below to list all metrics analyzers that have been created:

caeops metrics-analyzers list

The output from the command may look like the following:

[
  {
    "serviceName": "demo-metrics-analyzer",
    "serviceType": "metrics-analyzer",
    "groupName": "demo-group",
    "dataSources": {
      "metrics": "demo-metrics-service"
    },
    "createdAt": 1629949067277,
    "updatedAt": 1629949067277
  }
]
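Because the list output is a JSON array, it can be filtered with a short script. The sketch below finds the analyzers attached to a given metrics service. It assumes the output has been saved as text with the shape shown above; the analyzers_for function is hypothetical and not part of the caeops CLI.

```python
import json
from typing import List

# Sample output copied from the CLI example above.
list_output = """
[
  {
    "serviceName": "demo-metrics-analyzer",
    "serviceType": "metrics-analyzer",
    "groupName": "demo-group",
    "dataSources": {
      "metrics": "demo-metrics-service"
    },
    "createdAt": 1629949067277,
    "updatedAt": 1629949067277
  }
]
"""


def analyzers_for(output: str, metrics_service: str) -> List[str]:
    """Return the names of analyzers whose metrics data source matches."""
    services = json.loads(output)
    return [
        s["serviceName"]
        for s in services
        if s.get("dataSources", {}).get("metrics") == metrics_service
    ]


print(analyzers_for(list_output, "demo-metrics-service"))
```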

Delete a Metrics Analyzer

From the left navigation menu, select Services > Metrics Analyzer. A list of the metrics analyzer services that have already been created is shown. The table is empty if no metrics analyzer services have been created in the system.

Click the X button under the Actions column to delete a specific Metrics Analyzer. A confirmation window is shown.

Click CONFIRM to delete the Metrics Analyzer.

Alternatively, you can use the CLI command shown below to delete a particular metrics analyzer.

caeops metrics-analyzers delete --name=demo-metrics-analyzer