CN116846729A - Method for managing monitoring alarm notification based on multi-tenant mode under cloud container - Google Patents

Method for managing monitoring alarm notification based on multi-tenant mode under cloud container

Info

Publication number
CN116846729A
Authority
CN
China
Prior art keywords
tenant
alarm
manager
information
service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310550342.XA
Other languages
Chinese (zh)
Inventor
张云峰
田吉
李佳
刘彪
娄江南
李成
杨爽
牛建平
孙大臣
管春元
谢斌
焦质晔
滕训超
孙增强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
QIMING INFORMATION TECHNOLOGY CO LTD
Original Assignee
QIMING INFORMATION TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by QIMING INFORMATION TECHNOLOGY CO LTD
Priority to CN202310550342.XA
Publication of CN116846729A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Alarm Systems (AREA)

Abstract

The invention provides a method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container, comprising the following steps: S1: extending metric information by using a label rewriting mechanism; S2: classifying alarm information by tenant by using a grouping mechanism; S3: optimizing the notification channel structure; S4: adding a front-end service. By combining Prometheus with the multi-tenant scenario of a cloud platform, the method introduces tenant-dimension identification labels in the metric collection and alarm rule management links, which greatly improves the correlation between monitoring data and provides technical support for the subsequent implementation of multi-tenant notification. It also introduces a notification service adapted to notification-configuration custom objects, enabling dynamic management of tenant information in the alarm notification flow.

Description

Method for managing monitoring alarm notification based on multi-tenant mode under cloud container
Technical Field
The invention relates to the field of monitoring alarm operation and maintenance, in particular to a method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container.
Background
With the rapid development of technologies such as big data, the Internet of Things and cloud computing, container technology has gradually been accepted by large enterprises thanks to advantages such as low coupling, distribution, elastic scaling and resource sharing, and more and more enterprises have begun to deploy applications on container clouds in production. During this digital transformation, the platform generates a large amount of monitoring data whose relationships are complex and whose structure is not uniform. How to persist and analyze this data efficiently is a new problem faced by large enterprises. To provide stable and reliable services, platform monitoring and early-warning functions have received unprecedented attention.
Prometheus is an open-source, complete monitoring solution for the cloud era that overturns the traditional monitoring alarm model. Targeting the characteristics of monitoring services, namely heavy writes, light reads and strongly time-ordered query data, it rebuilds a data model with centralized rule calculation and verification and unified alarm analysis. Prometheus quickly attracted wide attention by virtue of being simple, efficient, easy to manage and easy to integrate. Prometheus has joined the CNCF (Cloud Native Computing Foundation) and become the cloud-native open-source project whose attention is second only to Kubernetes. Joining the CNCF also indicates that the monitoring scheme provided by Prometheus is widely accepted in the industry and is deeply adapted to and integrated with other cloud-native services.
The monitoring scheme provided by Prometheus can be divided into the following modules: monitoring metric collection, metric data management, alarm rule management and user interaction management. User interaction management can be further divided into two parts, web-side management and alarm notification management. Because the operation and maintenance teams of different projects are organized differently, the alarm notification link is often the part that each project needs to customize and modify.
Disclosure of Invention
To address two problems in the alarm notification link, namely that tenant information is difficult to identify and that notification channel information is difficult to manage because organizational structures differ, the invention provides a method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container. The method implements monitoring and early-warning notification by optimizing three links (monitoring metric collection, alarm rule management and alarm notification management) and integrates the monitoring service with business scenarios.
A method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container comprises the following steps:
S1: extending metric information by using a label rewriting mechanism;
S2: classifying alarm information by tenant by using a grouping mechanism;
S3: optimizing the notification channel structure;
S4: adding a front-end service.
Further, in the method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container, step S1 includes the following substeps:
S11: the Prometheus service collects exporter service data from the tenants;
S12: the required metrics are extended by using the label rewriting mechanism during collection.
Further, in the method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container, step S2 includes the following substeps:
S21: when the Prometheus service generates an alarm message, the message carries the extended labels;
S22: the Alertmanager service receives the alarm message and groups it, using the tenant identified by the label as the basis;
S23: different notification channels are configured for different tenant groups according to the labels.
Further, in the method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container, step S3 includes the following substeps:
S31: redefining the information administrator hierarchy;
S32: administrators are divided into three levels, namely platform administrator, tenant administrator and project administrator:
the platform administrator is responsible for managing the monitoring and alarm system of the entire platform, and receives and manages all alarm information;
the tenant administrator is defined within the system; any two tenants are logically isolated from each other in data, a tenant administrator can only receive and manage the alarm information of the projects under its own tenant, and one tenant can manage multiple projects;
the project administrator can only receive and manage the alarm messages under its own project;
S33: a bubbling strategy is adopted: when an alarm message is generated, it is sent first to the project administrator, then to the administrator of the tenant to which the project belongs, and finally to the platform administrator.
Further, in the method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container, step S4 includes the following substeps:
S41: alarm messages from the Alertmanager service are first sent to the front-end service;
S42: the front-end service looks up the notification channels that meet the conditions according to the three-layer notification structure;
S43: the actual message-sending interface is called according to each channel's configuration to push the message.
The beneficial effects of the invention are as follows: the method combines Prometheus with the multi-tenant scenario of a cloud platform and introduces tenant-dimension identification labels in the metric collection and alarm rule management links, which greatly improves the correlation between monitoring data and provides technical support for the subsequent implementation of multi-tenant notification. It also introduces a notification service adapted to notification-configuration custom objects, enabling dynamic management of tenant information in the alarm notification flow.
Drawings
Fig. 1 is a structural diagram of the Prometheus monitoring alarm service.
Fig. 2 is a flow chart of the present invention.
Fig. 3 is a flow chart of the original alarm information.
Fig. 4 is a flow chart of alarm information handled with the grouping mechanism.
Fig. 5 is a structural diagram of the original notification channels.
Fig. 6 is a structural diagram of the information administrator hierarchy.
Fig. 7 is a structural diagram of the notification service.
Detailed Description
For a clearer understanding of technical features, objects, and effects of the present invention, a specific embodiment of the present invention will be described with reference to the accompanying drawings.
Fig. 1 shows the structure of the Prometheus monitoring alarm service:
Exporter service: a general term for the various metric-generating services in a monitoring system. Common exporter services include node-exporter for server monitoring, redis-exporter for monitoring cache services, and kafka-exporter for monitoring message queue services. An exporter service is responsible for interacting with the monitored object and generating metric information according to a unified specification, and the Prometheus service is responsible for accessing each exporter service periodically to obtain monitoring information.
Prometheus service: to obtain monitoring metrics, the Prometheus monitoring service generally initiates a pull operation through an HTTP request. The metric content is provided by the various exporter monitoring services, and the return value of an exporter service must conform to the Prometheus specification. The returned metric information has the following format:
<metric name>{<label name>=<label value>, ...} <metric value>
This format can be divided into three parts: the metric name (<metric name>), the label information (<label name>=<label value>) and the metric value (<metric value>). The metric name identifies the monitoring information; the label information, in key/value form, describes the monitored object as attributes of the monitoring information; and the metric value, a float64 floating-point number, describes how the metric changes over time.
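The three-part format described above can be illustrated with a small parser. This is a minimal sketch rather than the Prometheus implementation: the function name parse_sample and both regular expressions are assumptions made for illustration, and the sketch ignores timestamps, comment lines and escape handling inside label values.

```python
import re

# Hypothetical, simplified patterns for one sample line such as:
#   up{job="node", instance="a:9100"} 1
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\s*'   # metric name
    r'(?:\{(?P<labels>.*)\})?\s*'               # optional {label="value", ...}
    r'(?P<value>\S+)$'                          # metric value
)
LABEL_RE = re.compile(r'\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*=\s*"((?:[^"\\]|\\.)*)"\s*,?')


def parse_sample(line):
    """Split one sample line into (metric name, labels dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a valid sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


if __name__ == "__main__":
    print(parse_sample('up { job = "tenant 1-monitoring service 1" } 1'))
```

The metric value is converted with float because the text describes it as a float64 number; the labels are returned as a plain dictionary so that later steps can add or read keys.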
The Prometheus service stores the collected information in a time-series database. It also maintains the Prometheus alarm rules, which it schedules, evaluates and manages in a unified way. Besides the necessary descriptive information, alarm content can carry part of the label information (<label name>=<label value>): the text describes the problem, while the labels are mainly used to classify the alarm. When an alarm rule is triggered on the Prometheus server, the alarm information is sent to the Alertmanager server.
Alertmanager service: as the alarm management service, Alertmanager receives and manages alarm information sent by various clients, including Prometheus. Common management operations include deduplication, grouping, buffering, silencing, inhibition and forwarding. The grouping, silencing and inhibition links must be configured by an administrator according to the actual conditions of the system. Alertmanager usually does not deliver alarm information to users directly; it first sends the information to third-party notification services, and the different types of notification services deliver it to user terminals.
Notification service: the various notification services are responsible for sending messages to users or to other notification services when alarm information is generated. The notification channels that Alertmanager supports by default are few and not easy to manage: when the scenario changes, the configuration file must be updated manually and the service restarted.
The Prometheus service structure focuses on the monitoring service itself and is not strongly coupled with tenant business data. For example, the Alertmanager service supports only a few notification channels by default and cannot dynamically query the notification recipients when alarm information needs to be sent. In addition, the exporter pattern was designed so that each exporter service focuses on the metrics of the monitored object itself, so the monitoring data is often not associated with business data and ends up as isolated, low-value data.
As shown in fig. 2, to address the above pain points, a method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container is provided, which includes the steps:
S1: extending metric information by using a label rewriting mechanism;
S2: classifying alarm information by tenant by using a grouping mechanism;
S3: optimizing the notification channel structure;
S4: adding a front-end service.
Step S1 comprises the following substeps:
S11: the Prometheus service collects exporter service data from the tenants;
S12: the required metrics are extended by using the label rewriting mechanism during collection.
In the embodiment shown in the figure, the Prometheus service needs to collect data from 3 exporter services belonging to two tenants, tenant 1 and tenant 2. The metric named up indicates whether a monitored object can be monitored normally: its value is 1 when monitoring is normal and 0 when it is abnormal, and a value of 0 generally triggers an alarm. Before optimization, the collected metrics are as follows:
up { job = "tenant 1-monitoring service 1" } 1
up { job = "tenant 1-monitoring service 2" } 1
up { job = "tenant 2-monitoring service 1" } 1
Apart from the job label, which identifies the source exporter, the label information contains no other business description. From a single metric it is impossible to tell where the service is deployed, what type the exporter is, or which tenant it belongs to. After the labels are extended with the label rewriting mechanism during collection, the collected metrics are as follows:
up { job = "tenant 1-monitoring service 1", namespace = "default", type = "node", system_name = "tenant 1" } 1
up { job = "tenant 1-monitoring service 2", namespace = "default", type = "redis", system_name = "tenant 1" } 1
up { job = "tenant 2-monitoring service 1", namespace = "default", type = "node", system_name = "tenant 2" } 1
Three labels, namely namespace, type and system_name, are added: namespace identifies the exporter's data isolation logic, type identifies the exporter type, and system_name identifies the tenant to which the exporter belongs.
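The effect of the label extension can be imitated in a few lines. In Prometheus itself this kind of rewriting is normally done with relabeling rules in the scrape configuration; the sketch below only simulates the result in Python. The table TARGET_METADATA and the function extend_labels are hypothetical names, and the choice to leave already-present labels untouched is an assumption of this sketch, not something stated in the text.

```python
# Hypothetical per-target metadata, keyed by the job label. In a real
# deployment these values would come from scrape configuration or
# service discovery rather than from a hard-coded table.
TARGET_METADATA = {
    "tenant 1-monitoring service 1": {"namespace": "default", "type": "node", "system_name": "tenant 1"},
    "tenant 1-monitoring service 2": {"namespace": "default", "type": "redis", "system_name": "tenant 1"},
    "tenant 2-monitoring service 1": {"namespace": "default", "type": "node", "system_name": "tenant 2"},
}


def extend_labels(labels, metadata=TARGET_METADATA):
    """Return a copy of labels with namespace, type and system_name added."""
    extra = metadata.get(labels.get("job"), {})
    merged = dict(labels)
    for key, value in extra.items():
        # Assumption of this sketch: keep a label that is already present.
        merged.setdefault(key, value)
    return merged


if __name__ == "__main__":
    print(extend_labels({"job": "tenant 1-monitoring service 2"}))
```

Because every sample from a target receives the same three labels, any alarm derived from those samples can later be attributed to a tenant without consulting an external system.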
As shown in fig. 3, step S2 includes the following substeps:
S21: when the Prometheus service generates an alarm message, the message carries the extended labels;
S22: the Alertmanager service receives the alarm message and groups it, using the tenant identified by the label as the basis;
S23: different notification channels are configured for different tenant groups according to the labels.
The embodiment uses the up metric shown in the figure: a value of 1 means the monitoring state is normal, a value of 0 means the monitoring state is abnormal, and an abnormal state triggers an alarm.
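The rule "alarm when up is 0" and the way an alarm inherits the labels of its sample can be sketched as follows. The function evaluate_up_rule, the alert name TargetDown and the description text are hypothetical; real Prometheus alarm rules are written as rule expressions evaluated by the server, not as Python code.

```python
def evaluate_up_rule(samples):
    """Return one alarm per sample of the up metric whose value is 0.

    samples is an iterable of (labels, value) pairs. Each alarm copies
    the labels of the sample that triggered it, so extended labels such
    as system_name travel with the alarm.
    """
    alerts = []
    for labels, value in samples:
        if value == 0:
            alert_labels = dict(labels)
            alert_labels["alertname"] = "TargetDown"  # hypothetical rule name
            description = f"{labels.get('job', 'unknown')} cannot be monitored"
            alerts.append({"labels": alert_labels, "annotations": {"description": description}})
    return alerts


if __name__ == "__main__":
    samples = [
        ({"job": "tenant 1-monitoring service 1", "system_name": "tenant 1"}, 1.0),
        ({"job": "tenant 2-monitoring service 1", "system_name": "tenant 2"}, 0.0),
    ]
    for alert in evaluate_up_rule(samples):
        print(alert)
```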
Before optimization, every alarm message that Prometheus generates is sent to the Alertmanager service. Because the alarms do not carry enough label data, the Alertmanager service can only place all received alarm messages in a single group.
As shown in fig. 4, after optimization, because the three labels namespace, type and system_name were added during collection, a generated alarm message also carries these three labels. After receiving the alarm message, Alertmanager can use the label system_name (the tenant to which the alarm belongs) as the basis for grouping, and different notification channels are then configured for the different tenant groups.
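Grouping by the system_name label, followed by choosing a channel per tenant group, can be sketched as below. Alertmanager performs grouping through its routing configuration; this Python version only mirrors the idea. The names group_alerts, TENANT_RECEIVERS and receiver_for_group, as well as the receiver names, are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical mapping from tenant to the name of its notification channel.
TENANT_RECEIVERS = {"tenant 1": "tenant-1-email", "tenant 2": "tenant-2-webhook"}
DEFAULT_RECEIVER = "platform-default"


def group_alerts(alerts, group_by=("system_name",)):
    """Group alarms by the values of the labels listed in group_by."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


def receiver_for_group(group_key, receivers=TENANT_RECEIVERS):
    """Pick the channel for a group; the first key element is the tenant."""
    return receivers.get(group_key[0], DEFAULT_RECEIVER)


if __name__ == "__main__":
    alerts = [
        {"labels": {"alertname": "TargetDown", "system_name": "tenant 1"}},
        {"labels": {"alertname": "TargetDown", "system_name": "tenant 2"}},
        {"labels": {"alertname": "TargetDown", "system_name": "tenant 1"}},
    ]
    for key, members in group_alerts(alerts).items():
        print(key, len(members), receiver_for_group(key))
```

Alarms without a system_name label fall into a group keyed by an empty string and go to the default channel, which corresponds to the single-group behavior described for the unoptimized case.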
As shown in fig. 5, when the Alertmanager service has no multi-tenant configuration, all alarm messages are held in the same group. In the conventional mode, notification channels are bound to group information, which indirectly limits the variety of channel types. In addition, groups and channels in the conventional mode are all at the same level, whereas in a multi-tenant scenario not only are different tenants isolated from each other in data, but data isolation or hierarchical permission requirements often exist within a single tenant as well.
As shown in fig. 6, step S3 includes the following substeps:
S31: redefining the information administrator hierarchy;
S32: administrators are divided into three levels, namely platform administrator, tenant administrator and project administrator:
the platform administrator is responsible for managing the monitoring and alarm system of the entire platform, and receives and manages all alarm information;
the tenant administrator is defined within the system; any two tenants are logically isolated from each other in data, a tenant administrator can only receive and manage the alarm information of the projects under its own tenant, and one tenant can manage multiple projects;
the project administrator can only receive and manage the alarm messages under its own project;
S33: a bubbling strategy is adopted: when an alarm message is generated, it is sent first to the project administrator, then to the administrator of the tenant to which the project belongs, and finally to the platform administrator.
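The bubbling order of S33 can be made concrete with a small lookup. This is a sketch under stated assumptions: the text does not say how projects are identified, so the (tenant, project) key, the three administrator tables and all names in them are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipient:
    level: str  # "project", "tenant" or "platform"
    name: str


# Hypothetical administrator tables for the three levels.
PROJECT_ADMINS = {("tenant 1", "project a"): ["alice"]}
TENANT_ADMINS = {"tenant 1": ["bob"]}
PLATFORM_ADMINS = ["root"]


def bubble_recipients(tenant, project):
    """Return recipients in bubbling order: project, then tenant, then platform."""
    ordered = []
    ordered += [Recipient("project", n) for n in PROJECT_ADMINS.get((tenant, project), [])]
    ordered += [Recipient("tenant", n) for n in TENANT_ADMINS.get(tenant, [])]
    ordered += [Recipient("platform", n) for n in PLATFORM_ADMINS]
    return ordered


if __name__ == "__main__":
    for recipient in bubble_recipients("tenant 1", "project a"):
        print(recipient.level, recipient.name)
```

Because each level is looked up with its own key, a project administrator never receives alarms from another project, and a tenant administrator never receives alarms from another tenant, while the platform administrator receives every alarm.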
As shown in fig. 7, step S4 includes the following substeps:
S41: alarm messages from the Alertmanager service are first sent to the front-end service;
S42: the front-end service looks up the notification channels that meet the conditions according to the three-layer notification structure;
S43: the actual message-sending interface is called according to each channel's configuration to push the message.
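Steps S41 to S43 can be sketched as a single handler. The sketch assumes that the front-end service receives a JSON body shaped like an Alertmanager webhook notification, of which only the alerts list with its labels and annotations is read. The CHANNELS table, the project label, the sender functions and all addresses are hypothetical, and the senders return strings instead of sending anything.

```python
import json

# Hypothetical three-layer channel configuration (project, tenant, platform).
CHANNELS = {
    "project": {("tenant 1", "project a"): [{"kind": "email", "to": "proj-a@example.com"}]},
    "tenant": {"tenant 1": [{"kind": "webhook", "url": "https://example.com/hook"}]},
    "platform": [{"kind": "email", "to": "ops@example.com"}],
}


def find_channels(labels, channels=CHANNELS):
    """S42: collect matching channels in project, tenant, platform order."""
    tenant = labels.get("system_name")
    project = labels.get("project")  # assumed label, not named in the text
    found = []
    found += channels["project"].get((tenant, project), [])
    found += channels["tenant"].get(tenant, [])
    found += channels["platform"]
    return found


def send_email(channel, text):
    # Stand-in for a real mail interface.
    return f"email to {channel['to']}: {text}"


def send_webhook(channel, text):
    # Stand-in for a real HTTP call.
    return f"webhook to {channel['url']}: {text}"


SENDERS = {"email": send_email, "webhook": send_webhook}


def handle_webhook(body, channels=CHANNELS):
    """S41 and S43: accept the alarm payload and push it through each channel."""
    payload = json.loads(body)
    results = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        text = alert.get("annotations", {}).get("description", labels.get("alertname", "alert"))
        for channel in find_channels(labels, channels):
            results.append(SENDERS[channel["kind"]](channel, text))
    return results


if __name__ == "__main__":
    body = json.dumps({"alerts": [{
        "labels": {"alertname": "TargetDown", "system_name": "tenant 1", "project": "project a"},
        "annotations": {"description": "node exporter is down"},
    }]})
    for line in handle_webhook(body):
        print(line)
```

Keeping the channel table outside the handler is what allows channels to be changed at run time: updating the table changes where the next alarm goes, without editing the alarm manager's configuration file or restarting it.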
The method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container combines Prometheus with the multi-tenant scenario of a cloud platform and introduces tenant-dimension identification labels in the metric collection and alarm rule management links, which greatly improves the correlation between monitoring data and provides technical support for the subsequent implementation of multi-tenant notification. It also introduces a notification service adapted to notification-configuration custom objects, enabling dynamic management of tenant information in the alarm notification flow. Metric label information serves as the key data running through the whole monitoring and alarm process: by extending the dimensions of the monitoring labels and associating the label information, the notification-configuration custom objects and the message notification service, the method achieves dynamic management of the monitoring notification function in a multi-tenant scenario, improves the efficiency of information delivery and reduces project operation and maintenance costs.
The foregoing has shown and described the basic principles, main features and advantages of the present invention. Those skilled in the art will understand that the present invention is not limited to the embodiments described above; the embodiments and descriptions merely illustrate the principles of the invention, and various changes and modifications may be made without departing from its spirit and scope. The scope of the invention is defined by the appended claims and their equivalents.

Claims (5)

1. A method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container, characterized by comprising the following steps:
S1: extending metric information by using a label rewriting mechanism;
S2: classifying alarm information by tenant by using a grouping mechanism;
S3: optimizing the notification channel structure;
S4: adding a front-end service.
2. The method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container of claim 1, wherein S1 comprises the following substeps:
S11: the Prometheus service collects exporter service data from the tenants;
S12: the required metrics are extended by using the label rewriting mechanism during collection.
3. The method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container of claim 1, wherein S2 comprises the following substeps:
S21: when the Prometheus service generates an alarm message, the message carries the extended labels;
S22: the Alertmanager service receives the alarm message and groups it, using the tenant identified by the label as the basis;
S23: different notification channels are configured for different tenant groups according to the labels.
4. The method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container of claim 1, wherein S3 comprises the following substeps:
S31: redefining the information administrator hierarchy;
S32: administrators are divided into three levels, namely platform administrator, tenant administrator and project administrator:
the platform administrator is responsible for managing the monitoring and alarm system of the entire platform, and receives and manages all alarm information;
the tenant administrator is defined within the system; any two tenants are logically isolated from each other in data, a tenant administrator can only receive and manage the alarm information of the projects under its own tenant, and one tenant can manage multiple projects;
the project administrator can only receive and manage the alarm messages under its own project;
S33: a bubbling strategy is adopted: when an alarm message is generated, it is sent first to the project administrator, then to the administrator of the tenant to which the project belongs, and finally to the platform administrator.
5. The method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container of claim 1, wherein S4 comprises the following substeps:
S41: alarm messages from the Alertmanager service are first sent to the front-end service;
S42: the front-end service looks up the notification channels that meet the conditions according to the three-layer notification structure;
S43: the actual message-sending interface is called according to each channel's configuration to push the message.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310550342.XA CN116846729A (en) 2023-05-16 2023-05-16 Method for managing monitoring alarm notification based on multi-tenant mode under cloud container

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310550342.XA CN116846729A (en) 2023-05-16 2023-05-16 Method for managing monitoring alarm notification based on multi-tenant mode under cloud container

Publications (1)

Publication Number Publication Date
CN116846729A (en) 2023-10-03

Family

ID=88167813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310550342.XA Pending CN116846729A (en) 2023-05-16 2023-05-16 Method for managing monitoring alarm notification based on multi-tenant mode under cloud container

Country Status (1)

Country Link
CN (1) CN116846729A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117692302A (en) * 2024-02-02 2024-03-12 深圳感臻智能股份有限公司 Method and system for data collection, storage and intelligent monitoring and alarming
CN117692302B (en) * 2024-02-02 2024-05-28 深圳感臻智能股份有限公司 Method and system for data collection, storage and intelligent monitoring and alarming

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination