CN116846729A - Method for managing monitoring alarm notification based on multi-tenant mode under cloud container
- Publication number
- CN116846729A
- Authority
- CN
- China
- Prior art keywords
- tenant
- alarm
- manager
- information
- service
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0631—Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Environmental & Geological Engineering (AREA)
- Alarm Systems (AREA)
Abstract
The invention provides a method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container, which comprises the following steps: S1: extending information by using a label rewriting mechanism; S2: classifying alarm information by tenant by using a grouping mechanism; S3: optimizing the notification channel structure; S4: adding a front-end service. The method combines Prometheus with the multi-tenant scenario of a cloud platform. It introduces tenant-dimension identification labels in the metric collection and alarm rule management links, which improves the correlation between monitoring data and provides technical support for multi-tenant notification. It also introduces a notification service and adapts notification-configuration custom objects, so that the tenant information in the alarm notification flow can be managed dynamically.
Description
Technical Field
The invention relates to the field of monitoring alarm operation and maintenance, in particular to a method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container.
Background
With the rapid development of technologies such as big data, the Internet of Things, and cloud computing, container technology has gradually been accepted by large enterprises thanks to advantages such as low coupling, distribution, elastic scaling, and resource sharing, and more and more enterprises deploy applications to container clouds in production. During this digital transformation, platforms generate large amounts of monitoring data whose relationships are complex and whose structure is not uniform. How to persist and analyze this data efficiently is a new problem for large enterprises. To provide stable and reliable services, platform monitoring and early-warning functions are receiving unprecedented attention.
Prometheus is an open-source, complete monitoring solution of the cloud era. It departs from the traditional monitoring alarm model: targeting the characteristics of monitoring workloads (write-heavy, read-light, and strongly time-ordered query data), it rebuilds a model in which rules are computed and verified centrally and alarm data is analyzed in a unified way. Prometheus quickly attracted wide attention because it is simple, efficient, and easy to manage and integrate. Prometheus joined the CNCF (Cloud Native Computing Foundation) and became a cloud-native open-source project second only to Kubernetes in attention. Joining the CNCF also indicates that the monitoring scheme provided by Prometheus is widely accepted in the industry and is deeply integrated with other cloud-native services.
The monitoring scheme provided by Prometheus can be divided into the following modules: monitoring metric collection, metric data management, alarm rule management, and user interaction management. User interaction management can be further divided into two parts, web-side management and alarm notification management. Because the operation and maintenance teams of different projects are organized differently, the alarm notification link is often the part that each project needs to customize and modify.
Disclosure of Invention
To address two problems in the alarm notification link, namely that tenant information is difficult to identify and that notification channel information is difficult to manage because its structure differs between projects, the invention provides a method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container. The method implements monitoring and early-warning notification, optimizes three links (monitoring metric collection, alarm rule management, and alarm notification management), and integrates the monitoring service with business scenarios.
A method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container comprises the following steps:
S1: extending information by using a label rewriting mechanism;
S2: classifying alarm information by tenant by using a grouping mechanism;
S3: optimizing the notification channel structure;
S4: adding a front-end service.
Further, in the method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container, the step S1 includes the following substeps:
S11: the Prometheus service collects exporter service data from the tenants;
S12: the required metrics are extended by using a label rewriting mechanism during collection.
Further, in the method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container, the step S2 includes the following substeps:
S21: when the Prometheus service generates an alarm message, the alarm message carries the extended labels;
S22: the Alertmanager service receives the alarm message and groups it, using the tenant identified by the labels as the grouping basis;
S23: different notification channels are configured for different tenant groups according to the labels.
Further, in the method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container, the step S3 includes the following substeps:
S31: redefining the hierarchy of information administrators;
S32: administrators are divided into three levels, namely platform administrator, tenant administrator, and project administrator:
the platform administrator is responsible for managing the monitoring alarm system of the whole system, and receives and manages all alarm information;
tenants are defined within the system, and the data of any two tenants is logically isolated; a tenant administrator can only receive and manage the alarm information of the projects under that tenant, and one tenant can manage multiple projects;
a project administrator can only receive and manage the alarm messages of that project;
S33: alarm messages are delivered using a bubbling strategy: an alarm message is first sent to the project administrator, then to the tenant administrator of the tenant to which the project belongs, and finally to the platform administrator.
Further, in the method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container, the step S4 includes the following substeps:
S41: alarm messages from the Alertmanager service are first sent to the front-end service;
S42: the front-end service searches for the notification channels that meet the conditions according to the three-level notification structure;
S43: the actual message sending interface is called according to the configuration of each channel to push the message.
The beneficial effects of the invention are as follows: the method combines Prometheus with the multi-tenant scenario of a cloud platform and introduces tenant-dimension identification labels in the metric collection and alarm rule management links, which improves the correlation between monitoring data and provides technical support for multi-tenant notification; it also introduces a notification service and adapts notification-configuration custom objects, so that the tenant information in the alarm notification flow can be managed dynamically.
Drawings
Fig. 1 is a block diagram of the Prometheus monitoring alarm service.
Fig. 2 is a flow chart of the present invention.
Fig. 3 is a flow chart of the original alert information.
Fig. 4 is a flow chart of alarm information after the grouping mechanism is applied.
Fig. 5 is a structural diagram of the original notification channel.
Fig. 6 is a block diagram of an information manager hierarchy.
Fig. 7 is a structural diagram of a notification service.
Detailed Description
For a clearer understanding of technical features, objects, and effects of the present invention, a specific embodiment of the present invention will be described with reference to the accompanying drawings.
Fig. 1 shows the structure of the Prometheus monitoring alarm service:
Exporter service: a general term for the services that generate metrics in the monitoring system. Common exporter services include node-exporter for server monitoring, redis-exporter for monitoring cache services, and kafka-exporter for monitoring message queue services. An exporter service is responsible for interacting with the monitored object and generating metric information according to a unified specification, and the Prometheus service accesses each exporter service at regular intervals to obtain the monitoring information.
Prometheus service: to obtain monitoring metrics, the Prometheus monitoring service generally initiates a pull operation using an HTTP request. The metric content is provided by the various exporter services, and the return value of an exporter service must conform to the Prometheus specification. The returned metric information has the following format:
<metric name>{<label name>=<label value>, ...} <metric value>
The format can be divided into three parts: the metric name (<metric name>), the label information (<label name>=<label value>), and the metric value (<metric value>). The metric name identifies the monitoring information, the label information describes the monitored object as key/value attributes of the monitoring information, and the metric value, a float64 floating-point number, describes how the metric changes over time.
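The three-part format described above can be illustrated with a small parser. This is a minimal sketch, not part of the patented method: the function name `parse_sample` and both regular expressions are hypothetical, and optional timestamps, comment lines, and other details of the full Prometheus text format are deliberately not handled.

```python
import re

# Hypothetical helper for one sample line of the form
#   <metric name>{<label name>="<label value>", ...} <metric value>
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*=\s*"((?:[^"\\]|\\.)*)"\s*,?')


def parse_sample(line):
    """Split one sample line into (metric name, label dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    # Labels are optional; a line without braces yields an empty dict.
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


if __name__ == "__main__":
    print(parse_sample('up{job="tenant 1-monitoring service 1"} 1'))
```

The returned tuple mirrors the three parts named in the text: the name identifies the metric, the dictionary holds the key/value labels, and the value is a floating-point number.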
The Prometheus service stores the collected information in a time-series database. The Prometheus service also maintains the alarm rules (Prometheus rules), which it schedules, evaluates, and manages in a unified way. Besides the necessary description text, alarm content can carry part of the label information (<label name>=<label value>): the text describes the problem, and the label information is mainly used to classify the alarm. When an alarm rule is triggered on the Prometheus server, the alarm information is sent to the Alertmanager server.
Alertmanager service: as the alarm management service, Alertmanager receives and manages alarm information sent by various clients, including Prometheus. Common management operations include deduplication, grouping, buffering, silencing, inhibition, and forwarding. Grouping, silencing, and inhibition require an administrator to configure them according to the actual conditions of the system. Alertmanager generally does not deliver alarm information directly to users; it first sends the information to a third-party notification service, and the different types of notification services deliver it to user terminals.
Notification service: notification services are responsible for sending messages to users or to other notification services when alarm information is generated. The notification channels that Alertmanager supports by default are few and not easy to manage, and when the scenario changes the configuration files must be updated manually and the service restarted.
The Prometheus service architecture focuses on the monitoring service itself and is not strongly coupled to tenant business data. The Alertmanager service supports few notification channels by default and cannot dynamically look up notification recipients when alarm information needs to be sent. In addition, the exporter pattern was designed so that an exporter service focuses on the metrics of the monitored object itself; monitoring data is therefore often not associated with business data and ends up as isolated, low-value data.
As shown in fig. 2, to address the above pain points, a method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container is provided, which includes the following steps:
S1: extending information by using a label rewriting mechanism;
S2: classifying alarm information by tenant by using a grouping mechanism;
S3: optimizing the notification channel structure;
S4: adding a front-end service.
The step S1 comprises the following substeps:
S11: the Prometheus service collects exporter service data from the tenants;
S12: the required metrics are extended by using a label rewriting mechanism during collection.
In this embodiment, as shown in the figure, the Prometheus service collects data from three exporter services belonging to two tenants, tenant 1 and tenant 2. A metric named up identifies whether a monitored object can be monitored normally: its value is 1 when monitoring is normal and 0 when monitoring is abnormal, and an alarm is generally triggered when the value is 0. Before optimization, the collected metrics are as follows:
up{job="tenant 1-monitoring service 1"} 1
up{job="tenant 1-monitoring service 2"} 1
up{job="tenant 2-monitoring service 1"} 1
Apart from the job label, which identifies the source exporter, the label information contains no other business description. From a single metric it is impossible to tell where the service is deployed, what type the exporter is, or which tenant it belongs to. After the metrics are extended with the label rewriting mechanism during collection, they look as follows:
up{job="tenant 1-monitoring service 1", namespace="default", type="node", system_name="tenant 1"} 1
up{job="tenant 1-monitoring service 2", namespace="default", type="redis", system_name="tenant 1"} 1
up{job="tenant 2-monitoring service 1", namespace="default", type="node", system_name="tenant 2"} 1
Three labels are added, namely namespace, type, and system_name: namespace identifies the data isolation scope of the exporter, type identifies the exporter type, and system_name identifies the tenant to which the exporter belongs.
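The effect of the label extension in step S12 can be sketched as follows. This is only an illustration under stated assumptions: in Prometheus itself, labels are rewritten through relabeling rules in the scrape configuration, whereas here the hypothetical table `TARGET_METADATA` and the hypothetical function `extend_labels` simulate the outcome in plain Python using the example values from the embodiment.

```python
# Hypothetical per-target metadata keyed by the job label. In a real
# deployment this information would come from the scrape configuration
# or service discovery, not from a hard-coded table.
TARGET_METADATA = {
    "tenant 1-monitoring service 1": {
        "namespace": "default", "type": "node", "system_name": "tenant 1"},
    "tenant 1-monitoring service 2": {
        "namespace": "default", "type": "redis", "system_name": "tenant 1"},
    "tenant 2-monitoring service 1": {
        "namespace": "default", "type": "node", "system_name": "tenant 2"},
}


def extend_labels(labels, metadata=TARGET_METADATA):
    """Return a copy of `labels` with the tenant-dimension labels added.

    Labels already present on the sample are kept as-is; this conflict
    rule is an assumption of the sketch, not something the source states.
    """
    extra = metadata.get(labels.get("job"), {})
    return {**extra, **labels}


if __name__ == "__main__":
    print(extend_labels({"job": "tenant 1-monitoring service 2"}))
```

A sample whose job is not in the table is returned unchanged, so unknown targets simply keep their original labels.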
As shown in fig. 3, the step S2 includes the following sub-steps:
S21: when the Prometheus service generates an alarm message, the alarm message carries the extended labels;
S22: the Alertmanager service receives the alarm message and groups it, using the tenant identified by the labels as the grouping basis;
S23: different notification channels are configured for different tenant groups according to the labels.
The embodiment in the figure uses the up metric: a value of 1 indicates that the monitoring state is normal, a value of 0 indicates that it is abnormal, and an alarm is triggered when the state is abnormal.
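The rule described here, together with substep S21, can be sketched as a function that fires one alarm per abnormal sample and copies the sample's labels into the alarm. This is a simplified assumption-laden illustration: the function `fire_up_alerts`, the alert name `TargetDown`, and the dictionary layout are hypothetical, and real Prometheus alarm rules are written as rule expressions rather than Python.

```python
def fire_up_alerts(samples):
    """Return one alarm per `up` sample whose value is 0.

    `samples` is an iterable of (metric name, label dict, value) tuples.
    Each alarm carries all labels of the sample that triggered it, so
    any labels added during collection travel with the alarm.
    """
    alerts = []
    for name, labels, value in samples:
        if name == "up" and value == 0:
            alerts.append({
                # "TargetDown" is a hypothetical alert name.
                "labels": {"alertname": "TargetDown", **labels},
                "annotations": {
                    "description": f"{labels.get('job', 'unknown')} cannot be monitored",
                },
            })
    return alerts


if __name__ == "__main__":
    demo = [
        ("up", {"job": "tenant 1-monitoring service 1", "system_name": "tenant 1"}, 0.0),
        ("up", {"job": "tenant 2-monitoring service 1", "system_name": "tenant 2"}, 1.0),
    ]
    print(fire_up_alerts(demo))
```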
Before optimization, all alarm messages generated by Prometheus are sent to the Alertmanager service; because the labels do not carry enough information, the Alertmanager service can only place all received alarm messages in a single group.
As shown in fig. 4, after optimization the three labels namespace, type, and system_name are added during collection, so an alarm message also carries these three labels when it is generated. After receiving the alarm message, Alertmanager can use the label system_name (the tenant to which the alarm belongs) as the grouping basis, and different notification channels are then configured for different tenant groups.
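Grouping by the system_name label and binding a channel to each tenant group can be sketched as below. This is a minimal sketch under assumptions: Alertmanager performs grouping through its own routing configuration, while the names `group_by_tenant`, `channel_for`, `TENANT_CHANNELS`, and `DEFAULT_CHANNEL`, as well as the channel strings, are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

# Hypothetical channel bindings, one per tenant group.
TENANT_CHANNELS = {
    "tenant 1": "email:tenant1-ops",
    "tenant 2": "webhook:tenant2-ops",
}
DEFAULT_CHANNEL = "email:platform-ops"


def group_by_tenant(alerts, group_label="system_name"):
    """Group alarms by the value of the tenant label.

    Alarms without the label fall into the group with the empty key,
    which corresponds to the single shared group before optimization.
    """
    groups = defaultdict(list)
    for alert in alerts:
        groups[alert["labels"].get(group_label, "")].append(alert)
    return dict(groups)


def channel_for(tenant, channels=TENANT_CHANNELS, default=DEFAULT_CHANNEL):
    """Pick the channel configured for a tenant group, or the default."""
    return channels.get(tenant, default)


if __name__ == "__main__":
    demo = [
        {"labels": {"alertname": "TargetDown", "system_name": "tenant 1"}},
        {"labels": {"alertname": "TargetDown", "system_name": "tenant 2"}},
        {"labels": {"alertname": "TargetDown"}},
    ]
    for tenant, members in group_by_tenant(demo).items():
        print(repr(tenant), len(members), channel_for(tenant))
```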
As shown in fig. 5, before multi-tenant configuration is introduced, the Alertmanager server holds all alarm messages in the same group. In the conventional mode, notification channels are bound to group information, which indirectly limits the variety of notification channels. In addition, the group channels in the conventional mode are all at the same level. In a multi-tenant scenario, not only is data isolated between different tenants, but data isolation or superior/subordinate permission requirements often arise within a single tenant as well.
As shown in fig. 6, the step S3 includes the following sub-steps:
S31: redefining the hierarchy of information administrators;
S32: administrators are divided into three levels, namely platform administrator, tenant administrator, and project administrator:
the platform administrator is responsible for managing the monitoring alarm system of the whole system, and receives and manages all alarm information;
tenants are defined within the system, and the data of any two tenants is logically isolated; a tenant administrator can only receive and manage the alarm information of the projects under that tenant, and one tenant can manage multiple projects;
a project administrator can only receive and manage the alarm messages of that project;
S33: alarm messages are delivered using a bubbling strategy: an alarm message is first sent to the project administrator, then to the tenant administrator of the tenant to which the project belongs, and finally to the platform administrator.
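The bubbling strategy of substep S33 can be sketched as an ordered lookup across the three administrator levels. This is a hypothetical illustration: the directories `PROJECT_ADMINS`, `TENANT_ADMINS`, and `PLATFORM_ADMINS`, the administrator names, and the function `bubble_recipients` are all assumptions, since the source does not specify how administrators are stored.

```python
# Hypothetical administrator directories for the three levels.
PROJECT_ADMINS = {
    ("tenant 1", "project a"): ["alice"],
    ("tenant 2", "project b"): ["dave"],
}
TENANT_ADMINS = {"tenant 1": ["bob"], "tenant 2": ["erin"]}
PLATFORM_ADMINS = ["carol"]


def bubble_recipients(tenant, project):
    """Return (level, administrator) pairs in bubbling order.

    The alarm goes to the project administrators first, then to the
    administrators of the owning tenant, and finally to the platform
    administrators. Administrators of other tenants are never included,
    which reflects the data isolation between tenants.
    """
    recipients = []
    for admin in PROJECT_ADMINS.get((tenant, project), []):
        recipients.append(("project", admin))
    for admin in TENANT_ADMINS.get(tenant, []):
        recipients.append(("tenant", admin))
    for admin in PLATFORM_ADMINS:
        recipients.append(("platform", admin))
    return recipients


if __name__ == "__main__":
    print(bubble_recipients("tenant 1", "project a"))
```

An alarm from a project with no project administrator still reaches the tenant and platform levels, because each level is looked up independently.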
As shown in fig. 7, the step S4 includes the following sub-steps:
S41: alarm messages from the Alertmanager service are first sent to the front-end service;
S42: the front-end service searches for the notification channels that meet the conditions according to the three-level notification structure;
S43: the actual message sending interface is called according to the configuration of each channel to push the message.
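Substeps S41 to S43 can be sketched as a handler that receives a batch of alarms, looks up matching channels across the three levels, and calls a sender per channel type. This is a minimal sketch under several assumptions: the payload is modeled as a dictionary with an `alerts` list whose entries carry `labels`, the `project` label is hypothetical (the embodiment only names namespace, type, and system_name), and the channel records, `find_channels`, `handle_alerts`, and the sender callables are invented for illustration.

```python
LEVEL_ORDER = {"project": 0, "tenant": 1, "platform": 2}


def find_channels(labels, channels):
    """Return the channels matching an alarm, ordered project, tenant, platform."""
    tenant = labels.get("system_name")
    project = labels.get("project")  # hypothetical label, see lead-in
    matched = []
    for channel in channels:
        level = channel["level"]
        if level == "platform":
            matched.append(channel)
        elif level == "tenant" and channel["tenant"] == tenant:
            matched.append(channel)
        elif (level == "project" and channel["tenant"] == tenant
              and channel["project"] == project):
            matched.append(channel)
    return sorted(matched, key=lambda channel: LEVEL_ORDER[channel["level"]])


def handle_alerts(payload, channels, senders):
    """Push every alarm in the payload through its matching channels.

    `senders` maps a channel kind (for example "email") to a callable
    taking (target, alert); each callable stands in for the actual
    message sending interface of that channel type.
    """
    delivered = []
    for alert in payload.get("alerts", []):
        for channel in find_channels(alert["labels"], channels):
            senders[channel["kind"]](channel["target"], alert)
            delivered.append((channel["level"], channel["kind"], channel["target"]))
    return delivered


if __name__ == "__main__":
    demo_channels = [
        {"level": "platform", "kind": "email", "target": "platform-ops"},
        {"level": "tenant", "tenant": "tenant 1", "kind": "webhook", "target": "tenant1-hook"},
    ]
    demo_senders = {
        "email": lambda target, alert: print("email ->", target),
        "webhook": lambda target, alert: print("webhook ->", target),
    }
    demo_payload = {"alerts": [{"labels": {"system_name": "tenant 1"}}]}
    handle_alerts(demo_payload, demo_channels, demo_senders)
```

Because the channel records are plain data, they could be reloaded at run time without restarting the service, which is the kind of dynamic management the text aims for.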
The method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container combines Prometheus with the multi-tenant scenario of a cloud platform. It introduces tenant-dimension identification labels in the metric collection and alarm rule management links, which improves the correlation between monitoring data and provides technical support for multi-tenant notification. It also introduces a notification service and adapts notification-configuration custom objects, so that the tenant information in the alarm notification flow can be managed dynamically. Metric label information is the key data that runs through the whole monitoring and alarm process. By extending the label dimensions and associating the label information with the notification-configuration custom objects and the message notification service, the method achieves dynamic management of monitoring notification in a multi-tenant scenario, improves the efficiency of information delivery, and reduces project operation and maintenance costs.
The foregoing has shown and described the basic principles, main features, and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which merely illustrate its principles, and that various changes and modifications may be made without departing from the spirit and scope of the invention. The scope of the invention is defined by the appended claims and their equivalents.
Claims (5)
1. A method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container, characterized by comprising the following steps:
S1: extending information by using a label rewriting mechanism;
S2: classifying alarm information by tenant by using a grouping mechanism;
S3: optimizing the notification channel structure;
S4: adding a front-end service.
2. The method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container of claim 1, wherein S1 comprises the following substeps:
S11: the Prometheus service collects exporter service data from the tenants;
S12: the required metrics are extended by using a label rewriting mechanism during collection.
3. The method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container of claim 1, wherein S2 comprises the following substeps:
S21: when the Prometheus service generates an alarm message, the alarm message carries the extended labels;
S22: the Alertmanager service receives the alarm message and groups it, using the tenant identified by the labels as the grouping basis;
S23: different notification channels are configured for different tenant groups according to the labels.
4. The method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container of claim 1, wherein S3 comprises the following substeps:
S31: redefining the hierarchy of information administrators;
S32: administrators are divided into three levels, namely platform administrator, tenant administrator, and project administrator:
the platform administrator is responsible for managing the monitoring alarm system of the whole system, and receives and manages all alarm information;
tenants are defined within the system, and the data of any two tenants is logically isolated; a tenant administrator can only receive and manage the alarm information of the projects under that tenant, and one tenant can manage multiple projects;
a project administrator can only receive and manage the alarm messages of that project;
S33: alarm messages are delivered using a bubbling strategy: an alarm message is first sent to the project administrator, then to the tenant administrator of the tenant to which the project belongs, and finally to the platform administrator.
5. The method for managing monitoring alarm notification based on a multi-tenant mode under a cloud container of claim 1, wherein S4 comprises the following substeps:
S41: alarm messages from the Alertmanager service are first sent to the front-end service;
S42: the front-end service searches for the notification channels that meet the conditions according to the three-level notification structure;
S43: the actual message sending interface is called according to the configuration of each channel to push the message.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310550342.XA CN116846729A (en) | 2023-05-16 | 2023-05-16 | Method for managing monitoring alarm notification based on multi-tenant mode under cloud container |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310550342.XA CN116846729A (en) | 2023-05-16 | 2023-05-16 | Method for managing monitoring alarm notification based on multi-tenant mode under cloud container |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116846729A (en) | 2023-10-03 |
Family
ID=88167813
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310550342.XA Pending CN116846729A (en) | 2023-05-16 | 2023-05-16 | Method for managing monitoring alarm notification based on multi-tenant mode under cloud container |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116846729A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117692302A (en) * | 2024-02-02 | 2024-03-12 | 深圳感臻智能股份有限公司 | Method and system for data collection, storage and intelligent monitoring and alarming |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117692302A (en) * | 2024-02-02 | 2024-03-12 | 深圳感臻智能股份有限公司 | Method and system for data collection, storage and intelligent monitoring and alarming |
CN117692302B (en) * | 2024-02-02 | 2024-05-28 | 深圳感臻智能股份有限公司 | Method and system for data collection, storage and intelligent monitoring and alarming |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||