CN106533792A - Method and device for monitoring and configuring resources - Google Patents
- Publication number: CN106533792A (application number CN201611140737.9A)
- Authority: CN (China)
- Prior art keywords: resource, monitoring, data, ganglia, information
- Legal status: Pending (Google notes that the listed legal status is an assumption, not a legal conclusion, and makes no representation as to its accuracy)
Classifications
All codes below fall under H (ELECTRICITY) > H04 (ELECTRIC COMMUNICATION TECHNIQUE) > H04L (TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION):
- H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
  - H04L41/06: Management of faults, events, alarms or notifications
    - H04L41/0631: Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    - H04L41/069: Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
  - H04L41/08: Configuration management of networks or network elements
    - H04L41/0803: Configuration setting
      - H04L41/0823: Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
    - H04L41/085: Retrieval of network configuration; Tracking network configuration history
- H04L67/00: Network arrangements or protocols for supporting network services or applications
  - H04L67/01: Protocols
    - H04L67/10: Protocols in which an application is distributed across nodes in the network
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Debugging And Monitoring (AREA)
Abstract
The embodiments of the invention disclose a method and a device for monitoring and configuring resources. The method comprises the following steps: Ganglia dynamically collects information from clusters with different functions, scores the information of each node in a resource monitoring system, and periodically records the scores in a log; Nagios sets alarms of different levels in a resource alarm system, sets plug-ins for sending different types of messages with customized message content, and obtains the data sent by Ganglia and records it in the log; and Nagios optimizes and redistributes tasks and resources based on the scores of a resource feedback system. As a result, the big data platform is optimized and adjusted based on quantified resource and job information, a knowledge base of historical data and handling methods is formed, the monitoring mode is easy to update, and problems that are encountered can be handled with reference to the knowledge base.
Description
Technical field
The embodiments of the present invention relate to the field of cluster monitoring, alarming and job tuning in big data, and in particular to a method and device for monitoring and configuring resources.
Background
In the field of big data processing, as the volume of data and the number of servers in data centers grow, the requirements for monitoring and using data and resources rise accordingly. Because clusters keep growing and programs demand ever more resources, the ability to monitor cluster state in real time and to give timely feedback on the cluster and on running jobs largely determines the overall functionality and operating efficiency of the entire big data platform.
Monitoring the nodes in a cluster, that is, tracking the state of each node, is an important part of cluster management. Ganglia is an application for monitoring cluster nodes and is widely used on the big data and cloud platforms of major Internet companies.
For system administrators, a network monitoring system serves two main purposes. First, it draws attention promptly to abnormal server conditions and raises alerts against preset thresholds, for example insufficient disk space, abnormal growth in CPU and memory usage, a sudden increase in running processes, jobs running noticeably slower than before, abnormal memory use in some stage of a job that makes it fail repeatedly, a node going down, or a large-scale outage across the cluster. Second, when problems occur in a complex application environment, such as a network interruption, an application error or a system crash, the alerts raised by the monitoring system let administrators quickly locate the problem among servers and applications and gain time to fix the fault.
Some key business systems have already deployed monitoring programs in production, but these programs have the following limitations:
The monitored items are limited, confined to CPU load, memory usage, disk space and similar items. Monitoring is isolated: it cannot be extended to other systems to integrate their monitoring data. Security is limited: the tools must probe other application service ports directly and remotely read system information such as Simple Network Management Protocol (SNMP) data, which is problematic in businesses with high network security requirements.
Nagios can comprehensively monitor the servers on a network, including the state of the services running on them (Apache, MySQL, FTP, DNS, Hadoop, HBase, Solr, etc.) and the state of server system resources.
The number of business systems running on big data application platforms keeps growing, and their integration and interaction increase daily, so the probability of problems between application architectures rises accordingly. An automated monitoring and feedback system can check the state of platform applications and services in real time, find system performance bottlenecks while jobs are running, and handle them automatically or raise alarms. This keeps the whole platform operating efficiently and reliably, reduces the workload of inspection staff and system administrators, improves work efficiency, improves program design, and reduces losses caused by failures.
The job scheduling system is an important part of managing the cluster and the jobs running on it. Big data platforms use many job scheduling systems, such as the DAG (directed acyclic graph) scheduling of Hadoop and Spark and the workflow scheduling of Oozie, but how to combine the scheduling system with resource monitoring is a problem every company must solve. In addition, in production use of a big data platform's monitoring system, it is valuable to combine monitoring data with real-time job data, feed the resulting scores back to the responsible engineers and managers, and save the cluster status and job information at the time of feedback as log records that form a knowledge base for future reference. System engineers can then understand the state of the existing cluster more deeply and prepare data for future cluster expansion.
Open-source (and commercial) monitoring software has the following two main problems:
(1) no single tool can monitor everything that is needed;
(2) these tools must be fully adapted to different customized tasks.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a method and device for monitoring and configuring resources that, while monitoring resources and jobs, optimize cluster resource utilization and program performance according to the monitoring results and raise alarms in time to limit losses.
To achieve this purpose, the embodiments of the present invention adopt the following technical solutions:
In a first aspect, a method for monitoring and configuring resources is provided, the method including:
Ganglia dynamically collects information from clusters with different functions, scores each node's information in resource monitoring, and periodically records the scores in a log;
Nagios sets alarms of different levels in a resource alarm system, sets plug-ins for sending different types of messages with customized message content, and obtains the data sent by Ganglia and records it in the log;
Nagios optimizes and redistributes tasks and resources according to the scores of the resource feedback system.
Preferably, before Ganglia dynamically collects information from the clusters with different functions, scores each node's information in resource monitoring and periodically records the scores in the log, the method further includes:
monitored devices are added, modified, deleted and queried through host management. Devices can be added by manual entry or by automatic topology discovery of all devices in a preset network segment. Automatic discovery requires the user to specify the network segment; all IP addresses in it are then scanned with ping, the type of each scanned device is determined, and the devices are added to the host table.
Preferably, Ganglia dynamically collecting information from the clusters with different functions includes:
Ganglia monitors the CPU and memory information of the nodes in the cluster; the program's resource occupancy during execution is determined from the different jobs in the Oozie workflow and their running status, and the required polling frequency is configured in the gmetad configuration file;
if the number of jobs running on the clusters with different functions exceeds a preset threshold, Ganglia formulates an optimization strategy for the program by combining the fed-back resource occupancy information and node states with the jobs and their running conditions in YARN or Mesos.
Preferably, Ganglia dynamically collecting information from the clusters with different functions includes:
obtaining the type of the monitored device, looking up the corresponding services by that type, and displaying the services that the device can monitor as a list.
Preferably, scoring each node's information in resource monitoring includes:
in each monitoring cycle, computing statistics on the data in that cycle, sampling the data in different time periods, and obtaining the maximum and minimum, mean and standard deviation of the statistical samples.
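The per-cycle statistics can be sketched as follows. This is a minimal illustration, assuming that samples arrive as (timestamp in seconds, value) pairs and that cycles are fixed-length windows; the names `CycleStats`, `cycle_stats` and `stats_per_cycle` are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator is meant.

```python
import statistics
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CycleStats:
    minimum: float
    maximum: float
    mean: float
    stdev: float  # population standard deviation (assumption; the text does not specify)


def cycle_stats(samples: list[float]) -> CycleStats:
    """Extreme values, mean and standard deviation of one monitoring cycle's samples."""
    if not samples:
        raise ValueError("a monitoring cycle needs at least one sample")
    return CycleStats(
        minimum=min(samples),
        maximum=max(samples),
        mean=statistics.fmean(samples),
        stdev=statistics.pstdev(samples),
    )


def stats_per_cycle(points: list[tuple[float, float]], cycle_s: float) -> dict[int, CycleStats]:
    """Group (timestamp_seconds, value) points into fixed-length cycles and summarize each."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in points:
        buckets[int(ts // cycle_s)].append(value)  # cycle index = whole cycles since t=0
    return {cycle: cycle_stats(values) for cycle, values in sorted(buckets.items())}
```

For example, `stats_per_cycle([(0, 1.0), (10, 3.0), (20, 5.0), (60, 2.0), (70, 4.0)], cycle_s=60)` yields two cycles, the first with minimum 1.0, maximum 5.0 and mean 3.0.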
Preferably, sampling the data in different time periods includes:
first selecting a reference time $t_1$; the data $V_i$ generated at time $t_i$ is sampled according to $p_i = w_i / u_i$, where the weight of $V_i$ is $w_i = f(t_i - t_1)$, $f$ is a monotonically non-decreasing function, specifically $w_i = e^{a(t_i - t_1)}$ with $a > 0$, and $u_i$ is a random number between 0 and 1.
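A sketch of this time-weighted sampling follows. It assumes that the sample keeps the $k$ items with the largest $p_i$; the text defines $p_i$ but not how it is used, so the top-$k$ rule (which makes the scheme resemble priority sampling with exponential, forward-decay style weights) is an assumption. The function name `decayed_priority_sample` is hypothetical. Comparing $\log p_i = a(t_i - t_1) - \log u_i$ instead of $p_i$ gives the same ordering while avoiding overflow of $e^{a(t_i - t_1)}$ for large time differences.

```python
import heapq
import math
import random
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def decayed_priority_sample(
    items: Iterable[tuple[float, T]],
    k: int,
    a: float,
    t_ref: float,
    rng: Optional[random.Random] = None,
) -> list[tuple[float, T]]:
    """Keep the k items (t_i, V_i) with the largest p_i = w_i / u_i,
    where w_i = exp(a * (t_i - t_ref)) and u_i is uniform on (0, 1].

    Selecting the top-k priorities is an assumption; the source only defines p_i.
    """
    if a <= 0:
        raise ValueError("a must be positive so that the weight function is increasing")
    if k <= 0:
        return []
    rng = rng or random.Random()
    keyed = []
    for t_i, v_i in items:
        u_i = 1.0 - rng.random()  # random() is in [0, 1); this maps it to (0, 1]
        # log p_i = a*(t_i - t_ref) - log(u_i): same ordering as p_i, but cannot overflow
        log_p = a * (t_i - t_ref) - math.log(u_i)
        keyed.append((log_p, t_i, v_i))
    top = heapq.nlargest(k, keyed, key=lambda entry: entry[0])
    return [(t_i, v_i) for _, t_i, v_i in top]
```

Newer data receives larger weights, so recent samples are more likely to be kept, while the random $u_i$ still lets older data appear in the sample.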
In a second aspect, a device for monitoring and configuring resources is provided, the device including:
a collection module, configured to dynamically collect information from clusters with different functions;
a scoring module, configured to score each node's information in resource monitoring;
a first logging module, configured to periodically record the scores in a log;
a second logging module, configured to set alarms of different levels in the resource alarm system, set plug-ins for sending different types of messages with customized message content, and obtain the data sent by Ganglia and record it in the log;
a distribution module, configured to optimize and redistribute tasks and resources according to the scores of the resource feedback system.
Preferably, the device further includes:
a processing module, configured to, before Ganglia dynamically collects information from the clusters with different functions, scores each node's information in resource monitoring and periodically records the scores in the log, add, modify, delete and query monitored devices through host management, where adding includes manual entry and automatic topology discovery of all devices in a preset network segment; automatic discovery requires the user to specify the network segment, then scans all IP addresses with ping, determines the type of each scanned device, and adds the devices to the host table.
Preferably, the collection module is specifically configured to:
monitor the CPU and memory information of the nodes in the cluster; determine the program's resource occupancy during execution from the different jobs in the Oozie workflow and their running status; configure the required polling frequency in the gmetad configuration file; and, if the number of jobs running on the clusters with different functions exceeds a preset threshold, formulate an optimization strategy for the program by combining the fed-back resource occupancy information and node states with the jobs and their running conditions in YARN or Mesos;
the collection module is further specifically configured to:
obtain the type of the monitored device, look up the corresponding services by that type, and display the services that the device can monitor as a list.
Preferably, the scoring module is specifically configured to: compute statistics on the data in each monitoring cycle, sample the data in different time periods, and obtain the maximum and minimum, mean and standard deviation of the statistical samples;
the scoring module is further specifically configured to:
select a reference time $t_1$ and sample the data $V_i$ generated at time $t_i$ according to $p_i = w_i / u_i$, where the weight of $V_i$ is $w_i = f(t_i - t_1)$, $f$ is a monotonically non-decreasing function, specifically $w_i = e^{a(t_i - t_1)}$ with $a > 0$, and $u_i$ is a random number between 0 and 1.
In the method and device for monitoring and configuring resources provided by the embodiments of the present invention, Ganglia dynamically collects information from clusters with different functions, scores each node's information in resource monitoring and periodically records the scores in a log; Nagios sets alarms of different levels in the resource alarm system, sets plug-ins for sending different types of messages with customized message content, and obtains the data sent by Ganglia and records it in the log; and Nagios optimizes and redistributes tasks and resources according to the scores of the resource feedback system. The operation of the big data platform is thus optimized and adjusted based on quantified resource and job information, a knowledge base of historical data and handling methods is formed, the monitoring mode is easy to update, and problems that are encountered can be handled with reference to the knowledge base.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a method for monitoring and configuring resources according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of the Ganglia data flow according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a Nagios performance-processing framework according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a cluster architecture according to an embodiment of the present invention;
Fig. 5 is a schematic flowchart of a method for monitoring and configuring resources according to an embodiment of the present invention;
Fig. 6 is a schematic flowchart of a monitoring configuration function according to an embodiment of the present invention;
Fig. 7 is a functional block diagram of a device for monitoring and configuring resources according to an embodiment of the present invention.
Detailed description of embodiments
The embodiments of the present invention are described in further detail below with reference to the drawings and examples. It should be understood that the specific embodiments described here only explain the embodiments of the present invention and do not limit them. It should also be noted that, for ease of description, the drawings show only the parts related to the embodiments of the present invention rather than the entire structure.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of a method for monitoring and configuring resources according to an embodiment of the present invention.
As shown in Fig. 1, the method for monitoring and configuring resources includes:
Step 101: Ganglia dynamically collects information from clusters with different functions, scores each node's information in resource monitoring, and periodically records the scores in a log.
As shown in Fig. 2, the Ganglia monitoring system consists of three main parts: gmond, gmetad and ganglia-web. They exchange monitoring data in XDR (External Data Representation, a compact binary encoding) or XML format. Each node in the cluster runs gmond, which collects and publishes the node's status information; gmetad periodically polls gmond for this information and stores it in RRD (round-robin database) files, which can then be queried and displayed through the web front end. gmond adds very little system load, so it can run on every server in the cluster without affecting user performance. Because the cluster nodes communicate over the network, synchronizing their clocks with NTP avoids "jitter" between nodes.
Secondary development on top of Ganglia follows a service-oriented architecture (SOA) pattern.
As shown in Fig. 3, Nagios is used for data collection in the big data platform. Because the format of the collected data is not suited to routine use and management, the performance data produced by Nagios monitoring must be parsed into data that conforms to the day-to-day management specifications and saved in the system database for display.
The performance-processing framework is designed as follows: the performance data collected by Nagios is sent over a socket to an in-house middleware program, which parses it into a unified format and then sends it to the system database.
For the performance data parsing program, the option for processing performance data (process_perf_data) must be enabled in the Nagios service definition; otherwise, no performance data is output. The command for processing performance data is defined in the command file:
Here, 192.168.251.60 is the IP address of the Nagios server used in the experiment. The parsing program is implemented with sockets, packaged as a JAR and registered as a service, as follows:
(1) Check the performance data; if it is null, report an error and prompt the user; otherwise, look up the component and metric associated with the resource and set the index to 1.
(2) Split the performance data string into an array using a regular expression.
(3) Loop over each element of the array and split it at the equals sign: the left side is the metric name; split the right side at semicolons and take the first element, which is the metric value.
(4) Query whether the monitored instance exists in the database. If it does not, skip it; if it does, proceed to the next step.
(5) Using the service name and metric name, query whether the metric exists in the database; if it does not, add a new alert type.
(6) Store the metric value in the database.
Preferably, Ganglia dynamically collecting information from the clusters with different functions includes:
Ganglia monitors the CPU and memory information of the nodes in the cluster; the program's resource occupancy during execution is determined from the different jobs in the Oozie workflow and their running status, and the required polling frequency is configured in the gmetad configuration file;
if the number of jobs running on the clusters with different functions exceeds a preset threshold, Ganglia formulates an optimization strategy for the program by combining the fed-back resource occupancy information and node states with the jobs and their running conditions in YARN or Mesos.
Preferably, Ganglia dynamically collecting information from the clusters with different functions includes:
obtaining the type of the monitored device, looking up the corresponding services by that type, and displaying the services that the device can monitor as a list.
Preferably, scoring each node's information in resource monitoring includes:
in each monitoring cycle, computing statistics on the data in that cycle, sampling the data in different time periods, and obtaining the maximum and minimum, mean and standard deviation of the statistical samples.
Sampling the data in different time periods includes:
first selecting a reference time $t_1$; the data $V_i$ generated at time $t_i$ is sampled according to $p_i = w_i / u_i$, where the weight of $V_i$ is $w_i = f(t_i - t_1)$, $f$ is a monotonically non-decreasing function, specifically $w_i = e^{a(t_i - t_1)}$ with $a > 0$, and $u_i$ is a random number between 0 and 1.
Step 102: Nagios sets alarms of different levels in the resource alarm system, sets plug-ins for sending different types of messages with customized message content, and obtains the data sent by Ganglia and records it in the log;
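Alarm levels in Nagios are usually expressed through the plugin convention: a check exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN) and prints one status line, optionally followed by `|` and performance data. The sketch below maps a metric value (for example one received from Ganglia) to those levels with a customizable message template. The function name, the template parameter and the assumption that higher values are worse are all illustrative, not taken from the text.

```python
# Standard Nagios plugin return codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_TEXT = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(
    metric: str,
    value: "float | None",
    warn: float,
    crit: float,
    unit: str = "",
    template: str = "{status} - {metric} is {value}{unit}",
) -> tuple[int, str]:
    """Return (exit_code, output_line); assumes higher values are worse."""
    if value is None:
        return UNKNOWN, f"UNKNOWN - no data received for {metric}"
    if value >= crit:
        code = CRITICAL
    elif value >= warn:
        code = WARNING
    else:
        code = OK
    # Customized message content, followed by perfdata: label=value[UOM];warn;crit
    text = template.format(status=STATUS_TEXT[code], metric=metric, value=value, unit=unit)
    return code, f"{text} | {metric}={value}{unit};{warn};{crit}"
```

A plugin script would print the returned line and pass the code to `sys.exit`, so that Nagios records the level and triggers whichever notification plug-in (such as SMS or email) is configured for it.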
Step 103: Nagios optimizes and redistributes tasks and resources according to the scores of the resource feedback system.
Specifically, as shown in Fig. 4, the cluster environment comprises both software and hardware. Different clusters can form different groups, such as a Hadoop group, a Solr group and a Spark group. The cluster nodes generally run Linux; CentOS 6.4 was used in the experiments for this design. The big data platform of the production system consists of different functional cluster components. Because the underlying storage of these components is Hadoop HDFS, Hadoop metrics must be configured so that the monitoring functions of Ganglia and Nagios can be associated with the clusters. Ganglia dynamically collects information from the clusters with different functions, scores each node's information in resource monitoring and periodically records the scores in the log, so that the business can later be reviewed and adjusted based on historical data. Alarms of different levels can be set in the resource alarm system, together with plug-ins for sending different types of messages with customized message content; Nagios reacts to the data passed in by Ganglia and records it in the corresponding log. The scoring criteria of the resource feedback system can be set as shown in Table 1, alarm behavior is defined according to these criteria, and tasks and resources can be optimized and redistributed according to the feedback system's scores.
Table 1
The scoring model in Table 1 defaults to a linear scoring relationship; other models can be used depending on the actual data and requirements.
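Because the contents of Table 1 are not reproduced in this text, the following is only a sketch of what a default linear scoring model could look like: each metric is mapped linearly from a "worst" value (score 0) to a "best" value (full score) and clamped, and a node's score is the weighted mean of its metric scores. The breakpoints, the weights and the function names are hypothetical.

```python
def linear_score(value: float, best: float, worst: float, max_score: float = 100.0) -> float:
    """Linear map: value == best -> max_score, value == worst -> 0, clamped to [0, max_score]."""
    if best == worst:
        raise ValueError("best and worst must differ")
    fraction = (value - worst) / (best - worst)
    return max_score * min(1.0, max(0.0, fraction))


def node_score(metrics: dict[str, float], criteria: dict[str, tuple[float, float, float]]) -> float:
    """criteria maps metric name -> (best, worst, weight); returns the weighted mean score."""
    total_weight = sum(weight for _, _, weight in criteria.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(
        linear_score(metrics[name], best, worst) * weight
        for name, (best, worst, weight) in criteria.items()
    )
    return weighted / total_weight
```

For instance, with CPU and memory utilization both scored from 0 % (best) to 100 % (worst) at equal weight, a node at 50 % CPU and 25 % memory scores 62.5. Swapping `linear_score` for another function is how "other models" could be plugged in.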
In the method for monitoring and configuring resources provided by the embodiments of the present invention, Ganglia dynamically collects information from clusters with different functions, scores each node's information in resource monitoring and periodically records the scores in a log; Nagios sets alarms of different levels in the resource alarm system, sets plug-ins for sending different types of messages with customized message content, and obtains the data sent by Ganglia and records it in the log; and Nagios optimizes and redistributes tasks and resources according to the scores of the resource feedback system. The operation of the big data platform is thus optimized and adjusted based on quantified resource and job information, a knowledge base of historical data and handling methods is formed, the monitoring mode is easy to update, and problems that are encountered can be handled with reference to the knowledge base.
Referring to Fig. 5, Fig. 5 is a schematic flowchart of a method for monitoring and configuring resources according to an embodiment of the present invention.
As shown in Fig. 5, the method for monitoring and configuring resources includes:
Step 501: monitored devices are added, modified, deleted and queried through host management. Devices can be added by manual entry or by automatic topology discovery of all devices in a preset network segment; automatic discovery requires the user to specify the network segment, then scans all IP addresses in it with ping, determines the type of each scanned device, and adds the devices to the host table;
Specifically, as shown in Fig. 6, the modules of the monitoring system are designed as follows:
Host and host group management: (1) host name, (2) network address, (3) monitoring period, (4) contacts, (5) notification period.
Service and service group management: (1) host name, (2) monitoring command, (3) monitoring period, notification contacts, notification period, etc.
Time rule management: (1) name, (2) specifically defined time periods, (3) dates specified within a time period, (4) special dates (such as holidays that need not be monitored).
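In Nagios, time rules of this kind are expressed as `timeperiod` objects. The fragment below is a hedged sketch, not the configuration used in the experiment: the period names, hours and holiday dates are hypothetical. It defines weekday monitoring hours and excludes a separate period that lists the special dates.

```
# Hypothetical names and dates; holidays are a separate period that is excluded
define timeperiod {
    timeperiod_name   cluster-holidays
    alias             Dates that are not monitored
    january 1         00:00-24:00
    october 1         00:00-24:00
}

define timeperiod {
    timeperiod_name   cluster-workdays
    alias             Monitored working hours
    monday            08:00-20:00
    tuesday           08:00-20:00
    wednesday         08:00-20:00
    thursday          08:00-20:00
    friday            08:00-20:00
    exclude           cluster-holidays
}
```

Host and service definitions can then refer to `cluster-workdays` in their `check_period` or `notification_period` directives.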
Monitored device be added by Host Administration, changed, being deleted and inquiry operation.Addition equipment supports hand
Dynamic addition, i.e. manual input device title and IP address;Also support the side of all devices in the automatic topology discovery network segment
Formula, finds to need user-specified network section automatically, and acquiescence is the gateway that server is located herein, is then scanned in the way of ping
All of IP, judgement scan the type of each equipment, are finally then added to (res_host) in host table.
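The discovery procedure can be sketched as a ping sweep over the host addresses of a subnet. This sketch assumes the Linux iputils `ping` (where `-c` is the count and `-W` the timeout in seconds) and a SQLite table named after the text's `res_host`. Its columns are hypothetical, and device classification is left as a caller-supplied function because the text does not say how the type is determined. The probe is injectable so the logic can be tested without network access.

```python
import ipaddress
import sqlite3
import subprocess
from typing import Callable


def ping_once(ip: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo using the Linux iputils `ping` (-c count, -W timeout in seconds)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def discover_hosts(
    conn: sqlite3.Connection,
    subnet: str,
    is_alive: Callable[[str], bool] = ping_once,
    classify: Callable[[str], str] = lambda ip: "unknown",
) -> list[tuple[str, str]]:
    """Ping every host address in `subnet`, classify responders, and record them in res_host."""
    conn.execute("CREATE TABLE IF NOT EXISTS res_host (ip TEXT PRIMARY KEY, device_type TEXT)")
    found = []
    for addr in ipaddress.ip_network(subnet, strict=False).hosts():
        ip = str(addr)
        if is_alive(ip):
            device_type = classify(ip)  # how the type is determined is not specified in the text
            conn.execute(
                "INSERT OR REPLACE INTO res_host (ip, device_type) VALUES (?, ?)",
                (ip, device_type),
            )
            found.append((ip, device_type))
    conn.commit()
    return found
```

Pinging addresses one at a time is slow for a /24 segment; a thread pool around `ping_once` would be a natural refinement.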
Specific monitoring operations are applied to monitored objects through resource configuration. Resource configuration first finds the type of the selected device, then finds the corresponding services by that type and displays the services that the device can monitor as a list. After confirmation, the confirmed services are added to the resource instance list, and finally the device and service instances are written to the configuration file.
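The last step, writing device and service instances to the configuration file, can be sketched by rendering Nagios `define host` and `define service` objects. The type-to-service mapping below is hypothetical. The templates `linux-server` and `generic-service` and the commands `check_ping`, `check_ssh` and `check_http` come from the Nagios sample configuration and are assumed to be present.

```python
from typing import Optional

# Hypothetical mapping from device type to (service_description, check_command)
SERVICES_BY_TYPE = {
    "linux-server": [("PING", "check_ping!100.0,20%!500.0,60%"), ("SSH", "check_ssh")],
    "web-server": [("PING", "check_ping!100.0,20%!500.0,60%"), ("HTTP", "check_http")],
}


def services_for(device_type: str) -> list[tuple[str, str]]:
    """Look up the services that can be monitored for a device type (shown to the user as a list)."""
    return SERVICES_BY_TYPE.get(device_type, [])


def render_nagios_objects(
    host_name: str, address: str, device_type: str, confirmed: Optional[set[str]] = None
) -> str:
    """Render one host and its confirmed services as Nagios object definitions."""
    services = services_for(device_type)
    if confirmed is not None:  # keep only the services the user confirmed
        services = [s for s in services if s[0] in confirmed]
    lines = [
        "define host {",
        "    use                  linux-server",
        f"    host_name            {host_name}",
        f"    alias                {host_name}",
        f"    address              {address}",
        "}",
    ]
    for description, command in services:
        lines += [
            "define service {",
            "    use                  generic-service",
            f"    host_name            {host_name}",
            f"    service_description  {description}",
            f"    check_command        {command}",
            "}",
        ]
    return "\n".join(lines) + "\n"
```

The returned text can be appended to a file under the Nagios object directory, after which the configuration is normally validated with `nagios -v` before reloading.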
Step 502: Ganglia dynamically collects information from clusters with different functions, scores each node's information in resource monitoring, and periodically records the scores in a log;
Step 503: Nagios sets alarms of different levels in the resource alarm system, sets plug-ins for sending different types of messages with customized message content, and obtains the data sent by Ganglia and records it in the log;
Step 504: Nagios optimizes and redistributes tasks and resources according to the scores of the resource feedback system.
On the application presentation side, the various application data from resource monitoring can be collected and organized, alarm information and its sources can be presented, and reference suggestions can be provided for cluster expansion and task allocation. Different plug-ins can be configured as needed to obtain different alarm types. System personnel and programmers can also combine the aggregated information with past monitoring knowledge to evaluate the current running state comprehensively and then allocate resources and tasks further; they can also view the resource and job running status for a given time period based on job running conditions, as shown in Fig. 6.
For system personnel, models can be built for the cluster platforms serving different functions, mainly covering cluster grouping, configuration of the scoring system, post-deployment evaluation of different tasks and jobs, and early-warning configuration for the different functional clusters (the default alarm channels here are SMS and email). Programmers can set up and view the running status of their job programs and evaluate the overall running metrics of a job from important parameters such as job run time, number of CPU cores occupied, memory ratio and peak thread count, using these as important references for improving program design and performance.
Ganglia monitors the CPU and memory information of the cluster nodes; combining this with the different jobs in the Oozie workflow and their running status determines the program's resource occupancy during execution, and the required polling frequency (generally 30 ms) is configured in the gmetad configuration file. If many jobs are still running on the cluster, the job running conditions fed back by Ganglia must be combined with the resource occupancy information and node states in YARN or Mesos to optimize the job flow and adjust job details, and the optimization strategy for the program is formulated accordingly.
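The following sketch shows one way the job-count check could combine YARN job information with Ganglia node metrics. It assumes the YARN ResourceManager REST endpoint `/ws/v1/cluster/apps?states=RUNNING` on the default web port 8088, whose application entries include `allocatedMB` and `allocatedVCores`. The CPU-utilization input (for example 100 minus Ganglia's `cpu_idle`), the thresholds and the shape of the returned "strategy" are hypothetical. Mesos is not covered.

```python
import json
import urllib.request
from typing import Optional


def fetch_running_apps(rm_host: str, port: int = 8088, timeout: float = 10.0) -> dict:
    """Query the YARN ResourceManager REST API for running applications."""
    url = f"http://{rm_host}:{port}/ws/v1/cluster/apps?states=RUNNING"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def plan_optimization(
    apps_json: dict,
    node_cpu_pct: dict[str, float],
    max_running_jobs: int,
    hot_cpu_pct: float = 85.0,
    top_n: int = 3,
) -> Optional[dict]:
    """If more jobs run than the preset threshold, list the heaviest jobs and the hot nodes."""
    apps = (apps_json.get("apps") or {}).get("app") or []  # YARN returns {"apps": null} when empty
    if len(apps) <= max_running_jobs:
        return None  # at or below the preset job-count threshold: nothing to do
    heaviest = sorted(
        apps,
        key=lambda app: (app.get("allocatedMB", 0), app.get("allocatedVCores", 0)),
        reverse=True,
    )[:top_n]
    hot_nodes = sorted(host for host, cpu in node_cpu_pct.items() if cpu >= hot_cpu_pct)
    return {
        "running_jobs": len(apps),
        "heaviest_jobs": [(a["id"], a.get("allocatedMB", 0), a.get("allocatedVCores", 0)) for a in heaviest],
        "hot_nodes": hot_nodes,
    }
```

The returned summary is only a starting point: deciding whether to reschedule, throttle or tune the heaviest jobs remains the optimization strategy that the text leaves to the operators.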
In job runs on the current cluster, computation and I/O costs sometimes push the time needed to obtain query results and final results beyond the prescribed limit. This is especially true of a running Spark cluster, which consumes a large amount of memory and can be affected by other jobs on the same cluster. In that case the program needs to be optimized according to the feedback information by applying targeted compression strategies (such as Snappy and LZO) and serialization strategies (Protobuf, Kryo, or Avro) to reduce resource consumption.
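For Spark, the Kryo and Snappy choices map onto standard configuration keys. The sketch below builds a `spark-submit` argument list that applies them; the keys `spark.serializer`, `spark.io.compression.codec` and `spark.rdd.compress` are real Spark settings, while the helper `spark_submit_args` and the chosen values are illustrative. LZO is not one of Spark's built-in codecs for `spark.io.compression.codec` and usually requires a third-party Hadoop LZO library, so it is left out here.

```python
# Real Spark configuration keys; the values are illustrative choices.
SPARK_TUNING = {
    # Kryo is usually faster and more compact than Java serialization.
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    # Codec for internal data such as shuffle output and broadcast blocks.
    "spark.io.compression.codec": "snappy",
    # Compress serialized cached RDD partitions (trades CPU for memory).
    "spark.rdd.compress": "true",
}


def spark_submit_args(app, conf=None):
    """Build a spark-submit argument list applying the tuning settings."""
    conf = SPARK_TUNING if conf is None else conf
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args + [app]
```

The list can be passed directly to `subprocess.run`; keeping the settings in one dictionary makes it easy to switch codecs per job when feedback shows a job is CPU-bound rather than memory-bound.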
Referring to Fig. 7, Fig. 7 is a schematic diagram of the functional modules of a device for monitoring and configuring resources provided by an embodiment of the present invention.
As shown in Fig. 7, the device includes:
Collection module 701, configured to dynamically collect information from the different functional clusters;
Grading module 702, configured to score each node's information in resource monitoring;
First logging module 703, configured to periodically record the results in a log;
Second logging module 704, configured to set alerts of different levels in the resource alarm system, set up different kinds of message-sending plug-ins and custom message content, and obtain the data sent by Ganglia and record it in the log;
Distribution module 705, configured to optimize and redistribute tasks and resources according to the scores of the resource feedback system.
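The alert levels used by the second logging module follow the Nagios plugin convention: a check reports its level through exit code 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN) plus one status line, optionally followed by performance data after a `|`. The sketch below classifies one metric under that convention; the function `check_metric` and the simple "higher is worse" comparison are assumptions and do not implement Nagios's full threshold-range syntax.

```python
# Nagios plugin convention: exit code 0..3 plus one status line, with
# optional performance data after a '|' character.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_metric(name, value, warn, crit):
    """Classify a metric against warning/critical thresholds (higher is worse).

    Returns (exit_code, status_line) in the form a Nagios plugin would emit.
    """
    if value is None:  # no data received from the collector
        return UNKNOWN, f"UNKNOWN - no data for {name}"
    if value >= crit:
        code = CRITICAL
    elif value >= warn:
        code = WARNING
    else:
        code = OK
    # Performance data format: label=value;warn;crit
    return code, f"{LABELS[code]} - {name} is {value} | {name}={value};{warn};{crit}"
```

For example, `check_metric("load_one", 7.5, 4, 8)` returns `(1, "WARNING - load_one is 7.5 | load_one=7.5;4;8")`. A real plugin would print the line and exit with the code; delivery of the alert by SMS or email is configured separately in Nagios.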
Preferably, the device further includes:
a processing module, configured to add, modify, delete, and query monitored devices through host management before Ganglia dynamically collects information from the different functional clusters, scores each node's information in resource monitoring, and periodically records it in a log. The add operation supports manual entry and automatic topology discovery of all devices in a preset network segment. Automatic topology discovery works as follows: the network segment specified by the user is determined, all IP addresses in it are scanned by ping, the type of each device found is determined, and the device is then added to the host table.
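The discovery step can be sketched with Python's standard `ipaddress` module and the system `ping` command. The `ping` flags assume Linux iputils (`-c 1` sends one echo request, `-W 1` waits one second), and the `classify` callback is a hypothetical placeholder, since the text does not say how device types are determined. The `discover` function and the dictionary layout of the host table are likewise illustrative.

```python
import ipaddress
import subprocess


def ping(ip: str) -> bool:
    """Send one ICMP echo using the system ping (Linux iputils flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def discover(cidr, probe=ping, classify=lambda ip: "unknown"):
    """Scan every host address in cidr and return a host table of responders.

    probe and classify are injectable so the scan can be tested offline;
    classify stands in for a real device-type detector (hypothetical).
    """
    host_table = []
    for addr in ipaddress.ip_network(cidr, strict=False).hosts():
        ip = str(addr)
        if probe(ip):
            host_table.append({"ip": ip, "type": classify(ip)})
    return host_table
```

A sequential scan of a /24 segment can take minutes when many addresses time out; running the probes in a thread pool is a common way to shorten it.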
Preferably, the collection module 701 is specifically configured to:
monitor the CPU and memory information of the nodes in the cluster; determine the program's resource occupancy while running according to the different jobs in the Oozie workflow and their running status; configure the required heartbeat frequency in the gmetad configuration file; and, if the number of jobs running on the different functional clusters exceeds a preset job-count threshold, formulate an optimization strategy for the program by combining the fed-back resource occupancy information and the states of the different nodes with the jobs and their running status in YARN or Mesos.
The collection module 701 is further specifically configured to:
obtain the type of the monitored device, look up the corresponding services by that type, and display in list form the services that can be monitored on the device.
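The type-to-service lookup amounts to a table keyed by device type. The device types, service names, and both functions below are hypothetical examples; the text does not list which services belong to which type.

```python
# Hypothetical mapping from device type to the services that can be checked.
SERVICES_BY_TYPE = {
    "linux_server": ["PING", "SSH", "CPU load", "Disk usage", "Memory"],
    "switch": ["PING", "SNMP uptime", "Port status"],
    "database": ["PING", "MySQL connection", "Replication lag"],
}


def monitorable_services(device_type):
    """Return the services for a device type; unknown types get PING only."""
    return list(SERVICES_BY_TYPE.get(device_type, ["PING"]))


def render_list(hostname, device_type):
    """Format the device's monitorable services as a simple text list."""
    lines = [f"{hostname} ({device_type})"]
    lines += [f"  - {svc}" for svc in monitorable_services(device_type)]
    return "\n".join(lines)
```

Falling back to a basic reachability check for unrecognized types means every discovered device can be monitored at least minimally until its type is classified.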
Preferably, the grading module 702 is specifically configured to: compute statistics over the data in each monitoring cycle, sample the data in different time periods, and obtain the extreme values (maximum and minimum), mean, and standard deviation of the statistical samples.
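The per-cycle statistics can be sketched by bucketing timestamped samples into fixed-length monitoring cycles. The `(timestamp, value)` sample format, the `cycle_stats` function and the use of the population standard deviation are assumptions; the text does not fix these details.

```python
import statistics
from collections import defaultdict


def cycle_stats(samples, cycle_s):
    """Group (timestamp_s, value) samples into fixed monitoring cycles.

    Returns {cycle_start: (min, max, mean, stdev)} ordered by cycle start.
    The population standard deviation is used, so one sample gives 0.0.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[int(ts // cycle_s) * cycle_s].append(value)
    out = {}
    for start, values in sorted(buckets.items()):
        out[start] = (
            min(values),
            max(values),
            statistics.mean(values),
            statistics.pstdev(values),
        )
    return out
```

For example, `cycle_stats([(0, 2), (10, 4), (60, 10)], 60)` places the first two samples in the cycle starting at 0 (min 2, max 4, mean 3, standard deviation 1.0) and the third in the cycle starting at 60.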
The grading module 702 is further specifically configured to:
first select a reference time $t_1$ and sample the data $V_i$ generated at time $t_i$ according to $p_i = w_i / u_i$, where the weight of $V_i$ is $w_i = f(t_i - t_1)$ and $f$ is a monotonically non-decreasing function; specifically, $w_i = e^{a(t_i - t_1)}$ with $a > 0$, and $u_i$ is a random number between 0 and 1.
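The text does not say how the priority $p_i$ is used once computed. One standard reading, assumed here, is time-decayed priority sampling: keep the $k$ items with the largest $p_i$, so that newer data (larger $w_i$) is more likely to survive. Because $e^{a(t_i - t_1)}$ overflows quickly, the sketch compares $\ln p_i = a(t_i - t_1) - \ln u_i$ instead, which preserves the ordering. The function name and the parameter $k$ are illustrative.

```python
import heapq
import math
import random


def decayed_priority_sample(items, k, t1, a, rng=random):
    """Keep the k (t_i, value) items with the largest p_i = w_i / u_i,
    where w_i = exp(a * (t_i - t1)) and u_i is uniform on (0, 1].

    Priorities are compared in log space to avoid overflowing exp().
    """
    if a <= 0:
        raise ValueError("a must be positive")
    if k <= 0:
        return []
    heap = []  # min-heap of (log_priority, t_i, value); root is the weakest kept
    for t_i, value in items:
        u_i = 1.0 - rng.random()  # in (0, 1], so log(u_i) is defined
        log_p = a * (t_i - t1) - math.log(u_i)
        entry = (log_p, t_i, value)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif log_p > heap[0][0]:
            heapq.heapreplace(heap, entry)  # evict the lowest-priority item
    return [(t_i, value) for _, t_i, value in sorted(heap, reverse=True)]
```

The min-heap keeps memory at $O(k)$ regardless of stream length. Larger values of $a$ favor recent samples more strongly, while $a$ close to 0 approaches uniform random sampling.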
In the device for monitoring and configuring resources provided by the embodiment of the present invention, Ganglia dynamically collects information from the different functional clusters, scores each node's information in resource monitoring, and periodically records it in a log; Nagios sets alerts of different levels in the resource alarm system, sets up different types of message-sending plug-ins and custom message content, and obtains the data sent by Ganglia and records it in the log; and Nagios optimizes and redistributes tasks and resources according to the scores of the resource feedback system. Jobs on the big-data platform can therefore be optimized and adjusted based on quantified resource and job information. Historical data and handling methods form a knowledge base, which makes it easy to update the monitoring approach and to resolve problems by consulting that knowledge base.
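The redistribution step is not specified beyond "according to the scores". A minimal sketch, assuming a higher node score means more spare capacity and that each task has a known resource demand, is a greedy placement that always picks the currently best-scoring node and lowers that node's effective score by the task's demand. The function `redistribute`, the demand units and the largest-task-first order are all assumptions.

```python
import heapq


def redistribute(tasks, node_scores):
    """Greedily place tasks on nodes, favoring high-scoring nodes.

    tasks:       {task_id: demand}  (hypothetical resource units)
    node_scores: {node: score}      (higher = healthier / more spare capacity)
    Each placement lowers the chosen node's effective score by the task's
    demand, so load spreads instead of piling onto the single best node.
    """
    if not node_scores:
        raise ValueError("no nodes available")
    # Max-heap via negated scores; the node name breaks ties deterministically.
    heap = [(-score, node) for node, score in node_scores.items()]
    heapq.heapify(heap)
    placement = {}
    # Place the largest tasks first, a common greedy load-balancing heuristic.
    for task_id, demand in sorted(tasks.items(), key=lambda kv: -kv[1]):
        neg_score, node = heapq.heappop(heap)
        placement[task_id] = node
        heapq.heappush(heap, (neg_score + demand, node))
    return placement
```

For example, `redistribute({"a": 5, "b": 3, "c": 1}, {"n1": 10, "n2": 8})` sends `a` to `n1`, `b` to `n2`, and `c` back to `n1`, since both nodes then have an effective score of 5 and the tie is broken by name.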
The technical principles of the embodiments of the present invention have been described above with reference to specific embodiments. These descriptions are intended only to explain the principles of the embodiments and must not be construed in any way as limiting the scope of protection of the embodiments. Based on the explanation herein, those skilled in the art can conceive of other specific implementations of the embodiments without creative effort, and such implementations also fall within the scope of protection of the embodiments of the present invention.
Claims (10)
1. A method for monitoring and configuring resources, characterized in that the method comprises:
Ganglia dynamically collecting information from different functional clusters, scoring each node's information in resource monitoring, and periodically recording it in a log;
Nagios setting alerts of different levels in a resource alarm system, setting up different types of message-sending plug-ins and custom message content, and obtaining the data sent by Ganglia and recording it in the log;
Nagios optimizing and redistributing tasks and resources according to the scores of the resource feedback system.
2. The method according to claim 1, characterized in that, before Ganglia dynamically collects information from the different functional clusters, scores each node's information in resource monitoring, and periodically records it in the log, the method further comprises:
adding, modifying, deleting, and querying monitored devices through host management, wherein the adding operation includes manual entry and automatic topology discovery of all devices in a preset network segment, and the automatic topology discovery of all devices in the preset network segment comprises: determining the network segment specified by the user, scanning all IP addresses in that segment by ping, determining the type of each device found, and then adding the device to the host table.
3. The method according to claim 1, characterized in that Ganglia dynamically collecting information from the different functional clusters comprises:
Ganglia monitoring the CPU and memory information of the nodes in the cluster, determining the program's resource occupancy while running according to the different jobs in the Oozie workflow and their running status, and configuring the required heartbeat frequency in the gmetad configuration file;
if the number of jobs running on the different functional clusters exceeds a preset job-count threshold, formulating an optimization strategy for the program by combining the resource occupancy information and the states of the different nodes fed back by Ganglia with the jobs and their running status in YARN or Mesos.
4. The method according to claim 1, characterized in that Ganglia dynamically collecting information from the different functional clusters comprises:
obtaining the type of the monitored device, looking up the corresponding services by that type, and displaying in list form the services that can be monitored on the device.
5. The method according to claim 1, characterized in that scoring each node's information in resource monitoring comprises:
sampling the data in different time periods, computing statistics over the data in each monitoring cycle, and obtaining the extreme values (maximum and minimum), mean, and standard deviation of the statistical samples.
6. The method according to claim 5, characterized in that sampling the data in different time periods comprises:
first selecting a reference time $t_1$ and sampling the data $V_i$ generated at time $t_i$ according to $p_i = w_i / u_i$, where the weight of $V_i$ is $w_i = f(t_i - t_1)$ and $f$ is a monotonically non-decreasing function; specifically, $w_i = e^{a(t_i - t_1)}$ with $a > 0$, and $u_i$ is a random number between 0 and 1.
7. A device for monitoring and configuring resources, characterized in that the device comprises:
a collection module, configured to dynamically collect information from different functional clusters;
a grading module, configured to score each node's information in resource monitoring;
a first logging module, configured to periodically record the results in a log;
a second logging module, configured to set alerts of different levels in a resource alarm system, set up different types of message-sending plug-ins and custom message content, and obtain the data sent by Ganglia and record it in the log;
a distribution module, configured to optimize and redistribute tasks and resources according to the scores of the resource feedback system.
8. The device according to claim 7, characterized in that the device further comprises:
a processing module, configured to add, modify, delete, and query monitored devices through host management before Ganglia dynamically collects information from the different functional clusters, scores each node's information in resource monitoring, and periodically records it in the log, wherein the adding operation includes manual entry and automatic topology discovery of all devices in a preset network segment, and the automatic topology discovery comprises: determining the network segment specified by the user, scanning all IP addresses in that segment by ping, determining the type of each device found, and then adding the device to the host table.
9. The device according to claim 7, characterized in that the collection module is specifically configured to:
monitor the CPU and memory information of the nodes in the cluster; determine the program's resource occupancy while running according to the different jobs in the Oozie workflow and their running status; configure the required heartbeat frequency in the gmetad configuration file; and, if the number of jobs running on the different functional clusters exceeds a preset job-count threshold, formulate an optimization strategy for the program by combining the fed-back resource occupancy information and the states of the different nodes with the jobs and their running status in YARN or Mesos;
the collection module is further specifically configured to:
obtain the type of the monitored device, look up the corresponding services by that type, and display in list form the services that can be monitored on the device.
10. The device according to claim 7, characterized in that the grading module is specifically configured to: compute statistics over the data in each monitoring cycle, sample the data in different time periods, and obtain the extreme values (maximum and minimum), mean, and standard deviation of the statistical samples;
the grading module is further specifically configured to:
first select a reference time $t_1$ and sample the data $V_i$ generated at time $t_i$ according to $p_i = w_i / u_i$, where the weight of $V_i$ is $w_i = f(t_i - t_1)$ and $f$ is a monotonically non-decreasing function; specifically, $w_i = e^{a(t_i - t_1)}$ with $a > 0$, and $u_i$ is a random number between 0 and 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611140737.9A CN106533792A (en) | 2016-12-12 | 2016-12-12 | Method and device for monitoring and configuring resources |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106533792A true CN106533792A (en) | 2017-03-22 |
Family
ID=58342011
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611140737.9A Pending CN106533792A (en) | 2016-12-12 | 2016-12-12 | Method and device for monitoring and configuring resources |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050132041A1 (en) * | 2003-12-10 | 2005-06-16 | Ashish Kundu | Systems, methods and computer programs for monitoring distributed resources in a data processing environment |
CN103905253A (en) * | 2014-04-04 | 2014-07-02 | 浪潮电子信息产业股份有限公司 | Server monitoring and management method based on Nagios and BMC |
CN104092575A (en) * | 2014-07-29 | 2014-10-08 | 中国联合网络通信集团有限公司 | Resource monitoring method and system |
CN105208098A (en) * | 2015-08-24 | 2015-12-30 | 用友网络科技股份有限公司 | Cloud monitoring system realization device and method |
CN105260235A (en) * | 2015-09-23 | 2016-01-20 | 浪潮集团有限公司 | Method and device for scheduling resources on basis of application scenarios in cloud platform |
CN105718351A (en) * | 2016-01-08 | 2016-06-29 | 北京汇商融通信息技术有限公司 | Hadoop cluster-oriented distributed monitoring and management system |
CN105975378A (en) * | 2016-05-11 | 2016-09-28 | 国网江苏省电力公司 | Distributed layering autonomous monitoring and management system facing supercomputer |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108243061A (en) * | 2017-10-10 | 2018-07-03 | 北京车和家信息技术有限公司 | Apparatus monitoring method, device and computer equipment based on Nagios |
CN108616421A (en) * | 2018-04-13 | 2018-10-02 | 郑州云海信息技术有限公司 | A kind of condition detection method of multi-node cluster, device and equipment |
CN108845865A (en) * | 2018-06-28 | 2018-11-20 | 郑州云海信息技术有限公司 | A kind of monitoring service dispositions method, system and storage medium |
CN110795301A (en) * | 2018-08-01 | 2020-02-14 | 马上消费金融股份有限公司 | Job monitoring method, device, terminal and computer storage medium |
CN111435319A (en) * | 2019-01-15 | 2020-07-21 | 阿里巴巴集团控股有限公司 | Cluster management method and device |
CN109951313B (en) * | 2019-01-18 | 2022-04-19 | 长江大学 | Monitoring device and method for Hadoop cloud platform |
CN109951313A (en) * | 2019-01-18 | 2019-06-28 | 长江大学 | A kind of monitoring device and method of Hadoop cloud platform |
CN110545326A (en) * | 2019-09-10 | 2019-12-06 | 杭州数梦工场科技有限公司 | Cluster load scheduling method and device, electronic equipment and storage medium |
CN110545326B (en) * | 2019-09-10 | 2022-09-16 | 杭州数梦工场科技有限公司 | Cluster load scheduling method and device, electronic equipment and storage medium |
CN112291194A (en) * | 2020-09-27 | 2021-01-29 | 上海赫千电子科技有限公司 | State management method and device based on ECU in vehicle-mounted network and intelligent automobile |
CN112291194B (en) * | 2020-09-27 | 2022-12-13 | 上海赫千电子科技有限公司 | State management method and device based on ECU in vehicle-mounted network and intelligent automobile |
CN112241349A (en) * | 2020-10-21 | 2021-01-19 | 山东超越数控电子股份有限公司 | Method and system for automatically configuring and managing network IP address of whole cabinet server |
CN113495840A (en) * | 2021-06-22 | 2021-10-12 | 北京交通大学 | Big data platform testing method based on bottleneck resource positioning and parameter optimization |
CN117749645A (en) * | 2023-11-29 | 2024-03-22 | 北京金诺珩科技发展有限公司 | Machine room dynamic IP address data acquisition method |
CN117749645B (en) * | 2023-11-29 | 2024-06-04 | 北京金诺珩科技发展有限公司 | Machine room dynamic IP address data acquisition method |
CN118394455A (en) * | 2024-07-01 | 2024-07-26 | 北京科杰科技有限公司 | Big data assembly cluster management system based on java language |
CN118394455B (en) * | 2024-07-01 | 2024-08-23 | 北京科杰科技有限公司 | Big data assembly cluster management system based on java language |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20170322 |