
CN113868017B - Data management method and system for full flash system - Google Patents

Data management method and system for full flash system

Info

Publication number
CN113868017B
CN113868017B · Application CN202110960589.XA
Authority
CN
China
Prior art keywords
node
domain
fault
request
volume
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110960589.XA
Other languages
Chinese (zh)
Other versions
CN113868017A (en)
Inventor
刘文国 (Liu Wenguo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202110960589.XA
Publication of CN113868017A
Application granted
Publication of CN113868017B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1469 Backup restoration techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data management method and system for a full-flash system, where the full-flash system comprises a plurality of nodes. The method comprises the following steps: each pair of adjacent nodes logically forms a domain, and the first and last nodes also logically form a domain, so that the domains form a closed loop; one node in each domain is selected to create a thin volume, the node that creates the thin volume being the master node of the domain and the other node being the standby node; each node thus belongs to two domains, acting as the master node of one domain and the standby node of the other; when a request to access a thin volume is received, the request is processed by the node that created the thin volume; when a node in a domain fails, a healthy node of the associated domain joins the failed domain, and when the failed node recovers, it rejoins its original domain; the associated domain is the other domain to which the standby node of the failed node's domain belongs. The reliability and availability of the storage system are improved.

Description

Data management method and system for full flash system
Technical Field
The invention relates to the technical field of data center data management, and in particular to a data management method and system for a full-flash system.
Background
A thin volume is a data management technique designed to optimize storage space utilization: unlike a conventional volume, which is allocated a fixed amount of storage space up front, a thin volume is allocated only the storage space the user actually consumes. In a dual-controller environment, two storage nodes form a storage system: one node serves as the master node of a thin volume, processing read/write requests, metadata transactions, and so on, while the other serves as the standby node, backing up the thin volume's metadata held on the master node. When one of the two controller nodes fails, all thin volumes are processed by a single node, so performance degrades and metadata safety cannot be guaranteed.
Disclosure of Invention
To address the problems that when one node of a dual-controller system fails, all thin volumes are processed by a single node, performance degrades, and metadata safety cannot be guaranteed, the invention provides a data management method and system for a full-flash system.
The technical solution of the invention is as follows:
in one aspect, the invention provides a data management method for a full-flash system, where the full-flash system comprises a plurality of nodes, and the method comprises the following steps:
each pair of adjacent nodes logically forms a domain, and the first and last nodes also logically form a domain, so that the domains form a closed loop;
one node in each domain is selected to create a thin volume; the node that creates the thin volume is the master node of the domain, and the other node is the standby node; each node thus belongs to two domains, acting as the master node of one domain and the standby node of the other;
when a request to access a thin volume is received, the request is processed by the node that created the thin volume;
when a node in a domain fails, a healthy node of the associated domain joins the failed domain; when the failed node recovers, it rejoins its original domain; the associated domain is the other domain to which the standby node of the failed node's domain belongs. This ensures that service is not interrupted.
Further, the step of processing the request through the node that created the thin volume, upon receiving a request to access the thin volume, comprises:
when a request to access the thin volume is received, determining whether the node that received the request is the node that created the thin volume;
if so, processing the request on the node that created the thin volume;
otherwise, forwarding the request to the node that created the thin volume, which then processes the request.
Further, the step of processing the request by the node that created the thin volume comprises:
the node that created the thin volume examines the received request;
when the request is a write request, determining whether the request is an update write;
if so, looking up the metadata according to the logical block address (LBA) and data length of the write request to find the corresponding physical address in the thin volume, and writing the data to that physical address;
otherwise, allocating a physical address according to the LBA and data length of the write request, generating new metadata, sending the metadata to the standby node of the domain, and writing the data to the allocated physical address.
The node that created the thin volume is the master node of the domain where the thin volume resides and processes the read/write requests that access the thin volume, while the other node serves as the standby node, backing up the metadata held on the master node.
Further, the step of processing the request by the node that created the thin volume further comprises:
when the request is a read request, looking up the metadata according to the LBA and data length of the read request to find the corresponding physical address in the thin volume, and reading the data from that physical address.
Further, when a node in a domain fails, the step of adding a healthy node of the associated domain to the failed domain comprises:
when a node fails, setting the failed domain and the associated domain to a silent state;
switching the master node of the failed domain's thin volume from the failed node to the standby node of the failed node's domain; that is, the original standby node of the failed node's domain becomes the master node of that thin volume; the failed domain's thin volume is the thin volume created by the failed node;
adding the standby node of the associated domain to the failed node's domain as the new standby node of the failed domain's thin volume;
mirroring the metadata of the failed domain's thin volume from the original standby node to the new standby node;
restoring the two domains that entered the silent state to the running state.
Further, the step of rejoining the original domain after the failed node recovers comprises:
when the failed node recovers from the failure state, setting the failed node's domain and the associated domain to a silent state;
withdrawing the new standby node from the failed node's domain, and adding the recovered node back into the domain to which it belonged before the failure;
switching the master node of the failed domain's thin volume from the original standby node of the failed node's domain back to the recovered node;
discarding the metadata of the failed domain's thin volume stored on the new standby node;
mirroring the metadata of the failed domain's thin volume from the original standby node of the failed node's domain to the recovered node;
restoring the two domains that entered the silent state to the running state. The performance, reliability, and availability of the storage system are thereby improved.
In another aspect, the invention provides a data management system for a full-flash system comprising a plurality of nodes; the data management system comprises a domain division module, a volume creation module, a request processing module, and a node failure processing module;
the domain division module is used to logically form a domain from each pair of adjacent nodes, with the first and last nodes also logically forming a domain so that the domains form a closed loop;
the volume creation module is used to select one node in each domain and create a thin volume on the selected node; the node that creates the thin volume in each domain is the master node, and the other node is the standby node; each node thus belongs to two domains, acting as the master node of one domain and the standby node of the other;
the request processing module is used to arrange for the node that created a thin volume to process requests when a node receives a request to access that thin volume;
the node failure processing module is used to add a healthy node of the associated domain to the failed domain when a node in a domain fails, and to rejoin the failed node to its original domain when it recovers; the associated domain is the other domain to which the standby node of the failed node's domain belongs.
Further, the request processing module comprises a judging unit, a triggering unit, and a request processing unit;
the judging unit is used to determine, when a node receives a request to access a thin volume, whether the receiving node is the node that created the thin volume;
the triggering unit is used to trigger the node that created the thin volume to process the request;
the request processing unit is used to forward the request to the node that created the thin volume when the judging unit determines that the receiving node is not the node that created the thin volume.
Further, the node that created the thin volume is used to examine the received request; when the request is a write request, it determines whether the request is an update write; if so, it looks up the metadata according to the LBA and data length of the write request to find the corresponding physical address in the thin volume and writes the data to that physical address; otherwise, it allocates a physical address according to the LBA and data length of the write request, generates new metadata, sends the metadata to the standby node of the domain, and writes the data to the allocated physical address; when the request is a read request, it looks up the metadata according to the LBA and data length of the read request to find the corresponding physical address in the thin volume and reads the data from that physical address.
Further, the node failure processing module comprises a setting unit, a switching unit, a node-domain processing unit, and a mirroring unit;
the setting unit is used to set the failed domain and the associated domain to a silent state when a node fails;
the switching unit is used to switch the master node of the failed domain's thin volume from the failed node to the standby node of the failed node's domain; that is, the original standby node of the failed node's domain becomes the master node of that thin volume; the failed domain's thin volume is the thin volume created by the failed node;
the node-domain processing unit is used to add the standby node of the associated domain to the failed node's domain as the new standby node of the failed domain's thin volume;
the mirroring unit is used to mirror the metadata of the failed domain's thin volume from the original standby node to the new standby node;
the setting unit is also used to restore the two domains that entered the silent state to the running state.
Further, the node failure processing module further comprises a metadata processing unit;
the setting unit is also used to set the failed node's domain and the associated domain to a silent state when the failed node recovers from the failure state;
the node-domain processing unit is also used to withdraw the new standby node from the failed node's domain and to add the recovered node back into the domain to which it belonged before the failure;
the switching unit is also used to switch the master node of the failed domain's thin volume from the original standby node of the failed node's domain back to the recovered node;
the metadata processing unit is used to discard the metadata of the failed domain's thin volume stored on the new standby node;
the mirroring unit also mirrors the metadata of the failed domain's thin volume from the original standby node of the failed node's domain to the recovered node.
Further, the full-flash system comprises four nodes, namely node 0, node 1, node 2, and node 3; node 0 and node 1 form domain 0, node 1 and node 2 form domain 1, node 2 and node 3 form domain 2, and node 3 and node 0 form domain 3;
thin volume 0 is created on node 0, managed by domain 0, with node 0 as the master node;
thin volume 1 is created on node 1, managed by domain 1, with node 1 as the master node;
thin volume 2 is created on node 2, managed by domain 2, with node 2 as the master node;
thin volume 3 is created on node 3, managed by domain 3, with node 3 as the master node.
From the above technical solution, the invention has the following advantages: the full-flash system comprises a plurality of nodes, any of which can access the back-end solid-state drives; each pair of adjacent nodes is selected to logically form a domain, and the first and last nodes also logically form a domain, producing a closed loop. Different thin volumes are managed by different domains; when a node in one domain fails, a healthy node of another domain joins the failed domain so that service is not interrupted, and when the failed node recovers, it rejoins its original domain. The reliability and availability of the storage system are improved.
In addition, the invention has a reliable design principle and a simple structure, and has very broad application prospects.
It can thus be seen that, compared with the prior art, the invention has outstanding substantive features and represents a significant advance, and its practical benefits are also evident.
Drawings
In order to illustrate the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below; it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic diagram of domain partitioning in a method of one embodiment of the invention.
FIG. 2 is a schematic diagram of a request processing flow in a method according to one embodiment of the invention.
FIG. 3 is a schematic diagram of the management of a four-controller full-flash system according to an embodiment of the invention.
FIG. 4 is a block diagram of the connections of the management system in one embodiment of the invention.
In the figures: 11, domain division module; 22, volume creation module; 33, request processing module; 44, node failure processing module.
Detailed Description
In order to make the technical solution of the invention better understood by those skilled in the art, the technical solution in the embodiments of the invention will be described clearly and completely below with reference to the accompanying drawings; it is apparent that the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art on the basis of the embodiments of the invention without inventive effort shall fall within the scope of the invention.
As shown in FIG. 1, an embodiment of the invention provides a data management method for a full-flash system, where the full-flash system comprises a plurality of nodes, node 1 through node n, and the method comprises the following steps:
step 1: each pair of adjacent nodes logically forms a domain, and the first and last nodes also logically form a domain, so that the domains form a closed loop (an illustrative sketch of this ring follows step 4);
step 2: one node in each domain is selected to create a thin volume; the node that creates the thin volume is the master node of the domain, and the other node is the standby node; each node thus belongs to two domains, acting as the master node of one domain and the standby node of the other;
step 3: when a request to access a thin volume is received, the request is processed by the node that created the thin volume;
step 4: when a node in a domain fails, a healthy node of the associated domain joins the failed domain; when the failed node recovers, it rejoins its original domain; the associated domain is the other domain to which the standby node of the failed node's domain belongs. This ensures that service is not interrupted.
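For concreteness, the following minimal sketch (in Python; the patent itself specifies no code, and the function name and representation are assumptions made here for illustration) shows how the domain ring of steps 1 and 2 might be modeled:

```python
# Illustrative sketch only: domain i pairs node i (the master, which creates
# thin volume i) with node (i + 1) mod n (the standby). Node i is therefore
# the master of domain i and the standby of domain i - 1, which is exactly
# the closed loop of steps 1 and 2.

def build_domain_ring(node_count: int) -> list[tuple[int, int]]:
    """Return one (master, standby) node pair per domain."""
    return [(i, (i + 1) % node_count) for i in range(node_count)]

print(build_domain_ring(4))
# [(0, 1), (1, 2), (2, 3), (3, 0)] -> domains 0..3 of the four-node example below
```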
In some embodiments, in step 3, the step of processing the request through the node that created the thin volume, upon receiving a request to access the thin volume, comprises:
step 31: when a request to access the thin volume is received, determining whether the node that received the request is the node that created the thin volume; if so, go to step 33, otherwise go to step 32;
step 32: forwarding the request to the node that created the thin volume, then performing step 33;
step 33: processing the request on the node that created the thin volume.
As shown in FIG. 2, in step 33, the step of processing the request by the node that created the thin volume comprises:
step 331: the node that created the thin volume examines the received request; when the request is a write request, step 332 is performed, and when the request is a read request, step 335 is performed;
step 332: determining whether the request is an update write; if so, go to step 333, otherwise go to step 334;
step 333: looking up the metadata according to the LBA and data length of the write request to find the corresponding physical address in the thin volume, and writing the data to that physical address;
step 334: allocating a physical address according to the LBA and data length of the write request, generating new metadata, sending the metadata to the standby node of the domain, and writing the data to the allocated physical address;
step 335: looking up the metadata according to the LBA and data length of the read request to find the corresponding physical address in the thin volume, and reading the data from that physical address.
The node that created the thin volume is the master node of the domain where the thin volume resides and processes the read/write requests that access the thin volume, while the other node serves as the standby node, backing up the metadata held on the master node (an illustrative sketch of this read/write path follows).
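The read/write path of steps 331 through 335 can be sketched as follows. This is an interpretation rather than the patent's implementation: the FLASH dictionary, the sequential allocator, the (LBA, length) metadata key, and the backup_metadata call are hypothetical stand-ins for the back-end SSDs, the space allocator, and the metadata-mirroring interface, none of which the patent specifies; forwarding from a non-master node (steps 31 and 32) is omitted.

```python
# Hedged sketch of steps 331-335; all names and data structures are
# illustrative assumptions, not the patent's interfaces.

FLASH = {}  # physical address -> data; stands in for the back-end SSDs


class StandbyNode:
    """Holds the backup copy of the master's thin-volume metadata."""

    def __init__(self):
        self.metadata_backup = {}

    def backup_metadata(self, key, phys):
        # Step 334: the master sends newly generated metadata to the standby.
        self.metadata_backup[key] = phys


class MasterNode:
    """The node that created the thin volume; owns metadata and serves I/O."""

    def __init__(self, standby: StandbyNode):
        self.metadata = {}   # (lba, length) -> physical address
        self.standby = standby
        self.next_free = 0   # toy sequential physical-address allocator

    def handle_write(self, lba: int, length: int, data: bytes):
        key = (lba, length)
        if key in self.metadata:
            # Step 333: update write, the address is already mapped.
            phys = self.metadata[key]
        else:
            # Step 334: first write, allocate space and mirror the metadata.
            phys = self.next_free
            self.next_free += length
            self.metadata[key] = phys
            self.standby.backup_metadata(key, phys)
        FLASH[phys] = data

    def handle_read(self, lba: int, length: int) -> bytes:
        # Step 335: translate the LBA to a physical address via the metadata.
        return FLASH[self.metadata[(lba, length)]]
```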
In some embodiments, in step 4, when a node in a domain fails, the step of adding a healthy node of the associated domain to the failed domain comprises the following steps, sketched in code after step 415:
step 411: when a node fails, setting the failed domain and the associated domain to a silent state;
step 412: switching the master node of the failed domain's thin volume from the failed node to the standby node of the failed node's domain; that is, the original standby node of the failed node's domain becomes the master node of that thin volume; the failed domain's thin volume is the thin volume created by the failed node;
step 413: adding the standby node of the associated domain to the failed node's domain as the new standby node of the failed domain's thin volume;
step 414: mirroring the metadata of the failed domain's thin volume from the original standby node to the new standby node;
step 415: restoring the two domains that entered the silent state to the running state.
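A sketch of this failover sequence follows; the Domain class and the mirror_metadata placeholder are assumptions introduced here for illustration only, with nodes represented by their ids.

```python
# Hedged sketch of steps 411-415; nodes are represented by ids only.

class Domain:
    def __init__(self, master: int, standby: int):
        self.master = master
        self.standby = standby
        self.silent = False


def mirror_metadata(src: int, dst: int):
    # Placeholder for the metadata copy of step 414 (and step 425 below).
    pass


def handle_failure(failed: Domain, associated: Domain):
    failed.silent = associated.silent = True      # step 411: silence both domains
    failed.master = failed.standby                # step 412: standby takes over
    failed.standby = associated.standby           # step 413: the associated domain's
                                                  # standby joins as new standby
    mirror_metadata(src=failed.master, dst=failed.standby)  # step 414
    failed.silent = associated.silent = False     # step 415: resume running state
```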
In step 4, the step of rejoining the original domain after the failed node recovers comprises the following steps, sketched in code below:
step 421: when the failed node recovers from the failure state, setting the failed node's domain and the associated domain to a silent state;
step 422: withdrawing the new standby node from the failed node's domain, and adding the recovered node back into the domain to which it belonged before the failure;
step 423: switching the master node of the failed domain's thin volume from the original standby node of the failed node's domain back to the recovered node;
step 424: discarding the metadata of the failed domain's thin volume stored on the new standby node;
step 425: mirroring the metadata of the failed domain's thin volume from the original standby node of the failed node's domain to the recovered node;
step 426: restoring the two domains that entered the silent state to the running state.
The performance, reliability, and availability of the storage system are thereby improved.
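The recovery sequence is the inverse of the failover above. The sketch below reuses the assumed Domain representation and mirror_metadata placeholder, with drop_metadata as a further illustrative placeholder:

```python
# Hedged sketch of steps 421-426, reusing Domain and mirror_metadata above.

def drop_metadata(node: int):
    # Placeholder for discarding the borrowed standby's copy (step 424).
    pass


def handle_recovery(failed: Domain, associated: Domain, recovered_node: int):
    failed.silent = associated.silent = True      # step 421: silence both domains
    borrowed = failed.standby                     # step 422: the lent node exits,
    failed.standby = failed.master                # the recovered node rejoins, and
    failed.master = recovered_node                # step 423: it becomes master again
    drop_metadata(borrowed)                       # step 424
    mirror_metadata(src=failed.standby, dst=failed.master)  # step 425
    failed.silent = associated.silent = False     # step 426: resume running state
```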
As shown in FIG. 3, in a specific embodiment, the full-flash system comprises four nodes, namely node 0, node 1, node 2, and node 3; node 0 and node 1 form domain 0, node 1 and node 2 form domain 1, node 2 and node 3 form domain 2, and node 3 and node 0 form domain 3.
Thin volume 0 is created on node 0, managed by domain 0, with node 0 as the master node;
thin volume 1 is created on node 1, managed by domain 1, with node 1 as the master node;
thin volume 2 is created on node 2, managed by domain 2, with node 2 as the master node;
thin volume 3 is created on node 3, managed by domain 3, with node 3 as the master node.
The write process mainly comprises the following steps:
(1) a write request accessing thin volume 0 is issued from node 0; go to step (4);
(2) a write request accessing thin volume 0 is issued from node 1, node 2, or node 3; go to step (3);
(3) the write request is forwarded from node 1, node 2, or node 3 to node 0;
(4) node 0 determines whether the request is an update write; if so, go to step (5); if not, go to step (6);
(5) the metadata is looked up according to the LBA and data length of the write request to find the corresponding physical address in the thin volume, and the data is written to that physical address;
(6) a physical address is allocated according to the LBA and data length of the write request, new metadata is generated and sent to the standby node, and the data is written to the allocated physical address.
The read process mainly comprises the following steps (a usage example of the read/write sketch above follows the steps):
(1) a read request accessing thin volume 0 is issued from node 0; go to step (4);
(2) a read request accessing thin volume 0 is issued from node 1, node 2, or node 3; go to step (3);
(3) the read request is forwarded from node 1, node 2, or node 3 to node 0;
(4) node 0 looks up the metadata according to the LBA and data length of the read request to find the corresponding physical address in the thin volume, and reads the data from that physical address.
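Under the same assumptions, the MasterNode sketch above can be exercised against this write/read flow for thin volume 0 (a request arriving at node 1, node 2, or node 3 would first be forwarded to node 0, which the sketch omits):

```python
node0 = MasterNode(StandbyNode())

node0.handle_write(lba=0x1000, length=512, data=b"x" * 512)  # first write: allocates
node0.handle_write(lba=0x1000, length=512, data=b"y" * 512)  # update write: same address

assert node0.handle_read(lba=0x1000, length=512) == b"y" * 512
assert len(node0.metadata) == 1  # only one metadata entry was ever created
```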
When a node of a domain fails (e.g., node 0 of domain 0), the processing flow comprises the following steps:
(1) node 0 fails, and domain 0 and domain 1 enter the silent state;
(2) the master node of thin volume 0 is switched from node 0 to node 1;
(3) node 2 of domain 1 joins domain 0 as the new standby node of thin volume 0;
(4) the metadata of thin volume 0 is mirrored from node 1 to node 2;
(5) domain 0 and domain 1 resume the running state.
When a node of a domain recovers from the failure state (e.g., node 0 of domain 0), the processing flow comprises the following steps, traced in code below:
(1) node 0 recovers from the failure state, and domain 0 and domain 1 enter the silent state;
(2) node 2 exits domain 0, and node 0 joins domain 0;
(3) the master node of thin volume 0 is switched from node 1 back to node 0;
(4) the metadata of thin volume 0 stored on node 2 is discarded;
(5) the metadata of thin volume 0 on node 1 is mirrored to node 0;
(6) domain 0 and domain 1 resume the running state.
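Tracing the failover and recovery sketches above on this four-node example (node ids only, with no actual metadata movement) reproduces the two flows:

```python
domains = {i: Domain(master=i, standby=(i + 1) % 4) for i in range(4)}

# Node 0 fails: domain 0 is the failed domain, domain 1 the associated domain.
handle_failure(domains[0], domains[1])
assert (domains[0].master, domains[0].standby) == (1, 2)  # steps (2)-(3) above

# Node 0 recovers and rejoins domain 0.
handle_recovery(domains[0], domains[1], recovered_node=0)
assert (domains[0].master, domains[0].standby) == (0, 1)  # original layout restored
```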
As shown in FIG. 4, an embodiment of the invention provides a data management system for a full-flash system comprising a plurality of nodes; the data management system comprises a domain division module 11, a volume creation module 22, a request processing module 33, and a node failure processing module 44;
the domain division module 11 is used to logically form a domain from each pair of adjacent nodes, with the first and last nodes also logically forming a domain so that the domains form a closed loop;
the volume creation module 22 is used to select one node in each domain and create a thin volume on the selected node; the node that creates the thin volume in each domain is the master node, and the other node is the standby node; each node thus belongs to two domains, acting as the master node of one domain and the standby node of the other;
the request processing module 33 is used to arrange for the node that created a thin volume to process requests when a node receives a request to access that thin volume;
the node failure processing module 44 is used to add a healthy node of the associated domain to the failed domain when a node in a domain fails, and to rejoin the failed node to its original domain when it recovers; the associated domain is the other domain to which the standby node of the failed node's domain belongs.
In some embodiments, the request processing module 33 comprises a judging unit, a triggering unit, and a request processing unit;
the judging unit is used to determine, when a node receives a request to access a thin volume, whether the receiving node is the node that created the thin volume;
the triggering unit is used to trigger the node that created the thin volume to process the request;
the request processing unit is used to forward the request to the node that created the thin volume when the judging unit determines that the receiving node is not the node that created the thin volume.
In some embodiments, the node that created the thin volume is used to examine the received request; when the request is a write request, it determines whether the request is an update write; if so, it looks up the metadata according to the LBA and data length of the write request to find the corresponding physical address in the thin volume and writes the data to that physical address; otherwise, it allocates a physical address according to the LBA and data length of the write request, generates new metadata, sends the metadata to the standby node of the domain, and writes the data to the allocated physical address; when the request is a read request, it looks up the metadata according to the LBA and data length of the read request to find the corresponding physical address in the thin volume and reads the data from that physical address.
In some embodiments, the node failure processing module 44 comprises a setting unit, a switching unit, a node-domain processing unit, and a mirroring unit;
the setting unit is used to set the failed domain and the associated domain to a silent state when a node fails;
the switching unit is used to switch the master node of the failed domain's thin volume from the failed node to the standby node of the failed node's domain; that is, the original standby node of the failed node's domain becomes the master node of that thin volume; the failed domain's thin volume is the thin volume created by the failed node;
the node-domain processing unit is used to add the standby node of the associated domain to the failed node's domain as the new standby node of the failed domain's thin volume;
the mirroring unit is used to mirror the metadata of the failed domain's thin volume from the original standby node to the new standby node;
the setting unit is also used to restore the two domains that entered the silent state to the running state.
In some embodiments, the node failure processing module 44 further comprises a metadata processing unit;
the setting unit is also used to set the failed node's domain and the associated domain to a silent state when the failed node recovers from the failure state;
the node-domain processing unit is also used to withdraw the new standby node from the failed node's domain and to add the recovered node back into the domain to which it belonged before the failure;
the switching unit is also used to switch the master node of the failed domain's thin volume from the original standby node of the failed node's domain back to the recovered node;
the metadata processing unit is used to discard the metadata of the failed domain's thin volume stored on the new standby node;
the mirroring unit also mirrors the metadata of the failed domain's thin volume from the original standby node of the failed node's domain to the recovered node.
As shown in FIG. 3, in some embodiments, the full-flash system comprises four nodes, namely node 0, node 1, node 2, and node 3; node 0 and node 1 form domain 0, node 1 and node 2 form domain 1, node 2 and node 3 form domain 2, and node 3 and node 0 form domain 3;
thin volume 0 is created on node 0, managed by domain 0, with node 0 as the master node;
thin volume 1 is created on node 1, managed by domain 1, with node 1 as the master node;
thin volume 2 is created on node 2, managed by domain 2, with node 2 as the master node;
thin volume 3 is created on node 3, managed by domain 3, with node 3 as the master node.
The full-flash system thus consists of four nodes, any of which can access the back-end solid-state drives. Pairs of nodes logically form domains, and different thin volumes are managed by different domains: the node on which a thin volume is created serves as the master node of that thin volume, processing the read/write requests that access it, while the other node of the domain serves as the standby node, backing up the metadata held on the master node. When a node in a domain fails, a healthy node of another domain joins the failed domain so that service is not interrupted, and when the failed node recovers, it rejoins its original domain.
Although the invention has been described in detail by way of preferred embodiments with reference to the accompanying drawings, the invention is not limited thereto. Those skilled in the art may make various equivalent modifications and substitutions to the embodiments of the invention without departing from its spirit and scope, and all such modifications and substitutions are intended to fall within the scope of the invention as defined by the appended claims. Therefore, the protection scope of the invention shall be subject to the protection scope of the claims.

Claims (8)

1. A data management method for a full-flash system, the full-flash system comprising a plurality of nodes, the method comprising the following steps:
each pair of adjacent nodes logically forms a domain, and the first and last nodes also logically form a domain, so that the domains form a closed loop;
one node in each domain is selected to create a thin volume; the node that creates the thin volume is the master node of the domain, and the other node is the standby node; each node thus belongs to two domains, acting as the master node of one domain and the standby node of the other;
when a request to access a thin volume is received, the request is processed by the node that created the thin volume;
when a node in a domain fails, a healthy node of the associated domain joins the failed domain; when the failed node recovers, it rejoins its original domain; the associated domain is the other domain to which the standby node of the failed node's domain belongs;
wherein, when a node in a domain fails, the step of adding a healthy node of the associated domain to the failed domain comprises:
when a node fails, setting the failed domain and the associated domain to a silent state;
switching the master node of the failed domain's thin volume from the failed node to the standby node of the failed node's domain, that is, the original standby node of the failed node's domain becomes the master node of that thin volume, the failed domain's thin volume being the thin volume created by the failed node;
adding the standby node of the associated domain to the failed node's domain as the new standby node of the failed domain's thin volume;
mirroring the metadata of the failed domain's thin volume from the original standby node to the new standby node;
restoring the two domains that entered the silent state to the running state.
2. The data management method for a full-flash system according to claim 1, wherein the step of processing the request through the node that created the thin volume, upon receiving a request to access the thin volume, comprises:
when a request to access the thin volume is received, determining whether the node that received the request is the node that created the thin volume;
if so, processing the request on the node that created the thin volume;
otherwise, forwarding the request to the node that created the thin volume, which then processes the request.
3. The data management method for a full-flash system according to claim 2, wherein the step of processing the request by the node that created the thin volume comprises:
the node that created the thin volume examines the received request;
when the request is a write request, determining whether the request is an update write;
if so, looking up the metadata according to the logical block address and data length of the write request to find the corresponding physical address in the thin volume, and writing the data to that physical address;
otherwise, allocating a physical address according to the logical block address and data length of the write request, generating new metadata, sending the metadata to the standby node of the domain, and writing the data to the allocated physical address.
4. The data management method for a full-flash system according to claim 2, wherein the step of processing the request by the node that created the thin volume further comprises: when the request is a read request, looking up the metadata according to the logical block address and data length of the read request to find the corresponding physical address in the thin volume, and reading the data from that physical address.
5. The data management method for a full-flash system according to claim 4, wherein the step of rejoining the original domain after the failed node recovers comprises:
when the failed node recovers from the failure state, setting the failed node's domain and the associated domain to a silent state;
withdrawing the new standby node from the failed node's domain, and adding the recovered node back into the domain to which it belonged before the failure;
switching the master node of the failed domain's thin volume from the original standby node of the failed node's domain back to the recovered node;
discarding the metadata of the failed domain's thin volume stored on the new standby node;
mirroring the metadata of the failed domain's thin volume from the original standby node of the failed node's domain to the recovered node;
restoring the two domains that entered the silent state to the running state.
6. A data management system for a full-flash system, the full-flash system comprising a plurality of nodes, wherein the data management system comprises a domain division module, a volume creation module, a request processing module, and a node failure processing module;
the domain division module is used to logically form a domain from each pair of adjacent nodes, with the first and last nodes also logically forming a domain so that the domains form a closed loop;
the volume creation module is used to select one node in each domain and create a thin volume on the selected node; the node that creates the thin volume in each domain is the master node, and the other node is the standby node; each node thus belongs to two domains, acting as the master node of one domain and the standby node of the other;
the request processing module is used to arrange for the node that created a thin volume to process requests when a node receives a request to access that thin volume;
the node failure processing module is used to add a healthy node of the associated domain to the failed domain when a node in a domain fails, and to rejoin the failed node to its original domain when it recovers; the associated domain is the other domain to which the standby node of the failed node's domain belongs;
the node failure processing module comprises a setting unit, a switching unit, a node-domain processing unit, and a mirroring unit;
the setting unit is used to set the failed domain and the associated domain to a silent state when a node fails, and to restore the two domains that entered the silent state to the running state; the switching unit is used to switch the master node of the failed domain's thin volume from the failed node to the standby node of the failed node's domain, that is, the original standby node of the failed node's domain becomes the master node of that thin volume, the failed domain's thin volume being the thin volume created by the failed node; the node-domain processing unit is used to add the standby node of the associated domain to the failed node's domain as the new standby node of the failed domain's thin volume; and the mirroring unit is used to mirror the metadata of the failed domain's thin volume from the original standby node to the new standby node.
7. The data management system for a full-flash system according to claim 6, wherein the request processing module comprises a judging unit, a triggering unit, and a request processing unit;
the judging unit is used to determine, when a node receives a request to access a thin volume, whether the receiving node is the node that created the thin volume; the triggering unit is used to trigger the node that created the thin volume to process the request; and the request processing unit is used to forward the request to the node that created the thin volume when the judging unit determines that the receiving node is not the node that created the thin volume.
8. The data management system for a full-flash system according to claim 7, wherein the node failure processing module further comprises a metadata processing unit;
the setting unit is also used to set the failed node's domain and the associated domain to a silent state when the failed node recovers from the failure state; the node-domain processing unit is also used to withdraw the new standby node from the failed node's domain and to add the recovered node back into the domain to which it belonged before the failure; the switching unit is also used to switch the master node of the failed domain's thin volume from the original standby node of the failed node's domain back to the recovered node; the metadata processing unit is used to discard the metadata of the failed domain's thin volume stored on the new standby node; and the mirroring unit mirrors the metadata of the failed domain's thin volume from the original standby node of the failed node's domain to the recovered node.
CN202110960589.XA (priority date 2021-08-20, filing date 2021-08-20) · Data management method and system for full flash system · Active · CN113868017B (en)

Priority Applications (1)

Application Number: CN202110960589.XA (published as CN113868017B) · Priority Date: 2021-08-20 · Filing Date: 2021-08-20 · Title: Data management method and system for full flash system

Applications Claiming Priority (1)

Application Number: CN202110960589.XA (published as CN113868017B) · Priority Date: 2021-08-20 · Filing Date: 2021-08-20 · Title: Data management method and system for full flash system

Publications (2)

Publication Number Publication Date
CN113868017A CN113868017A (en) 2021-12-31
CN113868017B 2024-01-12

Family

ID=78987933

Family Applications (1)

Application Number: CN202110960589.XA (Active; published as CN113868017B) · Priority Date: 2021-08-20 · Filing Date: 2021-08-20 · Title: Data management method and system for full flash system

Country Status (1)

Country Link
CN (1) CN113868017B (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11301144B2 (en) * 2016-12-28 2022-04-12 Amazon Technologies, Inc. Data storage system
US10452502B2 (en) * 2018-01-23 2019-10-22 International Business Machines Corporation Handling node failure in multi-node data storage systems

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9483367B1 (en) * 2014-06-27 2016-11-01 Veritas Technologies Llc Data recovery in distributed storage environments
CN109857588A (en) * 2018-12-11 2019-06-07 浪潮(北京)电子信息产业有限公司 Simplification volume metadata processing method, apparatus and system based on more controlled storage systems

Also Published As

Publication number Publication date
CN113868017A (en) 2021-12-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant