
CN113868017B - Data management method and system for full flash system - Google Patents

Data management method and system for full flash system

Info

Publication number
CN113868017B
CN113868017B · Application CN202110960589.XA
Authority
CN
China
Prior art keywords
node
domain
fault
request
volume
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110960589.XA
Other languages
Chinese (zh)
Other versions
CN113868017A (en)
Inventor
刘文国 (Liu Wenguo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202110960589.XA
Publication of CN113868017A
Application granted
Publication of CN113868017B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1469 Backup restoration techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data management method and system for a full-flash system, where the full-flash system comprises a plurality of nodes. The method comprises the following steps: each pair of adjacent nodes logically forms a domain, and the first and last nodes also logically form a domain, so that the domains form a closed loop; one node in each domain is selected to create a thin volume, the node that creates the thin volume being the master node of the domain and the other node being the standby node; each node thus belongs to two domains, acting as the master node of one domain and the standby node of the other; when a request to access a thin volume is received, the request is processed by the node that created the thin volume; when a node in a domain fails, a healthy node of the associated domain joins the failed domain, and when the failed node recovers, it rejoins its original domain; the associated domain is the other domain to which the standby node of the failed node's domain belongs. The reliability and availability of the storage system are improved.

Description

Data management method and system for full flash system
Technical Field
The invention relates to the technical field of data center data management, and in particular to a data management method and system for a full-flash system.
Background
A thin volume is a data management technique designed to optimize storage space utilization: unlike a conventional volume, which is allocated a fixed amount of storage space up front, a thin volume is allocated only the storage space the user actually consumes. In a dual-controller environment, two storage nodes form a storage system: one node serves as the master node of a thin volume, processing read/write requests, metadata transactions, and so on, while the other serves as the standby node, backing up the thin volume's metadata held on the master node. When one of the two controller nodes fails, all thin volumes are processed by a single node, so performance degrades and metadata safety cannot be guaranteed.
Disclosure of Invention
To address the problems that when one node of a dual-controller system fails, all thin volumes are processed by a single node, performance degrades, and metadata safety cannot be guaranteed, the invention provides a data management method and system for a full-flash system.
The technical solution of the invention is as follows:
in one aspect, the invention provides a data management method for a full-flash system, where the full-flash system comprises a plurality of nodes, and the method comprises the following steps:
each pair of adjacent nodes logically forms a domain, and the first and last nodes also logically form a domain, so that the domains form a closed loop;
one node in each domain is selected to create a thin volume; the node that creates the thin volume is the master node of the domain, and the other node is the standby node; each node thus belongs to two domains, acting as the master node of one domain and the standby node of the other;
when a request to access a thin volume is received, the request is processed by the node that created the thin volume;
when a node in a domain fails, a healthy node of the associated domain joins the failed domain; when the failed node recovers, it rejoins its original domain; the associated domain is the other domain to which the standby node of the failed node's domain belongs. This ensures that service is not interrupted.
Further, the step of processing the request through the node that created the thin volume, upon receiving a request to access the thin volume, comprises:
when a request to access the thin volume is received, determining whether the node that received the request is the node that created the thin volume;
if so, processing the request on the node that created the thin volume;
otherwise, forwarding the request to the node that created the thin volume, which then processes the request.
Further, the step of processing the request by the node that created the thin volume comprises:
the node that created the thin volume examines the received request;
when the request is a write request, determining whether the request is an update write;
if so, looking up the metadata according to the logical block address (LBA) and data length of the write request to find the corresponding physical address in the thin volume, and writing the data to that physical address;
otherwise, allocating a physical address according to the LBA and data length of the write request, generating new metadata, sending the metadata to the standby node of the domain, and writing the data to the allocated physical address.
The node that created the thin volume is the master node of the domain where the thin volume resides and processes the read/write requests that access the thin volume, while the other node serves as the standby node, backing up the metadata held on the master node.
Further, the step of processing the request by the node that created the thin volume further comprises:
when the request is a read request, looking up the metadata according to the LBA and data length of the read request to find the corresponding physical address in the thin volume, and reading the data from that physical address.
Further, when a node in a domain fails, the step of adding a healthy node of the associated domain to the failed domain comprises:
when a node fails, setting the failed domain and the associated domain to a silent state;
switching the master node of the failed domain's thin volume from the failed node to the standby node of the failed node's domain; that is, the original standby node of the failed node's domain becomes the master node of that thin volume; the failed domain's thin volume is the thin volume created by the failed node;
adding the standby node of the associated domain to the failed node's domain as the new standby node of the failed domain's thin volume;
mirroring the metadata of the failed domain's thin volume from the original standby node to the new standby node;
restoring the two domains that entered the silent state to the running state.
Further, the step of rejoining the original domain after the failed node recovers comprises:
when the failed node recovers from the failure state, setting the failed node's domain and the associated domain to a silent state;
withdrawing the new standby node from the failed node's domain, and adding the recovered node back into the domain to which it belonged before the failure;
switching the master node of the failed domain's thin volume from the original standby node of the failed node's domain back to the recovered node;
discarding the metadata of the failed domain's thin volume stored on the new standby node;
mirroring the metadata of the failed domain's thin volume from the original standby node of the failed node's domain to the recovered node;
restoring the two domains that entered the silent state to the running state. The performance, reliability, and availability of the storage system are thereby improved.
In another aspect, the invention provides a data management system for a full-flash system comprising a plurality of nodes; the data management system comprises a domain division module, a volume creation module, a request processing module, and a node failure processing module;
the domain division module is used to logically form a domain from each pair of adjacent nodes, with the first and last nodes also logically forming a domain so that the domains form a closed loop;
the volume creation module is used to select one node in each domain and create a thin volume on the selected node; the node that creates the thin volume in each domain is the master node, and the other node is the standby node; each node thus belongs to two domains, acting as the master node of one domain and the standby node of the other;
the request processing module is used to arrange for the node that created a thin volume to process requests when a node receives a request to access that thin volume;
the node failure processing module is used to add a healthy node of the associated domain to the failed domain when a node in a domain fails, and to rejoin the failed node to its original domain when it recovers; the associated domain is the other domain to which the standby node of the failed node's domain belongs.
Further, the request processing module comprises a judging unit, a triggering unit, and a request processing unit;
the judging unit is used to determine, when a node receives a request to access a thin volume, whether the receiving node is the node that created the thin volume;
the triggering unit is used to trigger the node that created the thin volume to process the request;
the request processing unit is used to forward the request to the node that created the thin volume when the judging unit determines that the receiving node is not the node that created the thin volume.
Further, the node that created the thin volume is used to examine the received request; when the request is a write request, it determines whether the request is an update write; if so, it looks up the metadata according to the LBA and data length of the write request to find the corresponding physical address in the thin volume and writes the data to that physical address; otherwise, it allocates a physical address according to the LBA and data length of the write request, generates new metadata, sends the metadata to the standby node of the domain, and writes the data to the allocated physical address; when the request is a read request, it looks up the metadata according to the LBA and data length of the read request to find the corresponding physical address in the thin volume and reads the data from that physical address.
Further, the node failure processing module comprises a setting unit, a switching unit, a node-domain processing unit, and a mirroring unit;
the setting unit is used to set the failed domain and the associated domain to a silent state when a node fails;
the switching unit is used to switch the master node of the failed domain's thin volume from the failed node to the standby node of the failed node's domain; that is, the original standby node of the failed node's domain becomes the master node of that thin volume; the failed domain's thin volume is the thin volume created by the failed node;
the node-domain processing unit is used to add the standby node of the associated domain to the failed node's domain as the new standby node of the failed domain's thin volume;
the mirroring unit is used to mirror the metadata of the failed domain's thin volume from the original standby node to the new standby node;
the setting unit is also used to restore the two domains that entered the silent state to the running state.
Further, the node failure processing module further comprises a metadata processing unit;
the setting unit is also used to set the failed node's domain and the associated domain to a silent state when the failed node recovers from the failure state;
the node-domain processing unit is also used to withdraw the new standby node from the failed node's domain and to add the recovered node back into the domain to which it belonged before the failure;
the switching unit is also used to switch the master node of the failed domain's thin volume from the original standby node of the failed node's domain back to the recovered node;
the metadata processing unit is used to discard the metadata of the failed domain's thin volume stored on the new standby node;
the mirroring unit also mirrors the metadata of the failed domain's thin volume from the original standby node of the failed node's domain to the recovered node.
Further, the full-flash system comprises four nodes, namely node 0, node 1, node 2, and node 3; node 0 and node 1 form domain 0, node 1 and node 2 form domain 1, node 2 and node 3 form domain 2, and node 3 and node 0 form domain 3;
thin volume 0 is created on node 0, managed by domain 0, with node 0 as the master node;
thin volume 1 is created on node 1, managed by domain 1, with node 1 as the master node;
thin volume 2 is created on node 2, managed by domain 2, with node 2 as the master node;
thin volume 3 is created on node 3, managed by domain 3, with node 3 as the master node.
From the above technical solution, the invention has the following advantages: the full-flash system comprises a plurality of nodes, any of which can access the back-end solid-state drives; each pair of adjacent nodes is selected to logically form a domain, and the first and last nodes also logically form a domain, producing a closed loop. Different thin volumes are managed by different domains; when a node in one domain fails, a healthy node of another domain joins the failed domain so that service is not interrupted, and when the failed node recovers, it rejoins its original domain. The reliability and availability of the storage system are improved.
In addition, the invention has a reliable design principle and a simple structure, and has very broad application prospects.
It can thus be seen that, compared with the prior art, the invention has outstanding substantive features and represents a significant advance, and its practical benefits are also evident.
Drawings
In order to illustrate the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below; it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic diagram of domain partitioning in a method of one embodiment of the invention.
FIG. 2 is a schematic diagram of a request processing flow in a method according to one embodiment of the invention.
FIG. 3 is a schematic diagram of the management of a four-controller full-flash system according to an embodiment of the invention.
FIG. 4 is a block diagram of the connections of the management system in one embodiment of the invention.
In the figures: 11, domain division module; 22, volume creation module; 33, request processing module; 44, node failure processing module.
Detailed Description
In order to make the technical solution of the invention better understood by those skilled in the art, the technical solution in the embodiments of the invention will be described clearly and completely below with reference to the accompanying drawings; it is apparent that the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art on the basis of the embodiments of the invention without inventive effort shall fall within the scope of the invention.
As shown in FIG. 1, an embodiment of the invention provides a data management method for a full-flash system, where the full-flash system comprises a plurality of nodes, node 1 through node n, and the method comprises the following steps:
step 1: each pair of adjacent nodes logically forms a domain, and the first and last nodes also logically form a domain, so that the domains form a closed loop (an illustrative sketch of this ring follows step 4);
step 2: one node in each domain is selected to create a thin volume; the node that creates the thin volume is the master node of the domain, and the other node is the standby node; each node thus belongs to two domains, acting as the master node of one domain and the standby node of the other;
step 3: when a request to access a thin volume is received, the request is processed by the node that created the thin volume;
step 4: when a node in a domain fails, a healthy node of the associated domain joins the failed domain; when the failed node recovers, it rejoins its original domain; the associated domain is the other domain to which the standby node of the failed node's domain belongs. This ensures that service is not interrupted.
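For concreteness, the following minimal sketch (in Python; the patent itself specifies no code, and the function name and representation are assumptions made here for illustration) shows how the domain ring of steps 1 and 2 might be modeled:

```python
# Illustrative sketch only: domain i pairs node i (the master, which creates
# thin volume i) with node (i + 1) mod n (the standby). Node i is therefore
# the master of domain i and the standby of domain i - 1, which is exactly
# the closed loop of steps 1 and 2.

def build_domain_ring(node_count: int) -> list[tuple[int, int]]:
    """Return one (master, standby) node pair per domain."""
    return [(i, (i + 1) % node_count) for i in range(node_count)]

print(build_domain_ring(4))
# [(0, 1), (1, 2), (2, 3), (3, 0)] -> domains 0..3 of the four-node example below
```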
In some embodiments, in step 3, the step of processing the request through the node that created the thin volume, upon receiving a request to access the thin volume, comprises:
step 31: when a request to access the thin volume is received, determining whether the node that received the request is the node that created the thin volume; if so, go to step 33, otherwise go to step 32;
step 32: forwarding the request to the node that created the thin volume, then performing step 33;
step 33: processing the request on the node that created the thin volume.
As shown in FIG. 2, in step 33, the step of processing the request by the node that created the thin volume comprises:
step 331: the node that created the thin volume examines the received request; when the request is a write request, step 332 is performed, and when the request is a read request, step 335 is performed;
step 332: determining whether the request is an update write; if so, go to step 333, otherwise go to step 334;
step 333: looking up the metadata according to the LBA and data length of the write request to find the corresponding physical address in the thin volume, and writing the data to that physical address;
step 334: allocating a physical address according to the LBA and data length of the write request, generating new metadata, sending the metadata to the standby node of the domain, and writing the data to the allocated physical address;
step 335: looking up the metadata according to the LBA and data length of the read request to find the corresponding physical address in the thin volume, and reading the data from that physical address.
The node that created the thin volume is the master node of the domain where the thin volume resides and processes the read/write requests that access the thin volume, while the other node serves as the standby node, backing up the metadata held on the master node (an illustrative sketch of this read/write path follows).
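The read/write path of steps 331 through 335 can be sketched as follows. This is an interpretation rather than the patent's implementation: the FLASH dictionary, the sequential allocator, the (LBA, length) metadata key, and the backup_metadata call are hypothetical stand-ins for the back-end SSDs, the space allocator, and the metadata-mirroring interface, none of which the patent specifies; forwarding from a non-master node (steps 31 and 32) is omitted.

```python
# Hedged sketch of steps 331-335; all names and data structures are
# illustrative assumptions, not the patent's interfaces.

FLASH = {}  # physical address -> data; stands in for the back-end SSDs


class StandbyNode:
    """Holds the backup copy of the master's thin-volume metadata."""

    def __init__(self):
        self.metadata_backup = {}

    def backup_metadata(self, key, phys):
        # Step 334: the master sends newly generated metadata to the standby.
        self.metadata_backup[key] = phys


class MasterNode:
    """The node that created the thin volume; owns metadata and serves I/O."""

    def __init__(self, standby: StandbyNode):
        self.metadata = {}   # (lba, length) -> physical address
        self.standby = standby
        self.next_free = 0   # toy sequential physical-address allocator

    def handle_write(self, lba: int, length: int, data: bytes):
        key = (lba, length)
        if key in self.metadata:
            # Step 333: update write, the address is already mapped.
            phys = self.metadata[key]
        else:
            # Step 334: first write, allocate space and mirror the metadata.
            phys = self.next_free
            self.next_free += length
            self.metadata[key] = phys
            self.standby.backup_metadata(key, phys)
        FLASH[phys] = data

    def handle_read(self, lba: int, length: int) -> bytes:
        # Step 335: translate the LBA to a physical address via the metadata.
        return FLASH[self.metadata[(lba, length)]]
```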
In some embodiments, in step 4, when a node in a domain fails, the step of adding a healthy node of the associated domain to the failed domain comprises the following steps, sketched in code after step 415:
step 411: when a node fails, setting the failed domain and the associated domain to a silent state;
step 412: switching the master node of the failed domain's thin volume from the failed node to the standby node of the failed node's domain; that is, the original standby node of the failed node's domain becomes the master node of that thin volume; the failed domain's thin volume is the thin volume created by the failed node;
step 413: adding the standby node of the associated domain to the failed node's domain as the new standby node of the failed domain's thin volume;
step 414: mirroring the metadata of the failed domain's thin volume from the original standby node to the new standby node;
step 415: restoring the two domains that entered the silent state to the running state.
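A sketch of this failover sequence follows; the Domain class and the mirror_metadata placeholder are assumptions introduced here for illustration only, with nodes represented by their ids.

```python
# Hedged sketch of steps 411-415; nodes are represented by ids only.

class Domain:
    def __init__(self, master: int, standby: int):
        self.master = master
        self.standby = standby
        self.silent = False


def mirror_metadata(src: int, dst: int):
    # Placeholder for the metadata copy of step 414 (and step 425 below).
    pass


def handle_failure(failed: Domain, associated: Domain):
    failed.silent = associated.silent = True      # step 411: silence both domains
    failed.master = failed.standby                # step 412: standby takes over
    failed.standby = associated.standby           # step 413: the associated domain's
                                                  # standby joins as new standby
    mirror_metadata(src=failed.master, dst=failed.standby)  # step 414
    failed.silent = associated.silent = False     # step 415: resume running state
```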
In step 4, the step of rejoining the original domain after the failed node recovers comprises the following steps, sketched in code below:
step 421: when the failed node recovers from the failure state, setting the failed node's domain and the associated domain to a silent state;
step 422: withdrawing the new standby node from the failed node's domain, and adding the recovered node back into the domain to which it belonged before the failure;
step 423: switching the master node of the failed domain's thin volume from the original standby node of the failed node's domain back to the recovered node;
step 424: discarding the metadata of the failed domain's thin volume stored on the new standby node;
step 425: mirroring the metadata of the failed domain's thin volume from the original standby node of the failed node's domain to the recovered node;
step 426: restoring the two domains that entered the silent state to the running state.
The performance, reliability, and availability of the storage system are thereby improved.
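The recovery sequence is the inverse of the failover above. The sketch below reuses the assumed Domain representation and mirror_metadata placeholder, with drop_metadata as a further illustrative placeholder:

```python
# Hedged sketch of steps 421-426, reusing Domain and mirror_metadata above.

def drop_metadata(node: int):
    # Placeholder for discarding the borrowed standby's copy (step 424).
    pass


def handle_recovery(failed: Domain, associated: Domain, recovered_node: int):
    failed.silent = associated.silent = True      # step 421: silence both domains
    borrowed = failed.standby                     # step 422: the lent node exits,
    failed.standby = failed.master                # the recovered node rejoins, and
    failed.master = recovered_node                # step 423: it becomes master again
    drop_metadata(borrowed)                       # step 424
    mirror_metadata(src=failed.standby, dst=failed.master)  # step 425
    failed.silent = associated.silent = False     # step 426: resume running state
```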
As shown in FIG. 3, in a specific embodiment, the full-flash system comprises four nodes, namely node 0, node 1, node 2, and node 3; node 0 and node 1 form domain 0, node 1 and node 2 form domain 1, node 2 and node 3 form domain 2, and node 3 and node 0 form domain 3.
Thin volume 0 is created on node 0, managed by domain 0, with node 0 as the master node;
thin volume 1 is created on node 1, managed by domain 1, with node 1 as the master node;
thin volume 2 is created on node 2, managed by domain 2, with node 2 as the master node;
thin volume 3 is created on node 3, managed by domain 3, with node 3 as the master node.
The write process mainly comprises the following steps:
(1) a write request accessing thin volume 0 is issued from node 0; go to step (4);
(2) a write request accessing thin volume 0 is issued from node 1, node 2, or node 3; go to step (3);
(3) the write request is forwarded from node 1, node 2, or node 3 to node 0;
(4) node 0 determines whether the request is an update write; if so, go to step (5); if not, go to step (6);
(5) the metadata is looked up according to the LBA and data length of the write request to find the corresponding physical address in the thin volume, and the data is written to that physical address;
(6) a physical address is allocated according to the LBA and data length of the write request, new metadata is generated and sent to the standby node, and the data is written to the allocated physical address.
The read process mainly comprises the following steps (a usage example of the read/write sketch above follows the steps):
(1) a read request accessing thin volume 0 is issued from node 0; go to step (4);
(2) a read request accessing thin volume 0 is issued from node 1, node 2, or node 3; go to step (3);
(3) the read request is forwarded from node 1, node 2, or node 3 to node 0;
(4) node 0 looks up the metadata according to the LBA and data length of the read request to find the corresponding physical address in the thin volume, and reads the data from that physical address.
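Under the same assumptions, the MasterNode sketch above can be exercised against this write/read flow for thin volume 0 (a request arriving at node 1, node 2, or node 3 would first be forwarded to node 0, which the sketch omits):

```python
node0 = MasterNode(StandbyNode())

node0.handle_write(lba=0x1000, length=512, data=b"x" * 512)  # first write: allocates
node0.handle_write(lba=0x1000, length=512, data=b"y" * 512)  # update write: same address

assert node0.handle_read(lba=0x1000, length=512) == b"y" * 512
assert len(node0.metadata) == 1  # only one metadata entry was ever created
```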
When a node of a domain fails (e.g., node 0 of domain 0), the processing flow comprises the following steps:
(1) node 0 fails, and domain 0 and domain 1 enter the silent state;
(2) the master node of thin volume 0 is switched from node 0 to node 1;
(3) node 2 of domain 1 joins domain 0 as the new standby node of thin volume 0;
(4) the metadata of thin volume 0 is mirrored from node 1 to node 2;
(5) domain 0 and domain 1 resume the running state.
When a node of a domain recovers from the failure state (e.g., node 0 of domain 0), the processing flow comprises the following steps, traced in code below:
(1) node 0 recovers from the failure state, and domain 0 and domain 1 enter the silent state;
(2) node 2 exits domain 0, and node 0 joins domain 0;
(3) the master node of thin volume 0 is switched from node 1 back to node 0;
(4) the metadata of thin volume 0 stored on node 2 is discarded;
(5) the metadata of thin volume 0 on node 1 is mirrored to node 0;
(6) domain 0 and domain 1 resume the running state.
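Tracing the failover and recovery sketches above on this four-node example (node ids only, with no actual metadata movement) reproduces the two flows:

```python
domains = {i: Domain(master=i, standby=(i + 1) % 4) for i in range(4)}

# Node 0 fails: domain 0 is the failed domain, domain 1 the associated domain.
handle_failure(domains[0], domains[1])
assert (domains[0].master, domains[0].standby) == (1, 2)  # steps (2)-(3) above

# Node 0 recovers and rejoins domain 0.
handle_recovery(domains[0], domains[1], recovered_node=0)
assert (domains[0].master, domains[0].standby) == (0, 1)  # original layout restored
```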
As shown in FIG. 4, an embodiment of the invention provides a data management system for a full-flash system comprising a plurality of nodes; the data management system comprises a domain division module 11, a volume creation module 22, a request processing module 33, and a node failure processing module 44;
the domain division module 11 is used to logically form a domain from each pair of adjacent nodes, with the first and last nodes also logically forming a domain so that the domains form a closed loop;
the volume creation module 22 is used to select one node in each domain and create a thin volume on the selected node; the node that creates the thin volume in each domain is the master node, and the other node is the standby node; each node thus belongs to two domains, acting as the master node of one domain and the standby node of the other;
the request processing module 33 is used to arrange for the node that created a thin volume to process requests when a node receives a request to access that thin volume;
the node failure processing module 44 is used to add a healthy node of the associated domain to the failed domain when a node in a domain fails, and to rejoin the failed node to its original domain when it recovers; the associated domain is the other domain to which the standby node of the failed node's domain belongs.
In some embodiments, the request processing module 33 comprises a judging unit, a triggering unit, and a request processing unit;
the judging unit is used to determine, when a node receives a request to access a thin volume, whether the receiving node is the node that created the thin volume;
the triggering unit is used to trigger the node that created the thin volume to process the request;
the request processing unit is used to forward the request to the node that created the thin volume when the judging unit determines that the receiving node is not the node that created the thin volume.
In some embodiments, the node that created the thin volume is used to examine the received request; when the request is a write request, it determines whether the request is an update write; if so, it looks up the metadata according to the LBA and data length of the write request to find the corresponding physical address in the thin volume and writes the data to that physical address; otherwise, it allocates a physical address according to the LBA and data length of the write request, generates new metadata, sends the metadata to the standby node of the domain, and writes the data to the allocated physical address; when the request is a read request, it looks up the metadata according to the LBA and data length of the read request to find the corresponding physical address in the thin volume and reads the data from that physical address.
In some embodiments, the node failure processing module 44 comprises a setting unit, a switching unit, a node-domain processing unit, and a mirroring unit;
the setting unit is used to set the failed domain and the associated domain to a silent state when a node fails;
the switching unit is used to switch the master node of the failed domain's thin volume from the failed node to the standby node of the failed node's domain; that is, the original standby node of the failed node's domain becomes the master node of that thin volume; the failed domain's thin volume is the thin volume created by the failed node;
the node-domain processing unit is used to add the standby node of the associated domain to the failed node's domain as the new standby node of the failed domain's thin volume;
the mirroring unit is used to mirror the metadata of the failed domain's thin volume from the original standby node to the new standby node;
the setting unit is also used to restore the two domains that entered the silent state to the running state.
In some embodiments, the node failure processing module 44 further comprises a metadata processing unit;
the setting unit is also used to set the failed node's domain and the associated domain to a silent state when the failed node recovers from the failure state;
the node-domain processing unit is also used to withdraw the new standby node from the failed node's domain and to add the recovered node back into the domain to which it belonged before the failure;
the switching unit is also used to switch the master node of the failed domain's thin volume from the original standby node of the failed node's domain back to the recovered node;
the metadata processing unit is used to discard the metadata of the failed domain's thin volume stored on the new standby node;
the mirroring unit also mirrors the metadata of the failed domain's thin volume from the original standby node of the failed node's domain to the recovered node.
As shown in FIG. 3, in some embodiments, the full-flash system comprises four nodes, namely node 0, node 1, node 2, and node 3; node 0 and node 1 form domain 0, node 1 and node 2 form domain 1, node 2 and node 3 form domain 2, and node 3 and node 0 form domain 3;
thin volume 0 is created on node 0, managed by domain 0, with node 0 as the master node;
thin volume 1 is created on node 1, managed by domain 1, with node 1 as the master node;
thin volume 2 is created on node 2, managed by domain 2, with node 2 as the master node;
thin volume 3 is created on node 3, managed by domain 3, with node 3 as the master node.
The full-flash system thus consists of four nodes, any of which can access the back-end solid-state drives. Pairs of nodes logically form domains, and different thin volumes are managed by different domains: the node on which a thin volume is created serves as the master node of that thin volume, processing the read/write requests that access it, while the other node of the domain serves as the standby node, backing up the metadata held on the master node. When a node in a domain fails, a healthy node of another domain joins the failed domain so that service is not interrupted, and when the failed node recovers, it rejoins its original domain.
Although the invention has been described in detail by way of preferred embodiments with reference to the accompanying drawings, the invention is not limited thereto. Those skilled in the art may make various equivalent modifications and substitutions to the embodiments of the invention without departing from its spirit and scope, and all such modifications and substitutions are intended to fall within the scope of the invention as defined by the appended claims. Therefore, the protection scope of the invention shall be subject to the protection scope of the claims.

Claims (8)

1. A data management method for a full-flash system, the full-flash system comprising a plurality of nodes, the method comprising the following steps:
each pair of adjacent nodes logically forms a domain, and the first and last nodes also logically form a domain, so that the domains form a closed loop;
one node in each domain is selected to create a thin volume; the node that creates the thin volume is the master node of the domain, and the other node is the standby node; each node thus belongs to two domains, acting as the master node of one domain and the standby node of the other;
when a request to access a thin volume is received, the request is processed by the node that created the thin volume;
when a node in a domain fails, a healthy node of the associated domain joins the failed domain; when the failed node recovers, it rejoins its original domain; the associated domain is the other domain to which the standby node of the failed node's domain belongs;
wherein, when a node in a domain fails, the step of adding a healthy node of the associated domain to the failed domain comprises:
when a node fails, setting the failed domain and the associated domain to a silent state;
switching the master node of the failed domain's thin volume from the failed node to the standby node of the failed node's domain, that is, the original standby node of the failed node's domain becomes the master node of that thin volume, the failed domain's thin volume being the thin volume created by the failed node;
adding the standby node of the associated domain to the failed node's domain as the new standby node of the failed domain's thin volume;
mirroring the metadata of the failed domain's thin volume from the original standby node to the new standby node;
restoring the two domains that entered the silent state to the running state.
2. The data management method for a full-flash system according to claim 1, wherein the step of processing the request through the node that created the thin volume, upon receiving a request to access the thin volume, comprises:
when a request to access the thin volume is received, determining whether the node that received the request is the node that created the thin volume;
if so, processing the request on the node that created the thin volume;
otherwise, forwarding the request to the node that created the thin volume, which then processes the request.
3. The data management method for a full-flash system according to claim 2, wherein the step of processing the request by the node that created the thin volume comprises:
the node that created the thin volume examines the received request;
when the request is a write request, determining whether the request is an update write;
if so, looking up the metadata according to the logical block address and data length of the write request to find the corresponding physical address in the thin volume, and writing the data to that physical address;
otherwise, allocating a physical address according to the logical block address and data length of the write request, generating new metadata, sending the metadata to the standby node of the domain, and writing the data to the allocated physical address.
4. The data management method for a full-flash system according to claim 2, wherein the step of processing the request by the node that created the thin volume further comprises: when the request is a read request, looking up the metadata according to the logical block address and data length of the read request to find the corresponding physical address in the thin volume, and reading the data from that physical address.
5. The data management method for a full-flash system according to claim 4, wherein the step of rejoining the original domain after the failed node recovers comprises:
when the failed node recovers from the failure state, setting the failed node's domain and the associated domain to a silent state;
withdrawing the new standby node from the failed node's domain, and adding the recovered node back into the domain to which it belonged before the failure;
switching the master node of the failed domain's thin volume from the original standby node of the failed node's domain back to the recovered node;
discarding the metadata of the failed domain's thin volume stored on the new standby node;
mirroring the metadata of the failed domain's thin volume from the original standby node of the failed node's domain to the recovered node;
restoring the two domains that entered the silent state to the running state.
6. A data management system for a full-flash system, the full-flash system comprising a plurality of nodes, wherein the data management system comprises a domain division module, a volume creation module, a request processing module, and a node failure processing module;
the domain division module is used to logically form a domain from each pair of adjacent nodes, with the first and last nodes also logically forming a domain so that the domains form a closed loop;
the volume creation module is used to select one node in each domain and create a thin volume on the selected node; the node that creates the thin volume in each domain is the master node, and the other node is the standby node; each node thus belongs to two domains, acting as the master node of one domain and the standby node of the other;
the request processing module is used to arrange for the node that created a thin volume to process requests when a node receives a request to access that thin volume;
the node failure processing module is used to add a healthy node of the associated domain to the failed domain when a node in a domain fails, and to rejoin the failed node to its original domain when it recovers; the associated domain is the other domain to which the standby node of the failed node's domain belongs;
the node failure processing module comprises a setting unit, a switching unit, a node-domain processing unit, and a mirroring unit;
the setting unit is used to set the failed domain and the associated domain to a silent state when a node fails, and to restore the two domains that entered the silent state to the running state; the switching unit is used to switch the master node of the failed domain's thin volume from the failed node to the standby node of the failed node's domain, that is, the original standby node of the failed node's domain becomes the master node of that thin volume, the failed domain's thin volume being the thin volume created by the failed node; the node-domain processing unit is used to add the standby node of the associated domain to the failed node's domain as the new standby node of the failed domain's thin volume; and the mirroring unit is used to mirror the metadata of the failed domain's thin volume from the original standby node to the new standby node.
7. The data management system for a full-flash system according to claim 6, wherein the request processing module comprises a judging unit, a triggering unit, and a request processing unit;
the judging unit is used to determine, when a node receives a request to access a thin volume, whether the receiving node is the node that created the thin volume; the triggering unit is used to trigger the node that created the thin volume to process the request; and the request processing unit is used to forward the request to the node that created the thin volume when the judging unit determines that the receiving node is not the node that created the thin volume.
8. The data management system for a full-flash system according to claim 7, wherein the node failure processing module further comprises a metadata processing unit;
the setting unit is also used to set the failed node's domain and the associated domain to a silent state when the failed node recovers from the failure state; the node-domain processing unit is also used to withdraw the new standby node from the failed node's domain and to add the recovered node back into the domain to which it belonged before the failure; the switching unit is also used to switch the master node of the failed domain's thin volume from the original standby node of the failed node's domain back to the recovered node; the metadata processing unit is used to discard the metadata of the failed domain's thin volume stored on the new standby node; and the mirroring unit mirrors the metadata of the failed domain's thin volume from the original standby node of the failed node's domain to the recovered node.
CN202110960589.XA (priority date 2021-08-20, filing date 2021-08-20) · Data management method and system for full flash system · Active · CN113868017B (en)

Priority Applications (1)

Application Number: CN202110960589.XA (published as CN113868017B) · Priority Date: 2021-08-20 · Filing Date: 2021-08-20 · Title: Data management method and system for full flash system

Applications Claiming Priority (1)

Application Number: CN202110960589.XA (published as CN113868017B) · Priority Date: 2021-08-20 · Filing Date: 2021-08-20 · Title: Data management method and system for full flash system

Publications (2)

Publication Number Publication Date
CN113868017A CN113868017A (en) 2021-12-31
CN113868017B 2024-01-12

Family

ID=78987933

Family Applications (1)

Application Number: CN202110960589.XA (Active; published as CN113868017B) · Priority Date: 2021-08-20 · Filing Date: 2021-08-20 · Title: Data management method and system for full flash system

Country Status (1)

Country Link
CN (1) CN113868017B (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11301144B2 (en) * 2016-12-28 2022-04-12 Amazon Technologies, Inc. Data storage system
US10452502B2 (en) * 2018-01-23 2019-10-22 International Business Machines Corporation Handling node failure in multi-node data storage systems

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9483367B1 (en) * 2014-06-27 2016-11-01 Veritas Technologies Llc Data recovery in distributed storage environments
CN109857588A (en) * 2018-12-11 2019-06-07 浪潮(北京)电子信息产业有限公司 Simplification volume metadata processing method, apparatus and system based on more controlled storage systems

Also Published As

Publication number Publication date
CN113868017A (en) 2021-12-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant