
CN108920600B - Distributed file system metadata prefetching method based on data relevance - Google Patents


Info

Publication number
CN108920600B
Authority
CN
China
Prior art keywords
file
metadata
client
prefetching
index node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810681784.7A
Other languages
Chinese (zh)
Other versions
CN108920600A (en
Inventor
许胤龙
陈友旭
李诚
李永坤
吕敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC
Priority to CN201810681784.7A
Publication of CN108920600A
Application granted
Publication of CN108920600B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a distributed file system metadata prefetching method based on data relevance. The method comprises four steps: designing the extraction method and storage structure for data relevance, prefetching the metadata of associated files, dynamically feeding back data relevance, and dynamically updating data relevance. Compared with the conventional metadata access mode of distributed file systems, the invention provides a lightweight syntax-analysis method for extracting data relevance, extends the metadata structure of the file system to support data relevance, and prefetches the metadata of associated files into the local cache of the client, thereby reducing the number of cross-network interactions between the client and the metadata server. Combined with a dynamic feedback mechanism on the client, the closeness of associated files is adjusted dynamically according to the file access pattern, and threshold control further improves prefetching accuracy. This reduces client cache space usage, lowers the response latency of metadata accesses to associated files, and improves metadata service performance.

Description

Distributed file system metadata prefetching method based on data relevance
Technical Field
The invention belongs to the technical field of computer distributed storage systems, and particularly relates to a prefetching method for accelerating metadata access by utilizing data relevance.
Background
With the rapid development of the Internet, data volumes keep growing, so mass data storage has become crucial. A distributed file system uses the physical resources of computer nodes that are deployed in a distributed manner and interconnected by a network, and provides file system management on top of them, offering users high-speed data access and stable scalability. A distributed file system comprises three components: a metadata server, a data server and a client. The metadata server manages the metadata of the whole file system, including directory entries and index nodes (inodes); the data server stores the data of the file system; and the client initiates metadata and data requests. In this decoupled architecture, a client that wants to read the data of a file must first interact with the metadata server to perform the corresponding metadata operations, and only then transfers data with the data server to complete the access. A paper on pages 41 to 54 of the proceedings of the USENIX Annual Technical Conference, published by the USENIX Association (USA) in 2000, indicates that at least 50% of user requests access file metadata, so metadata access performance is critical in a distributed file system. A paper on pages 15 to 22 of the proceedings of a USENIX conference on file and storage, published by the USENIX Association in 2016, notes that reference relationships exist among the data of files, so that accessing a file leads to accesses to the files associated with its data. However, existing distributed file system architectures do not consider the relevance among file data in their design. As a result, the data relevance among files cannot be discovered, the client interacts frequently with the metadata server, and it is difficult to optimize the metadata access flow of associated files or to reduce their metadata access latency.
Disclosure of Invention
The invention aims to provide a distributed file system metadata prefetching method based on data relevance, which overcomes the defects in the prior art, reduces the cross-network interaction times of a client and a metadata server, shortens the response time of requests and improves the throughput of a system under the condition of ensuring low overhead.
The invention discloses a distributed file system metadata prefetching method based on data relevance, which is characterized by comprising the following steps:
Step 1: Design the extraction method and storage structure for data relevance
Look up the reference or link syntax that corresponds to the syntax format of the file type, and design a target regular expression based on it. When an application on the client modifies the data of a file, the data content of the file is parsed with the target regular expression to extract the path names of the files it references or links to (the associated files), and the offset at which each associated path name appears in the data, together with the length of the path name, is recorded;
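The extraction described above can be sketched as follows. This is a minimal illustration, not the regular expression of the invention: the pattern `HTML_SRC_PATTERN`, the function name `extract_associations` and the sample HTML string are all assumptions introduced here, and offsets are counted in characters rather than bytes.

```python
import re
from typing import List, Tuple

# Hypothetical pattern for HTML src="..." attributes; the target regular
# expression used by the invention is not reproduced here.
HTML_SRC_PATTERN = re.compile(r'src\s*=\s*"([^"]+)"')


def extract_associations(data: str) -> List[Tuple[str, int, int]]:
    """Return (path name, offset, length) for every referenced file."""
    results = []
    for match in HTML_SRC_PATTERN.finditer(data):
        path = match.group(1)
        # match.start(1) is the offset of the path name inside the data.
        results.append((path, match.start(1), len(path)))
    return results


# Illustrative file content (an assumption, not the content of FIG. 2).
sample = '<html><body><img src="/sponsors.png"></body></html>'
associations = extract_associations(sample)
print(associations)
```

A different file type would only need a different pattern; the recording of offset and length stays the same.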
data relevance is stored as key-value pairs. The key is the index node number of the associated file, which uniquely identifies the file; the metadata server obtains it by looking up the index node of the corresponding file from the path name of the associated file, and it occupies 8 bytes. The value consists of three parts: an association score in the range [0,1], the length of the path name of the associated file, and the offset of that path name in the data, occupying 4 bytes, 4 bytes and 8 bytes respectively. The metadata structure of the index node of the distributed file system is extended so that the key-value pairs holding the data relevance are stored in the extended attributes of the index node of the file, which lets the distributed file system support data relevance. After the client has parsed the modified data content, it sends a data relevance synchronization message to the metadata server; on receiving the message, the metadata server persists the updated data relevance to the storage device;
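One possible byte layout for such a key-value pair is sketched below using the standard `struct` module. The text fixes only the field sizes (8 bytes for the key; 4, 4 and 8 bytes for the value); the little-endian byte order, the 32-bit float encoding of the score and the function names are assumptions made for illustration.

```python
import struct

KEY_FORMAT = "<Q"      # 8-byte unsigned index node number
VALUE_FORMAT = "<fIQ"  # 4-byte score, 4-byte path length, 8-byte offset


def pack_association(inode_no, score, path_len, offset):
    """Encode one data relevance entry as (key bytes, value bytes)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("association score must lie in [0, 1]")
    key = struct.pack(KEY_FORMAT, inode_no)
    value = struct.pack(VALUE_FORMAT, score, path_len, offset)
    return key, value


def unpack_association(key, value):
    """Decode (key bytes, value bytes) back into its four fields."""
    (inode_no,) = struct.unpack(KEY_FORMAT, key)
    score, path_len, offset = struct.unpack(VALUE_FORMAT, value)
    return inode_no, score, path_len, offset


key, value = pack_association(10101586, 0.5, 13, 108)
print(len(key), len(value))
```

The `<` prefix disables alignment padding, so the encoded value is exactly 4 + 4 + 8 bytes.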
Step 2: Prefetch the metadata of associated files
When the metadata server processes a metadata operation request that a client has issued for a target file, it first obtains the directory entry and index node of the target file from its metadata cache. It then retrieves the extended attributes of the index node to obtain the data relevance of the target file;
a threshold T in the range [0,1] is set to represent the minimum closeness between the target file and an associated file that is to be prefetched; an associated file is prefetched only when its association score with the target file exceeds T. The metadata server traverses each data relevance entry of the target file. When the association score in the value part of a key-value pair is greater than T, it extracts the index node number of the associated file and looks up the directory entry and index node content of that file in the metadata cache. When the association score is less than or equal to T, it skips the entry and moves on to the next one;
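The threshold check can be sketched as below. The dictionary representation of the data relevance, the function name `select_prefetch_inodes`, the second index node number 10101590 and the threshold values are illustrative assumptions; only the strict "greater than T" comparison comes from the text.

```python
def select_prefetch_inodes(relevance, threshold):
    """Return index node numbers whose association score exceeds T.

    relevance maps an associated file's index node number to a
    (score, path_length, offset) tuple.
    """
    selected = []
    for inode_no, (score, _path_len, _offset) in relevance.items():
        if score > threshold:      # scores equal to T are skipped
            selected.append(inode_no)
    return selected


# 10101590 is a hypothetical second associated file with a low score.
relevance = {10101586: (0.5, 13, 108), 10101590: (0.2, 10, 300)}
print(select_prefetch_inodes(relevance, 0.3))
```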
the metadata server then constructs a reply message that returns the metadata of the target file and the associated files to the client. It adds the directory entry and index node content of the target file to the reply message, adds the directory entry and index node content of each associated file found above, and sets the prefetch flag of the reply message to 1 to indicate that the message contains associated-file metadata. If the reply message contains no prefetched content, the prefetch flag is set to 0. The metadata server then sends the reply message to the client;
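A reply message of this kind might be modelled as follows. The classes `FileMetadata` and `ReplyMessage`, the function `build_reply` and the inode contents are hypothetical, and skipping an associated file that is missing from the metadata cache is an assumption, since the text does not say what happens in that case.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FileMetadata:
    inode_no: int
    dentry: str             # directory entry, reduced to a name here
    inode: Dict[str, int]   # index node content, reduced to a small dict


@dataclass
class ReplyMessage:
    target: FileMetadata
    prefetched: List[FileMetadata] = field(default_factory=list)
    prefetch_flag: int = 0


def build_reply(target, selected_inodes, metadata_cache):
    """Attach the metadata of the selected associated files to the reply."""
    reply = ReplyMessage(target=target)
    for inode_no in selected_inodes:
        meta = metadata_cache.get(inode_no)
        if meta is not None:            # assumption: skip cache misses
            reply.prefetched.append(meta)
    # Flag 1 means the reply carries associated-file metadata, 0 means none.
    reply.prefetch_flag = 1 if reply.prefetched else 0
    return reply


index = FileMetadata(10101567, "index.html", {"size": 2048})
sponsors = FileMetadata(10101586, "sponsors.png", {"size": 4096})
cache = {10101567: index, 10101586: sponsors}
with_prefetch = build_reply(index, [10101586], cache)
without_prefetch = build_reply(index, [], cache)
```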
Step 3: Dynamic feedback of data relevance
When the client receives a reply message from the metadata server, it first checks whether the prefetch flag is set. If the flag is not set, the client parses the reply to obtain the directory entry and index node of the target file, caches them in client memory, and links the index node to the directory entry to establish the logical structure of the path of the target file;
if the prefetch flag is set, then after parsing the directory entry and index node of the target file, the client parses the remainder of the reply to obtain the directory entry and index node of each associated file, links the index node of each associated file to its directory entry to establish the logical structure of the path of the associated file, and caches them in client memory. The client records information about each prefetched associated file, namely the index node number of the associated file, the index node number of the target file that triggered the prefetch, the prefetch time and an access flag, and adds it to the prefetch feedback table of the client. If a prefetched associated file is accessed by a subsequent client request, its access flag in the prefetch feedback table is set to 1; otherwise the access flag is set to 0;
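The client-side handling can be sketched as below. The class and method names are hypothetical, metadata is reduced to plain dictionaries, and keying the feedback table by the index node number of the associated file is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class PrefetchRecord:
    assoc_inode: int      # index node number of the associated file
    target_inode: int     # index node number of the file that triggered it
    prefetch_time: float  # time at which the reply reached the client
    accessed: int = 0     # access flag, 0 until the metadata is used


class ClientCache:
    def __init__(self):
        self.metadata = {}        # inode number -> cached metadata
        self.feedback_table = {}  # assoc inode number -> PrefetchRecord

    def handle_reply(self, target_inode, target_meta, prefetched,
                     prefetch_flag, now):
        """Cache the target metadata and, if flagged, the prefetched ones."""
        self.metadata[target_inode] = target_meta
        if prefetch_flag == 1:
            for inode_no, meta in prefetched:
                self.metadata[inode_no] = meta
                self.feedback_table[inode_no] = PrefetchRecord(
                    inode_no, target_inode, now)

    def lookup(self, inode_no):
        """Serve metadata from the cache and mark prefetched entries used."""
        meta = self.metadata.get(inode_no)
        record = self.feedback_table.get(inode_no)
        if meta is not None and record is not None:
            record.accessed = 1
        return meta


client = ClientCache()
client.handle_reply(10101567, {"name": "index.html"},
                    [(10101586, {"name": "sponsors.png"})], 1, now=100.0)
flag_before = client.feedback_table[10101586].accessed
hit = client.lookup(10101586)
flag_after = client.feedback_table[10101586].accessed
```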
a time interval Time, in the range [0, N], is set for traversing the prefetch feedback table of the client. Every Time seconds the client traverses all records in the prefetch feedback table one by one and feeds the access information of the prefetched associated files back to the metadata server. If the current time minus the prefetch time of the record being traversed is greater than Time, the client adds the index node number of the associated file, the index node number of the target file that triggered the prefetch, and the access flag to a prefetch feedback request. If the difference is less than or equal to Time, the client skips the record and moves on to the next one. After all records in the prefetch feedback table have been traversed once, the client sends the constructed prefetch feedback request to the metadata server;
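The traversal rule can be sketched as a single pass over the table. The tuple layout of the records, the function name and the example times are assumptions; the text does not state whether reported records are removed from the table, so this sketch leaves the table untouched.

```python
def build_feedback_request(records, now, interval):
    """Collect feedback entries for records older than the interval.

    Each record is (assoc_inode, target_inode, prefetch_time, accessed).
    """
    request = []
    for assoc_inode, target_inode, prefetch_time, accessed in records:
        if now - prefetch_time > interval:
            request.append((assoc_inode, target_inode, accessed))
        # Records with now - prefetch_time <= interval are skipped.
    return request


# 10101590 is a hypothetical record that is still too recent to report.
table = [
    (10101586, 10101567, 100.0, 1),
    (10101590, 10101567, 108.0, 0),
]
feedback = build_feedback_request(table, now=110.0, interval=5.0)
print(feedback)
```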
Step 4: Dynamic update of data relevance
When the metadata server receives a prefetch feedback request from a client, it processes the prefetch records in the request one by one. For each record, it uses the index node number of the associated file and the index node number of the target file that triggered the prefetch to look up the index node information of both files, and then searches the data relevance stored in the index node of the target file to obtain the key-value pair of the associated file;
an adjustment score s in the range [0,1] is set to represent the granularity by which the closeness between the target file and an associated file is adjusted each time. If the access flag in the prefetch record is 1, the association score of the key-value pair is increased by s; if the access flag is 0, the association score is decreased by s. After traversing the prefetch records in the feedback request one by one and updating the data relevance in the index node of each target file according to how the prefetched associated files were accessed, the metadata server persists the data relevance to its storage device.
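The score adjustment can be sketched as follows. The nested-dictionary layout and the function name are hypothetical. Clamping the result to [0, 1] is an assumption added here to keep the score inside its stated range; the text itself only specifies adding or subtracting s.

```python
def apply_feedback(relevance, feedback, s):
    """Adjust association scores from (assoc, target, accessed) records.

    relevance maps target inode number -> {assoc inode number ->
    [score, path_length, offset]}.
    """
    for assoc_inode, target_inode, accessed in feedback:
        entry = relevance.get(target_inode, {}).get(assoc_inode)
        if entry is None:
            continue                      # no such association recorded
        delta = s if accessed == 1 else -s
        # Assumption: clamp so the score stays within [0, 1].
        entry[0] = min(1.0, max(0.0, entry[0] + delta))


relevance = {10101567: {10101586: [0.5, 13, 108]}}
apply_feedback(relevance, [(10101586, 10101567, 1)], s=0.25)
score_after_hit = relevance[10101567][10101586][0]
apply_feedback(relevance, [(10101586, 10101567, 0)], s=0.25)
score_after_miss = relevance[10101567][10101586][0]
```

With a threshold T between the two resulting scores, a file that stops being accessed eventually drops out of the prefetch set.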
The distributed file system metadata prefetching method based on the data relevance adopts the operation steps of designing an extraction mode and a storage structure of the data relevance, prefetching metadata of the relevant file, dynamically feeding back the data relevance and dynamically updating the data relevance; compared with the traditional metadata access mode of the distributed file system, the method of the invention provides a light-weight syntax analysis mode of data relevance, expands the metadata structure of the file system to support the data relevance, caches the metadata of the relevant file in the local client in advance through a prefetching mode, thereby reducing the cross-network interaction times of the client and the metadata server, simultaneously dynamically adjusts the compactness of the relevant file according to the file access mode by combining a dynamic feedback mechanism of the client, further improves the prefetching accuracy by utilizing threshold control, reduces the occupation of the cache space of the client, reduces the response delay of metadata access of the relevant file, and improves the metadata service performance.
Compared with the prior art, the distributed file system metadata prefetching method based on the data relevance has the following advantages that:
1. Because the invention considers the relevance among file data, it designs an extraction method for file data relevance and extends the storage structure of the distributed file system metadata so that the file system supports data relevance.
2. Because the invention designs the metadata pre-fetching method based on the data relevance and utilizes the threshold control and the dynamic feedback mechanism to further improve the pre-fetching precision, compared with the prior art, the invention reduces the occupation of the cache space of the client, reduces the number of requests of the client and the metadata server interacting across the network and shortens the response delay of the metadata access of the associated file.
Drawings
FIG. 1 is a schematic diagram of the distributed file system architecture.
FIG. 2 is a schematic diagram of the data relevance information extracted from the index.html file.
FIG. 3 is a schematic diagram of the reference or link formats corresponding to various syntax types.
FIG. 4 is a schematic diagram of the data relevance storage structure.
FIG. 5 is a flowchart of the metadata prefetching operation performed by the metadata server.
FIG. 6 is a diagram of the prefetch feedback table structure.
FIG. 7 is a flowchart of the dynamic feedback mechanism.
FIG. 8 is a flowchart of the overall operation of prefetching associated file metadata according to the method of the invention.
Detailed Description
The following describes the data association-based distributed file system metadata prefetching method according to the present invention in further detail by using specific embodiments in conjunction with the accompanying drawings.
Example 1:
The distributed file system metadata prefetching method based on data relevance in this embodiment specifically comprises the following steps:
Step 1: Design the extraction method and storage structure for data relevance
FIG. 1 shows a schematic diagram of the distributed file system architecture, which comprises three components, namely the distributed file system client, the metadata server and the data server, which interact with one another over a network. On the client side, the application interacts with the distributed file system client module through the virtual file system; the client module is responsible for initiating metadata and data requests, while the client cache holds metadata and data to speed up the response to requests. The metadata server comprises three parts: the metadata request handler, the metadata cache and the metadata store. The data server is responsible for providing data access. If an application needs to read the content of a file, the distributed file system client first interacts with the metadata server to obtain the metadata of the file, which is cached in the client cache; the data address is obtained from the information in the metadata, and the client then interacts with the data server to read the file data. To keep metadata and data consistent, the metadata server and the data server interact to update file metadata.
FIG. 2 shows part of the data of the index.html file; the client parses this data content to extract data relevance. FIG. 3 shows the reference or link formats of various syntax types, such as HTML syntax and C++ syntax. Based on the reference or link format of the HTML syntax given in FIG. 3, a target regular expression src [ ^ ] is designed and matched against the data content of the index.html file.
FIG. 4 shows the storage structure of data relevance. The data relevance for the /sponsors.png file, extracted from the data of the index.html file, is stored as a key-value pair in an extended attribute of the index node of the index.html file. The key-value pair is <10101586, (0.5, 13, 108)>, where 10101586 is the index node number of the associated file /sponsors.png and the value holds, in order, the association score, the length of the path name and its offset in the data. After the client has finished parsing the modified data content of the index.html file, it sends a data relevance synchronization message to the metadata server; on receiving the message, the metadata server persists the updated data relevance to the storage device;
Step 2: Prefetch the metadata of associated files
When an application on the client loads the index.html file, it needs to access that file. If the local cache does not contain the metadata of the index.html file, the client sends a metadata request to the metadata server. FIG. 5 illustrates the flow in which the metadata server processes a metadata request and prefetches the metadata of associated files. First, the metadata server performs operation 1 in FIG. 5 and receives the metadata request from the client. It then looks up the directory entry and inode of the target file index.html in the metadata cache and adds them to the reply message (operation 2); the inode number of index.html is 10101567. After obtaining the inode of the target file index.html, the metadata server retrieves its data relevance. If the target file has no associated file, the prefetch flag of the reply message is set to 0 and the reply message is sent to the client. If the target file has associated files, which in this embodiment is the /sponsors.png file, operation 3 is performed: the data relevance entries are traversed one by one and each association score is compared with the threshold T. If the association score is greater than T, operation 4 is performed: the directory entry and inode of the associated file /sponsors.png are looked up in the metadata cache using the inode number stored in the data relevance and are added to the reply message; the inode number of the /sponsors.png file is 10101586. After all data relevance entries have been traversed, the prefetch flag of the reply message is set to 1 and the reply message is finally sent to the client.
Step 3: Dynamic feedback of data relevance
When the client receives the reply message from the metadata server, it first checks whether the prefetch flag is set. If the prefetch flag is 0, the reply message contains no prefetched metadata; the client parses the directory entry and inode of the target file, caches them in the client cache and establishes the logical structure linking the directory entry and the inode. If the prefetch flag is 1, the client additionally parses the directory entry and inode of each prefetched associated file, caches them in the client cache and establishes their logical structure. In this embodiment, the client parses the directory entries and inodes of the target file index.html and the associated file /sponsors.png, caches them in the client cache and establishes the logical structure of the directory entries and inodes. Because of the relevance among file data, an access to the index.html file usually leads to an access to the associated file /sponsors.png; the subsequent access by the client to the metadata of /sponsors.png can then be served from the client cache without interacting with the metadata server, which reduces the number of cross-network interactions with the metadata server.
FIG. 6 shows the structure of the prefetch feedback table. To improve prefetching accuracy, the client maintains a prefetch feedback table organized as shown in FIG. 6. After parsing the directory entry and inode of an associated file, the client adds a prefetch record to the table. A prefetch record contains the inode number of the associated file, the inode number of the target file that triggered the prefetch, the prefetch time and the access flag. In this embodiment, the prefetch record of the prefetched associated file /sponsors.png is <10101586, 10101567, t1, 0>, where t1 is the prefetch time of the /sponsors.png metadata, expressed as the time at which the reply message reaches the client. Because a subsequent metadata operation on the client accesses the metadata of /sponsors.png, the access flag in its prefetch record is updated to 1, and the record becomes <10101586, 10101567, t1, 1>.
FIG. 7 shows the operation flow of the dynamic feedback mechanism; the left side of FIG. 7 is the operation flow of the client and the right side is the operation flow of the metadata server. Every Time seconds the client traverses the prefetch feedback table once. It first retrieves a prefetch record from the table. If the current time minus the prefetch time in the record is greater than Time, the client adds the record to its feedback information; otherwise it fetches the next prefetch record and repeats the check. After traversing the prefetch feedback table once, the client sends the feedback information to the metadata server. In this embodiment the content of the feedback information is <10101586, 10101567, t1, 1>.
Step 4: Dynamic update of data relevance
In FIG. 7, the metadata server receives the prefetch feedback information sent by the client and then traverses the access information of each prefetch record in it. If the access flag in a record is 1, the prefetched associated file was accessed by a subsequent client operation, and the association score of the corresponding entry in the data relevance of the inode of the target file that triggered the prefetch is increased by s. If the access flag is not 1, the prefetched associated file was not accessed by a subsequent client operation, and the association score of the corresponding entry is decreased by s. When all prefetch records in the feedback information have been traversed, the updated metadata is persisted to the storage of the metadata server. Because the closeness of each association is adjusted dynamically according to the access information of prefetched files, the metadata of closely associated files is more likely to be prefetched and the metadata of loosely associated files is less likely to be prefetched, which further improves the accuracy of metadata prefetching. In this embodiment, the prefetched metadata of the /sponsors.png file is accessed by the client, so in the data relevance of the inode numbered 10101567 the association score of the key-value pair with key 10101586 is increased by s.
FIG. 8 shows the overall operation flow of prefetching associated file metadata according to the method of the invention, which optimizes the access flow of associated file metadata by exploiting the reference or link relationships among file data. The client is responsible for initiating metadata requests, caching associated file metadata, and dynamically feeding prefetch access information back to the metadata server. The metadata server is responsible for receiving client requests, looking up target file metadata, prefetching associated file metadata and replying to metadata requests, and updating the data relevance in real time according to the prefetch feedback information, thereby completing the prefetching and updating of associated files.
The method of the invention obtains data relevance through syntax analysis and integrates it into the metadata of the distributed file system, so that the distributed file system can store data relevance. When associated file metadata is prefetched, threshold control improves the accuracy of prefetching and reduces client cache space usage. A client-side dynamic feedback mechanism, driven by the access information of prefetched files, updates the association scores in real time to further improve prefetching accuracy. Compared with a conventional distributed file system, prefetching the metadata of associated files optimizes the subsequent metadata access flow for those files, reduces the number of cross-network interactions between the client and the metadata server, and shortens the metadata access latency of associated files. Take the index.html and /sponsors.png files of this embodiment as an example. An existing distributed file system cannot perceive the data relevance between files: after accessing the metadata of index.html, the client must still interact with the metadata server to obtain the metadata of the associated file /sponsors.png, so the client interacts with the metadata server twice over the whole access flow. With the method of the invention, the distributed file system perceives the data relevance in advance, and when the metadata of the index.html file is accessed, the metadata of the associated file /sponsors.png is prefetched and returned in the same reply, so the client needs to interact with the metadata server only once.
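The difference in interaction counts for the index.html and /sponsors.png example can be illustrated with a toy simulation. Everything here is an assumption made for illustration: files are identified by path rather than by index node number, the class `MetadataServer` and function `count_round_trips` are hypothetical, and the threshold 0.3 is an arbitrary value below the score 0.5 used in the embodiment.

```python
from typing import Dict, List, Set


class MetadataServer:
    def __init__(self, relevance: Dict[str, Dict[str, float]],
                 threshold: float):
        self.relevance = relevance
        self.threshold = threshold
        self.requests = 0  # number of cross-network metadata requests

    def lookup(self, path: str, prefetch: bool) -> List[str]:
        """Return the paths whose metadata is carried in one reply."""
        self.requests += 1
        returned = [path]
        if prefetch:
            for assoc, score in self.relevance.get(path, {}).items():
                if score > self.threshold:
                    returned.append(assoc)
        return returned


def count_round_trips(accesses, relevance, threshold, prefetch):
    """Count metadata requests needed to serve a sequence of accesses."""
    server = MetadataServer(relevance, threshold)
    cached: Set[str] = set()
    for path in accesses:
        if path not in cached:  # only cache misses reach the server
            cached.update(server.lookup(path, prefetch))
    return server.requests


relevance = {"/index.html": {"/sponsors.png": 0.5}}
accesses = ["/index.html", "/sponsors.png"]
without_prefetch = count_round_trips(accesses, relevance, 0.3, False)
with_prefetch = count_round_trips(accesses, relevance, 0.3, True)
print(without_prefetch, with_prefetch)
```

Under these assumptions the saving disappears as soon as the association score no longer exceeds the threshold, which is the behaviour the feedback mechanism relies on.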

Claims (1)

1. A distributed file system metadata prefetching method based on data relevance is characterized by comprising the following steps:
the first step is as follows: extraction mode and storage structure for designing data relevance
Inquiring a corresponding reference or link syntax expression according to a syntax format corresponding to the file type, and designing a target regular expression based on the inquired reference or link syntax expression; when an application program of a client modifies data of a file, syntactic analysis is carried out on data content of the file by using a designed target regular expression to extract a file path name which refers to or links an associated file, and meanwhile, the offset of the associated file path name appearing in a data part and the length of the path name are recorded;
storing data relevance by adopting a data structure of a key-value pair, wherein the key of the key-value pair is the index node number of the associated file and is used for uniquely identifying the file, and the key is obtained by the metadata server by retrieving the content of the index node of the corresponding file according to the path name of the associated file, and occupies 8 bytes; the value of the key-value pair comprises three parts, namely an association score in the range of [0,1], the length of the associated file path name and the offset of the associated file path name in the data part, which respectively occupy 4 bytes, 4 bytes and 8 bytes; extending the metadata structure of the index node of the distributed file system, and storing the key-value pair holding the data relevance in the extended attribute of the index node of the file so that the distributed file system supports the data relevance; after the client parses the modified data content, the client sends data relevance synchronization information to the metadata server; after the metadata server receives the synchronization information, persistently updating the data relevance to the storage device;
the second step: prefetching metadata of associated files
When the metadata server processes a metadata operation request for a target file initiated by a client, it first obtains the directory entry and index node of the target file from the metadata cache of the metadata server; after obtaining the index node of the target file, it retrieves each extended attribute of the index node to obtain the data relevance of the target file;
setting a threshold T in the range [0,1] to represent the minimum degree of closeness between the target file and an associated file that is required for prefetching, so that an associated file is prefetched only when its association score with the target file exceeds T; traversing each data relevance entry of the target file: when the association score in the value part of the key-value pair is greater than T, extracting the index node number of the associated file and looking up the directory entry and index node content of the associated file in the metadata cache by that index node number; when the association score is less than or equal to T, skipping that entry and proceeding to the next data relevance entry;
the metadata server constructs a reply message to return the metadata of the target file and the associated files to the client: it adds the directory entry and index node content of the target file to the reply message, also adds the directory entry and index node content of the associated files found in the second step, and sets the prefetch flag of the reply message to 1 to indicate that the message contains metadata of associated files; if the reply message contains no prefetched content, the prefetch flag of the reply message is set to 0; the metadata server then sends the reply message to the client;
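The threshold test and the construction of the reply message might look like the sketch below. The dictionary-based cache, the field names, and the function `build_reply` are hypothetical; in particular, the claim does not say what happens when an associated file is missing from the metadata cache, so skipping it here is an assumption.

```python
def build_reply(target_inode_no, metadata_cache, threshold):
    """Build a reply with the target's metadata plus prefetched metadata."""
    target = metadata_cache[target_inode_no]
    reply = {
        "target": {"inode_no": target_inode_no, "dentry": target["dentry"]},
        "prefetched": [],
        "prefetch_flag": 0,
    }
    # associations maps associated index node number -> association score
    for assoc_inode_no, score in target["associations"].items():
        if score <= threshold:
            continue  # score does not exceed T: skip this entry
        assoc = metadata_cache.get(assoc_inode_no)
        if assoc is None:
            continue  # assumption: a cache miss is simply skipped
        reply["prefetched"].append(
            {"inode_no": assoc_inode_no, "dentry": assoc["dentry"]}
        )
    if reply["prefetched"]:
        reply["prefetch_flag"] = 1  # reply carries associated-file metadata
    return reply


cache = {
    1: {"dentry": "index.html", "associations": {2: 0.8, 3: 0.5}},
    2: {"dentry": "style.css", "associations": {}},
    3: {"dentry": "logo.png", "associations": {}},
}
reply = build_reply(1, cache, threshold=0.5)
```

Note that the comparison is strict: an association whose score equals T is not prefetched, matching the wording of the claim.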
the third step: dynamic feedback of data relevance
When the client receives a reply message sent by the metadata server, it first checks whether the prefetch flag of the reply message is set; if the prefetch flag is not set, it parses the content of the reply message to obtain the directory entry and index node of the target file, caches them in the memory of the client, and links the index node of the target file to its directory entry to establish the logical structure of the target file path;
if the prefetch flag is set, after parsing the reply message to obtain the directory entry and index node of the target file, the client further parses the remaining content of the reply message to obtain the directory entry and index node of each associated file, links the index node of the associated file to the directory entry of the associated file, establishes the logical structure of the associated file path, and caches it in the memory of the client; the client records the information of each prefetched associated file, including the index node number of the associated file, the index node number of the target file that triggered the prefetch, the prefetch time, and an access flag, and adds this record to the prefetch feedback table of the client; if the prefetched associated file is accessed by a subsequent client request, the access flag of the corresponding record in the prefetch feedback table is set to 1; if the prefetched associated file is not accessed by any subsequent client request, the access flag of the corresponding record is set to 0;
setting a time interval Time in the range [0, N] for traversing the prefetch feedback table of the client; every Time seconds, the client traverses the records in the prefetch feedback table one by one and feeds back the access information of the prefetched associated files to the metadata server; if the current time minus the prefetch time of the record being traversed is greater than Time, the client constructs a client prefetch feedback request and adds to it the index node number of the associated file, the index node number of the target file that triggered the prefetch, and the access flag; if the current time minus the prefetch time of the associated file is less than or equal to Time, the client skips the record and moves on to the next record in the prefetch feedback table; after all records in the prefetch feedback table have been traversed once, the client sends the constructed prefetch feedback request to the metadata server;
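One pass over the client's prefetch feedback table can be sketched as below. The record type `PrefetchRecord`, the function `collect_feedback`, and the choice to pass the current time as a parameter are assumptions for illustration; the claim also does not state whether reported records are removed from the table, so this sketch leaves the table unchanged.

```python
from dataclasses import dataclass


@dataclass
class PrefetchRecord:
    assoc_inode_no: int    # index node number of the prefetched file
    target_inode_no: int   # index node number of the triggering target file
    prefetch_time: float   # time of the prefetch, in seconds
    accessed: int = 0      # access flag: 1 if requested later, else 0


def collect_feedback(table, now, interval):
    """Return feedback entries for records older than the interval."""
    feedback = []
    for record in table:
        if now - record.prefetch_time > interval:
            feedback.append(
                (record.assoc_inode_no, record.target_inode_no, record.accessed)
            )
        # otherwise the record is too recent and is skipped in this pass
    return feedback


table = [
    PrefetchRecord(2, 1, prefetch_time=0.0, accessed=1),
    PrefetchRecord(3, 1, prefetch_time=90.0),
    PrefetchRecord(4, 1, prefetch_time=95.0),
]
request = collect_feedback(table, now=100.0, interval=10.0)
```

Waiting until a record is older than the interval gives the client time to observe whether the prefetched file is actually used before reporting on it.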
the fourth step: dynamic update of data associations
When the metadata server receives a prefetch feedback request sent by a client, it processes the prefetch records in the request one by one; for each prefetch record, it first looks up the index node information of the associated file and of the target file that triggered the prefetch by their index node numbers, and then searches the data relevance stored in the index node information of the target file that triggered the prefetch for the entry of the corresponding associated file, obtaining the key-value pair of that associated file;
setting an adjustment score s in the range [0,1] to represent the granularity by which the degree of closeness between the target file and an associated file is adjusted each time; if the access flag in the prefetch record is 1, the association score of the key-value pair is increased by s; if the access flag in the prefetch record is 0, the association score of the key-value pair is decreased by s; the prefetch records in the feedback request are traversed one by one, the data relevance in the index node of the target file that triggered the prefetch is updated according to whether each prefetched associated file was accessed, and finally the data relevance is persisted to the storage device of the metadata server.
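The score adjustment of the fourth step can be sketched as follows. The nested-dictionary layout and the name `apply_feedback` are hypothetical. Clamping the result to [0, 1] is an assumption: the claim defines the score range but does not say how an adjustment that would leave the range is handled. Persistence to the storage device is omitted.

```python
def apply_feedback(associations, feedback, step):
    """Adjust association scores in place from a prefetch feedback request.

    associations: {target_inode_no: {assoc_inode_no: score}}
    feedback:     iterable of (assoc_inode_no, target_inode_no, access_flag)
    step:         adjustment score s in [0, 1]
    """
    for assoc_inode_no, target_inode_no, accessed in feedback:
        entries = associations.get(target_inode_no, {})
        if assoc_inode_no not in entries:
            continue  # no data relevance entry for this pair
        score = entries[assoc_inode_no]
        score = score + step if accessed == 1 else score - step
        # Assumption: keep the score inside the range [0, 1].
        entries[assoc_inode_no] = min(1.0, max(0.0, score))


scores = {1: {2: 0.5, 3: 0.5, 4: 0.9}}
apply_feedback(scores, [(2, 1, 1), (3, 1, 0), (4, 1, 1)], step=0.25)
```

Over repeated feedback rounds, files that are prefetched but never accessed drift below the threshold T and stop being prefetched, while frequently used ones stay above it.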
CN201810681784.7A 2018-06-27 2018-06-27 Distributed file system metadata prefetching method based on data relevance Active CN108920600B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810681784.7A CN108920600B (en) 2018-06-27 2018-06-27 Distributed file system metadata prefetching method based on data relevance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810681784.7A CN108920600B (en) 2018-06-27 2018-06-27 Distributed file system metadata prefetching method based on data relevance

Publications (2)

Publication Number Publication Date
CN108920600A CN108920600A (en) 2018-11-30
CN108920600B CN108920600B (en) 2021-07-06

Family

ID=64422333

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810681784.7A Active CN108920600B (en) 2018-06-27 2018-06-27 Distributed file system metadata prefetching method based on data relevance

Country Status (1)

Country Link
CN (1) CN108920600B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110471894A (en) * 2019-07-22 2019-11-19 腾讯科技(深圳)有限公司 A kind of data prefetching method, device, terminal and storage medium
CN111026707B (en) * 2019-11-05 2023-01-17 中国科学院计算机网络信息中心 Access method and device for small file object
CN111654540A (en) * 2020-06-01 2020-09-11 重庆高开清芯智联网络科技有限公司 Method and system for prefetching and pushing node data in Internet of things system
CN113688113A (en) * 2021-07-28 2021-11-23 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Metadata prefetching system and method for distributed file system
CN113760190A (en) * 2021-08-23 2021-12-07 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Small file merging system and method based on Ceph storage
CN114168075B (en) * 2021-11-29 2024-05-14 华中科技大学 Method, equipment and system for improving load access performance based on data relevance
CN114297157B (en) * 2021-12-30 2024-04-09 北京字节跳动网络技术有限公司 File processing method, device, equipment and medium
CN114996234B (en) * 2022-06-17 2024-08-13 同盾科技有限公司 Data acquisition method and device, computer storage medium and electronic equipment
CN115269277B (en) * 2022-09-27 2022-12-27 山东恒辉软件有限公司 Intelligent laboratory data collaborative comprehensive management system
CN116841978A (en) * 2023-08-31 2023-10-03 北京趋动智能科技有限公司 Path analysis method, device and storage medium based on distributed file system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101515290A (en) * 2009-03-25 2009-08-26 中国工商银行股份有限公司 Metadata management system with bidirectional interactive characteristics and implementation method thereof
CN103970899A (en) * 2014-05-27 2014-08-06 重庆大学 Service-oriented metadata relevance extraction management method and management system
WO2015127404A1 (en) * 2014-02-24 2015-08-27 Microsoft Technology Licensing, Llc Unified presentation of contextually connected information to improve user efficiency and interaction performance
CN105279240A (en) * 2015-09-28 2016-01-27 暨南大学 Client origin information associative perception based metadata pre-acquisition method and system

Also Published As

Publication number Publication date
CN108920600A (en) 2018-11-30

Similar Documents

Publication Publication Date Title
CN108920600B (en) Distributed file system metadata prefetching method based on data relevance
US10958752B2 (en) Providing access to managed content
US10685017B1 (en) Methods and systems for efficient query rewriting
US5933832A (en) Retrieval system for frequently updated data distributed on network
CN101604324B (en) Method and system for searching video service websites based on meta search
US8959075B2 (en) Systems for storing data streams in a distributed environment
US8738572B2 (en) System and method for storing data streams in a distributed environment
US20040205044A1 (en) Method for storing inverted index, method for on-line updating the same and inverted index mechanism
US11347815B2 (en) Method and system for generating an offline search engine result page
US20080082554A1 (en) Systems and methods for providing a dynamic document index
US9262511B2 (en) System and method for indexing streams containing unstructured text data
CN113836162A (en) Method and device for service decoupling and automatic updating of multi-level cache
US20080189262A1 (en) Word pluralization handling in query for web search
JP5322019B2 (en) Predictive caching method for caching related information in advance, system thereof and program thereof
JPH11102366A (en) Retrieval method and retrieval device
CN114297145A (en) Method, medium and system for searching file based on keywords locally by IPFS node
KR100269114B1 (en) Cache managing method
KR102415155B1 (en) Apparatus and method for retrieving data
US20190057120A1 (en) Efficient Key Data Store Entry Traversal and Result Generation
JP2004070957A (en) Retrieval system
CN117520377A (en) Method and device for inquiring elastic search deep paging and electronic equipment
JP2009187435A (en) Data cache system in resource-saving terminal, and method and program for the same
JPH1153322A (en) Object searching and acquiring method, search server and recording medium
JPH11120067A (en) Data base processing method for managing partial information of data precedently
JP5437219B2 (en) Document search apparatus and document search program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant