SYSTEM AND METHOD FOR SEARCHING IN A DISTRIBUTED COMPUTING ENVIRONMENT
Background To The Invention
This invention relates to a system and method for searching in a distributed computing environment and in particular, but not exclusively to a system for searching in a peer-to-peer network environment.
The widespread acceptance of the World Wide Web has resulted in a corresponding explosion in available information. The millions of servers connected to the Web creates a huge challenge to searchers, due to the sheer volume of information available. This problem has been addressed by the various search engines available, which often attempt to search the Web in its entirety, or perform localised searches, for example within a top level domain name, based on key words.
Proponents of peer-to-peer computing methodologies view peer- to-peer systems as the next stage of development for the Internet and other local and wide area networks, especially as it applies to machine to machine interaction across organisational boundaries. This methodology involves allowing access to much more of the content of a server than simply its web page and associated links. This increased access results in difficulties in searching, as techniques currently used for searching web pages become inadequate due to the even larger volumes of information available, more rapid changes to the available information and the diversity of information that may be available from each server.
To address this problem, searching methodologies have been developed which allow each server to actively respond to search requests, rather than just supply information regarding its content. In
order to reduce the number of responses, only a subset of the available servers are requested to respond. For example, the approach used in Sun Microsystems JXTA project, available at www.gonesilent.com, which was derived from the public domain code based on Gnutella, is to build a subset, of say 10,000 servers to query. The subset is formed by asking a subset of the servers already known to the searcher to suggest at random another subset, of say 40 servers. These servers are asked in turn to supply another subset of randomly selected servers and the process is repeated until the required 10,000 servers have been identified.
However, a problem with this approach is that the search results may not be particularly reliable and hence many failed searches may result. This is due to the underlying assumption in using this search methodology that the subset of all possible terminals is sufficiently representative to find at least one that has the required information. This may be reliable for popular information such as industry standard reference documents, but is less' likely to work for content that is more obscure, for example details of an individual transaction. There is a need in peer-to-peer systems for a method of searching that increases the probability of relevant hits.
An object of the present invention is to provide a system and/or method of searching a distributed computing environment that will provide improved search results, overcome or alleviate problems in search systems and methods at present, or at least one that will provide the public with a useful alternative.
Other objects of the present invention may become apparent from the following description.
1 . Summary Of The Invention
According to a first aspect of the invention, there is provided a method for searching a distributed computing environment, the method
including defining a search query, selecting a plurality of intelligent , nodes, sending said search query to said selected intelligent nodes and receiving search results from said selected intelligent nodes, wherein the method includes selecting said plurality of intelligent nodes by:
a) defining one or more search parameters;
b) selecting a predetermined number of nodes known to a first node of the distributed computing environment according to the degree of satisfaction of said one or more search parameters;
c) requesting the predetermined number of nodes selected in step b) to select further intelligent nodes within the network according to the degree of satisfaction of said one or more search parameters; and
d) repeatedly requesting selected nodes to select further nodes until a required number of intelligent nodes have been selected or a set of possible nodes that meet certain selection criteria has been exhausted.
Preferably, the selection of nodes may be performed by each node based on its own records of communication with the network.
Preferably, the step of receiving search results from said selected intelligent nodes may include each selected node sending the search results directly to- said first node.
Preferably, the step of receiving search results from said selected intelligent nodes may include each selected node sending its search results to the node that selected it until the search results are sent to said first node.
Preferably, said search parameters may include any one or combination of the number or times the intelligent node has- been
accessed, volume of information transfer, value of transactions, and reliability of connection or service.
Preferably, said search parameters may include the logical or geographical location of a node within the network.
Preferably, the method may further include providing with said search parameters a counter to record the number of nodes used to select further nodes and wherein the counter is used to limit the number of nodes used to select further nodes.
Preferably, the method may further include providing with said search parameters instructions to nodes to progressively decrease the number of nodes selected as each set of selected nodes is used to select further nodes.
According to a second aspect of the invention, there is provided a program executable by a computer processing means and including instructions for causing a node within a distributed computing environment to search the distributed computing environment, wherein said instructions when executed cause the node to:
a) select a predetermined number of nodes known to it within the distributed computing environment according to the degree of satisfaction of said one or more search parameters and send a search query to selected nodes;
b) send a request or instruction to nodes selected in step a) to select further nodes within the network according to the degree of satisfaction of said one or more search parameters; and
c) receive search results from selected nodes in steps a) and b) .
Preferably, the program may include instructions to cause each selected node to select a set of further nodes until a required number of nodes have been selected.
Preferably, the program may include instructions to cause the node to store a record of selected network activities so that such record may be used to select nodes according the degree of satisfaction with a set of search parameters related to said network activities.
Preferably, the record may include details of other nodes which the node communicates with.
Preferably, the program may further include instructions to cause the node to only be selected by another node once.
Preferably, the program may further include instructions to include with said search parameters a counter to record the number of nodes used to select further nodes and wherein the counter is used to limit the number of nodes used to select further nodes.
Preferably, the program may further include instructions to include with said search parameters instructions to nodes to progressively decrease the number of nodes selected as each set of selected nodes is used to select further nodes.
According to a third aspect of the invention, there is provided a program executable by a computer processing means and including instructions when executed cause a node in a network to:
a) receive a search query and search information available to it and return search results to another node in the network;
b) receive one or more search parameters and select a predetermined number of nodes known to it within the network according to the degree of satisfaction of said one or more search parameters;
c) send said search query and said search parameters to nodes selected in step b).
Preferably, the program may further include instructions to receive search results from nodes selected in step b) and send the search results to a node from which it received the search query and one or more search parameters.
According to a fourth aspect of the invention, there is provided a computer readable media containing the program as described in the immediately preceding paragraphs.
According to a fifth aspect of the invention, there is provided a node within a computer network having a computer processing means and a memory readable by the computer processing means, wherein the memory stores a program as described in the preceding paragraphs.
Further aspects of the present invention, which should be considered in all its novel aspects may become apparent from the following description given by way of example only.
Modes for Performing the Invention
The volume of information available on the World-Wide-Web makes exhaustive searching almost impossible, even without taking into account the dynamic nature of the available information. In peer- to-peer network systems, the amount of information available is increased further and becomes even more dynamic. With application of peer-to-peer methodologies to a network, a network effectively becomes a distributed computing environment. The present invention provides a means for selecting within a distributed network suitable servers, terminals, searchable databases, sub-networks or the like (hereinafter intelligent nodes) that are suitable to be used for searching for specific information.
Referring to Figure 1 , a simplified block diagram representation of a computer network in which the present invention may be implemented is shown. The computer network includes nodes 1 a-e, which may be located anywhere in the network. Each node 1 a-e includes a network interface 2 and stores in memory or otherwise has access to computer readable information. This information, which may include code for applications, and data associated with applications such as documents is herein referred to and referenced in the accompanying drawings as objects 3, with each node 1 a-e having its associated objects 3a-e respectively.
Each object 3 may be referred to by its logical address. For the Internet, documents are typically identified by a uniform resource locator (URL). In a peer-to-peer network system URI's may be used for identification of objects. For clarity, throughout the remainder of the following description it is assumed that the nodes 1 a-e are connected to and may communicate with each other through the Internet. In order to provide a suitable interactive user experience, each node may support interactive access from a browsing application that provides functionality consistent with the mode of access used by an interactive user (for example a browsing program for use on a personal network might announce or play back content it browses in response to user's browsing instructions entered by a stylus on a PDA), and each node may connect to a network consistent with its mode of access for connectivity with other nodes. Connectivity to the Internet for example would be by a TCP/IP connection. Thus, the present invention may have application to many networks, the only requirement on the network is that end points (cellphones, PDAs, computers) are uniquely addressable either globally (ESNs on cellphones, MACs on Ethernet cards) or via a gateway (e.g. NAT in private IP networks) and that any logical addressing scheme (phone numbers, IP addresses or dns entries) must be resolvable to a unique network/device specific address for the endpoint, typically on demand. The underlying network may therefore may include, for example
personal networks (blue-tooth), telephony networks, wireless networks, voice recognition networks, and device networks.
The present invention may have particular application to peer- to-peer networked systems. In Figure 1 nodes 1 a-d represent peer nodes, forming a sub-network. The present invention may search for any form of content in peer nodes, for example XML fragments, database rows, text files (which includes web pages), binary format content (e.g. music) and documents generally). Node l e represents a web site host and thus has a number web pages stored in memory 4 available for display through the Internet on request. Some of the web pages available through node 1 e may be administered by users of nodes 1 a- d. Node 1 e may not be part of a peer network, limiting the amount of information available for searching. In a practical implementation, the total number of nodes in the network may number in the millions, and there may be many peer sub-networks within the network, each node or sub-network providing various amounts of material to be searched.
A search within a network typically involves a set of search criteria. The present invention also uses search criteria. A user at node 1 a may enter the search criteria in the appropriate form using a user interface 4, typically a keyboard and visual display unit. The search criteria identifies, broadly or specifically the information that is sought from the search process. This information can be formatted in the form of a search query. A common form of search query suitable for the purposes of the present invention is a keyword search that makes use of Boolean operators such as "AND", "OR", "NOT" and
"XOR" to define what information is required, optional and/or excluded from an object, such as a document in order to satisfy the search query. The present invention may use an existing form of search query. The form, structure or criteria used in the search query is not important. Various forms of search queries and methods of interpreting search queries for computer network searching are well known and therefore will not be detailed further herein.
A problem with searching a network including peer-to-peer networked systems is that any one of the nodes of the network may contain relevant information. Searching every node in large networks is impractical and therefore the problem arises how to select appropriate nodes to search. The present invention avoids having to rely on centralised search engines and their associated disadvantages by making use of the knowledge that the intelligent nodes in the network have of other nodes. This knowledge is recorded as a set of search parameters at each intelligent node.
The search parameters identify features of intelligent nodes in the network that would make them likely candidates for containing information that satisfy the search criteria. In order to allow searching based on search parameters, each node records and has available on request information regarding the nodes with which it has communicated. Referring again to Figure 1 , this information is stored at each node 1 a-e persistently (e.g. in a file) 5a-e respectively. An example of a search parameter may be the level of use of an intelligent node. The level of use of a node may be measured, for example by the volume of information transferred to and or from the intelligent node, or the commercial value of transactions condμcted through the node.
There are many different variables that may be used as search parameters depending on the search requirements. For example, other variables that may be used include the logical location of the intelligent node, the physical location of the node, the physical location of objects that the node has information relating to, the type of information that the node has access to, and/or the quality and reliability of service. Weighted combinations of search parameters or a search parameter that is a mathematical transformation of one or more search parameters may be used if required.
The parameters used for searches may be predefined, for example a search may use parameters suitable to "Identify and rank
suppliers of Foreign Exchange products (typically banks at the moment) whose customers report less than 0.5% errors 95% of the time in order fulfilment". The parameters may also specify (by making reference to a metric such as value or volume of transactions that the first search step conducted by the initiating search node ranks customers that fall within a group of people that starts with the people that users of that node do business with most of the time and grows through the use of other nodes to include the business partners they do business with most of the time etc". Alternatively the parameters may be ad hoc. Ad hoc searches are created, populated with parameters and invoked by users having regard to the information that is available in files 5a-e.
Referring to Figure 2, a flow chart of the steps involved for performing a search in accordance with the present invention is shown. It is assumed that a user at node 1 a (see Figure 1 ) requires the search to be conducted. When preparing a search according to the present invention, the search query and a set of search parameters are formed at node 1 a (boxes 1 and 4).
The node 1 a ranks other intelligent nodes within the network that it has recorded characteristics of in its search parameter file 5a according to their degree of satisfaction of the search parameter(s) (box 2). For example, if a search parameter is the volume of information sent to and received from an intelligent node, the searching node places the highest rank to the intelligent node that it communicates with the most. The node 1 a then selects a predetermined number of the highest ranking nodes for use in expanding the search. One of the predetermined number of intelligent nodes selected by the search node A is referenced by node 10. Where the node 1 a is part of a peer-to-peer network, it is likely that the selected intelligent nodes will include the nodes 1 b-d due to increased communication with these nodes. However, it is not necessary that
the selected nodes form part of a peer network with node 1 a. For example, node 1 e may be one of the selected intelligent nodes.
The search node 1 a sends the search parameters and search query (boxes 3 and 5 respectively) to each selected intelligent node, including node 10. Node 10 (and the other selected nodes) receives the search parameters and search query (box 6) and searches its contents according to the search query (box 7). The results of the search are then returned to node 1 a (boxes 8 and 9). The search of each node is preferably performed locally rather than the node 1 a remotely interrogating the contents of each of node, although remote interrogation may be used where required.
Node 10 (and the other selected nodes) also evaluates and ranks intelligent nodes that are known to it according to the same search parameters as node 1 a used to select node 10 (box 10). This results in the identification of further intelligent nodes, one of which is indicated by node 20. Node 10 sends and node 20 receives the search query and search parameters (boxes 1 1 and 1 2).
Thus, if node 1 a identified, say 40 intelligent nodes and sent requests to each of those to identify a further 40 intelligent nodes, 1 600 nodes (or 1 601 nodes if the search node 1 a is also counted) are identified. Each of these 1 600 nodes returns the results of their search to node 1 a, which receives the results and collates them for provision to a user in a suitable format.
In an alternative, but less preferred embodiment of the invention, the search request may be sent directly from node 1 a to each identified node, requiring each level to communicate the results of the identification and ranking process back to node 1 a. In a further alternative embodiment, each new set of selected nodes may, instead of returning results that satisfy the search criteria directly back to node A, return them back through the same path that the search parameters travelled to reach them.
Node 20 may also rank nodes known to it to identify further nodes that may be used for searching. If another 40 nodes were identified, a total of 64000 (or 64001 ) nodes results. This process may be repeated until a required number of nodes have been identified. Typically the required number of nodes over which a search is conducted is an approximate rather than exact number. The size of the set of nodes to query is chosen to be sufficient that the likelihood of obtaining non representative results is statistically insignificant. To ensure that the automated propagation of requests eventually stops the requests may have a "time to live" counter, which is decremented when processed by a node for the first time. This has the effect of creating a maximum 'horizon' for the subnet that is searched - effectively constraining the maximum 'depth' of the search"..
If a node receives the same request a second time then it may fail the request, forcing the requester to deal with the failure, typically by selecting a less desirable node within the constraints or if none exist within the constraints, forcing its requester to deal with the failure, etc until the original requester is informed of the success or failure. The net result is that the original requesting node is informed of the total number of nodes queried and that no node is queried more than once.
Alternatively or in addition, the- number of nodes used in the search may be regulated by progressively decreasing selected nodes at each layer to bias the results to those within the communication proximity of node 1 a. Instructions to regulate the number of nodes selected may be sent with the search parameters for use by the nodes.
If an intelligent node does not have information relevant to the search parameters sent by node 1 a, it may either not send a response to the request or may randomly select a number of intelligent nodes, which may be less than 40 to reflect the random nature of the selection. Which option the node chooses may be determined by the communication sent from node 1 a. The node may send notification to
node 1 a to inform it of this occurrence so that node 1 a remains aware of the number of nodes that have been utilised in the search.
If the available nodes are exhausted and a required number of nodes has not been reached, the search may continue using only the selected nodes. The user of node 1 a may be informed that insufficient nodes were used in the search.
Instructions to enable each node within a network to search in the above described manner may be in a program stored in each node. Alternatively, each node may receive its instructions from a centralised location, such as from the node that started the search. The node that starts the search (node 1 a) may send preferences for the search, which dictate how the search is to be performed. Such preferences may include excluding any nodes having certain characteristics and what search parameters should be used. In one embodiment of the invention, node 1 a may send a program to each node, the program when executed causing the node to which it was sent to perform a search according to the search query and select further nodes if required.
Thus, the present invention allows searching of a distributed computing network through identification and ranking of intelligent nodes that satisfy a set of search parameters. This process aims to bias the selection of intelligent nodes to those that are more likely to result in successful hits when performing a full search for information that satisfy a set of search criteria.
Where in the foregoing description, reference has been made to specific components or integers of the invention having known equivalents then such equivalents are herein incorporated as if individually set forth.
Although this invention has been described by way of example and with reference to possible embodiments thereof, it is to be
understood that modifications or improvements may be made thereto without departing from the scope of the invention as defined in the appended claims.