
WO2012006285A1 - Method for quantifying and analyzing intrinsic parallelism of an algorithm - Google Patents


Info

Publication number
WO2012006285A1
WO2012006285A1 (PCT/US2011/042962)
Authority
WO
WIPO (PCT)
Prior art keywords
parallelism
algorithm
computer
information related
sense
Prior art date
Application number
PCT/US2011/042962
Other languages
French (fr)
Inventor
Gwo-Giun Chris Lee
He-Yuan Lin
Original Assignee
National Cheng Kung University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from TW099122162A external-priority patent/TWI501168B/en
Priority claimed from US12/832,557 external-priority patent/US20120011186A1/en
Application filed by National Cheng Kung University filed Critical National Cheng Kung University
Priority to CN201180033554.6A priority Critical patent/CN103180821B/en
Priority to KR1020137001820A priority patent/KR20130038903A/en
Priority to JP2013518789A priority patent/JP5925202B2/en
Priority to EP11804255.5A priority patent/EP2591414A4/en
Publication of WO2012006285A1 publication Critical patent/WO2012006285A1/en
Priority to HK13112720.7A priority patent/HK1187121A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/14Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/147Discrete orthonormal transforms, e.g. discrete cosine transform, discrete sine transform, and variations therefrom, e.g. modified discrete cosine transform, integer transforms approximating the discrete cosine transform
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/45Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G06F8/456Parallelism detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead



Abstract

A method for quantifying and analyzing intrinsic parallelism of an algorithm is adapted to be implemented by a computer, and includes the steps of: configuring the computer to represent the algorithm by means of a plurality of operation sets; configuring the computer to obtain a Laplacian matrix according to the operation sets; configuring the computer to compute eigenvalues and eigenvectors of the Laplacian matrix; and configuring the computer to obtain a set of information related to intrinsic parallelism of the algorithm according to the eigenvalues and the eigenvectors of the Laplacian matrix.

Description

METHOD FOR QUANTIFYING AND ANALYZING INTRINSIC PARALLELISM OF AN ALGORITHM
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a method for quantifying and analyzing parallelism of an algorithm, more particularly to a method for quantifying and analyzing intrinsic parallelism of an algorithm.
2. Description of the Related Art
G. M. Amdahl introduced a method for parallelization of an algorithm according to the ratio of the sequential portion of the algorithm ("Validity of the single processor approach to achieving large scale computing capabilities," Proc. of AFIPS Conference, pages 483-485, 1967). A drawback of Amdahl's method is that the degree of parallelism of the algorithm obtained using the method is dependent on the target platform executing the method, and is not necessarily dependent on the algorithm itself. Therefore, the degree of parallelism obtained using Amdahl's method is extrinsic to the algorithm and is biased by the target platform.
A. Prihozhy et al. proposed a method for evaluating the parallelization potential of an algorithm based on the ratio between the complexity and the critical path length of the algorithm ("Evaluation of the parallelization potential for efficient multimedia implementations: dynamic evaluation of algorithm critical path," IEEE Trans. on Circuits and Systems for Video Technology, Vol. 15, No. 5, pages 593-608, May 2005). The complexity is the total number of operations in the algorithm, and the critical path length is the largest number of operations that must be executed sequentially due to computational data dependencies. Although the method may characterize an average degree of parallelism embedded in the algorithm, it is insufficient for exhaustively characterizing the versatile multigrain parallelisms embedded in the algorithm.
SUMMARY OF THE INVENTION
Therefore, the object of the present invention is to provide a method for quantifying and analyzing intrinsic parallelism of an algorithm that is not susceptible to bias by a target hardware and/or software platform.
Accordingly, a method of the present invention for quantifying and analyzing intrinsic parallelism of an algorithm is adapted to be implemented by a computer and comprises the steps of:
a) configuring the computer to represent the algorithm by means of a plurality of operation sets; b) configuring the computer to obtain a Laplacian matrix according to the operation sets;
c) configuring the computer to compute eigenvalues and eigenvectors of the Laplacian matrix; and d) configuring the computer to obtain a set of information related to intrinsic parallelism of the algorithm according to the eigenvalues and the eigenvectors of the Laplacian matrix.
BRIEF DESCRIPTION OF THE DRAWINGS
Other features and advantages of the present invention will become apparent in the following detailed description of the preferred embodiment with reference to the accompanying drawings, of which:
Figure 1 is a flow chart illustrating a preferred embodiment of a method for quantifying and analyzing intrinsic parallelism of an algorithm according to the present invention;
Figure 2 is a schematic diagram illustrating dataflow information related to an exemplary algorithm;
Figure 3 is a schematic diagram of an exemplary set of dataflow graphs;
Figure 4 is a schematic diagram illustrating operation sets of a 4x4 discrete cosine transform algorithm;
Figure 5 is a schematic diagram illustrating an exemplary composition of intrinsic parallelism corresponding to a dependency depth equal to 6;
Figure 6 is a schematic diagram illustrating an exemplary composition of intrinsic parallelism corresponding to a dependency depth equal to 5; and Figure 7 is a schematic diagram illustrating an exemplary composition of intrinsic parallelism corresponding to a dependency depth equal to 3 .
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
Referring to Figure 1, the preferred embodiment of a method according to the present invention for evaluating intrinsic parallelism of an algorithm is adapted to be implemented by a computer, and includes the following steps. A degree of intrinsic parallelism indicates the degree of parallelism of an algorithm itself, without considering the design and configuration of software and hardware; that is to say, the method according to this invention is not limited by software and hardware when it is used for analyzing an algorithm.
In step 11, the computer is configured to represent an algorithm by means of a plurality of operation sets. Each of the operation sets may be an equation, program code, a flow chart, or any other form for expressing the algorithm. In the following example, the algorithm includes three operation sets O1, O2 and O3 that are expressed as
O1 = A1 + B1 + C1 + D1,
O2 = A2 + B2 + C2, and
O3 = A3 + B3 + C3.
Step 12 is to configure the computer to obtain a Laplacian matrix Ld according to the operation sets, and includes the following sub-steps. In sub-step 121, according to the operation sets, the computer is configured to obtain dataflow information related to the algorithm. As shown in Figure 2, the dataflow information corresponding to the operation sets of the example may be expressed as follows.
Data1 = A1 + B1
Data2 = A2 + B2
Data3 = A3 + B3
Data4 = Data1 + Data7
Data5 = Data2 + C2
Data6 = Data3 + C3
Data7 = D1 + C1
In sub-step 122, the computer is configured to obtain a dataflow graph according to the dataflow information. The dataflow graph is composed of a plurality of vertexes that denote operations in the algorithm, and a plurality of directed edges that indicate interconnection between corresponding two of the vertexes and that represent sources and destinations of data in the algorithm. For the dataflow information shown in Figure 2, operator symbols V1 to V7 (i.e., the vertexes) are used instead of addition operators, and arrows (i.e., the directed edges) represent the sources and destinations of data, to thereby obtain the dataflow graph as shown in Figure 3. In particular, the operator symbol V1 represents the addition operation for A1+B1, the operator symbol V2 represents the addition operation for A2+B2, the operator symbol V3 represents the addition operation for A3+B3, the operator symbol V4 represents the addition operation for Data1+Data7, the operator symbol V5 represents the addition operation for Data2+C2, the operator symbol V6 represents the addition operation for Data3+C3, and the operator symbol V7 represents the addition operation for D1+C1.
From the dataflow graph shown in Figure 3, it can be appreciated that the operator symbol V4 is dependent on the operator symbols V1 and V7. Similarly, the operator symbol V5 is dependent on the operator symbol V2, the operator symbol V6 is dependent on the operator symbol V3, and the operator symbols V4, V5 and V6 are independent of each other.
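The dependency structure described above can be extracted mechanically from the dataflow information. The following sketch (in Python; the table and names are illustrative, not part of the disclosure) derives the edges of the dataflow graph by checking which operands of each operation are themselves produced by another operation:

```python
# Dataflow assignments from the example: each datum is produced by one
# addition (V1..V7, in order). Operands that are themselves produced
# data (e.g. Data1, Data7) create edges in the dataflow graph.
dataflow = {
    "Data1": ("A1", "B1"),        # V1
    "Data2": ("A2", "B2"),        # V2
    "Data3": ("A3", "B3"),        # V3
    "Data4": ("Data1", "Data7"),  # V4
    "Data5": ("Data2", "C2"),     # V5
    "Data6": ("Data3", "C3"),     # V6
    "Data7": ("D1", "C1"),        # V7
}

index = {name: i for i, name in enumerate(dataflow)}  # "Data1" -> 0, ...

# Edge (source, destination): the destination operation depends on the source.
edges = [(index[src], index[dst])
         for dst, operands in dataflow.items()
         for src in operands if src in index]

print(edges)  # [(0, 3), (6, 3), (1, 4), (2, 5)]: V4 depends on V1 and V7, etc.
```

The resulting edge list reproduces exactly the dependencies stated in the paragraph above.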
In sub-step 123, the computer is configured to obtain the Laplacian matrix Ld according to the dataflow graphs. In the Laplacian matrix Ld, the i-th diagonal element shows the number of operator symbols that are connected to the operator symbol Vi, and the off-diagonal elements denote whether two operator symbols are connected. Therefore, the Laplacian matrix Ld can clearly express the dataflow graphs in a compact linear algebraic form. The set of dataflow graphs shown in Figure 3 may be expressed as follows.
 1  0  0 -1  0  0  0
 0  1  0  0 -1  0  0
 0  0  1  0  0 -1  0
-1  0  0  2  0  0 -1
 0 -1  0  0  1  0  0
 0  0 -1  0  0  1  0
 0  0  0 -1  0  0  1
The Laplacian matrix Ld represents the connectivity among the operator symbols V1 to V7, and the first column to the seventh column represent the operator symbols V1 to V7, respectively. For example, in the first column, the operator symbol V1 is connected to the operator symbol V4, and thus the matrix element (1,4) is -1.
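The matrix above follows the standard graph-Laplacian construction L = D - A (degree matrix minus adjacency matrix). A minimal sketch, assuming the undirected edge list of the example graph (the helper function is illustrative):

```python
import numpy as np

# Undirected edges of the example dataflow graph, 0-indexed (V1 -> 0, ...):
# V4 is connected to V1 and V7, V5 to V2, V6 to V3.
edges = [(0, 3), (6, 3), (1, 4), (2, 5)]

def laplacian(n, edges):
    """Graph Laplacian L = D - A: vertex degree on the diagonal, -1 per edge."""
    L = np.zeros((n, n), dtype=int)
    for i, j in edges:
        L[i, i] += 1
        L[j, j] += 1
        L[i, j] -= 1
        L[j, i] -= 1
    return L

Ld = laplacian(7, edges)
print(Ld)
```

The diagonal of Ld comes out as (1, 1, 1, 2, 1, 1, 1), matching the matrix above: V4 is the only operator symbol with two neighbors.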
In step 13, the computer is configured to compute eigenvalues λ and eigenvectors Xd of the Laplacian matrix Ld. Regarding the Laplacian matrix Ld obtained in the above example, the eigenvalues λ are obtained as
λ = [0 0 0 1 2 2 3],
with the corresponding eigenvector matrix Xd as given in the original figure.
In step 14, the computer is configured to obtain a set of information related to intrinsic parallelism of the algorithm according to the eigenvalues λ and the eigenvectors Xd of the Laplacian matrix Ld. The set of information related to intrinsic parallelism is defined in a strict manner so as to recognize those operation sets that are independent of each other and hence can be executed in parallel. The set of information related to strict-sense parallelism includes a degree of strict-sense parallelism representing the number of independent ones of the operation sets of the algorithm, and a set of compositions of strict-sense parallelism corresponding to the operation sets, respectively.
Based on spectral graph theory introduced by F. R. K. Chung (Regional Conference Series in Mathematics, No. 92, 1997), the number of connected components in a graph is equal to the number of eigenvalues of the Laplacian matrix that are equal to 0. The degree of strict-sense parallelism embedded within the algorithm is thus equal to the number of eigenvalues λ that are equal to 0. In addition, based on spectral graph theory, the compositions of strict-sense parallelism may be identified according to the eigenvectors Xd associated with the eigenvalues λ that are equal to 0.
From the above example, it can be found that the set of dataflow graphs is composed of three independent operation sets, since there exist three Laplacian eigenvalues that are equal to 0. Thus, the degree of strict-sense parallelism embedded in the exemplified algorithm is equal to 3. Subsequently, the first, second and third ones of the eigenvectors Xd are associated with the eigenvalues that are equal to 0. By observing the first one of the eigenvectors Xd, it is clear that the values corresponding to the operator symbols V1, V4 and V7 are non-zero; that is to say, the operator symbols V1, V4 and V7 are dependent and form a connected one (V1-V4-V7) of the dataflow graphs. Similarly, from the second and third ones of the eigenvectors Xd associated with the eigenvalues λ that are equal to 0, it can be appreciated that the operator symbols V2, V5 and the operator symbols V3, V6 are dependent and form the remaining two connected ones (V2-V5 and V3-V6) of the dataflow graphs, respectively. Therefore, the computer is configured to obtain the degree of strict-sense parallelism that is equal to 3, and the compositions of strict-sense parallelism that may be expressed in the form of a graph (shown in Figure 3), a table, equations, or program codes.
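Steps 13 and 14 can be reproduced numerically with any symmetric eigensolver. The sketch below uses NumPy; grouping vertexes by equal rows of the zero-eigenvalue eigenvector basis is one possible implementation of the identification described above (vertexes in the same connected component have identical rows in that basis), not necessarily the patent's own procedure:

```python
import numpy as np

# The 7x7 Laplacian Ld of the example (operator symbols V1..V7).
Ld = np.array([
    [ 1,  0,  0, -1,  0,  0,  0],
    [ 0,  1,  0,  0, -1,  0,  0],
    [ 0,  0,  1,  0,  0, -1,  0],
    [-1,  0,  0,  2,  0,  0, -1],
    [ 0, -1,  0,  0,  1,  0,  0],
    [ 0,  0, -1,  0,  0,  1,  0],
    [ 0,  0,  0, -1,  0,  0,  1],
], dtype=float)

eigenvalues, eigenvectors = np.linalg.eigh(Ld)  # eigenvalues in ascending order

# Degree of strict-sense parallelism = number of zero eigenvalues
# (= number of connected components, by spectral graph theory).
zero = np.isclose(eigenvalues, 0.0)
degree = int(zero.sum())

# Vertexes in the same connected component have identical rows in the
# zero-eigenvalue eigenvector basis, so grouping equal (rounded) rows
# recovers the compositions of strict-sense parallelism.
null_basis = eigenvectors[:, zero]
components = {}
for v, row in enumerate(null_basis, start=1):  # 1-indexed V labels
    components.setdefault(tuple(np.round(row, 6)), []).append(v)

print(degree)                       # 3
print(sorted(components.values()))  # [[1, 4, 7], [2, 5], [3, 6]]
```

The three zero eigenvalues and the recovered groupings V1-V4-V7, V2-V5 and V3-V6 agree with the analysis in the paragraph above.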
In step 15, the computer is configured to obtain a plurality of sets of information related to multigrain parallelism of the algorithm according to the set of information related to strict-sense parallelism and at least one of a plurality of dependency depths of the algorithm. The sets of information related to multigrain parallelism include a set of information related to wide-sense parallelism of the algorithm that characterizes all possible parallelisms embedded in an independent operation set.
It should be noted that the dependency depths of an algorithm represent the associated sequential steps essential for processing the algorithm, and thus are complementary to the potential parallelism of the algorithm. Thus, information related to different intrinsic parallelisms of an algorithm may be obtained based on different dependency depths. In particular, the information related to strict-sense parallelism is the information related to intrinsic parallelism of the algorithm corresponding to a maximum one of the dependency depths of the algorithm, and the information related to wide-sense parallelism is the information related to intrinsic parallelism of the algorithm corresponding to a minimum one of the dependency depths.
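One common formalization of a dependency depth, consistent with the description above, is the length of the longest chain of data-dependent operations in the dataflow graph. A sketch under that assumption (the deps table and recursion are illustrative, not the patent's notation):

```python
# Dependencies of the example: V4 depends on V1 and V7; V5 on V2; V6 on V3.
deps = {1: [], 2: [], 3: [], 4: [1, 7], 5: [2], 6: [3], 7: []}

def depth(v):
    """Length of the longest dependency chain ending at operation v."""
    return 1 + max((depth(u) for u in deps[v]), default=0)

max_depth = max(depth(v) for v in deps)
print(max_depth)  # 2: e.g. V1 -> V4 must be executed sequentially
```

For the seven-operation example this gives a minimum of 2 sequential steps, consistent with the observation below that at least 2 processing cycles are always required.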
For example, the above-mentioned algorithm includes two different compositions of strict-sense parallelism, i.e., V1-V4-V7 and V2-V5 (V3-V6 is similar to V2-V5 and can be considered to be the same composition). Regarding the composition of strict-sense parallelism V1-V4-V7, it can be known that the operator symbols V1 and V7 are independent of each other, i.e., the operator symbols V1 and V7 can be processed in parallel. Therefore, the set of information related to wide-sense parallelism of the algorithm includes a degree of wide-sense parallelism that is equal to 4, and compositions of wide-sense parallelism that are similar to the compositions of strict-sense parallelism.
According to the method of this embodiment, the degree of wide-sense parallelism of the above-mentioned algorithm is equal to 4. It is assumed that a single processing element requires 7 processing cycles for implementing the algorithm, since the algorithm includes 7 operator symbols V1 to V7. According to the degree of strict-sense parallelism that is equal to 3, using 3 processing elements to implement the algorithm will take 3 processing cycles. According to the degree of wide-sense parallelism that is equal to 4, using 4 processing elements to implement the algorithm will take 2 processing cycles. Further, it can be known that at least 2 processing cycles are necessary for implementing the algorithm even if more processing elements are used. Therefore, an optimum number of processing elements for implementing an algorithm may be obtained according to the method of this embodiment.
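The cycle counts in the preceding paragraph follow from dividing the 7 operations among the processing elements, with the minimum dependency depth as a lower bound. A sketch of that arithmetic (the function and constant names are illustrative):

```python
import math

TOTAL_OPS = 7   # operator symbols V1 to V7
MIN_DEPTH = 2   # minimum dependency depth of the example (e.g. V1 -> V4)

def cycles(num_pes):
    """Processing cycles with num_pes processing elements: work split
    across PEs, lower-bounded by the sequential dependency depth."""
    return max(math.ceil(TOTAL_OPS / num_pes), MIN_DEPTH)

print(cycles(1))  # 7: one PE executes all operations sequentially
print(cycles(3))  # 3: matches the strict-sense degree of 3
print(cycles(4))  # 2: matches the wide-sense degree of 4
print(cycles(8))  # 2: more PEs cannot beat the dependency depth
```

This reproduces the 7/3/2-cycle figures above and shows why 4 processing elements is the optimum for this example: adding more cannot reduce the count below 2.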
Taking a 4x4 discrete cosine transform (DCT) as an example, operation sets of the DCT algorithm are represented by dataflow graphs as shown in Figure 4. Since the 4x4 DCT is well known to those skilled in the art, further details thereof will be omitted herein for the sake of brevity. From Figure 4, it can be known that the maximum one of the dependency depths of the 4x4 DCT algorithm is equal to 6. Regarding the maximum one of the dependency depths (i.e., 6), the composition of strict-sense parallelism of this algorithm may be obtained as shown in Figure 5, and the degree of strict-sense parallelism of this algorithm is equal to 4 according to the method of this embodiment. When analyzing the intrinsic parallelism of the 4x4 DCT algorithm with one of the dependency depths that is equal to 5, the composition of intrinsic parallelism of this algorithm may be obtained as shown in Figure 6, and the degree of intrinsic parallelism is equal to 8. Further, when analyzing the intrinsic parallelism of the 4x4 DCT algorithm with one of the dependency depths that is equal to 3, the composition of intrinsic parallelism of this algorithm may be obtained as shown in Figure 7, and the degree of intrinsic parallelism is equal to 16.
In summary, the method according to this invention may be used to evaluate the intrinsic parallelism of an algorithm.
While the present invention has been described in connection with what is considered the most practical and preferred embodiment, it is understood that this invention is not limited to the disclosed embodiment but is intended to cover various arrangements included within the spirit and scope of the broadest interpretation so as to encompass all such modifications and equivalent arrangements.

Claims

WHAT IS CLAIMED IS:
1. A method for quantifying and analyzing intrinsic parallelism of an algorithm, said method being adapted to be implemented by a computer and comprising the steps of:
a) configuring the computer to represent the algorithm by means of a plurality of operation sets;
b) configuring the computer to obtain a Laplacian matrix according to the operation sets;
c) configuring the computer to compute eigenvalues and eigenvectors of the Laplacian matrix; and
d) configuring the computer to obtain a set of information related to intrinsic parallelism of the algorithm according to the eigenvalues and the eigenvectors of the Laplacian matrix.
2. The method as claimed in Claim 1, wherein step b) includes the following sub-steps of:
b1) according to the operation sets, configuring the computer to obtain dataflow information related to the algorithm;
b2) according to the dataflow information, configuring the computer to obtain a dataflow graph composed of a plurality of vertexes that denote operations in the algorithm, and a plurality of directed edges that indicate interconnection between corresponding two of the vertexes and that represent sources and destinations of data in the algorithm; and
b3) configuring the computer to obtain the Laplacian matrix according to the dataflow graph.
3. The method as claimed in Claim 1, wherein step d) includes the following sub-steps of:
d1) according to the eigenvalues and the eigenvectors of the Laplacian matrix, configuring the computer to obtain a set of information related to strict-sense parallelism of the algorithm; and
d2) configuring the computer to obtain a set of information related to multigrain parallelism of the algorithm according to the set of information related to strict-sense parallelism and at least one of a plurality of dependency depths of the algorithm.
4. The method as claimed in Claim 3, wherein the set of information related to strict-sense parallelism includes a degree of strict-sense parallelism representing a number of independent ones of the operation sets of the algorithm, and a set of compositions of strict-sense parallelism corresponding to the operation sets, respectively.
5. The method as claimed in Claim 3, wherein, in sub-step d2), the computer is configured to obtain a plurality of sets of information related to multigrain parallelism of the algorithm according to the set of information related to strict-sense parallelism and the dependency depths, respectively.
6. The method as claimed in Claim 5, wherein each of the sets of information related to multigrain parallelism includes a degree of multigrain parallelism, and a set of compositions of multigrain parallelism.
7. The method as claimed in Claim 3, wherein the set of information related to multigrain parallelism includes a set of information related to wide-sense parallelism of the algorithm that is obtained according to the set of information related to strict-sense parallelism and a minimum one of the dependency depths.
8. The method as claimed in Claim 7, wherein the set of information related to wide-sense parallelism includes a degree of wide-sense parallelism characterizing all possible parallelism embedded in independent ones of the operation sets of the algorithm, and a set of compositions of wide-sense parallelism.
9. The method as claimed in Claim 3, wherein, in sub-step d1), the degree of strict-sense parallelism is equal to a number of the eigenvalues that are equal to 0 based on spectral graph theory.
10. The method as claimed in Claim 3, wherein the information related to multigrain parallelism includes a degree of multigrain parallelism, and a set of compositions of multigrain parallelism.
11. A computer program product comprising a machine-readable storage medium having program instructions stored therein which, when executed, cause a computer to perform a method for quantifying and analyzing intrinsic parallelism of an algorithm according to Claim 1.
PCT/US2011/042962 2010-07-06 2011-07-05 Method for quantifying and analyzing intrinsic parallelism of an algorithm WO2012006285A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
CN201180033554.6A CN103180821B (en) 2010-07-08 2011-07-05 The quantification of algorithm essence degree of parallelism and analytical approach
KR1020137001820A KR20130038903A (en) 2010-07-06 2011-07-05 Method for quantifying and analyzing intrinsic parallelism of an algorithm
JP2013518789A JP5925202B2 (en) 2010-07-06 2011-07-05 Method for quantifying and analyzing parallel processing of algorithms
EP11804255.5A EP2591414A4 (en) 2010-07-06 2011-07-05 Method for quantifying and analyzing intrinsic parallelism of an algorithm
HK13112720.7A HK1187121A1 (en) 2010-07-08 2013-11-13 Method for quantifying and analyzing intrinsic parallelism of an algorithm

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
TW099122162A TWI501168B (en) 2010-07-06 2010-07-06 An intrinsic parallelism of an algorithm quantification and analysis method
TW099122162 2010-07-06
US12/832,557 2010-07-08
US12/832,557 US20120011186A1 (en) 2010-07-08 2010-07-08 Method for quantifying and analyzing intrinsic parallelism of an algorithm

Publications (1)

Publication Number Publication Date
WO2012006285A1 true WO2012006285A1 (en) 2012-01-12

Family

ID=45441539

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2011/042962 WO2012006285A1 (en) 2010-07-06 2011-07-05 Method for quantifying and analyzing intrinsic parallelism of an algorithm

Country Status (4)

Country Link
EP (1) EP2591414A4 (en)
JP (1) JP5925202B2 (en)
KR (1) KR20130038903A (en)
WO (1) WO2012006285A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10593080B2 (en) 2017-04-27 2020-03-17 Daegu Gyeongbuk Institute Of Science And Technology Graph generating method and apparatus
KR101998020B1 (en) * 2017-04-27 2019-07-08 재단법인대구경북과학기술원 Method and apparatus for graph generation
CN111061150B (en) * 2019-10-23 2020-11-27 南京大学 Hardware implementation method of Laplace frequency response

Citations (5)

Publication number Priority date Publication date Assignee Title
US20030195938A1 (en) * 2000-06-26 2003-10-16 Howard Kevin David Parallel processing systems and method
US7171397B1 (en) * 2002-08-21 2007-01-30 Ncr Corp. Method and system for measuring parallelism of a database system execution step
US20090007116A1 (en) * 2007-06-27 2009-01-01 Microsoft Corporation Adjacent data parallel and streaming operator fusion
US20090028433A1 (en) * 2007-05-03 2009-01-29 David Allen Tolliver Method for partitioning combinatorial graphs
WO2011163223A1 (en) 2010-06-22 2011-12-29 National Cheng Kung University Method of analyzing intrinsic parallelism of algorithm

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US5587922A (en) * 1993-06-16 1996-12-24 Sandia Corporation Multidimensional spectral load balancing
US6615211B2 (en) * 2001-03-19 2003-09-02 International Business Machines Corporation System and methods for using continuous optimization for ordering categorical data sets
US7724256B2 (en) * 2005-03-21 2010-05-25 Siemens Medical Solutions Usa, Inc. Fast graph cuts: a weak shape assumption provides a fast exact method for graph cuts segmentation
US7406200B1 (en) * 2008-01-08 2008-07-29 International Business Machines Corporation Method and system for finding structures in multi-dimensional spaces using image-guided clustering

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
US20030195938A1 (en) * 2000-06-26 2003-10-16 Howard Kevin David Parallel processing systems and method
US7171397B1 (en) * 2002-08-21 2007-01-30 Ncr Corp. Method and system for measuring parallelism of a database system execution step
US20090028433A1 (en) * 2007-05-03 2009-01-29 David Allen Tolliver Method for partitioning combinatorial graphs
US20090007116A1 (en) * 2007-06-27 2009-01-01 Microsoft Corporation Adjacent data parallel and streaming operator fusion
WO2011163223A1 (en) 2010-06-22 2011-12-29 National Cheng Kung University Method of analyzing intrinsic parallelism of algorithm

Non-Patent Citations (2)

Title
IEEE TRANS. ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, vol. 15, no. 5, May 2005 (2005-05-01), pages 593 - 608
See also references of EP2591414A4 *

Also Published As

Publication number Publication date
EP2591414A1 (en) 2013-05-15
JP2013530477A (en) 2013-07-25
KR20130038903A (en) 2013-04-18
EP2591414A4 (en) 2014-08-06
JP5925202B2 (en) 2016-05-25


Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase (Ref document number: 201180033554.6; Country of ref document: CN)
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 11804255; Country of ref document: EP; Kind code of ref document: A1)
WWE Wipo information: entry into national phase (Ref document number: 2011804255; Country of ref document: EP)
ENP Entry into the national phase (Ref document number: 2013518789; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 20137001820; Country of ref document: KR; Kind code of ref document: A)