WO2012006285A1 - Method for quantifying and analyzing intrinsic parallelism of an algorithm - Google Patents
Method for quantifying and analyzing intrinsic parallelism of an algorithm Download PDFInfo
- Publication number
- WO2012006285A1 WO2012006285A1 PCT/US2011/042962 US2011042962W WO2012006285A1 WO 2012006285 A1 WO2012006285 A1 WO 2012006285A1 US 2011042962 W US2011042962 W US 2011042962W WO 2012006285 A1 WO2012006285 A1 WO 2012006285A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- parallelism
- algorithm
- computer
- information related
- sense
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 33
- 239000011159 matrix material Substances 0.000 claims abstract description 21
- 239000000203 mixture Substances 0.000 claims description 18
- 230000003595 spectral effect Effects 0.000 claims description 3
- 238000004590 computer program Methods 0.000 claims 1
- 238000012545 processing Methods 0.000 description 10
- 230000001419 dependent effect Effects 0.000 description 7
- 238000010586 diagram Methods 0.000 description 6
- 238000011156 evaluation Methods 0.000 description 2
- 238000013459 approach Methods 0.000 description 1
- 230000000295 complement effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/147—Discrete orthonormal transforms, e.g. discrete cosine transform, discrete sine transform, and variations therefrom, e.g. modified discrete cosine transform, integer transforms approximating the discrete cosine transform
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/456—Parallelism detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
Definitions
- the present invention relates to a method for quantifying and analyzing parallelism of an algorithm, more particularly to a method for quantifying and analyzing intrinsic parallelism of an algorithm.
- G. M. Amdahl introduced a method for parallelization of an algorithm according to a ratio of sequential portion of the algorithm ("Validity of single-processor approach to achieving large-scale computing capability, " Proc. of AFIPS Conference, pages 483-485, 1967) .
- a drawback of Amdahl's method is that a degree of parallel ism of the algorithm obtained using the method is dependent on a target platform executing the method, and is not necessarily dependent on the algorithm itself. Therefore, the degree of parallelism obtained using Amdahl's method is extrinsic to the algorithm and is biased by the target platform.
- A. Prihozhy et al. proposed a method for evaluating parallelization potential of an algorithm based on a ratio between complexity and a critical path length of the algorithm ("Evaluation of the parallelization potential for efficient multimedia implementations: dynamic evaluation of algorithm critical path," IEEE Trans, on Circuits and Systems for Video Technology, pages 593-608, Vol.15, No.5, May2005) .
- the complexity is a total number of operations in the algorithm
- the critical path length is the largest number of operations that need to be sequentially executed due to computational data dependencies.
- the method may characterize an average degree of parallelism embedded in the algorithm, it is insufficient for exhaustively characterizing versatile multigrain parallelisms embedded in the algorithm.
- the object of the present invention is to provide a method for quantifying and analyzing intrinsic parallelism of an algorithm that is not susceptible to bias by a target hardware and/or software platform .
- a method of the present invention for quantifying and analyzing intrinsic parallelism of an algorithm is adapted to be implemented by a computer and comprises the steps of:
- Figure 1 is a flow chart illustrating a preferred embodiment of a method for quantifying and analyzing intrinsic parallelism of an algorithm according to the present invention
- Figure 2 is a schematic diagram illustrating dataflow information related to an exemplary algorithm
- Figure 3 is a schematic diagram of an exemplary set of dataflow graphs
- Figure 4 is a schematic diagram illustrating operation sets of a 4x4 discrete cosine transform algorithm
- Figure 5 is a schematic diagram illustrating an exemplary composition of intrinsic parallelism corresponding to a dependency depth equal to 6;
- Figure 6 is a schematic diagram illustrating an exemplary composition of intrinsic parallelism corresponding to a dependency depth equal to 5; and Figure 7 is a schematic diagram illustrating an exemplary composition of intrinsic parallelism corresponding to a dependency depth equal to 3 .
- the preferred embodiment of a method according to the present invention for evaluating intrinsic parallelism of an algorithm is adapted to be implemented by a computer, and includes the following steps.
- a degree of intrinsic parallelism indicates a degree of parallelism of an algorithm itself without considering designs and configuration of software and hardware, that is to say, the method according to this invention is not limited by software and hardware when it is used for analyzing an algorithm.
- the computer is configured to represent an algorithm by means of a plurality of operation sets.
- Each of the operation sets may be an equation, a program code, a flow chart, or any other form for expressing the algorithm.
- the algorithm includes three operation sets 01 , 02 and 03 that are expressed as
- Step 12 is to configure the computer to obtain a
- Laplacian matrix L d according to the operation sets, and includes the following sub-steps.
- the computer is configured to obtain dataflow information related to the algorithm.
- the dataflow information corresponding to the operation sets of the example may be expressed as follows .
- the computer is configured to obtain a dataflow graph according to the dataflow information.
- the dataflow graph is composed of a plurality of vertexes that denote operations in the algorithm, and a plurality of directed edges that indicate interconnection between corresponding two of the vertexes and that represent sources and destinations of data in the algorithm.
- operator symbols Vi to V 7 i.e., the vertexes
- addition operators and arrows i.e., the directed edges
- the operator symbol Vi represents the addition operation for Ai+Bi
- the operator symbol V 2 represents the addition operation for A 2 +B 2
- the operator symbol V 3 represents the addition operation for A3+B3
- the operator symbol V 4 represents the addition operation for Datal+Data7
- the operator symbol V 5 represents the addition operation for Data2+C 2
- the operator symbol V 6 represents the addition operation for Data3+C 3
- the operator symbol V 7 represents the addition operation for D1+C1.
- the operator symbol V 4 is dependent on the operator symbols V x and V 7 .
- the operator symbol V 5 is dependent on the operator symbol V 2
- the operator symbol V 6 is dependent on the operator symbol V 3
- the operator symbols V 4 , V 5 and V 6 are independent of each other.
- the computer is configured to obtain the Laplacian matrix ⁇ according to the dataflow graphs .
- the i th diagonal element shows a number of operator symbols that are connected to the operator symbol V ⁇ , and the off-diagonal element denotes whether two operator symbols are connected. Therefore, the Laplacian matrix L d can clearly express the dataflow graphs by a compact linear algebraic form.
- the set of dataflow graphs shown in Figure 3 may be expressed as follows . 1 0 0 -1 0 0 0 0
- the Laplacian matrix Lj represents connect ivity among the operator symbols V x to V 7 , and the first column to the seventh column represent the operator symbols Vi to V 7 , respectively.
- the operator symbol Vi is connected to the operator symbbl V 4 , and thus the matrix element (1,4) is -1.
- step 13 the computer is configured to compute eigenvalues ⁇ and eigenvectors 3 ⁇ 4of the Laplacian matrix Ld.
- the eigenvalues ⁇ and the eigenvectors are configured to compute eigenvalues ⁇ and eigenvectors 3 ⁇ 4of the Laplacian matrix Ld.
- the computer is configured to obtain a set of information related to intrinsic parallelism of the algorithm according to the eigenvalues ⁇ and the eigenvectors X d of the Laplacian matrix Ld-
- the set of information related to intrinsic parallelism is defined in a strict manner to recognize independent ones of the operation sets that are independent of each other and hence can be executed in parallel.
- the set of information related to strict-sense parallelism includes a degree of strict-sense parallelism representing a number of independent ones of the operation sets of the algorithm, and a set of compositions of strict-sense parallelism corresponding to the operation sets, respectively.
- a number of connected components in a graph is equal to a number of the eigenvalues of the Laplacian matrix that are equal to 0.
- the degree of strict-sense parallelism embedded within the algorithm is thus equal to a number of eigenvalues ⁇ that are equal to 0.
- the compositions of strict-sense parallelism may be identified according to the eigenvectors X d associated with the eigenvalues ⁇ that are equal to 0.
- the set of dataflow graphs is composed of three independent operation sets, since there exist three Laplacian eigenvalues that are equal to 0.
- the degree of strict-sense parallelism embedded in the exemplified algorithm is equal to 3.
- the first, second and third ones of the eigenvectors Xd are associated with the eigenvalues that are equal to 0.
- the values corresponding to the operator symbols Vi, V 4 and V 7 are non-zero, that is to say, the operator symbols Vi, V 4 and V 7 are dependent and form a connected one (Vi-V 4 -V 7 ) of the dataflow graph.
- the computer is configured to obtain the degree of strict-sense parallelism that is equal to 3, and the compositions of strict-sense parallelism that may be expressed in the form of a graph (shown in Figure 3) , a table, equations, or program codes.
- the computer is configured to obtain a plurality of sets of information related to multigrain parallelism of the algorithm according to the set of information related to strict-sense parallelism and at least one of a plurality of dependency depths of the algorithm.
- the sets of information related to multigrain parallelism include a set of information related to wide-sense parallelism of the algorithm that characterizes all possible parallelisms embedded in an independent operation set.
- the dependency depths of an algorithm represent associated sequential steps essential for processing the algorithm, and thus are complementary to potential parallelism of the algorithm.
- information related to different intrinsic parallelisms of an algorithm may be obtained based on different dependency depths.
- the information related to strict-sense parallelism is the information related to intrinsic parallelism of the algorithm corresponding to a maximum one of the dependency depths of the algorithm
- the information related to wide-sense parallelism is the information related to intrinsic parallelism of the algorithm corresponding to a minimum one of the dependency depths .
- the above-mentioned algorithm includes two different compositions of strict-sense parallelism, i.e., Vi-V 4 -V 7 and V 2 -V 5 (V 3 -V 6 is similar to V 2 -V 5 and can be considered to be the same composition) .
- the composition of the strict-sense parallelism Vi ⁇ V 4 -V 7 it can be known that the operator symbols Vi and V 7 are independent of each other, i.e. , the operator symbols Vi and V 7 can be processed in parallel . Therefore, the set of information related to wide-sense parallelism of the algorithm includes a degree of wide-sense parallelism that is equal to 4, and compositions of wide-sense parallelism are similar to the compositions of strict-sense parallelism.
- the degree of wide-sense parallelism of the above-mentioned algorithm is equal to 4. It is assumed that a processing element requires 7 processing cycles for implementing the algorithm, since the algorithm includes 7 operator symbols Vi ⁇ V 7 . According to the degree of strict-sense parallelism that is equal to 3, using 3 processing elements to implement the algorithm will take up 3 processing cycles. According to the degree of wide-sense parallelism that is equal to 4, using 4 processing elements to implement the algorithm will take up 2 processing cycles. Further, it can be known that at least 2 processing cycles are necessary for implementing the algorithm even though more processing elements are used. Therefore, an optimum number of processing elements used for implementing an algorithm may be obtained according to the method of this embodiment .
- the composition of strict-sense parallelism of this algorithm may be obtained as shown in Figure 5, and the degree of strict-sense parallelism of this algorithm is equal to 4 according to the method of this embodiment.
- the composition of intrinsic parallelism of this algorithm may be obtained as shown in Figure 6, and the degree of intrinsic parallelism is equal to 8.
- the composition of intrinsic parallelism of this algorithm may be obtained as shown in Figure 7, and the degree of intrinsic parallelism is equal to 16.
- the method according to this invention may be used to evaluate the intrinsic parallelism of an algorithm.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Computational Mathematics (AREA)
- Pure & Applied Mathematics (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Discrete Mathematics (AREA)
- Complex Calculations (AREA)
- Stored Programmes (AREA)
Abstract
Description
Claims
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201180033554.6A CN103180821B (en) | 2010-07-08 | 2011-07-05 | The quantification of algorithm essence degree of parallelism and analytical approach |
KR1020137001820A KR20130038903A (en) | 2010-07-06 | 2011-07-05 | Method for quantifying and analyzing intrinsic parallelism of an algorithm |
JP2013518789A JP5925202B2 (en) | 2010-07-06 | 2011-07-05 | Method for quantifying and analyzing parallel processing of algorithms |
EP11804255.5A EP2591414A4 (en) | 2010-07-06 | 2011-07-05 | Method for quantifying and analyzing intrinsic parallelism of an algorithm |
HK13112720.7A HK1187121A1 (en) | 2010-07-08 | 2013-11-13 | Method for quantifying and analyzing intrinsic parallelism of an algorithm |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW099122162A TWI501168B (en) | 2010-07-06 | 2010-07-06 | An intrinsic parallelism of an algorithm quantification and analysis method |
TW099122162 | 2010-07-06 | ||
US12/832,557 | 2010-07-08 | ||
US12/832,557 US20120011186A1 (en) | 2010-07-08 | 2010-07-08 | Method for quantifying and analyzing intrinsic parallelism of an algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2012006285A1 true WO2012006285A1 (en) | 2012-01-12 |
Family
ID=45441539
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2011/042962 WO2012006285A1 (en) | 2010-07-06 | 2011-07-05 | Method for quantifying and analyzing intrinsic parallelism of an algorithm |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP2591414A4 (en) |
JP (1) | JP5925202B2 (en) |
KR (1) | KR20130038903A (en) |
WO (1) | WO2012006285A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10593080B2 (en) | 2017-04-27 | 2020-03-17 | Daegu Gyeongbuk Institute Of Science And Technology | Graph generating method and apparatus |
KR101998020B1 (en) * | 2017-04-27 | 2019-07-08 | 재단법인대구경북과학기술원 | Method and apparatus for graph generation |
CN111061150B (en) * | 2019-10-23 | 2020-11-27 | 南京大学 | Hardware implementation method of Laplace frequency response |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030195938A1 (en) * | 2000-06-26 | 2003-10-16 | Howard Kevin David | Parallel processing systems and method |
US7171397B1 (en) * | 2002-08-21 | 2007-01-30 | Ncr Corp. | Method and system for measuring parallelism of a database system execution step |
US20090007116A1 (en) * | 2007-06-27 | 2009-01-01 | Microsoft Corporation | Adjacent data parallel and streaming operator fusion |
US20090028433A1 (en) * | 2007-05-03 | 2009-01-29 | David Allen Tolliver | Method for partitioning combinatorial graphs |
WO2011163223A1 (en) | 2010-06-22 | 2011-12-29 | National Cheng Kung University | Method of analyzing intrinsic parallelism of algorithm |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5587922A (en) * | 1993-06-16 | 1996-12-24 | Sandia Corporation | Multidimensional spectral load balancing |
US6615211B2 (en) * | 2001-03-19 | 2003-09-02 | International Business Machines Corporation | System and methods for using continuous optimization for ordering categorical data sets |
US7724256B2 (en) * | 2005-03-21 | 2010-05-25 | Siemens Medical Solutions Usa, Inc. | Fast graph cuts: a weak shape assumption provides a fast exact method for graph cuts segmentation |
US7406200B1 (en) * | 2008-01-08 | 2008-07-29 | International Business Machines Corporation | Method and system for finding structures in multi-dimensional spaces using image-guided clustering |
-
2011
- 2011-07-05 JP JP2013518789A patent/JP5925202B2/en active Active
- 2011-07-05 EP EP11804255.5A patent/EP2591414A4/en not_active Withdrawn
- 2011-07-05 KR KR1020137001820A patent/KR20130038903A/en not_active Application Discontinuation
- 2011-07-05 WO PCT/US2011/042962 patent/WO2012006285A1/en active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030195938A1 (en) * | 2000-06-26 | 2003-10-16 | Howard Kevin David | Parallel processing systems and method |
US7171397B1 (en) * | 2002-08-21 | 2007-01-30 | Ncr Corp. | Method and system for measuring parallelism of a database system execution step |
US20090028433A1 (en) * | 2007-05-03 | 2009-01-29 | David Allen Tolliver | Method for partitioning combinatorial graphs |
US20090007116A1 (en) * | 2007-06-27 | 2009-01-01 | Microsoft Corporation | Adjacent data parallel and streaming operator fusion |
WO2011163223A1 (en) | 2010-06-22 | 2011-12-29 | National Cheng Kung University | Method of analyzing intrinsic parallelism of algorithm |
Non-Patent Citations (2)
Title |
---|
IE TRANS. ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, vol. 15, no. 5, May 2005 (2005-05-01), pages 593 - 608 |
See also references of EP2591414A4 * |
Also Published As
Publication number | Publication date |
---|---|
EP2591414A1 (en) | 2013-05-15 |
JP2013530477A (en) | 2013-07-25 |
KR20130038903A (en) | 2013-04-18 |
EP2591414A4 (en) | 2014-08-06 |
JP5925202B2 (en) | 2016-05-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Shen et al. | Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization | |
US9383982B2 (en) | Data-parallel computation management | |
Komsta et al. | Package ‘outliers’ | |
US9003076B2 (en) | Identifying anomalies in original metrics of a system | |
WO2012006285A1 (en) | Method for quantifying and analyzing intrinsic parallelism of an algorithm | |
CN106933916B (en) | JSON character string processing method and device | |
US10417422B2 (en) | Method and apparatus for detecting application | |
CN106339372B (en) | Method and device for optimizing search engine | |
KR101833220B1 (en) | Deobfuscation assessing apparatus of application code and method of assessing deobfuscation of application code using the same | |
Macko et al. | Local clustering in provenance graphs | |
CN112364185B (en) | Method and device for determining characteristics of multimedia resources, electronic equipment and storage medium | |
US9569856B2 (en) | Variable blocking artifact size and offset detection | |
Cheng et al. | An efficient FPRAS type group testing procedure to approximate the number of defectives | |
US20120011186A1 (en) | Method for quantifying and analyzing intrinsic parallelism of an algorithm | |
KR101609915B1 (en) | Method and apparatus for multi dimension time gap analysis | |
CN111177506A (en) | Classification storage method and system based on big data | |
CN106933933B (en) | Data table information processing method and device | |
Chen et al. | IoT malware dynamic analysis profiling system and family behavior analysis | |
CN110427541A (en) | A kind of webpage content extracting method, system, electronic equipment and medium | |
CN103106283B (en) | Duplicate removal treatment method and device | |
KR102074906B1 (en) | Apparatus and method for analysing simility of program | |
CN109710813B (en) | Data processing method and data processing device | |
US20130226904A1 (en) | Determining distance between data sequences | |
CN112839257B (en) | Video content detection method, device, server and storage medium | |
CN105786872A (en) | Method and device for providing question-answer onebox based on user searches |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 201180033554.6 Country of ref document: CN |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 11804255 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2011804255 Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref document number: 2013518789 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 20137001820 Country of ref document: KR Kind code of ref document: A |