Loop optimization

From Wikipedia, the free encyclopedia

In compiler theory, loop optimization is the process of increasing execution speed and reducing the overheads associated with loops. It plays an important role in improving cache performance and making effective use of parallel processing capabilities. Most execution time of a scientific program is spent on loops; as such, many compiler optimization techniques have been developed to make them faster.

Representation of computation and transformations

Since instructions inside loops can be executed repeatedly, it is frequently not possible to give a bound on the number of instruction executions that will be impacted by a loop optimization. This presents challenges when reasoning about the correctness and benefits of a loop optimization, specifically the representations of the computation being optimized and the optimization(s) being performed.[1]

Optimization via a sequence of loop transformations

Loop optimization can be viewed as the application of a sequence of specific loop transformations (listed below or in Compiler transformations for high-performance computing[2]) to the source code or intermediate representation, with each transformation having an associated test for legality. A transformation (or sequence of transformations) generally must preserve the temporal sequence of all dependencies if it is to preserve the result of the program (i.e., be a legal transformation). Evaluating the benefit of a transformation or sequence of transformations can be quite difficult within this approach, as the application of one beneficial transformation may require the prior use of one or more other transformations that, by themselves, would result in reduced performance.
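
As a small added illustration of the legality condition (the fragment below is made up for exposition and is not taken from the cited sources), the loop carries a dependence from each iteration to the next, so any transformation that reorders iterations, such as reversal or naive parallelization, would change the result and is therefore illegal here:

    /* Illustrative fragment: each iteration reads the value written by the
       previous one (a loop-carried flow dependence), so the iterations must
       keep their original temporal order for the result to be preserved. */
    void prefix_sum(int n, int a[], const int b[]) {
        for (int i = 1; i < n; i++)
            a[i] = a[i - 1] + b[i];
    }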

Common loop transformations include the following (illustrative code sketches for several of them follow the list):

  • Fission or distribution – loop fission attempts to break a loop into multiple loops over the same index range, but each new loop takes only part of the original loop's body. This can improve locality of reference, both of the data being accessed in the loop and the code in the loop's body.
  • Fusion or combining – this combines the bodies of two adjacent loops that would iterate the same number of times (whether or not that number is known at compile time), as long as they make no reference to each other's data.
  • Interchange or permutation – these optimizations exchange inner loops with outer loops. When the loop variables index into an array, such a transformation can improve locality of reference, depending on the array's layout.
  • Inversion – this technique changes a standard while loop into a do/while (a.k.a. repeat/until) loop wrapped in an if conditional, reducing the number of jumps by two for cases where the loop is executed. Doing so duplicates the condition check (increasing the size of the code) but is more efficient because jumps usually cause a pipeline stall. Additionally, if the initial condition is known at compile-time and is known to be side-effect-free, the initial if-guard can be skipped.
  • Loop-invariant code motion – if a computation inside the loop produces the same value on every iteration (a loop-invariant quantity), moving it outside the loop and computing it just once before the loop begins can vastly improve efficiency. This is particularly important with address-calculation expressions generated by loops over arrays. For correct implementation, this technique must be used with inversion, because not all code is safe to be moved outside the loop.
  • Parallelization – this is a special case of automatic parallelization focusing on loops, restructuring them to run efficiently on multiprocessor systems. It can be done automatically by compilers (automatic parallelization) or manually (inserting parallel directives like OpenMP).
  • Reversal – a subtle optimization that reverses the order in which values are assigned to the index variable. This can help eliminate dependencies and thus enable other optimizations. Certain architectures utilize looping constructs at assembly level that count in a single direction only (e.g., decrement-jump-if-not-zero [DJNZ][3]).
  • Scheduling – this divides a loop into multiple parts that may be run concurrently on multiple processors.
  • Skewing – this technique is applied to a nested loop iterating over a multidimensional array, where each iteration of the inner loop depends on previous iterations, and rearranges its array accesses so that the only dependencies are between iterations of the outer loop.
  • Software pipelining – a type of out-of-order execution of loop iterations to hide the latencies of processor function units.
  • Splitting or peeling – this attempts to simplify a loop or eliminate dependencies by breaking it into multiple loops which have the same bodies but iterate over different portions of the index range. A special case is loop peeling, which can simplify a loop with a problematic first iteration by performing that iteration separately before entering the loop.
  • Tiling or blocking – reorganizes a loop to iterate over blocks of data sized to fit in the cache.
  • Vectorization – attempts to run as many of the loop iterations as possible at the same time on a SIMD system.
  • Unrolling – duplicates the body of the loop multiple times, in order to decrease the number of times the loop condition is tested and the number of jumps, which may degrade performance by impairing the instruction pipeline. Completely unrolling a loop eliminates all overhead (except multiple instruction fetches and increased program load time), but requires that the number of iterations be known at compile time (except in the case of just-in-time compilation). Care must also be taken to ensure that repeated recalculation of indexed variables does not incur more overhead than advancing pointers within the original loop.
  • Unswitching – moves a conditional from inside a loop to outside of it by duplicating the loop's body, and placing a version of it inside each of the if and else clauses of the conditional.
  • Sectioning or strip-mining – introduced for vector processors, loop sectioning is a loop-transformation technique for enabling SIMD (single instruction, multiple data) encodings of loops and improving memory performance. Each vector operation is done for a size less than or equal to the maximum vector length on a given vector machine.[4][5]
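
The sketches below are illustrative C fragments added for exposition; the function and array names are made up and are not taken from the article's cited sources. This first sketch shows fission and fusion, which are essentially inverse transformations:

    /* Original loop: one body writing two unrelated arrays. */
    void original(int n, int a[], int b[]) {
        for (int i = 0; i < n; i++) {
            a[i] = 2 * i;
            b[i] = 3 * i;
        }
    }

    /* After fission/distribution: two loops over the same index range, each
       touching only one array, which can improve locality of both the data
       and the code of each loop body. */
    void fissioned(int n, int a[], int b[]) {
        for (int i = 0; i < n; i++)
            a[i] = 2 * i;
        for (int i = 0; i < n; i++)
            b[i] = 3 * i;
    }

    /* Fusion is the inverse step: the two loops in fissioned() have the same
       trip count and do not reference each other's data, so their bodies can
       be combined back into the form of original(), halving loop overhead. */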
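
A sketch of interchange and tiling on a row-major C array (the array size N and block size B are arbitrary illustrative choices):

    #define N 1024
    enum { B = 64 };   /* illustrative tile size; N is a multiple of B */

    /* Before interchange: the inner loop walks a column, striding through
       memory on a row-major array, which gives poor locality. */
    long sum_columnwise(int a[N][N]) {
        long sum = 0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += a[i][j];
        return sum;
    }

    /* After interchange: the inner loop touches consecutive addresses. */
    long sum_rowwise(int a[N][N]) {
        long sum = 0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += a[i][j];
        return sum;
    }

    /* Tiling/blocking: iterate over B x B blocks so the data used by each
       block stays in cache while it is reused (here, a blocked transpose). */
    void tiled_transpose(int a[N][N], int t[N][N]) {
        for (int ii = 0; ii < N; ii += B)
            for (int jj = 0; jj < N; jj += B)
                for (int i = ii; i < ii + B; i++)
                    for (int j = jj; j < jj + B; j++)
                        t[j][i] = a[i][j];
    }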
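
A sketch of loop inversion and loop-invariant code motion (again with made-up names):

    /* Loop inversion: the while loop becomes an if guard around a do/while,
       so the loop back edge needs only one conditional branch per iteration. */
    void fill_before(int n, int a[]) {
        int i = 0;
        while (i < n) {            /* original */
            a[i] = i;
            i++;
        }
    }

    void fill_after(int n, int a[]) {
        int i = 0;
        if (i < n) {               /* inverted: guard + do/while */
            do {
                a[i] = i;
                i++;
            } while (i < n);
        }
    }

    /* Loop-invariant code motion: x * y has the same value on every
       iteration, so it is hoisted and computed once before the loop. */
    void scale_before(int n, int a[], int x, int y) {
        for (int i = 0; i < n; i++)
            a[i] = x * y + i;      /* x * y recomputed each iteration */
    }

    void scale_after(int n, int a[], int x, int y) {
        int t = x * y;             /* hoisted loop-invariant value */
        for (int i = 0; i < n; i++)
            a[i] = t + i;
    }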
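
A sketch of unrolling and unswitching (the unrolling example assumes the trip count is a multiple of the unroll factor; a real compiler would emit a cleanup loop for any remainder):

    /* Unrolling by a factor of 4: the loop condition is tested and the
       branch taken once per four copies of the body. */
    void copy_unrolled(int n, int a[], const int b[]) {
        for (int i = 0; i < n; i += 4) {
            a[i]     = b[i];
            a[i + 1] = b[i + 1];
            a[i + 2] = b[i + 2];
            a[i + 3] = b[i + 3];
        }
    }

    /* Unswitching: the test on flag does not depend on i, so the conditional
       is moved outside and the loop body is duplicated under each branch. */
    void select_unswitched(int n, int a[], const int b[], int flag) {
        if (flag) {
            for (int i = 0; i < n; i++)
                a[i] = b[i];
        } else {
            for (int i = 0; i < n; i++)
                a[i] = 0;
        }
    }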
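
Parallelization and vectorization can also be requested manually with directives such as those of OpenMP, mentioned above; a minimal sketch, assuming the code is compiled with OpenMP support (e.g. -fopenmp) and that the loops carry no dependences:

    /* Parallelization: iterations of an independent loop are distributed
       across threads by the OpenMP "parallel for" directive. */
    void vector_add(int n, float c[], const float a[], const float b[]) {
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }

    /* Vectorization: the OpenMP "simd" directive asks the compiler to
       execute the loop with SIMD instructions where it can. */
    void vector_scale(int n, float a[], float s) {
        #pragma omp simd
        for (int i = 0; i < n; i++)
            a[i] *= s;
    }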

The unimodular transformation framework

The unimodular transformation approach[6] uses a single unimodular matrix to describe the combined result of a sequence of many of the above transformations. Central to this approach is the view of the set of all executions of a statement within n loops as a set of integer points in an n-dimensional space, with the points being executed in lexicographical order. For example, the executions of a statement nested inside an outer loop with index i and an inner loop with index j can be associated with the pairs of integers (i, j). The application of a unimodular transformation corresponds to the multiplication of the points within this space by the matrix. For example, the interchange of two loops corresponds to the matrix [[0, 1], [1, 0]].
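
As a worked illustration added here (not taken from the cited text), applying the interchange matrix to an iteration point multiplies the column vector of loop indices by the matrix, so the statement instance at (i, j) = (2, 5) in the original nest is executed at (5, 2) in the transformed nest:

    \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
    \begin{bmatrix} i \\ j \end{bmatrix}
    =
    \begin{bmatrix} j \\ i \end{bmatrix},
    \qquad\text{e.g.}\qquad
    \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
    \begin{bmatrix} 2 \\ 5 \end{bmatrix}
    =
    \begin{bmatrix} 5 \\ 2 \end{bmatrix}.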

A unimodular transformation is legal if it preserves the temporal sequence of all dependencies; measuring the performance impact of a unimodular transformation is more difficult. Imperfectly nested loops and some transformations (such as tiling) do not fit easily into this framework.

The polyhedral or constraint-based framework

The polyhedral model[7] handles a wider class of programs and transformations than the unimodular framework. The set of executions of a set of statements within a possibly imperfectly nested set of loops is seen as the union of a set of polytopes representing the executions of the statements. Affine transformations are applied to these polytopes, producing a description of a new execution order. The boundaries of the polytopes, the data dependencies, and the transformations are often described using systems of constraints, and this approach is often referred to as a constraint-based approach to loop optimization. For example, a single statement within an outer loop 'for i := 0 to n' and an inner loop 'for j := 0 to i+2' is executed once for each (i, j) pair such that 0 <= i <= n and 0 <= j <= i+2.
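
As an added illustration (not drawn from the cited sources), the iteration domain of that statement is the set of integer points of a polyhedron defined by affine inequalities in i, j and the parameter n, and a transformation such as loop skewing is an affine map applied to that domain:

    D(n) = \{\, (i, j) \in \mathbb{Z}^2 \mid 0 \le i \le n,\ 0 \le j \le i + 2 \,\}
    \qquad\text{skewing: } (i, j) \mapsto (i,\, i + j)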

Once again, a transformation is legal if it preserves the temporal sequence of all dependencies. Estimating the benefits of a transformation, or finding the best transformation for a given code on a given computer, remains a subject of ongoing research (as of 2010).

See also

  • Loop nest optimization
  • Polytope model

References

  1. ^ In the book Reasoning About Program Transformations, Jean-Francois Collard discusses in depth the general question of representing executions of programs rather than program text in the context of static optimization.
  2. ^ David F. Bacon, Susan L. Graham, and Oliver J. Sharp. Compiler transformations for high-performance computing. Report No. UCB/CSD 93/781, Computer Science Division, EECS, University of California, Berkeley, November 1993 (available at CiteSeer [1]). Introduces compiler analysis such as data dependence analysis and interprocedural analysis, as well as a very complete list of loop transformations.
  3. ^ "8051 Instruction Set". www.win.tue.nl. Retrieved 2019-12-09.
  4. ^ "Intel Developer Zone".
  5. ^ "7.6.3.1 Strip-Mining (Sun Studio 12: Fortran Programming Guide)".
  6. ^ Steven S. Muchnick, Advanced Compiler Design and Implementation, 1997 Morgan Kaufmann. Section 20.4.2 discusses loop optimization.
  7. ^ R. Allen and K. Kennedy. Optimizing Compilers for Modern Architectures. Morgan Kaufmann, 2002.