[go: nahoru, domu]

Bug 59121 - [4.9/5/6 Regression] endless loop with -O2 -floop-parallelize-all
Summary: [4.9/5/6 Regression] endless loop with -O2 -floop-parallelize-all
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.8.1
: P4 normal
Target Milestone: 4.9.4
Assignee: Not yet assigned to anyone
URL:
Keywords: compile-time-hog
Depends on:
Blocks: graphite 57732
  Show dependency treegraph
 
Reported: 2013-11-13 21:22 UTC by janus
Modified: 2015-10-09 14:59 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.5.3, 4.6.2, 4.7.2
Known to fail: 4.8.1, 4.9.0
Last reconfirmed: 2013-11-13 00:00:00


Attachments
sampling of f951 (26.10 KB, text/plain)
2013-11-13 22:07 UTC, Dominique d'Humieres
Details

Note You need to log in before you can comment on or make changes to this bug.
Description janus 2013-11-13 21:22:31 UTC
Consider the following simple Fortran code:


subroutine smooth1

  implicit none

  integer :: j, k, l, nbin1(3)
  real, dimension(55,48,49) :: atemp, acc
  common /ACC_INPUT/ nbin1

  do j=1,nbin1(1)
    do k=1,nbin1(2)
      do l=1,nbin1(3)
        acc(j,k,l) = atemp(j,k,l)
      end do
    end do
  end do

end


Compiling this with "gfortran -c -O2 -floop-parallelize-all" apparently never finishes.



$ gfortran -v
Using built-in specs.
COLLECT_GCC=gfortran
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.8/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.8.1-10ubuntu8' --with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.8 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.8-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.8.1 (Ubuntu/Linaro 4.8.1-10ubuntu8)
Comment 1 Dominique d'Humieres 2013-11-13 22:07:03 UTC
Created attachment 31211 [details]
sampling of f951

I am attaching a sampling of f951 in case someone can figure out what is going on.
Comment 2 Dominique d'Humieres 2013-11-13 22:07:22 UTC
Confirmed.
Comment 3 janus 2013-11-14 10:09:24 UTC
The problem also occurs with current trunk.


$ gfortran-4.9 -v
Using built-in specs.
COLLECT_GCC=gfortran-4.9
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/4.9.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /home/jweil/gcc49/trunk/configure --program-suffix=-4.9 --enable-checking=release --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --enable-languages=c,c++,fortran --disable-bootstrap --disable-multilib
Thread model: posix
gcc version 4.9.0 20131114 (experimental) [trunk revision 204778] (GCC)
Comment 4 Dominique d'Humieres 2013-11-14 10:25:25 UTC
> The problem also occurs with current trunk.

Indeed!
Comment 5 janus 2013-11-14 21:04:02 UTC
Seems to work with 4.5, though.


> gfortran -v
Using built-in specs.
COLLECT_GCC=gfortran
COLLECT_LTO_WRAPPER=/usr/lib64/gcc/x86_64-suse-linux/4.5/lto-wrapper
Target: x86_64-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,java,ada --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.5 --enable-ssp --disable-libssp --disable-plugin --with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --disable-libgcj --disable-libmudflap --with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-specific-runtime-libs --program-suffix=-4.5 --enable-linux-futex --without-system-libunwind --enable-gold --with-plugin-ld=/usr/bin/gold --with-arch-32=i586 --with-tune=generic --build=x86_64-suse-linux
Thread model: posix
gcc version 4.5.3 20110428 [gcc-4_5-branch revision 173117] (SUSE Linux)
Comment 6 janus 2013-11-14 21:10:30 UTC
(In reply to janus from comment #5)
> Seems to work with 4.5, though.

... same with 4.6.


> gfortran -v
Using built-in specs.
COLLECT_GCC=gfortran
COLLECT_LTO_WRAPPER=/usr/lib64/gcc/x86_64-suse-linux/4.6/lto-wrapper
Target: x86_64-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,java,ada --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.6 --enable-ssp --disable-libssp --disable-plugin --with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --disable-libgcj --disable-libmudflap --with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-specific-runtime-libs --program-suffix=-4.6 --enable-linux-futex --without-system-libunwind --with-arch-32=i586 --with-tune=generic --build=x86_64-suse-linux
Thread model: posix
gcc version 4.6.2 (SUSE Linux)
Comment 7 janus 2013-11-14 21:19:38 UTC
(In reply to janus from comment #6)
> > Seems to work with 4.5, though.
> 
> ... same with 4.6.

... and also with 4.7.

Conclusion: Seems to be a regression in 4.8 and above.


> gfortran -v
Using built-in specs.
COLLECT_GCC=gfortran
COLLECT_LTO_WRAPPER=/usr/lib64/gcc/x86_64-suse-linux/4.7/lto-wrapper
Target: x86_64-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,java,ada --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.7 --enable-ssp --disable-libssp --disable-libitm --disable-plugin --with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --disable-libgcj --disable-libmudflap --with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-specific-runtime-libs --enable-linker-build-id --program-suffix=-4.7 --enable-linux-futex --without-system-libunwind --with-arch-32=i586 --with-tune=generic --build=x86_64-suse-linux
Thread model: posix
gcc version 4.7.2 20130108 [gcc-4_7-branch revision 195012] (SUSE Linux)
Comment 8 Richard Biener 2013-11-15 09:22:37 UTC
-floop-parallelize-all ends up using GRAPHITE dependence analysis which is
what never finishes (is stuck in ISL).  I'd say avoid it at all cost (as
the rest of GRAPHITE).

Confirmed.
Comment 9 janus 2013-11-17 12:27:30 UTC
(In reply to Richard Biener from comment #8)
> -floop-parallelize-all ends up using GRAPHITE dependence analysis which is
> what never finishes (is stuck in ISL).  I'd say avoid it at all cost (as
> the rest of GRAPHITE).

If you're saying this is a case of WONTFIX and -floop-parallelize-all should not be used at all, maybe one should document the flag in the manual as 'experimental' (or 'broken' or whatever) ... ?
Comment 10 Jeffrey A. Law 2014-01-16 21:52:40 UTC
IMHO, it's high time we move graphite to an unsupported state and closed all related PRs as WONTFIX.
Comment 11 Richard Biener 2014-01-17 12:32:18 UTC
(In reply to Jeffrey A. Law from comment #10)
> IMHO, it's high time we move graphite to an unsupported state and closed all
> related PRs as WONTFIX.

As per the last Cauldron BOF we should rather move graphite to a supported state
and fix all related PRs.  (just created a meta-bug, please link other GRAPHITE
bugs you find properly)

The discussion ended up as if we kill GRAPHITE and invent sth new then we'll
end up inventing another GRPAHITE which is a waste of resources.
Comment 12 Jeffrey A. Law 2014-01-17 17:16:42 UTC
The problem Richard is nobody is maintaining the code.  What makes this any different than a port which has become unmaintained and thus isn't being fixed in a timely manner?  I'm not in a position to own the code and unless someone steps in to own it/maintain it, I'll formally call for its removal after 4.9 is released.  You called the code out as unmaintained two years ago.  I called it out again in the 2013 Cauldron.  At some point we have to face reality and take appropriate action.

In the mean time I want to see all the graphite stuff move to a P4/P5 priority.  There's no way we should have graphite on the critical path for a release if it's not being maintained.  Creating a meta-bug at that time would be reasonable as well.


We wouldn't be throwing away the code, so if someone wanted to rebuild a LNO, they could always pull the code out of an old release/branch and use that to jumpstart the process.  Again, it's no different than a port  that has become unmaintained -- someone can always pick it up from an old release/branch and use it to jumpstart development of the port again.
Comment 13 Mircea Namolaru 2014-02-03 15:23:06 UTC
(In reply to Jeffrey A. Law from comment #12)
> The problem Richard is nobody is maintaining the code.  What makes this any
> different than a port which has become unmaintained and thus isn't being
> fixed in a timely manner?  I'm not in a position to own the code and unless
> someone steps in to own it/maintain it, I'll formally call for its removal
> after 4.9 is released.  You called the code out as unmaintained two years
> ago.  I called it out again in the 2013 Cauldron.  At some point we have to
> face reality and take appropriate action.
> 

I just joined INRIA - and at least for the next year I will maintain the Graphite code. Started to look at the P1 issue: 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58028
Comment 14 Mircea Namolaru 2014-03-10 16:35:15 UTC
Confirmed.

Start looking at it. This test also enters in an endless loop with the 
options -fgraphite-identiy -floop-nest-optimize -O2 -c.
Comment 15 Jeffrey A. Law 2014-03-10 16:41:02 UTC
Mircea, thanks.  I'm definitely looking forward to seeing Graphite in a better state!  With you on board at INRIA and working on Graphite, I will not be calling for Graphite's removal after the 4.9 release.

Thanks again,
jeff
Comment 16 Tobias Grosser 2014-03-10 22:18:55 UTC
(In reply to Jeffrey A. Law from comment #15)
> Mircea, thanks.  I'm definitely looking forward to seeing Graphite in a
> better state!  With you on board at INRIA and working on Graphite, I will
> not be calling for Graphite's removal after the 4.9 release.
> 
> Thanks again,
> jeff

This bug is probably the perfect case to use isl's new compute out facility, which allows us to give up after a certain number of compute iterations.
This will give us the assurance that we do not hang any more in the graphite dependence analysis, but that we instead bail out smoothly.

We should probably also work on the efficiency of the analysis (and there is a lot that could be done), but in my opinion the number one priority is that we avoid unpredictable compile time. After this issue is fixed, we can address cases where we bail out as missed optimizations. Something, that is a lot less intrusive.
Comment 17 Mircea Namolaru 2014-03-25 17:03:00 UTC
Yes, data dependencies computation is expensive in the polyehdral model
and it could take considerable time - but it is worrying that in too many
cases fails to provide (after a few hours left running, when I stop it) an answer for very simple programs. I will ckeck with the isl people if this is the expected behaviour of the isl_union_map_compute_flow (this is the function where the data dependency computation is stuck) and how (and if) they could help us.

Many Fortran programs with loops having no-constants bounds and n-dimensional arrays (n>=3) may be affected by this problem and may work only for small
dimensions of the arrays. The problem touches especially Fortran, that uses
linearized accesses to multi-dimensional arrays - this creates patterns 
leading to this problem (in this example we have an array acc of dimension 55,48,49 and the array access acc(j,k,l) is transformed to acc(j + 55*k + 2640*l).

I've checked the constraints passed to isl_union_map_compute - see that wrapping is perfromed. But wrapping requires modulo operation, expressed by constraints with existential quantifier that may be harder to solve. By disabling the wrapping, some simple examples that before were stuck in data dependency computation finish immediately. In what measure is wrapping necessary ? - as a side-effect it may increase compilation time (that may be already considerable).
Comment 18 Tobias Grosser 2014-03-25 17:27:51 UTC
(In reply to Mircea Namolaru from comment #17)
> Yes, data dependencies computation is expensive in the polyehdral model
> and it could take considerable time - but it is worrying that in too many
> cases fails to provide (after a few hours left running, when I stop it) an
> answer for very simple programs. I will ckeck with the isl people if this is
> the expected behaviour of the isl_union_map_compute_flow (this is the
> function where the data dependency computation is stuck) and how (and if)
> they could help us.

I would like to state that data-dependence calculation is not inherently more complex in the polyhedral mode. In fact, any simple data-dependence test that
exists could be implemented in our polyhedral checker to speed up the common case if really necessary. We did not do this, as for such simple cases the dependence analysis is normally very fast. In the general case, dependence solving is inherently complex (equivalent to ILP programming -> NP-complete), and we can
not do better anyway.

Though for the most cases of slowdown, we could probably do better. For this it is
necessary to understand why a certain case is slow. Hence, it is important to look into the dependence problem we are trying to solve and see why we can not solve it quickly. The problem may be inherently complex, but there may also be
a simple fix.

> Many Fortran programs with loops having no-constants bounds and
> n-dimensional arrays (n>=3) may be affected by this problem and may work
> only for small
> dimensions of the arrays. 
> The problem touches especially Fortran, that uses
> linearized accesses to multi-dimensional arrays - this creates patterns 
> leading to this problem (in this example we have an array acc of dimension
> 55,48,49 and the array access acc(j,k,l) is transformed to acc(j + 55*k +
> 2640*l).

So what exactly makes the problem difficult? Without understanding this, it is very difficult to predict in which situations the dependence analysis will be slow.

> I've checked the constraints passed to isl_union_map_compute

You mean isl_union_map_compute_flow()

> - see that
> wrapping is perfromed. But wrapping requires modulo operation, expressed by
> constraints with existential quantifier that may be harder to solve.

This is an interesting observation. Instead of modeling the modulo wrapping,
we could assume it does not exist and just add a run-time check around
the kernel that ensures that we only execute kernels where we know this is the case.

> By
> disabling the wrapping, some simple examples that before were stuck in data
> dependency computation finish immediately.

Nice.

> In what measure is wrapping
> necessary ? - as a side-effect it may increase compilation time (that may be
> already considerable).

In some cases additions/multiplications have defined the semantics in case of
overflow as the computation being performed modulo the size of the type. This
case rarely happens, but in case we want to be correct, we can not just ignore it.

Cheers,
Tobias
Comment 19 Mircea Namolaru 2014-04-10 22:01:34 UTC
The problem for many of these simple cases is with Graphite formulation of memory accesses constraints. For Fortran, or C (if arrays are declared as pointers), a memory access is not constrained enough (basically it is expressed as a function of a single induction variable). This may increase dramatically the number of the constraint solutions. The computation time for them could become prohibitive as well. But even worse, even if the computation finishes, part of the solutions found are false dependencies restricting possible legal transformations.

The solution is to constrain more the memory access - as it is done for C arrays
(similar code in C as this Fortran example don't create any problem).
A possible solution is to express the access in terms of basic induction variable as in the original source code - maybe by modifying the front end.
But other solutions are possible. For sure not a work for GCC 4.9.
Comment 20 Richard Biener 2014-04-11 10:45:08 UTC
Another issue for Fortran arrays is that when using array descriptors all accesses will have symbolical (non-INTEGER_CST) stride.  And graphite doesn't
support parameters in CHREC_RIGHT.
Comment 21 Richard Biener 2014-05-22 09:04:22 UTC
GCC 4.8.3 is being released, adjusting target milestone.
Comment 22 Jakub Jelinek 2014-12-19 13:33:29 UTC
GCC 4.8.4 has been released.
Comment 23 Richard Biener 2015-06-23 08:16:03 UTC
The gcc-4_8-branch is being closed, re-targeting regressions to 4.9.3.
Comment 24 Jakub Jelinek 2015-06-26 19:57:40 UTC
GCC 4.9.3 has been released.
Comment 25 Sebastian Pop 2015-10-09 14:58:43 UTC
(In reply to Richard Biener from comment #20)
> Another issue for Fortran arrays is that when using array descriptors all
> accesses will have symbolical (non-INTEGER_CST) stride.  And graphite doesn't
> support parameters in CHREC_RIGHT.

Correct.  PR66981 keeps track of this issue: by delinearizing array access functions we would avoid parameters in CHREC_RIGHT.
Comment 26 Sebastian Pop 2015-10-09 14:59:59 UTC
(In reply to Tobias Grosser from comment #16)
> This bug is probably the perfect case to use isl's new compute out facility,
> which allows us to give up after a certain number of compute iterations.
> This will give us the assurance that we do not hang any more in the graphite
> dependence analysis, but that we instead bail out smoothly.
> 

We implemented the ISL compute out: fixed in trunk when using a recent enough ISL-0.15 or so.