103100 – [11 Regression] unaligned access generated with memset or {} and -O2 -mstrict-align

Bug 103100 - [11 Regression] unaligned access generated with memset or {} and -O2 -mstrict-align

Summary: [11 Regression] unaligned access generated with memset or {} and -O2 -mstrict...

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target (show other bugs)
Version:	11.2.0

Importance:	P2 normal
Target Milestone:	12.5
Assignee:	Wilco

URL:
Keywords:	wrong-code

Duplicates (2):	105886 109273 (view as bug list)
Depends on:
Blocks:	105886
	Show dependency tree / graph

Reported:	2021-11-05 12:52 UTC by felix
Modified:	2024-07-05 13:07 UTC (History)
CC List:	6 users (show)

See Also:	101934 111121
Host:
Target:	aarch64
Build:
Known to work:	14.0
Known to fail:	13.3.1
Last reconfirmed:	2021-11-05 00:00:00

Attachments
source code that generates the faulty assembly (97 bytes, text/plain) 2021-11-05 12:52 UTC, felix	Details
New simplier patch which does the right thing too (911 bytes, patch) 2021-11-18 08:50 UTC, Andrew Pinski	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description felix 2021-11-05 12:52:19 UTC

Created attachment 51738 [details]
source code that generates the faulty assembly

when zero-intializing large local variables, gcc 11.2 (with -O2 and -O3) uses SIMD registers to store a pair of 16-byte registers at once into memory. When doing so, gcc can generate code that does not access memory on a 16-byte aligned boundary, even though the aarch64 architecture requires memory accesses to be 16-byte aligned when using the full 16-byte SIMD registers. This happens with -mstrict-align enabled.

For example:

static void (*use)(unsigned char*); // to suppress optimizations

extern "C" void _start() {
    unsigned char t2[216]={};
    use(t2);
}

when compiled with "gcc -save-temps -O2 -mstrict-align" generates the following assembly:
_start:
	stp	x29, x30, [sp, #-240]!// assuming sp is aligned to 16-bytes here 
	mov	x1, #0x0
	movi	v0.4s, #0x0
	add	x2, sp, #0x28 // the value in x2 is 8-byte aligned, but not 
                                 16-byte aligned
	mov	x29, sp
	stp	xzr, xzr, [sp, #24]
	add	x0, sp, #0x18
	stp	q0, q0, [x2] // x2 is not 16-byte aligned, so the store is not
                                aligned
	add	x2, sp, #0x48
	str	xzr, [sp, #232]
	stp	q0, q0, [x2]
	add	x2, sp, #0x68
	stp	q0, q0, [x2]
	add	x2, sp, #0x88
	stp	q0, q0, [x2]
	add	x2, sp, #0xa8
	stp	q0, q0, [x2]
	add	x2, sp, #0xc8
	stp	q0, q0, [x2]
	blr	x1
	ldp	x29, x30, [sp], #240
	ret

I have seen https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71727 and even though that is marked as fixed, this issue persists in gcc 11.2

Comment 1 Andrew Pinski 2021-11-05 12:58:08 UTC

Dup of bug 101934, fixed for gcc 11.3.0.

*** This bug has been marked as a duplicate of bug 101934 ***

Comment 2 Andrew Pinski 2021-11-05 13:17:59 UTC

Actually it is not a dup.

Comment 3 Andrew Pinski 2021-11-05 13:21:34 UTC

Confirmed, another simplier testcase:
void use(unsigned char*);

void g() {
    unsigned char t2[216];
    __builtin_memset(t2, 0, sizeof(t2));
    use(t2);
}

Comment 4 Andrew Pinski 2021-11-05 13:26:46 UTC

Mine.
This should fix it (untested):
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 699c105a42a..5d0872be4ef 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -23584,7 +23584,9 @@ aarch64_expand_setmem (rtx *operands)
         over writing.  */
       opt_scalar_int_mode mode_iter;
       FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT)
-       if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_limit))
+       if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_limit)
+           && (!STRICT_ALIGNMENT
+               || MEM_ALIGN (dst) >= GET_MODE_ALIGNMENT (mode_iter.require ())))
          cur_mode = mode_iter.require ();

       gcc_assert (cur_mode != BLKmode);

Comment 5 Andrew Pinski 2021-11-05 13:27:24 UTC

Also it was caused by r11-4973.

Comment 6 Andrew Pinski 2021-11-05 13:38:19 UTC

(In reply to Andrew Pinski from comment #4)
> Mine.
> This should fix it (untested):
And yes it works.

Comment 7 Andrew Pinski 2021-11-06 05:26:36 UTC

Patch submitted:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583554.html

Comment 8 Andrew Pinski 2021-11-18 08:50:12 UTC

Created attachment 51830 [details]
New simplier patch which does the right thing too

Comment 9 Andrew Pinski 2021-11-19 01:55:11 UTC

New patch submitted:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584927.html

Comment 10 Andrew Pinski 2022-01-26 02:30:17 UTC

Updated patch submitted:
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589254.html

Comment 11 Richard Biener 2022-04-21 07:50:44 UTC

GCC 11.3 is being released, retargeting bugs to GCC 11.4.

Comment 12 Andrew Pinski 2022-06-09 10:28:59 UTC

*** Bug 105886 has been marked as a duplicate of this bug. ***

Comment 13 Richard Biener 2023-01-19 13:28:47 UTC

(In reply to Andrew Pinski from comment #10)
> Updated patch submitted:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589254.html

I think you need to ping your patches more aggressively ...

Comment 14 Richard Earnshaw 2023-01-20 13:59:58 UTC

(In reply to Richard Biener from comment #13)
> (In reply to Andrew Pinski from comment #10)
> > Updated patch submitted:
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589254.html
> 
> I think you need to ping your patches more aggressively ...

Richard Sandiford reviewed it here:| https://gcc.gnu.org/pipermail/gcc-patches/2022-February/589581.html
So the problem is that the review wasn't followed up by the submitter.

Comment 15 felix 2023-01-23 07:29:12 UTC

(In reply to Richard Earnshaw from comment #14)
> (In reply to Richard Biener from comment #13)
> > (In reply to Andrew Pinski from comment #10)
> > > Updated patch submitted:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589254.html
> > 
> > I think you need to ping your patches more aggressively ...
> 
> Richard Sandiford reviewed it here:|
> https://gcc.gnu.org/pipermail/gcc-patches/2022-February/589581.html
> So the problem is that the review wasn't followed up by the submitter.

I did not know that I have any further obligation on this past submitting the bug, I never submitted a bug before. Anyway, the since the patch works (at least for my use case), do I have to mark this as resolved, fixed?

Comment 16 Sam James 2023-01-23 07:37:04 UTC

(In reply to felix from comment #15)

He means apinski who submitted a patch, not you.

Comment 17 Andrew Pinski 2023-01-31 17:49:10 UTC

Another testcase which is now affecting us at Marvell in our early firmware:
```
void f(const char*);

void g(void)
{
  char t[32] = "0l2345678";
  f(t);
}

```
So I am planning on getting back on this patch starting today.

Comment 18 Andrew Pinski 2023-01-31 17:51:23 UTC

(In reply to Andrew Pinski from comment #17)
> Another testcase which is now affecting us at Marvell in our early firmware:
> ```
> void f(const char*);
> 
> void g(void)
> {
>   char t[32] = "0l2345678";
>   f(t);
> }
> 
> ```
> So I am planning on getting back on this patch starting today.

I should say that testcase happens at `-Os -mstrict-align`, at `-O2 -mstrict-align` it works.

Comment 19 Richard Earnshaw 2023-01-31 18:24:42 UTC

(In reply to Andrew Pinski from comment #18)
> I should say that testcase happens at `-Os -mstrict-align`, at `-O2
> -mstrict-align` it works.

Because for -Os we don't forcibly align arrays - see 
AARCH64_EXPAND_ALIGNMENT and the macros that use it.

Comment 20 Andrew Pinski 2023-02-10 03:54:10 UTC

Submitted a version new patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611684.html

Comment 21 Andrew Pinski 2023-03-24 12:09:04 UTC

*** Bug 109273 has been marked as a duplicate of this bug. ***

Comment 22 Andrew Pinski 2023-04-05 19:25:37 UTC

I am no longer working on this.

Comment 23 Jakub Jelinek 2023-05-29 10:06:02 UTC

GCC 11.4 is being released, retargeting bugs to GCC 11.5.

Comment 24 Wilco 2023-09-20 13:53:35 UTC

Patch to avoid emitting unaligned LDP/STP with -mstrict-align: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631022.html

Comment 25 GCC Commits 2023-11-30 13:49:32 UTC

The master branch has been updated by Wilco Dijkstra <wilco@gcc.gnu.org>:

https://gcc.gnu.org/g:318f5232cfb3e0c9694889565e1f5424d0354463

commit r14-6012-g318f5232cfb3e0c9694889565e1f5424d0354463
Author: Wilco Dijkstra <wilco.dijkstra@arm.com>
Date:   Wed Oct 25 16:28:04 2023 +0100

    AArch64: Fix strict-align cpymem/setmem [PR103100]
    
    The cpymemdi/setmemdi implementation doesn't fully support strict alignment.
    Block the expansion if the alignment is less than 16 with STRICT_ALIGNMENT.
    Clean up the condition when to use MOPS.
    
    gcc/ChangeLog/
            PR target/103100
            * config/aarch64/aarch64.md (cpymemdi): Remove pattern condition.
            (setmemdi): Likewise.
            * config/aarch64/aarch64.cc (aarch64_expand_cpymem): Support
            strict-align.  Cleanup condition for using MOPS.
            (aarch64_expand_setmem): Likewise.

Comment 26 Hans-Peter Nilsson 2024-06-26 23:46:02 UTC

(In reply to GCC Commits from comment #25)
> The master branch has been updated by Wilco Dijkstra <wilco@gcc.gnu.org>:

Are you considering back-porting this to other branches (like gcc-13) or is there another reason this PR is still open?  The commit applies cleanly there, but is untested.   Asking for a friend. :)

Comment 27 GCC Commits 2024-06-27 17:23:42 UTC

The releases/gcc-13 branch has been updated by Wilco Dijkstra <wilco@gcc.gnu.org>:

https://gcc.gnu.org/g:5aa9ed0f353f835005c3df8932c7bc6e26f53904

commit r13-8874-g5aa9ed0f353f835005c3df8932c7bc6e26f53904
Author: Wilco Dijkstra <wilco.dijkstra@arm.com>
Date:   Wed Oct 25 16:28:04 2023 +0100

    AArch64: Fix strict-align cpymem/setmem [PR103100]
    
    The cpymemdi/setmemdi implementation doesn't fully support strict alignment.
    Block the expansion if the alignment is less than 16 with STRICT_ALIGNMENT.
    Clean up the condition when to use MOPS.
    
    gcc/ChangeLog/
            PR target/103100
            * config/aarch64/aarch64.md (cpymemdi): Remove pattern condition.
            (setmemdi): Likewise.
            * config/aarch64/aarch64.cc (aarch64_expand_cpymem): Support
            strict-align.  Cleanup condition for using MOPS.
            (aarch64_expand_setmem): Likewise.
    
    (cherry picked from commit 318f5232cfb3e0c9694889565e1f5424d0354463)

Comment 28 Hans-Peter Nilsson 2024-06-27 23:13:33 UTC

(In reply to GCC Commits from comment #27)
> The releases/gcc-13 branch has been updated by Wilco Dijkstra
> <wilco@gcc.gnu.org>:

Thanks!

Comment 29 GCC Commits 2024-07-05 12:40:02 UTC

The releases/gcc-12 branch has been updated by Wilco Dijkstra <wilco@gcc.gnu.org>:

https://gcc.gnu.org/g:b9d16d8361a9e3a82a2f21e759e760d235d43322

commit r12-10603-gb9d16d8361a9e3a82a2f21e759e760d235d43322
Author: Wilco Dijkstra <wilco.dijkstra@arm.com>
Date:   Wed Oct 25 16:28:04 2023 +0100

    AArch64: Fix strict-align cpymem/setmem [PR103100]
    
    The cpymemdi/setmemdi implementation doesn't fully support strict alignment.
    Block the expansion if the alignment is less than 16 with STRICT_ALIGNMENT.
    Clean up the condition when to use MOPS.
    
    gcc/ChangeLog/
            PR target/103100
            * config/aarch64/aarch64.md (cpymemdi): Remove pattern condition.
            (setmemdi): Likewise.
            * config/aarch64/aarch64.cc (aarch64_expand_cpymem): Support
            strict-align.  Cleanup condition for using MOPS.
            (aarch64_expand_setmem): Likewise.
    
    (cherry picked from commit 318f5232cfb3e0c9694889565e1f5424d0354463)

Comment 30 Wilco 2024-07-05 13:02:55 UTC

Fixed on GCC12 branch too. It doesn't apply to GCC11, so it's unlikely to be worth fixing since GCC11 branch will be closed soon.