Created attachment 51738 [details] source code that generates the faulty assembly when zero-intializing large local variables, gcc 11.2 (with -O2 and -O3) uses SIMD registers to store a pair of 16-byte registers at once into memory. When doing so, gcc can generate code that does not access memory on a 16-byte aligned boundary, even though the aarch64 architecture requires memory accesses to be 16-byte aligned when using the full 16-byte SIMD registers. This happens with -mstrict-align enabled. For example: static void (*use)(unsigned char*); // to suppress optimizations extern "C" void _start() { unsigned char t2[216]={}; use(t2); } when compiled with "gcc -save-temps -O2 -mstrict-align" generates the following assembly: _start: stp x29, x30, [sp, #-240]!// assuming sp is aligned to 16-bytes here mov x1, #0x0 movi v0.4s, #0x0 add x2, sp, #0x28 // the value in x2 is 8-byte aligned, but not 16-byte aligned mov x29, sp stp xzr, xzr, [sp, #24] add x0, sp, #0x18 stp q0, q0, [x2] // x2 is not 16-byte aligned, so the store is not aligned add x2, sp, #0x48 str xzr, [sp, #232] stp q0, q0, [x2] add x2, sp, #0x68 stp q0, q0, [x2] add x2, sp, #0x88 stp q0, q0, [x2] add x2, sp, #0xa8 stp q0, q0, [x2] add x2, sp, #0xc8 stp q0, q0, [x2] blr x1 ldp x29, x30, [sp], #240 ret I have seen https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71727 and even though that is marked as fixed, this issue persists in gcc 11.2
Dup of bug 101934, fixed for gcc 11.3.0. *** This bug has been marked as a duplicate of bug 101934 ***
Actually it is not a dup.
Confirmed, another simplier testcase: void use(unsigned char*); void g() { unsigned char t2[216]; __builtin_memset(t2, 0, sizeof(t2)); use(t2); }
Mine. This should fix it (untested): diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 699c105a42a..5d0872be4ef 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -23584,7 +23584,9 @@ aarch64_expand_setmem (rtx *operands) over writing. */ opt_scalar_int_mode mode_iter; FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT) - if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_limit)) + if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_limit) + && (!STRICT_ALIGNMENT + || MEM_ALIGN (dst) >= GET_MODE_ALIGNMENT (mode_iter.require ()))) cur_mode = mode_iter.require (); gcc_assert (cur_mode != BLKmode);
Also it was caused by r11-4973.
(In reply to Andrew Pinski from comment #4) > Mine. > This should fix it (untested): And yes it works.
Patch submitted: https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583554.html
Created attachment 51830 [details] New simplier patch which does the right thing too
New patch submitted: https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584927.html
Updated patch submitted: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589254.html
GCC 11.3 is being released, retargeting bugs to GCC 11.4.
*** Bug 105886 has been marked as a duplicate of this bug. ***
(In reply to Andrew Pinski from comment #10) > Updated patch submitted: > https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589254.html I think you need to ping your patches more aggressively ...
(In reply to Richard Biener from comment #13) > (In reply to Andrew Pinski from comment #10) > > Updated patch submitted: > > https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589254.html > > I think you need to ping your patches more aggressively ... Richard Sandiford reviewed it here:| https://gcc.gnu.org/pipermail/gcc-patches/2022-February/589581.html So the problem is that the review wasn't followed up by the submitter.
(In reply to Richard Earnshaw from comment #14) > (In reply to Richard Biener from comment #13) > > (In reply to Andrew Pinski from comment #10) > > > Updated patch submitted: > > > https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589254.html > > > > I think you need to ping your patches more aggressively ... > > Richard Sandiford reviewed it here:| > https://gcc.gnu.org/pipermail/gcc-patches/2022-February/589581.html > So the problem is that the review wasn't followed up by the submitter. I did not know that I have any further obligation on this past submitting the bug, I never submitted a bug before. Anyway, the since the patch works (at least for my use case), do I have to mark this as resolved, fixed?
(In reply to felix from comment #15) He means apinski who submitted a patch, not you.
Another testcase which is now affecting us at Marvell in our early firmware: ``` void f(const char*); void g(void) { char t[32] = "0l2345678"; f(t); } ``` So I am planning on getting back on this patch starting today.
(In reply to Andrew Pinski from comment #17) > Another testcase which is now affecting us at Marvell in our early firmware: > ``` > void f(const char*); > > void g(void) > { > char t[32] = "0l2345678"; > f(t); > } > > ``` > So I am planning on getting back on this patch starting today. I should say that testcase happens at `-Os -mstrict-align`, at `-O2 -mstrict-align` it works.
(In reply to Andrew Pinski from comment #18) > I should say that testcase happens at `-Os -mstrict-align`, at `-O2 > -mstrict-align` it works. Because for -Os we don't forcibly align arrays - see AARCH64_EXPAND_ALIGNMENT and the macros that use it.
Submitted a version new patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611684.html
*** Bug 109273 has been marked as a duplicate of this bug. ***
I am no longer working on this.
GCC 11.4 is being released, retargeting bugs to GCC 11.5.
Patch to avoid emitting unaligned LDP/STP with -mstrict-align: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631022.html
The master branch has been updated by Wilco Dijkstra <wilco@gcc.gnu.org>: https://gcc.gnu.org/g:318f5232cfb3e0c9694889565e1f5424d0354463 commit r14-6012-g318f5232cfb3e0c9694889565e1f5424d0354463 Author: Wilco Dijkstra <wilco.dijkstra@arm.com> Date: Wed Oct 25 16:28:04 2023 +0100 AArch64: Fix strict-align cpymem/setmem [PR103100] The cpymemdi/setmemdi implementation doesn't fully support strict alignment. Block the expansion if the alignment is less than 16 with STRICT_ALIGNMENT. Clean up the condition when to use MOPS. gcc/ChangeLog/ PR target/103100 * config/aarch64/aarch64.md (cpymemdi): Remove pattern condition. (setmemdi): Likewise. * config/aarch64/aarch64.cc (aarch64_expand_cpymem): Support strict-align. Cleanup condition for using MOPS. (aarch64_expand_setmem): Likewise.
(In reply to GCC Commits from comment #25) > The master branch has been updated by Wilco Dijkstra <wilco@gcc.gnu.org>: Are you considering back-porting this to other branches (like gcc-13) or is there another reason this PR is still open? The commit applies cleanly there, but is untested. Asking for a friend. :)
The releases/gcc-13 branch has been updated by Wilco Dijkstra <wilco@gcc.gnu.org>: https://gcc.gnu.org/g:5aa9ed0f353f835005c3df8932c7bc6e26f53904 commit r13-8874-g5aa9ed0f353f835005c3df8932c7bc6e26f53904 Author: Wilco Dijkstra <wilco.dijkstra@arm.com> Date: Wed Oct 25 16:28:04 2023 +0100 AArch64: Fix strict-align cpymem/setmem [PR103100] The cpymemdi/setmemdi implementation doesn't fully support strict alignment. Block the expansion if the alignment is less than 16 with STRICT_ALIGNMENT. Clean up the condition when to use MOPS. gcc/ChangeLog/ PR target/103100 * config/aarch64/aarch64.md (cpymemdi): Remove pattern condition. (setmemdi): Likewise. * config/aarch64/aarch64.cc (aarch64_expand_cpymem): Support strict-align. Cleanup condition for using MOPS. (aarch64_expand_setmem): Likewise. (cherry picked from commit 318f5232cfb3e0c9694889565e1f5424d0354463)
(In reply to GCC Commits from comment #27) > The releases/gcc-13 branch has been updated by Wilco Dijkstra > <wilco@gcc.gnu.org>: Thanks!
The releases/gcc-12 branch has been updated by Wilco Dijkstra <wilco@gcc.gnu.org>: https://gcc.gnu.org/g:b9d16d8361a9e3a82a2f21e759e760d235d43322 commit r12-10603-gb9d16d8361a9e3a82a2f21e759e760d235d43322 Author: Wilco Dijkstra <wilco.dijkstra@arm.com> Date: Wed Oct 25 16:28:04 2023 +0100 AArch64: Fix strict-align cpymem/setmem [PR103100] The cpymemdi/setmemdi implementation doesn't fully support strict alignment. Block the expansion if the alignment is less than 16 with STRICT_ALIGNMENT. Clean up the condition when to use MOPS. gcc/ChangeLog/ PR target/103100 * config/aarch64/aarch64.md (cpymemdi): Remove pattern condition. (setmemdi): Likewise. * config/aarch64/aarch64.cc (aarch64_expand_cpymem): Support strict-align. Cleanup condition for using MOPS. (aarch64_expand_setmem): Likewise. (cherry picked from commit 318f5232cfb3e0c9694889565e1f5424d0354463)
Fixed on GCC12 branch too. It doesn't apply to GCC11, so it's unlikely to be worth fixing since GCC11 branch will be closed soon.