28637 – Rust characters will be encoded using DW_ATE_UTF

Status:	RESOLVED FIXED

Alias:	None

Product:	gdb
Classification:	Unclassified
Component:	rust
Version:	11.1

Importance:	P2 normal
Target Milestone:	11.2
Assignee:	Tom Tromey

Reported:	2021-11-29 18:09 UTC by Tom Tromey
Modified:	2021-11-29 20:46 UTC
CC List:	0 users

Description Tom Tromey 2021-11-29 18:09:29 UTC

The Rust compiler is going to start emitting
the char type using DW_ATE_UTF.
See https://github.com/rust-lang/rust/pull/89887
This PR tracks the change for the 11.x branch so that
we can backport the patch.

Comment 1 Sourceware Commits 2021-11-29 20:25:57 UTC

The master branch has been updated by Tom Tromey <tromey@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=1c0e43634cfdd0ad7ef9ac3dd7d208dddeb80f5e

commit 1c0e43634cfdd0ad7ef9ac3dd7d208dddeb80f5e
Author: Tom Tromey <tom@tromey.com>
Date:   Sun Oct 31 10:34:50 2021 -0600

    Allow DW_ATE_UTF for Rust characters
    
    The Rust compiler plans to change the encoding of a Rust 'char' type
    to use DW_ATE_UTF.  You can see the discussion here:
    
        https://github.com/rust-lang/rust/pull/89887
    
    However, this fails in gdb.  I looked into this, and it turns out that
    the handling of DW_ATE_UTF is currently fairly specific to C++.  In
    particular, the code here assumes the C++ type names, and it creates
    an integer type.
    
    This comes from commit 53e710acd ("GDB thinks char16_t and char32_t
    are signed in C++").  The message says:
    
        Both places need fixing.  But since I couldn't tell why dwarf2read.c
        needs to create a new type, I've made it use the per-arch built-in
        types instead, so that the types are only created once per arch
        instead of once per objfile.  That seems to work fine.
    
    ... which is fine, but it seems to me that it's also correct to make a
    new character type; and this approach is better because it preserves
    the type name as well.  This does use more memory, but first we
    shouldn't be too concerned about the memory use of types coming from
    debuginfo; and second, if we are, we should implement type interning
    anyway.
    
    Changing this code to use a character type revealed a couple of
    oddities in the C/C++ handling of TYPE_CODE_CHAR.  This patch fixes
    these as well.
    
    I filed PR rust/28637 for this issue, so that this patch can be
    backported to the gdb 11 branch.
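
To make the difference between the two approaches in the commit message concrete, here is a minimal sketch. It is a simplified model, not GDB's code: SimpleType, TypeCode, utf_as_builtin_integer and utf_as_new_char are hypothetical names invented for illustration. Only the DW_ATE_* constant values come from the DWARF standard. The first function models the C++-specific behaviour (pick a built-in integer type by size, using the C++ names); the second models making a new character type that keeps the name from the debug info.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// DWARF base-type encodings (values from the DWARF standard).
constexpr std::uint8_t DW_ATE_unsigned = 0x07;
constexpr std::uint8_t DW_ATE_UTF = 0x10;

// Hypothetical, simplified stand-in for a debugger's type codes.
enum class TypeCode { Int, Char };

// Hypothetical, simplified type record (not GDB's struct type).
struct SimpleType {
  TypeCode code;
  std::string name;
  unsigned bit_size;
  bool is_unsigned;
};

// Models the C++-specific handling: the name from the debug info is
// ignored and an unsigned integer type with a C++ name is chosen by size.
SimpleType utf_as_builtin_integer(unsigned bit_size) {
  std::string name = (bit_size == 16) ? "char16_t" : "char32_t";
  return {TypeCode::Int, name, bit_size, true};
}

// Models the language-neutral handling: a new character type is created
// and the name from the debug info (DW_AT_name) is preserved.
SimpleType utf_as_new_char(const std::string& dwarf_name, unsigned bit_size) {
  return {TypeCode::Char, dwarf_name, bit_size, true};
}
```

Under this model, a 32-bit DW_ATE_UTF base type named "char" (as Rust would describe it) comes out of the first function as an integer type called "char32_t", and out of the second as a character type still called "char".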

Comment 2 Sourceware Commits 2021-11-29 20:45:00 UTC

The gdb-11-branch branch has been updated by Tom Tromey <tromey@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=29b161c9be240da341910f0206ffdd881daacd96

commit 29b161c9be240da341910f0206ffdd881daacd96
Author: Tom Tromey <tom@tromey.com>
Date:   Sun Oct 31 10:34:50 2021 -0600

    Allow DW_ATE_UTF for Rust characters
    
    (cherry picked from commit 1c0e43634cfdd0ad7ef9ac3dd7d208dddeb80f5e)

Comment 3 Tom Tromey 2021-11-29 20:46:18 UTC

Fixed now.