Union type: Difference between revisions

From Wikipedia, the free encyclopedia
→‎Pascal: Wording changes
→‎Anonymous union: added return for int main
 
(40 intermediate revisions by 29 users not shown)
Line 1: Line 1:
{{Short description|Variable able to hold different data types}}
{{distinguish|Union (set theory)|Union (SQL)}}
{{distinguish|Union (set theory)|Union (SQL)}}
{{refimprove|date=August 2009}}
{{more citations needed|date=August 2009}}
In [[computer science]], a '''union''' is a [[value (computer science)|value]] that may have any of several representations or formats within the same position in [[computer memory|memory]]; that consists of a [[variable (computer science)|variable]] that may hold such a [[data structure]]. Some [[programming languages]] support special [[data type]]s, called '''union types''', to describe such values and variables. In other words, a union type definition will specify which of a number of permitted primitive types may be stored in its instances, e.g., "float or long integer". In contrast with a [[record (computer science)|record]] (or structure), which could be defined to contain a float ''and'' an integer; in a union, there is only one value at any given time.
In [[computer science]], a '''union''' is a [[value (computer science)|value]] that may have any of several representations or formats within the same position in [[computer memory|memory]]; that consists of a [[variable (computer science)|variable]] that may hold such a [[data structure]]. Some [[programming languages]] support special [[data type]]s, called '''union types''', to describe such values and variables. In other words, a union type definition will specify which of a number of permitted primitive types may be stored in its instances, e.g., "float or long integer". In contrast with a [[record (computer science)|record]] (or structure), which could be defined to contain both a float ''and'' an integer; in a union, there is only one value at any given time.


A union can be pictured as a chunk of memory that is used to store variables of different data types. Once a new value is assigned to a field, the existing data is overwritten with the new data. The memory area storing the value has no intrinsic type (other than just [[byte]]s or [[word (computer architecture)|words]] of memory), but the value can be treated as one of several [[abstract data type]]s, having the type of the value that was last written to the memory area.
A union can be pictured as a chunk of memory that is used to store variables of different data types. Once a new value is assigned to a field, the existing data is overwritten with the new data. The memory area storing the value has no intrinsic type (other than just [[byte]]s or [[word (computer architecture)|words]] of memory), but the value can be treated as one of several [[abstract data type]]s, having the type of the value that was last written to the memory area.
Line 19: Line 20:
===ALGOL 68===
===ALGOL 68===


[[ALGOL 68]] has tagged unions, and uses a case clause to distinguish and extract the constituent type at runtime. A union containing another union is treated as the set of all its constituent possibilities.
[[ALGOL 68]] has tagged unions, and uses a case clause to distinguish and extract the constituent type at runtime. A union containing another union is treated as the set of all its constituent possibilities, and if the context requires it a union is automatically coerced into the wider union. A union can explicitly contain no value, which can be distinguished at runtime. An example is:


'''mode''' '''node''' = '''union''' ('''real''', '''int''', '''string''', '''void''');
The syntax of the C/C++ union type and the notion of casts was derived from ALGOL 68, though in an untagged form.<ref name="sigplan">{{cite journal | first = Dennis M.| last = Ritchie | author-link = Dennis Ritchie | title = The Development of the C Language | date = March 1993 | journal = ACM SIGPLAN Notices | volume = 28 | issue = 3 | pages = 201–208 | url = http://www.bell-labs.com/usr/dmr/www/chist.html | doi = 10.1145/155360.155580 | quote = The scheme of type composition adopted by C owes considerable debt to Algol 68, although it did not, perhaps, emerge in a form that Algol's adherents would approve of. The central notion I captured from Algol was a type structure based on atomic types (including structures), composed into arrays, pointers (references), and functions (procedures). Algol 68's concept of unions and casts also had an influence that appeared later.}}</ref>
'''node''' n := "abc";
'''case''' n '''in'''
('''real''' r): print(("real:", r)),
('''int''' i): print(("int:", i)),
('''string''' s): print(("string:", s)),
('''void'''): print(("void:", "EMPTY")),
'''out''' print(("?:", n))
'''esac'''

The syntax of the C/C++ union type and the notion of casts was derived from ALGOL 68, though in an untagged form.<ref name="sigplan">{{cite journal | first = Dennis M.| last = Ritchie | author-link = Dennis Ritchie | title = The Development of the C Language | date = March 1993 | journal = ACM SIGPLAN Notices | volume = 28 | issue = 3 | pages = 201–208 | url = http://www.bell-labs.com/usr/dmr/www/chist.html | doi = 10.1145/155360.155580 | quote = The scheme of type composition adopted by C owes considerable debt to Algol 68, although it did not, perhaps, emerge in a form that Algol's adherents would approve of. The central notion I captured from Algol was a type structure based on atomic types (including structures), composed into arrays, pointers (references), and functions (procedures). Algol 68's concept of unions and casts also had an influence that appeared later.| doi-access = free }}</ref>


===C/C++===
===C/C++===
In [[C (programming language)|C]] and [[C++]], untagged unions are expressed nearly exactly like structures ([[Struct (C programming language)|struct]]s), except that each data member begins at the same location in memory. The data members, as in structures, need not be primitive values, and in fact may be structures or even other unions. C++ (since [[C++11]]) also allows for a data member to be any type that has a full-fledged constructor/destructor and/or copy constructor, or a non-trivial copy assignment operator. For example, it is possible to have the standard C++ [[String (C++)|string]] as a member of a union.
In [[C (programming language)|C]] and [[C++]], untagged unions are expressed nearly exactly like structures ([[Struct (C programming language)|struct]]s), except that each data member begins at the same location in memory. The data members, as in structures, need not be primitive values, and in fact may be structures or even other unions. C++ (since [[C++11]]) also allows for a data member to be any type that has a full-fledged constructor/destructor and/or copy constructor, or a non-trivial copy assignment operator. For example, it is possible to have the standard C++ [[String (C++)|string]] as a member of a union.

Like a structure, all of the members of a union are by default public. The keywords <code>private</code>, <code>public</code>, and <code>protected</code> may be used inside a structure or a union in exactly the same way they are used inside a class for defining private, public, and protected member access.


The primary use of a union is allowing access to a common location by different data types, for example hardware input/output access, bitfield and word sharing, or [[type punning]]. Unions can also provide low-level [[polymorphism (computer science)|polymorphism]]. However, there is no checking of types, so it is up to the programmer to be sure that the proper fields are accessed in different contexts. The relevant field of a union variable is typically determined by the state of other variables, possibly in an enclosing struct.
The primary use of a union is allowing access to a common location by different data types, for example hardware input/output access, bitfield and word sharing, or [[type punning]]. Unions can also provide low-level [[polymorphism (computer science)|polymorphism]]. However, there is no checking of types, so it is up to the programmer to be sure that the proper fields are accessed in different contexts. The relevant field of a union variable is typically determined by the state of other variables, possibly in an enclosing struct.


One common C programming idiom uses unions to perform what C++ calls a '''reinterpret_cast''', by assigning to one field of a union and reading from another, as is done in code which depends on the raw representation of the values. A practical example is the [[Methods of computing square roots#Approximations that depend on the floating point representation|method of computing square roots using the IEEE representation]]. This is not, however, a safe use of unions in general.
One common C programming idiom uses unions to perform what C++ calls a <code>[[reinterpret_cast]]</code>, by assigning to one field of a union and reading from another, as is done in code which depends on the raw representation of the values. A practical example is the [[Methods of computing square roots#Approximations that depend on the floating point representation|method of computing square roots using the IEEE representation]]. This is not, however, a safe use of unions in general.


{{Quote|Structure and union specifiers have the same form. [ . . . ] The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union [[Object (computer science)|object]] at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.|ANSI/ISO 9899:1990 (the ANSI C standard) Section 6.5.2.1}}
{{Blockquote|Structure and union specifiers have the same form. [ . . . ] The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union [[Object (computer science)|object]] at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.|ANSI/ISO 9899:1990 (the ANSI C standard) Section 6.5.2.1}}


====Anonymous union====
====Anonymous union====
Line 44: Line 55:


int main() {
int main() {
using namespace std;

union {
union {
float f;
float f;
Line 52: Line 61:


f = 3.14f;
f = 3.14f;
cout << "Binary representation of 3.14 = " << hex << d << endl;
std::cout << "Hexadecimal representation of 3.14f:"
<< std::hex << d << '\n';

return 0;
return 0;
}
}
</syntaxhighlight>
</syntaxhighlight>

Anonymous unions are also useful in C <code>struct</code> definitions to provide a sense of namespacing.<ref>{{cite web |last1=Siebenmann. |first1=Chris |title=CUnionsForNamespaces |url=https://utcc.utoronto.ca/~cks/space/blog/programming/CUnionsForNamespaces |website=utcc.utoronto.ca}}</ref>


==== Transparent union ====
==== Transparent union ====
In Unix-like compilers such as GCC, Clang, and IBM XL C for AIX, a {{code|transparent_union}} attribute is available for union types. Types contained in the union can be converted transparently to the union type itself in a function call, provided that all types have the same size. It is mainly intended for functions with multiple parameter interfaces, a use necessitated by early Unix extensions and later re-standardisation.<ref>{{cite web |title=Common Type Attributes: transparent_union |url=https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.html#index-transparent_005funion-type-attribute |website=Using the GNU Compiler Collection (GCC)}}</ref>
In compilers such as GCC, Clang, and IBM XL C for AIX, a {{code|transparent_union}} attribute is available for union types. Types contained in the union can be converted transparently to the union type itself in a function call, provided that all types have the same size. It is mainly intended for functions with multiple parameter interfaces, a use necessitated by early Unix extensions and later re-standardisation.<ref>{{cite web |title=Common Type Attributes: transparent_union |url=https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.html#index-transparent_005funion-type-attribute |website=Using the GNU Compiler Collection (GCC)}}</ref>


===COBOL===
===COBOL===
In [[COBOL]], union data items are defined in two ways. The first uses the <tt>RENAMES</tt> (66 level) keyword, which effectively maps a second alphanumeric data item on top of the same memory location as a preceding data item. In the example code below, data item <tt>PERSON-REC</tt> is defined as a group containing another group and a numeric data item. <tt>PERSON-DATA</tt> is defined as an alphanumeric data item that renames <tt>PERSON-REC</tt>, treating the data bytes contained within it as character data.
In [[COBOL]], union data items are defined in two ways. The first uses the {{mono|RENAMES}} (66 level) keyword, which effectively maps a second alphanumeric data item on top of the same memory location as a preceding data item. In the example code below, data item {{mono|PERSON-REC}} is defined as a group containing another group and a numeric data item. {{mono|PERSON-DATA}} is defined as an alphanumeric data item that renames {{mono|PERSON-REC}}, treating the data bytes contained within it as character data.


<syntaxhighlight lang="cobol">
<syntaxhighlight lang="cobol">
Line 74: Line 85:
01 PERSON-DATA RENAMES PERSON-REC.
01 PERSON-DATA RENAMES PERSON-REC.
</syntaxhighlight>
</syntaxhighlight>
The second way to define a union type is by using the <tt>REDEFINES</tt> keyword. In the example code below, data item <tt>VERS-NUM</tt> is defined as a 2-byte binary integer containing a version number. A second data item <tt>VERS-BYTES</tt> is defined as a two-character alphanumeric variable. Since the second item is ''redefined'' over the first item, the two items share the same address in memory, and therefore share the same underlying data bytes. The first item interprets the two data bytes as a binary value, while the second item interprets the bytes as character values.
The second way to define a union type is by using the {{mono|REDEFINES}} keyword. In the example code below, data item {{mono|VERS-NUM}} is defined as a 2-byte binary integer containing a version number. A second data item {{mono|VERS-BYTES}} is defined as a two-character alphanumeric variable. Since the second item is ''redefined'' over the first item, the two items share the same address in memory, and therefore share the same underlying data bytes. The first item interprets the two data bytes as a binary value, while the second item interprets the bytes as character values.


<syntaxhighlight lang="cobol">
<syntaxhighlight lang="cobol">
Line 86: Line 97:
In [[Pascal (programming language)|Pascal]], there are two ways to create unions. One is the standard way through a variant record. The second is a nonstandard means of declaring a variable as absolute, meaning it is placed at the same memory location as another variable or at an absolute address. While all Pascal compilers support variant records, only some support absolute variables.
In [[Pascal (programming language)|Pascal]], there are two ways to create unions. One is the standard way through a variant record. The second is a nonstandard means of declaring a variable as absolute, meaning it is placed at the same memory location as another variable or at an absolute address. While all Pascal compilers support variant records, only some support absolute variables.


For the purposes of this example, the following are all integer types: a '''byte''' is 8-bits, a '''word''' is 16-bits, and an '''integer''' is 32-bits.
For the purposes of this example, the following are all integer types: a '''byte''' consists of 8 bits, a '''word''' is 16 bits, and an '''integer''' is 32 bits.


The following example shows the non-standard absolute form:
The following example shows the non-standard absolute form:
<syntaxhighlight lang="pascal">
<syntaxhighlight lang="pascal">
var
VAR
A: Integer;
A: Integer;
B: Array[1..4] of Byte absolute A;
B: array[1..4] of Byte absolute A;
C: Integer absolute 0;
C: Integer absolute 0;
</syntaxhighlight>
</syntaxhighlight>
Line 99: Line 110:
In the following example, a record has variants, some of which share the same location as others:
In the following example, a record has variants, some of which share the same location as others:
<syntaxhighlight lang="pascal">
<syntaxhighlight lang="pascal">
type
TYPE
Shape = (Circle, Square, Triangle);
TSystemTime = record
Dimensions = record
Year, Month, DayOfWeek, Day : word;
Hour, Minute, Second, MilliSecond: word;
case Figure: Shape of
Circle: (Diameter: real);
end ;
Square: (Width: real);
TGender = (Male, Female, TransFemale, TransMale, Other);
Triangle: (Side: real; Angle1, Angle2: 0..360)
TPerson = RECORD
FirstName,Lastname: String;
end;
Birthdate: TSystemTime;
Dependents: Integer;
HourlyRate: Currency;
Case Gender:TGender of
Female,
TransMale: (isPregnant: Boolean;
DateDue:TSystemTime);
Male, TransFemale:
(HasPartner,
isPartnerExpecting:Boolean;
PartnerDate: TSystemTime);
END;
</syntaxhighlight>
</syntaxhighlight>
In the above example, a Tperson record has the tag field {{mono|Gender}}, and the tag divides people among two classes: female or trans male (a person with a gender identity of male, but was born with a female body), and male or transfemale (a person with a gender identity of female, but born in a male body). In this record, {{mono|hasPartner}} and {{mono|isPregnant}} occupy the same location, while {{mono|DateDue}} and {{mono|isPartnerExpecting}} share the same location. While the record has a tag field {{mono|Gender}}, the compiler does not enforce access according to the tag's value: one may access any of the variant fields notwithstanding the value of the tag, e.g., if the gender {{mono|other}} is the value of the tag field {{mono|Gender}}, any of the variant fields may still be accessed.


===PL/I===
===PL/I===
In [[PL/I]] then original term for a union was ''cell'',<ref>{{cite book|last1=IBM Corporation|title=IBM System/360 PL/I Language Specifications|date=March 1968|page=52|url=http://bitsavers.org/pdf/ibm/360/pli/Y33-6003-0_PL1LangSpecMar68.pdf|access-date=Jan 22, 2018}}</ref> which is still accepted as a synonym for union by several compilers. The union declaration is similar to the structure definition, where elements at the same level within the union declaration occupy the same storage. Elements of the union can be any data type, including structures and array.<ref name=IBMPLI>{{cite book|last1=IBM Corporation|title=Enterprise PL/I for z/OS PL/I for AIX IBM Developer for z Systems PL/I for windows Language Reference|date=Dec 2017|url=http://publibz.boulder.ibm.com/epubs/pdf/c2789401.pdf|access-date=Jan 22, 2018}}</ref>{{rp|pp192–193}} Here
In [[PL/I]] the original term for a union was ''cell'',<ref>{{cite book|last1=IBM Corporation|title=IBM System/360 PL/I Language Specifications|date=March 1968|page=52|url=http://bitsavers.org/pdf/ibm/360/pli/Y33-6003-0_PL1LangSpecMar68.pdf|access-date=Jan 22, 2018}}</ref> which is still accepted as a synonym for union by several compilers. The union declaration is similar to the structure definition, where elements at the same level within the union declaration occupy the same storage. Elements of the union can be any data type, including structures and arrays.<ref name=IBMPLI>{{cite book|last1=IBM Corporation|title=Enterprise PL/I for z/OS PL/I for AIX IBM Developer for z Systems PL/I for windows Language Reference|date=Dec 2017|url=http://publibz.boulder.ibm.com/epubs/pdf/c2789401.pdf|access-date=Jan 22, 2018}}</ref>{{rp|pp192–193}} Here
vers_num and vers_bytes occupy the same storage locations.
vers_num and vers_bytes occupy the same storage locations.


Line 133: Line 131:


An alternative to a union declaration is the DEFINED attribute, which allows alternative declarations of storage, however the data types of the base and defined variables must match.<ref name=IBMPLI />{{rp|pp.289–293}}
An alternative to a union declaration is the DEFINED attribute, which allows alternative declarations of storage, however the data types of the base and defined variables must match.<ref name=IBMPLI />{{rp|pp.289–293}}

===Rust===

[[Rust (programming language)|Rust]] implements both tagged and untagged unions. In Rust, tagged unions are implemented using the {{code|enum|rust}} keyword. Unlike [[enumerated type]]s in most other languages, enum variants in Rust can contain additional data in the form of a tuple or struct, making them tagged unions rather than simple enumerated types.<ref>{{Cite web |title=How Rust Implements Tagged Unions - Pat Shaughnessy |url=https://patshaughnessy.net/2018/3/15/how-rust-implements-tagged-unions |access-date=2023-04-25 |website=patshaughnessy.net}}</ref>

Rust also supports untagged unions using the {{code|union|rust}} keyword. The memory layout of unions in Rust is undefined by default,<ref>{{Cite web |title=Union types - The Rust Reference |url=https://doc.rust-lang.org/reference/types/union.html |access-date=2023-04-25 |website=doc.rust-lang.org}}</ref> but a union with the {{code|#[repr(C)]|rust}} attribute will be laid out in memory exactly like the equivalent union in C.<ref>{{Cite web |title=Type layout - The Rust Reference |url=https://doc.rust-lang.org/reference/type-layout.html#reprc-unions |access-date=2023-04-25 |website=doc.rust-lang.org}}</ref> Reading the fields of a union can only be done within an {{code|unsafe|rust}} function or block, as the compiler cannot guarantee that the data in the union will be valid for the type of the field; if this is not the case, it will result in [[undefined behavior]].<ref>{{Cite web |title=Unions - The Rust Reference |url=https://doc.rust-lang.org/reference/items/unions.html |access-date=2023-04-25 |website=doc.rust-lang.org}}</ref>


==Syntax and example==
==Syntax and example==
Line 181: Line 185:


===PHP===
===PHP===
Union types were introduced in PHP 8.0.<ref>{{cite web |last1=Karunaratne |first1=Ayesh |title=PHP 8.0: Union Types |url=https://php.watch/versions/8.0/union-types |website=PHP.Watch |access-date=30 November 2020 |language=en}}</ref>
Union types were introduced in PHP 8.0.<ref>{{cite web |last1=Karunaratne |first1=Ayesh |title=PHP 8.0: Union Types |url=https://php.watch/versions/8.0/union-types |website=PHP.Watch |access-date=30 November 2020 |language=en}}</ref> The values are implicitly "tagged" with a type by the language, and may be retrieved by "gettype()".
<syntaxhighlight lang="php">
<syntaxhighlight lang="php">
class Example
class Example
Line 189: Line 193:
public function squareAndAdd(float|int $bar): int|float
public function squareAndAdd(float|int $bar): int|float
{
{
return $bar ** 2 + $foo;
return $bar ** 2 + $this->foo;
}
}
}
}
</syntaxhighlight>

===Python===
Support for typing was introduced in Python 3.5.<ref>{{cite web |title=typing — Support for type hints — Python 3.9.7 documentation |url=https://docs.python.org/3/library/typing.html#typing.Union |website=docs.python.org |access-date=8 September 2021}}</ref> The new syntax for union types was introduced in Python 3.10.<ref>{{cite web |title=PEP 604 -- Allow writing union types as X {{!}} Y |url=https://www.python.org/dev/peps/pep-0604/ |website=Python.org |access-date=8 September 2021 |language=en}}</ref>
<syntaxhighlight lang="python">
class Example:
foo = 0

def square_and_add(self, bar: int | float) -> int | float:
return bar ** 2 + self.foo
</syntaxhighlight>
</syntaxhighlight>


===TypeScript===
===TypeScript===
Union types are supported in TypeScript.<ref>{{cite web |title=Handbook - Unions and Intersection Types |url=https://www.typescriptlang.org/docs/handbook/unions-and-intersections.html |website=www.typescriptlang.org |access-date=30 November 2020 |language=en}}</ref>
Union types are supported in TypeScript.<ref>{{cite web |title=Handbook - Unions and Intersection Types |url=https://www.typescriptlang.org/docs/handbook/unions-and-intersections.html |website=www.typescriptlang.org |access-date=30 November 2020 |language=en}}</ref> The values are implicitly "tagged" with a type by the language, and may be retrieved using a <code>typeof</code> call for primitive values and an <code>instanceof</code> comparison for complex data types. Types with overlapping usage (e.g. a slice method exists on both strings and arrays, the plus operator works on both strings and numbers) don't need additional narrowing to use these features.
<syntaxhighlight lang="typescript">
<syntaxhighlight lang="typescript">
function successor(n: number | bigint): number | bigint {
function successor(n: number | bigint): number | bigint {
// types that support the same operations don't need narrowing
return ++n
return ++n;
}

function dependsOnParameter(v: string | Array<string> | number) {
// distinct types need narrowing
if (v instanceof Array) {
// do something
} else if (typeof(v) === "string") {
// do something else
} else {
// has to be a number
}
}
}
</syntaxhighlight>
</syntaxhighlight>


===Rust===
== Difference between union and structure ==
Tagged unions in Rust use the {{code|enum|rust}} keyword, and can contain tuple and struct variants:
A union is a class all of whose data members are mapped to the same address within its object. The size of an object of a union is, therefore, at least the size of its largest data member.


<syntaxhighlight lang="rust">
In a structure, all of its data members are stored in contiguous memory locations. The size of an object of a struct is, therefore, at least the sum of the sizes of all its data members (the compiler may add padding for alignment).
enum Foo {
Bar(i32),
Baz { x: String, y: i32 },
}
</syntaxhighlight>


Untagged unions in Rust use the {{code|union|rust}} keyword:
This gain in space efficiency, while valuable in certain circumstances, comes at a great cost of safety: the program logic must ensure that it only reads the field most recently written along all possible execution paths. The exception is when unions are used for [[type conversion]]: in this case, a certain field is written and the subsequently read field is deliberately different.


<syntaxhighlight lang="rust">
As an example illustrating this point, the declaration
union Foo {
<syntaxhighlight lang="c">
bar: i32,
struct foo { int a; float b; };
baz: bool,
}
</syntaxhighlight>
</syntaxhighlight>
defines a data object with two members occupying consecutive memory locations:
┌─────┬─────┐
foo │ a │ b │
└─────┴─────┘
↑ ↑
Memory address: 0150 0154


Reading from the fields of an untagged union results in [[undefined behavior]] if the data in the union is not valid as the type of the field, and thus requires an {{code|unsafe|rust}} block:
In contrast, the declaration
<syntaxhighlight lang="c">
union bar { int a; float b; };
</syntaxhighlight>
defines a data object with two members occupying the same memory location:
┌─────┐
bar │ a │
│ b │
└─────┘
Memory address: 0150


<syntaxhighlight lang="rust">
Structures are used where an "object" is composed of other objects, like a point object consisting of two integers, those being the x and y coordinates:
let x = Foo { bar: 10 };
<syntaxhighlight lang="c">
let y = unsafe { x.bar }; // This will set y to 10, and does not result in undefined behavior.
typedef struct {
int x; // x and y are separate
let z = unsafe { x.baz }; // This results in undefined behavior, as the value stored in x is not a valid bool.
int y;
} tPoint;
</syntaxhighlight>
Unions are typically used in situations where an object can be one of many things but only one at a time, such as a type-less storage system:
<syntaxhighlight lang="c">
typedef enum { STR, INT } tType;
typedef struct {
tType typ; // typ is separate.
union {
int ival; // ival and sval occupy same memory.
char *sval;
};
} tVal;
</syntaxhighlight>
</syntaxhighlight>


==See also==
==See also==
* [[Tagged union]]
* [[Tagged union]]
* [[Set operations (SQL)#UNION operator|UNION operator]]
* [[Variant type]]


==References==
==References==
Line 263: Line 265:
==External links==
==External links==
* [http://boost.org/doc/html/variant.html boost::variant], a type-safe alternative to C++ unions
* [http://boost.org/doc/html/variant.html boost::variant], a type-safe alternative to C++ unions
* [https://docs.microsoft.com/en-us/dotnet/framework/interop/marshaling-classes-structures-and-unions MSDN: Classes,Structures & Unions], for examples and syntax
* [https://docs.microsoft.com/en-us/dotnet/framework/interop/marshaling-classes-structures-and-unions MSDN: Classes, Structures & Unions], for examples and syntax
* [https://stackoverflow.com/a/346541 differences], differences between union & structure
* [https://stackoverflow.com/a/346541 differences], differences between union & structure
* [http://bobobobo.wordpress.com/2008/01/25/c-difference-between-struct-and-union/ Difference between struct and union in C++]
* [http://bobobobo.wordpress.com/2008/01/25/c-difference-between-struct-and-union/ Difference between struct and union in C++]
Line 273: Line 275:
[[Category:Composite data types]]
[[Category:Composite data types]]
[[Category:C (programming language)]]
[[Category:C (programming language)]]
[[Category:Articles with example C code]]


[[de:Verbund (Datentyp)#Unions]]
[[de:Verbund (Datentyp)#Unions]]

Latest revision as of 11:17, 29 April 2024

In computer science, a union is a value that may have any of several representations or formats within the same position in memory; the term also refers to a variable that may hold such a data structure. Some programming languages support special data types, called union types, to describe such values and variables. In other words, a union type definition specifies which of a number of permitted primitive types may be stored in its instances, e.g., "float or long integer". In contrast with a record (or structure), which could be defined to contain both a float and an integer, a union holds only one value at any given time.

A union can be pictured as a chunk of memory that is used to store variables of different data types. Once a new value is assigned to a field, the existing data is overwritten with the new data. The memory area storing the value has no intrinsic type (other than just bytes or words of memory), but the value can be treated as one of several abstract data types, having the type of the value that was last written to the memory area.
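The "chunk of memory" picture can be sketched in C. This is only an illustrative sketch: the type `union number` and the two helper functions are hypothetical names, not part of any library.

```c
#include <assert.h>

/* Hypothetical union: a long and a float sharing one chunk of memory. */
union number {
    long  as_long;
    float as_float;
};

/* Returns 1 when both members start at the union's own address. */
int members_overlap(void) {
    union number n;
    return (void *)&n.as_long == (void *)&n
        && (void *)&n.as_float == (void *)&n;
}

/* Assigns one member, then another; the second write replaces the first. */
float store_then_overwrite(void) {
    union number n;
    n.as_long = 42;     /* the storage now holds a long */
    n.as_float = 1.5f;  /* the same bytes are overwritten with a float */
    assert(sizeof n >= sizeof n.as_long); /* large enough for the biggest member */
    return n.as_float;  /* read back the member that was written last */
}
```

After the second assignment the earlier `long` value is gone; only the member written last can be read back reliably.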

In type theory, a union has a sum type; this corresponds to disjoint union in mathematics.

Depending on the language and type, a union value may be used in some operations, such as assignment and comparison for equality, without knowing its specific type. Other operations may require that knowledge, either by some external information, or by the use of a tagged union.

Untagged unions

Because of the limitations of their use, untagged unions are generally only provided in untyped languages or in a type-unsafe way (as in C). They have the advantage over simple tagged unions of not requiring space to store a data type tag.

The name "union" stems from the type's formal definition. If a type is considered as the set of all values that that type can take on, a union type is simply the mathematical union of its constituent types, since it can take on any value any of its fields can. Also, because a mathematical union discards duplicates, if more than one field of the union can take on a single common value, it is impossible to tell from the value alone which field was last written.

However, one useful programming function of unions is to map smaller data elements to larger ones for easier manipulation. A data structure consisting, for example, of 4 bytes and a 32-bit integer, can form a union with an unsigned 64-bit integer, and thus be more readily accessed for purposes of comparison etc.
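A minimal C sketch of this mapping follows. The names `union key`, `make_key` and `key_equal` are hypothetical, and the sketch assumes the compiler lays out the inner structure without padding (checked by the assertion), as is usual for four bytes followed by a 32-bit integer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record: 4 bytes followed by a 32-bit integer,
   overlaid with a single unsigned 64-bit integer. */
union key {
    struct {
        uint8_t  tag[4];
        uint32_t id;
    } parts;
    uint64_t whole;
};

/* Build a key; zeroing `whole` first gives every byte a defined value. */
union key make_key(const uint8_t tag[4], uint32_t id) {
    union key k;
    k.whole = 0;
    for (int i = 0; i < 4; i++) {
        k.parts.tag[i] = tag[i];
    }
    k.parts.id = id;
    return k;
}

/* Compare all eight bytes with one integer comparison. */
int key_equal(union key a, union key b) {
    assert(sizeof a.parts == sizeof a.whole); /* layout assumption: no padding */
    return a.whole == b.whole;
}
```

Comparing `whole` replaces five separate field comparisons with one. The numeric value of `whole` depends on byte order, so it is suitable for equality tests but not for a portable ordering.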

Unions in various programming languages

ALGOL 68

ALGOL 68 has tagged unions, and uses a case clause to distinguish and extract the constituent type at runtime. A union containing another union is treated as the set of all its constituent possibilities, and if the context requires it a union is automatically coerced into the wider union. A union can explicitly contain no value, which can be distinguished at runtime. An example is:

 mode node = union (real, int, string, void);
 
 node n := "abc";
 
 case n in
   (real r):   print(("real:", r)),
   (int i):    print(("int:", i)),
   (string s): print(("string:", s)),
   (void):     print(("void:", "EMPTY")),
   out         print(("?:", n))
 esac

The syntax of the C/C++ union type and the notion of casts were derived from ALGOL 68, though in an untagged form.[1]

C/C++

In C and C++, untagged unions are expressed nearly exactly like structures (structs), except that each data member begins at the same location in memory. The data members, as in structures, need not be primitive values, and in fact may be structures or even other unions. Since C++11, a data member may also be of a type that has a non-trivial constructor, destructor, copy constructor, or copy assignment operator. For example, it is possible to have the standard C++ string (std::string) as a member of a union.

The primary use of a union is allowing access to a common location by different data types, for example hardware input/output access, bitfield and word sharing, or type punning. Unions can also provide low-level polymorphism. However, there is no checking of types, so it is up to the programmer to be sure that the proper fields are accessed in different contexts. The relevant field of a union variable is typically determined by the state of other variables, possibly in an enclosing struct.

One common C programming idiom uses unions to perform what C++ calls a reinterpret_cast, by assigning to one field of a union and reading from another, as is done in code which depends on the raw representation of the values. A practical example is the method of computing square roots using the IEEE representation. This is not, however, a safe use of unions in general.

Structure and union specifiers have the same form. [ . . . ] The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.

— ANSI/ISO 9899:1990 (the ANSI C standard) Section 6.5.2.1

Anonymous union

In C++, C11, and as a non-standard extension in many compilers, unions can also be anonymous. Their data members are not referenced through a union name; instead, they are accessed directly, as if they were declared in the enclosing scope. They have some restrictions as opposed to traditional unions: in C11, they must be a member of another structure or union,[2] and in C++, they cannot have methods or access specifiers.

Simply omitting the class-name portion of the syntax does not make a union an anonymous union. For a union to qualify as an anonymous union, the declaration must not declare an object. Example:

#include <iostream>
#include <cstdint>

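int main() {
   union {
      float f;
      uint32_t d; // Assumes float is 32 bits wide
   };

   f = 3.14f;
   // Reading d after writing f is type punning: commonly supported by
   // compilers, but undefined behavior in standard C++.
   std::cout << "Hexadecimal representation of 3.14f: "
             << std::hex << d << '\n';
   return 0;
}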

Anonymous unions are also useful in C struct definitions to provide a sense of namespacing.[3]

Transparent union

In compilers such as GCC, Clang, and IBM XL C for AIX, a transparent_union attribute is available for union types. Types contained in the union can be converted transparently to the union type itself in a function call, provided that all types have the same size. It is mainly intended for functions with multiple parameter interfaces, a use necessitated by early Unix extensions and later re-standardisation.[4]

COBOL

In COBOL, union data items are defined in two ways. The first uses the RENAMES (66 level) keyword, which effectively maps a second alphanumeric data item on top of the same memory location as a preceding data item. In the example code below, data item PERSON-REC is defined as a group containing another group and a numeric data item. PERSON-DATA is defined as an alphanumeric data item that renames PERSON-REC, treating the data bytes contained within it as character data.

  01  PERSON-REC.
      05  PERSON-NAME.
          10  PERSON-NAME-LAST    PIC X(12).
          10  PERSON-NAME-FIRST   PIC X(16).
          10  PERSON-NAME-MID     PIC X.
      05  PERSON-ID               PIC 9(9) PACKED-DECIMAL.
  
  01  PERSON-DATA                 RENAMES PERSON-REC.

The second way to define a union type is by using the REDEFINES keyword. In the example code below, data item VERS-NUM is defined as a 2-byte binary integer containing a version number. A second data item VERS-BYTES is defined as a two-character alphanumeric variable. Since the second item is redefined over the first item, the two items share the same address in memory, and therefore share the same underlying data bytes. The first item interprets the two data bytes as a binary value, while the second item interprets the bytes as character values.

  01  VERS-INFO.
      05  VERS-NUM        PIC S9(4) COMP.
      05  VERS-BYTES      PIC X(2)
                          REDEFINES VERS-NUM.

Pascal

In Pascal, there are two ways to create unions. One is the standard way through a variant record. The second is a nonstandard means of declaring a variable as absolute, meaning it is placed at the same memory location as another variable or at an absolute address. While all Pascal compilers support variant records, only some support absolute variables.

For the purposes of this example, the following are all integer types: a byte consists of 8 bits, a word is 16 bits, and an integer is 32 bits.

The following example shows the non-standard absolute form:

var
    A: Integer;
    B: array[1..4] of Byte absolute A;
    C: Integer absolute 0;

In the first absolute declaration, each element of the array B maps to one specific byte of the variable A. In the second, the variable C is placed at the exact machine address 0.

In the following example, a record has variants, some of which share the same location as others:

type
     Shape = (Circle, Square, Triangle);
     Dimensions = record
        case Figure: Shape of 
           Circle: (Diameter: real);
           Square: (Width: real);
           Triangle: (Side: real; Angle1, Angle2: 0..360)
        end;

PL/I

In PL/I the original term for a union was cell,[5] which is still accepted as a synonym for union by several compilers. The union declaration is similar to the structure definition, where elements at the same level within the union declaration occupy the same storage. Elements of the union can be any data type, including structures and arrays.[6]: pp. 192–193  Here vers_num and vers_bytes occupy the same storage locations.

  1  vers_info         union,
     5 vers_num        fixed binary,
     5 vers_bytes      pic '(2)A';

An alternative to a union declaration is the DEFINED attribute, which allows alternative declarations of storage; however, the data types of the base and defined variables must match.[6]: pp. 289–293 

Rust

Rust implements both tagged and untagged unions. In Rust, tagged unions are implemented using the enum keyword. Unlike enumerated types in most other languages, enum variants in Rust can contain additional data in the form of a tuple or struct, making them tagged unions rather than simple enumerated types.[7]

Rust also supports untagged unions using the union keyword. The memory layout of unions in Rust is undefined by default,[8] but a union with the #[repr(C)] attribute will be laid out in memory exactly like the equivalent union in C.[9] Reading the fields of a union can only be done within an unsafe function or block, as the compiler cannot guarantee that the data in the union will be valid for the type of the field; if this is not the case, it will result in undefined behavior.[10]

Syntax and example

C/C++

In C and C++, the syntax is:

union <name>
{
    <datatype>  <1st variable name>;
    <datatype>  <2nd variable name>;
    .
    .
    .
    <datatype>  <nth variable name>;
} <union variable name>;

A structure can also be a member of a union, as the following example shows:

union name1
{
    struct name2
    {  
        int     a;
        float   b;
        char    c;
    } svar;
    int     d;
} uvar;

This example defines a variable uvar as a union (tagged as name1), which contains two members, a structure (tagged as name2) named svar (which in turn contains three members), and an integer variable named d.

Unions may occur within structures and arrays, and vice versa:

struct
{  
    int flags;
    char *name;
    int utype;
    union {
        int ival;
        float fval;
        char *sval;
    } u;
} symtab[NSYM];

The number ival is referred to as symtab[i].u.ival and the first character of string sval by either of *symtab[i].u.sval or symtab[i].u.sval[0].

PHP

Union types were introduced in PHP 8.0.[11] The values are implicitly "tagged" with a type by the language, and the type can be retrieved with gettype().

class Example
{
    private int|float $foo;

    public function squareAndAdd(float|int $bar): int|float
    {
        return $bar ** 2 + $this->foo;
    }
}

Python

Support for typing was introduced in Python 3.5.[12] The new syntax for union types was introduced in Python 3.10.[13]

class Example:
    foo = 0

    def square_and_add(self, bar: int | float) -> int | float:
        return bar ** 2 + self.foo

TypeScript

Union types are supported in TypeScript.[14] The values are implicitly "tagged" with a type by the language, and may be retrieved using a typeof call for primitive values and an instanceof comparison for complex data types. Types with overlapping usage (e.g. a slice method exists on both strings and arrays, the plus operator works on both strings and numbers) don't need additional narrowing to use these features.

function successor(n: number | bigint): number | bigint {
    // types that support the same operations don't need narrowing
    return ++n;
}

function dependsOnParameter(v: string | Array<string> | number) {
    // distinct types need narrowing
    if (v instanceof Array) {
        // do something
    } else if (typeof(v) === "string") {
        // do something else
    } else {
        // has to be a number
    }
}

Rust

Tagged unions in Rust use the enum keyword, and can contain tuple and struct variants:

enum Foo {
	Bar(i32),
	Baz { x: String, y: i32 },
}

Untagged unions in Rust use the union keyword:

union Foo {
	bar: i32,
	baz: bool,
}

Reading from the fields of an untagged union results in undefined behavior if the data in the union is not valid as the type of the field, and thus requires an unsafe block:

let x = Foo { bar: 10 };
let y = unsafe { x.bar }; // This will set y to 10, and does not result in undefined behavior.
let z = unsafe { x.baz }; // This results in undefined behavior, as the value stored in x is not a valid bool.

See also

References

  1. Ritchie, Dennis M. (March 1993). "The Development of the C Language". ACM SIGPLAN Notices. 28 (3): 201–208. doi:10.1145/155360.155580. The scheme of type composition adopted by C owes considerable debt to Algol 68, although it did not, perhaps, emerge in a form that Algol's adherents would approve of. The central notion I captured from Algol was a type structure based on atomic types (including structures), composed into arrays, pointers (references), and functions (procedures). Algol 68's concept of unions and casts also had an influence that appeared later.
  2. "6.63 Unnamed Structure and Union Fields". Retrieved 2016-12-29.
  3. Siebenmann, Chris. "CUnionsForNamespaces". utcc.utoronto.ca.
  4. "Common Type Attributes: transparent_union". Using the GNU Compiler Collection (GCC).
  5. IBM Corporation (March 1968). IBM System/360 PL/I Language Specifications (PDF). p. 52. Retrieved Jan 22, 2018.
  6. IBM Corporation (Dec 2017). Enterprise PL/I for z/OS PL/I for AIX IBM Developer for z Systems PL/I for windows Language Reference (PDF). Retrieved Jan 22, 2018.
  7. "How Rust Implements Tagged Unions - Pat Shaughnessy". patshaughnessy.net. Retrieved 2023-04-25.
  8. "Union types - The Rust Reference". doc.rust-lang.org. Retrieved 2023-04-25.
  9. "Type layout - The Rust Reference". doc.rust-lang.org. Retrieved 2023-04-25.
  10. "Unions - The Rust Reference". doc.rust-lang.org. Retrieved 2023-04-25.
  11. Karunaratne, Ayesh. "PHP 8.0: Union Types". PHP.Watch. Retrieved 30 November 2020.
  12. "typing — Support for type hints — Python 3.9.7 documentation". docs.python.org. Retrieved 8 September 2021.
  13. "PEP 604 -- Allow writing union types as X | Y". Python.org. Retrieved 8 September 2021.
  14. "Handbook - Unions and Intersection Types". www.typescriptlang.org. Retrieved 30 November 2020.

External links