Note that if you define your own float4 type as a plain struct of four floats, it will have only 4-byte alignment and therefore not qualify for vectorized loads on the device side, which may reduce performance. CUDA's built-in float4 type is a struct with a 16-byte alignment attribute, applied identically in host and device code.
What won't work (in the general case) is using CUDA's aligned float4 in device code while interfacing it with your own unaligned float4 on the host side. This kind of mix-and-match might work under carefully constrained circumstances, but generally speaking you want to use the same type for both host and device code, so your original concerns along those lines were justified.
Note that while CUDA does provide a built-in float3 type, it carries no additional alignment attribute (there is no 12-byte vector access on the GPU), so a float3 struct you define yourself behaves identically.