identify: ASCII-only version strings #491

lidel · 2022-12-06T20:27:57Z

This PR formalizes normalization and a length limit for each string field (agentVersion, protocolVersion, and the protocols array).
The goal is to reduce surprises and unify behavior across implementations.

Kubo PR: ipfs/kubo#9465

Specify length limit to unify version string handling across implementations.

License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>

marten-seemann

Can we make this UTF-8? I'd really like to not introduce any surprises here.

lidel · 2022-12-06T22:57:22Z

@marten-seemann a̹̖̪͔͖̝͘̕͞r̛̛̫̞̬̝͎̘̹͞ę̴͉͉̼̀͟ ̛͏̴͜͏͉̣͉̮͖̳̝͍̭̮̮͍͔͈̲͉͙̰ͅy̞̠̮̥̯̞̦͈̮̣̗̤͚͓̻͝ǫ̶̸̻̫͍̞͎̘͡u̦̮̭̬͔̝͇̠͟͡ ̦͇̘̤͔̝̥̜̘̙̭͙̱́͢s̸̛̠̝͓͚̬̦͇͘͜u͠͏̧̪͇̙͖̳̘̘͓͞͝ͅr̶̷̛̪̩̭̮̱͖̼̙̬̣͔̫͇̳̳̦͢ͅe̸̡̧̯̝͖̮̱͖̲̜̙̹͜͝ͅ ̧̰̟̻̳̗̱͕̰̜̠͇̘͔́͜͟y̡̢͉͙͎͎̹̖̲͇̰͟͢o̶̷͍̼̙̦̫̹̯̭͓̺̞͢ù͏̡̠̭̤̗͖͈͕͙̭͢ ̡̨̛͇͚̦̠̗͢͢w͏̦̗͍̫̀á̢͔͚̰̬͔͕̼̕ņ̴̛̲̜̮͈͙̰̀͘t̶̤̜̯͙̝̺̜̠̘̼͙̞͍̖̟͓̝̤͘͞ ͜͞͏͏͕̣͍̝͍̜͉̳̙̘̥͜t̛̪͔̯̱͓͝ǫ̸̨̢̤̘̰͉̬̤͓̙̼̖̞͖͟ ̷̨̢̳̣͈̪͙̦̻͉̗̹͓̤͖͕͘͢a̢̛͈̲͉͎̲͕͘l͠҉̧̛̞̥̟̜̠̯̰͙͎̬͜ͅl̸̵͍͓͓̼̬͙͍̰̭̟̖̪͍̀o͡͏̡̜̩̗̤͚̪̟̘̥̲̘̥͕̘͎̗̬͉͘͟w̡̡͖̼̟̣͙͕̥͈̮̜̩̟͈̫͘̕͠ ̘̻͕̬͓̗̠̕͞u̸̗͎̜̗͍̝͔̘͎̙̠̭̗͕t̴̤̺̫̮̩̀́͟͞ͅf̸͙̞̞̣̦͟8͏̞̥̮̙̙̲͉̰͖͉͚̤͇͠ ?

marten-seemann · 2022-12-06T23:11:36Z

Nice UTF-8 art! :)

I don't really see a reason not to. Limiting ourselves to ASCII is so 1990s style.

Winterhuman · 2022-12-07T13:39:25Z

Perhaps this could be phrased as: "Implementations should discard non-ASCII characters and trim the string to 64 characters, but may choose to allow UTF-8 characters if potential for UTF-8 art/mimicry is acceptable"

I definitely wouldn't want UTF-8 support to be outright gone: using UTF-8 in protocol names (maybe containing CIDs with UTF-8 encodings) could have a lot of potential use-cases. For the agent and version strings, though, that's perfectly understandable.
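
For illustration only, the "discard non-ASCII characters and trim the string to 64 characters" rule quoted above could be sketched roughly as follows. The function name `ascii_only` and the constant `MAX_LEN` are hypothetical and do not come from the spec or from Kubo; whether ASCII control characters should also be dropped is not stated in the proposal, so this sketch keeps them.

```python
MAX_LEN = 64  # limit taken from the proposal; the constant name is hypothetical


def ascii_only(value: str, max_len: int = MAX_LEN) -> str:
    """Drop every non-ASCII character, then trim to max_len characters."""
    # Keep only code points in the 7-bit ASCII range (0..127).
    kept = "".join(ch for ch in value if ord(ch) < 128)
    # After filtering, one character is one byte, so slicing is unambiguous.
    return kept[:max_len]


if __name__ == "__main__":
    print(ascii_only("kubo/0.18.0-ü"))  # the trailing "ü" is discarded
```

Filtering before trimming means the 64-character budget is spent only on characters that survive normalization.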

Winterhuman · 2022-12-08T19:47:34Z

In fact, maybe a better idea. What about:

"Implementations should trim the string to 64 characters. Implementations MAY allow UTF-8 characters in the string, however, these strings should be visible to users as both UTF-8 and ASCII punycode (per IETF RFC 3492) to protect against UTF-8 mimicry."
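
A rough sketch of what "visible to users as both UTF-8 and ASCII punycode" might look like, assuming raw RFC 3492 Punycode without the IDNA `xn--` prefix. The helper name `display_forms` and the output layout are hypothetical; Python's built-in `punycode` codec is used for the encoding.

```python
def display_forms(value: str) -> str:
    """Return the string as-is if it is pure ASCII, otherwise show both forms."""
    if value.isascii():
        # Nothing to disambiguate: the ASCII form is the only form.
        return value
    # Python ships an RFC 3492 Punycode codec in the standard library.
    puny = value.encode("punycode").decode("ascii")
    return f"{value} (punycode: {puny})"


if __name__ == "__main__":
    print(display_forms("kubo/0.18.0"))
    print(display_forms("bücher"))
```

Showing the ASCII form next to the original lets a user spot look-alike characters that render identically to ASCII letters.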

marten-seemann · 2022-12-08T20:52:50Z

I'd say let's fully embrace UTF-8. This is 2022, and we finally have a standard encoding that's universally supported.

Building on @Winterhuman's proposal:

"Strings are UTF-8 encoded. Implementations MAY trim the string to 64 characters. When made visible to users, implementations MAY output both UTF-8 and ASCII punycode (per IETF RFC 3492) to protect against UTF-8 mimicry."
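
Once strings are UTF-8, "64 characters" can be read as 64 code points or as 64 bytes, and the two differ for any non-ASCII input. The sketch below shows both readings; the function names are hypothetical, and the thread does not say which reading is intended.

```python
def trim_code_points(value: str, limit: int = 64) -> str:
    """Trim to at most `limit` Unicode code points."""
    # Python strings are sequences of code points, so slicing is enough.
    return value[:limit]


def trim_utf8_bytes(value: str, limit: int = 64) -> str:
    """Trim to at most `limit` bytes of UTF-8 without splitting a sequence."""
    raw = value.encode("utf-8")[:limit]
    # The input was valid UTF-8, so only the tail can be an incomplete
    # multi-byte sequence; errors="ignore" drops that partial tail.
    return raw.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    sample = "é" * 40  # 40 code points, 80 bytes in UTF-8
    print(len(trim_code_points(sample)), len(trim_utf8_bytes(sample)))
```

Neither reading accounts for combining marks: text like the decorated comment earlier in this thread uses many code points per visible character, so a 64 code point limit may cover only a few visible characters.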

lidel requested a review from mxinden December 6, 2022 20:27

lidel force-pushed the identify-version-string-limits branch from 7142a33 to 9f717e2 Compare December 6, 2022 20:37

lidel requested a review from marten-seemann December 6, 2022 20:48

marten-seemann reviewed Dec 6, 2022

lidel mentioned this pull request Dec 6, 2022

fix(id): ascii-only identify versions ipfs/kubo#9465

Draft
