
Showing posts with label UAX #31. Show all posts

Friday, April 13, 2018

Last Call on Unicode 11.0 Review

The beta review period for Unicode 11.0 and related technical standards will close on April 23, 2018. This is the last opportunity for technical comments before version 11.0 is released in Q2 2018. Implementers and interested parties are encouraged to download data files, review proposed updates, and submit comments.

Unicode 11.0 adds seven new scripts, including Hanifi Rohingya, and 66 additional emoji characters, including four new components for hair color (for a total of 157 emoji sequences). The set of Georgian Mtavruli capital letters has been added to support modern casing practices.
In addition to the Unicode core specification, five Unicode Standard Annexes and two Unicode Technical Standards have significant specification and/or data file updates that are correlated with the new additions for Unicode 11.0.0. Review of those changes is strongly encouraged during the beta review period.

UAX #14, Unicode Line Breaking Algorithm
  • Uses Extended_Pictographic property for future-proofing
UAX #29, Unicode Text Segmentation
  • New support for Indic virama handling
  • Uses Extended_Pictographic property for future-proofing
  • A new table of formal regex definitions
UAX #31, Unicode Identifier and Pattern Syntax
  • Refines the use of ZWJ in identifiers
  • Broadens the definition of hashtag identifiers
UAX #38, Unicode Han Database (Unihan)
  • Five new fields and improved regular expressions
  • Documents the extension of Unihan properties to non-Unihan
UAX #44, Unicode Character Database
  • New property Equivalent_Unified_Ideograph
  • New regular expressions for Bidi_Paired_Bracket & Equivalent_Unified_Ideograph
  • More discussion of emoji variation sequences
  • Clarification of values allowed for the Age property
UTS #10, Unicode Collation Algorithm
  • Updates data to Unicode 11.0
  • Clarification of search tailoring in visual-order scripts
UTS #39, Unicode Security Mechanisms
  • Updates data to Unicode 11.0
  • Enhances discussions of joining controls & combining sequences
UTS #46, Unicode IDNA Compatibility Processing
  • Updates data to Unicode 11.0
  • Changes the format of the test file for arbitrary input settings
  • Updates input setting for Transitional_Processing
UTS #51, Unicode Emoji
  • Supplies Extended_Pictographic property for future-proofing
  • Simplifies emoji sequence definitions
  • EBNF and regex definitions for loose matches
  • More proposed guidelines: gender-neutral emoji, skin-tone modifiers, ZWJ visible fallbacks, hair-style components
  • Mechanism for changing the “facing” direction for emoji
Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by April 23, 2018. Feedback instructions are on each public review page. For more information, see the open public review issues.


Over 130,000 characters are available for adoption, to help the Unicode Consortium’s work on digitally disadvantaged languages.


Monday, October 5, 2015

Proposed Update UAX #31, Unicode Identifier and Pattern Syntax

Unicode Standard Annex #31, Unicode Identifier and Pattern Syntax, will be updated for Unicode 9.0. The proposed update is now available for general public review and comment.

A major change in the proposed update is the addition of a new section with recommended syntax for Unicode hashtags, which can also include emoji characters.

The draft also makes it clearer that the XID_Start/XID_Continue properties are preferred over ID_Start/ID_Continue, modifies the syntax of the definition to make customization cleaner, and allows for medial-only characters in identifiers.
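
As a rough illustration of an XID-based identifier check with medial-only characters, here is a minimal Python sketch. It assumes the pattern Start Continue* (Medial Continue+)*, and it leans on CPython's str.isidentifier() as a stand-in for the XID_Start and XID_Continue properties, since the standard library does not expose them directly. The helper names and the choice of "-" as a medial character are hypothetical and only for illustration; a real profile would pick its own set.

```python
# Hypothetical medial set: characters allowed only between two Continue characters.
MEDIAL = frozenset({"-"})


def is_xid_start(ch: str) -> bool:
    # str.isidentifier() accepts XID_Start characters plus "_" in first position,
    # so "_" is excluded here to approximate XID_Start alone.
    return ch != "_" and ch.isidentifier()


def is_xid_continue(ch: str) -> bool:
    # Prefixing a known start character turns the check into a test of `ch`
    # as a non-initial (XID_Continue) character.
    return ("a" + ch).isidentifier()


def is_identifier(text: str, medial=MEDIAL) -> bool:
    """Check text against Start Continue* (Medial Continue+)*."""
    if not text or not is_xid_start(text[0]):
        return False
    i, n = 1, len(text)
    while i < n:
        ch = text[i]
        if is_xid_continue(ch):
            i += 1
        elif ch in medial:
            # A medial character must be followed by at least one Continue character.
            if i + 1 >= n or not is_xid_continue(text[i + 1]):
                return False
            i += 2
        else:
            return False
    return True
```

With this sketch, is_identifier("well-known") is accepted, while is_identifier("trailing-") and is_identifier("a--b") are rejected because the medial character is not followed by a Continue character.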