Wednesday, October 21, 2009

[Unicode Announcement] Unicode Collation Algorithm Version 5.2 Released

Version 5.2 of the Unicode Collation Algorithm has been released.
This version resynchronizes the Unicode Collation Algorithm with all
of the updates for the Unicode Standard, Version 5.2. Please note
the following changes and issues for implementations:

* The text of UTS #10 has been updated. Among other changes, the
revised text for UTS #10 makes it clear that the BASE for
implicit generation of weights for Han characters does not
include unassigned code points.
* There are small changes in Gujarati, Telugu, Malayalam
(including weighting for chillus), Tamil, and Sinhala. While
these changes move in the direction of expected behavior, good
results will only come from tailoring for particular languages,
such as with CLDR.
* There have been significant changes to the ordering of many
combining marks. Many combining marks that are not in customary
use in modern languages now have the same secondary weight, and
will only be distinguished on a fourth level, by code point
ordering. This can be seen by looking at the Unicode Collation
Charts. In 5.2, many
characters now have a white background, indicating that they
sort exactly the same as the previous character, unless a 4th
(code point) level is used.
* Implementations of UCA should take note that the increased
number of characters may cause overflows if the implementing
code makes certain assumptions or optimizations. This can result
either from the new character additions (which increase the
number of distinct weights in the table) or because of changes
in the way the weights, particularly for secondary weight
values, are assigned in the table. The latter change may result
in unexpected numbers of characters having the same weight.
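
The first item above, on the BASE used for implicit weights, can be illustrated with a minimal sketch. It assumes the two-weight formula given in UTS #10 (AAAA = BASE + (CP >> 15), BBBB = (CP & 0x7FFF) | 0x8000) with BASE values FB40, FB80, and FBC0. The range tables and function names below are illustrative assumptions, not part of the specification; a real implementation would derive the ranges from the Unicode Character Database.

```python
# Illustrative, non-exhaustive ranges of *assigned* Han ideographs as of
# Unicode 5.2. These tables are assumptions made for this sketch.
CORE_HAN = [(0x4E00, 0x9FCB)]                        # CJK Unified Ideographs
EXTENSION_HAN = [(0x3400, 0x4DB5), (0x20000, 0x2A6D6)]  # Extensions A and B


def _in_ranges(cp, ranges):
    """Return True if code point cp falls inside any (low, high) range."""
    return any(low <= cp <= high for low, high in ranges)


def implicit_base(cp):
    """Pick the BASE value for a code point with no explicit table entry.

    Unassigned code points fall through to 0xFBC0 even when they lie
    inside a CJK block, because only assigned ideographs are listed.
    """
    if _in_ranges(cp, CORE_HAN):
        return 0xFB40
    if _in_ranges(cp, EXTENSION_HAN):
        return 0xFB80
    return 0xFBC0


def implicit_primary_weights(cp):
    """Return the pair of 16-bit primary weights (AAAA, BBBB) for cp."""
    aaaa = implicit_base(cp) + (cp >> 15)
    bbbb = (cp & 0x7FFF) | 0x8000
    return aaaa, bbbb
```

For example, implicit_primary_weights(0x4E00) gives (0xFB40, 0xCE00), while the code point U+9FFF, listed here as unassigned, gets the FBC0 base and yields (0xFBC1, 0x9FFF).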

All of the Unicode Consortium lists are strictly opt-in lists for members
or interested users of our standards. We make every effort to remove
users who do not wish to receive e-mail from us. To see why you are getting
this mail and how to remove yourself from our lists if you want, please

Tuesday, October 20, 2009

[Unicode Announcement] Public Review Issue #150: Draft UTS #46 Updated

The draft UTS #46, Unicode IDNA Compatible Preprocessing, has been updated.
There are a number of new review notes pointing out issues and asking
for feedback. There are also new tables: one comparing behavior of
compatibility and escaped versions of FULL STOP in delimiting labels
between different browsers, and one comparing the allowed and disallowed
repertoires when processing IDNs according to the IDNA2003, IDNA2008,
and UTS #46 specifications. There are also many improvements and
clarifications of the text.
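
To make the FULL STOP comparison concrete, here is a minimal sketch of label splitting. It assumes the four dot characters that IDNA2003 (RFC 3490) treats as label separators; it is not the UTS #46 processing itself, and the function name is hypothetical. Percent-escaped forms such as %2E are not handled.

```python
import re

# Characters IDNA2003 treats as label separators (an assumption taken from
# RFC 3490, not from the UTS #46 draft): FULL STOP, IDEOGRAPHIC FULL STOP,
# FULLWIDTH FULL STOP, HALFWIDTH IDEOGRAPHIC FULL STOP.
LABEL_SEPARATORS = "\u002E\u3002\uFF0E\uFF61"

_SEPARATOR_RE = re.compile("[" + re.escape(LABEL_SEPARATORS) + "]")


def split_labels(domain):
    """Split a domain name into labels on any of the separator characters."""
    return _SEPARATOR_RE.split(domain)
```

With this sketch, split_labels("www\u3002example\uFF0Ecom") returns ["www", "example", "com"].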


Review period closes October 26, 2009.

If you have comments for official UTC consideration, please submit
them through our feedback & reporting page:

If you wish to discuss issues on the Unicode mail list, then please
use the following link to subscribe (if necessary). Please be aware
that discussion comments on the Unicode mail list are not automatically
recorded as input to the UTC. You must use the reporting link above
to generate comments for UTC consideration.


Thursday, October 1, 2009

[Unicode Announcement] Unicode 5.2.0 Released

Unicode 5.2 has been released! The data files, code charts, and Unicode
Standard Annexes for this version are final and are posted on the
Unicode site.

For Unicode 5.2, the core specification is no longer just a delta
document applied to the book; instead, the entire core specification,
with all textual changes integrated, will be available on the Unicode
site. As of this announcement, the first five chapters are available;
the other chapters will follow soon.

For full details about what is new or changed in this release, see the
version documentation for Unicode 5.2 at:

