Some characters that render as a single symbol span a sequence of
several Unicode code points (e.g., flag emojis, a letter combined with
a diacritic, Hangul syllables, etc.).
Such composites are called grapheme clusters in the Unicode standard.
This patch introduces recognition of extended grapheme cluster
boundaries, which makes it possible to iterate over rendered
characters. Without this, the user may observe the cursor being "stuck"
inside a character for several keystrokes while it makes its way
through each code point in the grapheme cluster.
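To make the "stuck cursor" effect concrete, here is a minimal sketch that counts how many code points sit behind one rendered character. The class and method names are made up for illustration and are not part of the patch.

```java
final class CodePointCountSketch {
    // Number of Unicode code points in the string (not UTF-16 units).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // Regional indicators U+1F1FA U+1F1F8 render as a single flag.
        String flag = new String(new int[] {0x1F1FA, 0x1F1F8}, 0, 2);
        // 'e' followed by COMBINING ACUTE ACCENT renders as one accented letter.
        String eAcute = "e\u0301";
        // Three conjoining jamo render as one Hangul syllable block.
        String hangul = "\u1112\u1161\u11AB";

        System.out.println("flag:   " + codePoints(flag) + " code points");
        System.out.println("eAcute: " + codePoints(eAcute) + " code points");
        System.out.println("hangul: " + codePoints(hangul) + " code points");
    }
}
```

A cursor that advances one code point at a time needs two or three keystrokes to get past each of these strings, even though each one is displayed as a single character.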
The implementation follows the boundary search algorithm outlined in
Unicode Standard Annex #29 (TR29)[1] and was tested against the set of
test cases provided by the Unicode Character Database[2].
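For readers unfamiliar with TR29, the following is a deliberately simplified sketch of the rule-based approach, not the code from this patch. It covers only a handful of rules (GB3, GB4, GB5, the Extend/ZWJ part of GB9, GB12/GB13 and GB999) and approximates the Extend and Control classes with `Character.getType`, which is an assumption made for brevity. Hangul, Prepend, SpacingMark and emoji ZWJ sequences are not handled. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class GraphemeSketch {
    static boolean isControlLike(int cp) {
        // Approximation: CR, LF and general category Cc only.
        return cp == '\r' || cp == '\n' || Character.getType(cp) == Character.CONTROL;
    }

    static boolean isExtendLike(int cp) {
        // Approximation of Extend (categories Mn and Me) plus ZWJ (U+200D).
        int type = Character.getType(cp);
        return type == Character.NON_SPACING_MARK
            || type == Character.ENCLOSING_MARK
            || cp == 0x200D;
    }

    static boolean isRegionalIndicator(int cp) {
        return cp >= 0x1F1E6 && cp <= 0x1F1FF;
    }

    // riCount: regional indicators already consumed in the current cluster.
    static boolean isBoundary(int prev, int cp, int riCount) {
        if (prev == '\r' && cp == '\n') return false;          // GB3
        if (isControlLike(prev) || isControlLike(cp)) return true; // GB4, GB5
        if (isExtendLike(cp)) return false;                    // GB9
        if (isRegionalIndicator(prev) && isRegionalIndicator(cp)
            && riCount % 2 == 1) return false;                 // GB12, GB13
        return true;                                           // GB999
    }

    static List<String> split(String text) {
        List<String> clusters = new ArrayList<>();
        int start = 0;
        int riCount = 0;
        int prev = -1;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (prev != -1 && isBoundary(prev, cp, riCount)) {
                clusters.add(text.substring(start, i));
                start = i;
                riCount = 0;
            }
            if (isRegionalIndicator(cp)) riCount++;
            prev = cp;
            i += Character.charCount(cp);
        }
        if (start < text.length()) clusters.add(text.substring(start));
        return clusters;
    }

    public static void main(String[] args) {
        // Two flags in a row: four regional indicators, two clusters.
        String flags = new String(new int[] {0x1F1FA, 0x1F1F8, 0x1F1E9, 0x1F1EA}, 0, 4);
        System.out.println(split(flags).size());
        System.out.println(split("e\u0301x").size());
    }
}
```

The rules are checked in the order the annex lists them, which is why the CR LF case is tested before the general "break around controls" case.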
In addition to the grapheme cluster boundary search itself, this patch
adds an `isExtendedPictographic` function, which reports whether the
given code point has the Unicode "Extended_Pictographic" property. The
property is required to determine grapheme cluster boundaries
correctly. JDK 21 provides this method natively, so the function can be
removed once we start targeting that version.
The Extended_Pictographic property is stored as a bitmap. I considered
building a similar map for the code point classification in the
grapheme cluster boundary search, which could yield better performance,
but that would add at least another half a megabyte of data to the JAR,
so I settled for a chain of `if`s instead. This can be reconsidered and
shouldn't be difficult to change if the performance impact becomes
noticeable (it didn't show in my simple tests).
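As an illustration of the bitmap idea, here is a minimal sketch of a one-bit-per-code-point lookup. It is not the data structure or the data shipped in this patch: the class name is hypothetical, and only three sample code points are marked instead of the full property table.

```java
final class CodePointBitmapSketch {
    private final long[] words;

    CodePointBitmapSketch(int maxCodePoint) {
        // One bit per code point, 64 code points per long.
        words = new long[(maxCodePoint >>> 6) + 1];
    }

    void setRange(int first, int last) {
        for (int cp = first; cp <= last; cp++) {
            words[cp >>> 6] |= 1L << (cp & 63);
        }
    }

    boolean contains(int cp) {
        // A negative cp yields a huge unsigned index and is rejected here.
        int index = cp >>> 6;
        return index < words.length && (words[index] & (1L << (cp & 63))) != 0;
    }

    public static void main(String[] args) {
        CodePointBitmapSketch sample = new CodePointBitmapSketch(Character.MAX_CODE_POINT);
        // Sample entries only: U+00A9, U+00AE and U+1F600.
        sample.setRange(0x00A9, 0x00A9);
        sample.setRange(0x00AE, 0x00AE);
        sample.setRange(0x1F600, 0x1F600);
        System.out.println(sample.contains(0x1F600));
        System.out.println(sample.contains('A'));
    }
}
```

A lookup is a shift, a mask and one array read, which is what makes a table attractive for hot paths. The trade-off discussed above is the size of that table once it has to hold a multi-valued classification rather than a single bit.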
A few functions in the vim-engine were adjusted to handle grapheme
clusters (such as getting the horizontal offset and keeping the cursor
from moving past the end of the line).
[1]: https://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries
[2]: https://www.unicode.org/Public/UCD/latest/ucd/auxiliary/GraphemeBreakTest.txt
Entering visual block mode, moving up, and then switching to visual line mode caused a disposed caret error.
The cause was that we fetched the list of carets and then processed them one by one. However, updating the first caret disposed the second one before we got to it.
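The failure mode can be reproduced in isolation. The sketch below uses a made-up `SketchCaret` type rather than the real editor API, and the validity check shown is one common guard against this pattern, not necessarily the fix applied in this change.

```java
import java.util.ArrayList;
import java.util.List;

final class CaretSnapshotSketch {
    // Hypothetical stand-in for an editor caret.
    static final class SketchCaret {
        final String name;
        boolean disposed;

        SketchCaret(String name) {
            this.name = name;
        }
    }

    // Processes a snapshot of the carets. Updating one caret disposes all
    // the others, mimicking a switch from block-wise to line-wise selection.
    static List<String> process(List<SketchCaret> carets) {
        List<SketchCaret> snapshot = new ArrayList<>(carets);
        List<String> processed = new ArrayList<>();
        for (SketchCaret caret : snapshot) {
            if (caret.disposed) {
                continue; // the snapshot is stale: skip carets removed meanwhile
            }
            processed.add(caret.name);
            for (SketchCaret other : snapshot) {
                if (other != caret) {
                    other.disposed = true;
                }
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        List<SketchCaret> carets = new ArrayList<>();
        carets.add(new SketchCaret("primary"));
        carets.add(new SketchCaret("secondary"));
        System.out.println(process(carets));
    }
}
```

Without the `disposed` check, the loop would go on to touch the second caret after the first update had already invalidated it.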
It's not enough that the document is writable; the editor needs to be non-read-only, too.
Fixes VIM-2313, fixes VIM-2318, fixes VIM-2666, fixes VIM-2951
Note that this temporarily changes the semantics of `:set` to always set the local option, instead of setting the global option (because we now eagerly initialise local values). Neither is correct, but we don't yet have a way to support the proper behaviour.