Some characters that render as a single symbol span a sequence of
several Unicode code points (e.g., flag emojis, a letter combined with
a diacritic, Hangul syllables, etc.).
Such composites are called grapheme clusters in the Unicode standard,
and this patch introduces recognition of extended grapheme cluster
boundaries, which makes it possible to iterate over rendered
characters. Without this, the user may see the cursor being "stuck"
inside a character for several keystrokes while it makes its way
through each code point in the grapheme cluster.
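
The gap between code points and rendered characters can be illustrated
with the JDK's own `java.text.BreakIterator`. This is only a sketch of
the concept, not the implementation added by this patch; the class and
method names are made up for the example, and how closely
`BreakIterator` follows the extended grapheme cluster rules for emoji
sequences depends on the JDK version.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; it uses the JDK's BreakIterator, not the patch's code.
final class GraphemeDemo {
    /** Splits text at the character boundaries reported by the JDK. */
    static List<String> clusters(String text) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        // next() returns the following boundary, or DONE at the end of the text.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            result.add(text.substring(start, end));
        }
        return result;
    }

    public static void main(String[] args) {
        // Letter 'e' followed by a combining acute accent, then 'x'.
        String text = "e\u0301x";
        System.out.println("code points: " + text.codePointCount(0, text.length()));
        System.out.println("clusters:    " + clusters(text).size());
    }
}
```

Moving the cursor by one entry of `clusters(...)` rather than by one code
point is what avoids the "stuck" cursor described above.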
The implementation follows the boundary search algorithm outlined in
Unicode Standard Annex #29 (Unicode Text Segmentation)[1]. It was
tested against the set of test cases provided by the Unicode Character
Database[2].
In addition to the grapheme cluster boundary search itself, this
patch adds an `isExtendedPictographic` function, which reports whether
the given code point has the Unicode "Extended_Pictographic" property;
this is required to determine grapheme cluster boundaries correctly.
This method is available natively as of JDK 21 and can be removed once
we start targeting that version.
The Extended_Pictographic property is stored as a bitmap. I considered
building a similar map for the code point classification in the
grapheme cluster boundary search, which could yield better
performance, but that would require adding at least another half a
megabyte of data to the JAR, so I settled for a chain of `if`s.
This can be reconsidered and shouldn't be difficult to change if the
performance impact turns out to be noticeable (it didn't show in my
simple tests).
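
For illustration, a one-bit-per-code-point lookup could look roughly
like the sketch below. The class name, the layout, and the ranges set in
`main` are assumptions made for the example (the ranges are a tiny
illustrative subset, not the real property data), and the actual bitmap
in this patch may be laid out differently. As plain arithmetic, covering
all 0x110000 code points at one bit each takes 139,264 bytes, and a
table storing 4 bits per code point would take four times that, about
544 KiB.

```java
// Hypothetical sketch of a code point bitmap; not the patch's data structure.
final class CodePointBitmap {
    // One bit per code point in U+0000..U+10FFFF, packed into 64-bit words.
    private final long[] words = new long[(Character.MAX_CODE_POINT >>> 6) + 1];

    /** Marks every code point in the inclusive range [first, last]. */
    void setRange(int first, int last) {
        for (int cp = first; cp <= last; cp++) {
            words[cp >>> 6] |= 1L << (cp & 63);
        }
    }

    /** Returns true if the bit for the given code point is set. */
    boolean contains(int codePoint) {
        if (codePoint < 0 || codePoint > Character.MAX_CODE_POINT) {
            return false;
        }
        return (words[codePoint >>> 6] & (1L << (codePoint & 63))) != 0;
    }

    /** Size of the backing storage in bytes. */
    int sizeInBytes() {
        return words.length * Long.BYTES;
    }

    public static void main(String[] args) {
        CodePointBitmap bitmap = new CodePointBitmap();
        bitmap.setRange(0x00A9, 0x00A9);     // copyright sign (illustrative entry)
        bitmap.setRange(0x1F600, 0x1F64F);   // emoticons block (illustrative entry)
        System.out.println(bitmap.contains(0x1F600));
        System.out.println(bitmap.sizeInBytes());
    }
}
```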
A few functions in the vim-engine were adjusted to handle grapheme
clusters (such as getting the horizontal offset and adjusting the cursor
so that it does not go past the end of the line).
[1]: https://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries
[2]: https://www.unicode.org/Public/UCD/latest/ucd/auxiliary/GraphemeBreakTest.txt
Entering visual block mode, moving up, and then switching to visual line mode produced a disposed caret error.
The cause was that we fetched the list of carets and then processed them one by one. However, updating the first caret disposed the second.
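
The failure mode can be reproduced with a small model. The `Caret` and `Editor` classes below are hypothetical stand-ins, not the IntelliJ Platform API, and the validity check is one common way to guard such a loop, not necessarily the fix applied here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the problem; none of these types are real platform classes.
final class CaretSnapshotDemo {
    static final class Caret {
        final int line;
        boolean valid = true;
        Caret(int line) { this.line = line; }
    }

    static final class Editor {
        final List<Caret> carets = new ArrayList<>();

        /** Keeps one caret and disposes all others, as a mode switch might. */
        void collapseTo(Caret keep) {
            for (Caret c : carets) {
                if (c != keep) c.valid = false;
            }
            carets.removeIf(c -> !c.valid);
        }
    }

    /** Walks a snapshot of the carets and skips any disposed by an earlier step. */
    static int switchMode(Editor editor) {
        int processed = 0;
        for (Caret caret : new ArrayList<>(editor.carets)) {
            if (!caret.valid) continue; // disposed while handling a previous caret
            editor.collapseTo(caret);
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        Editor editor = new Editor();
        editor.carets.add(new Caret(1));
        editor.carets.add(new Caret(2));
        System.out.println("carets processed: " + switchMode(editor));
    }
}
```

Without the `valid` check, the second iteration would operate on a caret that the first iteration had already disposed.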
Note that this temporarily changes the semantics of `:set` to always set the local option, instead of setting the global option (because we now eagerly initialise local values). Neither is correct, but we don't yet have a way to support the proper behaviour.
Hard wraps require figuring out the width of the panel, and all we have is the width of the associated editor, which excludes the gutter, etc. It is easier to let the UI toolkit handle it.
Helper functions now take the editor rather than the text, ready for search to rely on per-editor options (i.e. `'iskeyword'`). Also standardises on `Int` for search parameters. While the file size is a `Long`, the editor returns a `CharSequence`, which is indexed by `Int`.
Update method signatures and return types:
- Getting rid of "magic constants" (e.g. -1) and replacing them with nullable types
- Replacing the direction `Int` with an enum
- JetBrains annotations
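
As a rough Java analogue of the first two points, the sketch below returns an `OptionalInt` instead of a -1 sentinel (standing in for a nullable `Int`) and takes an enum instead of an integer direction. The names `SearchSketch`, `Direction` and `findChar` are hypothetical and do not correspond to the actual signatures in this change.

```java
import java.util.OptionalInt;

// Hypothetical sketch; the real helper names and signatures are not shown in this text.
final class SearchSketch {
    enum Direction { FORWARDS, BACKWARDS }

    /**
     * Finds the nearest occurrence of target, starting at start and moving in
     * the given direction. Returns an empty result instead of -1 when no
     * match exists. Offsets are ints because CharSequence is indexed by int.
     */
    static OptionalInt findChar(CharSequence text, char target, int start, Direction direction) {
        int step = direction == Direction.FORWARDS ? 1 : -1;
        for (int i = start; i >= 0 && i < text.length(); i += step) {
            if (text.charAt(i) == target) {
                return OptionalInt.of(i);
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        OptionalInt hit = findChar("hello", 'l', 0, Direction.FORWARDS);
        System.out.println(hit.isPresent() ? "found at " + hit.getAsInt() : "not found");
    }
}
```

The caller has to handle the "not found" case explicitly, which is the point of dropping the sentinel value.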
Also fixes some incorrect usages of local options as global, e.g. 'ideajoin' and 'scroll'. There are some options that should be local but are only ever accessed at global scope. These need fixing in the future, e.g. 'iskeyword', 'matchpairs' and 'virtualedit'.
While it is conceptually very similar to StringOption, the implementations of list and non-list operations are very different, and having a separate type will allow us to do more interesting things with overloading when we introduce delegate properties.