<!--
123456789-223456789-323456789-423456789-523456789-623456789-723456789-8234567890
一二三四-->[TITLE]                              [DATE]
--------------------------------------------------------------------------------
[SETTITLE]Abusing X11's xkb for fun and profit
[SETDATE]08-19-2025

Yesterday I was playing around with xkb keyboard layouts when I discovered
that compose functionality for European keyboard layouts (and those of other
languages that use accents) is implemented as a 5227-line table that maps
sequences of key symbols to Unicode strings.

ex: /usr/share/X11/locale/en_US.UTF-8/Compose on my computer
# UTF-8 (Unicode) Compose sequences
#
# Spacing versions of accents (mostly)
<dead_tilde> <space>                    : "~"   asciitilde # TILDE
<dead_tilde> <dead_tilde>               : "~"   asciitilde # TILDE
<Multi_key> <minus> <space>             : "~"   asciitilde # TILDE
<Multi_key> <space> <minus>             : "~"   asciitilde # TILDE
<dead_acute> <space>                    : "'"   apostrophe # APOSTROPHE
...
<dead_circumflex> <o>                   : "ô"   ocircumflex # LATIN SMALL LETTER O WITH CIRCUMFLEX
...
<Multi_key> <colon> <U2395>             : "⍠"   U2360 # : ⎕ APL FUNCTIONAL SYMBOL QUAD COLON

For those of you who are unaware, if you use, for example, a German keyboard
layout, pressing "^" followed by "o" will produce "ô". The circumflex "^" key is
called a "dead key" in xkb terminology because it does not produce any
characters by itself. In addition, if you were to bind any key on your keyboard
to "compose", it would let you type a wide range of Unicode characters via
various sequences of key presses.
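
To make the matching concrete, here is a toy Python model of how a compose
table turns a stream of key symbols into text. It is only a sketch: the table
holds a handful of the entries quoted above, the function name feed is made up
for illustration, and the real lookup inside libX11 is certainly implemented
differently.

```python
# Toy model of compose-sequence matching (NOT the libX11 implementation).
# Keys are tuples of key symbol names, values are the strings they produce.
COMPOSE_TABLE = {
    ("dead_tilde", "space"): "~",
    ("dead_tilde", "dead_tilde"): "~",
    ("Multi_key", "minus", "space"): "~",
    ("dead_acute", "space"): "'",
    ("dead_circumflex", "o"): "ô",
}


def feed(keysyms, table=COMPOSE_TABLE):
    """Return the list of strings produced by a stream of key symbol names."""
    # Every proper prefix of a sequence means "keep waiting for more keys".
    prefixes = {seq[:i] for seq in table for i in range(1, len(seq))}
    out, pending = [], ()
    for sym in keysyms:
        pending += (sym,)
        if pending in table:
            out.append(table[pending])  # complete sequence: emit its string
            pending = ()
        elif pending not in prefixes:
            pending = ()  # simplification: unknown sequences are just dropped
    return out


print(feed(["dead_circumflex", "o", "Multi_key", "minus", "space"]))
```

Ordinary keys that do not start a sequence are simply ignored in this model,
whereas a real system would pass them through unchanged.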

This got me thinking: this functionality is identical to how ibus-table IMs
work, and it would allow me to implement Chinese IMs in a way that requires no
extra software and which would presumably be compatible with a far greater
range of software, since the functionality is built into X11. The fact that the
default file is over 5000 lines long tells me that X11 is more than capable of
handling long tables.

My first step was to take the <a href="https://github.com/jdh8/ibus-boshiamy">ibus boshiamy implementation</a> I already have on my
computer and mutilate it into the above format using convoluted regex commands
and a lot of whack-a-mole to turn 46000 lines of:

aaa     100     鑫
aaa     99      龘
aaa     98      鑆

	into

<a> <a> <a>  <space>    : "鑫"
<a> <a> <a>  <1>        : "龘"
<a> <a> <a>  <2>        : "鑆"
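
For anyone who would rather not fight with regexes, the same conversion can be
sketched in a few lines of Python. This is not the procedure actually used
here. The ranking rule (the highest-frequency candidate is selected with
<space>, the following ones with <1>, <2>, ...) is inferred from the three
example rows above, and the function names and the punctuation map are
assumptions made for illustration.

```python
from collections import defaultdict

# Keysym names for punctuation keys, in case the table uses any (assumption;
# extend as needed). Letters and digits are already valid keysym names.
KEYSYM_NAMES = {",": "comma", ".": "period", "'": "apostrophe",
                "[": "bracketleft", "]": "bracketright"}


def keysym(ch):
    """Map one character of an input code to an X11 keysym name."""
    return KEYSYM_NAMES.get(ch, ch)


def to_compose(rows):
    """Turn 'code freq char' rows into Compose lines, best candidate first."""
    by_code = defaultdict(list)
    for row in rows:
        code, freq, char = row.split()
        by_code[code].append((int(freq), char))
    lines = []
    for code, candidates in by_code.items():
        candidates.sort(key=lambda c: -c[0])  # highest frequency first
        keys = " ".join(f"<{keysym(ch)}>" for ch in code)
        # Only <space> and <1>..<9> are used as selectors in this sketch.
        for rank, (_, char) in enumerate(candidates[:10]):
            selector = "space" if rank == 0 else str(rank)
            lines.append(f'{keys} <{selector}> : "{char}"')
    return lines


for line in to_compose(["aaa\t100\t鑫", "aaa\t99\t龘", "aaa\t98\t鑆"]):
    print(line)
```

Output characters such as a double quote or a backslash would need escaping
before being written into the quoted string, which this sketch does not do.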


To my surprise, after moving this file to ~/.XCompose it worked exactly as I
expected, with no lag. The only issue, then, is that there's no way to switch
between compose sets in xkb. This explains why the en_US.UTF-8 compose set is
so long: it has to handle essentially every possible dead-key or compose
sequence for every keyboard layout.

There's an easy solution to this though, which is to create a custom keyboard
layout where the keys are mapped to custom key symbols (xkb's layer of
abstraction above a physical keycode and below a text string) and have my
compose table use those as the inputs instead of qwerty keys.

Since I started this whole thing by messing with xkb layouts, it didn't take
long for me to edit the us layout into something like this:

    key <AD01>  {[       U9AD8,  Q              ]}; #　高
    key <AD02>  {[       U4E94,  W              ]}; #　五
    key <AD03>  {[       U4E00,  E              ]}; #　一
    key <AD04>  {[       U4E8C,  R              ]}; #　二
    key <AD05>  {[       U901A,  T              ]}; #　通
    key <AD06>  {[       U76CA,  Y              ]}; #　益
    key <AD07>  {[       U4EE5,  U              ]}; #　以
    key <AD08>  {[       U5F8C,  I              ]}; #　後
    key <AD09>  {[       U3007,  O              ]}; #　〇
    key <AD10>  {[       U5099,  P              ]}; #　備

And my compose table to look something like this:

<U5C0D> <U5C0D> <U5C0D> <U4E8C>  <space>        : "鑆"
<U5C0D> <U5C0D> <U5C0D>  <space>        : "鑫"
<U5C0D> <U5C0D> <U5C0D> <U8981>  <space>        : "龘"

Now if I set my keyboard layout to "boshiamy", it will be sending these custom
key symbols, which will be interpreted by my custom compose rules, and if I
switch it back to the us layout, the compose rules don't apply.
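
Rewriting the compose table to use the custom key symbols is a mechanical
substitution on the left-hand side of each rule. A minimal Python sketch,
assuming the table was first generated with plain QWERTY keysyms: the LAYOUT
dict only covers the ten top-row keys shown in the layout excerpt above, and
remap_line is a made-up helper name.

```python
import re

# Lowercase key -> custom keysym, copied from the top-row excerpt of the
# layout above. The remaining rows would have to be filled in the same way.
LAYOUT = {"q": "U9AD8", "w": "U4E94", "e": "U4E00", "r": "U4E8C", "t": "U901A",
          "y": "U76CA", "u": "U4EE5", "i": "U5F8C", "o": "U3007", "p": "U5099"}


def remap_line(line, layout=LAYOUT):
    """Swap the input keysyms of one Compose line for the layout's keysyms."""
    if line.lstrip().startswith("#"):
        return line  # leave comment lines untouched
    # Only the part before the first ':' holds the input key sequence.
    lhs, sep, rhs = line.partition(":")
    lhs = re.sub(r"<(\w+)>",
                 lambda m: f"<{layout.get(m.group(1), m.group(1))}>",
                 lhs)
    return lhs + sep + rhs


# "x" is a placeholder result, not a real Boshiamy mapping.
print(remap_line('<q> <w> <space> : "x"'))
```

Keysyms that are not in the dict, such as <space> or the digit selectors, are
left as they are, so the selection keys keep working under the custom layout.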

The only issue with this method is that support for user-specific keyboard
layouts is incredibly broken in xkb, so I had to add my custom layout to the
system xkb data directory. Otherwise this whole implementation would consist
entirely of two config files in the home directory.

Also, if you have ibus installed, make sure to check "use system keyboard
layout" in its settings, or else it'll keep switching your keyboard layout
around.

Also, for those unfamiliar with component-based input methods: unlike phonetic
input methods like Pinyin (derogatory) or Zhuyin (derogatory), the mapping from
key presses to characters has very few, if any, conflicts, and therefore the
system works in an open-loop way. You can easily use something like CangJie or
Boshiamy without the preview window or any sort of predictive text.


The files I've created and further reading are in a git repo <a href="https://hairydiode.xyz/cgit/xkb-boshiamy">here</a>