From df5df87632439a47d28214d3b155535259eec2ec Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Haoran=20S=2E=20Diao=20=28=E5=88=81=E6=B5=A9=E7=84=B6=29?=
<0@hairydiode.xyz>
Date: Tue, 19 Aug 2025 14:10:12 -0700
Subject: xbabuse.html
---
index.html | 1 +
xkbabuse.html | 132 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 133 insertions(+)
create mode 100644 xkbabuse.html
diff --git a/index.html b/index.html
index 5cb5df9..6a07b63 100644
--- a/index.html
+++ b/index.html
@@ -43,6 +43,7 @@ Where's all the other stuff you host from this domain?
My Mastodon Instance
Where's all the content?
Scroll Down
+[Abusing X11's xkb for fun and profit] 08-19-2025
[Four Corners Input Method for ibus-table] 11-25-2023
[We Have Unicode at Home] 6-30-2023
[Janky IME] 6-29-2023
diff --git a/xkbabuse.html b/xkbabuse.html
new file mode 100644
index 0000000..ae8ff5c
--- /dev/null
+++ b/xkbabuse.html
@@ -0,0 +1,132 @@
+
+
+Abusing X11's xkb for fun and profit
+
+
+
+
+
+
+
+--------------------------------------------------------------------------------
+
+>HairyDiode
+
+--------------------------------------------------------------------------------
+Abusing X11's xkb for fun and profit 08-19-2025
+--------------------------------------------------------------------------------
+
+Yesterday I was playing around with xkb keyboard layouts when I discovered
+that compose functionality for keyboard layouts of European (and other
+accent-using) languages is implemented as a 5227-line table that maps
+key symbol sequences to Unicode strings.
+
+ex: /usr/share/X11/locale/en_US.UTF-8/Compose on my computer
+# UTF-8 (Unicode) Compose sequences
+#
+# Spacing versions of accents (mostly)
+ : "~" asciitilde # TILDE
+ : "~" asciitilde # TILDE
+ : "~" asciitilde # TILDE
+ : "~" asciitilde # TILDE
+ : "'" apostrophe # APOSTROPHE
+...
+ : "ô" ocircumflex # LATIN SMALL LETTER O WITH CIRCUMFLEX
+...
+ : "⍠" U2360 # : ⎕ APL FUNCTIONAL SYMBOL QUAD COLON
+
+For those of you who are unaware: if you use, for example, a German keyboard
+layout, pressing "^" followed by "o" will produce "ô". The circumflex "^" key is
+called a "dead key" in xkb terminology because it does not produce any
+characters by itself. In addition, if you were to bind any key on your keyboard
+to "compose", it would allow you to type a wide range of Unicode
+characters via various sequences of key presses.
+
+This got me thinking: this functionality is identical to how ibus-table IMs work,
+and would allow me to implement Chinese IMs in a way that requires no extra
+software and which would presumably be compatible with a far greater range of
+software, since the functionality is built into X11. The fact that the default
+file is over 5000 lines long tells me that X11 is more than capable of handling
+long tables.
+
+My first step was to take the ibus boshiamy implementation I already have on my
+computer and mutilate it into the above format using convoluted regex commands
+and a lot of whack-a-mole to turn 46000 lines of:
+
+aaa 100 鑫
+aaa 99 龘
+aaa 98 鑆
+
+ into
+
+ : "鑫"
+ <1> : "龘"
+ <2> : "鑆"
+
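The regex commands used for this conversion are not shown, so here is only a minimal Python sketch of the same idea. It assumes rows of the form "code frequency character" as in the excerpt above. It also assumes, purely for illustration, that the most frequent candidate is selected with `<space>` and the rest with a digit; the selector keys the post actually used may differ. The names `table_to_compose` and `PUNCT_KEYSYMS` are hypothetical.

```python
from collections import defaultdict

# X11 keysym names for a few punctuation characters that may appear in codes.
PUNCT_KEYSYMS = {
    ",": "comma",
    ".": "period",
    "'": "apostrophe",
    "[": "bracketleft",
    "]": "bracketright",
}


def keysym(ch):
    """Return the X11 keysym name for one character of an input code."""
    if ch.isascii() and ch.isalnum():
        return ch  # letters and digits are their own keysym names
    return PUNCT_KEYSYMS[ch]


def table_to_compose(lines):
    """Convert 'code freq char' rows into Compose rules (assumed scheme)."""
    by_code = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or not parts[1].isdigit():
            continue  # skip headers, blank lines, and malformed rows
        code, freq, char = parts
        by_code[code].append((int(freq), char))

    rules = []
    for code, candidates in by_code.items():
        candidates.sort(key=lambda c: -c[0])  # most frequent first
        keys = " ".join(f"<{keysym(ch)}>" for ch in code)
        # Only ten selectors exist in this scheme (<space>, <1> .. <9>).
        for rank, (_, char) in enumerate(candidates[:10]):
            # Assumption: <space> picks the top candidate, a digit the others.
            selector = "<space>" if rank == 0 else f"<{rank}>"
            rules.append(f'{keys} {selector} : "{char}"')
    return rules


if __name__ == "__main__":
    sample = ["aaa 100 鑫", "aaa 99 龘", "aaa 98 鑆"]
    print("\n".join(table_to_compose(sample)))
```

Redirecting the output into ~/.XCompose would give a file in the same general shape as the excerpt above. Characters that need escaping inside a Compose string (a double quote or a backslash) are not handled in this sketch.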
+
+To my surprise, after moving this file to ~/.XCompose it worked exactly as I
+expected, with no lag. The only issue, then, is that there's no way to switch
+between compose sets in xkb. This explains why the en_US.UTF-8 compose set is
+so long: it has to handle essentially every possible dead-key or compose
+sequence for every keyboard layout.
+
+There's an easy solution to this though, which is to create a custom keyboard
+layout where the keys are mapped to custom key symbols (xkb's layer of
+abstraction above a physical keycode and below a text string) and have my
+compose table use those as the inputs instead of qwerty keys.
+
+Since I started this whole thing by messing with xkb layouts, it didn't take
+long for me to edit the us layout into something like this:
+
+ key <AD01> {[ U9AD8, Q ]}; # 高
+ key <AD02> {[ U4E94, W ]}; # 五
+ key <AD03> {[ U4E00, E ]}; # 一
+ key <AD04> {[ U4E8C, R ]}; # 二
+ key <AD05> {[ U901A, T ]}; # 通
+ key <AD06> {[ U76CA, Y ]}; # 益
+ key <AD07> {[ U4EE5, U ]}; # 以
+ key <AD08> {[ U5F8C, I ]}; # 後
+ key <AD09> {[ U3007, O ]}; # 〇
+ key <AD10> {[ U5099, P ]}; # 備
+
+And my compose table to look something like this:
+
+ : "鑆"
+ : "鑫"
+ : "龘"
+
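As a rough illustration of how a compose rule can be rewritten to use the custom key symbols, the sketch below maps each letter of an input code to the keysym assigned in the layout excerpt. Only the ten top-row keys from that excerpt are covered; the helper names (`CUSTOM_KEYSYMS`, `remap_sequence`, `remap_rule`) and the example character are hypothetical.

```python
# Unshifted keysyms for the top letter row, copied from the layout excerpt.
CUSTOM_KEYSYMS = {
    "q": "U9AD8", "w": "U4E94", "e": "U4E00", "r": "U4E8C", "t": "U901A",
    "y": "U76CA", "u": "U4EE5", "i": "U5F8C", "o": "U3007", "p": "U5099",
}


def remap_sequence(code, table=CUSTOM_KEYSYMS):
    """Turn an input code such as 'qwe' into a Compose key sequence."""
    # A KeyError here means the layout excerpt does not cover that key.
    return " ".join(f"<{table[ch]}>" for ch in code)


def remap_rule(code, char, table=CUSTOM_KEYSYMS):
    """Build one Compose rule that fires only under the custom layout."""
    return f'{remap_sequence(code, table)} : "{char}"'


if __name__ == "__main__":
    # "字" is a placeholder output, not an actual Boshiamy mapping.
    print(remap_rule("qwe", "字"))
```

Because the us layout never emits these keysyms, rules built this way stay inert until the custom layout is selected.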
+Now if I set my keyboard layout to "boshiamy", it will be sending these custom
+key symbols which will be interpreted by my custom compose rules, and if I
+switch it back to the us layout the compose rules don't apply.
+
+The only issue with this method is that the functionality for user-specific
+keyboard layouts is incredibly broken in xkb, so I had to add my
+custom layout to the system xkb data directory. Otherwise this whole
+implementation would consist entirely of two config files in the home directory.
+
+Also, if you have ibus installed, make sure to check "use system keyboard layout"
+in settings, or else it'll keep switching your keyboard layout around.
+
+Also, for those unfamiliar with component-based input methods: unlike phonetic
+input methods like Pinyin (derogatory) or Zhuyin (derogatory), the mapping from
+key presses to characters has very few if any conflicts, and therefore the
+system works in an open-loop way. You can easily use something like CangJie or
+Boshiamy without the preview window or without any sort of predictive text.
+
+
+The files I've created and further reading are in a git repo here
+
+
+
+
+
+
+
--
cgit v1.1