From df5df87632439a47d28214d3b155535259eec2ec Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Haoran=20S=2E=20Diao=20=28=E5=88=81=E6=B5=A9=E7=84=B6=29?= <0@hairydiode.xyz> Date: Tue, 19 Aug 2025 14:10:12 -0700 Subject: xbabuse.html --- index.html | 1 + xkbabuse.html | 132 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 133 insertions(+) create mode 100644 xkbabuse.html diff --git a/index.html b/index.html index 5cb5df9..6a07b63 100644 --- a/index.html +++ b/index.html @@ -43,6 +43,7 @@ Where's all the other stuff you host from this domain? My Mastodon Instance Where's all the content? Scroll Down +[Abusing X11's xkb for fun and profit] 08-19-2025 [Four Corners Input Method for ibus-table] 11-25-2023 [We Have Unicode at Home] 6-30-2023 [Janky IME] 6-29-2023 diff --git a/xkbabuse.html b/xkbabuse.html new file mode 100644 index 0000000..ae8ff5c --- /dev/null +++ b/xkbabuse.html @@ -0,0 +1,132 @@ + + +Abusing X11's xkb for fun and profit + + + + + +
+
+--------------------------------------------------------------------------------
+
+>HairyDiode                                             
+
+--------------------------------------------------------------------------------
+Abusing X11's xkb for fun and profit                              08-19-2025
+--------------------------------------------------------------------------------
+
+Yesterday I was playing around with xkb keyboard layouts when I discovered that
+compose functionality for European keyboard layouts (and those of other
+languages that use accents) is implemented as a 5227-line table that maps
+sequences of key symbols to Unicode strings.
+
+ex: /usr/share/X11/locale/en_US.UTF-8/Compose on my computer
+# UTF-8 (Unicode) Compose sequences
+#
+# Spacing versions of accents (mostly)
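+<dead_tilde> <space>            : "~"   asciitilde # TILDE
+<dead_tilde> <dead_tilde>       : "~"   asciitilde # TILDE
+<Multi_key> <minus> <space>     : "~"   asciitilde # TILDE
+<Multi_key> <space> <minus>     : "~"   asciitilde # TILDE
+<dead_acute> <space>            : "'"   apostrophe # APOSTROPHE
+...
+<dead_circumflex> <o>           : "ô"   ocircumflex # LATIN SMALL LETTER O WITH CIRCUMFLEX
+...
+<Multi_key> <colon> <U2395>     : "⍠"   U2360 # : ⎕ APL FUNCTIONAL SYMBOL QUAD COLON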
+
+For those of you who are unaware: if you use, for example, a German keyboard
+layout, pressing "^" followed by "o" will produce "ô". The circumflex "^" key is
+called a "dead key" in xkb terminology because it does not produce any
+characters by itself. In addition, if you bind any key on your keyboard to
+"compose", it lets you type a wide range of Unicode characters via various
+sequences of key presses.
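+
+A toy model of that lookup, as a sketch only: the table below is a tiny
+hand-written subset, the function name feed is made up for illustration, and
+real X11 passes unmatched keys through instead of dropping them.

```python
# Toy compose table: sequences of key symbol names -> output strings.
# Hand-picked subset for illustration, not the full system table.
COMPOSE = {
    ("dead_circumflex", "o"): "ô",
    ("Multi_key", "asciicircum", "o"): "ô",
    ("dead_tilde", "space"): "~",
}


def feed(keysyms, table=COMPOSE):
    """Return the strings produced by a stream of key symbol names."""
    out = []
    pending = ()
    for ks in keysyms:
        pending += (ks,)
        if pending in table:
            # Complete sequence: emit its string and start over.
            out.append(table[pending])
            pending = ()
        elif not any(seq[:len(pending)] == pending for seq in table):
            # No sequence starts this way: abandon it (simplification).
            pending = ()
    return out


print(feed(["dead_circumflex", "o"]))
print(feed(["Multi_key", "asciicircum", "o", "dead_tilde", "space"]))
```

+The dead key or compose key only starts a sequence; nothing is emitted until a
+full sequence from the table has been typed.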
+
+This got me thinking: this functionality is essentially identical to how
+ibus-table IMs work, and it would allow me to implement Chinese IMs in a way
+that requires no extra software and which would presumably be compatible with
+a far greater range of software, since the functionality is built into X11. The
+fact that the default file is over 5000 lines long tells me that X11 is more
+than capable of handling long tables.
+
+My first step was to take the ibus boshiamy implementation I already have on my
+computer and mutilate it into the above format using convoluted regex commands
+and a lot of whack-a-mole to turn 46000 lines of:
+
+aaa     100     鑫
+aaa     99      龘
+aaa     98      鑆
+
+	into
+
+        : "鑫"
+    <1>        : "龘"
+    <2>        : "鑆"
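+
+For anyone who would rather not do this with regexes, here is a rough Python
+sketch of the same kind of conversion. It is not the commands actually used.
+It assumes each table line is "code weight character" as shown above, ranks
+candidates for a code by weight, and ends every sequence with a selector key.
+The <space> selector for the top candidate is an assumption: as far as I know,
+a bare code would be a prefix of the longer numbered sequences, which compose
+parsers treat as a conflict. The names table_to_compose and PUNCT are made up.

```python
from collections import defaultdict

# Key symbol names for a few non-alphanumeric code characters.
# Hypothetical subset; extend it for whatever your table actually uses.
PUNCT = {",": "comma", ".": "period", "'": "apostrophe",
         "[": "bracketleft", "]": "bracketright"}


def keysym(ch):
    """Key symbol name for one character of an input code."""
    if ch.isascii() and ch.isalnum():
        return ch  # letters and digits are their own key symbol names
    return PUNCT[ch]


def table_to_compose(lines, first_suffix="space"):
    """Turn 'code weight char' lines into Compose-style lines."""
    groups = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip headers, blanks, and anything unexpected
        code, weight, char = parts
        groups[code].append((int(weight), char))

    out = []
    for code, cands in groups.items():
        cands.sort(key=lambda c: -c[0])  # highest weight first
        prefix = " ".join(f"<{keysym(c)}>" for c in code)
        for rank, (_, char) in enumerate(cands):
            if rank > 9:
                break  # only single-digit selector keys in this sketch
            suffix = first_suffix if rank == 0 else str(rank)
            out.append(f'{prefix} <{suffix}> : "{char}"')
    return out


for entry in table_to_compose(["aaa 100 鑫", "aaa 99 龘", "aaa 98 鑆"]):
    print(entry)
```

+Characters that need escaping inside a Compose string (a double quote or a
+backslash) are not handled here.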
+
+
+To my surprise, after moving this file to ~/.XCompose it worked exactly as I
+expected with no lag. The only issue, then, is that there's no way to switch
+between compose sets in xkb. This explains why the en_US.UTF-8 compose set is
+so long: it essentially has to handle every possible dead-key or compose
+sequence for every keyboard layout.
+
+There's an easy solution to this though, which is to create a custom keyboard
+layout where the keys are mapped to custom key symbols (xkb's layer of
+abstraction above a physical keycode and below a text string) and have my
+compose table use those as the inputs instead of qwerty keys.
+
+Since I started this whole thing by messing with xkb layouts, it didn't take
+long for me to edit the us layout into something like this:
+
+    key <AD01> {[       U9AD8,  Q              ]}; # 高
+    key <AD02> {[       U4E94,  W              ]}; # 五
+    key <AD03> {[       U4E00,  E              ]}; # 一
+    key <AD04> {[       U4E8C,  R              ]}; # 二
+    key <AD05> {[       U901A,  T              ]}; # 通
+    key <AD06> {[       U76CA,  Y              ]}; # 益
+    key <AD07> {[       U4EE5,  U              ]}; # 以
+    key <AD08> {[       U5F8C,  I              ]}; # 後
+    key <AD09> {[       U3007,  O              ]}; # 〇
+    key <AD10> {[       U5099,  P              ]}; # 備
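+
+The Uxxxx names are just "U" followed by the character's code point in hex, so
+lines like these can be generated rather than typed by hand. A small sketch,
+assuming the top letter row uses the key names AD01 through AD10 as in the
+stock us layout. The helper names unicode_keysym and key_lines are made up.

```python
# xkb key names for the top letter row of the us layout (Q through P).
TOP_ROW = [f"AD{i:02d}" for i in range(1, 11)]


def unicode_keysym(char):
    """Name of the Unicode key symbol for one character, e.g. U9AD8."""
    return f"U{ord(char):04X}"


def key_lines(chars, shifted, names=TOP_ROW):
    """Build one xkb 'key' line per (key name, character, shifted symbol)."""
    return [
        f"key <{name}> {{[ {unicode_keysym(c)}, {s} ]}}; # {c}"
        for name, c, s in zip(names, chars, shifted)
    ]


for line in key_lines("高五一二通益以後〇備", "QWERTYUIOP"):
    print(line)
```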
+
+And my compose table to look something like this:
+
+             : "鑆"
+            : "鑫"
+             : "龘"
+
+Now if I set my keyboard layout to "boshiamy", it will be sending these custom
+key symbols which will be interpreted by my custom compose rules, and if I
+switch it back to the us layout the compose rules don't apply.
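+
+Pointing an existing compose table at the custom key symbols is a mechanical
+rewrite of the left-hand side of each rule. A sketch, assuming rules of the
+form "<key> <key> ... : "string"" and a partial key map taken from the layout
+above (q to U9AD8, w to U4E94). The rule in the example is invented purely to
+show the rewrite, and remap_sequence is a made-up name.

```python
import re

# Partial map from the original key symbol names to the custom ones,
# following the layout above. The full map would cover every remapped key.
KEYMAP = {"q": "U9AD8", "w": "U4E94"}


def remap_sequence(line, keymap=KEYMAP):
    """Rewrite the key symbols before the colon and keep the rest as is."""
    lhs, sep, rhs = line.partition(":")
    lhs = re.sub(
        r"<([^>]+)>",
        lambda m: f"<{keymap.get(m.group(1), m.group(1))}>",
        lhs,
    )
    return lhs + sep + rhs


# Invented rule, only to show the rewrite.
print(remap_sequence('<q> <w> <1> : "例"'))
```

+Key symbols missing from the map, such as the digit selector here, are left
+unchanged.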
+
+The only issue with this method is that the functionality for user-specific
+keyboard layouts is incredibly broken in xkb, so I had to add my custom layout
+to the system xkb data directory. Otherwise this whole implementation would
+consist entirely of two config files in the home directory.
+
+Also, if you have ibus installed, make sure to check "use system keyboard
+layout" in its settings, or else it'll keep switching your keyboard layout
+around.
+
+Also, for those unfamiliar with component-based input methods: unlike phonetic
+input methods like Pinyin (derogatory) or Zhuyin (derogatory), the mapping from
+key presses to characters has very few if any conflicts, and therefore the
+system works in an open-loop way. You can easily use something like CangJie or
+Boshiamy without the preview window or without any sort of predictive text.
+
+
+The files I've created and further reading are in a git repo here
+
+
+
+
+
+ + -- cgit v1.1