author Haoran S. Diao (刁浩然) <0@hairydiode.xyz> 2023-06-29 11:10:16 -0700
committer Haoran S. Diao (刁浩然) <0@hairydiode.xyz> 2023-06-29 11:10:16 -0700
commit 4c3eafd541061856c965121c3105b462d4c60b1c
tree f9b07aeaeb12da20505be3dbe5b030618f915f94
parent 2d9995e39ba06ad3ab743ab8061b2bc33f096dde
Added page on my jank ime implementation
-rw-r--r-- cont/jankime.html 118
-rw-r--r-- index.html 11
-rw-r--r-- jankime.html 143
3 files changed, 267 insertions, 5 deletions
diff --git a/cont/jankime.html b/cont/jankime.html
new file mode 100644
index 0000000..96241dc
--- /dev/null
+++ b/cont/jankime.html
@@ -0,0 +1,118 @@
+<!--
+123456789-223456789-323456789-423456789-523456789-623456789-723456789-8234567890
+一二三四-->[TITLE] [DATE]
+--------------------------------------------------------------------------------
+[SETTITLE]Janky IME
+[SETDATE]6-29-2023
+
+A new Python version came out, so of course that means every Python package on
+my rolling-release system has broken. This includes ibus, which I need for my
+input method. I'm currently running some web-crawling scripts that I don't want
+to stop, so while I wait for my machine to finish downloading all of <a href="https://www.rcsb.org/">PDB</a>, I decided
+to write a janky bash script implementation of ibus table so that I can still use
+嘸蝦米.
+
+Background:
+ 嘸蝦米 (EN: Boshiamy) is a proprietary component-based input method for
+ Chinese. They offer paid software on iOS, Android, and Windows, but no
+ Linux version is available. On my phone I gladly pay for a Boshiamy
+ license, but on Linux I use this implementation of Boshiamy using ibus
+ table from <a href="https://github.com/jdh8/ibus-boshiamy">here</a>.
+
+ However, I urge people to pay for the license anyway, as most of the
+ actual work in creating an IME is working out how to decompose
+ characters into components and compiling input codes from that, and
+ not the technical implementation I'm doing here, which is fairly
+ trivial.
+
+ In technical terms, it's just a very large TSV file with the first
+ column being the input code, the 2nd being the character, and the 3rd
+ being a ranking for ordering which character comes first when selecting
+ them.
+
+ EX:
+ typing wso[SPACE] or wso1 inputs 浩, typing wso2 inputs 澢
+ wso 浩 100
+ wso 澢 99
+
+ It can also be used for non-Chinese characters such as è
+ ,ne è 100
+
+The Implementation:
+ I want this to work in the terminal, I want it to require only bash and
+ xorg, and it needs to work with all programs running in the terminal,
+ regardless of whether they use cooked input (bash) or raw input (vim).
+
+ What I ended up with is a bash script that runs in a separate st
+ terminal window, which uses grep to find the character, and inputs it
+ using xdotool.
+
+ First we have a script, called from a desktop environment shortcut
+ (in my case i3), that finds the current xorg window id and launches the
+ ime in a separate st window, with that window id as the only
+ argument for the ime script.
+
+ <a href="https://hairydiode.xyz/cgit/dotfiles.git/tree/scripts/iml">CODE:</a>
+ #!/bin/bash
+ win=$(xdotool getactivewindow)
+ st -e ims "$win" &
+ exit
+
+ Input is read one character at a time with read, in a loop.
+
+ <a href="https://hairydiode.xyz/cgit/dotfiles.git/tree/scripts/ims">CODE:</a>
+ OIFS=$IFS
+ export IFS=""; read -rsn1 i
+ IFS=$OIFS
+
+ IFS="" is set so that read returns spaces as input, but this makes the
+ implementation very brittle and probably not portable across bash
+ versions. It also passes along control and special characters such as
+ delete and move left, but differences in how these sequences are
+ interpreted as xinput input for the terminal vs. for terminal programs,
+ as well as differences between terminal emulators, mean that this
+ doesn't work very well. For vim on cool-retro-term, for example, tab,
+ escape and the arrow keys work, but backspace is interpreted as a delete.
+
+ I then simply run grep "^$code\s", rearrange the columns with awk, sort,
+ then take out the ranking column.
+
+ <a href="https://hairydiode.xyz/cgit/dotfiles.git/tree/scripts/ims">CODE:</a>
+ opt=$(grep "^$code\s" ~/lang/zh/boshiamy/ibus-boshiamy/boshiamy.txt |\
+ #remove simplified
+ grep -v 98|\
+ awk '{print $3" "$2}' |\
+ sort -nr|\
+ awk '{print $2}')
+
+ Finally, it inputs the selected character if the input is 1-9 or Space,
+ using xdotool and the window id of the original window. Note that bash
+ word-splits the unquoted $opt passed to echo, so the conversion of the
+ candidates from line-separated to space-separated is done for free.
+ However, this relies on the default IFS and makes the code less portable.
+
+ <a href="https://hairydiode.xyz/cgit/dotfiles.git/tree/scripts/ims">CODE:</a>
+ char=$(echo $opt | awk "{print \$1}")
+ ...
+ xdotool type --window "$1" "$char"
+
+Downsides:
+ Obviously this implementation sucks, but it's meant to be a backup for
+ when all I have working are bash and xorg.
+
+ One issue is that xdotool seems to have a fair bit of latency, and on
+ certain terminal combinations seems to skip input altogether. I think
+ fiddling with the input delay argument in the future might fix this.
+
+ Another is that this method won't work on things like browsers, because
+ they only take input when in focus (at least on my desktop environment).
+
+The Future:
+ What I really wanted to make initially was essentially tmux with an
+ input method, where I would run a bash script in a terminal that would
+ itself pretend to be a terminal, and pass along input after going
+ through an input method. This would've had the added benefit of running
+ in the Linux console as well (although by default the Linux console
+ cannot display fonts with more than 512 characters). The output could've
+ also been processed such that it is displayed with Unicode braille
+ characters, which would've fixed the font issue.
diff --git a/index.html b/index.html
index bca6907..8187e93 100644
--- a/index.html
+++ b/index.html
@@ -44,19 +44,20 @@ Where's all the other stuff you host from this domain?
<a href="https://social.hairydiode.xyz">My Mastodon Instance</a>
Where's all the content?
Scroll Down
-<a href="https://hairydiode.xyz/matrix">[Matrix Homeserver] 3-17-2019</a>
-<a href="https://hairydiode.xyz/mit">[MIT Decisions Countdown Clock] 3-09-2019</a>
+<a href="https://hairydiode.xyz/jankime">[Janky IME] 6-29-2023</a>
<a href="https://hairydiode.xyz/key">[PGP Public Key] 6-26-2018</a>
-<a href="https://hairydiode.xyz/csc-new">[Control Systems Club Web Controlled Servo Instructions] 1-24-2019</a>
<a href="https://hairydiode.xyz/csc-workflow">[Control Systems Club Project Workflow] 1-24-2019</a>
-<a href="https://hairydiode.xyz/iokalant">[𘤝𘤞𘤀𘤛・𘤌𘤛𘤧𘤁・𘤊𘤡・𘤈𘤝-Iokalant Writing System-优卡文字系] 1-26-2018</a>
+<a href="https://hairydiode.xyz/matrix">[Matrix Homeserver] 3-17-2019</a>
+<a href="https://hairydiode.xyz/orion">[Leaving ORION] 2-9-2018</a>
<a href="https://hairydiode.xyz/meta2">[Moving This Site] 6-26-2018</a>
<a href="https://hairydiode.xyz/meta">[Making This Site] 11-13-2017</a>
+<a href="https://hairydiode.xyz/mit">[MIT Decisions Countdown Clock] 3-09-2019</a>
<a href="https://hairydiode.xyz/omnicom">[Omnicom Writeup] 1-12-2018</a>
-<a href="https://hairydiode.xyz/orion">[Leaving ORION] 2-9-2018</a>
<a href="https://hairydiode.xyz/csc">[Control Systems Club] 2-21-2018</a>
+<a href="https://hairydiode.xyz/csc-new">[Control Systems Club Web Controlled Servo Instructions] 1-24-2019</a>
<a href="https://hairydiode.xyz/doodle">[Some Doodles] 2-27-2018</a>
<a href="https://hairydiode.xyz/earbud">[Earbud Holders] 3-7-2018</a>
+<a href="https://hairydiode.xyz/iokalant">[𘤝𘤞𘤀𘤛・𘤌𘤛𘤧𘤁・𘤊𘤡・𘤈𘤝-Iokalant Writing System-优卡文字系] 1-26-2018</a>
</pre>
</div>
<br>
diff --git a/jankime.html b/jankime.html
new file mode 100644
index 0000000..d65585a
--- /dev/null
+++ b/jankime.html
@@ -0,0 +1,143 @@
+<!DOCTYPE html>
+<head>
+<title>Janky IME</title>
+<meta charset="utf-8"/>
+<link rel="stylesheet" href="https://hairydiode.xyz/style.css"/>
+<link rel="icon" type="image/png" href="https://hairydiode.xyz/img/fav/logo.png"/>
+</head>
+<body>
+<div class="content">
+<pre>
+<!--
+123456789-223456789-323456789-423456789-523456789-623456789-723456789-8234567890
+一二三四
+-->--------------------------------------------------------------------------------
+
+<a href="https://hairydiode.xyz">>HairyDiode</a>
+
+--------------------------------------------------------------------------------
+<!--
+123456789-223456789-323456789-423456789-523456789-623456789-723456789-8234567890
+一二三四-->Janky IME 6-29-2023
+--------------------------------------------------------------------------------
+
+A new Python version came out, so of course that means every Python package on
+my rolling-release system has broken. This includes ibus, which I need for my
+input method. I'm currently running some web-crawling scripts that I don't want
+to stop, so while I wait for my machine to finish downloading all of <a href="https://www.rcsb.org/">PDB</a>, I decided
+to write a janky bash script implementation of ibus table so that I can still use
+嘸蝦米.
+
+Background:
+ 嘸蝦米 (EN: Boshiamy) is a proprietary component-based input method for
+ Chinese. They offer paid software on iOS, Android, and Windows, but no
+ Linux version is available. On my phone I gladly pay for a Boshiamy
+ license, but on Linux I use this implementation of Boshiamy using ibus
+ table from <a href="https://github.com/jdh8/ibus-boshiamy">here</a>.
+
+ However, I urge people to pay for the license anyway, as most of the
+ actual work in creating an IME is working out how to decompose
+ characters into components and compiling input codes from that, and
+ not the technical implementation I'm doing here, which is fairly
+ trivial.
+
+ In technical terms, it's just a very large TSV file with the first
+ column being the input code, the 2nd being the character, and the 3rd
+ being a ranking for ordering which character comes first when selecting
+ them.
+
+ EX:
+ typing wso[SPACE] or wso1 inputs 浩, typing wso2 inputs 澢
+ wso 浩 100
+ wso 澢 99
+
+ It can also be used for non-Chinese characters such as è
+ ,ne è 100
+
+The Implementation:
+ I want this to work in the terminal, I want it to require only bash and
+ xorg, and it needs to work with all programs running in the terminal,
+ regardless of whether they use cooked input (bash) or raw input (vim).
+
+ What I ended up with is a bash script that runs in a separate st
+ terminal window, which uses grep to find the character, and inputs it
+ using xdotool.
+
+ First we have a script, called from a desktop environment shortcut
+ (in my case i3), that finds the current xorg window id and launches the
+ ime in a separate st window, with that window id as the only
+ argument for the ime script.
+
+ <a href="https://hairydiode.xyz/cgit/dotfiles.git/tree/scripts/iml">CODE:</a>
+ #!/bin/bash
+ win=$(xdotool getactivewindow)
+ st -e ims "$win" &
+ exit
+
+ Input is read one character at a time with read, in a loop.
+
+ <a href="https://hairydiode.xyz/cgit/dotfiles.git/tree/scripts/ims">CODE:</a>
+ OIFS=$IFS
+ export IFS=""; read -rsn1 i
+ IFS=$OIFS
+
+ IFS="" is set so that read returns spaces as input, but this makes the
+ implementation very brittle and probably not portable across bash
+ versions. It also passes along control and special characters such as
+ delete and move left, but differences in how these sequences are
+ interpreted as xinput input for the terminal vs. for terminal programs,
+ as well as differences between terminal emulators, mean that this
+ doesn't work very well. For vim on cool-retro-term, for example, tab,
+ escape and the arrow keys work, but backspace is interpreted as a delete.
+
+ I then simply run grep "^$code\s", rearrange the columns with awk, sort,
+ then take out the ranking column.
+
+ <a href="https://hairydiode.xyz/cgit/dotfiles.git/tree/scripts/ims">CODE:</a>
+ opt=$(grep "^$code\s" ~/lang/zh/boshiamy/ibus-boshiamy/boshiamy.txt |\
+ #remove simplified
+ grep -v 98|\
+ awk '{print $3" "$2}' |\
+ sort -nr|\
+ awk '{print $2}')
+
+ Finally, it inputs the selected character if the input is 1-9 or Space,
+ using xdotool and the window id of the original window. Note that bash
+ word-splits the unquoted $opt passed to echo, so the conversion of the
+ candidates from line-separated to space-separated is done for free.
+ However, this relies on the default IFS and makes the code less portable.
+
+ <a href="https://hairydiode.xyz/cgit/dotfiles.git/tree/scripts/ims">CODE:</a>
+ char=$(echo $opt | awk "{print \$1}")
+ ...
+ xdotool type --window "$1" "$char"
+
+Downsides:
+ Obviously this implementation sucks, but it's meant to be a backup for
+ when all I have working are bash and xorg.
+
+ One issue is that xdotool seems to have a fair bit of latency, and on
+ certain terminal combinations seems to skip input altogether. I think
+ fiddling with the input delay argument in the future might fix this.
+
+ Another is that this method won't work on things like browsers, because
+ they only take input when in focus (at least on my desktop environment).
+
+The Future:
+ What I really wanted to make initially was essentially tmux with an
+ input method, where I would run a bash script in a terminal that would
+ itself pretend to be a terminal, and pass along input after going
+ through an input method. This would've had the added benefit of running
+ in the Linux console as well (although by default the Linux console
+ cannot display fonts with more than 512 characters). The output could've
+ also been processed such that it is displayed with Unicode braille
+ characters, which would've fixed the font issue.
+</pre>
+</div>
+<br>
+<br>
+</body>
+<!--
+if you're digging in the src you might be interested in how this site works
+here: https://hairydiode.xyz/meta2
+-->