From 4c3eafd541061856c965121c3105b462d4c60b1c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Haoran=20S=2E=20Diao=20=28=E5=88=81=E6=B5=A9=E7=84=B6=29?= <0@hairydiode.xyz>
Date: Thu, 29 Jun 2023 11:10:16 -0700
Subject: Added page on my jank ime implementation

---
 cont/jankime.html | 118 ++++++++++++++++++++++++++++++++++++++++++++
 index.html        |  11 +++--
 jankime.html      | 143 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 267 insertions(+), 5 deletions(-)
 create mode 100644 cont/jankime.html
 create mode 100644 jankime.html

diff --git a/cont/jankime.html b/cont/jankime.html
new file mode 100644
index 0000000..96241dc
--- /dev/null
+++ b/cont/jankime.html
@@ -0,0 +1,118 @@
+[TITLE] [DATE]
+--------------------------------------------------------------------------------
+[SETTITLE]Janky IME
+[SETDATE]6-29-2023
+
+A new Python version came out, which of course means that every Python package
+on my rolling-release system is broken. This includes ibus, which I need for my
+input method. I'm currently running some web-crawling scripts that I don't want
+to stop, so while I wait for my machine to finish downloading all of the PDB, I
+decided to write a janky bash-script implementation of ibus-table so that I can
+still use 嘸蝦米.
+
+Background:
+	嘸蝦米 (EN: Boshiamy) is a proprietary component-based input method for
+	Chinese. Its developers offer paid software for iOS, Android, and
+	Windows, but no Linux version is available. On my phone I gladly pay
+	for a Boshiamy license, but on Linux I use this ibus-table
+	implementation of Boshiamy from here.
+
+	However, I urge people to pay for the license anyway, as most of the
+	actual work in creating an IME lies in figuring out how to decompose
+	characters into components and compiling input codes from that, not in
+	the technical implementation I'm doing here, which is fairly trivial.
+
+	In technical terms, the table is just a very large TSV file: the first
+	column is the input code, the second is the character, and the third
+	is a ranking that decides which character comes first when choosing
+	among candidates.
+
+	EX:
+		typing wso[SPACE] or wso1 inputs 浩; typing wso2 inputs 澢
+			wso     浩      100
+			wso     澢      99
+
+		It can also be used for non-Chinese characters such as è:
+			,ne     è       100
+
+The Implementation:
+	I wanted this to work in the terminal, to require only bash and Xorg,
+	and to work with every program running in the terminal, regardless of
+	whether it uses cooked input (bash) or raw input (vim).
+
+	What I ended up with is a bash script that runs in a separate st
+	terminal window, uses grep to look up the character, and types it into
+	the original window using xdotool.
+
+	First, a launcher script bound to a desktop-environment shortcut (i3 in
+	my case) gets the ID of the currently focused X window and launches the
+	IME script (ims) in a separate st window, passing that window ID as its
+	only argument:
+
+	CODE:
+		#!/bin/bash
+		win=$(xdotool getactivewindow)
+		st -e ims "$win" &
+		exit
+
+	Input is read one character at a time with read, in a loop:
+
+	CODE:
+		OIFS=$IFS
+		export IFS=""; read -rsn1 i
+		IFS=$OIFS
+
+	Setting IFS="" makes read return spaces instead of discarding them, but
+	it makes this implementation brittle and possibly not portable across
+	bash versions. This also passes along control and special characters
+	such as Delete and cursor-left, but because these sequences are
+	interpreted differently as X input to the terminal than as input to
+	programs running in it, and because terminal emulators differ, this
+	doesn't work very well. For example, with vim in cool-retro-term, Tab,
+	Escape, and the arrow keys work, but Backspace is interpreted as Delete.
+
+	I then simply run grep "^$code\s", rearrange the columns with awk so the
+	rank comes first, sort by rank (highest first), and then strip the
+	ranking column:
+
+	CODE:
+		opt=$(grep "^$code\s" ~/lang/zh/boshiamy/ibus-boshiamy/boshiamy.txt |\
+					# remove simplified characters
+					grep -v 98|\
+					awk '{print $3" "$2}' |\
+					sort -nr|\
+					awk '{print $2}')
+
+	Finally, if the key pressed is 1-9 or Space, it types the selected
+	character into the original window with xdotool, using the window ID
+	passed in as $1. Note that because $opt is expanded unquoted in
+	echo $opt, word splitting turns the newline-separated candidates into a
+	space-separated list for free. However, this makes the code less
+	portable.
+
+	CODE:
+		# $opt is left unquoted on purpose: word splitting flattens the
+		# newline-separated candidates into one space-separated line
+		char=$(echo $opt | awk '{print $1}')
+		...
+		xdotool type --window "$1" "$char"
+
+Downsides:
+	Obviously this implementation sucks, but it's only meant as a backup for
+	when bash and Xorg are all I have working.
+
+	One issue is that xdotool seems to have a fair bit of latency, and with
+	certain terminal combinations it seems to skip input altogether. I think
+	tweaking its --delay option might fix this in the future.
+
+	Another is that this method doesn't work with programs like browsers,
+	because they only accept input while focused (at least on my desktop
+	environment), and while I'm typing, the IME's st window has the focus.
+
+The Future:
+	What I originally wanted to make was essentially tmux with an input
+	method: a bash script running in a terminal that would itself pretend
+	to be a terminal, passing input along after it went through the input
+	method. This would have had the added benefit of also working in the
+	Linux console (although by default the Linux console cannot display
+	fonts with more than 512 glyphs). The output could also have been
+	processed so that it is drawn with Unicode braille characters, which
+	would have worked around the font issue.
diff --git a/index.html b/index.html
index bca6907..8187e93 100644
--- a/index.html
+++ b/index.html
@@ -44,19 +44,20 @@ Where's all the other stuff you host from this domain?
 My Mastodon Instance
 Where's all the content? Scroll Down
-[Matrix Homeserver] 3-17-2019
-[MIT Decisions Countdown Clock] 3-09-2019
+[Janky IME] 6-29-2023
 [PGP Public Key] 6-26-2018
-[Control Systems Club Web Controlled Servo Instructions] 1-24-2019
 [Control Systems Club Project Workflow] 1-24-2019
-[𘤝𘤞𘤀𘤛・𘤌𘤛𘤧𘤁・𘤊𘤡・𘤈𘤝-Iokalant Writing System-优卡文字系] 1-26-2018
+[Matrix Homeserver] 3-17-2019
+[Leaving ORION] 2-9-2018
 [Moving This Site] 6-26-2018
 [Making This Site] 11-13-2017
+[MIT Decisions Countdown Clock] 3-09-2019
 [Omnicom Writeup] 1-12-2018
-[Leaving ORION] 2-9-2018
 [Control Systems Club] 2-21-2018
+[Control Systems Club Web Controlled Servo Instructions] 1-24-2019
 [Some Doodles] 2-27-2018
 [Earbud Holders] 3-7-2018
+[𘤝𘤞𘤀𘤛・𘤌𘤛𘤧𘤁・𘤊𘤡・𘤈𘤝-Iokalant Writing System-优卡文字系] 1-26-2018
diff --git a/jankime.html b/jankime.html
new file mode 100644
index 0000000..d65585a
--- /dev/null
+++ b/jankime.html
@@ -0,0 +1,143 @@
+Janky IME
+
+--------------------------------------------------------------------------------
+
+HairyDiode
+
+--------------------------------------------------------------------------------
+Janky IME                                                      6-29-2023
+--------------------------------------------------------------------------------
+
+A new Python version came out, which of course means that every Python package
+on my rolling-release system is broken. This includes ibus, which I need for my
+input method. I'm currently running some web-crawling scripts that I don't want
+to stop, so while I wait for my machine to finish downloading all of the PDB, I
+decided to write a janky bash-script implementation of ibus-table so that I can
+still use 嘸蝦米.
+
+Background:
+	嘸蝦米 (EN: Boshiamy) is a proprietary component-based input method for
+	Chinese. Its developers offer paid software for iOS, Android, and
+	Windows, but no Linux version is available. On my phone I gladly pay
+	for a Boshiamy license, but on Linux I use this ibus-table
+	implementation of Boshiamy from here.
+
+	However, I urge people to pay for the license anyway, as most of the
+	actual work in creating an IME lies in figuring out how to decompose
+	characters into components and compiling input codes from that, not in
+	the technical implementation I'm doing here, which is fairly trivial.
+
+	In technical terms, the table is just a very large TSV file: the first
+	column is the input code, the second is the character, and the third
+	is a ranking that decides which character comes first when choosing
+	among candidates.
+	
+	EX: 
+		typing wso[SPACE] or wso1 inputs 浩; typing wso2 inputs 澢
+			wso     浩      100
+			wso     澢      99
+
+		It can also be used for non-Chinese characters such as è:
+			,ne     è       100
+
+The Implementation:
+	I wanted this to work in the terminal, to require only bash and Xorg,
+	and to work with every program running in the terminal, regardless of
+	whether it uses cooked input (bash) or raw input (vim).
+
+	What I ended up with is a bash script that runs in a separate st
+	terminal window, uses grep to look up the character, and types it into
+	the original window using xdotool.
+
+	First, a launcher script bound to a desktop-environment shortcut (i3 in
+	my case) gets the ID of the currently focused X window and launches the
+	IME script (ims) in a separate st window, passing that window ID as its
+	only argument:
+
+	CODE:
+		#!/bin/bash
+		win=$(xdotool getactivewindow)
+		st -e ims "$win" &
+		exit
+	
+	Input is read one character at a time with read, in a loop:
+
+	CODE:
+		OIFS=$IFS
+		export IFS=""; read -rsn1 i
+		IFS=$OIFS
+
+	Setting IFS="" makes read return spaces instead of discarding them, but
+	it makes this implementation brittle and possibly not portable across
+	bash versions. This also passes along control and special characters
+	such as Delete and cursor-left, but because these sequences are
+	interpreted differently as X input to the terminal than as input to
+	programs running in it, and because terminal emulators differ, this
+	doesn't work very well. For example, with vim in cool-retro-term, Tab,
+	Escape, and the arrow keys work, but Backspace is interpreted as Delete.
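+
+	A minimal sketch of a single-key read in bash. Writing IFS= directly in
+	front of read changes IFS for that one command only, so there is no
+	need to save and restore it by hand. With the default IFS, a lone space
+	read this way comes back as an empty string, because read strips
+	leading and trailing IFS whitespace. read_key is a hypothetical helper
+	name, not part of the script above.
```bash
#!/usr/bin/env bash
# read_key (hypothetical helper): read exactly one character from stdin.
#   IFS=  applies to this read only, so a space is kept, not stripped
#   -r    keep backslashes literally
#   -s    don't echo the key when stdin is a terminal
#   -n1   return after a single character
read_key() {
  local key
  IFS= read -rsn1 key || return 1   # non-zero status at end of input
  printf '%s' "$key"
}
```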
+
+	I then simply run grep "^$code\s", rearrange the columns with awk so the
+	rank comes first, sort by rank (highest first), and then strip the
+	ranking column:
+
+	CODE:
+		opt=$(grep "^$code\s" ~/lang/zh/boshiamy/ibus-boshiamy/boshiamy.txt |\
+					# remove simplified characters
+					grep -v 98|\
+					awk '{print $3" "$2}' |\
+					sort -nr|\
+					awk '{print $2}')
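+
+	A sketch of a more defensive version of the same lookup, assuming the
+	table is tab-separated (code, character, rank). It compares the code as
+	a plain string in awk instead of splicing it into a grep regex, so codes
+	containing regex characters such as . or [ only match literally. The
+	grep -v 98 above drops any line containing "98" anywhere; here it
+	becomes an optional rank to skip, which assumes the 98 refers to the
+	rank column. lookup and its arguments are hypothetical names.
```bash
#!/usr/bin/env bash
# lookup CODE TABLE [SKIP_RANK]  (hypothetical helper)
# Print every character whose code is exactly CODE, highest rank first.
# TABLE is assumed to be tab-separated: code<TAB>character<TAB>rank.
lookup() {
  local code=$1 table=$2 skip=${3-}
  # Pass values through the environment so awk sees them verbatim
  # (awk -v would interpret backslash escapes in them).
  CODE=$code SKIP=$skip awk -F'\t' '
    $1 == ENVIRON["CODE"] && (ENVIRON["SKIP"] == "" || $3 != ENVIRON["SKIP"]) {
      print $3 "\t" $2          # rank first, so sort can key on it
    }' "$table" |
    sort -t $'\t' -k1,1nr |     # numeric, descending by rank
    cut -f2                     # keep only the character column
}
```
+	Because the code is matched as a whole field, a lookup for wso no
+	longer needs the \s anchor to avoid also matching longer codes.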
+
+	Finally, if the key pressed is 1-9 or Space, it types the selected
+	character into the original window with xdotool, using the window ID
+	passed in as $1. Note that because $opt is expanded unquoted in
+	echo $opt, word splitting turns the newline-separated candidates into a
+	space-separated list for free. However, this makes the code less
+	portable.
+
+	CODE:
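+		# $opt is left unquoted on purpose: word splitting flattens the
+		# newline-separated candidates into one space-separated line
+		char=$(echo $opt | awk '{print $1}')
+		...
+		xdotool type --window "$1" "$char"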
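+
+	Putting the pieces together, a sketch of the whole read/look-up/type
+	loop under a few assumptions: a tiny inline sample stands in for the
+	real table, Space and 1-9 select a candidate, and any other key extends
+	the code. candidates, emit and ime_run are hypothetical names, and emit
+	only prints so the loop can be tried without X; in the real script it
+	would be the xdotool type --window call. Candidates go into an array
+	with mapfile (bash 4+) instead of relying on unquoted word splitting,
+	which also keeps a candidate like * from being glob-expanded.
```bash
#!/usr/bin/env bash
# Tiny inline sample standing in for the real table: code<TAB>char<TAB>rank.
TABLE=$'wso\t浩\t100\nwso\t澢\t99\n,ne\tè\t100'

# candidates CODE (hypothetical): matching characters, highest rank first.
candidates() {
  CODE=$1 awk -F'\t' '$1 == ENVIRON["CODE"] { print $3 "\t" $2 }' <<<"$TABLE" |
    sort -t $'\t' -k1,1nr | cut -f2
}

# emit CHAR (hypothetical): stand-in for
#   xdotool type --window "$win" "$1"
emit() { printf '%s' "$1"; }

# ime_run (hypothetical): read keys from stdin one at a time.
ime_run() {
  local code="" key
  local -a opts
  while IFS= read -rsn1 key; do
    case $key in
      [1-9]|' ')
        [[ $key == ' ' ]] && key=1          # Space selects the first candidate
        mapfile -t opts < <(candidates "$code")
        # Selection keys are 1-based, bash arrays are 0-based.
        [[ -n ${opts[key-1]} ]] && emit "${opts[key-1]}"
        code="" ;;                          # start the next character
      *) code+=$key ;;                      # keep building the input code
    esac
  done
}
```
+	For example, feeding it the keys wso2, then wso and Space, then ,ne1
+	prints 澢浩è.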
+
+Downsides:
+	Obviously this implementation sucks, but it's only meant as a backup for
+	when bash and Xorg are all I have working.
+
+	One issue is that xdotool seems to have a fair bit of latency, and with
+	certain terminal combinations it seems to skip input altogether. I think
+	tweaking its --delay option might fix this in the future.
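+
+	xdotool type accepts a --delay option (milliseconds between
+	keystrokes), so a small wrapper makes it easy to experiment with. A
+	sketch; the type_into name, the 50 ms default and the XDOTOOL override
+	(which allows a dry run with XDOTOOL=echo) are hypothetical choices,
+	not tested values.
```bash
#!/usr/bin/env bash
# type_into WINDOW_ID TEXT [DELAY_MS]  (hypothetical helper)
# Types TEXT into the given X window with an adjustable per-key delay.
# XDOTOOL can be overridden (e.g. XDOTOOL=echo) to print the command
# instead of running it.
type_into() {
  local win=$1 text=$2 delay=${3:-50}   # 50 ms is an arbitrary starting point
  "${XDOTOOL:-xdotool}" type --delay "$delay" --window "$win" "$text"
}
```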
+
+	Another is that this method doesn't work with programs like browsers,
+	because they only accept input while focused (at least on my desktop
+	environment), and while I'm typing, the IME's st window has the focus.
+
+The Future:
+	What I originally wanted to make was essentially tmux with an input
+	method: a bash script running in a terminal that would itself pretend
+	to be a terminal, passing input along after it went through the input
+	method. This would have had the added benefit of also working in the
+	Linux console (although by default the Linux console cannot display
+	fonts with more than 512 glyphs). The output could also have been
+	processed so that it is drawn with Unicode braille characters, which
+	would have worked around the font issue.
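+
+	The braille trick works because each Unicode braille pattern
+	(U+2800 to U+28FF) is a 2x4 grid of dots, so any 2x4 block of pixels
+	maps to exactly one character: U+2800 plus an 8-bit dot mask. A sketch
+	of that mapping, assuming glyphs have already been rasterised into rows
+	of 0s and 1s; braille_codepoint is a hypothetical name.
```bash
#!/usr/bin/env bash
# braille_codepoint ROW1 ROW2 ROW3 ROW4  (hypothetical helper)
# Each ROW is two characters, left then right pixel, "1" = ink.
# Unicode numbers the dots 1-2-3 down the left column, 4-5-6 down the
# right, then 7 (bottom left) and 8 (bottom right); dot n is bit n-1 of
# the offset from U+2800.
braille_codepoint() {
  # Bit for each pixel, in reading order: row by row, left then right.
  local -a bits=(0x01 0x08 0x02 0x10 0x04 0x20 0x40 0x80)
  local row col i=0 offset=0
  for row in "$@"; do
    for col in 0 1; do
      [[ ${row:col:1} == 1 ]] && (( offset |= bits[i] ))
      i=$((i + 1))
    done
  done
  printf 'U+%04X\n' $(( 0x2800 + offset ))
}
```
+	In a UTF-8 locale, bash 4.2+ can then print the glyph itself, e.g.
+	printf '\u2801' gives ⠁.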
+
+
+
+
-- 
cgit v1.1