


<!--
123456789-223456789-323456789-423456789-523456789-623456789-723456789-8234567890
一二三四-->[TITLE]                                                      [DATE]
--------------------------------------------------------------------------------
[SETTITLE]Janky IME
[SETDATE]6-29-2023

A new Python version came out, so of course every Python package on my
rolling-release system has broken. This includes ibus, which I need for my
input method. I'm currently running some web-crawling scripts that I don't want
to stop, so while I wait for my machine to finish downloading all of the <a href="https://www.rcsb.org/">PDB</a>, I decided
to write a janky bash script implementation of ibus table so that I can still use
嘸蝦米.

Background:
	嘸蝦米 (EN: Boshiamy) is a proprietary component-based input method for
	Chinese. They offer paid software on iOS, Android, and Windows, but no
	Linux version is available. On my phone I gladly pay for a Boshiamy
	license, but on Linux I use this implementation of Boshiamy using ibus
	table from <a href="https://github.com/jdh8/ibus-boshiamy">here</a>.

	However, I urge people to pay for the license anyway, as most of the
	actual work in creating an IME is organizing ways to decompose
	characters into components and compiling input codes from that, not
	the technical implementation I'm doing here, which is fairly
	trivial.

	In technical terms, it's just a very large TSV file with the first
	column being the input code, the 2nd being the character, and the 3rd
	being a ranking for ordering which character comes first when selecting
	them.
	
	EX: 
		typing wso[SPACE] or wso1 inputs 浩, typing wso2 inputs 澢
			wso     浩      100
			wso     澢      99

		It can also be used for non-Chinese characters such as è
			,ne     è       100

The Implementation:
	I want this to work in the terminal, I want it to require only bash and
	xorg, and it needs to work with every program running in the terminal,
	regardless of whether it uses cooked input (bash) or raw input (vim).

	What I ended up with is a bash script that runs in a separate st
	terminal window, uses grep to find the character, and inputs it
	using xdotool.

	First we have a script that is called by a desktop environment shortcut
	(in my case i3). It finds the current xorg window id and launches the
	IME in a separate st window, with that window id as the only argument
	to the IME script.

	<a href="https://hairydiode.xyz/cgit/dotfiles.git/tree/scripts/iml">CODE:</a>
		#!/bin/bash
		win=$(xdotool getactivewindow)
		st -e ims "$win" &
		exit
	
	Input is read one character at a time with read in a loop:

	<a href="https://hairydiode.xyz/cgit/dotfiles.git/tree/scripts/ims">CODE:</a>
		OIFS=$IFS
		export IFS=""; read -rsn1 i
		IFS=$OIFS

	IFS="" is set so that read returns spaces as input, but this makes the
	implementation very brittle and probably not portable across bash
	versions. This also passes along control and special characters such as
	delete and move left, but differences in how these sequences are
	interpreted as xorg input for the terminal vs. for terminal programs,
	as well as differences between terminal emulators, mean that this
	doesn't work very well. For vim on cool-retro-term, for example, tab,
	escape, and the arrow keys work, but backspace is interpreted as a delete.
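
	A minimal sketch of a less invasive way to read one key, assuming a
	reasonably recent bash. Putting the IFS assignment in front of read
	scopes it to that single command, so there is nothing to save and
	restore. read_key and key are hypothetical names used only for
	illustration; this is not the code in the ims script.

```shell
#!/bin/bash
# read_key: read exactly one character from stdin into the variable "key".
# The "IFS=" prefix applies only to this read, so the shell's own IFS is left
# untouched and a space is returned as-is instead of being stripped.
read_key() {
	IFS= read -r -s -n1 key
}

# Demo: feed a single space through a pipe and show what was read.
printf ' ' | { read_key; printf '[%s]\n' "$key"; }
```

	With -n1 a newline still ends the read early and leaves key empty, so
	Enter would have to be handled as its own case.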

	I then simply run grep "^$code\s", rearrange the columns with awk, sort
	by ranking, then strip out the ranking column.

	<a href="https://hairydiode.xyz/cgit/dotfiles.git/tree/scripts/ims">CODE:</a>
		opt=$(grep "^$code\s" ~/lang/zh/boshiamy/ibus-boshiamy/boshiamy.txt |\
					#remove simplified
					grep -v 98|\
					awk '{print $3" "$2}' |\
					sort -nr|\
					awk '{print $2}')
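
	The same lookup can be sketched with an exact match on the first
	column instead of a regex, which may be safer for codes containing
	regex metacharacters and avoids \s, which is a GNU grep extension.
	This is only an illustration: lookup and the temporary table are
	hypothetical, the three rows are the ones quoted in the Background
	section, and the filter for simplified characters is left out.

```shell
#!/bin/bash
# Sample table using the three rows quoted earlier in this post:
# input code, character, ranking, separated by tabs.
table=$(mktemp)
printf '%s\t%s\t%s\n' \
	wso 澢 99 \
	wso 浩 100 \
	,ne è 100 > "$table"

# lookup CODE FILE (hypothetical helper, not part of the ims script):
# print every character whose first column equals CODE, highest ranking first.
lookup() {
	awk -F'\t' -v code="$1" '$1 == code { print $3 "\t" $2 }' "$2" |
		sort -t$'\t' -k1,1nr |
		cut -f2
}

# Prints the candidates for wso, one per line, best ranked first.
lookup wso "$table"
```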

	Finally, it inputs the selected character if the input is 1-9 or Space,
	using xdotool and the window id of the original window. Note that an
	unquoted bash variable is word-split on expansion, so echo $opt turns
	the newline-separated candidates into a space-separated list for free.
	However, this makes the code less portable.

	<a href="https://hairydiode.xyz/cgit/dotfiles.git/tree/scripts/ims">CODE:</a>
		char=$(echo $opt | awk "{print \$1}")
		...
		xdotool type  --window "$1" "$char"
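
	One way the selection step could look without relying on word
	splitting is sketched below. It assumes the candidates are passed as
	separate arguments, already sorted by ranking. pick is a hypothetical
	helper and is not what the ims script does.

```shell
#!/bin/bash
# pick KEY CANDIDATE...: print the candidate selected by KEY.
# A space selects the first candidate; the digits 1-9 select by position.
# Returns non-zero for any other key or for a position with no candidate.
pick() {
	local key=$1
	shift
	local cands=("$@")
	case $key in
		' ') key=1 ;;
		[1-9]) ;;
		*) return 1 ;;
	esac
	[ "$key" -le "${#cands[@]}" ] || return 1
	printf '%s\n' "${cands[key-1]}"
}

# Demo with the two wso candidates from the Background section.
pick ' ' 浩 澢
pick 2 浩 澢
```

	The printed character could then be handed to xdotool type in the same
	way as $char above.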

Downsides:
	Obviously this implementation sucks, but it's meant to be a backup for
	when all I have working are bash and xorg.

	One issue is that xdotool seems to have a fair bit of latency, and on
	certain terminal combinations it seems to skip input altogether. I think
	fiddling with the input delay argument (--delay) in the future might fix this.

	Another is that this method won't work on things like browsers, because
	they only take input when in focus (at least on my desktop environment).

The Future:
	What I really wanted to make initially was essentially tmux with an
	input method, where I would run a bash script in a terminal that would
	itself pretend to be a terminal and pass along input after running it
	through an input method. This would've had the added benefit of running
	in the Linux console as well (although by default the Linux console
	cannot display fonts with more than 512 glyphs). The output could've
	also been processed so that it is displayed with Unicode Braille
	characters, which would've fixed the font issue.