<!DOCTYPE html>
<head>
<title>Janky IME</title>
<meta charset="utf-8"/>
<link rel="stylesheet" href="https://hairydiode.xyz/style.css"/>
<link rel="icon" type="image/png" href="https://hairydiode.xyz/img/fav/logo.png"/>
</head>
<body>
<div class="content">
<pre>
<!--
123456789-223456789-323456789-423456789-523456789-623456789-723456789-8234567890
一二三四
-->--------------------------------------------------------------------------------

<a href="https://hairydiode.xyz">>HairyDiode</a>                                             

--------------------------------------------------------------------------------
<!--
123456789-223456789-323456789-423456789-523456789-623456789-723456789-8234567890
一二三四-->Janky IME                                                      6-29-2023
--------------------------------------------------------------------------------
UPDATE: This IME is now tmux-based; the old xdotool version is still <a href="https://hairydiode.xyz/cgit/dotfiles.git/tree/scripts/ims">here</a>.

UPDATE2: I have created the most cursed thing in existence: full Unicode display
and input support in the Linux console, using only userland bash/busybox and
tmux. See <a href="https://hairydiode.xyz/unihome/">we have unicode at home</a>.

A new Python version came out, so of course every Python package on my
rolling-release system broke. This includes ibus, which I need for my input
method. I'm currently running some web-crawling scripts that I don't want to
stop, so while I wait for my machine to finish downloading all of <a href="https://www.rcsb.org/">PDB</a>, I
decided to write a janky bash script implementation of ibus-table so that I can
still use 嘸蝦米.

Background:
	嘸蝦米 (EN: Boshiamy) is a proprietary component-based input method for
	Chinese. They offer paid software on iOS, Android, and Windows, but no
	Linux version is available. On my phone I gladly pay for a Boshiamy
	license, but on Linux I use an ibus-table implementation of Boshiamy
	from <a href="https://github.com/jdh8/ibus-boshiamy">here</a>.

	However, I urge people to pay for the license anyway, as most of the
	actual work in creating an IME is organizational. The technical aspects
	of IMEs are fairly trivial, as you'll see here.

	In technical terms, it's just a very large TSV file: the first column
	is the input code, the second is the character, and the third is a
	ranking that decides which character comes first when several
	characters share the same code.
	
	EX: 
		typing wso[SPACE] or wso1 inputs 浩, typing wso2 inputs 澢
			wso     浩      100
			wso     澢      99

		It can also be used for non-Chinese characters such as è:
			,ne     è       100
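
	A minimal sketch of querying this format from bash, assuming the
	columns are tab-separated in the order described above. The three rows
	are the examples shown; sample_table and lookup are hypothetical
	names, and comparing the code column with awk is this sketch's choice,
	not necessarily how the real script does it:

```shell
#!/usr/bin/env bash
# Sketch: a tiny table in the three-column TSV format described above
# (code, character, rank), filled with the example rows.
# "sample_table" and "lookup" are hypothetical names.
sample_table=$(mktemp)
trap 'rm -f "$sample_table"' EXIT
printf '%s\t%s\t%s\n' \
    wso 浩 100 \
    wso 澢 99 \
    ,ne è 100 >"$sample_table"

# lookup CODE: print every character for CODE, highest rank first.
lookup() {
    # $1 == c compares the code column as a plain string, not a regex
    awk -F '\t' -v c="$1" '$1 == c { print $3 "\t" $2 }' "$sample_table" |
        sort -t $'\t' -k1,1nr |
        cut -f2
}

lookup wso   # prints 浩 and then 澢, one per line
```

	The first line of output is what Space (or 1) would pick, the second
	is what 2 would pick.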

The Implementation:
	I want this to work in the terminal, to require only bash and tmux,
	and to work with every program running in the terminal, regardless of
	whether it uses cooked input (bash) or raw input (vim).

	What I ended up with is a bash script that runs in a separate tmux
	pane and sends input to the previously active pane.

	Input is read one character at a time with bash's read builtin, in a
	loop:

	<a href="https://hairydiode.xyz/cgit/dotfiles.git/tree/scripts/imt">CODE:</a>
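		# Clear IFS for this read only, so a typed Space is kept as " ".
		# -r: keep backslashes, -s: don't echo, -n1: return after one char
		IFS= read -rsn1 i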

	Clearing IFS makes read return a typed Space as " " instead of
	stripping it as whitespace, but this makes the implementation brittle
	and probably not portable across shells or bash versions (read -n and
	-s are bash extensions, not POSIX). Reading raw keystrokes also passes
	along control and special characters such as Delete and Left, and tmux
	seems to handle most of the differences between terminals. An older
	xdotool-based version of this IME did not handle these control
	characters well.
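
	The full loop is in the linked script. As a rough sketch, assuming
	Space selects candidate 1 and 1-9 select the nth candidate (as in the
	wso example), a loop like the following could split the keystroke
	stream into a code and a selection. read_keys and its "CODE INDEX"
	output are hypothetical, and unlike the real IME this sketch does not
	forward control characters to tmux:

```shell
#!/usr/bin/env bash
# Sketch: split a keystroke stream into "code + selection" events.
# read_keys is a hypothetical name; it prints "CODE INDEX" each time
# Space (index 1) or a digit 1-9 ends a code. Unlike the real IME, it
# does not forward control characters to tmux.
read_keys() {
    local code="" key
    # IFS= applies only to this read, so a typed Space arrives as " "
    # instead of being stripped as whitespace.
    while IFS= read -rsn1 key; do
        case $key in
            ' ')
                printf '%s %s\n' "$code" 1
                code="" ;;
            [1-9])
                printf '%s %s\n' "$code" "$key"
                code="" ;;
            *)
                code+=$key ;;
        esac
    done
}

printf 'wso wso2' | read_keys   # prints "wso 1" then "wso 2"
```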

	I then simply grep for ^$code\s, filter out simplified characters, use
	awk to put the ranking column first, sort numerically in descending
	order, and strip the ranking column back out:

	<a href="https://hairydiode.xyz/cgit/dotfiles.git/tree/scripts/imt">CODE:</a>
		opt=$(grep "^$code\s" ~/lang/zh/boshiamy/ibus-boshiamy/boshiamy.txt |\
					#remove simplified
					grep -v 98|\
					awk '{print $3" "$2}' |\
					sort -nr|\
					awk '{print $2}')
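
	One caveat: $code is pasted straight into a regular expression. Codes
	can contain punctuation (the ,ne example starts with a comma), and if
	a code ever contains [ grep fails with a bracket error, while . matches
	any character. A minimal sketch of escaping the code first, assuming
	grep's default basic regular expressions; bre_escape is a hypothetical
	helper, not part of the original script:

```shell
#!/usr/bin/env bash
# bre_escape is a hypothetical helper, not part of the original script.
# It puts a backslash in front of every character that is special in a
# POSIX basic regular expression ($ . * [ \ ^), so grep matches the
# code literally.
bre_escape() {
    printf '%s\n' "$1" | sed 's/\([$.*[\\^]\)/\\\1/g'
}

# Usage in the same shape as the pipeline above:
#   grep "^$(bre_escape "$code")\s" ~/lang/zh/boshiamy/ibus-boshiamy/boshiamy.txt
bre_escape 'a.b'   # prints a\.b
```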

	Finally, if the key typed into the IME is 1-9 or Space, the script
	sends the selected candidate to tmux. NOTE: bash variables can hold
	newlines, but echo $opt expands $opt unquoted, so word splitting turns
	the newline-separated candidate list into a space-separated one; that
	conversion comes for free. However, it only works while IFS has its
	default value, which makes the code less robust.

	<a href="https://hairydiode.xyz/cgit/dotfiles.git/tree/scripts/imt">CODE:</a>
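		# unquoted $opt: word splitting joins the candidates with spaces
		char=$(echo $opt | awk "{print \$1}")
		...
		# '!' targets the previously active tmux pane
		tmux send-keys -t '!' "$char"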
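
	The selection step can also be done without relying on word splitting.
	A minimal sketch, assuming the candidates arrive one per line, best
	first (as the grep/awk/sort pipeline produces them), and that Space
	means the first candidate; pick_candidate is a hypothetical helper,
	and sending the result would still go through tmux send-keys:

```shell
#!/usr/bin/env bash
# pick_candidate KEY CANDIDATES is a hypothetical helper.
# KEY is " " (first candidate) or a digit 1-9 (nth candidate).
# CANDIDATES is a newline-separated list, best-ranked first. It prints
# the chosen character, or returns 1 if KEY does not pick an existing
# candidate.
pick_candidate() {
    local n char
    case $1 in
        ' ') n=1 ;;
        [1-9]) n=$1 ;;
        *) return 1 ;;
    esac
    # sed -n "Np" prints only line N, so no word splitting is involved
    char=$(printf '%s\n' "$2" | sed -n "${n}p")
    [ -n "$char" ] || return 1
    printf '%s\n' "$char"
}

# Hypothetical wiring into the IME loop:
#   if char=$(pick_candidate "$i" "$opt"); then
#       tmux send-keys -t '!' "$char"
#   fi
pick_candidate 2 "$(printf '%s\n' 浩 澢)"   # prints 澢
```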

Downsides/The Future:
	This works in the Linux console, but the Linux console obviously has
	limits on what text it can display (by default it cannot display fonts
	with more than 512 glyphs). I think I'm gonna write a bash-based cbrll
	implementation and a character displayer as well, so that I can get
	full userland Unicode display and input support.
</pre>
</div>
<br>
<br>
</body>
<!-- 
if you're digging in the src you might be interested in how this site works
here: https://hairydiode.xyz/meta2 
-->