Diffstat (limited to 'cont/unihome.html')
-rw-r--r-- | cont/unihome.html | 187 |
1 files changed, 187 insertions, 0 deletions
diff --git a/cont/unihome.html b/cont/unihome.html
new file mode 100644
index 0000000..30b8dd7
--- /dev/null
+++ b/cont/unihome.html
@@ -0,0 +1,187 @@
+<!--
+123456789-223456789-323456789-423456789-523456789-623456789-723456789-8234567890
+一二三四-->[TITLE] [DATE]
+--------------------------------------------------------------------------------
+[SETTITLE]We Have Unicode at Home
+[SETDATE]6-30-2023
+So as we all know, the Linux console font is limited to 512 glyphs, and the
+console lives in kernel space. So I wrote a workaround that displays Unicode
+characters using braille characters (assuming your Linux console font has
+them), using only userland busybox.
+
+
+--------------------------=[Part 1. Braille Graphics]=--------------------------
+Braille graphics are actually really easy: the braille block goes from U+2800
+to U+28FF, with the lower 8 bits corresponding to the dots in each braille
+character in the following order:
+
+#0 3
+#1 4
+#2 5
+#6 7
+
+with 0 being the lowest bit and 7 being the highest bit.
+
+UTF-8 encodes these codepoints with three bytes:
+
+1110xxxx 10xxxxxx 10xxxxxx
+
+where x represents the bits of the codepoint. Therefore U+2800 converted to
+UTF-8 is 0xE2A080 (big endian) or 14852224 in decimal (I'll explain why decimal
+is relevant later).
+
+If you take each pixel in the buffer, shift it according to the above chart
+(adjusted for where the bits land in the UTF-8 encoding), and OR it into the
+base codepoint, you get your desired braille character.
+
+The problem is that bash cannot do bitwise operations, and that it calls a
+separate process for conversion from hex to decimal. So our code ends up looking
+like this:
+
+        if [ "${rawbuff[((1+4*$2))]:((1+2*$1)):1}" == "1" ];then
+                num=$(($num + 16))
+        fi
+
+        where $num starts off as 14852224, we have a raw pixel buffer where each
+        row is stored as a string where '1' represents a filled in pixel, and
+        the x and y position of the braille block we are currently rendering
+        are in $1 and $2.
+
+The above code takes the value of the raw pixel buffer at position (1,1)
+relative to the current braille block, shifts it by 4, then ORs it with the
+rendered braille character.
+
+
+I also wrote some code to take commands that draw in the raw pixel buffer as
+well.
+
+code <a href="https://hairydiode.xyz/cgit/bbrll.git/tree/bbrll">here</a>
+
+----------------=[Part 2. Rendering BDF fonts with only busybox]=---------------
+BDF is a human-readable bitmap font format where each character entry looks like:
+
+STARTCHAR uni6D69
+ENCODING 28009
+SWIDTH 1000 0
+DWIDTH 8 0
+BBX 7 7 0 -1
+BITMAP
+98
+1C
+A8
+3E
+80
+9C
+9C
+ENDCHAR
+
+        Source: Misaki Mincho, also sidenote, the entire font is only 746K
+        despite the insanely inefficient format and large amount of characters
+        supported, meanwhile TeX Live is installing multiple 50 Megabyte fonts
+        that only support latin.
+
+The first line names the character (here by its Unicode codepoint), followed by
+some info I don't care about, and the bitmap data of the character, where each
+row is stored as a line of hex. If we convert the hex to binary, we get the
+"raw pixel format" from before, so all we really need to do is write a small awk
+script to find the relevant bitmap lines, then convert them to binary and
+display them with the previous braille display script.
+
+Complete Character Display code <a href="https://hairydiode.xyz/cgit/bbrll.git/tree/fontd">here</a>
+
+-------------------------=[Part 3. UTF-8 Shenanigans.]=-------------------------
+One annoying thing about UTF-8 is that if you want to get the codepoint of a
+particular character in a UTF-8 string, you have to do some iconv trickery where
+you first convert it to UTF-32, then convert it to hex.
+
+Another problem is that BDF stores the codepoint as DECIMAL!!!!!. You see that
+line "STARTCHAR uni6D69"? That's just the name of the character, it could
+theoretically be anything.
+The actual line storing the codepoint is "ENCODING 28009", so we have to
+convert from hex to decimal, which is surprisingly convoluted in bash.
+
+All this is done in a wrapper script that takes its input from stdin and
+displays it using all the fonts in a directory given as its argument.
+
+wrapper script code <a href="https://hairydiode.xyz/cgit/bbrll.git/tree/fontd">here</a>
+
+----------------------------=[Part 4. Practical Use]=---------------------------
+So remember the janky bash based IM from last time? I modified it to use the
+braille display from before. I also wrote a little script that displays all the
+non-ASCII characters in the previously focused tmux pane, so together we can
+both display and input UTF-8 characters in the Linux console using tmux.
+
+see the <a href="https://hairydiode.xyz/cgit/bim.git">code</a> and <a href="https://hairydiode.xyz/jankime">writeup</a>
+
+
+"Screenshots" below:
+
+Bash running in tmux
+[usernm@cm│[usernm@cmphostname ~]$ mkdir 帖 │乔
+phostname │[usernm@cmphostname ~]$ cd 帖 │pdr
+~]$ ud │[usernm@cmphostname 帖]$ vim 天干 │⢠⠋⣏⡁⡆⡇⠀⠀⠁
+⡤⡧⡄⠀⡧⠄⠀⠀ │ │⢹⠔⢅⠇⡇⡇⠀⠀⠀
+⡇⡇⡇⡖⠓⡆⠀⠀ │ │⠸⠠⠊⠀⠥⠇⠀⠀⠂
+⠁⠏⠁⠧⠤⠇⠀⠀ │ │⣲⡪⢰⣓⣲⠀⠀⠀
+⡤⡧⡄⠀⡧⠄⠀⠀ │ │⠒⣱⠘⡖⡞⠀⠀⠀
+⡇⡇⡇⡖⠓⡆⠀⠀ │ │⠩⠜⠠⠃⠧⠇⠀⠀
+⠁⠏⠁⠧⠤⠇⠀⠀ │ │⢠⠴⠥⠤⡄⠀⠀⠀
+⡤⡧⡄⠀⡧⠄⠀⠀ │ │⠸⢭⠭⡭⠇⠀⠀⠀
+⡇⡇⡇⡖⠓⡆⠀⠀ │ │⠤⠊⠀⠣⠤⠇⠀⠀
+⠁⠏⠁⠧⠤⠇⠀⠀ │ │
+⠉⠉⢹⠉⠉⠁⠀⠀ │ │
+⠉⠉⡝⡍⠉⠁⠀⠀ │ │
+⠤⠊⠀⠈⠢⠄⠀⠀ │ │
+⠈⠉⢹⠉⠉⠀⠀⠀ │ │
+⠒⠒⢺⠒⠒⠂⠀⠀ │ │
+⠀⠀⠸⠀⠀⠀⠀⠀ │ │
+[usernm@cm│ │
+phostname │ │
+~]$ │ │
+ │ │
+Left pane is displaying all the Unicode characters in the primary terminal
+(remember, on the Linux console they would all just be squares), and the right
+pane is the input method, which displays candidate characters in bash.
+
+Vim running in tmux
+⡇⡇⡇⡖⠓⡆⠀⠀ │甲乙丙丁 │之 鐻
+⠁⠏⠁⠧⠤⠇⠀⠀ │ 最常用 │azn
+⡤⡧⡄⠀⡧⠄⠀⠀ │~ │⠤⠤⠼⠤⢤⠀⠀⠀
+⡇⡇⡇⡖⠓⡆⠀⠀ │~ │⠀⠀⣀⠔⠁⠀⠀⠀
+⠁⠏⠁⠧⠤⠇⠀⠀ │~ │⠔⠉⠒⠤⠤⠄⠀⠀
+⡤⡧⡄⠀⡧⠄⠀⠀ │~ │⣊⡂⣀⣗⣒⠀⠀⠀
+⡇⡇⡇⡖⠓⡆⠀⠀ │~ │⢺⡂⣗⢗⡖⡃⠀⠀
+⠁⠏⠁⠧⠤⠇⠀⠀ │~ │⠽⠴⠑⠝⠘⠄⠀⠀
+⠉⠉⢹⠉⠉⠁⠀⠀ │~ │
+⠉⠉⡝⡍⠉⠁⠀⠀ │~ │
+⠤⠊⠀⠈⠢⠄⠀⠀ │~ │
+⠈⠉⢹⠉⠉⠀⠀⠀ │~ │
+⠒⠒⢺⠒⠒⠂⠀⠀ │~ │
+⠀⠀⠸⠀⠀⠀⠀⠀ │~ │
+[usernm@cm│~ │
+phostname │~ │
+~]$ ud │~ │
+⣏⣉⣹⣉⣉⡇⠀⠀ │~ │
+⠧⠤⢼⠤⠤⠇⠀⠀ │~ │
+⠀⠀⠸⠀⠀⠀⠀⠀ │~ │
+⠉⠉⢉⠝⠋⠀⠀⠀ │~ │
+⢀⠔⠁⠀⠀⡀⠀⠀ │~ │
+⠣⠤⠤⠤⠤⠃⠀⠀ │~ │
+⣉⣉⣹⣉⣉⡁⠀⠀ │~ │
+⡇⢀⠜⢄⠀⡇⠀⠀ │~ │
+⠇⠁⠀⠀⠥⠇⠀⠀ │~ │
+⠉⠉⢹⠉⠉⠁⠀⠀ │~ │
+⠀⠀⢸⠀⠀⠀⠀⠀ │~ │
+⠀⠠⠼⠀⠀⠀⠀⠀ │~ │
+⢸⠭⠭⠭⢽⠀⠀⠀ │~ │
+⢹⠭⡏⡭⠭⡅⠀⠀ │~ │
+⠚⠉⠇⠬⠪⠄⠀⠀ │~ │
+⡖⣓⣚⣒⡓⡆⠀⠀ │~ │
+⢀⣓⣲⣒⣃⠀⠀⠀ │~ │
+⠘⠀⠸⠀⠚⠀⠀⠀ │~ │
+⢸⣉⣹⣉⣹⠀⠀⠀ │~ │
+⢸⠤⢼⠤⢼⠀⠀⠀ │~ │
+⠎⠀⠸⠀⠼⠀⠀⠀ │~ │
+[usernm@cm│~ │
+phostname │~ │
+~]$ │-- INSERT -- 2,11-15 All │
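
The braille encoding described in the "Braille Graphics" part of the post can be sketched as a single shell function. This is only an illustration under stated assumptions, not the code from the linked bbrll repository: the function name braille_cell and its four-argument interface (one two-character '0'/'1' string per pixel row) are hypothetical. It follows the post's approach of adding decimal weights to 14852224 instead of using bitwise operators; since each dot is added at most once, the addition is equivalent to an OR.

```shell
#!/bin/sh
# Hypothetical helper (not from the bbrll repository):
#   braille_cell ROW0 ROW1 ROW2 ROW3
# Each ROWn is a two-character string of '0'/'1': left pixel, then right pixel.
# The body runs in a subshell "( ... )" so its variables stay local.
braille_cell() (
    # UTF-8 bytes E2 A0 80 (U+2800, the empty cell) read as one big-endian integer.
    num=14852224
    # All eight pixels in row-major order: r0c0 r0c1 r1c0 r1c1 r2c0 r2c1 r3c0 r3c1.
    pixels="$1$2$3$4"
    # Weight of each pixel, in the same order. Dots 0-5 sit in the low six
    # bits of the last UTF-8 byte (1..32); dots 6 and 7 are the low two bits
    # of the middle byte, hence 256 and 512.
    for w in 1 8 2 16 4 32 256 512; do
        case "$pixels" in
            1*) num=$((num + w)) ;;   # pixel set: add its weight
        esac
        pixels="${pixels#?}"          # drop the pixel we just handled
    done
    # Split the integer back into three bytes and emit each one through an
    # octal printf escape.
    for b in $((num / 65536)) $((num / 256 % 256)) $((num % 256)); do
        printf "\\$(printf '%03o' "$b")"
    done
)

# Usage: only the pixel at column 1, row 1 is set, which is the same +16
# case as the snippet in the post; this prints U+2810.
braille_cell 00 01 00 00
echo
```

The weight list is just the dot chart from the post read row by row, so a caller holding the raw pixel buffer would pass four two-character slices per cell.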