From 9e6bca0b2fadeb55d55a27329a72e03b31d9998d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Haoran=20S=2E=20Diao=20=28=E5=88=81=E6=B5=A9=E7=84=B6=29?= <0@hairydiode.xyz> Date: Sat, 6 Sep 2025 16:37:54 -0700 Subject: All the sites --- unihome.html | 212 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 212 insertions(+) create mode 100644 unihome.html (limited to 'unihome.html') diff --git a/unihome.html b/unihome.html new file mode 100644 index 0000000..476ad0b --- /dev/null +++ b/unihome.html @@ -0,0 +1,212 @@ + + +We Have Unicode at Home + + + + + +
+
+--------------------------------------------------------------------------------
+
+>HairyDiode                                             
+
+--------------------------------------------------------------------------------
+We Have Unicode at Home                                                      6-30-2023
+--------------------------------------------------------------------------------
+So as we all know, the Linux console font is limited to 512 glyphs, and the
+console lives in kernel space. So I wrote a workaround that displays Unicode
+characters as braille graphics (assuming your Linux console font has braille
+characters), using only userland busybox.
+
+
+--------------------------=[Part 1. Braille Graphics]=--------------------------
+Braille graphics are actually really easy: the braille block goes from U+2800
+to U+28FF, with the lower 8 bits of the codepoint corresponding to the dots in
+each braille character in the following order:
+
+#0 3
+#1 4
+#2 5
+#6 7
+
+with 0 being the lowest bit and 7 being the highest bit. 
+
+UTF-8 encodes codepoints in this range with three bytes:
+
+1110xxxx 	10xxxxxx 	10xxxxxx
+
+where x represents the bits of the codepoint. U+2800 therefore becomes the
+bytes E2 A0 80 in UTF-8, which read as one big-endian integer is 0xE2A080, or
+14852224 in decimal (I'll explain why decimal is relevant later).
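+
+As a quick check of that arithmetic, here is a minimal sketch that packs a
+codepoint between U+0800 and U+FFFF into its three UTF-8 bytes using only
+division and remainder. The function name utf8_3byte_value is made up for this
+example and is not part of the actual scripts.
```shell
# utf8_3byte_value CODEPOINT
# Print the 3-byte UTF-8 encoding of a codepoint (given in decimal) as a
# single decimal integer, first byte in the highest position.
utf8_3byte_value() {
    cp=$1
    b1=$((224 + cp / 4096))        # 1110xxxx: top 4 bits of the codepoint
    b2=$((128 + cp / 64 % 64))     # 10xxxxxx: middle 6 bits
    b3=$((128 + cp % 64))          # 10xxxxxx: low 6 bits
    echo $((b1 * 65536 + b2 * 256 + b3))
}

utf8_3byte_value 10240   # 10240 is U+2800; prints 14852224
```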
+
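+If you take each pixel of a 2x4 block of the pixel buffer, shift it to the bit
+position given by the above chart (adjusted for where that bit lands in the
+UTF-8 encoding: bits 0-5 stay in the low bits of the last byte, while bits 6
+and 7 become the two lowest bits of the middle byte, so their weights are 256
+and 512), and OR it into the encoded base codepoint, you get your desired
+braille character.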
+
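+The script does this with plain decimal addition instead of bitwise operators,
+which also avoids calling a separate process to convert between hex and
+decimal. Adding works because each dot's bit starts out as 0 in the base value,
+so adding a dot's weight gives the same result as ORing it in. So our code ends
+up looking like this: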
+
+	# pixel at column 1, row 1 of the current block is dot bit 4, weight 16
+	if [ "${rawbuff[((1+4*$2))]:((1+2*$1)):1}" == "1" ];then
+		num=$(($num + 16))
+	fi
+
+	where $num starts off as 14852224, the raw pixel buffer stores each row
+	as a string in which '1' represents a filled-in pixel, and $1 and $2 are
+	the x and y position of the braille block currently being rendered.
+
+The above code takes the value of the raw pixel buffer at position (1,1)
+relative to the current braille block, which is bit 4 in the chart, and adds
+its weight of 16 (1 shifted left by 4) to the braille character being rendered,
+which has the same effect as ORing it in.
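+
+Extending that to all eight dots gives something like the sketch below. It is
+only an illustration: render_cell is a made-up name, and instead of indexing
+into the rawbuff array it takes the four rows of one 2x4 block as two-character
+strings of 0s and 1s.
```shell
# render_cell ROW0 ROW1 ROW2 ROW3
# Each ROWn is two characters, left pixel then right pixel, '1' = filled.
render_cell() {
    num=14852224                                  # U+2800 as UTF-8, in decimal
    case $1 in 1?) num=$((num + 1)) ;; esac       # bit 0
    case $2 in 1?) num=$((num + 2)) ;; esac       # bit 1
    case $3 in 1?) num=$((num + 4)) ;; esac       # bit 2
    case $1 in ?1) num=$((num + 8)) ;; esac       # bit 3
    case $2 in ?1) num=$((num + 16)) ;; esac      # bit 4
    case $3 in ?1) num=$((num + 32)) ;; esac      # bit 5
    case $4 in 1?) num=$((num + 256)) ;; esac     # bit 6, lands in middle byte
    case $4 in ?1) num=$((num + 512)) ;; esac     # bit 7, lands in middle byte

    # Split the integer back into three bytes and emit them as octal escapes.
    o1=$(printf '%03o' $((num / 65536)))
    o2=$(printf '%03o' $((num / 256 % 256)))
    o3=$(printf '%03o' $((num % 256)))
    printf "\\$o1\\$o2\\$o3"
}

render_cell 11 11 11 11   # all eight dots set, i.e. U+28FF
echo
```
+Going through octal here is just one way to turn the number back into bytes
+with a plain printf; the real script may well do it differently.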
+
+
+I also wrote some code that takes drawing commands and applies them to the raw
+pixel buffer.
+
+code here
+
+----------------=[Part 2. Rendering BDF fonts with only busybox]=---------------
+BDF is a human-readable bitmap font format where each character entry looks
+like:
+
+STARTCHAR uni6D69
+ENCODING 28009
+SWIDTH 1000 0
+DWIDTH 8 0
+BBX 7 7 0 -1
+BITMAP
+98
+1C
+A8
+3E
+80
+9C
+9C
+ENDCHAR
+
+	Source: Misaki Mincho. Side note: the entire font is only 746K despite
+	the insanely inefficient format and the large number of characters
+	supported, meanwhile TeX Live is installing multiple 50-megabyte fonts
+	that only support Latin.
+
+The first line is the glyph name (here built from the Unicode codepoint),
+followed by some info I don't care about, and then the bitmap data of the
+character, where each row of pixels is stored as one line of hex with the
+leftmost pixel in the highest bit. If we convert the hex to binary, we get the
+"raw pixel format" from before, so all we really need to do is write a small
+awk script to find the relevant bitmap lines, convert them to binary, and
+display the result with the previous braille display script.
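+
+A rough sketch of what such an awk script could look like is below. The name
+bdf_rows and its argument order are assumptions made for this example; it
+takes the decimal ENCODING value and a font file, and prints each bitmap row as
+a string of 0s and 1s.
```shell
# bdf_rows ENCODING FONTFILE
# Print the bitmap rows of one glyph as strings of 0/1, one row per line.
bdf_rows() {
    awk -v enc="$1" '
        BEGIN {
            # lookup table from one hex digit to four pixels
            split("0000 0001 0010 0011 0100 0101 0110 0111 " \
                  "1000 1001 1010 1011 1100 1101 1110 1111", bits, " ")
            for (i = 0; i < 16; i++)
                nib[substr("0123456789ABCDEF", i + 1, 1)] = bits[i + 1]
        }
        # remember whether the glyph we are inside is the one we want
        $1 == "ENCODING" { found = ($2 == enc) }
        # end of a glyph: stop entirely if it was ours
        $1 == "ENDCHAR" { if (inmap && found) exit; inmap = 0 }
        # a bitmap row of the wanted glyph: expand every hex digit
        inmap && found {
            hex = toupper($1)
            row = ""
            for (i = 1; i <= length(hex); i++)
                row = row nib[substr(hex, i, 1)]
            print row
        }
        # rows start on the line after BITMAP
        $1 == "BITMAP" { inmap = 1 }
    ' "$2"
}
```
+For the glyph above, the first row 98 comes out as 10011000. Rows are padded
+to a whole number of bytes, so a 7-pixel-wide glyph still yields 8 columns.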
+
+Complete Character Display code here
+
+-------------------------=[Part 3. UTF-8 Shenanigans.]=-------------------------
+One annoying thing about UTF-8 is that if you want to get the codepoint of a
+particular character in a UTF-8 string, you have to do some iconv trickery where
+you first convert it to UTF-32, then dump the resulting bytes as hex.
+
+Another problem is that BDF stores the codepoint as DECIMAL!!!!! You see that
+line "STARTCHAR uni6D69"? That's just the name of the character; it could
+theoretically be anything. The actual line storing the codepoint is
+"ENCODING 28009". So we have to convert from hex to decimal, which is a
+surprisingly convoluted procedure in bash.
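+
+For illustration, the two conversions could be sketched as follows. The
+function names are made up, the iconv and od invocations are one possible
+spelling of the trickery rather than the one the wrapper script necessarily
+uses, and the hex conversion assumes a shell whose arithmetic expansion accepts
+0x-prefixed numbers (bash does).
```shell
# char_to_hex CHAR
# Print the codepoint of a single UTF-8 character as 8 hex digits.
char_to_hex() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-32BE | od -An -tx1 | tr -d ' \n'
}

# hex_to_dec HEX
# Convert a hex string without 0x prefix to decimal.
hex_to_dec() {
    echo $((0x$1))
}

# encoding_line HEX
# Build the BDF line that identifies the glyph for this codepoint.
encoding_line() {
    echo "ENCODING $(hex_to_dec "$1")"
}

encoding_line 6D69   # prints: ENCODING 28009
```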
+
+All this is done in a wrapper script that reads its input from stdin and
+displays it using all the fonts in the directory given as its argument.
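+
+One piece of such a wrapper is working out which font in the directory actually
+contains a given glyph. The sketch below is a guess at how that lookup might
+go, not the real script: find_font is a hypothetical name, it assumes the fonts
+end in .bdf, and it simply returns the first file with a matching ENCODING
+line.
```shell
# find_font FONTDIR ENCODING
# Print the first .bdf file in FONTDIR that defines the glyph with the given
# decimal ENCODING value; return non-zero if no font has it.
find_font() {
    for f in "$1"/*.bdf; do
        [ -f "$f" ] || continue            # directory had no .bdf files
        if grep -q "^ENCODING $2\$" "$f"; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}
```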
+
+wrapper script code here
+
+----------------------------=[Part 4. Practical Use]=---------------------------
+So remember the janky bash-based IM (input method) from last time? I modified
+it to use the braille display from before. I also wrote a little script that
+displays all the non-ASCII characters in the previously focused tmux pane, so
+together we can both display and input UTF-8 characters in the Linux console
+using tmux.
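+
+A minimal sketch of how the pane scraping could work is below. Both function
+names are invented for this example, and the use of tmux capture-pane with the
+{last} target is an assumption about the approach, not a copy of the actual
+script.
```shell
# non_ascii
# Copy stdin to stdout, dropping every ASCII byte (newlines included), so that
# only the bytes of multi-byte UTF-8 characters remain.
non_ascii() {
    LC_ALL=C tr -d '\000-\177'
}

# last_pane_non_ascii
# Print the non-ASCII characters currently visible in the previously active
# tmux pane. Must be run from inside a tmux session.
last_pane_non_ascii() {
    tmux capture-pane -p -t '{last}' | non_ascii
}
```
+The output of last_pane_non_ascii could then be fed to the braille display one
+character at a time.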
+
+see the code and writeup
+
+
+"Screenshots" below:
+
+Bash running in tmux
+[usernm@cm│[usernm@cmphostname ~]$ mkdir 帖                          │乔
+phostname │[usernm@cmphostname ~]$ cd 帖                             │pdr
+~]$ ud    │[usernm@cmphostname 帖]$ vim 天干                         │⢠⠋⣏⡁⡆⡇⠀⠀⠁
+⡤⡧⡄⠀⡧⠄⠀⠀  │                                                          │⢹⠔⢅⠇⡇⡇⠀⠀⠀
+⡇⡇⡇⡖⠓⡆⠀⠀  │                                                          │⠸⠠⠊⠀⠥⠇⠀⠀⠂
+⠁⠏⠁⠧⠤⠇⠀⠀  │                                                          │⣲⡪⢰⣓⣲⠀⠀⠀
+⡤⡧⡄⠀⡧⠄⠀⠀  │                                                          │⠒⣱⠘⡖⡞⠀⠀⠀
+⡇⡇⡇⡖⠓⡆⠀⠀  │                                                          │⠩⠜⠠⠃⠧⠇⠀⠀
+⠁⠏⠁⠧⠤⠇⠀⠀  │                                                          │⢠⠴⠥⠤⡄⠀⠀⠀
+⡤⡧⡄⠀⡧⠄⠀⠀  │                                                          │⠸⢭⠭⡭⠇⠀⠀⠀
+⡇⡇⡇⡖⠓⡆⠀⠀  │                                                          │⠤⠊⠀⠣⠤⠇⠀⠀
+⠁⠏⠁⠧⠤⠇⠀⠀  │                                                          │
+⠉⠉⢹⠉⠉⠁⠀⠀  │                                                          │
+⠉⠉⡝⡍⠉⠁⠀⠀  │                                                          │
+⠤⠊⠀⠈⠢⠄⠀⠀  │                                                          │
+⠈⠉⢹⠉⠉⠀⠀⠀  │                                                          │
+⠒⠒⢺⠒⠒⠂⠀⠀  │                                                          │
+⠀⠀⠸⠀⠀⠀⠀⠀  │                                                          │
+[usernm@cm│                                                          │
+phostname │                                                          │
+~]$       │                                                          │
+          │                                                          │
+The left pane displays all the Unicode characters in the primary terminal
+(remember, on the Linux console they would all just be squares), and the right
+pane is the input method, which displays candidate characters in bash.
+
+Vim running in tmux
+⡇⡇⡇⡖⠓⡆⠀⠀  │甲乙丙丁                                                  │之 鐻
+⠁⠏⠁⠧⠤⠇⠀⠀  │        最常用                                            │azn
+⡤⡧⡄⠀⡧⠄⠀⠀  │~                                                         │⠤⠤⠼⠤⢤⠀⠀⠀
+⡇⡇⡇⡖⠓⡆⠀⠀  │~                                                         │⠀⠀⣀⠔⠁⠀⠀⠀
+⠁⠏⠁⠧⠤⠇⠀⠀  │~                                                         │⠔⠉⠒⠤⠤⠄⠀⠀
+⡤⡧⡄⠀⡧⠄⠀⠀  │~                                                         │⣊⡂⣀⣗⣒⠀⠀⠀
+⡇⡇⡇⡖⠓⡆⠀⠀  │~                                                         │⢺⡂⣗⢗⡖⡃⠀⠀
+⠁⠏⠁⠧⠤⠇⠀⠀  │~                                                         │⠽⠴⠑⠝⠘⠄⠀⠀
+⠉⠉⢹⠉⠉⠁⠀⠀  │~                                                         │
+⠉⠉⡝⡍⠉⠁⠀⠀  │~                                                         │
+⠤⠊⠀⠈⠢⠄⠀⠀  │~                                                         │
+⠈⠉⢹⠉⠉⠀⠀⠀  │~                                                         │
+⠒⠒⢺⠒⠒⠂⠀⠀  │~                                                         │
+⠀⠀⠸⠀⠀⠀⠀⠀  │~                                                         │
+[usernm@cm│~                                                         │
+phostname │~                                                         │
+~]$ ud    │~                                                         │
+⣏⣉⣹⣉⣉⡇⠀⠀  │~                                                         │
+⠧⠤⢼⠤⠤⠇⠀⠀  │~                                                         │
+⠀⠀⠸⠀⠀⠀⠀⠀  │~                                                         │
+⠉⠉⢉⠝⠋⠀⠀⠀  │~                                                         │
+⢀⠔⠁⠀⠀⡀⠀⠀  │~                                                         │
+⠣⠤⠤⠤⠤⠃⠀⠀  │~                                                         │
+⣉⣉⣹⣉⣉⡁⠀⠀  │~                                                         │
+⡇⢀⠜⢄⠀⡇⠀⠀  │~                                                         │
+⠇⠁⠀⠀⠥⠇⠀⠀  │~                                                         │
+⠉⠉⢹⠉⠉⠁⠀⠀  │~                                                         │
+⠀⠀⢸⠀⠀⠀⠀⠀  │~                                                         │
+⠀⠠⠼⠀⠀⠀⠀⠀  │~                                                         │
+⢸⠭⠭⠭⢽⠀⠀⠀  │~                                                         │
+⢹⠭⡏⡭⠭⡅⠀⠀  │~                                                         │
+⠚⠉⠇⠬⠪⠄⠀⠀  │~                                                         │
+⡖⣓⣚⣒⡓⡆⠀⠀  │~                                                         │
+⢀⣓⣲⣒⣃⠀⠀⠀  │~                                                         │
+⠘⠀⠸⠀⠚⠀⠀⠀  │~                                                         │
+⢸⣉⣹⣉⣹⠀⠀⠀  │~                                                         │
+⢸⠤⢼⠤⢼⠀⠀⠀  │~                                                         │
+⠎⠀⠸⠀⠼⠀⠀⠀  │~                                                         │
+[usernm@cm│~                                                         │
+phostname │~                                                         │
+~]$       │-- INSERT --                            2,11-15       All │
+
+
+
+