[TITLE] [DATE]
--------------------------------------------------------------------------------
[SETTITLE]We Have Unicode at Home
[SETDATE]6-30-2023

Preface
It just uses more memory. Handwriting in the 70s, Arabic/Farsi terminals: historically, there never was an ASCII-only time. Telegraph codes.
Tools: busybox, bash, sed, awk, grep, bc, iconv, xxd, read, sort, uniq, cat, tmux, kbd, console-braille
Sizes:
zpix (BDF): 30M
zpix (TTF): 4.5 MiB
jizji: 1.3M
misaki: 747K
Google: 2.7 MB
LinBiolinumTI.pfb: 860 KiB
HanaMinA: 22M, 30M
unifont: 11.7 MiB
LaTeX: 2.9 GiB
cm-super: 57.8 MiB (just European languages + Cyrillic)
cbfonts: 70.6 MiB
Ensembl human genome: 4.5 GiB

Rant

Aesthetics vs. Function
cool-retro-term, pixel fonts, monospace of Chinese vs. English

The text confusion
In the beginning there was not the command line. There were wall paintings, bone, etc.

Inefficiency
The only first-class data types on a computer are int, uint, and float. Why is there no universal way to display/store them on POSIX systems? A byte has 256 possible values, and decimal text uses only about 10 of them (the digits): under 5% efficiency.
HTML: very inefficient, kind of easy to grep.
JSON: very inefficient.

Data confusion
An IME table takes in keypresses and spits out Unicode characters.
Keypresses should be their own type, but they are ASCII. What happens with a different keyboard layout? What happens if you are typing Russian and want to use vim or press C-c?
Big table: very simple datatype, not first class.
Tree/files: super simple datatype, not first class; file argument woes.
Display: simply doing an OR required about 3 processes, because every program required a different text representation of the same data, even though it is a first-class data type.
No language has a first-class lexer; the closest is awk.
BDF files are ridiculously inefficient: the keywords are too long, and the actual data is doubled by the hexadecimal representation.
A BDF file is just a big table with a 2D array as output, a very simple data type, yet it takes 1000 conversions for input (decimal codepoint vs. 32-bit vs. UTF-8) and for output (2D array of bits vs. the hex representation of the same).
Big table: no way to sort it to make it more efficient.

Representation
We are forced to represent all our data so that the lowest-common-denominator teletype in 1970s New Jersey could print it if we sent it directly over serial.
Not just a bash issue: JSON, HTML, PDB, even PDF/PostScript.
ASCII isn't even text: you can't write accents, directional quotes, ñ, or even a bar over a letter. Flip side: nobody who doesn't use POSIX knows or cares what ~ and | are.
Regex: same basic thing, 30 different variants, because it is forced to be represented as text with no specialized symbols.
Same with code: every language has its own way of representing a code block, none of them particularly legible, when it should be one key press and one byte.

In-band vs. out-of-band
No universal way to embed data. JSON has directional brackets, yet backslash hell is the norm. Completely avoidable, but the text obsession means type info is ignored.

GUIs
All based off one dumb Xerox experiment, and all have the same issues: lossy data display, no interop of actual data, no open-loop input, no way to store input as its own data/scripting.

In-memory data: no interop; you spend all your time using framework libraries to convert data around.
It's not just a bash issue.
Weird selection of first-class data types: why is text first class and not a mesh or a linked list?
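The BDF notes can be made concrete with a small Python sketch. It assumes a hypothetical 8x8 glyph in the usual BDF layout (ENCODING holds the decimal codepoint, BBX the width, BITMAP one hex string per scanline padded to whole bytes); `parse_glyph` and `rows_to_bits` are illustrative names, not part of any library. It shows the same glyph as hex text, raw bytes, and a 2D bit array, and the same codepoint as decimal text, a 32-bit integer, and UTF-8.

```python
import struct

# Hypothetical 8x8 glyph for "A", laid out the way a BDF font stores it.
SAMPLE_GLYPH = """\
STARTCHAR A
ENCODING 65
SWIDTH 500 0
DWIDTH 8 0
BBX 8 8 0 -1
BITMAP
18
24
42
42
7E
42
42
00
ENDCHAR
"""


def parse_glyph(text):
    """Return (codepoint, width, rows): one int per scanline from the BITMAP hex."""
    codepoint, width, rows = None, None, []
    in_bitmap = False
    for line in text.splitlines():
        line = line.strip()
        if in_bitmap:
            if line == "ENDCHAR":
                break
            rows.append(int(line, 16))  # each hex string is one padded scanline
        elif line.startswith("ENCODING "):
            codepoint = int(line.split()[1])  # decimal codepoint, stored as text
        elif line.startswith("BBX "):
            width = int(line.split()[1])  # glyph width in pixels
        elif line == "BITMAP":
            in_bitmap = True
    return codepoint, width, rows


def rows_to_bits(rows, width):
    """Expand each scanline into 0/1 pixels; BDF left-aligns rows in whole bytes."""
    padded = ((width + 7) // 8) * 8
    return [[(row >> (padded - 1 - x)) & 1 for x in range(width)] for row in rows]


if __name__ == "__main__":
    cp, width, rows = parse_glyph(SAMPLE_GLYPH)
    bits = rows_to_bits(rows, width)
    hex_text = "".join(f"{r:02X}\n" for r in rows).encode("ascii")
    raw = bytes(rows)  # valid here only because each 8-pixel row fits in one byte
    print("hex text bytes:", len(hex_text), "raw bytes:", len(raw))
    for row in bits:
        print("".join("#" if b else "." for b in row))
    # The same codepoint in three of the representations from the notes.
    print("decimal text:", str(cp).encode("ascii"))
    print("32-bit big-endian:", struct.pack(">I", cp))
    print("UTF-8:", chr(cp).encode("utf-8"))
```

Counting the newline, each byte of pixel data costs three bytes of hex text here, before any of the keyword lines are counted.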
Rant
In the beginning, there was not a command line. In the beginning, there was iron oxide pigment on torch-lit cave walls; then there were stylus indentations in clay, patterns carved on turtle shell, knots tied in string, grooves cut in vinyl, and finally discrete states stored in a great multitude of mechanisms. The universal datatype is not text; it is the integer (a byte's 256 states) and the IEEE floating point.
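The "Inefficiency" note and the closing line above can be checked with a short Python sketch using the standard `struct` module. It stores the same hypothetical values once as newline-separated decimal ASCII (the form sort, awk, and bc consume) and once as fixed-width machine types; the little-endian uint32/float64 layout and the function names are assumptions for illustration. It also counts how many of the 256 possible byte values the text form actually uses.

```python
import struct

# Hypothetical sample values: unsigned 32-bit ints and IEEE 754 doubles.
INTS = [0, 7, 65535, 4294967295]
FLOATS = [0.1, 1 / 3, -2.5e-300]


def as_decimal_text(values):
    """Newline-separated decimal ASCII, the format sort/awk/bc expect."""
    return "".join(f"{v}\n" for v in values).encode("ascii")


def as_machine_types(ints, floats):
    """Fixed-width binary: little-endian uint32s followed by float64s (assumed layout)."""
    return struct.pack(f"<{len(ints)}I{len(floats)}d", *ints, *floats)


def unpack_machine_types(blob, n_ints, n_floats):
    """Inverse of as_machine_types; the counts are type info passed out of band."""
    values = struct.unpack(f"<{n_ints}I{n_floats}d", blob)
    return list(values[:n_ints]), list(values[n_ints:])


if __name__ == "__main__":
    text = as_decimal_text(INTS + FLOATS)
    blob = as_machine_types(INTS, FLOATS)
    print("decimal text:", len(text), "bytes")
    print("machine types:", len(blob), "bytes")
    # Share of the 256 possible byte values the text encoding actually uses.
    print(f"byte values used by text: {len(set(text))}/256")
    # Binary round-trips exactly, including 1/3, given the layout.
    assert unpack_machine_types(blob, len(INTS), len(FLOATS)) == (INTS, FLOATS)
```

The binary form only round-trips because the reader is told the layout (four uint32s, then three doubles) out of band; the text form carries its delimiters in band, which is what lets generic tools read it.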