Are there any Unicode experts here?
I've been exchanging emails with Richard Cook (the Unihan maintainer) about getting some "rare" Taiwanese characters added to Unicode. I say "rare" because they're in the Bible, which I think should be covered as a basic text (it's the most-read book in the world!)
My research into word spacing, font issues, and more is covered on the blog at https://pingtype.github.io (click the Docs or Blog header). Practical suggestions for better web design are also welcome.
Regarding fonts, this is my specific rant that made me move from Heiti to Pingfang. Unfortunately forcing users to download 13 MB of Pingfang font was too slow for mobile, so I decided to disable it for the web version of Pingtype.
Edit: These are the IDS codes of the missing characters. Photo evidence from a paper Bible:
⿱髟煮 chhang.jpg Job 39:19, Job 4:15
⿸疒粒not𤷟 liap.jpg 1Sa 5:6, 1Sa 5:9, 1Sa 5:12 ... (17 found) - also see WikiSource.
⿱⿳亠口冖足 37106亮足 lo-.jpg Deu 1:28, Deu 2:10, Deu 9:2
⿰牜周 tiau.jpg 1Ch 17:7, 1Sa 24:3, 2Ch 14:15 (25 found, although 2Ch 14:15 uses 牧 in the paper version)
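For anyone unfamiliar with IDS (Ideographic Description Sequences): each code above is a prefix expression, where an operator such as ⿱ (top/bottom) or ⿰ (left/right) is followed by its components. Here is a minimal sketch of a parser in Python. The names `parse_ids` and `IDC_ARITY` are my own for illustration, and it assumes only the twelve original operators (U+2FF0 to U+2FFB), not the ones added in later Unicode versions.

```python
# Arity of the twelve original Ideographic Description Characters
# (U+2FF0..U+2FFB). Operators added in later Unicode versions are
# deliberately left out of this sketch.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = 3  # ⿲ left / middle / right
IDC_ARITY["\u2FF3"] = 3  # ⿳ top / middle / bottom


def _parse_at(ids, pos):
    """Parse one component starting at index pos; return (node, next_pos)."""
    if pos >= len(ids):
        raise ValueError("sequence ended before all components were given")
    ch = ids[pos]
    pos += 1
    arity = IDC_ARITY.get(ch)
    if arity is None:
        return ch, pos  # a plain component character
    parts = []
    for _ in range(arity):
        part, pos = _parse_at(ids, pos)
        parts.append(part)
    return (ch, parts), pos


def parse_ids(ids):
    """Parse an IDS string into a nested (operator, [parts]) tuple."""
    node, pos = _parse_at(ids, 0)
    if pos != len(ids):
        raise ValueError(f"unexpected trailing text at index {pos}: {ids[pos:]!r}")
    return node


print(parse_ids("⿱髟煮"))        # ('⿱', ['髟', '煮'])
print(parse_ids("⿱⿳亠口冖足"))  # ('⿱', [('⿳', ['亠', '口', '冖']), '足'])
```

A strict parser like this rejects anything after the last component, so notes mixed into a sequence have to be stripped before parsing.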
Minor typos in the article: In using the word "Horse" to show Chinese character evolution, the "Regular" script is marked from 220 AD to 907 AD. As a matter of fact, those characters were almost the "standard" in Chinese until the Chinese government simplified many characters around 1950. Even now, the Republic of China (a.k.a. Taiwan) still recognizes the "Regular" characters as the standard. Among Chinese people around the world, they are also known as the "Traditional" characters.
Funny enough, whenever I see a tattoo on a westerner's body, not only is it usually wrong in the grammar/spelling sense, it is also ugly as hell. Would you let a 5 year old tattoo the word "Strength" onto your body? That's akin to what I see when I look at the typography/style of the tattoo. "Sir, not only does it not say Superman, the characters are backwards and missing strokes."
Any Chinese person who tells you the truth about what your tattoo says is being very kind to you, but most Chinese won't say anything bad since they have no reason to embarrass the person.
One time when I was still in college my family took a trip out to Mexico. I forgot what store we went into, but the cashier asked my dad to write down the cashier's name (sorry, forgot that too) in Chinese. My dad spent a decent amount of time thinking of the proper characters, wrote them down, and we were on our way. I still think about that incident a lot: what if my dad had been a jerk and written something stupid for this guy to get tattooed (he wouldn't)? Even then, the cashier was essentially trusting that my dad was not messing up his name in Chinese (it definitely wasn't something like Mark).
But in Chinese, “every character has to be adjusted,” says Su of Justfont. “Each one is its own image, with its own design needs.”
That's a key concept not only for font design, but also for learners of Chinese. For certain characters like 醫 you have to scale down or elongate the radicals to be balanced within a unified whole. Add the importance of stroke order and simplified vs. traditional characters, and learning basic writing skills (let alone calligraphy) gets really tricky.
I was once involved with a software project, actually the DOS version of Lotus 1-2-3 2.4J, which bundled some Japanese fonts that were licensed from a Taiwanese font maker. The QA manager told one of the staff to print out every character and check them. I thought it was crazy but the junior guy came back a few weeks later with a list of mistakes that he had found. They were reported to the maker and a new updated version was received. This was at the end of the era when software was distributed on physical media (CDs in this case) and providing updates was a costly business.
I hate to say this, but I don't see the point in maintaining complicated old writing systems. (I mean, of course I see the historical and cultural value, but I don't see why people should keep using them.)
You write a "new" Chinese character and then there is: a) no way to represent it on a computer unless you draw it b) no way of knowing how it's pronounced
Latin, Cyrillic, Arabic, Hebrew (ok, they have some common roots), Korean are much more maintainable and "portable".
No, Chinese won't be the new English. You get to write and converse in English in a short time frame (1 yr). Not Chinese. And certainly the learning curve gets steeper the further you go.
Since we're on this topic: I'm curious, is there any "Google Fonts" for Chinese fonts? That is, a high quality free font repository.
This is a job that is ripe for automation from deep learning.
I wonder if this is something that machine learning could help with? You could train an aesthetic model to make suggestions and tweak as necessary.
When the Macintosh was introduced in 1984, files had a data fork and a resource fork. The data fork was normal file data. The resource fork was a map of (OSType, int16) -> data, where OSType was a four-character resource type identifier such as 'MENU' to specify a menu, 'PICT' for picture, etc.
Sort of like MIME types, which were standardized a decade later.
Resources were limited to, I think, 4MB. 4MB was 32x the RAM capacity, and 10x the [floppy] disk capacity, of the original Macintosh.
One of the OSTypes was 'FONT'. Font data was simply a resource stuck in some file — the system file, or, as a kind of pre-web “web font”, an application.
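To make the (OSType, int16) -> data idea concrete, here is a rough in-memory sketch in Python. It is only a toy model of the lookup, not the on-disk resource fork format, and the names `ostype` and `ResourceFork` are made up for illustration. It assumes the four-character code is packed big-endian into a 32-bit integer and that resource IDs are signed 16-bit values.

```python
import struct


def ostype(code):
    """Pack a four-character code such as 'FONT' into a 32-bit integer."""
    raw = code.encode("mac_roman")
    if len(raw) != 4:
        raise ValueError("an OSType is exactly four characters")
    return struct.unpack(">I", raw)[0]


class ResourceFork:
    """Toy model of a resource fork: (type, id) -> bytes."""

    def __init__(self):
        self._items = {}

    def add(self, res_type, res_id, data):
        # Assumption: IDs are signed 16-bit, matching the int16 in the text.
        if not -32768 <= res_id <= 32767:
            raise ValueError("resource ID must fit in a signed 16-bit integer")
        self._items[(ostype(res_type), res_id)] = bytes(data)

    def get(self, res_type, res_id):
        return self._items[(ostype(res_type), res_id)]

    def ids_of(self, res_type):
        """List the IDs stored under one resource type, in sorted order."""
        wanted = ostype(res_type)
        return sorted(i for (t, i) in self._items if t == wanted)


fork = ResourceFork()
fork.add("FONT", 128, b"placeholder glyph data")
fork.add("MENU", 1, b"placeholder menu data")
print(hex(ostype("FONT")))   # 0x464f4e54
print(fork.ids_of("FONT"))   # [128]
```

Keying on a small fixed-width pair is what made it cheap for any file (the system file or an application) to carry a font alongside its menus and pictures.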
When we added support for CJK fonts, around 1990, we had to also add OS/file system support for resource sizes > 4MB. (I think the limit was increased to 16MB.)
Resources were a clever invention that facilitated the development of GUI-heavy apps on what by today's standards are ridiculously resource-constrained computers. (The Apple Watch series 3 has more than 60,000 times the RAM of that first Macintosh — although only a third the screen resolution. :-)
Resources also enabled a limited kind of “view source” that helped a generation of programmers learn their way around Mac application structure. You couldn't view the actual code source, but you could browse the GUI resources of any application you could get your hands on. (This is similar to the modern web, where the use of webpack, Babel, uglification, and compile-to-js languages means the actual source code to a complex web site is not accessible, but the assets are.)
As of MacOS 10.0, which built on the Unix- (Mach-)based NeXTSTEP, resources (multiple data within a file; one OS file per UI file) were replaced by Bundles (many OS files, in a directory, per UI “file”). Bundles are a much better solution for a world with a heterogeneity of operating systems (macOS, Windows, Linux and other Un*xen), where files and tools need to port between multiple file systems, although bundles come with their own portability problems.
Quartz is publishing such interesting content.
Thanks for this post -- it was an education!
Turtle graphics all the way down.
Can't they just use a shorter set of characters (i.e. the Latin alphabet or the IPA) to write down the pronunciation?