#unicode - kbin.social

elilla, 1 day ago to random

everybody's hyped for the new 1FAE9 FACE WITH BAGS UNDER EYES, which I mean, :big_mood:, but for you appreciators of text-presentation glyphs, might I draw your attention to the new range "Symbols for Legacy Computing Supplement" (1CC00–1CEBF) that includes several gaming sprites from retrocomputer codesets including Pac-Man, a full set of Space Invaders, 1CC96 FLAPPING BIRD and so on—as well as a full set of box characters for #teletext emulators?

#unicode

https://www.unicode.org/charts/PDF/Unicode-16.0/U160-1CC00.pdf

Another section showing some old terminal drawing elements ("white lower left pointer", "two rings aligned horizontally", "inverse black diamond" etc.) and more game sprites (such as tanks and racing cars and fish in various positions).

exegete, 1 day ago to hebrew

I just stumbled onto something horrifying, neo-Nazi symbolism seemingly hidden away in #Unicode. The first Unicode #Hebrew codepoint, corresponding to א, is u05D0. The integer corresponding to the hex? 1488. You can't convince me that was a mere coincidence.

Who planned this??? #antisemitism

krans, 1 day ago

@exegete The first #Unicode #Hebrew codepoint is U+05BE HEBREW PUNCTUATION MAQAF.

It should be possible to check the archive of WG4 minutes and papers to look for corroborating evidence for whether there is a conspiracy or a coincidence. Members of the Unicode standards body hang out on Mastodon and may be interested in investigating further.

spacemagick, 6 days ago to Futurology

https://xkcd.com/1953/
#XKCD #Unicode #computing #emoji

Edent, 6 days ago to webdev

🆕 blog! “Accents and eBooks”

By and large, the English language doesn't use diacritical marks. Even our loanwords are stripped of them; we drink in a cafe rather than the more pretentious café. This has a consequence for HTML and, by extension, eBooks. As a quick primer, modern computing gives us two main ways of displaying a letter with an […]

👀 Read more: https://shkspr.mobi/blog/2024/05/accents-and-ebooks/
⸻
#ebook #HTML #unicode

blog, 6 days ago to webdev

Accents and eBooks
https://shkspr.mobi/blog/2024/05/accents-and-ebooks/

By and large, the English language doesn't use diacritical marks. Even our loanwords are stripped of them; we drink in a cafe rather than the more pretentious café. This has a consequence for HTML and, by extension, eBooks.

As a quick primer, modern computing gives us two main ways of displaying a letter with an accent. The first is simple - encode every single accented letter as a separate "pre-composed" character. So è (U+00E8), é (U+00E9), ê (U+00EA), and ë (U+00EB) are all stored as different codepoints.

But this seems a little inefficient and can make it hard to search through text for an exact lexical match.

So there is a second way to add accents. You take the base character - e (U+0065) - and then apply a separate "combining" accent character to it. For example the combining accent ◌́ (U+0301). That means you can add an accent to áńý ĺét́t́éŕ!́

Note, the accent ◌́ (U+0301) is separate from the character ´ (U+00B4). In fact, most accents have a pre-composed, combining, and separate form. This, understandably, causes much confusion!
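
A minimal Python sketch of the pre-composed, combining, and separate forms described above, using only the standard `unicodedata` module. The sample strings are illustrative and are not taken from the blog post.

```python
import unicodedata

precomposed = "\u00e9"   # é stored as a single code point
combining = "e\u0301"    # e followed by COMBINING ACUTE ACCENT
spacing = "e\u00b4"      # e followed by the separate ACUTE ACCENT character

# The first two render alike but compare unequal until they are normalised.
raw_equal = precomposed == combining
nfc_equal = unicodedata.normalize("NFC", combining) == precomposed
nfd_equal = unicodedata.normalize("NFD", precomposed) == combining

# The separate accent is its own character and does not merge under NFC.
spacing_merges = unicodedata.normalize("NFC", spacing) == precomposed

print(raw_equal, nfc_equal, nfd_equal, spacing_merges)
```

Normalising both sides to the same form before comparing is one common way to get the exact lexical match mentioned earlier.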

Here's a good example. I was reading the excellent Fallen Idols, when I noticed this typesetting bug.

The phrase "Swords of Qadisiyyah." But the combining macron over the letter "a" has been rendered as a separate dash.

It's always hard to transliterate languages. The Victory Arch in Iraq is known as قوس النصر, and usually written in English as the "Swords of Qādisīyah".

Examining the HTML code in the eBook, it was obvious that the publishers had used a macron ¯ (U+00AF) rather than the combining version ◌̄ (U+0304).
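
A hedged sketch of how a slip like this could be repaired in extracted text. It assumes the spacing macron sits directly after the letter it was meant to modify, which may not match the actual markup in the eBook; `fix_spacing_macron` and the sample string are hypothetical.

```python
import re
import unicodedata

SPACING_MACRON = "\u00af"    # MACRON, a character that takes its own space
COMBINING_MACRON = "\u0304"  # COMBINING MACRON, attaches to the previous letter


def fix_spacing_macron(text: str) -> str:
    """Turn 'letter + spacing macron' into a single accented letter."""
    swapped = re.sub(r"([A-Za-z])" + SPACING_MACRON, r"\1" + COMBINING_MACRON, text)
    # NFC merges the letter and the combining mark where a pre-composed form exists.
    return unicodedata.normalize("NFC", swapped)


broken = "Qa\u00afdisi\u00afyah"
fixed = fix_spacing_macron(broken)
print(fixed)
```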

I've reported it to the publisher. I've no idea if they'll fix it in a subsequent re-issue.

https://shkspr.mobi/blog/2024/05/accents-and-ebooks/

#ebook #HTML #unicode

Edent, 15 days ago to random

The nice thing about finding a typographical mistake in an #eBook is that the files are just HTML.
You can inspect element and see where the bug is.

In this case, they used ¯ when they clearly meant ̄

#Unicode

zirias, 17 days ago to FreeBSD

Hello bsd.cafe 🤩!

I finally did it and moved to a more appropriate "home realm" for a #FreeBSD enthusiast. Thanks @stefano for offering this!

Moving followers worked flawlessly, restoring all my settings was pretty quick, but of course all my old toots are left on https://techhub.social/@zirias 🙈

So I guess I'll introduce myself here by writing a little thread, adding a few of my works that someone might find interesting. But first a bit of "who am I":

I'm a "professional" software architect/developer (mostly #dotnet platform in the day job), FreeBSD hobby-admin and ports committer, #C64 fan (and occasionally coder and even musician), and apart from computers also interested in music (playing a few instruments myself), traveling, cooking, sometimes sports, sometimes politics ... but probably won't toot about any non-technical stuff (or, very very rarely).

zirias, 17 days ago

Also quite recent: #dos2ansi. This is a very versatile converter for #MSDOS #ansiart (and other "text") files to a format using #Unicode and only standard #ANSI #SGR escape sequences, so, suitable for today's terminals like #xterm. It includes an ansiart viewer which is "just" a shellscript, leveraging dos2ansi, xterm, less and some nice original #IBM fonts to do its job. So, maybe something for the #retrocomputing fans.

https://github.com/Zirias/dos2ansi

Docs (manpages) are here:
https://zirias.github.io/dos2ansi/

As there was some interest, a #FreeBSD port is available: https://www.freshports.org/converters/dos2ansi
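
For readers unfamiliar with what such a conversion involves, here is a small Python sketch of the character-mapping step only. It is not how dos2ansi itself is implemented and it ignores escape sequences entirely; it just runs Python's built-in `cp437` codec over a hand-made byte string.

```python
# Bytes as they would appear in an MS-DOS text file using code page 437.
dos_bytes = bytes([0xC9, 0xCD, 0xCD, 0xBB, 0x0A,
                   0xC8, 0xCD, 0xCD, 0xBC])

# Decoding maps each byte to the matching Unicode box-drawing character.
unicode_text = dos_bytes.decode("cp437")
print(unicode_text)

# Encoding as UTF-8 gives bytes that a modern terminal can display.
utf8_bytes = unicode_text.encode("utf-8")
```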

Wuzzy, 18 days ago to Game

#Repixture 3.15.1 is here!

Sign text will now finally be shown on the sign. Over 60,000 glyphs are supported. A new pole sign can stand on the ground, hang from the ceiling or a wall.

There's also a spyglass item, and moon phases have been added.

▶️ Release notes: https://forum.minetest.net/viewtopic.php?p=435725#p435725
▶️ ContentDB page: https://content.minetest.net/packages/Wuzzy/repixture/

#Game #Minetest #release #sandbox #voxel #Unicode

lukiss, 18 days ago to random

( // #Unicode #TruchetTiles #SuperCollider
q=[
[[" ","│"," "],["╮","╰","┬"],["╰","┬","╯"]],
[["╭","╯","◦"],["┴","┬","─"],[" ","│"," "]],
[["o","│"," "],["─","╯","╭"],[" ","╭","╯"]],
[[" ","╰","╮"],["╮"," ","╰"],["╰","╮"," "]],
[["╭","┴","╮"],["┤","°","╰"],["╰","╮","◯"]],
[["╭","┴","╮"],["┤","O","├"],["╰","┬","╯"]],
];
8.do{
t=q.pyramid(3).scramble;
t[0].size.do{|l|
t.size.do{|n|t[0].size.do{|i|t[n][l][i].post}};
Post.nl
}
});

ramsey, 22 days ago to random

I found what I think is an edge-case bug in ICU. It’s unlikely to impact most folks, unless you’re trying to run the ECMA-402 test suite.

https://unicode-org.atlassian.net/browse/ICU-22765

#Unicode #ICU4C #ECMA #ECMA402

SnoopJ, 22 days ago to mahjong

my recent interest in #mahjong has collided with my on-going interest in #Unicode as I remember that the block U+1F000 through U+1F02B are allocated for encoding tiles

🀀🀁🀂🀃🀄🀅🀆🀇🀈🀉🀊🀋🀌🀍🀎🀏🀐🀑🀒🀓🀔🀕🀖🀗🀘🀙🀚🀛🀜🀝🀞🀟🀠🀡🀢🀣🀤🀥🀦🀧🀨🀩🀪🀫

this information has no practical use to me, but it's nice that the UCS represents them
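
A short Python sketch that builds the same row of tiles from the range quoted in the post, using only the standard library.

```python
import unicodedata

FIRST, LAST = 0x1F000, 0x1F02B  # range quoted in the post, inclusive

# Build one character per code point in the range.
tiles = [chr(cp) for cp in range(FIRST, LAST + 1)]

print(len(tiles))
print("".join(tiles))
print(unicodedata.name(tiles[0]))
```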

Wuzzy, 23 days ago to gamedev

This is my sign test wall where I've been testing Unicode rendering on the signs in #Repixture, using various #Unicode strings.

🟩 green = OK
🟨 yellow = Unsupported, but renders as U+FFFD REPLACEMENT CHARACTER (also OK)
🟥 red = FAIL

I think I should maybe call it a day.

#GameDev #Minetest

Wuzzy, 23 days ago to gamedev

How #Arabic should render (left) vs how it would render on #Repixture signs (right) if I re-enabled it.

RTL support works, but that's not good enough. If I understand correctly, the glyphs also need to connect.

But then, even #GNU #Unifont (the font I use) doesn't seem to have the necessary glyph variants.

#GameDev #Unicode #Minetest

Wuzzy, 23 days ago to gamedev

Turns out rendering #Arabic is hard. No, you can't just implement the #Unicode #Bidirectional Algorithm and call it a day. It turns out the Arabic letters/symbols/? have different forms depending on where they are in the word and probably there are other non-trivial features. Yeah, I guess I'll just postpone this.

So my 1000 IQ workaround for now is to just render all Arabic characters as U+FFFD REPLACEMENT CHARACTER.

#GameDev #Repixture #Minetest
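
A rough Python sketch of the two points in this post. The first part shows that one Arabic letter (BEH is used as an arbitrary example) has several positional presentation forms in Unicode. The second part is a stand-in for the workaround, with a hypothetical `hide_arabic` helper that only covers the main Arabic block (U+0600 to U+06FF). Repixture's real code is not shown in the thread, so none of this reflects its implementation.

```python
import re
import unicodedata

# Four presentation forms of the letter BEH, from the
# Arabic Presentation Forms-B block.
BEH_FORMS = {
    "isolated": "\ufe8f",
    "final": "\ufe90",
    "initial": "\ufe91",
    "medial": "\ufe92",
}

# Compatibility normalisation folds every form back to the base letter U+0628.
base_letters = {unicodedata.normalize("NFKC", form) for form in BEH_FORMS.values()}

ARABIC_BLOCK = re.compile("[\u0600-\u06ff]")


def hide_arabic(text: str) -> str:
    """Replace characters in the main Arabic block with U+FFFD."""
    return ARABIC_BLOCK.sub("\ufffd", text)


print(base_letters)
print(hide_arabic("sign: \u0628\u0627\u0628"))
```

A renderer that wants connected text has to go the other way: pick the right form for each letter from its neighbours, which is the part the Bidirectional Algorithm does not do.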

codepoints, 1 month ago to random

Hey! A new blog post!

https://blog.codepoints.net/emojis-under-the-hood.html

Emojis under the Hood

in which I explain how #emojis are composed on the #Unicode code point layer, and what funny effects that sometimes has.

With notable mentions of work by @CharlotteBuff, @Edent, @eevee, @mathias, and @emojipedia.

Edent, 1 month ago to php

🆕 blog! “Where you can (and can't) use Emoji in PHP”

I was noodling around in PHP the other day and discovered that this works: <?php $🍞 = "bread"; echo "Some delicious " . $🍞; I mean, there's no reason why it shouldn't work. An emoji is just a Unicode character (OK, not just a character - but we'll get on to that), so it should […]

👀 Read more: https://shkspr.mobi/blog/2024/04/where-you-can-and-cant-use-emoji-in-php/
⸻
#emoji #php #unicode

blog, 1 month ago to php
Where you can (and can't) use Emoji in PHP
https://shkspr.mobi/blog/2024/04/where-you-can-and-cant-use-emoji-in-php/

I was noodling around in PHP the other day and discovered that this works:
<?php
$🍞 = "bread";
echo "Some delicious " . $🍞;
I mean, there's no reason why it shouldn't work. An emoji is just a Unicode character (OK, not just a character - but we'll get on to that), so it should be fine to use anywhere.

Emoji work perfectly well as function names:
function 😺🐶() {
    echo "catdog!";
}
😺🐶();
Definitions:
define( "❓", "huh?" );
echo ❓;
And, well, pretty much everywhere:
class 🦜
{
    public int $🐦;
    public ?string $🦃;
    public function __construct(int $🐦, ?string $🦃)
    {
        $this->🐦 = $🐦;
        $this->🦃 = $🦃;
    }
}
$🐓 = new 🦜(1234, "birb");
echo $🐓->🐦;
How about namespaces? Yup!
namespace 😜;
class 😉 {
    public function 😘() {
        echo "Wink!";
    }
}
use 😜\😉;
$😊 = new 😉();
$😊->😘();
Even moderately complex Unicode sequences work:
echo <<<🏳️‍🌈
Unicode is magic!
🏳️‍🌈;
I've written before about the Quirks and Limitations of Emoji Flags. The humble 🏳️‍🌈 is actually the sequence U+1F3F3 (white flag), U+FE0F (Variation Selector 16), U+200D (Zero Width Joiner), U+1F308 (Rainbow).
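
To make the sequence above concrete, here is a small sketch in Python (chosen only for illustration; the blog post itself is about PHP) that lists the code points inside the flag. The string is written with escapes so the invisible characters are unambiguous.

```python
import unicodedata

# The rainbow flag as the four code points listed above.
rainbow_flag = "\U0001F3F3\uFE0F\u200D\U0001F308"

# Print each code point with its formal Unicode character name.
for ch in rainbow_flag:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

code_points = [f"U+{ord(ch):04X}" for ch in rainbow_flag]
code_point_count = len(rainbow_flag)
```

One visible glyph is four code points here, which is why a language that accepts these characters individually in identifiers will usually accept the whole sequence too.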

Take a complex emoji like "Female Astronaut with Medium Dark Skin Tone" - 🧑🏾‍🚀 - that also works!
$🧑🏾‍🚀 = 1;
$👷🏻‍♂️ = 2;
echo $🧑🏾‍🚀 + $👷🏻‍♂️;
Probably the most complex emoji has 10 different codepoints! It looks like this - 🧑🏾‍❤️‍💋‍🧑🏻

And it works!
$🧑🏾‍❤️‍💋‍🧑🏻 = "Kiss Kiss. Bang Bang!";
echo $🧑🏾‍❤️‍💋‍🧑🏻[-1];
There are some emoji which don't work;
$5️⃣ = "five";
The 5️⃣ emoji is U+0035 (Digit Five), U+FE0F (Variation Selector 16), U+20E3 (Combining Enclosing Keycap). PHP doesn't allow variables to start with digits, so it craps out with PHP Parse error: syntax error, unexpected integer "5", expecting variable or "{" or "$" in php shell code on line 1

You also can't use "punctuation" emoji as though they were normal characters:
echo 5 ❗= 6;
And, while not strictly emoji, you can't use mathematical symbols:
echo 5 ≤ 6;
So, there you have it. Is this useful? Well, probably. It is easy to get lost in a sea of text - so little pictograms can make it easier to see what you're doing. If the basic ASCII characters aren't part of your native language, perhaps it is useful to make use of the full range of Unicode.

Does your favourite programming language support Emoji?

https://shkspr.mobi/blog/2024/04/where-you-can-and-cant-use-emoji-in-php/

#emoji #php #unicode

seav, 1 month ago to random

As soon as I saw this video on my YouTube feed, I immediately sussed out what was going on despite having not seen the actual video being referred to but only having come across the title.

I lurked at the badsite and saw that I was right. (And I hate that I had to go to the badsite to confirm. 🤷‍♂️)

https://youtu.be/QztFpzKsdeA

#MKBHD #MarquesBrownlee #TechReviews #HumaneAI

seav, 1 month ago

Tangentially related. As a coder who has had to deal with Unicode/encoding issues, this is freaking hilarious! 🤣

https://youtube.com/shorts/z301SnrlBv4

#Unicode #encoding #mojibake

slink, 1 month ago to random

changes to #unicode
contradict #unicode

zirias, 1 month ago to FreeBSD

🚦🚥 ... ok it works 🌋

A super-simple #emoji keyboard for #x11.

Well, I did have to fiddle with the keymap.

And I had to add delays 🤯👹 (otherwise there are races between keymap changes and keyboard events).

And I had to misuse the #Xtest extension, cause applications ignore "synthetic" events. 🫥😣

But hey, it works 🕺

Now needs some basic, uhm, "features" (like recently used, like search by name).

https://github.com/Zirias/qxmoji
#BSD #FreeBSD #Linux

zirias, 21 days ago

#qXmoji v0.7 released!

https://github.com/Zirias/qxmoji/releases/tag/v0.7

This brings several improvements, mainly in the build system, but the major change is support for localization, with translated Emoji names imported from #Unicode #CLDR. I added a German translation, see screenshot. Once again, I'd appreciate more translations, the process to translate is documented here:
https://github.com/Zirias/qxmoji/blob/master/TRANSLATE.md

Updated FreeBSD port:
https://people.freebsd.org/~zirias/patches/0001-x11-qxmoji-Add-new-port.patch

#X11 #emoji #keyboard #FreeBSD #Linux

youronlyone, 1 month ago to linguistics

It's easier to use Hangeul and Kana to write pronunciations of Filipino words, than to use Filipino diacritical marks.

Last we were taught about Filipino diacritical marks was in Grade 4 or 5 (early 90s). I don't know why, but after that diacritical marks were totally forgotten.

Tracking it down, IIRC, it was late 90s / early 00s when it was officially removed by the KWF.

Sometime in 2010, the KWF brought diacritical marks back, though limited.

In 2014 (or was it 2016?) the KWF introduced a new diacritical mark, the Filipino schwa. It didn't exist before. There are only like 4 Philippine languages with a schwa vowel. They added it in Filipino so words from those Philippine languages can be integrated into the Filipino language.

Here's my problem, no matter how many times I read the KWF document on Filipino diacritical marks, I can't get my head around it. 🤪 I understood it differently, or I remembered them incorrectly. 🤷🏽‍♂️ Or! I've been pronouncing a lot of words wrongly! 🤦🏽‍♂️

However, when I use Hangeul and Kana, I don't have to worry about diacritical marks. Both scripts have stable pronunciations, not like Latin characters where we have to use diacritical marks.

The only catch, the reader should be able to read Hangeul or Kana scripts, which most don't. 🤔 So, back to trying to get a grasp of Filipino diacritical marks. 🤯

Am I right that the Filipino diacritical marks represent the sound?

Examples:

e = neutral = abrupt soft stop?

è = high to low = abrupt hard stop? (paiwa?)

é = low to high = malumay? (malumanay?)

ê = low to high to low = ??

ë = the new Filipino schwa (no idea, since I don't speak the few Philippine languages where a Filipino schwa is needed).

Any experts out there?

(In the revived diacritical marks, we no longer use ē. IIRC, it used to represent a long vowel sound.)

#Wika #Language #Filipino #Tagalog #Latin #Hangeul #Kana #LearningFilipino #MatutoMagFilipino @pilipinas @philippines @pinoy

silsinn9821, 1 month ago

@youronlyone So, in other words, they made it so even old computers that can do #ISO-8859-1 / #Windows-1252 but not #Unicode can still type such words with diacritics via Alt+NumKey combinations. But ē only exists in Unicode (maybe it first showed up in #Windows-1257, but that was only used for Baltic languages), so it can't be typed via Alt+NumKey codes (& not everyone knows that CharMap is a thing).
@pilipinas @philippines @pinoy
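
A quick way to check which of these letters exist in the legacy code pages mentioned is to try encoding them. This Python sketch uses the standard codecs `latin-1` (ISO-8859-1), `cp1252` and `cp1257`; the helper name `encodable` is made up for the example.

```python
SAMPLES = "\u00e8\u00e9\u00ea\u00eb\u0113"  # è é ê ë ē
CODECS = ("latin-1", "cp1252", "cp1257")


def encodable(ch: str, codec: str) -> bool:
    """Return True if the character exists in the given legacy code page."""
    try:
        ch.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# Map each sample letter to the code pages that contain it.
support = {ch: [c for c in CODECS if encodable(ch, c)] for ch in SAMPLES}
for ch, codecs in support.items():
    print(ch, codecs)
```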

lpwaterhouse, 1 month ago to random

I am currently designing a small toy-language and was considering making all strings proper #Unicode objects and all source files utf-8. Lo and behold, Unicode has recently published some guidance: #TR55 http://www.unicode.org/reports/tr55/ I am, however, rather deeply concerned about the general strong preference for #blocklists over #allowlist, e.g. as recommended for identifiers. I get wanting to allow people to use their own language and script wherever possible, and therefore recommending switching from e.g. requiring type names to start with an upper-case character to blocking an initial lower-case character, thereby allowing the use of unicameral (without upper and lower case) scripts. But I have this deep gut-feeling that while the TR certainly solves some existing #vulnerability classes, it also opens up a huge amount of new ones with this general attitude. I haven't yet gone through the TR with a fine-toothed comb to allay that fear, but I'd appreciate input from anyone that has thoughts on the matter.
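
The difference between the two styles of rule can be sketched in a few lines of Python. This only illustrates the general allowlist versus blocklist trade-off raised in the post, not the actual rules in the report; both function names and the sample identifiers are made up, and a real lexer would apply identifier syntax rules before a check like this.

```python
def starts_uppercase_allowlist(name: str) -> bool:
    """Allowlist rule: the first character must be an upper-case letter."""
    return name[:1].isupper()


def starts_not_lowercase_blocklist(name: str) -> bool:
    """Blocklist rule: the first character must not be a lower-case letter."""
    return bool(name) and not name[:1].islower()


# U+578B is a CJK ideograph (no case); U+200B is ZERO WIDTH SPACE.
candidates = ["Widget", "widget", "\u578b", "_hidden", "\u200bWidget"]
for name in candidates:
    print(repr(name),
          starts_uppercase_allowlist(name),
          starts_not_lowercase_blocklist(name))
```

The blocklist admits the unicameral name that the allowlist rejects, which is the stated goal, but taken on its own it also admits the underscore and the zero-width space. That is the kind of gap the concern above is about.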

Edent, 1 month ago to random

Of course, we all know what the Fab Four enjoyed a bit of…

U+1FAB4 😉

#Unicode

SnoopJ, 5 months ago to random

[sobbing]
n-nice

https://www.unicode.org/L2/L2023/23260-eye-bags-face-emoji.pdf

SnoopJ, 1 month ago

NICE, it looks like FACE WITH BAGS UNDER EYES may have been approved, it's listed in the Emoji 16 alpha repertoire

https://www.unicode.org/L2/L2024/24112-pri498-emoji-v16-alpha.pdf

(Presumably it would have been decided as part of UTC #179 but the agenda/minutes for that meeting aren't yet posted)

#Unicode
