lpwaterhouse, I am currently designing a small toy language and was considering making all strings proper #Unicode objects and all source files UTF-8. Lo and behold, Unicode has recently published some guidance: #TR55 http://www.unicode.org/reports/tr55/ I am, however, rather deeply concerned about the general strong preference for #blocklists over #allowlists, e.g. as recommended for identifiers. I get wanting to allow people to use their own language and script wherever possible, and therefore recommending a switch from, e.g., requiring type names to start with an upper-case character to blocking an initial lower-case character, thereby allowing the use of unicameral scripts (those without upper and lower case). But I have this deep gut feeling that while the TR certainly solves some existing #vulnerability classes, it also opens up a huge number of new ones with this general attitude. I haven't yet gone through the TR with a fine-toothed comb to allay that fear, but I'd appreciate input from anyone who has thoughts on the matter.
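To make the type-name example concrete, here is a minimal Python sketch of the two attitudes. It is deliberately naive and is not what the TR specifies: the function names are hypothetical, and it only inspects the general category of the first character, whereas a real lexer would presumably also restrict identifiers to something like XID_Start/XID_Continue.

```python
import unicodedata


def type_name_ok_allowlist(name: str) -> bool:
    """Allowlist rule (hypothetical): first character must be an upper-case letter (Lu)."""
    return bool(name) and unicodedata.category(name[0]) == "Lu"


def type_name_ok_blocklist(name: str) -> bool:
    """Blocklist rule (hypothetical): first character must not be a lower-case letter (Ll)."""
    return bool(name) and unicodedata.category(name[0]) != "Ll"


if __name__ == "__main__":
    candidates = [
        "Point",        # Latin, upper-case initial
        "point",        # Latin, lower-case initial
        "型",           # Han, a unicameral script (category Lo)
        "\u200bPoint",  # starts with ZERO WIDTH SPACE (category Cf)
    ]
    for name in candidates:
        print(
            repr(name),
            "allowlist:", type_name_ok_allowlist(name),
            "blocklist:", type_name_ok_blocklist(name),
        )
```

Under these assumptions the blocklist rule admits the Han name that the allowlist rule rejects, which is the intended benefit, but taken on its own it also admits a name beginning with an invisible format character, which is the kind of thing I am worried about.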