MantisBT - Core Inform
ID: 0000913
Project: Core Inform
Category: Releasing, bibliographic data, cBlorb
View Status: public
Date Submitted: 2012-05-04 21:02
Last Update: 2014-05-07 07:33
Severity: (serious/mild) Game compiles but misbehaves
0000913: It is impossible to include Unicode substitutions—and therefore fancy dashes—in bibliographic metadata
Inform flattens all dashes to the ASCII hyphen-minus (this is by design, per WI 5.10). Furthermore, we can't usefully write, e.g., [unicode 8212] in bibliographic metadata. We can write it, but it does not produce the corresponding Unicode character, because text substitutions can't be used in bibliographic metadata. As a result, there is no obvious way to produce fancy dashes in such metadata.

The example uses an en dash to specify a numerical range and em dashes to set off a parenthetical remark, but none of them is rendered as a dash. Replacing the substitutions with the actual Unicode characters produces ordinary hyphens. Oddly enough, interpreters then fail to print those hyphens correctly in the banner text and call them out as "something not a string". This is probably a distinct bug involving an encoding error. My wild guess is that the hyphen conversion leaves some UTF-8 continuation bytes behind, and that Inform cleans them up when writing the iFiction record, which would explain why only the banner text is affected. I honestly have no idea, though.
The story headline is "Adventures in [']68[unicode 8211]79".

The story description is "This is a sentence[unicode 8212]with a parenthetical[unicode 8212]kind of like how William Shatner speaks."

There is room.
iFiction is explicitly a UTF-8 format under the Treaty of Babel and should be able to handle any fancy characters we care to throw at it. The format itself therefore poses no obstacle to putting general Unicode characters into metadata.

Some discussion, which unfortunately went off-topic rather quickly, is at [^]

It's not immediately obvious to me how to fix this. Still, I do feel it's a deficiency in Inform, not a feature request. It is arguably also a failure to fully meet the treaty's requirement of UTF-8 support in iFiction. I would recommend allowing limited use of [unicode 1234] substitutions in metadata, restricted to cases where the number is a simple constant (no [unicode the number of on-stage people], for instance), so that this sort of problem is solved in general.

An alternative would be to drop hyphen conversion entirely. However, entering raw Unicode characters into metadata is often inconvenient for authors. Supporting [unicode 1234] would be more helpful in the long run, though that starts to get into feature-request territory.
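The restriction proposed above, allowing only constant code points and no computed values, means such substitutions could be checked and expanded entirely at compile time. The following is a minimal Python sketch of that idea. It assumes that metadata arrives as a plain string and that the only substitutions permitted are [unicode N] with a literal decimal N and the apostrophe escape [']. The helper name expand_metadata is hypothetical; this is not Inform's actual implementation, and it does not model Inform's own limits on code points.

```python
import re

# Any bracketed Inform-style substitution, e.g. "[unicode 8212]" or "[']".
SUBSTITUTION = re.compile(r"\[([^\]]*)\]")
# The only numeric form this sketch allows: "unicode" plus a literal decimal number.
CONSTANT_UNICODE = re.compile(r"unicode ([0-9]+)")


def expand_metadata(text):
    """Expand constant substitutions in a bibliographic string (hypothetical helper).

    Raises ValueError for any substitution whose value would have to be
    computed at run time, since metadata is fixed when the story is compiled.
    """

    def expand(match):
        body = match.group(1)
        if body == "'":
            # Inform's escape for a literal apostrophe.
            return "'"
        constant = CONSTANT_UNICODE.fullmatch(body)
        if constant is None:
            raise ValueError(f"substitution not allowed in metadata: [{body}]")
        code_point = int(constant.group(1))
        # chr() accepts lone surrogates, but they cannot be encoded as UTF-8.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            raise ValueError(f"not a Unicode scalar value: {code_point}")
        return chr(code_point)

    # A single pass, so characters produced by one substitution are never re-scanned.
    return SUBSTITUTION.sub(expand, text)


if __name__ == "__main__":
    headline = expand_metadata("Adventures in [']68[unicode 8211]79")
    # Show the UTF-8 bytes as they might be written to an iFiction record.
    print(headline.encode("utf-8"))
```

With this approach, the headline from the test case becomes Adventures in '68–79 with a real en dash (U+2013, encoded in UTF-8 as the bytes E2 80 93). A computed form such as [unicode the number of on-stage people] is rejected rather than silently passed through.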
Related to 0000926 (closed, assigned to graham): It is impossible to include fancy dashes in strings without using Unicode substitutions
Issue History
Date Modified | Username | Field | Change
2012-05-04 21:02 | NYKevin | New Issue |
2012-05-05 13:34 | zarf | Note Added: 0001667 |
2012-05-05 13:42 | zarf | Note Added: 0001668 |
2012-05-26 17:23 | EmacsUser | Issue cloned | 0000926
2012-05-26 17:23 | EmacsUser | Relationship added | related to 0000926
2012-05-26 17:26 | EmacsUser | Note Added: 0001671 |
2012-05-26 17:26 | EmacsUser | Summary | It is impossible to include fancy dashes in bibliographic metadata => It is impossible to include Unicode substitutions—and therefore fancy dashes—in bibliographic metadata
2012-05-26 17:27 | EmacsUser | Status | new => confirmed
2014-03-09 01:56 | graham | Note Added: 0002535 |
2014-03-09 01:56 | graham | Status | confirmed => resolved
2014-03-09 01:56 | graham | Resolution | open => fixed
2014-03-09 01:56 | graham | Assigned To | => graham
2014-05-07 07:32 | jmcgrew | Fixed in Version | => 6L02
2014-05-07 07:33 | jmcgrew | Status | resolved => closed

(0001667) zarf, 2012-05-05 13:34
I think we should separate the issues:

- Putting Unicode characters in metadata. An author might do this by typing them literally in the UTF-8 source code, by using [unicode NUMBER] substitutions, or by using [unicode NAME] substitutions with the "Unicode Character Names" extension. Ideally, all of these methods would work equally well.

- The special-case treatment of fancy dashes (described in 5.10). I'm not convinced these need to be simplified inside quoted text. It can be worked around by saying "[unicode 8212]", but that doesn't work in metadata, which returns us to the previous point.
(0001668) zarf, 2012-05-05 13:42
(However, I don't buy the argument that the Babel spec requires Inform to support em dashes. It just says that the iFiction file's encoding is UTF-8. It doesn't say that Inform must be able to generate all possible Unicode characters. It can't, in fact: it chokes on Unicode values beyond 65535.)
(0001671) EmacsUser, 2012-05-26 17:26
I've kept this bug for the first point in 0000913:0001667; see the related issue for the second.
(0002535) graham, 2014-03-09 01:56
This was absolutely a suggestion, not a bug, in spite of generating bug reports 0000913 and 0000926, but I've implemented it; Unicode substitutions are now legal in bibliographic data.