Mantis Bug Tracker

ID: 0000913
Project: Core Inform
Category: Releasing, bibliographic data, cBlorb
View Status: public
Date Submitted: 2012-05-04 21:02
Last Update: 2014-05-07 07:33
Assigned To: graham
Platform: x86
OS: Windows
OS Version: 7
Product Version: 6G60
Target Version:
Fixed in Version: 6L02
Summary: 0000913: It is impossible to include Unicode substitutions—and therefore fancy dashes—in bibliographic metadata
Description:
Inform flattens all dashes to the ASCII hyphen-minus (this is by specification, per WI 5.10). Furthermore, we can't usefully specify e.g. [unicode 8212] in bibliographic metadata: it is accepted, but it doesn't produce the relevant Unicode character, since text substitutions can't be used in bibliographic metadata. As a result, there's no obvious way to produce fancy dashes in such metadata.

The example uses an en dash to specify a numerical range, and em dashes to set off a parenthetical remark. None of them is rendered as a dash. Replacing the substitutions with the actual Unicode characters produces ordinary hyphens instead, which, oddly enough, interpreters do not print correctly in the banner text and which are called out as "something not a string". That is probably a distinct bug involving an encoding error. My wild guess is that some UTF-8 continuation bytes were left behind by the hyphen conversion, and that Inform cleaned them up when producing the iFiction record, which is why this only happens to banner text; but I honestly have no idea.
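
To make the guess about leftover continuation bytes concrete, here is a minimal Python sketch of how a byte-wise dash conversion could damage UTF-8 text. It is purely illustrative: `naive_bytewise_flatten` and `is_valid_utf8` are hypothetical helpers, not Inform's conversion code, and nothing in this report confirms that Inform behaves this way.

```python
# Hypothetical illustration only; this is NOT Inform's real dash-flattening code.
EM_DASH = "\u2014"  # U+2014, decimal 8212


def naive_bytewise_flatten(data: bytes) -> bytes:
    """Replace only the UTF-8 lead byte 0xE2 with an ASCII hyphen-minus.

    Deliberately wrong: the two continuation bytes of the dash
    (0x80 0x94 for an em dash) are left behind.
    """
    return data.replace(b"\xe2", b"-")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


encoded = ("a" + EM_DASH + "b").encode("utf-8")  # em dash is three bytes
broken = naive_bytewise_flatten(encoded)         # hyphen plus two stray bytes
```

The stray bytes in `broken` no longer form valid UTF-8, which is the kind of damage an interpreter might refuse to print.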
Minimal Source Text To Reproduce:
The story headline is "Adventures in [']68[unicode 8211]79".

The story description is "This is a sentence[unicode 8212]with a parenthetical[unicode 8212]kind 
of like how William Shatner speaks."

There is room.
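
For reference, the two decimal values used in the source text above are the en dash and the em dash. The short Python sketch below looks them up; `describe` is a hypothetical helper name introduced only for this illustration.

```python
import unicodedata


def describe(code_point: int):
    """Return (U+ notation, Unicode name, UTF-8 bytes) for a decimal code point."""
    ch = chr(code_point)
    return (f"U+{code_point:04X}", unicodedata.name(ch), ch.encode("utf-8"))


for value in (8211, 8212):
    print(value, *describe(value))
```

Both characters take three bytes in UTF-8, so either one should be representable in a UTF-8 iFiction record.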
Additional Information:
iFiction is explicitly a UTF-8 format under the Treaty of Babel, and should be able to handle any fancy characters we care to throw at it; there is no immediate issue with putting general Unicode characters into metadata.

Some discussion, which unfortunately went off-topic rather quickly, is at [^]

It's not immediately obvious to me how to fix this, but I do feel it's a deficiency in Inform (and arguably a failure to fully comply with the treaty obligation for iFiction UTF-8 support), and not a feature request. I would recommend allowing the limited use of [unicode 1234] substitutions in metadata, in cases where the numbers are simple and constant (no [unicode the number of on-stage people] for instance), so that this sort of problem will be universally solved.
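
As a rough sketch of what "simple and constant" could mean, the Python below expands only substitutions of the form [unicode N] where N is a literal decimal number, and leaves everything else untouched. This is an assumption-laden illustration, not how Inform implements or would implement the feature; `expand_constant_unicode` is a hypothetical name.

```python
import re

# Matches only literal decimal numbers, e.g. "[unicode 8212]".
_CONSTANT_UNICODE = re.compile(r"\[unicode (\d+)\]")


def expand_constant_unicode(text: str) -> str:
    """Expand constant [unicode N] substitutions; leave all others as-is."""

    def replace(match: re.Match) -> str:
        code_point = int(match.group(1))
        if code_point > 0x10FFFF:  # beyond the Unicode range: leave untouched
            return match.group(0)
        return chr(code_point)

    return _CONSTANT_UNICODE.sub(replace, text)


headline = expand_constant_unicode("Adventures in [']68[unicode 8211]79")
dynamic = expand_constant_unicode("[unicode the number of on-stage people]")
```

Here `headline` gains a real en dash while the `[']` substitution is left for other processing, and `dynamic` is returned unchanged because its argument is not a constant.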

An alternative would be to drop hyphen conversion entirely. However, entering raw Unicode characters into metadata is often inconvenient for authors; supporting [unicode 1234] would be helpful in the long run, but this is getting into feature request territory.
Tags: No tags attached.
Effect: (serious/mild) Game compiles but misbehaves
Attached Files

- Relationships
related to 0000926 (closed, graham): It is impossible to include fancy dashes in strings without using Unicode substitutions

-  Notes
zarf (developer)
2012-05-05 13:34

I think we should separate the issues:

- Putting Unicode characters in metadata. An author might do this by typing them literally in the UTF-8 source code, or by using [unicode NUMBER] substitutions, or by using [unicode NAME] substitutions with the "Unicode Character Names" extension. Ideally all of these methods would work equally well.

- The special-case treatment of fancy dashes (described in 5.10). I'm not convinced these need to be simplified inside quoted text. It can be worked around by saying "[unicode 8212]", but that doesn't work in metadata, which returns us to the previous point.
zarf (developer)
2012-05-05 13:42

(However, I don't buy the argument that the Babel spec requires Inform to support em-dashes. It just says that the iFiction file's encoding is UTF-8. It doesn't say that Inform must be able to generate all possible Unicode characters. It can't, in fact -- it chokes on Unicode values beyond 65535.)
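
The value 65535 (0xFFFF) coincides with the top of Unicode's Basic Multilingual Plane. The note does not say why Inform stops there, so the Python sketch below only shows what changes at that boundary; `utf16_code_units` is a hypothetical helper name.

```python
def utf16_code_units(code_point: int) -> int:
    """Number of 16-bit units needed to store a code point in UTF-16."""
    return len(chr(code_point).encode("utf-16-be")) // 2


# An em dash fits in one 16-bit unit; 65536 and above need a surrogate pair.
for value in (8212, 65535, 65536, 0x1F600):
    utf8_length = len(chr(value).encode("utf-8"))
    print(value, utf16_code_units(value), utf8_length)
```

Both dashes discussed in this issue sit well below the boundary, so the limit does not affect them.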
EmacsUser (manager)
2012-05-26 17:26

I've kept this bug for the first point in 0000913:0001667; see the related issue for the second.
graham (administrator)
2014-03-09 01:56

This was absolutely a suggestion, not a bug, in spite of generating bug reports 0000913 and 0000926, but I've implemented it; Unicode substitutions are now legal in bibliographic data.

- Issue History
Date Modified Username Field Change
2012-05-04 21:02 NYKevin New Issue
2012-05-05 13:34 zarf Note Added: 0001667
2012-05-05 13:42 zarf Note Added: 0001668
2012-05-26 17:23 EmacsUser Issue cloned 0000926
2012-05-26 17:23 EmacsUser Relationship added related to 0000926
2012-05-26 17:26 EmacsUser Note Added: 0001671
2012-05-26 17:26 EmacsUser Summary It is impossible to include fancy dashes in bibliographic metadata => It is impossible to include Unicode substitutions—and therefore fancy dashes—in bibliographic metadata
2012-05-26 17:27 EmacsUser Status new => confirmed
2014-03-09 01:56 graham Note Added: 0002535
2014-03-09 01:56 graham Status confirmed => resolved
2014-03-09 01:56 graham Resolution open => fixed
2014-03-09 01:56 graham Assigned To => graham
2014-05-07 07:32 jmcgrew Fixed in Version => 6L02
2014-05-07 07:33 jmcgrew Status resolved => closed
