Syntax 0.8 RFC3 until November 4

stas · October 30, 2018, 12:29pm

I’d like to give you all an update on the progress of Syntax 0.8 since last week’s RFC. There are a number of open questions about a few of the proposals, and I’d like to invite everyone to share their thoughts here or on GitHub. I’ll wait until the end of the week before moving forward.

We went ahead with the removal of backslash escapes from TextElements (#123). I prepared a PR and expect to merge it today. In 0.8, backslash escapes may only be used in StringLiterals.
Relatedly, in #194 we’re considering extending (or changing) the syntax of Unicode escape sequences. Right now, they’re written as \uXXXX, where X stands for a hex digit, e.g. \u00A0 is a non-breaking space. We’d like to allow escape sequences for Unicode characters whose code points are longer than 4 hex digits. If you have ever used such an escape sequence, your feedback on the desired syntax would be appreciated!
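
To make the limitation concrete, here is a small Python sketch (not the Fluent parser itself; the names `unescape_unicode` and `hex_digits_needed` are made up for this illustration). It decodes the current four-digit form and shows that a character outside the Basic Multilingual Plane needs more than four hex digits, which \uXXXX cannot express in a single escape.

```python
import re

# Matches the current form: a backslash, "u", and exactly four hex digits.
UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def unescape_unicode(text: str) -> str:
    """Replace each \\uXXXX escape with the character it names."""
    return UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def hex_digits_needed(char: str) -> int:
    """Return how many hex digits the character's code point takes."""
    return len(format(ord(char), "X"))


# \u00A0 decodes to a non-breaking space.
decoded = unescape_unicode("a\\u00A0b")

# U+1F602 is an example of a code point that does not fit in four digits.
digits = hex_digits_needed(chr(0x1F602))
```

Whatever syntax #194 settles on, it has to carry at least the extra digits counted by `hex_digits_needed` for such characters.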

Preserving content indentation: we have a working patch in #162, but there’s still one detail that needs to be resolved. Let me try to summarize it here. Feel free to ask if something isn’t clear.

In Fluent, leading whitespace is removed from translations.

  # These are equal. The translations start with "Lorem…".
  lipsum = Lorem ipsum dolor sit amet enim.
  lipsum =     Lorem ipsum dolor sit amet enim.

To preserve leading whitespace, an explicit StringLiteral must be used.

  # The translation starts with "    Lorem…".
  lipsum = {"    "}Lorem ipsum dolor sit amet enim.

The proposal to preserve the indentation of multiline translations removes the largest indent common to all lines of the translation and keeps the rest.

  # The 4 spaces before "Suspendisse" are part of the translation.
  lipsum =
      Lorem ipsum dolor sit amet enim. Etiam ullamcorper.
          Suspendisse a pellentesque dui, non felis. Maecenas 
      malesuada elit lectus felis, malesuada ultricies.
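
The rule can be sketched in a few lines of Python. This is only an illustration of "remove the largest common indent, keep the rest", not the patch from #162; the function name `strip_common_indent` is hypothetical, and the sketch assumes indentation consists of spaces only.

```python
def strip_common_indent(lines):
    """Remove the largest indent shared by all non-blank lines."""
    # Measure the leading spaces of every line that has content.
    indents = [len(line) - len(line.lstrip(" ")) for line in lines if line.strip()]
    common = min(indents, default=0)
    # Drop the shared prefix; any extra indent on a line is kept.
    return [line[common:] for line in lines]


value = [
    "    Lorem ipsum dolor sit amet enim. Etiam ullamcorper.",
    "        Suspendisse a pellentesque dui, non felis. Maecenas",
    "    malesuada elit lectus felis, malesuada ultricies.",
]
result = strip_common_indent(value)
```

Here the common indent is 4 spaces, so the first and third lines end up flush left while the second keeps its remaining 4 spaces.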

This raises an interesting question: what should happen when there’s extra indent on the first line of the translation?

  # Should this translation start with "    Lorem", or should
  # Fluent trim all leading whitespace here as well?
  lipsum =
          Lorem ipsum dolor sit amet enim. Etiam ullamcorper.
      Suspendisse a pellentesque dui, non felis. Maecenas 
      malesuada elit lectus felis, malesuada ultricies.
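
The two possible answers can be compared side by side in Python. This is a sketch under the assumption that the standard library's `textwrap.dedent` is a close enough stand-in for the common-indent rule; the variable names are made up for the illustration.

```python
import textwrap

message = (
    "        Lorem ipsum dolor sit amet enim. Etiam ullamcorper.\n"
    "    Suspendisse a pellentesque dui, non felis. Maecenas\n"
    "    malesuada elit lectus felis, malesuada ultricies."
)

# Option A: strip only the common indent (4 spaces here), so the
# first line keeps its extra 4 spaces.
keep_initial = textwrap.dedent(message)

# Option B: strip the common indent, then also trim whatever leading
# whitespace is left at the very start of the translation.
trim_initial = keep_initial.lstrip(" ")
```

With option A the translation starts with "    Lorem"; with option B it starts with "Lorem". The remaining lines are identical in both.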

In a comment on GitHub I explained why I’m leaning towards keeping the initial indent in such cases. It all boils down to the expectations localizers will have when they see such messages. I’d love to hear your thoughts on this!

The parameterized terms (macros) proposal (#176) was designed to supersede VariantLists. I’m looking for use cases which VariantLists solve well and which macros cannot cover. As a reminder, VariantLists allow localizers to define multi-faceted values of terms; each value can correspond to a grammatical case, a plural form, a mutation based on the position in a sentence, etc. Macros can do that too, but the way variants are accessed differs between VariantLists and macros. I prepared a gist to illustrate the differences, based on the current localizations of Firefox into Italian and Ukrainian. From the gist:
```
  # BEFORE (VariantLists)
  sync-signedout-account-title =
      Connetti il tuo { -fxaccount-brand-name[lowercase] }

  # AFTER (Macros)
  sync-signedout-account-title =
      Connetti il tuo { -fxaccount-brand-name($first-letter: "lowercase") }
```

We’ll wait a few days to give everyone a chance to share their thoughts on these issues. Please send your feedback by the end of the week. On Monday, November 5th, I will triage the proposals again and decide which ones can be merged and which ones need more work or should be dropped. Thank you!