The proposed multilingual functionality is intended to be a general solution applicable to all Wagns. However, it has been inspired by Wikirate.org, and many of the examples below will borrow from that site's structure.
The proposal seeks to honor these constraints/considerations:
- everything is a card.
- a user should only see content in a language that he/she understands unless there are explicit instructions to break that pattern.
- a user who understands multiple languages has the potential to play a special role in an international community: translator.
The essence of the proposal is that there will be two new Settings: *name translation and *code translation. Rules based on these Settings will initially support these values: universal, monolingual, strict, free, and patterned.
Anyone reading this is warmly invited to contribute use cases to explore how they may be addressed in the proposed system.
Standard practice among wiki communities, most notably Wikipedia, is to have separate sites for different languages. There is an en.wikipedia.org and an nl.wikipedia.org, and while there are some connections between them (same technology, use of Wikidata, shared governance...), they are, by and large, separate projects with separate communities.
A Wikipedia-style solution will not work for many Wagn sites, many of which are intended to support one set of unified data in multiple languages. For example, consider Wikirate.org. As the name implies, quantitative data is central to WikiRate, and rich, nuanced interactions with quantitative data are core to what we are trying to achieve. Many of those numbers (e.g. transparency scores and the voting that feeds into them) are based on micro-interactions on the site, and our vision of transparency for those numbers depends on our being able to see exactly where they all come from.
All of this means implementing WikiRate's vision on separate sites, as Wikipedia does, would not work. We do not want companies to score differently in different languages, nor do we want to multiply a company's reporting overhead by asking them to respond to the same questions on multiple sites. We don't want transparency scores to be based on separate sites that users can't see. Perhaps most importantly, we want the worldwide community of WikiRate users to be able to speak with a unified voice in pushing to "make companies clear" and to enjoy facing the cross-cultural challenges of working on this together.
Current problems / bugs:
The following issues are problematic even in monolingual (non-English) contexts:
- lots of hard-coded English, and no default content for other languages.
- the current name key mechanism is based on English pluralization.
- international content basically works, but sometimes (as in inclusions and probably links) content breaks, largely because Unicode characters get treated as HTML entities. We may be able to change this behavior by adjusting a tinyMCE setting (they've changed their docs infrastructure, so the old link probably needs replacing).
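To illustrate the name key problem in the list above, here is a minimal Ruby sketch. It assumes a simplified key function, `naive_key`, that lowercases each word and strips one trailing "s" as a crude stand-in for English singularization. The function is hypothetical and is not Wagn's actual key implementation; it only shows why an English-based rule is risky for names in other languages.

```ruby
# Hypothetical, simplified name key: NOT Wagn's actual implementation.
# Lowercases each word and strips one trailing "s" as a crude stand-in
# for English singularization, then joins the words with underscores.
def naive_key(name)
  name.downcase.split(/\s+/).map { |word| word.sub(/s\z/, "") }.join("_")
end

# Intended behavior for English: plural and singular share one key.
naive_key("Annual Reports")  # => "annual_report"
naive_key("annual report")   # => "annual_report"

# Unintended behavior for French: "fils" (son) and "fil" (thread) are
# different words, but the English rule gives them the same key.
naive_key("Fils")  # => "fil"
naive_key("Fil")   # => "fil"
```

Under this assumption, two unrelated French cards could not coexist, because their keys collide.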
Name Uniqueness
For example, it’s always been the case that, ...
Data Representation
We will need to make, at a minimum, this alteration to the cards table:
cards + lang + translatee_id
The basic ...
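One possible reading of the two proposed columns, as a minimal Ruby sketch rather than a schema definition: each translation is its own card row that carries a `lang` code and points at the card it translates through `translatee_id`. Only `lang` and `translatee_id` come from the proposal. The `Card` struct, the sample rows, the `card_for` helper, and the idea that a nil `translatee_id` marks the original are all assumptions made for illustration.

```ruby
# Hypothetical in-memory stand-in for rows of the cards table.
# Only lang and translatee_id come from the proposal; the rest is assumed.
Card = Struct.new(:id, :name, :lang, :translatee_id)

CARDS = [
  Card.new(1, "Company", "en", nil),   # original card (assumed: nil translatee_id)
  Card.new(2, "Bedrijf", "nl", 1),     # Dutch translation of card 1
  Card.new(3, "Unternehmen", "de", 1)  # German translation of card 1
]

# Return the version of a card in the first language the user understands,
# or nil when none of the user's languages is available.
def card_for(original_id, user_langs, cards = CARDS)
  versions = cards.select { |c| c.id == original_id || c.translatee_id == original_id }
  user_langs.each do |lang|
    match = versions.find { |c| c.lang == lang }
    return match if match
  end
  nil
end

card_for(1, %w[nl en]).name  # => "Bedrijf"
card_for(1, %w[fr en]).name  # => "Company"
card_for(1, %w[fr])          # => nil
```

Returning nil when no version matches follows the constraint that a user should only see content in a language they understand.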
- In the case of “Strict”, this proposal assumes that a given word can be translated between two languages in a
Found ICU and ruby bindings -- ...
I need to add a lot more examples...
Playing with more general character classes: https://gist.github.com/GerryG/5f2993f262fbe14f57f2
I also updated the link in the old discussion for that ICU library for ruby.
Cool proposal, questions coming up as I read it. There is some footprint of this in the cardnames, and we should be able to translate the user presentation of a lot of system features by just having cardnames for the same card. Note Numeric Name Parts, which would make the key representation Universal for some name parts, whether or not there are existing names in different languages.
I think some of your monolingual examples could be Strict, or maybe that is Mapped. The codenames will be monolingual, but multiple names and sometimes content would allow *create and *read to just have translated names doing most of the work. You seem to be connecting Strict with contractual things, but it can be used with functional things too. Maybe there is space between Strict and Free, and the tools you envision to maintain strict translations can instead tell you how much in sync the different versions are and which ones are authoritative. If more than one is considered authoritative, they shouldn't contradict; they should convey the same meaning as closely as possible.
Are Strict and Mapped related? Maybe one is just more specific, functionally a subset of the other. You'll desire more translations for a mapped card, and will want to translate updates, but would be more tolerant of being temporarily out of sync. Would you want to list required languages in a rule or something for Strict cards?
Good points about numeric name parts. I like that idea a lot. (Will follow up there at some point)
Strict and Mapped are quite different. Strict refers to *actual* translations, where Mapped refers to virtual translations. You would not want to store mapped translations in the database, but strict translations *must* be stored there. So, no, I wouldn't consider one a subset of the other.
"You seem to be connecting Strict with contractual things, but it can be used with functional things too". I want very much to avoid that. Wherever possible, functional things should only be represented once and then translated automatically.
I agree that some of the Monolingual examples could be Mapped. The most common case for Mapped is Pointers (cards composed entirely of mapped references), and several of the rules I mentioned as Monolingual candidates (e.g. permissions) are pointers and thus probably more naturally Mapped. *structure rules are a little more ambiguous, because it's possible to put non-referential natural language content in them, but in general that's going to be a poor choice in a multilingual context, so they, too, may make sense to treat as Mapped on multilingual sites.
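The Pointer case can be sketched in Ruby, under stated assumptions: the pointer stores only language-neutral references (hypothetical card ids here), and each language's view is generated on demand from the referenced cards' names, so no translated copy of the pointer is stored. `NAMES`, `POINTER_REFS`, `render_pointer`, and the sample ids are illustrative only and are not part of Wagn.

```ruby
# Hypothetical name table: card id => { language code => name }.
NAMES = {
  10 => { "en" => "Anyone", "nl" => "Iedereen" },
  11 => { "en" => "Administrator", "nl" => "Beheerder" }
}

# A pointer holds references only, so it is stored once for all languages.
POINTER_REFS = [10, 11]

# Build the "virtual translation": resolve each reference to its name in
# the requested language. Raises KeyError if a name is missing.
def render_pointer(ref_ids, lang, names = NAMES)
  ref_ids.map { |id| names.fetch(id).fetch(lang) }
end

render_pointer(POINTER_REFS, "nl")  # => ["Iedereen", "Beheerder"]
render_pointer(POINTER_REFS, "en")  # => ["Anyone", "Administrator"]
```

Nothing language-specific is written back to the pointer itself, which is the sense in which a Mapped translation is virtual while a Strict one has to be stored.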
I also resonate with your thoughts about the strict translation having a canonical version. Actually, I kind of think all strict translations will need to set one of the versions as canonical and then update from there, though we will want to be able to change which is canonical. The data representation embraces this. But I would say that the idea that translations "shouldn't contradict" isn't really "between Strict and Free"; that's just Strict. That's pretty much how I would define it.
All I'm saying is that there is a spectrum of both translation quality and synchronization. Perfection isn't really an option; "Strict" will represent a place pretty close to that end of the scale, but things will often be in flux. I'm saying that metrics on updates (sync) and quality (how high a standard of strictness is met) will be good to have for whatever targets a community requires.