Language Specific Key Generation
Idea
+issues
In order to support multilingual decks, we need name processing to be sensitive to the language of the name.
The current name gem is English-centric, mainly because of the singularization done for key generation. We need this key processing to be language-specific, varying with the language of the name.
Related: we may want to consider supporting key algorithms as options. Some sites, for example, might choose not to handle camelcasing, so that WikiRate = wikirate and McCutchen = mccutchen (neither of which is currently true). The hardest part is handling the conflicts when you change from one algorithm to another.
Note that if/when you change key algorithms, you might need a migration both to fix keys and to handle 'discovered duplicates'.
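To make the 'discovered duplicates' problem concrete, here is a minimal sketch in Python. It is not the name gem's code: `camelcase_key`, `plain_key`, and `discovered_duplicates` are hypothetical names, and the camelcase rule is only a rough stand-in for whatever the current algorithm does.

```python
import re
from collections import defaultdict


def camelcase_key(name):
    # Hypothetical camelcase-aware algorithm: a lowercase-to-uppercase
    # boundary counts as a word break, so "WikiRate" becomes "wiki_rate".
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", "_", name)
    return re.sub(r"[\W_]+", "_", spaced.lower()).strip("_")


def plain_key(name):
    # Hypothetical alternative that ignores camelcase entirely.
    return re.sub(r"[\W_]+", "_", name.lower()).strip("_")


def discovered_duplicates(names, key_fn):
    # Group names by the key the given algorithm produces and keep only
    # the groups where more than one name lands on the same key.
    groups = defaultdict(list)
    for name in names:
        groups[key_fn(name)].append(name)
    return {key: group for key, group in groups.items() if len(group) > 1}


names = ["WikiRate", "wikirate", "McCutchen"]
print(discovered_duplicates(names, camelcase_key))
print(discovered_duplicates(names, plain_key))
```

Under the camelcase-aware function the three names keep distinct keys. Under the plain one, "WikiRate" and "wikirate" collapse to the same key, and that is the kind of pair a migration would have to merge or rename.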
Also related: key algorithms tied to 'type'. For number and dates/times, but could have other uses. Seems that maybe we need to modularize around names. For key processing for sure. But this could be the right time to design how namespaces will be handled in the name gem. Not time to implement, but as long as we have to consider name related modularization, we should handle the high level design issues.
Data types in names relate in the sense that some data is universal, or almost universal (dates and times are mostly universal, but there are style/default variations).
yeah, "discovered duplicates" is what I meant by "conflicts"
key generation can't be directly tied to a cardtype. You get a name, you convert to a key, you lookup the card, and only then do you have a cardtype.
but I think perhaps you're suggesting nametypes? That could be useful, but I wouldn't really want nametypes v contenttypes to become a wagneering (set) distinction. The Wagneer would know Date as a type of card, period.
It should probably be possible for a mod to add power to the key algorithm, either in general or for a specific deck (namespace). We may decide that certain complex names might be best relegated to separate decks (which could somehow be tied to a type). eg /date/01-02-2015.
Dates are actually a good example of why you want languages and decks tracked separately. 15 de Enero and January 15 would be in the same deck but different languages. (in this case probably just one card in db -- mapped)
I don't really mean cardtypes. What I mean is how type shows up in parts of keys. We don't really have that now. Translation type (mode?) is the first feature that really does this.
Not totally sure it is a good idea, but the translation types sort-of introduce it. The idea is that a name part can be recognized as a type. Like happens with constants in most programming languages. Data formats like YAML do something like this with unquoted data, no? Such a type would potentially change with a rename.
Yes, exactly. Dates could have keys that are numeric and mono-lingual and the translations automatic. They can be considered 'strict' with the translation happening by having key to name be language sensitive. The parsing would also have to be language sensitive, but that would be more or less automatic based on selecting the name module configuration for the language.
Agreed on the module points. That is where language parsing would be customized. The only point is that the key should be numeric data for dates that compares the same across languages.
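A rough sketch of that idea in Python, assuming a month-day key format of `MM-DD` and per-language month tables. The `MONTHS` table, `date_key`, and `date_name` are all invented for illustration and only cover English and Spanish.

```python
# Hypothetical per-language configuration: month names in calendar order.
MONTHS = {
    "en": ["january", "february", "march", "april", "may", "june", "july",
           "august", "september", "october", "november", "december"],
    "es": ["enero", "febrero", "marzo", "abril", "mayo", "junio", "julio",
           "agosto", "septiembre", "octubre", "noviembre", "diciembre"],
}


def date_key(name, lang):
    # Language-sensitive parsing: find the month word and the day number,
    # then emit a numeric key that does not depend on the language.
    # Raises StopIteration if the name has no month word or no day number.
    words = name.lower().replace(",", " ").split()
    months = MONTHS[lang]
    month = next(months.index(w) + 1 for w in words if w in months)
    day = next(int(w) for w in words if w.isdigit())
    return f"{month:02d}-{day:02d}"


def date_name(key, lang):
    # Language-sensitive key-to-name: the same key renders differently
    # depending on the language context.
    month, day = (int(part) for part in key.split("-"))
    month_name = MONTHS[lang][month - 1]
    if lang == "es":
        return f"{day} de {month_name}"
    return f"{month_name.capitalize()} {day}"


print(date_key("January 15", "en"), date_key("15 de Enero", "es"))
print(date_name("01-15", "es"))
```

Both names parse to the same key, so there is one card, and the name shown in each language is generated from the key rather than stored.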
so, I think it's important to use the terms precisely.
a key can't be monolingual OR strict. A translation pattern applies to the name or content of a given Set and describes a relationship among the members of that set in different languages. A key can be associated with either (a) no language (universal) or (b) one specific language. That's it.
No Set can be both monolingual AND strict -- it has to follow one and only one translation pattern.
The pattern you're describing is neither monolingual NOR strict but mapped. With mapped cards, you store them once and handle them differently (effectively as virtual cards) in different languages.
~~
I'm increasingly thinking that language-specific key generation must be reconciled with the notion of universal cards.
In the current plan, an entire name is interpreted in one language context. Consider /en/Matthias+about and /de/Matthias+über. The name of one is a strict translation of the other. Matthias (name) is universal, "about" is English, and "über" is German. If I'm using an "English" key algorithm, then I might singularize Matthias to Matthia. In the German algorithm, that doesn't happen. We don't know that Matthias is universal until I've looked it up.
I can think of two main approaches:
1. Our name handling is aware of all universal names and name patterns. Whenever a name is initialized in a specific language context, we first determine whether the name is a universal name, which would make it, in some sense, reserved such that it cannot be used in a specific language. This would allow us to keep the language-specific key generation proposed here. Note that in order to create a new universal name (eg, sign up a new user), we have to run through every single language-specific key algorithm out there and make sure the universal name doesn't have a duplicate key with any existing card in any language.
2. Simplify our key handling: drop support for plurals and possibly even camelcase. Basically we strip out all non-alphanumerics, lowercase, and boom, we're done. Fairly easy in terms of implementation, but a potential nightmare in terms of migration because of the heavy use of pluralization.
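To illustrate the cost of approach 1, here is a minimal Python sketch of the check needed before creating a universal name. Everything in it is hypothetical: `english_key` uses a deliberately naive "drop a trailing s" rule as a stand-in for real singularization, and `KEY_ALGORITHMS` and `universal_name_conflicts` are made-up names.

```python
import re


def base_key(name):
    # Shared normalization: lowercase, collapse non-word runs to "_".
    return re.sub(r"[\W_]+", "_", name.lower()).strip("_")


def english_key(name):
    # Naive stand-in for English singularization: drop one trailing "s".
    key = base_key(name)
    if key.endswith("s") and not key.endswith("ss"):
        return key[:-1]
    return key


def german_key(name):
    # Assumption for this sketch: no singularization in German.
    return base_key(name)


KEY_ALGORITHMS = {"en": english_key, "de": german_key}


def universal_name_conflicts(name, existing_keys_by_lang):
    # Run the proposed universal name through every language's algorithm
    # and report the languages where its key is already in use.
    conflicts = []
    for lang, key_fn in KEY_ALGORITHMS.items():
        if key_fn(name) in existing_keys_by_lang.get(lang, set()):
            conflicts.append(lang)
    return conflicts


existing = {"en": {"matthia", "about"}, "de": {"über"}}
print(english_key("Matthias"), german_key("Matthias"))
print(universal_name_conflicts("Matthias", existing))
```

The same name yields different keys per language, so every new universal name means one pass per language algorithm. Under approach 2 there would be a single algorithm and a single lookup.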
To be honest, I lean a bit towards solution 2 and to trying to handle inflections more intelligently in links / nests. A one-time headache that dramatically simplifies things over the long haul.
What do you call a key that doesn't change with language, but the name does? In other words, you could automatically generate a translated name that would have that key (in a standard (language based?) format).
Well, I guess you're right that it could be mapped. If you wanted the content translated in one of the other modes, the names of course would follow with the content, but the name wouldn't need to be edited when adding a new translation card, whatever the mode is.
I think you are on the right track in thinking about universal names. I don't think you can get away with 2 nor do I like it as a solution. I also suspect that solutions along those (solution 1) lines will fit better as another aspect of the scope logic that is part of what having namespaces implies outside the language context.
Is there also a relationship between universals and an idea like the 'primary' or 'authoritative' translation in other modes? You may prevent collisions with universals, but you might match on authoritative names even outside their native language context.
I'm guessing that if 'Gerry' is a Universal, you could still do es:Gerry in a Spanish context and have that not collide with the universal.
I don't know if the lang:Name syntax is official, but if it isn't we should declare what it is in the doc. I'm thinking that we would use similar syntax for overriding scopes (could be instead of or in addition to the / // /// syntax).
if Gerry is a Universal, then es:Gerry is a reference to that universal.
Then there would be no way to specify a name that has gerry as the key in Spanish that isn't a reference to the universal card with that key?
I still think #2 is going to make more sense. I get complaints about Wagn's weird key patterns all the time. why doesn't WikiRate == wikirate?
If the concern is plurals, perhaps we can support explicit aliasing there?
no, that would break everything. The point of universals is that you need to be able to refer to them in any language context, not in some languageless context.
Well that would give a good, simple path for the migrations. Just add the aliases as needed.
That is mainly about the references table with its use of key as part of the reference, right? The content might have s or not and key (via references) points it to the same card.
Maybe the issue with dates can be handled similarly. Add language specific aliases to universal names. They'd have the same keys, which is fine because they are the same card. Only the alias has a language.
I think I know the right way to use the new terms around this. The term namespace is reserved for talking about the distinct groupings of cards where each must have a unique key, and the other use cases that are about stacking or whatever we call it of these spaces. Formally, I think we use namespace scoping ideas to describe how namespaces work in the different contexts.
For languages, you are saying roughly that there is a namespace for each language, plus a universal one. From the perspective of a given language context, we combine the universal and specific language namespaces in a specific way. Namely, universal names are 'taken' already in any specific language.
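Here is one way that combination rule could look, as a small Python sketch. The `LanguageNamespaces` class and its methods are invented for illustration. It assumes keys have already been generated and that a universal key always wins in every language context.

```python
class LanguageNamespaces:
    """One universal namespace plus one namespace per language (sketch)."""

    def __init__(self):
        self.universal = {}  # key -> card
        self.by_lang = {}    # lang -> {key -> card}

    def resolve(self, key, lang):
        # In any language context, universal names are checked first,
        # then the names specific to that language.
        if key in self.universal:
            return self.universal[key]
        return self.by_lang.get(lang, {}).get(key)

    def add_universal(self, key, card):
        # A universal key must be free in every language namespace.
        taken_in = [lang for lang, cards in self.by_lang.items() if key in cards]
        if key in self.universal or taken_in:
            raise ValueError(f"key {key!r} already taken")
        self.universal[key] = card

    def add_in_language(self, key, lang, card):
        # Universal names are 'taken' already in any specific language.
        if key in self.universal or key in self.by_lang.get(lang, {}):
            raise ValueError(f"key {key!r} already taken in {lang}")
        self.by_lang.setdefault(lang, {})[key] = card


ns = LanguageNamespaces()
ns.add_universal("gerry", "Gerry (universal)")
ns.add_in_language("about", "en", "about (en)")
print(ns.resolve("gerry", "es"))
print(ns.resolve("about", "de"))
```

With this rule, looking up gerry in a Spanish context returns the universal card, and an attempt to add a Spanish-only gerry is rejected.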
In scoping rules within stacked decks, we are again combining namespaces, but with a different logic specific to how they are used (particularly important for rule lookup).
I'm ok with that vocabulary, but that means we need NOT to use namespace to mean deck. And, really, since we more often use "deck" to mean an installation, we should probably call these ones "nested decks". So, decko will have two main kinds of namespacing: languages and nested decks.
Yes, exactly. I think splitting those two ideas is important.
Yes, two forms of namespaces, distinct in use and semantics.
So, "nested decks" is fine, but I think what is now a deck, may become decks as in local decks. Even with these, I think the idea of nesting/stacking is similar to mounting a filesystem. This means I can have local decks whose namespace is inaccessible until the deck is re-attached. That might be deleting and recovering a Deck card from the trash, which I think we have discussed before.
All the stuff about scoping rules comes in with local namespaces, I think. It could be really useful for distributing modules. We'll have to think on it a lot.
In this lingo, a deck has a single namespace with other namespaces nested and/or connected via the language features.
it's obviously got lots of parallels for mounting, but that doesn't make "mounting" a good word for it. File system metaphors aren't going to speak to a big chunk of our audience, so the question is what that word will sound like to people who don't know its fs meaning.
And while local and remote deck handling will have a lot of parallels, they're also going to have a lot of differences, because there are simply going to be several things that are much harder to implement with remote decks. So, sure, we can have some shared vocabulary, but we need good ways to distinguish the two.
"what is now a deck, may become decks as in local decks"
sigh. I don't know what you're talking about.
We agree that we won't call it mounting (except maybe when explaining the analogy and that is dependent on audience and not the typical one).
Right, I think I tried to say that about remotes. Some things will be harder when remote, but we want to find implementation tricks to make it as close as we need to. The closer the better, but we agree it is hard.
On 'deck' ... local decks, I'm just trying to connect what you said about common parlance calling a wagn instance a deck. I think we want to keep that, but when we do add deck_id, a deck is now decks, expressed as 'nested decks' to capture the idea of a single deck with others attached.
When it goes remote, we just also attach the nested decks of a remote deck, which can of course be repeated, so we still have one namespace, with local and remote decks nested inside it.
I'm trying out the language to make sure we have the same idea of how to use it. If you want other conventions, just express the way you'd like to refer to things.
+discussed in support tickets
+relevant user stories