handle breaking characters in card names+discussion

Create an individual ticket for each character? Or try to group related issues together (e.g. post-creation redirect to New Card with pre-character text)? Or...?


Also see identify problem characters for search


Pre-ticket system (early 2008 or maybe even 2007):

Illegal characters are allowed in double-square-bracket links, and if you click on one, it will let you try to create a page. Parsing should let you know about illegal links before this point. Try [c++][http://wagn.org/wiki/c++] --[Mike Buland][http://wagn.org/wiki/Mike_Buland]

[I think we need a pretty robust answer to this -- there is a whole class of related problems. I'm not sure it should be fixed in the link, but it should definitely be caught earlier than it is. Happy to do a design session -- efm.]

 


There are a *lot* of problem characters implied by the non-US money symbols (will test this a bit). Maybe better to whitelist allowed characters?

--John Abbe.....Sun Nov 30 12:01:41 -0800 2008
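
A minimal sketch of the whitelist idea, in Python for illustration only. The allowed set (ASCII letters, digits, space, underscore) and the names `disallowed_chars` and `is_allowed_name` are assumptions, not anything Wagn defines. The point is only that a name can be checked, and the offending characters reported, before a link leads to a creation form.

```python
import string

# Hypothetical whitelist: ASCII letters, digits, space, and underscore.
# Which characters are actually allowed is a design decision left open here.
ALLOWED = set(string.ascii_letters + string.digits + " _")


def disallowed_chars(name):
    """Return the characters in `name` outside the whitelist, in order, without repeats."""
    seen = []
    for ch in name:
        if ch not in ALLOWED and ch not in seen:
            seen.append(ch)
    return seen


def is_allowed_name(name):
    """True when the name is non-empty and every character is whitelisted."""
    return bool(name) and not disallowed_chars(name)
```

With this set, `disallowed_chars("c++")` reports only `+`, and accented letters or currency symbols are reported the same way. Widening the whitelist later means changing `ALLOWED` only.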


yah, even things like é ä ø √ have the same issues

--John Abbe.....Sun Nov 30 12:17:32 -0800 2008

 


Lew, I'd still like to look at using the card key for links whenever possible. Without having looked carefully at the bugs, my guess is that that would help. --Ethan McCutchen, 1 Dec 2008

 


See http://sandbox.wagn.org/wagn/name_testing_2

It's more consistent now, which is good. But there are still serious problems:

If you create a card "nordic äccent" via link, you get the card name "nordic_auml_ccent", but if you create it via New Card you get "nordic ccent" (same with <, >, & and some other (non-ASCII?) symbols).

If you create a name through the New Card interface, you can have some (ASCII?) symbols, and through inclusion you can have almost any symbols. Cards created via link have their name converted to the key format. This seems broken to me: however i name a card in a link, that's what i want the name of the card to be.

Right square bracket still causes the link to be unrecognized.

Just to help me understand - are keys restricted to alphanumeric and underscore?

  --John Abbe.....Mon Dec 22 22:22:40 -0800 2008


yes, keys are restricted to alphanumeric characters and underscore.

Because work on these issues has lots of side effects, I don't think we should try to make lots of other changes in 0.11, except perhaps to double-check the escaping on name submissions for new cards. However, I want to make sure the 0.11 page indicates the major improvements.

Could you refactor this ticket so that the Issues section is much shorter, the solution indicates what was accomplished by Wagn 0.11, and the examples section focuses on what remains to be done? Either that or a new ticket. So long as there is a record of what's been done and what hasn't.

  --Ethan McCutchen.....Tue Dec 23 10:36:31 -0800 2008


This from the email:

while it's true that we're now using the key in the URL, it's important that we separate out two issues:

1. the original intent of the key (to support case / space variants)
2. the url interface

We've been using keys for a long time now to support case variations, including singularizing each word in a long name. The key has only recently been exposed to users, which is why you're now responding to some It makes complete sense that you would singularize multiple words. If we don't want separate cards for "Lunch" and "Lunches," then why do we want separate cards for "Lunch in New York" and "Lunches in New York"? I don't think there's been any situation where the problem has originated from purpose #1, only from purpose #2 -- you don't think it looks good.

Similarly, the strange singularizations (like ye for yes, plu for plus, and connectipedium for connectipedia) have only caused problems when users can see them, whereas the benefits of singularizing have been pretty huge, imho.

So I suspect the solution is to disentangle the issue. We'll create a less modified version of the title for the url interface that still strips most weird characters but retains case and pluralization.

  --Ethan McCutchen.....Mon Dec 29 12:12:31 -0800 2008
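
The key behaviour described above can be approximated as follows. This is an illustrative Python sketch, not Wagn's code: `naive_singularize` and `to_key` are hypothetical names, and the handful of singularization rules is a rough stand-in for a real inflector, chosen only because it reproduces the examples mentioned in this thread.

```python
import re


def naive_singularize(word):
    """Very rough rule-based singularizer (hypothetical, not Wagn's inflector)."""
    if re.search(r"(ch|sh|x|ss)es$", word):  # "lunches" -> "lunch"
        return word[:-2]
    if re.search(r"[ti]a$", word):           # Latin-style plural: "media" -> "medium"
        return word[:-1] + "um"
    if word.endswith("ss"):                  # leave words like "class" alone
        return word
    if word.endswith("s"):                   # generic rule: drop a trailing "s"
        return word[:-1]
    return word


def to_key(name):
    """Lowercase, keep only ASCII alphanumeric runs, singularize each word, join with "_"."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return "_".join(naive_singularize(w) for w in words)
```

Under these rules "Lunch in New York" and "Lunches in New York" map to the same key, which is purpose #1. The same rules also turn "yes" into "ye", "plus" into "plu", and "connectipedia" into "connectipedium", which only matters once the key is shown to users.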


ok, I think now we should have much more consistent behavior. I added a new function to generate url keys that don't change case or pluralization but still strip out weird characters. As we go we may decide to consider fewer characters "weird". Underneath, the key still works the same way.

I also fixed the bug with links by returning to what we had on nonexistent cards -- full CGI escaping. This means the urls will sometimes have lots of %'s and such in them (which, note, are missing from the existing card urls where we don't have to preserve as much information). There may be cases where they are breaking, but I don't know of them.

My recommendation is that we close this card if the core improvements are working well and make some more, narrower tickets. I would say any specific character issues, like the handling of right square brackets in links, will likely be low priority, whereas better handling of non-ASCII characters should be at least medium, since we're seeing a lot of those cards out there, and it's a huge (and not particularly difficult) first step towards internationalization. It does merit some design thought, however.


  --Ethan McCutchen.....Mon Dec 29 12:20:55 -0800 2008
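
The two URL forms described in the last comment can be sketched like this. It is written in Python purely for illustration. `url_key` and `new_card_link` are hypothetical names, the set of characters treated as "weird" (anything other than ASCII letters and digits) is an assumption, and `urllib.parse.quote_plus` stands in for full CGI escaping.

```python
import re
from urllib.parse import quote_plus


def url_key(name):
    """Hypothetical URL key: keep case and plurals, collapse other characters to "_"."""
    return re.sub(r"[^A-Za-z0-9]+", "_", name).strip("_")


def new_card_link(name):
    """Escape a not-yet-existing card name in full so that no character is lost."""
    return quote_plus(name)
```

The trade-off is visible with a name like "c++": `url_key` reduces it to "c", which is acceptable only when the card already exists and can be found by its key, while `new_card_link` yields "c%2B%2B", which is uglier but preserves the exact name for card creation.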