Fix match searching in names

Ticket

+commit
 

"match" won't work on just one property (like name or content); it always searches both of those.

 

the "match" operator in WQL was implemented mostly with the site search (search bar) in mind, and it's a bit wonky in there.  In sites with postgres fulltext working (like all the wagn sites run by GC), the search is implemented to search on an index of name and content.

 

but if you search on {:name=>[:match,'hosting']}, for example, you clearly want just the ones that match name.

 

Currently most operators mean "content" when a property is not specified.  With "match", I think it's ok for the default to be a combination of properties (content and name).

 

However, if the property is specified, we need to search on just that property, even if it's with less sophisticated matching.
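
A minimal sketch of the intended rule, for illustration only. `match_targets` is a hypothetical helper, not part of Wagn's WQL code; it just shows which properties a query should search depending on whether "match" is given bare or attached to a property.

```ruby
# Hypothetical sketch, not Wagn's actual implementation: decide which
# properties a "match" should search, based on how the query is written.
def match_targets(query)
  if query.key?(:match)
    # Bare match: default to a combination of name and content.
    [:name, :content]
  else
    # Property-specific match: search only the properties that ask for it.
    query.select { |_prop, cond| cond.is_a?(Array) && cond.first == :match }.keys
  end
end

match_targets({ :match => 'hosting' })             # => [:name, :content]
match_targets({ :name => [:match, 'hosting'] })    # => [:name]
match_targets({ :content => [:match, 'hosting'] }) # => [:content]
```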

 

it would also be great to have tests for most all of the operators.  I have a hunch there are several others not performing properly that we haven't flushed out yet.

 

 

It's searching on content, not just name.

 

The "don't search content of hard-formatted cards" ticket shows where some of the unwanted content comes from. (E.g., to see the "matt" in mattcoop, look in the Changes tab of mattcoop.)

 

 

Interested again, from wanting to do a Search for all cards that don't have a * in them, but that's not going to work given that we just strip *s out. Any ideas on how to accomplish an "all cards" list?

--John Abbe.....Thu Oct 16 13:03:13 -0700 2008


I fixed it so that name and content can be matched independently. If you do :match=>X, it will still match both. But now :content=>[:match,X] matches just content, and :name=>[:match,X] matches just name.

  --Ethan McCutchen.....Wed Mar 18 18:28:20 -0700 2009


Looks like it's picking up anything with matt anywhere in it (e.g. formatting):

((removed reference to sandwagn))

  --John Abbe.....Thu Mar 26 12:27:28 -0700 2009


after looking at this for a bit, I decided it might be best just to leave it that way, but to stop stripping out all the weird characters for name and content searches (which can't use fulltext), and leave a little bit of extra power in there for programmers. That way we can use all the pattern matching power of postgres if we want to:

http://www.postgresql.org/docs/7.4/interactive/functions-matching.html

so, for example this finds all cards with names having a word beginning in "matt":

{"name": ["match", "\\mMatt"]}

(note that you have to double the backslash)
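
To illustrate why the backslash is doubled, here is a small sketch using Ruby's standard JSON library. It only shows the JSON unescaping step; it assumes nothing about how Wagn itself parses search cards.

```ruby
require 'json'

# Raw JSON text as it would be typed; the quoted heredoc does no escaping.
raw = <<~'JSON'
  {"name": ["match", "\\mMatt"]}
JSON

query   = JSON.parse(raw)
pattern = query["name"][1]   # the string after JSON unescaping

puts pattern          # prints \mMatt
puts pattern.length   # prints 6: one backslash, then "mMatt"
```

After parsing, the pattern contains the single backslash that postgres expects in its \m (start of word) escape.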

  --Ethan McCutchen.....Thu Mar 26 13:53:04 -0700 2009


While messing with this i somehow ended up testing search by content. ((removed reference to sandwagn)) is:
{"content": ["match", "Matt"]}
and ((removed reference to sandwagn)) is:
{"match": "Matt"}

...but the former returns 121 results to the latter's 11. Looks like the shorthand is only finding whole words, while the longhand is finding matt even within words. We just decided to make {"content": ["match", "Matt"]} act like longhand here, but not strip out weird characters, so that you can do things like "\mWag" to find anything that starts with "Wag". I found this agreeable at first, now i'm wondering if it'd be better to have match only find whole words (a reasonable default for the less geeky imho), and introduce textmatch (tmatch?) to do the fancy stuff?

In any case, the two should be consistent. Let me know what to do re ticketing.

  --John Abbe.....Thu Apr 23 16:04:53 -0700 2009


The behavior you describe is what I intended.

"match" (alone) uses fulltext indexing, a completely separate system from what we're doing for the property-specific matching. That indexing only works with whole words (though it handles things like plurals and conjugations I think), which is the expected behavior for search bars, because Google has made that behavior idiomatic. The indexing makes it very fast and provides relevance sorting.

The property-specific stuff uses built-in sql pattern matching. The behavior that we see is the default behavior for pattern matching in that realm. It is relatively easy to start with the default behavior of matching anywhere in the word, and relatively hard to go in the other direction. We can't really duplicate the behavior of the indexed searches in the unindexed searches. Anyhow, we don't really want people building lots of these unindexed searches, because they're slower. Nor do we really have to worry about this yet, since there are so few WQL'ers, particularly nongeeky ones.
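
A rough sketch of the two kinds of SQL involved, to make the difference concrete. The table name `cards`, the column names, and the helper names are assumptions for illustration; this is not the SQL Wagn generates. It also assumes case-insensitive matching (`~*`) and that postgres has standard_conforming_strings on, so the backslash in the pattern is taken literally.

```ruby
# Hypothetical helpers; table and column names are assumptions.

# Quote a string as a SQL literal by doubling single quotes.
def quote(str)
  "'" + str.gsub("'", "''") + "'"
end

# Bare "match": fulltext search over name and content (whole words only).
def fulltext_sql(term)
  "SELECT name FROM cards " \
  "WHERE to_tsvector('english', name || ' ' || content) " \
  "@@ plainto_tsquery('english', #{quote(term)})"
end

# Property-specific "match": unindexed regex match, anywhere in the text.
def pattern_sql(column, pattern)
  raise ArgumentError, "unknown column" unless %w[name content].include?(column)
  "SELECT name FROM cards WHERE #{column} ~* #{quote(pattern)}"
end

puts fulltext_sql("Matt")
puts pattern_sql("name", "\\mMatt")
```

The first query can use a fulltext index; the second has to run the pattern against every row, which is why it is slower.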

Lew and I were talking today about moving away from postgres fulltext and into some more portable indexing libraries. Once we do that, we may be able to do indexed searches for all the options, and then have a separate operator (as you suggest) for more advanced searches.

For now, let's just add an idea for "more consistent matching behavior" and see what comes of the research into the other indexing mechanisms.

  --Ethan McCutchen.....Thu Apr 23 16:53:29 -0700 2009


Okay.

more consistent matching behavior

I'll fix the documentation.

  --John Abbe.....Fri Apr 24 16:26:31 -0700 2009


I can't tell if this got done (from +solution):

"it would also be great to have tests for most all of the operators. I have a hunch there are several others not performing properly that we haven't flushed out yet."

  --John Abbe.....Fri Apr 24 16:50:04 -0700 2009


hmm. No, I built a few new tests, but nothing comprehensive. Because of the cleanup, I'm more optimistic that most things are working. I think at this point I'm content to wait until there's evidence of a problem.

  --Ethan McCutchen.....Fri Apr 24 16:53:41 -0700 2009


okay - closing

  --John Abbe.....Fri Apr 24 16:55:32 -0700 2009