Creating better search suggestions with Sphider


Here's another simple hack for Sphider. If you use Sphider, you may have noticed that its "did you mean" suggestions are often inaccurate. Why is that? Here is what it does with the search terms:

  1. Use MySQL's SOUNDEX function to find close matches.
  2. Use PHP's levenshtein function and keep the first match with the smallest Levenshtein distance.
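
To see why keeping the first closest result can go wrong, here is a minimal standalone sketch of the two steps above. It is not Sphider's code: a plain array stands in for the rows returned by the SOUNDEX query, the function name firstClosestSuggestion and the sample words are made up for illustration, and the starting distance of 100 is an assumption.

```php
<?php
// Hypothetical stand-in for the rows the SOUNDEX query would return.
// Both words share the PHP soundex() key of the misspelling "recieve"
// (PHP's soundex() is only an approximation of MySQL's SOUNDEX here).
$candidates = array('recipe', 'receive');

// Sketch of the original selection rule: a candidate replaces the current
// suggestion only if its Levenshtein distance is strictly smaller than the
// best so far and under 4, so the first of several tied candidates is kept.
function firstClosestSuggestion($word, $candidates, $maxDistance = 100)
{
    $nearWord = '';
    foreach ($candidates as $candidate) {
        $distance = levenshtein($candidate, $word);
        if ($distance < $maxDistance && $distance < 4) {
            $maxDistance = $distance;
            $nearWord = $candidate;
        }
    }
    return $nearWord;
}

// Both candidates are two edits away from "recieve", so the first one wins.
echo firstClosestSuggestion('recieve', $candidates), "\n"; // prints "recipe"
```

Swapping the order of the two candidates changes the suggestion to "receive", which shows that the result depends on row order rather than on which word is the better match.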

This works, but it often gives poor results, mainly because it keeps the first closest result even if better matches appear later in the result array. So I have added two further levels of matching: after computing the Levenshtein distance, we compare the metaphone keys, and finally we use PHP's similar_text function to check whether the candidate is a better match than the current one. This is what we end up with:

  1. Use MySQL's SOUNDEX function to find close matches.
  2. Use PHP's levenshtein function to find the closest Levenshtein distance.
  3. Use PHP's metaphone function to compare keys; if they match, go on to step 4.
  4. Use PHP's similar_text function to see if this result is better than the last one.
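
The three PHP functions in steps 2 to 4 measure different things, and a short sketch can make that concrete. The helper name scoreCandidate and the sample words are illustrative assumptions, not part of Sphider.

```php
<?php
// Hypothetical helper: collect the three measures used in steps 2 to 4
// for one candidate keyword against the search term.
function scoreCandidate($word, $candidate)
{
    return array(
        // Number of single-character edits; lower means closer.
        'distance'  => levenshtein($candidate, $word),
        // True when both words produce the same metaphone key (they sound alike).
        'metaphone' => metaphone($candidate) == metaphone($word),
        // Number of matching characters; higher means closer.
        'similar'   => similar_text($candidate, $word),
    );
}

// Print the three measures for two sample candidates.
foreach (array('recipe', 'receive') as $candidate) {
    $score = scoreCandidate('recieve', $candidate);
    printf(
        "%-8s distance=%d metaphone=%s similar=%d\n",
        $candidate,
        $score['distance'],
        $score['metaphone'] ? 'match' : 'differ',
        $score['similar']
    );
}
```

In this example both candidates are the same Levenshtein distance from the misspelling, so only the metaphone comparison tells them apart.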

How do you implement it? Open the include/searchfuncs.php file and, around line 333, find the following:

$near_word ="";
while ($row=mysql_fetch_row($result)) {
    
    $distance = levenshtein($row[0], $word);
    if ($distance < $max_distance && $distance <4) {
        $max_distance = $distance;
        $near_word = $row[0];
    }
}

Now REPLACE that with the following code (indent as appropriate):

// BEGIN BETTER SEARCH SUGGESTION FIX
$near_word = "";
$max_similar = 0;
while ($row = mysql_fetch_row($result)) {
    $distance = levenshtein($row[0], $word);
    // Only consider candidates at least as close as the best so far, and within 3 edits
    if ($distance <= $max_distance && $distance < 4) {
        // The candidate must also sound like the search term
        if (metaphone($row[0]) == metaphone($word)) {
            // Keep it if it shares at least as many characters as the current best
            $similar = similar_text($row[0], $word);
            if ($similar >= $max_similar) {
                $max_distance = $distance;
                $max_similar = $similar;
                $near_word = $row[0];
            }
        }
    }
}
// END BETTER SEARCH SUGGESTION FIX
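
If you would like to try this matching logic without a database, the sketch below wraps the same checks in a function that takes a plain array instead of a MySQL result. The function name betterSuggestion, the sample words, and the starting distance of 100 are assumptions for illustration; none of them come from Sphider itself.

```php
<?php
// Standalone sketch of the replacement loop: a candidate is accepted only if
// it is at least as close as the best so far (and within 3 edits), has the
// same metaphone key as the search term, and shares at least as many
// characters with it as the current best.
function betterSuggestion($word, $candidates, $maxDistance = 100)
{
    $nearWord = '';
    $maxSimilar = 0;
    foreach ($candidates as $candidate) {
        $distance = levenshtein($candidate, $word);
        if ($distance <= $maxDistance && $distance < 4) {
            if (metaphone($candidate) == metaphone($word)) {
                $similar = similar_text($candidate, $word);
                if ($similar >= $maxSimilar) {
                    $maxDistance = $distance;
                    $maxSimilar = $similar;
                    $nearWord = $candidate;
                }
            }
        }
    }
    return $nearWord;
}

// "recipe" comes first and is equally close, but it does not sound like
// "recieve", so it is skipped.
echo betterSuggestion('recieve', array('recipe', 'receive')), "\n"; // prints "receive"
```

Note that with these checks a word whose metaphone key differs from the search term is never suggested, so some searches may produce no suggestion at all.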

Save and upload the file, and your "did you mean" search suggestions should now be noticeably more accurate.



2 comments


Willy said:
Jun 9th, 2010 at 12:24 pm

Thanks again Matt,

It's true, the "Did you mean" suggestions are a weak point in the original.

This is another nice improvement for Sphider.

Btw. I just read your follow-up answers in the other article concerning the Hotmail problems. As I didn't receive notifications, I hadn't read them before. :-) I don't have spam problems on my Hotmail account yet, and it's the account I use for forum signups and such, so I can't really be bothered with Uncle Bill's policy. But it sounds really strange; Microsoft seems to be working hard to remain funny in certain aspects...

Greetings!


Dmitrii Lavrinyuk said:
Jun 9th, 2010 at 1:13 pm

Hi, thanks for the article. I've already applied it and set it up, though so far I haven't seen any changes =) If you have more mods, please write more articles; I'll be glad to read them and improve the script.


