Creating better search suggestions with Sphider
filed under: Web Development / PHP Programming
Here's another simple hack for Sphider. If you use Sphider, you may have noticed that the "did you mean" function often isn't very accurate. Why is that? Simple, here is what it does with the search terms:
- Use MySQL's SOUNDEX function to find the close matches
- Use PHP's levenshtein function to find the first closest levenshtein distance.
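To see why "first closest" can be a problem, here is a minimal sketch of those two steps. It is not Sphider's code: a plain array stands in for the keyword table, PHP's soundex() stands in for MySQL's SOUNDEX (similar, but not identical), and the function name first_closest, the keyword list and the starting $max_distance of 100 are all assumptions made for illustration.

```php
<?php
// Sketch only: a plain array stands in for the keyword table, and PHP's
// soundex() stands in for MySQL's SOUNDEX (the two are similar, not identical).
function first_closest($word, array $keywords, $max_distance = 100)
{
    $near_word = '';
    foreach ($keywords as $keyword) {
        // Step 1: keep only keywords that sound like the search term.
        if (soundex($keyword) !== soundex($word)) {
            continue;
        }
        // Step 2: keep the first keyword with the smallest distance.
        // The strict "<" means a later keyword with the same distance never wins.
        $distance = levenshtein($keyword, $word);
        if ($distance < $max_distance && $distance < 4) {
            $max_distance = $distance;
            $near_word = $keyword;
        }
    }
    return $near_word;
}

// Hypothetical keyword list; all three "th" words are 2 edits away from "thier".
$keywords = array('search', 'there', 'their', 'three');
echo first_closest('thier', $keywords), "\n"; // prints "there"
```

All three candidates tie on distance, so whichever comes first in the result set is suggested, even though "their" is arguably what the user meant.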
Well, this is OK, but it often gives poor results, mainly because it keeps the first closest result even when a better match appears later in the result array. So I have added two more levels of matching: after getting the levenshtein distance, we try to match the metaphone keys, and finally compare the keywords with PHP's similar_text function to see whether this result is a better match than the current one. This is what we end up with:
- Use MySQL's SOUNDEX function to find the close matches
- Use PHP's levenshtein function to find the closest levenshtein distance.
- Use PHP's metaphone function to compare keys; if they match, we move on to the last step.
- Use PHP's similar_text function to see if this result is better than the last one.
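If you want to see what the last three functions report for a given pair of words, a small sketch like the one below may help. The helper name compare_words and the sample words are made up for illustration; only levenshtein, metaphone and similar_text are real PHP functions.

```php
<?php
// Sketch only: compare one candidate against the search term using the
// three PHP functions named in the steps above.
function compare_words($candidate, $word)
{
    return array(
        // Number of single-character edits between the two strings.
        'distance'        => levenshtein($candidate, $word),
        // True when both words reduce to the same phonetic key.
        'metaphone_match' => metaphone($candidate) === metaphone($word),
        // Number of matching characters; higher means more alike.
        'similar'         => similar_text($candidate, $word),
    );
}

// Hypothetical candidates for the misspelling "thier".
foreach (array('there', 'their', 'three') as $candidate) {
    $result = compare_words($candidate, 'thier');
    printf(
        "%-6s distance=%d metaphone_match=%s similar=%d\n",
        $candidate,
        $result['distance'],
        $result['metaphone_match'] ? 'yes' : 'no',
        $result['similar']
    );
}
```

Running it against your own keywords is a quick way to check whether the extra steps would actually separate candidates that tie on distance.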
OK, how to implement it? Easy: open the include/searchfuncs.php file and, around line 333, find the following:
$near_word ="";
while ($row=mysql_fetch_row($result)) {
    $distance = levenshtein($row[0], $word);
    if ($distance < $max_distance && $distance <4) {
        $max_distance = $distance;
        $near_word = $row[0];
    }
}
Now REPLACE that with the following code (indent as appropriate):
// BEGIN BETTER SEARCH SUGGESTION FIX
$near_word = "";
$max_similar = 0;
while ($row = mysql_fetch_row($result)) {
    $distance = levenshtein($row[0], $word);
    if ($distance <= $max_distance && $distance < 4) {
        if ($distance == $max_distance) {
            // Same distance as the current best: only switch if the word
            // sounds the same and is at least as similar.
            if (metaphone($row[0]) == metaphone($word)) {
                $similar = similar_text($row[0], $word);
                if ($similar >= $max_similar) {
                    $max_similar = $similar;
                    $near_word = $row[0];
                }
            }
        } else {
            // Strictly closer than anything seen so far: take it.
            $max_distance = $distance;
            $max_similar = similar_text($row[0], $word);
            $near_word = $row[0];
        }
    }
}
// END BETTER SEARCH SUGGESTION FIX
Save and upload, and your "did you mean" search suggestions should now be noticeably more accurate.
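If you would like to try the matching logic without touching a live Sphider install, here is a rough standalone sketch that works on a plain array instead of a MySQL result. It is one reading of the steps described above, not Sphider's own code: the function name pick_suggestion, the sample keywords and the starting $max_distance of 100 are assumptions.

```php
<?php
// Sketch only: the matching loop as a standalone function working on an array.
// $max_distance = 100 is an assumed starting value, not taken from Sphider.
function pick_suggestion($word, array $keywords, $max_distance = 100)
{
    $near_word = '';
    $max_similar = 0;
    foreach ($keywords as $keyword) {
        $distance = levenshtein($keyword, $word);
        if ($distance >= 4 || $distance > $max_distance) {
            continue; // too far away, or worse than the current best
        }
        $similar = similar_text($keyword, $word);
        if ($distance < $max_distance) {
            // Strictly closer: always take it.
            $max_distance = $distance;
            $max_similar = $similar;
            $near_word = $keyword;
        } elseif (metaphone($keyword) === metaphone($word) && $similar >= $max_similar) {
            // Same distance: take it only if it sounds the same
            // and is at least as similar as the current best.
            $max_similar = $similar;
            $near_word = $keyword;
        }
    }
    return $near_word;
}

// Hypothetical keywords: "searching" is 4 edits away (rejected),
// "search" is 1 edit away, "starch" is 2 edits away.
echo pick_suggestion('serch', array('searching', 'search', 'starch')), "\n"; // prints "search"
```

An empty string means no keyword was close enough, which is the same "no suggestion" signal the original loop gives through $near_word.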
2 comments
Jun 9th, 2010 at 12:24 pm
Thanks again Matt,
It's true, the "Did you mean" suggestions are a weak point in the original.
This is another nice improvement for Sphider.
Btw. I just read your follow up answers in the other article concerning the hotmail problems. As I didn't receive notifications I hadn't read them before. :-) I don't have spam problems on my hotmail account yet, and it's the account I use for forum signups and such so I can't really be bothered with Uncle Bill's policy. But it sounds really strange, Microsoft seems to be working hard to remain funny in certain aspects...
Greetings!
Jun 9th, 2010 at 1:13 pm
Hi, thanks for the article. Already applied in their true until that changes are not seen =) set them up. If you have more fashion, write articles, I will be glad to read and improve the script.