Related Articles mod for sNews CMS, public beta release
filed under: sNews CMS / Hacks & Mods
FYI, this mod is considered deprecated and not-so-efficient. Please use the new and improved light-weight related articles mod.
Here is the beta release of my related articles/posts/entries mod for sNews CMS. It's quite straightforward: all the information you need is contained in a single file, related.php. I will also give a brief install walkthrough here, so let's get started...
First download the mod zip file here;
Extract the zip file, upload related.php to the same folder as your snews.php file and put related.png into your images folder. Alternatively, create a file called related.php, copy the following code into it, and upload it to the same folder as your snews.php file.
<?php
/*****************************************************************
Addon Name: sNews Related Articles
Description: Returns a list of related entries based on a boolean match using titles, post bodies and requiring a threshold to be met
Version: 0.9.2b
Author: Matt Jones
Author URI: http://www.mdj.us/
License: Creative Commons Attribution-Share Alike 3.0 United States License
License URI: http://creativecommons.org/licenses/by-sa/3.0/us/
*****************************************************************/
/*************************** HOW-TO USE **************************
1) Set up your full text index on your database;
ALTER TABLE articles ADD FULLTEXT(text, title);
2) Back up your snews.php file, then add the include; find the function articles, look around line 760 (on a default snews.php) for the following code;
} else if (empty($currentPage)) {
if ($infoline == true) {
$tag = explode(',', tags('infoline'));
foreach ($tag as $tag ) {
switch ($tag) {
case 'date':
echo $a_date_format;
break;
case 'readmore':
case 'comments': ;
break;
case 'edit':
if (_ADMIN) {
echo ' '.$edit_link;
}
break;
default:
echo $tag;
}
}
} else if (_ADMIN) {
echo '<p>'.$edit_link.'</p>';
}
}
then add below;
if ($infoline == true && $_catID != 0) { // displays it only on regular articles (not pages) and only when the article is viewed on its own page
include('related.php');
}
3) Add some .related_posts styles for the div & ul to your stylesheet
4) View the config options below and change if you like
5) Upload this file and your modified snews.php and it should work
*****************************************************************/
/**************************** CONFIG ****************************/
$debug = "FALSE"; // toggle to TRUE to print debugging information via comments in the HTML source
$lang_title = "Related Articles:"; // the title you would like printed out
$body_weight = "1"; // weight put on words in the body, options are 1,2,3 i.e. 1 is normal and 3 is highest, at a rank of 1 each word match will score a 1
$title_weight = "3"; // weight put on words from the title, options are 1,2,3 i.e. 1 is normal and 3 is highest, at a rank of 3 each word match will score a 3
$threshold = "AUTO"; // this can be any positive numerical value, the auto setting will attempt to calculate a sensible threshold based on the number of words supplied and their weighting
$threshold_lvl = "17"; // threshold adjustment, must be a 2 digit number from 01-99, the higher the stricter... I've found 15-20 works pretty well
$display_score = "FALSE"; // shows the match score in parentheses beside the matches
$wordlimit = "250"; // a huge article with a gazillion words will meet any threshold and slow things down, so let's keep it sane here, the average page contains roughly 250 words so let's grab the first page or two worth of words
$post_limit = "5"; // Max number of related articles to show
/**************************** TO-DO LIST ****************************
Add the option to require one or more of the meta keywords to match
Add the option to require article be in the same category to match
********************************************************************/
/************* CORE LOGIC - DO NOT EDIT BELOW THIS LINE ************/
if ($debug == "TRUE") { // DEBUG is on so let's start the timer
$time_start = microtime(true);
}
$text = scrub_text($text); // first we clean up the body text words
$word_count = count_words($text); // get the total numbers of words in the body
if ($word_count >= $wordlimit) { // too many words in the article, let's cut it down
$text = limit_words($text,$wordlimit);
$word_count = $wordlimit;
}
$text = rank_text($text,$body_weight); // rank the body text words
$title = scrub_text($title); // now we clean up the title words
$title = limit_words($title,"30"); // sanity check the title doesn't have too many words
$title_word_count = count_words($title); //count the total words
$title = rank_text($title,$title_weight); // rank the title words
if ($threshold == "AUTO") { // user wants the threshold auto generated so let's do that now
$threshold = auto_calc_threshold($word_count,$body_weight,$title_word_count,$title_weight,$threshold_lvl);
}
// build the MySQL full-text search query
$now = date("Y-m-d H:i:s",time()); // we don't want to show future articles
$query = "SELECT a.id,a.title,a.seftitle AS asef,a.text,a.category,a.published,a.visible,a.keywords_meta,c.id,c.name,c.seftitle AS csef,c.published,c.subcat AS subcat,
MATCH(a.title, a.text) AGAINST ('".$title." ".$text."' IN BOOLEAN MODE) AS score
FROM "._PRE."articles a, "._PRE."categories c WHERE MATCH(a.title, a.text) AGAINST ('".$title." ".$text."' IN BOOLEAN MODE)
AND a.id != '".$_ID."' AND date <= '".$now."' AND a.published='1' AND a.visible='YES' AND c.published='YES' AND a.category=c.id GROUP BY a.id HAVING score > '".$threshold."' ORDER BY score DESC LIMIT ".$post_limit."";
if ($debug == "TRUE") { // DEBUG is on so let's output the query
echo "\n\n<!-- DEBUG: SQL query: ".$query." -->\n\n";
}
$result_set1 = mysql_query($query);
$count = mysql_num_rows($result_set1);
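if ($debug == "TRUE") { // DEBUG is on so let's stop the timer ($time_start only exists in debug mode)
$time_end = microtime(true);
$time = $time_end - $time_start;
}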
if($count > 0) { // we have related posts
if ($debug == "TRUE") { // DEBUG is on so let's output the number of results
echo "\n\n<!-- DEBUG: ".$count." results found in ".round($time,4)." seconds. -->\n\n";
}
$list = ''; // start with an empty list so the .= below doesn't raise a notice
while($result_set2 = mysql_fetch_array($result_set1)) {
unset($subcat);
if ($result_set2['subcat'] != "0" ) { // this article is in a subcategory so we need to get the seftitle
$query = "SELECT id,seftitle FROM "._PRE."categories WHERE id='".$result_set2['subcat']."'";
$subcsef = mysql_query($query);
$subcseft = mysql_fetch_array($subcsef);
$subcat = $subcseft['seftitle']."/";
}
$score = round($result_set2['score'], 0);
$list .= '<li>';
if ($debug == "TRUE") { // DEBUG is on so let's output the score of the result in the list
$list .= "<!-- DEBUG: result score ".$score." -->";
}
$list .= '<a href="'._SITE.$subcat.$result_set2['csef'].'/'.$result_set2['asef'].'">'.$result_set2['title'].'</a>';
if ($display_score == "TRUE") { // display_score on so let's output the score
$list .= " (".$score.")";
}
$list .='</li>';
}
//let's wrap the list items in a div and ul
echo '<br /><div class="related_posts">'.$lang_title.'<ul class="related_posts">'.$list.'</ul></div>';
} else {
if ($debug == "TRUE") { // DEBUG is on so we need to know there were no results found
echo "\n\n<!-- DEBUG: No results found in ".round($time,4)." seconds.-->\n\n";
}
}
if ($debug == "TRUE") { // DEBUG is on so lets see the threshold that we searched on
echo "\n\n<!-- THRESHOLD: ".$threshold." -->\n";
echo "\n<!-- DEBUG: The title words we searched for were: ".$title." -->\n";
echo "\n<!-- DEBUG: The article words we searched for were: ".$text." -->\n\n";
}
/*************************** FUNCTIONS **************************/
// auto calculate the threshold
function auto_calc_threshold($word_count,$body_weight,$title_word_count,$title_weight,$threshold_lvl="15"){
//without a stoplist matching the one mysql full text uses, this won't be perfect.
$max_body_score = ($word_count * $body_weight);
$max_title_score = ($title_word_count * $title_weight);
$threshold_lvl = ".".$threshold_lvl;
$retval = (($max_body_score + $max_title_score)*($threshold_lvl));
return $retval;
}
// prepare the text, remove all the crap and any word less than 4 chars
function scrub_text($string) {
$retval = strip_tags(stripslashes($string)); // takes the text and removes any code
$retval = preg_replace("/[^a-zA-Z0-9-\s]/i", '', $retval); // strip everything except letters, digits, hyphens and whitespace
$retval = preg_replace("/\s\s+/", ' ', $retval); // collapse excessive whitespace
$retval = preg_replace("/^([^\W]{1,3}[\W])|($[^\W]{1,3}^)|($[^\W]{1,3}\W+)|(\W[^\W]{1,3}(?=\W|$))/", '', $retval); // regex should strip words less than 4 characters
$retvalex = explode(" ",$retval); // create an array from the string
$retvalarr = array_unique($retvalex); // now strip duplicate words
$retval = implode(" ",$retvalarr);
return $retval;
}
// set the rankings for the IN BOOLEAN MODE full-text search
function rank_text($string, $rank="0") {
$retval = $string;
if ($rank == "2") {
$retval = preg_replace("/(^|\s)(\w)/", "\\1>>\\2", $retval); //increase the rank value using the > operator (plain replacement, the /e modifier is not needed)
} elseif ($rank == "3") {
$retval = preg_replace("/(^|\s)(\w)/", "\\1>>>\\2", $retval); //increase the rank value using the > operator (plain replacement, the /e modifier is not needed)
}
return $retval;
}
// counts the total words
function count_words($string) {
$array = explode(" ", $string);
$retval = count($array);
return $retval;
}
// this function cuts down the max number of words for the article body
function limit_words($string, $wordsmax="250") {
$array = explode(" ", $string);
array_splice($array, $wordsmax);
$retval = implode(" ", $array);
return $retval;
}
?>
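To get a feel for what the ranking step feeds into the full-text search, here is a small stand-alone sketch. It is not part of related.php: demo_rank_text() is a hypothetical name, the sample title is made up, and it assumes the text has already been scrubbed. It uses the same prefixes as the file above (nothing for weight 1, ">>" for 2, ">>>" for 3).

```php
<?php
// Hypothetical stand-alone copy of the ranking step, for illustration only.
function demo_rank_text($string, $rank = "1") {
    $prefixes = array("2" => ">>", "3" => ">>>"); // same prefixes related.php uses
    if (!isset($prefixes[$rank])) {
        return $string; // weight 1 (or anything else) leaves the words untouched
    }
    // put the prefix in front of the first character of every word
    return preg_replace('/(^|\s)(\w)/', '$1' . $prefixes[$rank] . '$2', $string);
}

$title = "related articles snews"; // made-up, already scrubbed title
echo demo_rank_text($title, "3"), "\n"; // >>>related >>>articles >>>snews
echo demo_rank_text($title, "1"), "\n"; // related articles snews
```

The weighted title and body strings are what end up inside AGAINST ('...' IN BOOLEAN MODE) in the query.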
Now back up your snews.php and work on a copy. Within the function articles, look for the following code;
} else if (empty($currentPage)) {
if ($infoline == true) {
$tag = explode(',', tags('infoline'));
foreach ($tag as $tag ) {
switch ($tag) {
case 'date':
echo $a_date_format;
break;
case 'readmore':
case 'comments': ;
break;
case 'edit':
if (_ADMIN) {
echo ' '.$edit_link;
}
break;
default:
echo $tag;
}
}
} else if (_ADMIN) {
echo '<p>'.$edit_link.'</p>';
}
}
and add the following code BELOW it
if ($infoline == true && $_catID != 0) { // displays it only on regular articles (not pages) and only when the article is viewed on its own page
include('related.php');
}
Now set up the full-text index on your MySQL articles table (if your install uses a table prefix, put it in front of the table name)
ALTER TABLE articles ADD FULLTEXT(text, title);
Now add some .related_posts styles to your stylesheet, something like
div.related_posts {
margin: 5px 0;
}
ul.related_posts {
margin: 0;
padding: 0;
list-style: none;
}
ul.related_posts li{
padding-left:16px;
margin-bottom:.2em;
background-image:url('../images/related.png');
background-repeat:no-repeat;
background-position:0 .2em;
}
The related.png icon I used is from pinvoke. That should be it: upload your modified stylesheet and then your modified snews.php and you should be in business. You can tinker with the config options in related.php if you like; if you want more matches, lower the threshold, or raise it to tighten the match requirements. This is the first release and is considered beta right now, so please let me know about any problems you have or changes you'd like to see. I need your feedback to improve it.
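To see how the AUTO threshold reacts to $threshold_lvl, here is the same arithmetic as auto_calc_threshold() worked through on made-up numbers. This is only a sketch: demo_threshold() is a hypothetical name, the word counts are invented, and it divides the level by 100 instead of prefixing it with a ".", which gives the same result for two-digit levels.

```php
<?php
// Same arithmetic as auto_calc_threshold() in related.php, written with a
// numeric percentage. Hypothetical helper, for illustration only.
function demo_threshold($word_count, $body_weight, $title_word_count, $title_weight, $threshold_lvl) {
    $max_score = ($word_count * $body_weight) + ($title_word_count * $title_weight);
    return $max_score * ($threshold_lvl / 100);
}

// 250 body words at weight 1 and 5 title words at weight 3 (made-up counts)
echo round(demo_threshold(250, 1, 5, 3, 17), 2), "\n"; // 45.05
echo round(demo_threshold(250, 1, 5, 3, 10), 2), "\n"; // 26.5
```

In this example, dropping the level from 17 to 10 lowers the score an article has to beat from about 45 to 26.5, so more articles qualify as related.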
18 comments
Commenting is closed at this time.
Jan 12th, 2009 at 5:56 pm
Great mod man..thanks...
Jan 20th, 2009 at 1:58 am
Great news, thanks :-)
Jan 23rd, 2009 at 2:51 pm
Thats great mod :)
Big TXH!
Feb 17th, 2009 at 11:40 am
This the ultimate Mod for related content.
Bravissimo Matt!
Mar 26th, 2009 at 8:41 am
Hello Matt.
I'm seeing a bug ;-)
http://carnet.hiseo.fr/sexy-actrices/lecon-57-france-nuyen/
When the related article is in a subcategory, the generated link does not include the parent category:
- http://myblog//jambes/lili-damita
- http://myblog/sexy-actrices/jambes/lili-damita/
Mar 26th, 2009 at 9:18 am
HMMM... this is interesting Philippe, if you look at it on this article, you'll see that 4 of my related links are in subcategories.
You didn't change the routines inside the related.php file or modify the categories table in the database, right?
Have a look and make sure the subcat part of related.php is correct;
while($result_set2 = mysql_fetch_array($result_set1)) {
unset($subcat);
if ($result_set2['subcat'] != "0" ) { // this article is in a subcategory so we need to get the seftitle
$query = "SELECT id,seftitle FROM categories WHERE id='".$result_set2[subcat]."'";
$subcsef = mysql_query($query);
$subcseft = mysql_fetch_array($subcsef);
$subcat = $subcseft['seftitle']."/";
}
You're getting the slash "/", so it looks like it's not getting the seftitle for some reason.
You are using sNews 1.7, right?
Mar 26th, 2009 at 10:08 am
Hi Matt,
yeap: I'm using for this blog sNews 1.7...
I didn't change the script, and didn't modify the table since I've installed the Mod.
I use a subcategory since a few days and just noticed that issue.
Since I use it "as is" and it works on your blog, there must be something missing in my core script. But what?
Mar 26th, 2009 at 11:15 am
Yeah, something strange is happening, it doesn't appear that the subcats are appearing in your breadcrumb menu either.
http://carnet.hiseo.fr/sexy-actrices/jambes/lili-damita/
it leaves "jambes" out of the breadcrumb menu.
but that may be completely un-related for all I know, as you're using a custom breadcrumb.
This is a tricky one, I'm going to try and play around with some setting on my site and see what happens...
Mar 26th, 2009 at 12:29 pm
Yeap my breadcrumb is crap: I'm gonna change it tomorrow.
I've been searching where I may have screwed up something but didn't find anything weird.
But as a PHP dumber I don't know where to look. :-D
Thanks for the help.
Mar 27th, 2009 at 7:48 am
AHA!
I think I got it! I goofed up and didn't include the DB prefix on the subcategory selection! You use a table prefix and I don't, that's why it would work on mine and not on yours, here's the fix.
In related.php, find
$query = "SELECT id,seftitle FROM categories WHERE id='".$result_set2[subcat]."'";
and change it to
$query = "SELECT id,seftitle FROM "._PRE."categories WHERE id='".$result_set2[subcat]."'";
I have updated the tutorial and zip file with this change as well.
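The prefix issue is easy to see in isolation. The sketch below is not from related.php: the prefix value sn_ and the helper name demo_subcat_query() are made up for illustration, and on a real install _PRE is already defined by sNews.

```php
<?php
// Made-up prefix for illustration; on a real install sNews defines _PRE itself.
if (!defined('_PRE')) {
    define('_PRE', 'sn_');
}

// Hypothetical helper: builds the subcategory lookup with the prefix applied.
function demo_subcat_query($subcat_id) {
    return "SELECT id,seftitle FROM " . _PRE . "categories WHERE id='" . (int) $subcat_id . "'";
}

echo demo_subcat_query(4), "\n";
// SELECT id,seftitle FROM sn_categories WHERE id='4'
```

Without the prefix the query looks for a table called categories, which does not exist on a prefixed install, so the seftitle comes back empty and only the "/" ends up in the link.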
Mar 27th, 2009 at 8:08 am
You squashed the bug!
Good shot Matt! ;-)
Aug 3rd, 2009 at 8:13 am
I've got a style problem with the clean theme in Firefox (not in IE8). The page is printing the list with squares as well as the icon.
Aug 3rd, 2009 at 8:24 am
Michael,
Try adding;
to
Aug 4th, 2009 at 3:52 am
Thanks Matt, that worked.
Nov 7th, 2009 at 7:59 pm
Hi.
Matt, can You help me with this problem?:
http://snewscms.com/forum/index.php?topic=8761.0
Nov 7th, 2009 at 8:23 pm
Hi toolman,
It's a pretty beefy query, you're using matching to get an accurate list of possibly related articles. Using mysql full-text, it shouldn't be causing that much strain, perhaps too much for your host to accept? Though if you're on shared hosting with 4,000 uniques a day, I can understand where the host is coming from.
You can set the wordlimit variable to something smaller, like 50 or 100. If that doesn't make your host happy, you'll probably want to match only on titles, which isn't a very effective means if you ask me.
In the meantime, try setting debug to true and then look at the source code to see how long the queries are taking.
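As a rough illustration of why a smaller wordlimit eases the load, here is a stand-alone sketch of the word-limiting step. demo_limit_words() is a hypothetical name and the sample text is made up; the point is just that fewer words go into the MATCH ... AGAINST string.

```php
<?php
// Stand-alone sketch of the word limiting step, under a hypothetical name.
function demo_limit_words($string, $wordsmax = 250) {
    $words = explode(" ", $string);
    // keep only the first $wordsmax words
    return implode(" ", array_slice($words, 0, (int) $wordsmax));
}

$body = "full text search boolean mode threshold weight score"; // made-up scrubbed text
echo demo_limit_words($body, 3), "\n"; // full text search
echo demo_limit_words($body, 100), "\n"; // shorter than the limit, so unchanged
```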
Nov 8th, 2009 at 6:31 am
Now I have wordlimit=100. I will try with wordlimit=15.
Thx.
Nov 8th, 2009 at 7:52 am
OK, sounds good, when I get some free time, I will look at making the changes needed to have the option to match on only the titles.