Related Articles mod for sNews CMS, public beta release

Note: this mod is deprecated and not very efficient. Please use the new and improved lightweight related articles mod instead.

Here is the beta release of my related articles/posts/entries mod for sNews CMS. It's quite straightforward: everything you need is contained in a single file, related.php. I'll also give a brief install walkthrough here, so let's get started...


First, download the mod zip file here;

Extract the zip file, upload related.php to the same folder as your snews.php file, and put related.png in your images folder. Alternatively, create a file called related.php, copy the following code into it, and upload it to the same folder as your snews.php file.

<?php 
/*****************************************************************
Addon Name: sNews Related Articles
Description: Returns a list of  related entries based on a boolean match using titles, post bodies and requiring a threshold to be met
Version: 0.9.2b
Author: Matt Jones
Author URI: http://www.mdj.us/
License: Creative Commons Attribution-Share Alike 3.0 United States License
License URI: http://creativecommons.org/licenses/by-sa/3.0/us/
*****************************************************************/

/*************************** HOW-TO USE **************************
1) Set up your full text index on your database;
    ALTER TABLE articles ADD FULLTEXT(text, title); 

2) Back up your snews.php file, then add the include; find the function articles, look around line 760 (on a default snews.php) for the following code;
            } else if (empty($currentPage)) {
                if ($infoline == true) {
                    $tag = explode(',', tags('infoline'));
                    foreach ($tag as $tag ) {
                        switch ($tag) {
                            case 'date':
                                echo $a_date_format;
                                break;
                            case 'readmore':
                            case 'comments': ;
                                break;
                            case 'edit':
                                if (_ADMIN) {
                                    echo ' '.$edit_link;
                                }
                                break;
                            default:
                                echo $tag;
                        }
                    }
                } else if (_ADMIN) {
                    echo '<p>'.$edit_link.'</p>';
                }
            }

then add below;

            if ($infoline == true && $_catID != 0) { // displays it only on regular articles (not pages) and only when the article is viewed on its own page
                include('related.php'); 
            }

3) Add some .related_posts styles for the div & ul to your stylesheet

4) View the config options below and change if you like

5) Upload this file and your modified snews.php and it should work
*****************************************************************/

/**************************** CONFIG ****************************/
$debug          = "FALSE"; // toggle to TRUE to print debugging information via comments in the HTML source
$lang_title     = "Related Articles:"; // the title you would like printed out
$body_weight    = "1"; // weight put on words in the body, options are 1,2,3 i.e. 1 is normal and 3 is highest, at a rank of 1 each word match will score a 1
$title_weight   = "3"; // weight put on words from the title, options are 1,2,3 i.e. 1 is normal and 3 is highest, at a rank of 3 each word match will score a 3
$threshold      = "AUTO"; // this can be any positive numerical value, the auto setting will attempt to calculate a sensible threshold based on the number of words supplied and their weighting
$threshold_lvl  = "17"; // threshold adjustment, must be a 2 digit number from 01-99, the higher the stricter... I've found 15-20 works pretty well
$display_score  = "FALSE"; // shows the match score in parentheses beside the matches
$wordlimit      = "250"; // a huge article with a gazillion words will meet any threshold and slow things down, so let's keep it sane here, the average page contains roughly 250 words so let's grab the first page or two worth of words
$post_limit     = "5"; // Max number of related articles to show


/**************************** TO-DO LIST ****************************
Add the option to require one or more of the meta keywords to match
Add the option to require article be in the same category to match
********************************************************************/




/************* CORE LOGIC - DO NOT EDIT BELOW THIS LINE ************/
if ($debug == "TRUE") { // DEBUG is on so let's start the timer
    $time_start = microtime(true);
}
$text = scrub_text($text); // first we clean up the body text words
$word_count = count_words($text); // get the total numbers of words in the body
if ($word_count >= $wordlimit) { // too many words in the article, let's cut it down 
    $text = limit_words($text,$wordlimit);
    $word_count = $wordlimit;
}
$text = rank_text($text,$body_weight); // rank the body text words


$title = scrub_text($title); // first we clean up the title words
$title = limit_words($title,"30"); // sanity check the title doesn't have too many words
$title_word_count = count_words($title); //count the total words
$title = rank_text($title,$title_weight); // rank the title words

if ($threshold == "AUTO") { // user wants the threshold auto generated so let's do that now
    $threshold = auto_calc_threshold($word_count,$body_weight,$title_word_count,$title_weight,$threshold_lvl);
}

// build the MySQL full-text search  query
$now = date("Y-m-d H:i:s",time()); // we don't want to show future articles

$query = "SELECT a.id,a.title,a.seftitle AS asef,a.text,a.category,a.published,a.visible,a.keywords_meta,c.id,c.name,c.seftitle AS csef,c.published,c.subcat AS subcat, 
 MATCH(a.title, a.text) AGAINST ('".$title." ".$text."' IN BOOLEAN MODE) AS score
 FROM "._PRE."articles a, "._PRE."categories c WHERE MATCH(a.title, a.text) AGAINST ('".$title." ".$text."' IN BOOLEAN MODE)
 AND a.id != '".$_ID."' AND date <= '".$now."' AND a.published='1' AND a.visible='YES' AND c.published='YES' AND a.category=c.id GROUP BY a.id HAVING score > '".$threshold."' ORDER BY score DESC LIMIT ".$post_limit."";
if ($debug == "TRUE") { // DEBUG is on so let's output the query
    echo "\n\n<!-- DEBUG: SQL query: ".$query."  -->\n\n";
}
$result_set1 = mysql_query($query);
$count = mysql_num_rows($result_set1);


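if ($debug == "TRUE") { // DEBUG is on so let's stop the timer ($time_start only exists in debug mode)
    $time_end = microtime(true);
    $time = $time_end - $time_start;
}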


if($count > 0) { // we have related posts
    $list = ''; // start with an empty list so the appends below don't hit an undefined variable
    if ($debug == "TRUE") { // DEBUG is on so let's output the number of results
        echo "\n\n<!-- DEBUG: ".$count." results found in ".round($time,4)." seconds. -->\n\n";
    }
    while($result_set2 = mysql_fetch_array($result_set1)) {
        $subcat = ''; // reset for each result so articles outside a subcategory get no path prefix
        if ($result_set2['subcat'] != "0" ) { // this article is in a subcategory so we need to get the seftitle
              $query = "SELECT id,seftitle FROM "._PRE."categories WHERE id='".$result_set2['subcat']."'";
              $subcsef = mysql_query($query);
              $subcseft = mysql_fetch_array($subcsef);
              $subcat = $subcseft['seftitle']."/";
          }
        $score = round($result_set2['score'], 0);
        $list .= '<li>';
            if ($debug == "TRUE") { // DEBUG is on so let's output the score of the result in the list
                $list .= "<!-- DEBUG: result score ".$score." -->";
            }
            $list .= '<a href="'._SITE.$subcat.$result_set2['csef'].'/'.$result_set2['asef'].'">'.$result_set2['title'].'</a>';
            if ($display_score == "TRUE") { // display_score on so let's output the score
                $list .= " (".$score.")";
            }
            $list .='</li>';
    }
    //let's wrap the list items in a div and ul
    echo '<br /><div class="related_posts">'.$lang_title.'<ul class="related_posts">'.$list.'</ul></div>';
    
} else {
    if ($debug == "TRUE") { // DEBUG is on so we need to know there were no results found
        echo "\n\n<!-- DEBUG: No results found in ".round($time,4)." seconds.-->\n\n";
    }
}

if ($debug == "TRUE") { // DEBUG is on so lets see the threshold that we searched on
    echo "\n\n<!-- THRESHOLD: ".$threshold." -->\n";
    echo "\n<!-- DEBUG: The title words we searched for were: ".$title." -->\n";
    echo "\n<!-- DEBUG: The article words we searched for were: ".$text." -->\n\n";
}

/*************************** FUNCTIONS **************************/
    
// auto calculate the threshold
function auto_calc_threshold($word_count,$body_weight,$title_word_count,$title_weight,$threshold_lvl="15"){
    //without a stoplist matching the one mysql full text uses, this won't be perfect.
    $max_body_score = ($word_count * $body_weight);
    $max_title_score = ($title_word_count * $title_weight);
    $threshold_lvl = ".".$threshold_lvl;
    $retval = (($max_body_score + $max_title_score)*($threshold_lvl));
    return $retval;
}

// prepare the text, remove all the crap and any word less than 4 chars
function scrub_text($string) {    
    $retval = strip_tags(stripslashes($string)); // takes the text and removes any code
    $retval = preg_replace("/[^a-zA-Z0-9-\s]/i", '', $retval); // strip everything except letters, digits, hyphens and whitespace
    $retval = preg_replace("/\s\s+/", ' ', $retval); // collapse excessive whitespace
    $retval = preg_replace("/^([^\W]{1,3}[\W])|($[^\W]{1,3}^)|($[^\W]{1,3}\W+)|(\W[^\W]{1,3}(?=\W|$))/", '', $retval); // regex should strip words less than 4 characters
    $retvalex = explode(" ",$retval); // create an array from the string
    $retvalarr = array_unique($retvalex); // now strip duplicate words
    $retval = implode(" ",$retvalarr);
    return $retval;
}

// set the rankings for the IN BOOLEAN MODE full-text search
function rank_text($string, $rank="0") {    
    $retval = $string;
    // note: the /e modifier is not needed for a plain backreference replacement, and it was removed in PHP 7
    if ($rank == "2") {
        $retval = preg_replace("/(^|\s)(\w)/", "\\1>>\\2", $retval); //increase the rank value using the > operator
    } elseif ($rank == "3") {
        $retval = preg_replace("/(^|\s)(\w)/", "\\1>>>\\2", $retval); //increase the rank value using the > operator
    } 
    return $retval;
}

// counts the total words
function count_words($string) {
    $array = explode(" ", $string);
    $retval = count($array);
    return $retval;
}
 
// this function cuts down the max number of words for the article body  
function limit_words($string, $wordsmax="250") {
    $array = explode(" ", $string);
    array_splice($array, $wordsmax);
    $retval = implode(" ", $array);
    return $retval;
}
?>  
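
To see what the weighting step actually feeds into the search, here is a minimal sketch of how the title and body words end up in the AGAINST() string. The function name rank_text_sketch and the sample words are made up for illustration; the logic follows rank_text() above, which stacks MySQL's boolean-mode > operator in front of each word.

```php
<?php
// Sketch of the weighting step from related.php: prefix every word with
// ">" operators, which boolean mode uses to raise a word's contribution.
// rank_text_sketch is an illustrative name, not part of the mod.
function rank_text_sketch($string, $rank = "1") {
    if ($rank == "2") {
        return preg_replace('/(^|\s)(\w)/', '$1>>$2', $string);
    } elseif ($rank == "3") {
        return preg_replace('/(^|\s)(\w)/', '$1>>>$2', $string);
    }
    return $string; // rank 1: words are left unweighted
}

// Sample (assumed) title and body words, already scrubbed.
$title = rank_text_sketch("related articles", "3");
$body  = rank_text_sketch("mysql fulltext search", "1");

// This is the string that gets placed inside AGAINST('...' IN BOOLEAN MODE).
$against = $title . " " . $body;
echo $against . "\n"; // >>>related >>>articles mysql fulltext search
```

With the default config (title weight 3, body weight 1) only the title words get the prefix, which is why a match on a title word counts for more than a match on a body word.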
   

Now back up your snews.php and work on a copy. Within the function articles, look for the following code (around line 760 on a default snews.php);

} else if (empty($currentPage)) {
    if ($infoline == true) {
        $tag = explode(',', tags('infoline'));
        foreach ($tag as $tag ) {
            switch ($tag) {
                case 'date':
                    echo $a_date_format;
                    break;
                case 'readmore':
                case 'comments': ;
                    break;
                case 'edit':
                    if (_ADMIN) {
                        echo ' '.$edit_link;
                    }
                    break;
                default:
                    echo $tag;
            }
        }
    } else if (_ADMIN) {
        echo '<p>'.$edit_link.'</p>';
    }
}

and add the following code BELOW it

if ($infoline == true && $_catID != 0) { // displays it only on regular articles (not pages) and only when the article is viewed on its own page
    include('related.php'); 
}

Now set up the full text indexes on your MySQL table

ALTER TABLE articles ADD FULLTEXT(text, title); 
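
The statement above assumes your tables have no prefix. Since related.php queries the table as _PRE."articles", a prefixed install needs the index on the prefixed table. A small sketch that prints the right statement for either case; the helper name fulltext_index_sql and the example prefix sn_ are hypothetical.

```php
<?php
// Hypothetical helper: builds the ALTER TABLE statement for a given
// table prefix (an empty string means no prefix).
function fulltext_index_sql($prefix = '') {
    return "ALTER TABLE " . $prefix . "articles ADD FULLTEXT(text, title);";
}

echo fulltext_index_sql() . "\n";      // no prefix
echo fulltext_index_sql('sn_') . "\n"; // 'sn_' is just an example prefix
```

Run whichever statement matches your install in phpMyAdmin or the mysql client.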

Now add some .related_posts styles to your stylesheet, something like

div.related_posts {
    margin: 5px  0;
}
ul.related_posts {
    margin: 0;
    padding: 0;
    list-style: none;
}
ul.related_posts li{
    padding-left:16px;
    margin-bottom:.2em;
    background-image:url('../images/related.png');
    background-repeat:no-repeat;
    background-position:0 .2em;
}

Here's the icon I used (from pinvoke).

That should be it. Upload your modified stylesheet and then your modified snews.php and you should be in business. You can tinker with the config options in related.php if you like: lower the threshold if you want more matches, or raise it to tighten the match requirements.

This is the first release and is considered beta right now. Please let me know about any problems you have or changes you'd like to see; I need your feedback to improve it.
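
If you're wondering what the AUTO threshold works out to, here is a rough worked example using the same arithmetic as auto_calc_threshold() in related.php. The function name auto_threshold_sketch and the word counts are assumptions for illustration, and it relies on the idea from the config comments that each word match scores its weight.

```php
<?php
// Same arithmetic as auto_calc_threshold() in related.php, with the level
// written as a percentage instead of a ".17"-style string.
function auto_threshold_sketch($word_count, $body_weight, $title_word_count, $title_weight, $threshold_lvl = 15) {
    $max_body_score  = $word_count * $body_weight;        // best case: every body word matches
    $max_title_score = $title_word_count * $title_weight; // best case: every title word matches
    return ($max_body_score + $max_title_score) * ($threshold_lvl / 100);
}

// Example numbers (assumed): 250 body words at weight 1, 6 title words at weight 3.
printf("%.2f\n", auto_threshold_sketch(250, 1, 6, 3, 17)); // 45.56
printf("%.2f\n", auto_threshold_sketch(250, 1, 6, 3, 10)); // 26.80
```

So dropping $threshold_lvl from 17 to 10 lowers the score an article has to beat from about 45 to about 27, which should let more articles through.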

Comments


Great mod man..thanks...


Great news, thanks :-)


Thats great mod :)
Big TXH!


This the ultimate Mod for related content.
Bravissimo Matt!


HMMM... this is interesting Philippe, if you look at it on this article, you'll see that 4 of my related links are in subcategories.

You didn't change the routines inside the related.php file or modify the categories table in the database, right?

Have a look and make sure the subcat part of related.php is correct;


while($result_set2 = mysql_fetch_array($result_set1)) {
unset($subcat);
if ($result_set2['subcat'] != "0" ) { // this article is in a subcategory so we need to get the seftitle
$query = "SELECT id,seftitle FROM categories WHERE id='".$result_set2[subcat]."'";
$subcsef = mysql_query($query);
$subcseft = mysql_fetch_array($subcsef);
$subcat = $subcseft['seftitle']."/";
}

you're getting the slash "/", so it looks like it's not getting the seftitle for some reason.


You are using sNews 1.7, right?


Hi Matt,
yeap: I'm using for this blog sNews 1.7...
I didn't change the script, and didn't modify the table since I've installed the Mod.
I use a subcategory since a few days and just noticed that issue.
Since I use it "as it" and if it works on your blog there should be something missing in my core script. But what?


Yeah, something strange is happening, it doesn't appear that the subcats are appearing in your breadcrumb menu either.

http://carnet.hiseo.fr/sexy-actrices/jambe...

it leaves "jambes" out of the breadcrumb menu.

but that may be completely un-related for all I know, as you're using a custom breadcrumb.

This is a tricky one, I'm going to try and play around with some setting on my site and see what happens...


Yeap my breadcrumb is crap: I'm gonna change it tomorrow.
I've been searching where I may have screwed up something but didn't find anything weird.
But as a PHP dumber I don't know where to look. :-D
Thanks for the help.


AHA!

I think I got it! I goofed up and didn't include the DB prefix on the subcategory selection! You use a table prefix and I don't, that's why it would work on mine and not on yours, here's the fix.

In related.php, find

$query = "SELECT id,seftitle FROM categories WHERE id='".$result_set2[subcat]."'";
and change it to

$query = "SELECT id,seftitle FROM "._PRE."categories WHERE id='".$result_set2[subcat]."'";

I have updated the tutorial and zip file with this change as well.
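
For anyone curious why the missing prefix mattered, here is a small sketch of the subcategory lookup query built with and without a prefix. The helper subcat_query and the prefix sn_ are hypothetical; the integer cast is an extra precaution that the mod itself doesn't have.

```php
<?php
// Hypothetical helper: builds the subcategory lookup used in related.php.
// $prefix plays the role of the _PRE constant.
function subcat_query($prefix, $subcat_id) {
    // (int) cast is an added safety measure, not in the original mod
    return "SELECT id,seftitle FROM " . $prefix . "categories WHERE id='" . (int)$subcat_id . "'";
}

echo subcat_query('', 4) . "\n";    // works only when the tables have no prefix
echo subcat_query('sn_', 4) . "\n"; // what a prefixed install needs
```

On a prefixed install the first query points at a table that doesn't exist, so no seftitle comes back and only the trailing "/" ends up in the link.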


You squashed the bug!
Good shot Matt! ;-)


I've got a style problem with the clean theme in Firefox (not in IE8). The page is printing the list with squares as well as the icon.


Michael,

Try adding;

list-style:none;
to

ul.related_posts li


Thanks Matt, that worked.


Hi toolman,

It's a pretty beefy query, you're using matching to get an accurate list of possibly related articles. Using mysql full-text, it shouldn't be causing that much strain, perhaps too much for your host to accept? Though if you're on shared hosting with 4,000 uniques a day, I can understand where the host is coming from.

You can set the $wordlimit variable to something smaller, like 50 or 100. If that doesn't make your host happy, you'll probably want to match only on titles, which isn't a very effective means if you ask me.
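
To give a feel for what a smaller word limit does, here is a quick sketch based on limit_words() and count_words() from related.php. The _sketch names and the generated 400-word body are stand-ins for illustration only.

```php
<?php
// Counts space-separated words, like count_words() in related.php.
function count_words_sketch($string) {
    return count(explode(" ", $string));
}

// Keeps only the first $wordsmax words, like limit_words() in related.php.
function limit_words_sketch($string, $wordsmax = 250) {
    $array = explode(" ", $string);
    array_splice($array, $wordsmax); // removes everything from index $wordsmax onward
    return implode(" ", $array);
}

// Stand-in article body: 400 distinct words (word1 ... word400).
$body = implode(" ", array_map(function ($i) { return "word" . $i; }, range(1, 400)));

$short = limit_words_sketch($body, 100);
echo count_words_sketch($body) . " -> " . count_words_sketch($short) . "\n"; // 400 -> 100
```

Fewer words means a shorter AGAINST() string for MySQL to evaluate, and the AUTO threshold shrinks along with it because it is calculated from the trimmed word count.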

In the meantime, try setting debug to true and then look at the source code to see how long the queries are taking.
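
Roughly, this is what the debug mode does around the query: it takes a timestamp before and after and prints the difference in an HTML comment. In this sketch the function timed_debug_comment and the $run_query callable are hypothetical stand-ins for the real mysql_query call.

```php
<?php
// Times a piece of work and reports it in an HTML comment, in the same
// format related.php uses when $debug is "TRUE".
// $run_query is a stand-in callable that returns the number of results.
function timed_debug_comment($run_query) {
    $time_start = microtime(true);
    $count = $run_query();
    $time = microtime(true) - $time_start;
    return "<!-- DEBUG: " . $count . " results found in " . round($time, 4) . " seconds. -->";
}

// Pretend query: sleeps for 2 ms and reports 3 results.
$comment = timed_debug_comment(function () {
    usleep(2000);
    return 3;
});
echo $comment . "\n";
```

View the page source after loading an article and look for the DEBUG comments to find the real timing for your site.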


Now I have wordlimit=100. I will try with wordlimit=15.
Thx.


OK, sounds good, when I get some free time, I will look at making the changes needed to have the option to match on only the titles.


Hi Matt,
I just installed this related mod in my site as per your instructions. But when i see the sub categories pages I find they are broken- like it shows only the first article with related articles.
Pl. see the url below: http://vasthurengan.com/pin_code/andhra-pr...
Please help me.
vasthurengan


Hi Matt,
Yes, now it is working OK.
Thanks

Copyleft 2002 - 2017 Matt Jones