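I want to recategorize the articles in my database. I split each article's text into words, check whether any word matches one of the tags in the category table, and then update the article with that category. My code is below. I want to limit each category to at most 5 articles, but the limit on the update does not work. Thanks.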
<?php
header('Content-type:text/html; charset=utf-8');
$db = mysql_connect("localhost","root","root") or die("can not connect Mysql Server");
mysql_select_db("12",$db);
$result = mysql_query("SELECT title,content,id,cat,date FROM articles Order By date DESC"); //get all the articles
$count = 0;
$ids = array();
$categories = array('1','2','3','4','5','6','7','8','9','10');//category numbers, for 1 = art, 2 = travel... these are stored in another referenced DB table
$curCategory = array_shift($categories);
echo $curCategory;
while ($row = mysql_fetch_array($result))
{
$tt = $row['title'].' '.$row['content'];
$tt = preg_replace('/[^a-zA-Z0-9 ]/','',$tt);
$words = preg_split("/\s+/",$tt);
$uniqueWords = array_keys(array_flip($words)); // broken article sentence into words
$parts = '';
foreach($uniqueWords as $word){
$parts[] = " tag1 = '$word' OR tag2 = '$word' OR tag3 = '$word' OR tag4 = '$word' OR tag5 = '$word' ";
}
$where = implode(" OR ", $parts);
mysql_select_db("12",$db);
mysql_query("SET NAMES utf8");
$query1 = mysql_query("SELECT count(*) as count FROM tag1 WHERE ($where) AND category ='count($categories)' "); //put the break words into reference table match out the category number
$count = 0;
while ($row = mysql_fetch_array($query1)) {
$count = $row['count'];
}
if($count) {
$ids[] = $row['id'];
$count++;
if($count == 5) {
mysql_query("UPDATE articles SET cat = '$curCategory' WHERE id in ('".implode("', '", $ids)."')"); //update every category max articles
if(!$curCategory = array_shift($categories)) {
break;
}
$count = 0;
$ids = array();
}
}
}
?>
Reference table:
category | tag1 | tag2 | tag3 | tag4 | tag5
1 | paint | picture| sculpture | photo | bronze
2 | tourism | travel | tour | journey | trip
3 | style | vogue | fashion | mode | Popular
... // 10 categories, category 1 = art, category 2 = travel ...
Posted on 2011-04-13 04:09:55
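That's horrible.
The code is messy, as the other answers explain, but your choice of data structure for the tags will also cause you trouble.
Instead of 5 columns for 5 tags, create a separate tags table and link it to your articles: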
article | tag |
1 | paint |
1 | picture |
1 | sculpture |
1 | photo |
1 | bronze |
2 | tourism |
2 | travel |
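2 | tour |
Then, when you tag, you don't need to worry about whether a tag is tag1 or tag2, whether tag3 is empty, or whether you change your mind and want 6 tags. The structure works for 0 to any number of tags; just make the "parts" portion of your query something like: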
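// separator first: the reversed argument order is not accepted by PHP 8
$parts = " tag IN ('"
    . implode("', '", $uniqueWords)
    . "')";
// e.g. if $uniqueWords = ['one', 'two', 'three'], $parts = " tag IN ('one', 'two', 'three')"
The implode may need tweaking to get all the quotes and parentheses in the right place.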
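Because the words come straight from article text, pasting them into the SQL string is fragile (a single quote in a word breaks the query). A minimal sketch of the same `IN` clause built with placeholders instead follows; `buildTagInClause` is a hypothetical helper name, not something from the question, and it assumes the clause will be run through a prepared statement such as `PDO::prepare` with the values passed to `execute`.

```php
<?php
// Hypothetical helper: returns [SQL fragment, values to bind].
function buildTagInClause(array $uniqueWords): array
{
    // Drop empty strings left over from splitting, then reindex.
    $words = array_values(array_filter($uniqueWords, function ($w) {
        return (string) $w !== '';
    }));
    if ($words === []) {
        // "IN ()" is a syntax error in MySQL, so return a clause that matches nothing.
        return ['1 = 0', []];
    }
    // One "?" per word; the words themselves never enter the SQL text.
    $placeholders = implode(', ', array_fill(0, count($words), '?'));
    return ["tag IN ($placeholders)", $words];
}

list($clause, $params) = buildTagInClause(['one', 'two', 'three', '']);
echo $clause, "\n"; // tag IN (?, ?, ?)
```

The returned fragment can be appended to a `WHERE`, with `$params` bound in the same order.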
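None of this solves your actual problem, though. It isn't clear to me whether you are looking for the top 5 keywords of an article, any 5 tags, or the best 5 tags. My suggestion is this.
Break up your article and, while finding the unique words, count how many times each word occurs, excluding common English words such as "the". Then sort the unique words by number of occurrences, most repeated first. You now have a list of the main words in your article; take the top five, and those are your tags. Insert them into the table.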
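The counting step described above could look roughly like this. It is a sketch only: `topWords` is a hypothetical name, the stop-word list is a tiny illustrative sample, and lowercasing the text is an assumption the original code does not make.

```php
<?php
// Hypothetical helper: the $limit most frequent words in $text,
// ignoring a small, illustrative stop-word list.
function topWords(string $text, int $limit = 5): array
{
    $stopWords = ['the', 'a', 'an', 'and', 'of', 'to', 'in', 'is', 'it'];
    // Lowercase, and turn every run of non-alphanumerics into one space.
    $clean = preg_replace('/[^a-z0-9]+/', ' ', strtolower($text));
    $words = preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);
    // Count occurrences of the remaining words.
    $counts = array_count_values(array_diff($words, $stopWords));
    arsort($counts); // most frequent first
    // Numeric words become integer keys, so cast back to strings.
    return array_map('strval', array_slice(array_keys($counts), 0, $limit));
}

$top = topWords('The trip: a trip to the coast, a trip by train. The coast!', 2);
print_r($top); // trip, then coast
```

Words with equal counts come back in no guaranteed order on older PHP versions, so a tie-breaking rule may be needed if the exact top five matters.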
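Alternatively, here is a solution that may sound messy but could turn out to be more efficient. Write a database procedure that performs this process entirely in MySQL. You need two tables:
tagstable - 1 column "tag" is the PK
| paint |
| picture |
| sculpture |
| photo |
| bronze |
articlewordstable - 1 column "word" is the PK - empty
| - |
| - |
Insert the tokenized words into articlewordstable. Then query that table joined with tagstable: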
SELECT word FROM articlewordstable
INNER JOIN tagstable
ON tag = word;
You will get a list of words that are also tags. You can put a limit of 5 on the results, or you can do this:
SELECT word, count(word) occurrences FROM articlewordstable
INNER JOIN tagstable
ON tag = word
GROUP BY word
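ORDER BY occurrences DESC;
This gives you the most frequently used words that also appear in your tag list. For the counts to be meaningful, articlewordstable has to hold one row per occurrence, so in this variant word cannot be the primary key. This too can be limited to 5 and then used as you see fit.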
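Before committing to a stored procedure, the join-and-count logic can be prototyped in plain PHP. The sketch below mirrors the query with arrays standing in for the two tables; `matchingTagCounts` and its parameters are hypothetical names, and it is an in-memory illustration rather than a replacement for the SQL.

```php
<?php
// Hypothetical in-memory stand-in for the join between
// articlewordstable ($articleWords) and tagstable ($tags).
function matchingTagCounts(array $articleWords, array $tags, int $limit = 5): array
{
    $counts = array_count_values($articleWords);                // GROUP BY word
    $matches = array_intersect_key($counts, array_flip($tags)); // INNER JOIN ... ON tag = word
    arsort($matches);                                           // ORDER BY occurrences DESC
    return array_slice($matches, 0, $limit, true);              // LIMIT $limit
}

$result = matchingTagCounts(
    ['paint', 'photo', 'paint', 'river', 'photo', 'paint'],
    ['paint', 'picture', 'photo']
);
print_r($result); // paint => 3, photo => 2
```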
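Hope this helps!
Posted on 2011-04-06 19:50:53
Very, very strange code. But... $ids[] = $row['id']; - your SQL does not select an id column, so there is no id in the result. Most likely this is because you use $row in both the outer and the inner loop; that is the problem.
Also, do you realize that an article of just 100 words (not many, right?) produces an SQL query with 500 ORs? :)
And what about mysql_select_db and mysql_query("SET NAMES utf8"); - why are they inside the loop?
Posted on 2011-04-06 20:10:10
Let's analyze this code: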
// this query returns one row with column `count`, you're comparing column
// `category` to the literal string `count($categories)` where
// `$categories` is an array of numbers and therefore evaluates to `count(Array)`
$query1 = mysql_query("SELECT count(*) as count FROM tag1 WHERE ($where) AND category ='count($categories)' ");
$count = 0;
// warning: overwriting previous $row variable
while ($row = mysql_fetch_array($query1)) {
// an if($row=...) is better since you have only one row anyway
// Contents of $row = array( 'count' => NUMBER );
// You're overwriting $count with the number of found articles
$count = $row['count'];
}
// unless the query failed or there are no articles found, the next condition is true
if($count) {
// undeclared variable $ids; $row['id'] does not exist since it is overwritten
$ids[] = $row['id'];
// The next lines do not limit the number of updates, it only updates
// if $count == 4; where $count is the number of articles in a category
$count++;
if($count == 5) {
mysql_query("UPDATE articles SET cat = '$curCategory' WHERE id in ('".implode("', '", $ids)."')");
// so if the current category has five articles, quit?
if(!$curCategory = array_shift($categories)) {
break;
}
// otherwise, reset for the next category
$count = 0;
$ids = array();
}
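}
You should definitely go through your code and check that you understand all of it. I'm sure overwriting $row was not intentional, and the query in $query1 is not correct either. When naming variables, make them more descriptive, e.g. use $catCount_row instead of $row. Note that you overwrite $count on every iteration; perhaps you want to move it out of the while loop.
If the count of 4 is not reached, no update is performed.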
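One way to keep the cap separate from the matching is to decide each article's category first and collect ids per category afterwards. This is a minimal sketch, assuming "limit" means at most 5 articles assigned to each category; `capPerCategory` and the shape of `$matches` are hypothetical, not taken from the question.

```php
<?php
// Hypothetical helper. $matches maps article id => matched category,
// in the order the articles should be considered (e.g. newest first).
function capPerCategory(array $matches, int $cap = 5): array
{
    $idsByCategory = [];
    foreach ($matches as $articleId => $category) {
        if (!isset($idsByCategory[$category])) {
            $idsByCategory[$category] = [];
        }
        // Skip the article once its category already holds $cap ids.
        if (count($idsByCategory[$category]) < $cap) {
            $idsByCategory[$category][] = $articleId;
        }
    }
    return $idsByCategory;
}

$capped = capPerCategory([10 => 1, 11 => 1, 12 => 2, 13 => 1], 2);
print_r($capped); // category 1 => [10, 11], category 2 => [12]
```

Each category's id list can then drive a single UPDATE for that category, so the counters never get mixed up with query results.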
https://stackoverflow.com/questions/5531793