我有非常大的数据库需要清理。在旧系统上,允许用户使用HTML (和从Word文件粘贴)。现在,我在DB中的不同位置有大量的打开标签。
有人知道执行这种清理的应用程序/脚本吗?
发布于 2011-11-07 17:03:20
你可以用一些php (或者其他后端语言)来做这件事。
下面是一个用php编写的示例
$link = mysql_connect($host, $username, $password); //connecting to database
mysql_select_db($database_name,$link); // select a database
$q = mysql_query('SELECT id_row,cell_with_tags FROM tablename WHERE some_condition_if_it_needed');
if(mysql_num_rows($q) !== 0){ //check if result of our query not empty
while($res = mysql_fetch_assoc($q)){ // gets all rows-cells as associative array
$result[] = array('id_row'=>$res['id_row'],
'cell_with_tags'=>strip_tags($res['cell_with_tags']);
}
}
if(!empty($result)){
foreach($result as $k=>&$v){ // update our database. Write there values without tags
mysql_query('UPDATE tablename SET cell_with_tags = '.$result[$k]['cell_with_tags'].' WHERE id='.$result[$k]['id_row']);
}
}如果你不想删除所有的strip_tags标签,你也可以修改html函数的调用。(http://ru.php.net/manual/en/function.strip-tags.php)
如果无法执行php脚本,则只能使用mysql执行。我是从this得到的
DROP FUNCTION IF EXISTS strip_tags ||
CREATE FUNCTION strip_tags( x longtext) RETURNS longtext
LANGUAGE SQL NOT DETERMINISTIC READS SQL DATA
BEGIN
DECLARE sstart INT UNSIGNED;
DECLARE ends INT UNSIGNED;
SET sstart = LOCATE('<', x, 1);
REPEAT
SET ends = LOCATE('>', x, sstart);
SET x = CONCAT(SUBSTRING( x, 1 ,sstart -1) ,SUBSTRING(x, ends +1 )) ;
SET sstart = LOCATE('<', x, 1);
UNTIL sstart < 1 END REPEAT;
return x;
END;
||
delimiter ;
mysql> SELECT strip_tags('<a href="HelloWorld.html"><B>Hi, mate!</B></a>') as strip_tags;发布于 2012-09-29 03:57:33
如果您将<和>字符作为文本的一部分,而不是HTML的一部分,该怎么办?即使您的字符串在字符串末尾包含损坏的HTML标记。因此,您应该使用此版本的函数:
SET GLOBAL log_bin_trust_function_creators=1;
DROP FUNCTION IF EXISTS fnStripTags;
DELIMITER |
CREATE FUNCTION fnStripTags( Dirty varchar(4096) )
RETURNS varchar(4096)
DETERMINISTIC
BEGIN
DECLARE iStart, iEnd, iLength, DirtyLength, DirtyTail int;
DECLARE iNextLessThenAngleBracket, iLengthToNextLessThenAngleBracket int;
WHILE Locate( '<', Dirty ) > 0 And Locate( '>', Dirty, Locate( '<', Dirty )) > 0 DO
BEGIN
SET iStart = Locate( '<', Dirty ), iEnd = Locate( '>', Dirty, Locate('<', Dirty ));
SET iNextLessThenAngleBracket = Locate( '<', Dirty, Locate('<', Dirty ) + 1);
SET iLength = ( iEnd - iStart) + 1;
SET iLengthToNextLessThenAngleBracket = ( iNextLessThenAngleBracket - iStart) + 1;
IF iLength < iLengthToNextLessThenAngleBracket THEN
BEGIN
IF iLength > 0 THEN
BEGIN
SET Dirty = Insert( Dirty, iStart, iLength, '');
END;
END IF;
END;
ELSE
BEGIN
IF iNextLessThenAngleBracket != 0 THEN
BEGIN
SET Dirty = Insert( Dirty, iStart, 1, '*/*');
END;
ELSE
BEGIN
SET Dirty = Insert( Dirty, iStart, iLength, '');
END;
END IF;
END;
END IF;
END;
END WHILE;
IF Locate( '<', Dirty ) > 0 THEN
BEGIN
SET DirtyLength = CHAR_LENGTH(Dirty);
SET DirtyTail = DirtyLength - Locate( '<', Dirty ) + 1;
SET Dirty = Insert( Dirty, Locate( '<', Dirty ), DirtyTail, '');
END;
END IF;
BEGIN
SET Dirty = REPLACE(Dirty, '*/*', '<');
END;
RETURN Dirty;
END;
|
DELIMITER ;https://stackoverflow.com/questions/7024490
复制相似问题