Difference between revisions of "Purge spam revisions from mediawiki database permanently"

From LVSKB

Revision as of 15:13, 20 December 2008

Introduction

Spam programs have been posting spam links on our wiki for a while. Although the SpamBlacklist extension was installed, "php cleanup.php" still had to be run to revert the spam links. After the ConfirmEdit extension was installed, it became difficult for spam programs to post automatically. However, the old spam links are still in the page history and in the database.

It's really annoying to keep this spam in the database, where it occupies a lot of space. Search engine crawlers can also still reach the spam links through the page history. Because those links point to *bad* sites, I think they could lower the rank of our own pages in search engines.

Working Log

Finally, I spent a couple of hours purging all spam from the page history in LVSKB manually and permanently.

The MediaWiki Administrator Help (http://www.mediawiki.org/wiki/Manual:Administrators#Deletion) has instructions for deleting spam revisions manually.

First, find all the history entries that contain spam revisions. There are many different approaches, for example:

select old_id, old_title from text where old_text like '%wyger.nl%';
select * from revision where rev_text_id = 309;
select * from page where page_id = 957;

Then delete the spam history manually. Repeat this procedure until no more spam revisions can be found.
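The three lookups above (text, then revision, then page) can also be combined into a single join. The following is only a sketch: it runs against an in-memory SQLite stand-in rather than the real MySQL database, the tables are reduced to the columns needed here, the sample rows and the helper name find_spam_revisions are made up for illustration, and the rev_page column is an assumption about the schema. The same SELECT should carry over to MySQL if the columns match.

```python
import sqlite3

# In-memory stand-in for the wiki database (hypothetical sample data,
# simplified tables; rev_page is an assumed column name).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE text (old_id INTEGER PRIMARY KEY, old_text TEXT);
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_text_id INTEGER);
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
INSERT INTO page VALUES (957, 'Example_page');
INSERT INTO text VALUES (308, 'clean text'), (309, 'buy at http://wyger.nl/ now');
INSERT INTO revision VALUES (2001, 957, 308), (2002, 957, 309);
""")

def find_spam_revisions(conn, pattern):
    """Return (page_id, page_title, rev_id, old_id) for every revision
    whose stored text matches the LIKE pattern."""
    sql = """
        SELECT p.page_id, p.page_title, r.rev_id, t.old_id
        FROM text AS t
        JOIN revision AS r ON r.rev_text_id = t.old_id
        JOIN page AS p ON p.page_id = r.rev_page
        WHERE t.old_text LIKE ?
        ORDER BY p.page_id, r.rev_id
    """
    return conn.execute(sql, (pattern,)).fetchall()

spam = find_spam_revisions(conn, "%wyger.nl%")
for row in spam:
    print(row)
```

Each result row names the page and the revision to delete, so the list can be worked through page by page.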

Second, purge them from the database permanently:

mysql> select count(*) from archive;
mysql> delete from archive;

If you do not want to see the deletion log, do:

mysql> describe logging;
mysql> select * from logging where log_id >= 1710 and log_type = 'delete';
mysql> delete from logging where log_id >= 1710 and log_type = 'delete';

Run "php purgeOldText.php" to purge the text, which saves a lot of disk space.

[wensong@dragon maintenance]$ php purgeOldText.php
 
Purge Old Text
 
Searching for active text records in revisions table...done.
Searching for active text records in archive table...done.
Searching for inactive text records...done.
4263 inactive items found.
[wensong@dragon maintenance]$ php purgeOldText.php --purge
 
Purge Old Text
 
Searching for active text records in revisions table...done.
Searching for active text records in archive table...done.
Searching for inactive text records...done.
4263 inactive items found.
Deleting...done.
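Judging from the output above, an "inactive" text record is one that is referenced by neither the revision table nor the archive table. This also suggests why emptying the archive table first matters: the text of deleted revisions only becomes inactive once no archive row points at it. The sketch below illustrates that idea and is not the actual code of purgeOldText.php. It uses an in-memory SQLite stand-in with made-up rows, the ar_text_id column is an assumption about the schema, and the function names are hypothetical.

```python
import sqlite3

# In-memory stand-in with hypothetical rows; ar_text_id is an assumed column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE text (old_id INTEGER PRIMARY KEY, old_text TEXT);
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_text_id INTEGER);
CREATE TABLE archive (ar_title TEXT, ar_text_id INTEGER);
INSERT INTO text VALUES (1, 'live'), (2, 'archived'), (3, 'orphan'), (4, 'orphan');
INSERT INTO revision VALUES (10, 1);
INSERT INTO archive VALUES ('Deleted_page', 2);
""")

def inactive_text_ids(conn):
    """Return ids of text rows referenced by neither revision nor archive."""
    sql = """
        SELECT old_id FROM text
        WHERE old_id NOT IN (SELECT rev_text_id FROM revision
                             WHERE rev_text_id IS NOT NULL)
          AND old_id NOT IN (SELECT ar_text_id FROM archive
                             WHERE ar_text_id IS NOT NULL)
        ORDER BY old_id
    """
    return [row[0] for row in conn.execute(sql)]

def purge_inactive_text(conn, dry_run=True):
    """Count inactive text rows; delete them only when dry_run is False."""
    ids = inactive_text_ids(conn)
    if not dry_run and ids:
        with conn:  # commit on success, roll back on error
            conn.executemany("DELETE FROM text WHERE old_id = ?",
                             [(i,) for i in ids])
    return len(ids)

found = purge_inactive_text(conn)                   # report only
deleted = purge_inactive_text(conn, dry_run=False)  # actually delete
remaining = conn.execute("SELECT count(*) FROM text").fetchone()[0]
print(f"{found} inactive items found, {remaining} text rows kept")
```

The dry-run default mirrors the two runs shown above: the first only reports the count, and the second (with --purge) deletes.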

The compressed database dumps taken before and after the purge show the difference in size:

[wensong@dragon wensong]$ ls -l lvskb-mysql-2008022*
-rw-rw-r--    1 wensong  wensong    543134 Feb 24 00:05 lvskb-mysql-20080223-1.bz2
-rw-rw-r--    1 wensong  wensong   6082070 Feb 23 08:48 lvskb-mysql-20080223.bz2
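As a quick check on the savings, the two file sizes in the listing can be compared directly. This assumes that lvskb-mysql-20080223.bz2 is the dump taken before the purge and lvskb-mysql-20080223-1.bz2 the one taken after it, which the timestamps suggest but the listing does not state.

```python
# Sizes in bytes, copied from the ls -l listing.
before_bytes = 6082070  # lvskb-mysql-20080223.bz2 (assumed pre-purge)
after_bytes = 543134    # lvskb-mysql-20080223-1.bz2 (assumed post-purge)

# Fraction of the compressed dump size that the purge removed.
reduction = 1 - after_bytes / before_bytes
print(f"compressed dump shrank by {reduction:.1%}")
```

Under that assumption the compressed dump shrank by roughly 91 percent.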

Finally, run "php rebuildrecentchanges.php" to rebuild the recent changes page.

The whole procedure is logged here for future reference.