Some Statistics on Spam

Lately the TDF wiki admins have luckily not had much to do, but I still want to share some statistics, as the results are very impressive.

Our active “abuse filter” caught 2214 edits by 1244 users between 26 March 2018 and 12 June 2018 (78 days). Be careful when interpreting the data: after many weeks the spammer noticed the problem and tried to hit Enter multiple times, but each time with a new session. As a result the accounts did not get blocked: part 2 of the rule was never triggered, except by one false positive and by one of my test accounts!
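To put these numbers into perspective, here is a small Python sketch that recomputes the length of the period and the resulting averages. Only the dates and the two counts come from the log; the variable names are my own.

```python
from datetime import date

# Figures taken from the abuse filter log mentioned above.
start = date(2018, 3, 26)
end = date(2018, 6, 12)
caught_edits = 2214
users = 1244

days = (end - start).days               # length of the observation period
edits_per_day = caught_edits / days     # average caught edits per day
edits_per_user = caught_edits / users   # average caught edits per account

print(f"{days} days, {edits_per_day:.1f} caught edits per day, "
      f"{edits_per_user:.2f} caught edits per account")
```

The low average per account fits the observation that the spammer mostly retried with fresh sessions instead of saving a second time from the same account.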

 

For something that started as a very short overnight test, this filter seems very successful so far, at least for my nerves over the last 3 months. Fingers crossed that it stays this way in our wiki (and of course on all the other platforms we provide; whether they have spam problems I do not know, except for Nabble)!

Some manual work remains: 9 classical spammers (apparently somebody else) tried to post something in the “main space” / “article space”, and we had to block them and delete their pages manually. But in 78 days this happened only 9 times! That’s admin work as I love it: concentrating on the content, not fighting spam!

Worth mentioning: another abuse filter, which produces many more false positives but has been active for ages, protected us from a further 5 spam edits by 4 users. This is OK, as its warning was reworked to state more clearly what we want, and most regular users have finally been excluded from it…


Archive.org Resources

archive.org has a great web archive – containing many archived revisions of a huge number of web pages. You might even know that – if not, try it out! Even sites that have been offline for ages are archived! Archive.org has been archiving since the 1990s(!) – not long after the web itself took off!

 

Why am I writing this blog post? What does this have to do with LibreOffice?

It’s rather simple: eskiwiki and trwiki (which I am trying to integrate into tdfwiki) are both hosted on an unmaintained server to which nobody has access. There is a great tool and initiative out there called WikiTeam, which tries to archive wikis and make dumps of them accessible (not only MediaWiki).

I know that there are requests for the content of the “wiki Pardus” (the now-offline wiki of the Linux distribution Pardus), and as luck would have it there is a backup of the Pardus wiki in their archives.

 

So again: if you know of a great resource (even one that is offline) with help pages, FAQ pages, tutorials, wikis, etc., then ask us.


Eskiwiki and trwiki

There are many old – often read-only – resources out there, some still written for OpenOffice.org and thus heavily outdated. Many creators and administrators of these sites are no longer active (in either the LibreOffice or the OpenOffice community), and many of these resources have been made read-only for one reason or another.

Originally I got a request from the German community to somehow bring the content of the old http://www.ooowiki.de back to life. This has been on my to-do list for ages, and it will happen at some point. First I want to test an easier, more straightforward conversion on two Turkish wikis: Eskiwiki (“old wiki”) and TRWIKI. Both are based on MediaWiki and can first be merged with each other (so the real content, without spam, goes from Eskiwiki to TRWIKI). After this first migration, a rather short test migration to the test instance of the TDFwiki should follow. Hopefully everybody involved will then give their go-ahead to move to the live TDFwiki. Of course we will try to preserve working links and contributions (who made the edits) as far as possible and as far as the license allows. Not all of these tasks are easy or clear: for example, can we preserve the links without any breakage? Can we preserve the attribution of the edits, or map them to the correct accounts?

All in all this involves many steps, even in a relatively easy example such as the TRWIKI; getting to pages like the read-only, “HTML only” OOoWiki requires some more steps. OOoWiki was originally based on MoinMoin wiki, and the generated HTML was optimized and simplified. So this is not as straightforward as with the two Turkish wikis, where we simply need to export the XML content (of all relevant revisions) and import it into another MediaWiki instance. But with a few additional steps (and maybe a little manual work) we can also get the old OOoWiki editable again, and with a bit of luck the German community can bring the content up to date.

For any language community: if you have resources somewhere (even ones not under your control) and you want to bring them under the control of the TDF in some way (and I’m not only talking about wikis!), ask the infra team and me. Very likely we can find a solution to get things working.
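To illustrate the attribution question for the MediaWiki-to-MediaWiki case, here is a minimal Python sketch that lists who wrote which revision in an XML export. The sample document is invented for illustration and only mimics the general shape of a MediaWiki export (the schema version in the namespace is an assumption and may differ on a real wiki); it is not taken from Eskiwiki or TRWIKI.

```python
import xml.etree.ElementTree as ET

# Invented sample in the general shape of a MediaWiki XML export.
SAMPLE_EXPORT = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example page</title>
    <revision>
      <timestamp>2010-01-01T12:00:00Z</timestamp>
      <contributor><username>Alice</username></contributor>
      <text>first version</text>
    </revision>
    <revision>
      <timestamp>2011-05-02T08:30:00Z</timestamp>
      <contributor><username>Bob</username></contributor>
      <text>second version</text>
    </revision>
  </page>
</mediawiki>"""


def local(tag):
    """Strip the '{namespace}' prefix so the code works for any schema version."""
    return tag.rsplit("}", 1)[-1]


def list_revisions(xml_text):
    """Return (page title, timestamp, author) for every revision in the export."""
    root = ET.fromstring(xml_text)
    rows = []
    for page in root:
        if local(page.tag) != "page":
            continue
        title = next(c.text for c in page if local(c.tag) == "title")
        for rev in page:
            if local(rev.tag) != "revision":
                continue
            fields = {local(el.tag): el.text for el in rev.iter()}
            # Anonymous edits carry an <ip> element instead of <username>.
            author = fields.get("username") or fields.get("ip")
            rows.append((title, fields.get("timestamp"), author))
    return rows


for row in list_revisions(SAMPLE_EXPORT):
    print(row)
```

A list like this could be used to check, before an import, which user names would need to be mapped to existing accounts on the target wiki.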


Fighting “Version 7.0”-Vandalism

In my last post I wrote about the most recent fight against spammers. Sadly, we also have to fight vandalism. Luckily The Document Foundation has very seldom had problems with human vandals, but this seems to be slowly changing. For a few months now, a person (maybe a group of up to five people) has been trying to increase the version number of the next major release to 7.0 without any support from the marketing folks, the TDF, the ESC or any other team.

Many people involved in the wiki team do not understand the obsession with increasing a stupid number. As far as I know, this person / these persons have not made any valuable contribution to the project (neither code, translations, marketing, etc.). Moreover, we do not see the sense in increasing the version number without an important reason or new features in the product that would justify a version bump.

The problem with this kind of vandalism is that the manual work needed to revert and delete the stuff can be incredibly high. Additionally, this kind of vandalism sadly cannot be prevented by filters, rules or (IP or username) blocks, as it is simply too “random”. For that reason many related pages (e.g. the ReleasePlan) have been protected (move, edit, creation), although we do not like that. The whole project tries to be as open as possible and tries not to create unnecessary barriers for new users.

 

In case this person reads this post: feel free to respond here (or in the wiki), but do not increase our workload by making us revert stuff. This doesn’t help anybody!

 


Spammers are dumb

Long time no post, but recent events show that there is need for a new one.

 

In the last few weeks the amount of spam grew, and with it the workload for the active admins (K-J, fitoschido, Eric, myself), who had to delete the spam and block the accounts.

 

As this was too time-consuming (~12 pages per day), I had to think about another solution.

 

Luckily I have some experience with the “AbuseFilter” extension. The results of our new AbuseFilter 67 are impressive:

[Screenshot: AbuseFilterLog]

The result after two days with the filter active is great.

Statistics: Of the last 636 actions, this filter has matched 25 (3.93%).

(actions = user creation, page moves, page deletions, edits, user rights changes, etc.)
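The percentage is simply the share of matched actions among all logged actions. As a quick check in Python (the variable names are my own):

```python
matched = 25          # actions matched by the filter
total_actions = 636   # all logged actions in the same period

share = matched / total_actions * 100
print(f"{share:.2f}%")
```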

 

The logic behind this filter is rather simple:

  • the account is new (0 edits),
  • the edit is made in the User: namespace,
  • a new page is about to be created,
  • the page content does not start with TDF.

If all these criteria are met, the user has to click on save again. If the user does that (and the current spammers show that they do not read the warning, or at least do not hit save again), the account gets blocked automatically.
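The four criteria and the two-step reaction can be sketched in Python as follows. This is only an illustration of the logic described above, not the actual rule text of AbuseFilter 67 and not AbuseFilter syntax; the class, the field names and the `already_warned` flag are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class EditAttempt:
    """Hypothetical description of one save attempt."""
    user_edit_count: int          # number of edits the account has made so far
    namespace: str                # e.g. "User" or "Main"
    page_exists: bool             # False if the save would create a new page
    new_text: str                 # content the user wants to save
    already_warned: bool = False  # True if this is the second click on save


def matches_filter(edit: EditAttempt) -> bool:
    """All four criteria from the list above must hold at the same time."""
    return (
        edit.user_edit_count == 0
        and edit.namespace == "User"
        and not edit.page_exists
        and not edit.new_text.startswith("TDF")
    )


def decide(edit: EditAttempt) -> str:
    """Return 'allow', 'warn' (first match) or 'block' (save clicked again)."""
    if not matches_filter(edit):
        return "allow"
    if not edit.already_warned:
        return "warn"
    return "block"


spam = EditAttempt(0, "User", False, "Call our printer support now")
print(decide(spam))
```

In this sketch an established user, an edit outside the User: namespace, or a page starting with TDF never reaches the warning at all.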

 

New legitimate users get a warning, both when they try to edit in the User namespace (see https://wiki.documentfoundation.org/WikiAction/edit/User:Dennisroczek/10 ) and again when they are asked to hit save a second time.

 

We – the TDF wiki admins – follow the edits prevented by this filter (see screenshot above) and check for false positives.


Nabble interface…

For a few months now I have also been an administrator of our Nabble archives. I started checking the posts, removing spam and banning the spammers. Slowly I am getting one archive after another cleaned up.

From time to time I realize that the interface is missing some bits, so I change it using Nabble’s “NAML macro language”. I added access keys in places where I think they are useful or where I use them myself.

This weekend I fought with NAML for a few hours to get web feeds placed into the HTML head tag, as their macro language is not documented and not really intuitive. 😦

 

If you have any improvement requests, simply drop me a line. 😉

 


Some updates on the TDFWIKI

Last month we had some spam attacks on our TDFWIKI. We blocked the new accounts and deleted the pages manually. A really big thank-you goes to cloph, as he helped me and disabled manual registration for new users when I wasn’t available. 🙂

After that we modified our title blacklist again to prevent the creation of pages with well-known spam titles (mostly including phone numbers for Canada and the USA, and some words like phone support, printer assistance, etc.).

Now we should improve our “abuse filter”, which checks every edit for spammers who try to add phone numbers to the page content itself. Sadly we realized that it prevented sophie from adding and creating the recordings of the BoD calls, so it got deactivated again. This is still on my to-do list.
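To give an idea of what such title rules look like, here is a small Python sketch. The two patterns are hypothetical examples written for this post, not the actual entries of our title blacklist, and the phone pattern only covers common North American notations.

```python
import re

# Hypothetical pattern: optional country code 1, then 3-3-4 digits
# separated by optional spaces, dots or dashes.
PHONE = re.compile(
    r"(?<!\d)(?:\+?1[\s.\-]?)?\(?\d{3}\)?[\s.\-]?\d{3}[\s.\-]?\d{4}(?!\d)"
)
# Hypothetical keyword list based on the words mentioned above.
KEYWORDS = re.compile(r"phone support|printer assistance", re.IGNORECASE)


def looks_like_spam_title(title: str) -> bool:
    """True if the title contains a phone-like number or a spam keyword."""
    return bool(PHONE.search(title) or KEYWORDS.search(title))


for title in ["Printer help 1-800-555-0199", "Canon Phone Support", "ReleasePlan/6.1"]:
    print(title, "->", looks_like_spam_title(title))
```

Patterns like these are deliberately broad, so any legitimate page title that happens to contain a ten-digit number would be rejected as well.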

 

After rereading my “unread” planet blog posts – in this case Charles’ Losing the Art of Wiki – I added, as suggested by beluga / buovjaga, yet another extension: an improvement to the RecentChanges page.
