Hackfest Achievements and ESKIWIKI

Hey everybody,

it has been a long time since I reported any progress, simply because I didn’t have much spare time to work on this topic. So the Munich hackathon last weekend was exactly the “weekend blocker” I needed to get back into it. That weekend I was finally able to import most edits from the old Turkish “eskiwiki” (the OpenOffice.org.tr wiki). I’m still cleaning up the imported edits, and I still find missing pages, as you can see on the Special:Log page. With all the spam in ESKIWIKI it is really not that easy to find pages with real content. I thought I had found every page, but as you can see in the ImportLog, I found another few pages this weekend.

Now my focus is on the image cleanup and on preparing all pages for the move to TDFwiki. Sadly all this work is very time-consuming, so it takes very long.

I do have a special request for the Turkish-speaking community members:

Can you “merge” the content of the old OOo Writer page and the newer one? The other component pages have the same problem. A list of the old OOo pages is linked in this old revision. Otherwise the old OOo pages are unlinked and very hard to find.

Fun fact: After 9½ years I fixed a bug and embedded the correct image on the page “SIKLIK”, see this revision. The problem was that both images were uploaded first, and then a simple copy-and-paste error followed when the page was created. I found this because I simply uploaded all images from the eskiwiki and am now checking all unused images and wanted images (and either try to fix them, or delete them again so that the old stuff is not uploaded to TDFwiki later). Sadly a few images are borked (1x1px) and aren’t even archived at archive.org, so they have been borked for ages.

Let’s see how much spare time I find in the next weeks to finish the rest.


Some Statistics on Spam

Well, lately the TDF wiki admins luckily haven’t had much to do, but I still want to provide some statistics, as the results are very impressive.

Our activated “abuse filter” caught 2214 edits by 1244 users in the time from 26 March 2018 to 12 June 2018 (so within 78 days). Be careful when interpreting the data: after many weeks the spammers detected the problem and tried to hit enter multiple times, but with a new session each time (so these accounts do not get blocked; part 2 of the rule was never hit, except by one false positive and by one of my test accounts!).
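For readers who want to check the numbers, here is a minimal sketch of the arithmetic. The dates and counts are the ones quoted above; the function name and the derived averages are only illustrative and do not come from the wiki's own statistics.

```python
from datetime import date

def spam_filter_summary(start, end, edits, users):
    """Derive simple averages from the abuse filter totals."""
    days = (end - start).days          # length of the observation period
    return {
        "days": days,
        "edits_per_day": edits / days,    # average caught edits per day
        "edits_per_user": edits / users,  # average caught edits per account
    }

summary = spam_filter_summary(date(2018, 3, 26), date(2018, 6, 12), 2214, 1244)
print(summary["days"])  # 78
```

The low average per account fits the observation that spammers retried with fresh sessions instead of saving twice with the same account.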

 

For something that started as a very short test during the night, this filter has been very successful so far, at least for my nerves over the last 3 months. Fingers crossed that it stays that way in our wiki (and of course on all the other platforms we provide; whether they have spam problems I do not know, except for Nabble)!

Some manual work still remains: 9 classical spammers (so apparently somebody else) tried to post something in the “main space” / “article space”, which we had to block and delete manually, but in 78 days this happened only 9 times!!! That’s admin work as I love it: concentrating on the content, not fighting spam!

To mention: another abuse filter, which produces many more false positives but has been active for ages, has protected us from another 5 spam edits by 4 users. This is OK, as the warning was reworked to make clearer what we want, and most regular users finally got excluded as well…


Archive.org Resources

archive.org has a great web archive containing many archived revisions of a huge number of web pages. You might already know that; if not, try it out! Even sites that have been offline for ages are archived! Archive.org has been archiving the web since 1996(!), only a few years after the World Wide Web itself got started!

Why am I writing this blog post? What does this have to do with LibreOffice?

It’s rather easy: eskiwiki and trwiki (which I am trying to integrate into tdfwiki) are both hosted on an unmaintained server to which nobody has access. There is a great tool and initiative out there called WikiTeam, which tries to archive wikis and make dumps of them accessible (not only MediaWiki).

I know that there are requests to get the content of the “wiki Pardus” (the offline wiki of the Linux distribution Pardus), and what luck: there is a backup of the Pardus wiki in their archives.

So again: if you know a great resource (even an offline one) for help pages, FAQ pages, tutorials, wikis, etc., then ask us.


Eskiwiki and trwiki

There are many old, often read-only, resources out there. Some are still based on OpenOffice.org and thus heavily outdated. Many creators and administrators of these sites are no longer active (in neither the LibreOffice nor the OpenOffice communities), and many of these resources have been made read-only for various reasons.

Originally I got the request from the German community to somehow bring the content of the old http://www.ooowiki.de back to life. This has been on my to-do list for ages, and it will happen at some point. First I want to test a simpler, more straightforward conversion on two Turkish wikis: Eskiwiki (“old wiki”) and TRWIKI. Both are based on MediaWiki and can first be merged with each other (so the real content, without spam, goes from Eskiwiki to TRWIKI). After this first migration, a rather short test migration to the test instance of the TDFwiki should follow. Hopefully everybody involved will then give their go to move on to the live TDFwiki. Of course we will try to preserve working links and contributions (who made the edits) as much as possible and as far as the license allows. Not all of these tasks are easy or clear, e.g. can we preserve the links without any breakage, and can we preserve the attribution of the edits or map them to the correct accounts?

All in all this involves many steps even in a relatively easy example such as TRWIKI, but getting to sites like the read-only, “only in HTML format” OOoWiki requires some more steps. OOoWiki was originally based on MoinMoin wiki, and the generated HTML was optimized and simplified. So this is not as straightforward as with the two Turkish wikis, where we simply need to export the XML content (of all relevant revisions) and import it into another MediaWiki instance. But with a few additional steps (and maybe a little bit of manual work) we will also get the old OOoWiki editable again, and with a bit of luck the German community can bring the content up to date.

For any language community: if you have resources somewhere (even ones not under your control) and you want to get them somehow under the control of the TDF (and I’m not only talking about wikis!), ask the infra team and me. Very likely we can find a solution to get the stuff working.


Fighting “Version 7.0”-Vandalism

In my last post I wrote about the most recent fight against spammers. Sadly we also have to fight vandalism. Luckily The Document Foundation has very seldom had problems with human vandals, but this seems to be changing slowly. For a few months now, a person (maybe a group of up to five people) has been trying to increase the version number of the next major release to 7.0 without any support from the marketing folks, the TDF, the ESC or any other team.

Many people involved in the wiki team do not understand the obsession with increasing a stupid number. As far as I know this person / these persons have not added any valuable contribution to the project (neither code, translations, marketing, etc.). Moreover we do not see the sense in increasing the version number without an important reason or new features in the product that justify a version bump.

The problem with this kind of vandalism is that the manual work to revert and delete the stuff can be incredibly high. Additionally, this kind of vandalism sadly cannot be prevented by filters, rules or (IP or username) blocks, as it is simply too “random”. For that reason many related pages (e.g. the ReleasePlan) have been protected (move, edit, creation), although we do not like that. The whole project tries to be as open as possible and tries not to create unnecessary barriers for new users.

In case this person reads this post: feel free to respond here (or in the wiki), but do not increase our workload for reverting stuff. This doesn’t help anybody!

Spammers are dumb

Long time no post, but recent events show that there is a need for a new post.

In the last few weeks the amount of spam got high, and with it the workload for the active admins (K-J, fitoschido, Eric, myself), who had to delete the spam and block these accounts.

As this was too time-consuming (~12 pages per day), I had to think about another solution.

Luckily I have some experience with the extension “AbuseFilter”. The results of our new AbuseFilter 67 are impressive:

AbuseFilterLog

The result after two days with the filter active is great.

Statistics: Of the last 636 actions, this filter has matched 25 (3.93%).

(actions == user creation, page moves, page deletions, edits, changes of user rights, etc.)

The logic behind this filter is rather easy:

  • the account is new (0 edits),
  • the edit is made in the User: namespace,
  • a new page is being created,
  • the page content does not start with TDF.

If all of these criteria are met, the user has to click save again. If that happens (and the current spammers show that they do not read, or at least do not hit save again), the user gets blocked automatically.
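The four criteria and the two-step reaction can be sketched in a few lines of Python. This is only an illustration of the logic described above, not the actual rule text of AbuseFilter 67. The `Edit` class, its field names and the `decide` function are hypothetical; the only MediaWiki fact used is that the User: namespace has the numeric ID 2.

```python
from dataclasses import dataclass

USER_NAMESPACE = 2  # MediaWiki's numeric ID for the User: namespace

@dataclass
class Edit:
    """Hypothetical description of a single save attempt."""
    user_edit_count: int   # edits the account has made so far
    namespace: int         # numeric namespace of the target page
    creates_page: bool     # True if the page does not exist yet
    new_content: str       # text the user tries to save

def matches_filter(edit: Edit) -> bool:
    """True only if all four criteria from the list apply."""
    return (
        edit.user_edit_count == 0
        and edit.namespace == USER_NAMESPACE
        and edit.creates_page
        and not edit.new_content.startswith("TDF")
    )

def decide(edit: Edit, already_warned: bool) -> str:
    """First match: warn and ask to save again. Second match: block."""
    if not matches_filter(edit):
        return "allow"
    return "block" if already_warned else "warn"
```

A spam account that starts a new session for every attempt is always seen with `already_warned=False`, so it only ever reaches the "warn" step and its edit never gets saved.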

 

New valid users do get a warning, both when they try to edit in the User namespace (see https://wiki.documentfoundation.org/WikiAction/edit/User:Dennisroczek/10 ) and when they are asked to hit save again.

We, the TDF wiki admins, follow the edits which got prevented by this filter (see screenshot above) and check for false positives.


Nabble interface…

For a few months now I have also been an administrator of our Nabble archives. I started checking the posts, removing spam and banning the spammers. Slowly I am getting one archive after another cleaned up.

From time to time I realize that the interface is missing some bits, so I change the interface using Nabble’s “NAML macro language”. I added access keys in places where I think they are useful or where I use them myself.

This weekend I fought with NAML for a few hours to get web feeds placed into the HTML head tag, as their macro language is not documented and not really intuitive. 😦

If you have any improvement requests, simply drop me a line. 😉

Some updates on the TDFWIKI

Last month we had some spam attacks on our TDFWIKI. We blocked the new accounts and deleted the pages manually. A really big thank-you goes to cloph, as he helped me and disabled the manual registration for new users when I wasn’t available. 🙂

After that we again modified our title blacklist to prevent the creation of pages with well-known spam titles (mostly including phone numbers for Canada and the USA, and some words like phone support, printer assistance, etc.).
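To give an idea of what such title rules look for, here is a small Python sketch. It is not the content of the real title blacklist. Both patterns and the function name are assumptions made up for illustration: one loosely matches North American phone numbers, the other matches the two phrases mentioned above.

```python
import re

# Loose pattern for North American numbers such as 1-800-555-0199
# or +1 (888) 555 0100. Illustrative only, not the wiki's real rule.
PHONE = re.compile(r"(?:\+?1[\s.\-]*)?\(?\d{3}\)?[\s.\-]*\d{3}[\s.\-]*\d{4}")

# Typical spam phrases, matched case-insensitively.
PHRASES = re.compile(r"phone\s+support|printer\s+assistance", re.IGNORECASE)

def looks_like_spam_title(title: str) -> bool:
    """True if the title contains a phone number or a known spam phrase."""
    return bool(PHONE.search(title) or PHRASES.search(title))
```

A pattern this loose also matches any harmless run of ten digits, so rules like this need a check for false positives before they go live.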

Now we should improve our “abuse filter”, which checks every edit for spammers who try to add phone numbers to the page content itself. Sadly we realized that it prevented sophie from adding and creating the recordings of the BoD calls, so it got deactivated again. This is still on my to-do list.

After rereading my “unread” planet blog posts, in this case Charles’ Losing the Art of Wiki, I added yet another extension, as suggested by beluga / buovjaga: an improvement to the RecentChanges page.


Cleanup tasks ongoing

At the end of the year I started to clean up a lot of stuff in the wiki.

First I re-installed the extension UserMerge to merge wiki users on request, and I did merge a user. In case you want to merge some users, simply ping me on IRC, write me a mail or request it directly on my wiki page. 😉

A rare spam post was quickly deleted again.

Moreover I started to fix dead links. Most of the targets had simply been moved (Sun, Oracle and Apache really made a mess of their links), some even within our own infrastructure… My bot also ran through many links to switch the http links to the more secure https variants.
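The http to https switch can be sketched roughly like this in Python. This is not the code of the bot mentioned above. The host allowlist, the regular expression and the function name are assumptions for illustration; the idea is simply to touch only links whose host is known to serve https.

```python
import re
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts that are known to work over https.
HTTPS_READY_HOSTS = {"wiki.documentfoundation.org", "www.libreoffice.org"}

# Plain http URLs, ending at whitespace or typical wikitext delimiters.
URL_RE = re.compile(r"http://[^\s\[\]<>\"|]+")

def upgrade_links(wikitext: str, hosts=HTTPS_READY_HOSTS) -> str:
    """Rewrite http:// links to https:// for allowlisted hosts only."""
    def repl(match):
        url = match.group(0)
        if urlsplit(url).hostname in hosts:
            return "https://" + url[len("http://"):]
        return url  # unknown host: leave the link untouched
    return URL_RE.sub(repl, wikitext)
```

Links to hosts outside the allowlist stay as they are, which avoids turning a working http link into a broken https one.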

Moreover I am trying to finish a few patch sets for our Bugzilla-MediaWiki integration extension to push them upstream.
