User talk:Rjwilmsi/Archives/2016/March

Source: Wikipedia, the free encyclopedia.

Persondata and diacritics

I've been processing some Wikidata challenges created by Kasparbot. In case you didn't know, these are inconsistencies between cached (and now deleted) Persondata entries and the data already in Wikidata. Many of them involve article titles that are non-English names that have lost their diacritics. There's a discussion going on at Wikipedia talk:Persondata, where I'm one of those arguing that a single instance of a mis-spelling should not be regarded as a valid "Alias", and should not be confused with a convenience redirect in Wikipedia itself. A counter-argument concerns the exact-match nature of Wikidata's search.

I noticed that a run of RjwilmsiBot in late 2010 created a whole lot of Persondata entries that had the NAME field shorn of diacritics, and that's where a bunch of these challenges are coming from. Articles like Stéphane Denève and Johann Friedrich Dübner, where "Deneve" and "Dubner" are frank mis-spellings, and (I would argue) don't belong in Wikidata as an "alias".

I'd just like to understand the history: was it intentional that the diacritics be removed in Persondata? If so, what's the rationale? I realize that the barn was emptied a long time ago, but I guess we should understand the history before we proceed. David Brooks (talk) 21:59, 9 March 2016 (UTC)

The Persondata NAME parameter was originally set to be like the DEFAULTSORT, i.e. without diacritics (plus surname first, etc.). Rjwilmsi 08:12, 10 March 2016 (UTC)
DavidBrooks Don't deal with NAME. That's one parameter not to transfer to Wikidata. That is extremely unreliable. The NAME should come from the article only. Bgwhite (talk) 09:50, 10 March 2016 (UTC)
Thanks, both. I understand. I have this can of worms open at Wikipedia talk:Persondata, so perhaps we should join up these discussions, and I think T.seppelt could maybe change Kasparbot so that it includes the NAME recommendation as a hint. David Brooks (talk) 17:26, 10 March 2016 (UTC)

AWB external

I've come across what looks like another external processing bug.

AWB settings:

 <ExternalProgram>
   <Enabled>true</Enabled>
   <Skip>true</Skip>
   <Program>c:\cygwin\bin\gawk.exe</Program>
   <Parameters>-f /home/myname/wi-awb/demon-win.awk "%%title%%"</Parameters>
   <PassAsFile>true</PassAsFile>
   <OutputFile>h:\article.txt</OutputFile>
 </ExternalProgram>

This config has worked well for thousands of articles but failed on an article named Friends of the Mission Clinic of Our Lady of Guadalupe, Inc. -- AWB neither executes the script nor moves on to the next name in the list. I added debug statements to the awk script and verified that the script fails to execute only for this article. It's not the length of the title; other, longer article names run OK. My guess is the trailing ".", perhaps in combination with the length and/or the ",". -- GreenC 22:06, 11 March 2016 (UTC)

If you manually run the script with that title as input via the Windows command line, what error do you get? Rjwilmsi 08:47, 12 March 2016 (UTC)

The script runs as expected, with no error. I replicated the problem using a different test script.

   <Program>c:\cygwin\bin\gawk.exe</Program>
   <Parameters>-f /test.awk "%%title%%"</Parameters>

The test script writes ARGV[1] to a temp file. The full test.awk:

BEGIN { print "name is " ARGV[1] > "/tmp/tempfile" }

AWB says "processing page", the CPU load goes up and nothing happens. This is with AWB 5.8.5.1.

Running from the Windows command line:

c:\cygwin\bin\gawk.exe -f /test.awk "Friends of the Mission Clinic of Our Lady of Guadalupe, Inc."

It correctly writes ARGV[1] to "/tmp/tempfile".

-- GreenC 15:48, 12 March 2016 (UTC)

OK, this was a regex backtracking issue within the AWB code that replaces keywords (%%title%% etc.). rev 11972 fixes it. Rjwilmsi 17:11, 12 March 2016 (UTC)
Thank you. -- GreenC 20:22, 12 March 2016 (UTC)
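
For anyone hitting a similar hang in their own tooling, here is a minimal sketch, in Python rather than AWB's own code, of expanding a keyword such as %%title%% by literal string replacement. No regular expression is involved, so there is nothing to backtrack. The function name expand_keywords and the dictionary of values are hypothetical, and this is not a reconstruction of what rev 11972 changed.

```python
def expand_keywords(template, values):
    """Replace each %%key%% placeholder with its value using literal matching.

    Hypothetical helper for illustration only; not AWB's implementation.
    Caveat: replacements are applied one key at a time, so a value that itself
    contains a %%key%% placeholder could be expanded by a later key.
    """
    result = template
    for key, value in values.items():
        # str.replace does a plain substring search, not a regex match
        result = result.replace("%%" + key + "%%", value)
    return result


params = '-f /test.awk "%%title%%"'
title = "Friends of the Mission Clinic of Our Lady of Guadalupe, Inc."
print(expand_keywords(params, {"title": title}))
```

With the title from this thread, the trailing "." and the "," are passed through untouched because they are never interpreted as pattern syntax.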

External processing timeout

You might think I spend my days looking for AWB external processing bugs, but they seem to find me. In this case the script can take a long time to complete (up to 5 minutes?), but AWB aborts after a minute or two with the same message about a thread aborting that appears when pressing the stop button before a script completes. The script does a lot of network I/O to a remote API and has staggered delays built in, as their API is unreliable, so occasionally it takes a long time. It is being run in unattended bot mode, so an abort is a problem. Is this easily adjustable? -- GreenC 21:27, 20 March 2016 (UTC)

I configured my test external process script to sleep for 10 minutes, and AWB didn't time it out. I cannot see any timeouts in the AWB code. Please post the full exception you get. Rjwilmsi 10:10, 21 March 2016 (UTC)
Discovered it only happens with Bot Autosave enabled. There is a 2 minute timeout on external scripts. A small window pops up with the title "External processing error" and the body text "Thread was being aborted.", with a red X in a circle to the left. That's it. Is there another place to get debugging info? It's the same error message as when pressing the stop button (in the Start tab) while an external script is running. -- GreenC 17:07, 21 March 2016 (UTC)
From the code, there will be an "External processing error" message box if there is an exception. As your script is being started, the exception must be in System.Diagnostics.Process.WaitForExit or in the in/out file check, read and delete steps. We do not set a timeout. I ran a test script that sleeps for 10 minutes: once I got the same error, and further debugging showed that reading the in/out file was the cause; another time the external process steps completed successfully, with the script taking 10 minutes. I can't see why reading the in/out file would sometimes cause an error. Could it be that your script or something else is modifying that file (e.g. if your script spawns other scripts)? Rjwilmsi 18:59, 21 March 2016 (UTC)
Yes, in fact it is. The script writes the name of the article to /tmp/name.txt, which is in a directory shared with a Linux VirtualBox machine. The real script under Linux does its work, then writes out /tmp/article.txt (the i/o file), which the original script detects and then exits, passing control back to AWB. The first pass-through script deletes /tmp/article.txt on startup to clean the slate and avoid any unexpected data from a previous aborted run. It sounds like AWB is opening article.txt on starting the external process, so deleting it is a bad idea. -- GreenC 20:24, 21 March 2016 (UTC)
Yeah, I didn't know AWB would hold open a file handle. I've modified the scripts so that instead of deleting the i/o file they make it zero length, and rather than copying over the original, they open it and write to it. This seems to work. Thanks for debugging the source of the problem. -- GreenC 22:52, 21 March 2016 (UTC)
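
The workaround described above (empty the i/o file in place rather than delete it, and write into the existing file rather than copy a new one over it) can be sketched as follows. This is a Python illustration under assumptions, not GreenC's actual scripts, which are awk; the helper names reset_io_file and write_result, and the temporary file standing in for article.txt, are hypothetical.

```python
import os
import tempfile


def reset_io_file(path):
    """Empty the file in place; the file itself is never deleted."""
    with open(path, "r+b") as f:
        f.truncate(0)


def write_result(path, text):
    """Overwrite the existing file's contents without replacing the file."""
    with open(path, "r+", encoding="utf-8") as f:
        f.truncate(0)  # discard any previous contents
        f.write(text)  # the position is still 0 right after opening


# Demonstration with a temporary file standing in for article.txt.
fd, io_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with open(io_path, "w", encoding="utf-8") as f:
        f.write("stale text from an aborted run")
    reset_io_file(io_path)
    print(os.path.getsize(io_path))  # size in bytes after the reset
    write_result(io_path, "processed article text")
    with open(io_path, encoding="utf-8") as f:
        print(f.read())
finally:
    os.remove(io_path)
```

Opening with "r+" requires the file to exist already, which matches the idea here: the file that the other program has open stays the same file throughout, and only its contents change.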

Nitrogen dioxide poisoning

Hi. You looked at Nitrogen dioxide poisoning in January. Did you go so far as to check whether the references actually supported the text? Thanks Peter Damian (talk) 21:49, 28 March 2016 (UTC)

No, I was formatting the references, not reviewing the sources. Rjwilmsi 13:29, 30 March 2016 (UTC)

Why does AWB require .NET 3.5 rather than the current .NET?

A consequence of this is that on Win 8/10, AWB requires installing the old .NET, because the preinstalled .NET 4.5 is unsuitable for AWB. Is it so difficult to rewrite some lines of code to make it work on .NET 4.5? MaxBioHazard (talk) 11:38, 30 March 2016 (UTC)

  • It's my understanding that AWB will work on .NET 3.5 or newer on Win 8/10, though I do not personally have a Win 8/10 machine to test that on. If there are specific issues with AWB and .NET 4.5 perhaps you would like to raise phabricator tickets with full details so somebody may investigate them? Rjwilmsi 13:27, 30 March 2016 (UTC)