Wikipedia:Wikipedia Signpost/Single/2013-02-11

Source: Wikipedia, the free encyclopedia.
The Signpost
Single-page Edition
11 February 2013

 

2013-02-11

An article is a construct – hoaxes and Wikipedia

The views expressed in this op-ed are those of the author only; responses and critical commentary are invited in the comments section. The Signpost welcomes proposals for op-eds at our opinion desk.

Wikipedia gets quite a bit of press attention from drive-by vandalism, incoherent scribbles, rude gestures, and just plain page blanking perpetrated by Internet trolls and schoolchildren who take the site's free-to-edit model as an invitation to cause as much havoc as possible. The public perception that Wikipedia is riddled with errors and perpetually vandalized was a major impediment in the site's formative years, when it first engaged in its still-central battle for relevance and accuracy.

But this is a battle that, on the whole, Wikipedia has been winning for a long time. Years of nearly unchecked growth and explosive expansion have made Wikipedia not only the largest but also the most expansive information compendium the world has ever seen. Editing is tightly watched by users armed with tools like Twinkle, Huggle, rollback, semiprotection, and bots. Vandalism as we most commonly think of it is anything but dead (visible pages still regularly get as much as 50 percent of their edits reverted[1]), but today's array of anti-vandalism tools has confined it in lesser form to the furthest and most overtaxed fringes of Wikipedia.

The dearth of vandalism lasting more than a few seconds has done much to improve our image. Five years ago, a project as enterprising as the Wikipedia Education Program could never have even existed, let alone thrived as it does today.[2] The days when being a regular editor on Wikipedia was seen as unusual by others are slowly becoming more distant, its use ever more mainstream, and its editing body ever more academic. But another, subtler form of vandalism persists, and with the decline of its more visible cousin may even be spreading: fabrication.[3] Wikipedia has a long, dare I say storied, history with the spinning of yarns; our internal list documents 198 of the largest ones we have caught as of 4 January 2013. This op-ed will attempt to explain why.

It's frighteningly easy

Wikipedia's policy on vandalism is complex and extensive. Coming in at 41 KB, it is best remembered by the {{nutshell}} wrapper that adorns its introduction, stating that "Intentionally making abusive edits to Wikipedia will result in a block", a threat carried through more often than not. At just over 5k, the guideline on dealing with hoaxes is comparatively slim, and readily admits that "it has been tried, tested, and confirmed—it is indeed possible to insert hoaxes into Wikipedia". It is not hard to tell which is the more robust of the two policies.

First and foremost, this is a consequence of Wikipedia's transitional nature. The site has become mired somewhere between the free-for-all construction binge it once was, and the authoritarian, accuracy-driven project it is quickly becoming. The days of rapidly developing horizontal sprawl are long gone, swallowed up by the project's own growth; increasingly narrow redlink gaps and ever deeper vertical coverage are the new vogue, spearheaded by the raising of standards and the creation of such initiatives as GLAM and the Education Initiative. Wikipedia gets better, but it also gets much more specialist in nature, and this has a major impact on its editing body. Explosive growth both in the number of articles and the number of editors, once the norm, has been superseded by a more than halved level of article creation and a declining number of active editors, both despite bullish, frankly unrealistic growth projections by the Wikimedia Foundation.[4] The project has reached its saturation limit (put another way, there simply aren't enough new people out there with both the will and the smarts to sustain growth), and the result is that an increasingly small, specialized body of editors must curate an increasingly large, increasingly sophisticated project.[5]

A sparser, more specialized editing body dealing with highly developed articles and centered mainly on depth has a harder time vetting edits than a larger, less specialized one focused more on article creation. Take myself as an example: while I have the depth of knowledge to make quality tweaks to Axial Seamount, I could never do as good a job fact-checking Battlecruiser as a Majestic Titan editor could, and I cannot even begin to comprehend what is going on at Infinite-dimensional holomorphy. This hasn't mattered much for pure vandalism: the specialization of tools has proved more than adequate to keep trollish edits at bay. But vetting tools have not improved nearly as much; the best available solution, pending changes, has received a considerable amount of flak for various reasons, and has so far only been rolled out in extremely limited form. On pages not actively monitored by experienced editors, falsified information can and indeed does slide right through; with an ever-shrinking pool of editors tending to an ever-growing pool of information, this problem will only get worse for the foreseeable future.

The relative decline in editor vetting capacity is paralleled by the ease with which falsehoods can be inserted into Wikipedia. Falsified encyclopedic content can exist in one of three states, ordered by its potential to fool editors examining it: inserted without a reference, inserted under a legitimate (possibly offline) reference that doesn't actually support the content, and inserted under a spurious (generally offline) reference that doesn't actually exist. While unreferenced statements added to articles are often quickly removed or at least tagged with {{citation needed}} or {{needs references}}, editors passing over a page who aren't knowledgeable about the topic at hand are extremely unlikely to check newly added references, even online ones, to make sure the information is legitimate. This is doubly true for citations to offline sources that don't even exist. Taking citations at face value is standard operating procedure on Wikipedia: think of the number of times that you have followed a link through or looked up a paper or fired off an ISBN search to ascertain the credibility of a source in an article you are reading; for most of us, the answer is probably "not many". After all, we're here to write content, not to pore over other articles' sourcing, a tedious operation that most of us would rather not perform.

This is why complex falsifications can be taken further than mere insertions: they can pass the very quality standards that ought to speedily expel any such inaccuracies with great prejudice. The good article nominations process is staffed in large part by two parties: dedicated reviewers who are veterans of the process, and experienced bystanders who want to do something relatively novel and assist with the project's perennial backlog. In neither case are the editors necessarily taking up subject matter they are familiar with (most of the time they are not), and in neither case are the editors obligated to vet the sourcing of the article in question (they rarely do; otherwise who would bother?[6]), whatever the standards on verifiability may be. And when a featured article nomination is carried through without input from content experts (entirely possible), or the falsification is something relatively innocent like a new quote, such articles may even scale the heights of the highest standard of all in Wikipedia, that much-worshiped bronze star! Nor are hoaxes necessarily limited to solitary pages; they can spread across Wikipedia, either through intentional insertions by the original vandal, or through the process of "organic synthesis": the tendency of information to disseminate between pages on Wikipedia, either through copypaste or the addition of links.

Then why aren't we buried?

Readers of this op-ed may well take note of its alarmist tone, but they need not be worried: studies of Wikipedia have long shown that Wikipedia is very accurate, and, by derivation, that false information is statistically irrelevant. Well, if, as I have striven to show, manufacturing hoaxes on Wikipedia is so strikingly easy, why isn't it a major problem?

Answering this question requires asking another one: who are vandals, anyway? The creation of effective, long-lasting hoaxes isn't a matter of shifting a few numbers; it requires an understanding of citations and referencing and the manufacture of references to sources. That means investing real intellectual effort in an activity mostly perpetrated by unsophisticated trolls and bored schoolchildren, and as it turns out, the difficulty of making a believable case for their misinformation is a high wall for would-be vandals. And even when real hoaxes are made, studies have shown that Wikipedia is generally fairly effective (if not perfect) at keeping its information clean and rid of errors. Hoaxes have reached great prominence, true, but they are small in number, and they can be caught.

But there is nonetheless a lesson to be learned. Wikipedia is extremely vulnerable. If some sophisticated party wants to launch a smear campaign on the site, falsification would be the way to do it; and that is something that should concern us. The continual unveiling and debunking of hoaxes long after they have been created is a drag on the project's credibility and on its welfare, and when news of hoaxes on the site breaks in the media, it takes a toll on our mainstream acceptance. This is not a problem that can be easily solved; but nor is it one that should be, as it is now, easily ignored.

Addendum: some highlights

Sorted by date of discovery, here is a selection of what I consider to be fifteen of the most impactful and notable hoaxes known to have existed on Wikipedia.

  • November 6, 2003 – February 23, 2004: Uqbar. One of the earliest hoaxes to have been debunked, the kingdom of Uqbar is a historical hoax (a story within a story) that was passed off as real early in Wikipedia's history.
  • December 2004 – April 2005: Roylee. A request for comment on four months of activity from a user who "has carried out a sustained introduction of fringe theories and original research into a large number of articles (145 listed at User:Mark Dingemanse/Roylee [defunct]) since December 2004."
  • May 26 – September 22, 2005: Wikipedia biography controversy. To quote from the article: "a series of events that began in May 2005 with the anonymous posting of a hoax article ... about John Seigenthaler, a well-known American journalist. The article falsely stated that Seigenthaler had been a suspect in the assassinations of U.S. President John F. Kennedy and Attorney General Robert F. Kennedy. Then 78-year-old Seigenthaler, who had been a friend and aide to Robert Kennedy, characterized the Wikipedia entry about him as "Internet character assassination". The hoax was not discovered and corrected until September 2005... after the incident, Wikipedia co-founder Jimmy Wales stated that the encyclopedia had barred unregistered users from creating new articles."
  • October 5 – 26, 2005: Alan Mcilwraith. A former call center worker who created a new identity for himself as a decorated military man on Wikipedia, complete with an in-uniform portrait (now known to have been bought on eBay). The story hit headlines in April 2006, and the article was recreated, now about the hoax he perpetrated (see Signpost coverage).
  • ? – March 3, 2007: Essjay controversy. The only fabrication on Wikipedia major enough to have a 39k Good article to call all of its own, this was a hoax not in the classical sense—that is, not carried out across the mainspace—but in an extremely prominent editor's falsified credentials; when combined with a poorly timed promotion to ArbCom, the result was a spectacular fireworks display.
  • November 2005 – June 21, 2007: Baldock Beer Disaster. A disaster in more ways than one; the article appeared on the Main Page as a Did you know? entry on November 25, 2005, and was not rooted out until more than a year and a half later.
  • November 18 – December 18, 2008: Edward Owens hoax. A fisherman turned pirate who never really existed, created by students as part of a class exercise at George Mason University; now has its own article.
  • September 13–14, 2010: Roger Vinson. An addition was made claiming that the man in question, a federal judge in Florida, is an avid taxidermist who displays mounted bear heads in his courtroom. When Rush Limbaugh used this erroneous information on his talk show, it sparked a media reaction—a demonstration of how even relatively short-lived pieces of vandalism can be damaging.
  • Spring 2009 – October 2011: Cohen-Cruse Ruse. "A number of apparent sock puppets seem to be creating an elaborate set of fake pages around a few members of a "Cohen" and a "Cruse" family. It involved a number of completely (very carefully) faked biographies, other faked things (like synagogues) and a lot of associated edits to real pages that attempted to justify and contextualize those fake people." It lasted two years, and a major community clean-up followed.
  • ? – February 15, 2012: Legolas2186. Allegations of impropriety were brought against Legolas2186, a prolific (and supposedly trustworthy) writer with a large number of Madonna-related article credits to his name. As was eventually discovered, Legolas had been manufacturing sources, inventing information, and generally doing as he damn well pleased with his sourcing. A permanent ban and months of clean-up by the community followed (see Signpost coverage).
  • March 8, 2006 – March 21, 2012: Brierfield, Lancashire. An addition was made claiming that the small town was the primary inspiration for Tolkien's Mordor. By the time it was removed in March 2012, it had been on the page for a good six years.
  • June 9, 2004 – July 13, 2012: Gaius Flavius Antoninus. Created on June 9, 2004 and lasting eight years and one month before discovery, this purported assassin of Julius Caesar has the honor of being the longest-lasting hoax ever created on Wikipedia. Given the level of dissemination that happened in that time and the prominence of Caesar's (historically classical) assassination, it's also probably one of the most illustrative of the failings of Wikipedian vetting.
  • September 25, 2005 – November 19, 2012: Chen Fang. The article presented Chen Fang as the mayor of a small town in China, but he was in fact a student at an American university who created a fictional article about himself to make a statement about Wikipedian inaccuracy, and his case was cited in a Harvard University writing guideline on the topic. It took seven years and two months for someone to notice.
  • July 4, 2007 – January 28, 2013: Bicholim conflict. The primary inspiration for this op-ed, the Bicholim conflict is (was) one of the most complex and well-crafted hoaxes to have existed on Wikipedia, and spent half a decade, most of its life, as a supposedly verified Good article. A complete fabrication, in 4,500 words it described a clash between colonial Portugal and the Indian Maratha Empire in an undeclared war that supposedly helped cement Goa's independence (see Signpost coverage).
  • ? – February 1, 2013: Bonō Pusī Kalnapilis. A hoax created on our sister project, the German Wikipedia, that was not discovered to be a hoax until it was selected as a Did you know? entry, spending two hours on the main page before being caught.

Notes

  1. ^ See the rough guide to semi-protection.
  2. ^ Not to imply that it has been universally successful, but rather that it is quite voluminous.
  3. ^ The difference between fabrication and hoaxes on Wikipedia is not strictly defined, as Wikipedia hoaxes are technically articles that are spurious. This op-ed will treat the matter in a wider sense and include smaller bits of misinformation.
  4. ^ Per the movement goals of the Strategic Planning Initiative.
  5. ^ For more information on the why of Wikipedian editing trends, refer to this op-ed: "Openness versus quality: why we're doing it wrong, and how to fix it". For more details on the Wikimedia Foundation's response, refer to this special report: "Fighting the decline by restricting article creation?".
  6. ^ Good article reviewers are as much regular editors as the next fellow, which means that they find vetting references about as fun as the next fellow—that is to say, not at all. But see revisions made to the reviewing guideline in light of recent discussion on the topic.



A lousy week

This edition covers content promoted between 3 and 9 February 2013.
A head louse

Featured articles

Images: NGC 1316; Incredipede; a beehive; a pinball machine.

Six featured articles were promoted this week:

  • Richard Wagner (nom) by Smerus et al. Wagner (1813–1883) was a German composer, theatre director, polemicist and conductor who greatly influenced the development of classical music. Primarily known for his operas, Wagner began writing music in the 1820s with influences from Beethoven. He remained highly productive, but until his final years, Wagner's life was characterised by political exile, turbulent love affairs, poverty and repeated flight from his creditors.
  • Fort Dobbs (North Carolina) (nom) by Cdtew. Fort Dobbs was an 18th-century fort in the Province of North Carolina which was used for frontier defense during and after the French and Indian War. Named after Arthur Dobbs, it was the only fort on the frontier between South Carolina and Virginia during its active years. After being abandoned in 1766 it disappeared; the site was only rediscovered in 2006.
  • Homework (Daft Punk album) (nom) by Hahc21. Homework is the 1997 debut of French electronic music duo Daft Punk. Produced without plans to make an album, it was ultimately released on Virgin Records to commercial success, charting in 14 countries and selling more than 2 million copies. It ignited international interest in French progressive house and touch music.
  • Alloxylon flammeum (nom) by Casliber. Alloxylon flammeum is a medium-sized tree of the family Proteaceae native to tropical rainforests in Queensland, Australia. Formally described in 1991 after being split out from Oreocallis, it is readily available for cultivation. It prefers areas with good drainage. It is nationally considered "vulnerable" as its habitat is the target of clearing.
  • Percy Fender (nom) by Sarastro1. Percy Fender (1892–1985) was an English cricketer who captained Surrey for ten years and played in 13 Test matches. He was noted as a quality cricketer as early as 1914, and in 1920 he set a currently unbroken record by hitting a first-class century in 35 minutes. Although a popular choice with the press, he never became captain of the England national team. After his cricket career ended in 1935 he continued to write about the subject.
  • The King and I (nom) by Wehwalt and Ssilvers. The King and I is a 1951 musical by Richard Rodgers and Oscar Hammerstein II, adapted from Margaret Landon's novel Anna and the King of Siam. The two-act drama follows a British schoolteacher who is hired by the King of Siam; their relationship is marked by conflict and an unrecognised love. The play was a hit, running for three years on Broadway and receiving three Tony Awards. It continues to be staged by amateur and professional troupes.

Featured lists

One featured list was promoted this week:

Featured pictures

Fourteen featured pictures were promoted this week:

Scorpion Pass, part of Route 227 in Israel.



Just the Facts – WikiProject Infoboxes

This week, we got the details on WikiProject Infoboxes. Started in January 2007, the project seeks to make infoboxes look consistent across Wikipedia's articles and provide tools for editors to create new infobox templates when the need arises. The project's efforts are greeted by enthusiastic data miners employing the microformats included in many infoboxes, while criticism is flung by some WikiProjects where the use of infoboxes has created controversy. The work done by WikiProject Infoboxes impacts projects covering nearly every topic, from bioboxes for people to taxoboxes for species to detailed route diagrams for transportation. We interviewed Andy Mabbett (Pigsonthewing), Chris Cunningham (Thumperward), kosboot, Sameboat, Van (Vanisaac), and Daniel Mietchen.

Why are infoboxes beneficial for Wikipedia articles? How can infoboxes be used outside of Wikipedia? What role does WikiProject Infoboxes play in improving the infoboxes used throughout the encyclopedia?
Andy Mabbett: The benefits of infoboxes include:
  • A quick and convenient summary of the key facts about a subject in a consistent format and layout
  • Emission of machine readable metadata
    • Infoboxes about people, places, buildings, organisations, products, species and dated events (battles, sports fixtures, record releases, etc.) and more emit microformats; see Wikipedia:microformats
    • Data is made available to third party tools such as DBpedia and Freebase
    • Forthcoming integration with Wikidata
Chris Cunningham: As to what the WikiProject does, its aim is to make infoboxes simpler to create and maintain and to give them a simple and consistent appearance which is as accessible as possible for those accessing the data through means other than a graphical Web browser (to that end, there's an overlap with WP:WPACCESS).
Andy Mabbett: Also see Wikipedia:WikiProject Accessibility/Infobox accessibility.
The Signpost
The nameplate of The Signpost
Type: Weekly newspaper
Format: Online publication
Owner(s): None
Founder(s): Michael Snow
Publisher: Wikipedia
Editor: The ed17
Staff writers: Volunteers
Founded: 10 January 2005
Headquarters: The Signpost Building, 123 Fourth Avenue, WikiWorld, Mare Cognitum, The Moon
Circulation: 7 billion
OCLC number: 52075003
Website: Wikipedia:Wikipedia Signpost
The use of infoboxes has not been formally standardized across Wikipedia, inspiring multiple competing essays and becoming a subject of controversy for some WikiProjects. Why have infoboxes stirred such fierce debate? What can be done to soothe concerns about infoboxes?
Andy Mabbett: Essays decrying infoboxes represent the views of a small but vocal minority of editors. Wikipedia has well over a million infoboxes; that demonstrates wide community support for them.
Van: More to the point, essays on unhelpful infoboxes focus on the bad implementations, which only goes to exemplify how helpful the vast majority of infoboxes can truly be. When an editor tries to simplify overly complex information, eliminates nuance, and uses an infobox to avoid actually writing the article, you end up with unhelpful, poor articles, the same outcome as if the editor had focused only on images. Ignoring the actual writing will always result in a bad article, whether you spent your effort on filling out an infobox or not.
Chris Cunningham: In my opinion the degree to which infoboxes are supposedly controversial is significantly overestimated. With regards to the point that infoboxes are often developed to the detriment of the article itself, that is simply because editing an infobox is often the lowest barrier to entry when editing an article: I'm heavily involved in a WikiProject which deals with tens or hundreds of thousands of BLPs, and we rely extremely heavily on casual editors to keep them up-to-date: a great deal of that work is through simple infobox updates. Without a simple, consistent entry point to articles like that, we simply wouldn't get those edits in my opinion.
kosboot: I participate in two projects, WP:OPERA and WP:CM, where a majority of active participants are vociferously anti-infobox. I think the major problem on Wikipedia regarding infoboxes is that there is not a clear explanation of their purpose. Their purpose is not (or should not be) aesthetic; their fundamental purpose is setting the groundwork for making Wikipedia a repository of structured data in preparation for the Semantic Web. In that regard, they are as important as any markup on a Wikipedia page; in other words, they should be mandatory. I don't know how this can be emphasized enough other than a reminder on every page that what an editor does is for the future of the web.
There is no shortage of infoboxes available to editors working on articles. Why are there so many different types? Are there any infobox templates that could be considered "typical" or "standard"? To what templates should a contributor look for inspiration when building a new infobox template?
Andy Mabbett: Because anyone can edit Wikipedia! Work continues to merge overly-similar infoboxes, and delete those which are redundant. We need to better educate editors that infoboxes should not be forked just because a minor change is required. The best infoboxes are based on the {{Infobox}} framework and do not unnecessarily override its default style.
Chris Cunningham: One of my pet projects has been the introduction of a "module" system for {{infobox}} which allows smaller "sub-infoboxes" to be plugged into common bases such as {{infobox person}}. This allows all common biographical detail (such as birthplace, family information, education and so on) to be kept in one place, and then simple additional pieces to be added in for career information and such as required. And on a simpler level, a great many of our infoboxes (such as infoboxes on different types of buildings, or on towns or railway stations in different countries) are pretty much redundant to one another. A huge amount of work in consolidating these templates has already been done, but we've still got a long way to go.
Transportation WikiProjects
WikiProject Transport
WikiProject Cycling
to sports projects
WikiProject Automobiles
WikiProject Trucks
WikiProject Motorcycling
WikiProject Buses
WikiProject Streetcars
WikiProject Trains
WikiProject Stations
WikiProject Rapid Transit
to local/regional projects
WikiProject Bridges
WikiProject Highways
to regional/national projects
WikiProject Aviation
WikiProject Aircraft
WikiProject Airlines
WikiProject Airports
WikiProject Gliding
WikiProject Spaceflight
WikiProject Rocketry
to astronomy projects
WikiProject Water
to waterway projects
WikiProject Ships
WikiProject Shipwrecks
WikiProject Sailing
to regional/historical/business projects
WikiProject Travel and Tourism
to geography/entertainment/culture projects
WikiProject Technology
WikiProject Council
Not to scale
Many articles dealing with transportation include a route diagram template in their infobox. How did this template and its graphical style come into existence? Has there been any collaboration across languages? What can be done to improve this template and route maps in general on Wikipedia?
Sameboat: The route diagram template project (RDT) was started by German Wikipedians in 2006 and brought to the English Wikipedia in 2007. Technically speaking, the projects in the two languages continue to advance the template code independently. However, since all the icons used to compose the maps are shared on Wikimedia Commons, Wikipedians from different projects collaborate with each other to create new icons. Speaking objectively, the current English Wikipedia RDT is more advanced than the German one because of its icon overlaying function. This means that if someone transwikis a map that uses the overlaying function from the English Wikipedia to the German Wikipedia, either the map has to be redone or a new icon has to be created, one whose chance of being reused in other maps could be very small. The other problem with RDT that has concerned me all this time is accessibility for visually impaired readers. Although there was an experiment to implement the alt attribute for each icon individually, it is extremely impractical to create alt text for over 12,000 RDT icons. Another problem related to visual accessibility is that the colors used to distinguish heavy rail (red), metro/light rail (blue) and unused or under-construction lines (lighter colors) could be very confusing to color-blind readers. I don't have any color vision deficiency myself, so this is where I want comments from color-blind readers on RDT maps that mix icons of more than one shade (for example, {{East London Line original RDT}}).
What difficulties arise from biographical infoboxes? Are some fields in an infobox more necessary than others? What impact has the proliferation of infoboxes about people in different professions had on the consistency of Wikipedia articles?
Van: Biographical infoboxes suffer from the same concerns as biographies in general: unsourced/poorly sourced materials and WP:BLP violations will always be a problem. Many infobox fields are completely inappropriate for some subjects, as different people's lives, work, and relationships vary greatly in complexity. The proliferation of personal infoboxes is equivalent to the proliferation of other kinds of infoboxes: as a WikiProject, one of our goals is to recombine infoboxes forked for spurious reasons and eliminate the redundancies; but different fields of biography may imply vastly different pieces of relevant information, meaning a different infobox for a Nobel Peace Prize winner and an Academy Award winning director.
Chris Cunningham: In general, Wikipedia articles become more consistent with one another over time, and infoboxes on BLPs are no exception to this. I've noticed a staggering improvement in this area over the last five years or so. With regards to specific details being more or less important in particular cases, views differ on whether this is a matter of style to be enforced socially or a matter of policy to be enforced technically. Particular infoboxes have moved in both directions over time.
How well do geographic infoboxes fulfill their purpose? How do editors determine what information should be readily available to readers at a glance and what information can be left in the article's paragraphs of text?
Van: Geographic infoboxes are some of the most effective and generally well-executed in the entirety of Wikipedia. The project of implementing microformats means that large amounts of data can be machine extracted, and will allow for encyclopedic details across a large set of geographic entities to be collated, searched, and compared. For users, geographic infoboxes enable quick finding of salient details, so that anyone can accumulate a data set for their personal use.
Chris Cunningham: I still feel we have a great deal of work to do in reducing the massive amount of redundancy in per-country infoboxes. However, this leads to another problem in that our base templates in this area are extremely complicated due to the huge number of features and edge cases to be catered for. I'm not sure to what degree we're going to be able to tackle this in the near future, as geographical articles are probably the most common ones to be created en masse through database extraction and as such the work required is staggering.
Are some scientific articles better suited for the use of infoboxes? What benefits and limitations do taxonomy infoboxes (taxoboxes) bestow upon articles about different species? Likewise, what impact do chemical infoboxes (chemboxes) have on articles covering chemical compounds?
Andy Mabbett: Many scientific articles are well-suited to infoboxes. The Taxobox emits a 'species' microformat.
Chris Cunningham: {{Taxobox}} is the ancestor of the infobox system, and as such it's got a great deal of inertia behind it. I previously worked on bringing it more into line with what we typically expect from modern infoboxes, but that work wasn't completed. In a way it's somewhat opposite to {{chembox}}, which provides a dazzling array of information and is frequently (indeed, perhaps mostly) the primary focus of the articles it's contained on. The degree to which a given article suits an infobox depends primarily on how much comparative information there is on it: chemical properties can be compared across a huge range of articles, for instance, and all animals have taxonomy data. Conversely, articles on things like engineering practices are poor fits for infoboxes.
How do you see infoboxes evolving in the future? What new features still need to be developed? How can a new contributor help today?
Andy Mabbett: More non-standard infoboxes should be migrated to the {{Infobox}} framework. Parameter names need to be standardised (for instance, different infoboxes use |URL=, |website=, |homepage= to mean the same thing). Similarly, the community needs to agree sets of standard parameters, for example for biographies, so that we don't have, say, |spouse= for actors, but not musicians.
Daniel Mietchen: I am looking forward to templates - and thus infoboxes - becoming more integrated across languages and perhaps projects. Other than that, I would like to see more multimedia elements in taxoboxes, when it makes sense. For instance, many animals produce sound, but taxoboxes currently do not have a field for sound files (or videos, for that matter).
Chris Cunningham: In terms of growing the project, a visual editor for simple updates to infobox data fields would be one of the biggest boons we could hope for as regards casual contributions. I dearly hope this is accounted for in the present plans for visual editing. As for how editors can help with infoboxes today, there are still plenty of redundant templates that could be consolidated; that has been the major focus in this area for years, and we're not going to be done any time soon.
Andy Mabbett: ...not to mention the many articles lacking infoboxes, to which one can be added by any editor!


Next week, we'll visit one of the transportation projects included in the route diagram above. Until then, draw squiggly lines in the archive.


2013-02-11

Wikipedia mirroring life in island ownership dispute

The location of the disputed islands.

On 5 February 2013, Foreign Policy published a report by Pete Hunt on editing of the Wikipedia articles on the Senkaku Islands and Senkaku Islands dispute. The uninhabited islands are under the control of Japan, but China and Taiwan are asserting rival territorial claims. Tensions have risen of late—and not just in the waters surrounding the actual islands:


As the Foreign Policy article reports, the talk page of the Senkaku Islands article is replete with accusations of bias and censorship, with each side claiming to uphold Wikipedia policy—conduct which, Hunt says, mirrors that of Japanese and Chinese officials citing international law to back up their claims and counterclaims.

The growth of the on-wiki dispute paralleled that of the real-world conflict. Created in 2003 by User:Menchi, the Senkaku Islands article originally gave preference to the traditional Chinese name in its lead sentence, with the Japanese name mentioned second, and it was short, at just 300 words. By January 2010, it had grown to more than ten times that size, with 43 sources cited. In October 2010, User:Tenmei created a standalone article on the conflict.

As the political conflict around the islands intensified, so did the conflict at the Wikipedia article. The first point of contention was the islands' very name: should it be Diaoyutai Islands (the Taiwanese name), Diaoyu Islands (preferred in China), or the Japanese name, Senkaku Islands? Some editors advocated using the English name, Pinnacle Islands, to avoid the appearance of bias, but as Hunt reports:


The second area of dispute was the question of who owned the islands, and over time, the article grew to describe, "in long, excessively detailed sections", the basis on which three different governments came to argue that the islands were rightfully theirs.

The third point of contention, Hunt says, has been editorial neutrality, with editors using the supposed nationality of their opposite numbers as a focus for attacks. But in the end, Hunt concludes, the unappealing, time-consuming and emotionally exhausting process delivers a result:


Hunt ends with the suggestion that for this and similar political disputes, Wikipedia forms what he calls a "kinetic diplomatic front":


In brief

Orbit of 274301 Wikipedia
  • Wikipedia in space: As reported by NBC News and others on 5 February 2013, an asteroid has been named after Wikipedia. It's a main belt asteroid with a diameter of about a mile, and its official name is now 274301 Wikipedia.
  • Wikipedia: a list of interesting things near you: On 5 February 2013, Atlantic Cities reported on the GeoData extension announced by Wikimedia software engineer Max Semenik at the end of January, which "will include a centralized, structured catalog of geo-coordinates for articles." The extension "will streamline data storage, enabling programmers to mine and map the data quickly and easily through the API." Mobile phone users will be able to access Wikipedia articles on features and buildings close to their location.
  • Why social movements should ignore social media: The New Republic published a review of Steven Johnson's Future perfect: The case for progress in a networked age on 6 February 2013, authored by Evgeny Morozov. The review was critical of what it called Johnson's "Internet-centrism", the belief that the Internet holds a hidden meaning: "decentralization beats centralization, networks are superior to hierarchies, crowds outperform experts. To fully absorb the lessons of the Internet, urge the Internet-centrists, we need to reshape our political and social institutions in its image. ... How can we afford not to reform the world around us when we know that something as unlikely as Wikipedia actually works?" Morozov argued that Johnson's "Internet-centric theory of politics is shallow. Wikipedia, remember, is a site that anyone can edit! As a result, Johnson cannot account for the background power conditions and inequalities that structure the environment into which his bright reform ideas are introduced. Once those background conditions are factored in, it becomes far less obvious that increasing decentralization and participation is always desirable. Even Wikipedia tells us a more complex story about empowerment: yes, anyone can edit it, but not anyone can see their edits preserved for posterity. The latter depends, to a large extent, on the politics and the power struggles inside Wikipedia."
  • Signpost special report is going places: The Atlantic and Tested.com picked up the recent special report in the Signpost on Wikipedia's most viewed articles. The Atlantic published an article titled "If you want your Wikipedia page to get a TON of traffic, die while performing at the Super Bowl half-time show" on 6 February 2013, reproducing the Signpost table of most viewed articles. Tested.com published a longer summary, titled "Wikipedia Signpost report peers into the pop culture trends that drive big traffic". Gizmodo also joined in on 11 February, as did WebProNews.
  • Wikipedia for (Muslim) dummies: On 7 February 2013, The Platform, a youth website that was launched in 2010 by the Muslim Council of Britain's Youth Committee and hosts articles by academics, specialists, journalists and politicians, published an introduction to Wikipedia aimed specifically at Muslims. Author Asad Khan, a PhD student, told his readers: "I'm trying to address Muslims and anyone else who is concerned about the worrying way in which Islam and Muslims are misrepresented in wider society and the rising tide of Islamophobia. As a student of the Islamic tradition, I am often deeply saddened by the extent to which Islam's image has been distorted to the point that many Muslims who have not had the opportunity to learn with our great scholars will themselves harbour misconceptions about the religious tradition." The article linked to various YouTube videos explaining how Wikipedia editing works, including a Wikimedia Foundation video featuring a Muslim editor, and emphasised the importance of neutrality in Wikipedia: "Wikipedia is not about advocacy."
  • A WUSTL undergraduate may have written that Wikipedia article you're reading: The Washington University in St. Louis Newsroom reported on 8 February 2013 on a recent behavioral ecology course at the university, taught by Joan Strassmann, PhD, that doubled as an "official Wikipedia course". Students were required to "edit an existing Wikipedia entry and then either add 25 references and 2500 words to a second entry or begin a new one. The goal was to bring at least one article up to what is called Good Article status by the end of the course. ... Because the students were writing for Wikipedia their work was much more closely scrutinized than student work usually is. The students had to defend their work not just to fellow class members—each article was reviewed by two other students in the class—but also to Wikipedia editors." According to Gabriel Hassler, one of the students, "There were people who were a little critical, but mostly people were saying I’m going to fix this and this, but overall you did a great job." As a result of the coursework, improvements were made to, for example, the Wikipedia article on peafowl, and four students succeeded in taking an article to GA status before the course ended: Gabriel Hassler (chacma baboon), Tony Zhang (scaly-breasted munia), Andrew Katim (vervet monkey) and Kevin Li (worker policing). Li was profiled on the Wikimedia Foundation blog in December.
  • Shahbagh protests in Wikipedia: The Bangladeshi Daily Star noted the Wikipedia article on the 2013 Shahbagh Protest on 9 February 2013: "The nearly 1700-word article with an aerial view of the gathering turns the spotlight on how the protests sparked and its historical background." The protests began after a tribunal sentenced Quader Mollah to life imprisonment. The protest, which is calling for the death sentence, was initiated by bloggers and online activists. The paper quotes the Wikipedia article as saying: "Mollah was found guilty of being behind a series of killings including large-scale massacres in the Mirpur area of Dhaka, which earned him the nickname of 'Mirpurer Koshai' – Butcher of Mirpur."


2013-02-11

UK chapter governance review marks the end of a controversial year

Wikimedia UK (WMUK), the national non-profit organization devoted to furthering the goals of the Wikimedia movement in the United Kingdom, has published the findings of a governance review conducted by management consultancy Compass Partnership.

This review was partially the result of a conflict-of-interest controversy revolving around Roger Bamkin, whose roles as English Wikipedia editor, trustee of WMUK, creator of QRpedia, and paid consultant for MonmouthpediA and GibraltarpediA received much press coverage, including a Signpost report. Bamkin subsequently resigned from WMUK's Board of Trustees.

WMUK's turbulent year was dotted with other trustee resignations as well. Ashley Van Haeften resigned from the position of chair in August 2012 after his ban from the English Wikipedia. Later that month, Joscelyn Upendran resigned from the board itself, stating that "personal loyalties may be getting in the way of what is really best for the charity and of dealing with any actual or perceived conflict of interest issues" in regard to Bamkin's actions.

Following these events, the chapter and the Wikimedia Foundation (WMF) published a joint statement on September 28, 2012, in which they laid out their plan to appoint an independent expert to review and report on the governance practices of WMUK, along with its handling of the controversy. The WMF's head of communications, Jay Walsh, published a blog post on February 7, which said in part:


Compass Partnership, selected through a collaborative dialogue between the WMF and WMUK, was appointed to conduct the review, and its fee was covered in full by the WMF.

Compass reported that while WMUK had conflict of interest guidelines, and individual trustees (including Bamkin) had typically stated their conflicts of interest, the former were "not always implemented to the standard expected by the movement" and the latter could have been made much more transparent (pp. 13–14). In particular, with regard to the Bamkin controversy, the report found no "indication that the Wikimedia UK board formally asked to know the monetary value of any personal contracts to permit an assessment of the material extent of Roger Bamkin's consultancy work" (p. 8). While some individuals interviewed by Compass believed that the foundation would have known of the conflicts of interest through various postings on WMUK's website, Compass found that the declarations were only posted after discussions with the WMF had already begun, and there was no reference to conflicts of interest in WMUK's reports to the foundation.

Compass laid out 50 recommendations that it believes WMUK should implement to better capitalize on previous positive actions and tackle areas identified as needing work (pp. 17–26). Conflicts of interest were principally dealt with in recommendations 26 through 32, where Compass stated that WMUK should observe the "highest standard" in dealing with potential conflicts of interest.

To do this, Compass recommended that if WMUK trustees thought that there could be "any potential for the perception of a conflict of interest", they should contact the chair. Furthermore, when judging this, the board should gather all of the necessary information before coming to a decision, which includes "the size and extent of the personal or financial interest and the identity of relevant business associates." If this is not possible, Compass believes that WMUK should automatically assume that there is a conflict, and possibly request the resignation of the trustee.

Roger Bamkin, when contacted by the Signpost, told us that recommendation 32 may make it difficult to use otherwise perfectly suited candidates in the short term, but as recommended by the review, he believes that the "role of trustees will change and staff members will be available to take on more of the management roles." He also found that recommendation 47 (pp. 25–26), which regards the negotiations required for the use of the Wikimedia trademark and the role of conflict of interest declarations in them, "is a very good idea that will add to the important and essential safeguards of due diligence, the need to make no assumptions about contracts, and to check when the trademark agreement is required."

When asked about recommendation 50, which read in part that "Wikimedia UK should swiftly come to agreement with the owners of QRpedia on the future ownership of this software", Bamkin pointed to a recent agreement with WMUK, which will transfer the domain names and intellectual property of QRpedia to WMUK, while allowing Bamkin and its coder, Terence Eden, moral rights of attribution without financial compensation.

The current chair of WMUK's Board of Trustees, Chris Keating, stated to the Signpost via email:


The governance review, which also gave recommendations on items like the size of WMUK's board, how to run board meetings, and the relationship of WMUK with the Wikimedia movement, is available on Commons. A centralized discussion of it is taking place on Meta, and there is a questions and answers page on the WMUK blog.

In brief

  • Picture of the Year: The Wikimedia Commons' Picture of the Year contest has entered round two, where editors with more than 75 edits may vote for one picture. Voting will be open until 14 February.
  • Fundraising: The Wikimedia Foundation (WMF) is planning to start testing new fundraising banners on 5% of anonymous users. No banners will be shown to logged-in users, nor to those in previously targeted countries. Last year's fundraiser was conducted in December, but only in the top five English-speaking countries: the United States, United Kingdom, Canada, Australia, and New Zealand. The banners were taken down early after the foundation hit its US$25 million target.
  • Echo: The WMF has published a blog post introducing a new notifications system, called Echo. The Editor Engagement Team hopes that it will answer the question, "How can our users learn about events that affect them, so they can contribute more productively to MediaWiki sites like Wikipedia?"
  • Individual Engagement Grants: Applications for IEGs, the new WMF grant scheme, are due by February 15 and can be reviewed on Meta.
  • Steward election: The annual election of stewards, who have complete access on all WMF wikis to deal with transproject vandalism, among other matters, is open for voting until February 27.
  • English Wikipedia

2013-02-11

WebCite proposal; request for adminship reform

Proposals

WebCite proposal
Link rot is a problem for references on Wikipedia. WebCite is currently used to prevent link rot by providing archives of the links. However, WebCite will stop accepting links if its fundraising goals aren't met, so concerned editors started a proposal for the Wikimedia Foundation to take over the WebCite service.
Adjusting "Era style" section in MOS:NUM
An adjustment to the dates and numbers section of the manual of style is under discussion. This adjustment would aim to end the edit warring that occurs due to confusing wording.

Requests for comment

Request for Adminship reform
The request for adminship process is currently under discussion. The discussion is intended to identify the problems with the process and work out solutions.
Article feedback
As the scheduled date for full release of the article feedback tool approaches, users are being asked for their input on concerns regarding the tool.
Meaning of "ambiguous"
Currently, "ambiguous" is defined as "when [a single term] refers to more than one topic covered by Wikipedia articles." When it comes to disambiguation pages, what does the word "ambiguous" really mean?
Stephen King's signature
Signatures of living people are sometimes used on articles. A concern was brought up that Stephen King's signature was being used for forgeries. A discussion regarding the use of signatures was opened.
Recurring items in the news
The In the news section currently has a Recurring items list of pre-approved notable events that would be included in the section. However, some users believe that the list no longer serves its original purpose.
Shared accounts for use by minors
A discussion has been opened regarding the editing of Wikipedia by elementary students under the direction of their teacher. What should be done about the policy of not sharing accounts?
Failing Good Articles
A change of wording regarding what makes an article automatically fail the standards of the good article review is under discussion.


2013-02-11

Wikidata client rollout stutters

January engineering report published

In January:
  • 112 unique committers contributed patchsets of code to MediaWiki (no change on December).
  • The total number of unresolved commits stood at 650 (no change).
  • About 45 shell requests were processed (up 6).
  • Wikimedia Labs now hosts 155 projects (up 7) and has 931 registered users (up 84).

—Adapted from Engineering metrics, Wikimedia blog

The WMF's engineering report for January was published this week on the Wikimedia blog and on the MediaWiki wiki ("friendly" summary version), giving an overview of all Foundation-sponsored technical operations in that month (as well as brief coverage of progress on Wikimedia Deutschland's Wikidata project, phase 1 of which is in the process of going live on the English Wikipedia). Of the five headlines picked out for the report, one (the data centre migration) had already received detailed Signpost coverage. The other four highlight, respectively, updates to the mobile site to allow primitive editing, upload and watchlist functionality; "progress on input methods and our upcoming translation interface"; a restructuring of the way MediaWiki stores co-ordinates; and a testing event to assess how VisualEditor handles non-Latin characters.

In many respects, then, January was a quieter month for Wikimedia Engineering, reflecting in part the uncertainty of the data centre migration (though in the event very little actively broke). Of the Foundation's own core projects (that is to say, excluding the Wikimedia Deutschland-led Wikidata project), only the nascent Echo project showed visible improvement over the month. Flow – the Foundation's latest attempt to fix talk pages, particularly with respect to user-to-user communications – did however enter the design stage, while the VisualEditor project saw another month of refinements and bugfixes. In addition, as previously reported, the Foundation's Editor Engagement Experiments (E3) team launched the Guided Tours extension in January, allowing users to be "walked through" their first edit.

In any case, the report allows for a detailed look at some of the smaller-name projects receiving the Foundation's support. In January, that included work on a tool for Unix/Linux users to allow them to import copies of Wikimedia sites more easily by converting the current XML output to a more data-friendly form. The tool came after WMF developers realised the current process for mirroring a Wikimedia wiki was "painful and cumbersome at best, and unfathomable for the end-user in the worst case". The WMF's involvement with the Outreach Program for Women also began on January 3, with six women new to open-source programming taking on three-month micro-projects; this month, the Foundation also reaffirmed its intention to apply to be part of the Google Summer of Code programme, which targets intermediate level developers of either gender.

In brief

Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for several weeks.

  • English Wikipedia Wikidata deployment stutters: The fourth deployment of the Wikidata client, this time to the English Wikipedia, has proven its most challenging, with two failed attempts to date. Both deployment slots – one on Monday and one on Tuesday – proved insufficient to deal with the complexities of the largest Wikimedia wiki. Wikimedia Deutschland, leading the effort, is expected to try again later in the week, having fixed some of the fatal errors and other problems that plagued the first two attempts, which were aborted after five and ten minutes respectively.
