User:Jim Grisham/RFC Drafts/Notability scoring

Source: Wikipedia, the free encyclopedia.

Wikipedia - Article notability

RFC for addressing notability and deletion concerns

URL (initial): https://en.wikipedia.com/wiki/User:Jim_Grisham/RFC_Drafts/Notability_scoring

Other storage locations:

  • GitHub: https://github.com/jgrisham/wikipedia-rfc-issues (proposed; that repository does not yet exist)

Version: 0.1

Created: 2022-07-03 by Jim Grisham
Modified:

This proposal involves both technical _and_ policy considerations. I imagine that many proposals tackle long-term issues from just one of those dimensions, due to understandable logistical factors (i.e. finding a time when both the editor communities and developers have available resources is probably rare, and hybrid solutions will likely take more time to enact, especially if they are to be integrated into MediaWiki proper, making them unsuitable for resolving more urgent concerns).

Description

‘Notability score’ for each article

  • e.g. ‘stars’, number (1-3 or 1-5), percentage (0-100%, or just 0-100), etc.
   * … or words: never / maybe / basic / standard / full / absolute
       * Blank = possibly brand-new articles, until a UI button (e.g. “Publish”, “Go Live”, or “It’s ready”) is pressed by any editor during a page edit.
           * But we already have the ‘Preview’ button!
               * This would allow refining, over multiple edits, before the article is generally available (if for ~no other reason~ than to minimize the effect of browser crashes / tab reloads, or the fear thereof)
                   * (which are notoriously common on mobile devices - yes, I do many of my edits on an iPad or iPhone, and some potential editors may not even have ready access to desktop/laptop computers… at least not during the free time they may be willing to spend on Wikipedia/Wikimedia projects)
                   * This could also aid in loose collaboration between multiple editors who may decide (consciously or not) to create a new article together.
           * This could reduce the complexity (especially the cognitive load) for newer (or much older) editors of understanding the ‘User:’, ‘Sandbox:’, and ‘Draft:’ namespaces, and of moving articles from one namespace to another
           * Compare and contrast ease and breadth of collaboration, for both novice and ‘power’ users, with other systems, e.g.:
               * Analog systems: chalkboards / whiteboards, physical card catalogs, etc.
               * Traditional e-mail (individual and discussion lists) / Usenet / Lotus Notes / forum software (e.g. phpBB, traditional pre-WWW BBS systems)
               * GitHub, StackExchange, etc.
               * Social media
               * Office365 / Google Docs / etc.
               * Evernote / OneNote / etc.
               * Slack / MS Teams / etc.
           * Compare and contrast ease of non-collaborative authorship
               * Analog systems (handwritten): notecards, notepads, mindmaps, etc.
               * Analog systems (typed): card catalogs, typewritten documents
               * Word processors, ‘office suites’
               * Movable Type / WordPress
       * Never = user page / sandbox / ‘talk’ page
           * Always excluded from Special:Random
       * Maybe = ‘draft’-style articles
           * Might be visible / searchable based on dynamic notability metrics
           * Always excluded from Special:Random
       * Basic = ‘stubs’, auto-patrolled users, articles under a minimum age
       * Standard = well-established articles with (near?) unanimous consensus of notability 
           * These pages are excluded from many dynamic notability metrics
           * Would be suitable (at least in terms of notability) to be included in a hypothetical CD-ROM or downloadable ‘offline’ version of Wikipedia
       * Full = well-established, nearly ‘iron-clad’ notability
           * ‘Override’ - these pages are excluded from ~nearly~ all (or all for now) dynamic notability metrics
           * Would be suitable (at least in terms of notability) to be included in a hypothetical printed version of Wikipedia
           * e.g. United Kingdom, Money
           * includes, for example, all topics that were notable enough to appear in catalogued paper legacy encyclopedias (e.g. List of articles from the 1903 Encyclopedia Britannica)
           * articles from a community-maintained allowlist?
       * Gold = special pages
           * A ‘master override’ - these pages will never be affected by dynamic notability metrics
           * e.g. en:Wikipedia
  • New articles start at lowest score*
   * Existing NPP process would ‘promote’ a new article to ‘basic’
       * * New articles should probably ~provisionally~ be shown for a period of time to allow for visibility of ‘current-events’ articles, such as an article written 
       * An article written by an editor with the ‘auto-patrol’ permission, or an article where an editor with that permission makes some (e.g. minimum of 1000 characters of non-template visible text 
   * Some sort of (long-term, to minimize gamesmanship) voting process, à la StackExchange, could also promote or demote articles within the ‘intermediate’ notability levels
   * Notability score could also be weighted by various long-term usage analytics; e.g.:
           * number of article views
           * number of unique editors who edit a particular article at least n times on at least m different days / months
           * number of incoming wikilinks
           * presence of active companion articles on other language sites
           * incoming links from a selection of trusted / allowlisted (whitelisted) external sites, such as government domains
           * site search requests over time
               * e.g. if a significant number of searches for a particular term such as “AMC Gremlin” or “Star Wars Episode XV” appear over an extended period (to reduce abuse / bot potential), then that term would be marked for automatic ‘Basic’ notability status if and when it is created (a perfunctory NPP check could still be required, perhaps by a larger group, to guard against arbitrary abuse).
               * This type of anonymous analytics data could also help editors and WikiProjects decide which topics to prioritize for new article creation.
  • Articles at less than minimum notability level could be excluded from search engine indexing, and would only appear last (or if no other results) on local site search
   * That effectively ‘mutes’ or ‘shadow bans’ low-notability pages
   * Draft namespace articles already do this on redirect pages, if a ‘promising’ tag is present on that redirect
   * Link color for wikilinks targeting the _lowest_ level could be different, too 
       * (e.g. orange or black)
   * Articles with the …?
   * The same sort of search ranking can be applied to article ‘quality ratings’ or other future metrics
  • Per-user settings (with an appropriate default) could hide all articles below a selectable notability level, much like how the ‘age filter’ functions in streaming video platforms
  • Benefits
   * More granularity and transparency, for both editors and readers
   * auto-expiring ‘Draft namespace’ could be retired
       * Existing ‘notability’ / deletion debates may become much less contentious and resource-intensive
       * More editor time would then be available for other projects without risking site quality
       * Editor retention and acquisition may increase
       * Debates could be closed faster by requiring a ‘strong’ or ‘nearly unanimous’ level of consensus
           * They could be much longer, too, allowing infrequent editors or even just regular readers to weigh in over months or years
               * most of the current needs for urgency (including editor workload) would likely disappear
                   * this would result in many more opportunities for articles to be improved, or for references to be found, or even for readers to decide to create accounts and become editors
       * Competitive review
           * Social media sites are among the most permissive, only restricting or removing obvious bad behavior (copyright violations, gross misinformation, calls to violence, etc.)
           * Some sites, like StackOverflow, do sometimes close questions as off-topic
               * Those closed questions are blocked from further answers (and perhaps demoted in search results?), but any user can see them, and edit/improve them to cause re-opening.
               * All content (and history) remains
   * Deletion would only be required for ‘speedy delete’ topics, such as
       * obviously fake or joke topics, obviously trivial topics, or obvious abuse; e.g.
           * List of types of cheese forming the mantle and core of the Moon
           * mass creation of an article for every stoplight in a particular town
   * Articles deemed off-topic, egregiously non-notable, etc. that are not ‘speedily deleted’ could be temporarily locked from new edits (e.g. quality and notability scores cleared and the page auto-protected for 30 days), but the article text (and, importantly, the full edit history, which is useful not just for the revisions but also for the ‘edit summaries’) would remain accessible to those who still had a link to the page.
   * An issue with deletion that seems to get less attention is that (usually?) the corresponding ‘Talk’ page, including its edit history and any archive subpages, is also deleted.
       * These ‘Talk’ pages could contain useful references or other information, and also may provide more context to why the article was created and deleted than might exist in a publicly viewable ‘AfD’ discussion archive.
  • Drawbacks
   * Requires up-front effort to design policy and technical framework
   * Potentially less ‘manual’ control
  • Why a dynamic system
   * May be seen as less ‘arbitrary’
   * Reduce admin time spent on some bureaucratic matters and policy disputes
   * Limits opportunities for bad-faith admin action - ‘level playing field’
   * Easier to tune than fixed policy,
       * easier to address systemic issues that occur
           * e.g. combatting new methods of policy abuse or circumvention
   * Compare to:
       * SpamAssassin scores and e-mail filters
       * Google Search vs. the original Yahoo Directory
       * A web site built on a content management system (CMS) vs. a Gopher site
       * A relational database vs. a simple spreadsheet
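The level scheme and the analytics weighting described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a proposed implementation: the numeric values of the levels, the metric names, the weights, and the promotion thresholds are all hypothetical, and each metric is assumed to be pre-normalised to the range 0.0 to 1.0. It encodes three points from the outline: ‘Full’ and ‘Gold’ pages ignore dynamic metrics (treating ‘Never’ pages the same way is an added assumption), ‘Never’ and ‘Maybe’ pages are excluded from Special:Random, and pages below a minimum level sort last in local site search.

```python
from dataclasses import dataclass
from enum import IntEnum


class Notability(IntEnum):
    """Proposed notability levels; only the ordering of the values matters."""
    NEVER = 0     # user pages, sandboxes, talk pages
    MAYBE = 1     # draft-style articles
    BASIC = 2     # stubs, auto-patrolled users, articles under a minimum age
    STANDARD = 3  # well-established, near-unanimous notability
    FULL = 4      # 'iron-clad' notability; override for dynamic metrics
    GOLD = 5      # special pages; master override


@dataclass
class UsageStats:
    """Hypothetical long-term analytics, each normalised to 0.0-1.0."""
    views: float
    repeat_editors: float
    incoming_wikilinks: float
    interlanguage_articles: float
    trusted_external_links: float
    search_demand: float


# Hypothetical weights (summing to 1.0); real values would be a policy decision.
WEIGHTS = {
    "views": 0.15,
    "repeat_editors": 0.25,
    "incoming_wikilinks": 0.20,
    "interlanguage_articles": 0.15,
    "trusted_external_links": 0.15,
    "search_demand": 0.10,
}

# Levels that dynamic metrics never change (including NEVER is an assumption).
FIXED_LEVELS = {Notability.NEVER, Notability.FULL, Notability.GOLD}


def dynamic_score(stats: UsageStats) -> float:
    """Weighted sum of the normalised metrics."""
    return sum(weight * getattr(stats, name) for name, weight in WEIGHTS.items())


def effective_level(base: Notability, stats: UsageStats) -> Notability:
    """Promote an intermediate page by at most one step (hypothetical thresholds)."""
    if base in FIXED_LEVELS:
        return base
    score = dynamic_score(stats)
    if base == Notability.MAYBE and score >= 0.5:
        return Notability.BASIC
    if base == Notability.BASIC and score >= 0.8:
        return Notability.STANDARD
    return base  # STANDARD pages are left alone in this sketch


def random_eligible(level: Notability) -> bool:
    """'Never' and 'Maybe' pages are always excluded from Special:Random."""
    return level >= Notability.BASIC


def order_search_results(results, minimum=Notability.BASIC):
    """Move (title, level) pairs below `minimum` to the end; sorted() is
    stable, so relevance order within each group is preserved."""
    return sorted(results, key=lambda item: item[1] < minimum)
```

For example, `order_search_results([("Draft:Foo", Notability.MAYBE), ("Money", Notability.FULL)])` puts "Money" first. Demotion, and the voting process mentioned above, are deliberately left out of the sketch.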

Further Details

This kind of notability ‘rating’ would *not* replace article quality metrics.

  • Article quality is much more likely to


  • The technical methods discussed above could control article visibility in a highly parametric way:
   * Taking into account: (ed: illustrate concept with a ‘spider’ graph??)
       * Notability score
       * Quality score
           * If quality score is up-to-date and above a certain level, ‘notability score’ can likely be ignored for those articles (at least until everything else is transitioned and well-proven both technically and policy-wise)
           * Percentage of edits 
       * Completeness / stability score
           * Frequency of edit reversions vs overall edit frequency
           * Talk page (and Talk archive subpage) existence / number of contributors / 
       * Popularity score
           * Already exists in terms of ‘number of recent views’
           * Could be combined with in-bound internal, interwiki, or external links as mentioned above, or could be a separate parameter
               * (i.e. similar to the original Google PageRank system… is that out of patent yet? If not, can the Foundation get a free and perpetual license to use something similar in MediaWiki and any derivative works, perhaps using the established contacts used to set up the 2022-era ‘Wikimedia Enterprise’ program)
       * Category or WikiProject inclusion
       * ‘Locked’ articles
       * User preferences
           * i.e. only show articles with an ‘x’ minimum notability rating and a ‘y’ quality rating, or edits with a minimum age of n, a minimum number of edits of ‘m’, or a minimum number of editors of ‘o’
           * e.g. Jane only wants to see articles that are both High Quality ~and~ High Notability
       * Automatic (and private) determinations using an offshoot of existing ‘CheckUser’ technology
   * Once the framework exists for more granular control of article visibility (crawling, search visibility, and, e.g. wikilink color):
       * additions or changes to the above parameters (and associated ‘policy’-based ‘business rules’) can become trivial from a technical standpoint, and therefore nimble.
           * For example, a contentious policy change can safely be trialed for a short period of time.
           * This applies to both granular, number-line-style parameters (e.g. the proposed notability rating) and boolean parameters (e.g. “article locked”, “search-visibility-override”, etc.)
       * For an analogue to such a parametric rating / decision system, think of the operation and logic of e-mail spam-rating systems such as SpamAssassin, which adds up the scores of the individual rules a message matches and compares the total against a threshold
   * There’s likely published ‘Information Science’ research on areas applicable to this proposal and its potential implications and implementation, possibly in other languages (ask other-language Wikipedia staff / editors for input & leads on finding non-English-language research)
       * Library Science and CS schools may be willing to provide input, help brainstorm information architecture, policy (including archiving / retention), and technical ideas and model implementations, or review our ideas and concerns
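The parametric approach above, and the SpamAssassin comparison, can be illustrated with an additive rule table. In this minimal sketch the article fields, rule names, scores, and visibility threshold are all hypothetical, and treating a ‘locked’ article as hidden from search is an assumed policy choice rather than part of the proposal. Boolean parameters are checked first as overrides; every other parameter contributes points toward a threshold, so trialing a policy change can amount to editing one number or one rule.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Article:
    """Hypothetical per-article parameters drawn from the list above."""
    title: str
    notability: int      # 0 (Never) to 5 (Gold), following the proposed levels
    quality: int         # hypothetical 0-5 quality score
    revert_ratio: float  # reverted edits / all edits (stability)
    recent_views: int    # popularity
    locked: bool = False
    search_visibility_override: bool = False


@dataclass
class Rule:
    name: str
    applies: Callable[[Article], bool]
    score: float


# Hypothetical rules in the style of SpamAssassin: every matching rule adds
# (or, if negative, subtracts) its score.
RULES: List[Rule] = [
    Rule("HIGH_QUALITY", lambda a: a.quality >= 4, 3.0),
    Rule("STANDARD_NOTABILITY", lambda a: a.notability >= 3, 2.0),
    Rule("BASIC_NOTABILITY", lambda a: a.notability >= 2, 1.0),
    Rule("POPULAR", lambda a: a.recent_views >= 1000, 1.0),
    Rule("UNSTABLE", lambda a: a.revert_ratio > 0.3, -1.5),
]

VISIBILITY_THRESHOLD = 2.0  # hypothetical value


def visibility_score(article: Article) -> float:
    """Sum the scores of all matching rules."""
    return sum(rule.score for rule in RULES if rule.applies(article))


def is_search_visible(article: Article) -> bool:
    # Boolean parameters act as overrides before any scoring.
    if article.search_visibility_override:
        return True
    if article.locked:  # assumed policy choice, not from the proposal
        return False
    return visibility_score(article) >= VISIBILITY_THRESHOLD


@dataclass
class UserPrefs:
    """Per-user minimums, e.g. Jane's 'High Quality and High Notability'."""
    min_notability: int = 0
    min_quality: int = 0


def shown_to(user: UserPrefs, article: Article) -> bool:
    return (article.notability >= user.min_notability
            and article.quality >= user.min_quality)
```

With these numbers, an article with notability 4, quality 5, few reverts, and 5,000 recent views scores 3 + 2 + 1 + 1 = 7 and is visible, while a draft with notability 1, quality 1, and a 50% revert ratio scores -1.5 and is not. `UserPrefs(min_notability=4, min_quality=4)` models Jane's filter.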

Additionally...

It doesn’t all have to be done at once!

  • e.g. start with a backend and UI for notability ratings
   * Not actually used, nor any policy changed, during a trial period
  • Enable notability filtering for new articles
   * Roll out in stages to existing articles, before a ‘real-time’ rating system is fully developed & deployed
       * Automatic assignments for the transition:
           * All articles that have already cleared NPP process (and older articles that predate it?) can be auto-assigned a notability rating of ‘Basic’
               * … or perhaps be assigned a ‘virtual’ notability rating such as ‘Basic-legacy’
           * Articles with a top quality level could be assigned a (real or ‘virtual’) notability rating such as ‘Full-legacy’
           * Articles with a minimum number of edits and/or editors (and length)
           * …
  • Do what we can, but ~also~ use the discussion as a ‘vision board’ for Wikipedia2025, Wikipedia2040, etc.
  • Reach out creatively to possible stakeholders
   * People who have written past improvement proposals
       * [ed: e.g. the one I saw in late June 2022 about animations and other ‘rich content’]
   * Put article in monthly newsletter (name?)
       * Also in the monthly admin newsletter (Wikipedia:Administrators' newsletter)
       * Tech News: Special:MyLanguage/Tech/News/
   * Other communities
       * various StackExchange sites
       * Quora
       * Twitter
       * non-English-language announcements
   * Archivists; e.g. National libraries, Internet Archive, etc.
   * Sitewide banner (in multiple languages)
   * Mass e-mail to inactive, non-banned editors who:
       * have a non-trivial edit history (> 25 lifetime edits?)
       * have ~ever~ commented on a ‘Talk’ page (including their own, perhaps with more than a threshold number of self-edits to it), contributed to a ‘WP:’ namespace article, have other non-article activity, etc.
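The automatic transition assignments listed earlier under the staged roll-out could be expressed as a simple mapping. This sketch is hypothetical throughout: the field names are invented, ‘top quality level’ is reduced to a single flag, whether articles that predate NPP count is left as a parameter (the outline leaves it as an open question), and the edit-count / editor-count rule is omitted because its target rating is not yet specified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LegacyArticle:
    """Hypothetical snapshot of an existing article at transition time."""
    title: str
    cleared_npp: bool   # has already cleared the NPP process
    predates_npp: bool  # created before NPP existed
    top_quality: bool   # holds the top existing quality level


def legacy_rating(article: LegacyArticle, include_pre_npp: bool = True) -> Optional[str]:
    """Return a 'virtual' transition rating, or None to leave the article
    for later (manual or automatic) assessment."""
    if article.top_quality:
        return "Full-legacy"
    if article.cleared_npp or (include_pre_npp and article.predates_npp):
        return "Basic-legacy"
    return None
```

Returning ‘virtual’ labels such as ‘Basic-legacy’ rather than plain ‘Basic’ keeps automatically assigned ratings distinguishable from ratings assigned under the new process, as the outline suggests.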

Process

Keep a positive attitude

  • Improv comedy ‘Yes, and…’ philosophy
   * Even if a particular idea doesn’t appear to be feasible ~now~, from either a policy or technical standpoint
       * Make a note of why that is, but then ask: “But for” that valid concern, how might this idea be of use?
       * Doing that with each objection shows respect, builds up a ‘FAQ’-type history for latecomers, and, perhaps most importantly, allows revisiting the idea far in the future if the conditions underlying the objection no longer exist.
       * Different people’s brains work in all sorts of different ways… sometimes at the end of a ‘ridiculous’ path something of great value, perhaps obvious in retrospect, is found.

Pre-publication Notes

Editor’s notes (remove before publishing final version)

  • How to best present this?
   * Concerns
       * Understandability
           * Executive Summary?
           * Genesis
               * Why did I bother writing this?
                   * What problems do I see that need attention?
               * What disciplines / areas does this affect?
               * Are there any proposed solutions?
               * Am I duplicating anyone else’s past or current work?
           * Complexity of proposal
               * Release in stages?
               * Display in multiple sections or pages?
           * Be mindful of research on attention span
       * Change management for the actual proposal documents
   * Wikitext
       * Enhanced with collapsible sections
           * Use HTML or MediaWiki templates for this?
       * Copy at my personal site and/or wiki?
   * Threaded discussion forum?
       * a BB
       * something like Phab / GitHub
       * Twitter
   * PDF
       * full proposal or just a summary
   * Outline
       * Interactive?
       * Static?
           * i.e. HTML or PDF
   * Multimedia
       * Static or animated UI mockups / demo
           * (e.g. Sun’s “StarFire”)
       * Charts, graphics, mind-maps, DB schema charts, process flow diagrams, etc.
       * Spreadsheets, tables, or graphs displaying things like cost over time
  • Keep scope of initial proposal tight
   * Some ideas might be better deferred for
       * Future Wiki use
       * Use to enhance existing, or create new, non-Wiki computing / knowledge management systems
   * Keep in mind attention span of both individuals ~and~ groups
   * Provide a basic, fully worked sample proposal, to aid in the explanation
  • Clearly define concerns
   * Axis 1: Separate concerns / solutions 
       * At least in a non-published outline, as a check on logical and rhetorical consistency
   * Axis 2: Separate policy and technology 
  • Estimate costs (both initial and ongoing maintenance)
   * Technical time and money
   * Policy time
   * Implementation / policing
   * Communication
   * Opportunity costs
       * … of inaction
       * … of action
   * Current policies requiring editor or admin effort:
       * Wikipedia:Reviewing pending changes
           * Wikipedia:Requests for permissions/Pending changes reviewer
       * Deletion-related
           * Wikipedia:Requests for undeletion
           * Wikipedia:Deletion review
           * Wikipedia:Proposed deletion
           * Wikipedia:Criteria for speedy deletion
           * Wikipedia:Revision deletion
           * Wikipedia:Deletion process
           * Wikipedia:Guide to deletion
           * Wikipedia:Deletion policy
           * Category:Wikipedia deletion guidelines
       * New-article-related
           * Wikipedia:Articles for creation
               * {{AFC submission}}
               * Wikipedia:WikiProject Articles for creation
           * Wikipedia:Drafts
           * Help:Userspace draft
           * Wikipedia:Article Wizard
       * Temporary or special content pages for edits
           * Wikipedia:Workpages
           * Wikipedia:User pages
           * Subpages
               * Help:Link#Subpage links
               * WP:Page name#Subpagename and basepagename
               * Finding subpages
                   * Special:PrefixIndex “All pages with prefix” report
                   * Special:PrefixIndex/fullpagename/ using the search box
                   * Add a ‘Subpages’ link to the user’s ‘Tools’
                   * Active list:
                       * Use the {{list subpages}} template
                       * See also:
                           * {{subpages}}
                           * {{search link}}
           * Special:SpecialPages
           * Special:Search
  • Can I find partners to help with this before publicly announcing the RFC?
   * Share workload
   * Explore different viewpoints - will result in a better product
  • Tools that can help the authoring process
   * Automated / assisted tools and workflows
       * Converter / workflow between wikitext markup and:
           * setext / Markdown
           * HTML / XML
           * OneNote
           * OPML
   * Automatically assign formats
       * wikilinks for words/phrases that have matching articles?
       * hyperlinks or ref/cite tags for URLs
       * formatting for ordered lists, etc.
   * Other
       * HTTP proxy to allow pasting wikilinks into browser bar (i.e. like how 12ft.io functions)
           * …or a bookmarklet to do so based on the device clipboard or by manually pasting into a dialog box field
   * Change management / version control
       * MediaWiki history
       * Git / GitHub
   * References with no other home:
       * Wikipedia:Shortcut index
   * Random bookmarks of pages in progress:
       * User talk:AmandaNP#Tech News: 2021-38

Footer

Notes

References

See Also

Scratchpad

(in-article temporary sandbox)

Links to review for inspiration

[1]

  1. "Diversity of perspectives | Birdwatch Guide". twitter.github.io. Retrieved 2022-07-04.