Wikipedia:Articles for deletion/List of human protein-coding genes 1

Source: Wikipedia, the free encyclopedia.
The following discussion is an archived debate of the proposed deletion of the article below. Please do not modify it. Subsequent comments should be made on the appropriate discussion page (such as the article's talk page or in a deletion review). No further edits should be made to this page.

The result was keep. Sandstein 21:49, 17 January 2020 (UTC)

List of human protein-coding genes 1

List of human protein-coding genes 1
List of human protein-coding genes 2
List of human protein-coding genes 3
List of human protein-coding genes 4

Following discussion at Wikipedia:Bots/Requests for approval/Seppi333Bot, it appears that these huge data tables (#3 has 289,117 bytes of markup, for example) serve no useful purpose for our readers, and possibly fail WP:INDISCRIMINATE. They could, perhaps, be moved to user space if the sole user who maintains and claims to use them wishes to keep them, though the data would be better transferred to Wikidata. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:31, 2 January 2020 (UTC)

Was wondering when someone might try to do this.
These pages are a complete list of all known protein-coding genes in the human exome. This webpage provides a straightforward explanation of what an exome is.
As for whether or not the topic of this list is notable per the WP:GNG: this is a PubMed search. FWIW, whole exome sequencing, which is used to analyze an individual's exome, is a highly utilized technology in the biotech industry [1][2].
Now, as for the list entries' compliance with WP:CSC bullet 1, there are entire fields of study centered around human genes and proteins; even the less well understood genes/proteins in that list can be expected to have articles in the future based upon that alone. For the time being, notability for every single entry in the list is easily determinable by clicking the links to the corresponding entries in the gene database and protein database listed alongside each gene in the list; they cite relevant literature since those organizations provide official names for validated human proteins and protein-coding genes.
This list can't be indiscriminate (by definition of that word) given that the list is complete; the selection wasn't random or haphazard, it was systematic.
In any event, keep per above. Seppi333 (Insert ) 13:06, 2 January 2020 (UTC)
  • Keep but rename. I was wondering what a "genes 1" was, then I realised the list was split, apparently for size reasons, and 1 means "page 1" of the list. Perhaps the convention of splitting these alphabetically (e.g. List of human protein-coding genes A-E) or by chromosome (e.g. List of human protein-coding genes on chromosome 1) would be preferable to arbitrary numbers. --Pontificalibus 13:39, 2 January 2020 (UTC)
    Switching it to a letter-based name actually seems like a good idea. Alternatively, I suppose that could also be indicated in the navbox for the list pages; either way, that sounds like a useful improvement for list navigation. Seppi333 (Insert ) 13:52, 2 January 2020 (UTC)
  • leaning delete for two reasons. First, frankly this looks like a dump of the HGNC database, which I have to doubt is in the public domain; beyond the copyright issue, I have to question the merits of parallel maintenance of what I presume to be a dynamically expanding list. Second, I don't see the utility as it stands of a list that really doesn't do more than duplicate the corresponding category, except as a housekeeping tool to track uncreated articles. Mangoe (talk) 14:40, 2 January 2020 (UTC)
    • Three points:
      1. Data release policy
        No restrictions are imposed on access to, or use of, the data provided by the HGNC, which are provided to enhance knowledge and encourage progress in the scientific community. The HGNC provide these data in good faith, but make no warranty, express or implied, nor assume any legal liability or responsibility for any purpose for which they are used.
        Guidelines on use of data in publications (copyright and licensing)
        It is a condition of our funding from NIH and the Wellcome Trust that the nomenclature and information we provide is freely available to all. Anyone may use the HGNC data, but we request that they reference the "HUGO Gene Nomenclature Committee at the European Bioinformatics Institute" and the website where possible.
        per https://www.genenames.org/about/. It'd be rather absurd for the organization that assigns the official name and symbol to all human genes to claim their nomenclature as their intellectual property; it wouldn't even be feasible because no researchers would submit their research and request a gene symbol for the purpose of having an HGNC-copyrighted symbol assigned.
      2. There is no category for human protein-coding genes. Even if there were, if list-category overlap were a valid WP:DEL-REASON for lists, then thousands of list articles would be subject to deletion given how common it is for a category and list article on the same topic to exist.
      3. The scope of the largest gene category on WP - Category:Genes - encompasses all genes in all organisms (hence the Category:Viral genes and Category:Prokaryote genes subcategories), but the sum of all pages in that category and all of its subcategories is still >3000 less than the number of bluelinks in these lists (~11500) and they'll gain another ~2000 bluelinks in the near future to boot; edit: the largest category is Category:Human genes, which is comparable in size; besides protein-coding genes, it includes pseudogenes, non-coding RNAs, multi-protein complexes, and phenotypes. The nonexistent human protein-coding gene category wouldn't serve as a viable alternative to these lists, because article navigation isn't the point of list articles in general or this one in particular. Seppi333 (Insert ) 17:42, 2 January 2020 (UTC)
  • leaning keep This is something of a special case. I don't think we have a precedent for "tiny set of pages consisting of a massive data mirror, with their own attendant bot to groom them every few days". But taken on their own merit, I'd say these pages are a useful addition to WP. Having a sortable list of coding human genes where each entry (potentially) has a link to its own WP article: that is something the original database cannot provide. This link function on its own makes them worth having. Automatic bot updates also make a qualitative difference to the manually updated, selective lists at Chromosome 1 etc. The duplication aspect to these does rub me the wrong way, but I'd rather not have such housekeeping concerns prevent the addition of good encyclopedic material. --Elmidae (talk · contribs) 04:25, 4 January 2020 (UTC)
Note: This discussion has been included in the list of Medicine-related deletion discussions. Robert McClenon (talk) 18:31, 7 January 2020 (UTC)
Note: This discussion has been included in the list of Lists-related deletion discussions. Robert McClenon (talk) 18:31, 7 January 2020 (UTC)
  • comment I can see the argument for retaining the tables, but the division into four arbitrary chunks is still a problem. Really, the best organization is not to break it up at all, because the sort-by-header feature is otherwise broken. Mangoe (talk) 13:24, 8 January 2020 (UTC)
Combining everything into one article would probably yield something too large for comfort. These are ~280k each at the moment; combining them would more than double the currently largest article size on the project (see Special:LongPages) and come with a range of potential problems - mostly, you are screwed if you are on dial-up... --Elmidae (talk · contribs) 14:00, 8 January 2020 (UTC)
Relisted to generate a more thorough discussion and clearer consensus.
Please add new comments below this notice. Thanks, Sandstein 12:21, 10 January 2020 (UTC)