User:ProteinBoxBot/Specs

Source: Wikipedia, the free encyclopedia.

ProteinBoxBot specs

  • Content has already been assembled as part of a non-WP project. Data will be provided to ProteinBoxBot as an XML or CSV file. Images will be provided in a zip file or local directory.
  • For each mammalian gene with significant available annotation, a new gene page will be created that corresponds to the HUGO-approved symbol.
    • If a page with that name already exists:
      • If the page contains a Protein infobox or GNF_Protein_box, the bot will overwrite the existing infobox but leave the surrounding content intact.
      • If not, the gene will be flagged for manual review. The bot will log the entry and proceed to the next gene.
    • An image (when available from RCSB under public-domain terms) will be uploaded.
    • A protein infobox will be created and populated with relevant data. (Manually created example: ITK (gene))
    • A redirect will be created from the full gene name. (For example: IL2-inducible T-cell kinase)
      • If a page with the full gene name already exists, gene will be flagged for manual review
    • A free-text summary from the NCBI page will be included, with wikilinks added where appropriate.
    • A references section will be created based on gene2pubmed and/or GeneRIFs.
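The per-gene decision rules above can be sketched as a small pure function. This is only an illustration under stated assumptions: the function name `plan_gene_page`, the action labels, and the regular expression used to detect an existing infobox are all hypothetical and not part of the spec, and no wiki API is called.

```python
import re
from typing import List, Optional

# Assumption: an existing infobox appears in the wikitext as a template call
# named "protein" or "GNF_Protein_box". The real template names may differ.
INFOBOX_RE = re.compile(
    r"\{\{\s*(?:protein|GNF[_ ]Protein[_ ]box)\s*[|}]", re.IGNORECASE
)


def plan_gene_page(symbol_page_text: Optional[str],
                   full_name_page_exists: bool) -> List[str]:
    """Return the list of actions for one gene.

    symbol_page_text is the current wikitext of the page named after the
    HUGO symbol, or None if no such page exists yet.
    """
    if symbol_page_text is None:
        actions = ["create_page"]
    elif INFOBOX_RE.search(symbol_page_text):
        # Only the infobox is replaced; surrounding content is kept.
        actions = ["overwrite_infobox"]
    else:
        # Existing page without an infobox: log it and move to the next gene.
        return ["flag_for_review"]

    # Redirect from the full gene name, unless that title is already taken.
    if full_name_page_exists:
        actions.append("flag_for_review")
    else:
        actions.append("create_redirect")
    return actions
```

For example, `plan_gene_page(None, False)` yields a page creation followed by a redirect creation, while a page that exists without an infobox yields only a manual-review flag.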


  • In the trial phase, only 10 gene pages will be created. If more data is needed to determine how much information makes a useful stub, a secondary trial period for ~100 genes will be proposed.
  • Bot will check User_talk:ProteinBoxBot and stop if any new messages are present.
  • Bot will cap edits at 10 per minute.
  • New protein infoboxes will contain a notice that changes can/will be overwritten on further bot updates.
  • If bot encounters an agreed-upon flag (e.g., "<!-- NO_BOT_EDITS -->"), the page will be logged and skipped.
  • Bot will maintain log of all edits and edit times.
  • All modified pages will be added to ProteinBoxBot's watchlist to track further page edits.
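The edit cap and the skip flag listed above could be enforced along the following lines. This is a minimal sketch, not the bot's actual code: the class `EditThrottle`, the helper `has_no_bot_flag`, and the sliding-window approach are assumptions chosen for illustration, and the flag string is the example given in the spec.

```python
import time
from collections import deque

# Example flag from the spec; the agreed-upon flag may differ.
NO_BOT_FLAG = "<!-- NO_BOT_EDITS -->"


def has_no_bot_flag(page_text: str) -> bool:
    """True if the page carries the opt-out flag and should be skipped."""
    return NO_BOT_FLAG in page_text


class EditThrottle:
    """Sliding-window limiter: at most max_edits edits per window seconds."""

    def __init__(self, max_edits=10, window=60.0, clock=time.monotonic):
        self.max_edits = max_edits
        self.window = window
        self.clock = clock          # injectable so the limiter can be tested
        self.stamps = deque()       # times of recent edits; doubles as a log

    def seconds_until_allowed(self) -> float:
        """How long to wait before the next edit is permitted (0.0 if none)."""
        now = self.clock()
        # Drop edits that have left the window.
        while self.stamps and now - self.stamps[0] >= self.window:
            self.stamps.popleft()
        if len(self.stamps) < self.max_edits:
            return 0.0
        return self.window - (now - self.stamps[0])

    def record_edit(self) -> None:
        """Call once after each successful edit."""
        self.stamps.append(self.clock())
```

In a main loop, the bot would sleep for `seconds_until_allowed()` before each edit, skip any page for which `has_no_bot_flag` returns true, and call `record_edit()` after saving.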