Wikipedia:Link rot/URL change requests: Difference between revisions

Source: Wikipedia, the free encyclopedia.

Revision as of 04:59, 11 November 2022

This page is for requesting modifications to URLs, such as marking dead or changing to a new domain. Some bots are designed to fix link rot; they can be notified here. These include InternetArchiveBot and WaybackMedic. This page can be monitored by bot operators from other language wikis since URL changes are universally applicable.

www.iihf.com/competition

The "www.iihf.com/competition" URL is dead. Many of these references can no longer be recovered. However, there are two exceptions:

  • www.iihf.com/competition/385/ (←space or end of URL)
  • www.iihf.com/competition/385/statistics.html

These can be recovered under a new name:

  • stats.iihf.com/Hydra/385/ (without statistics.html)

Put any number from 1 to 999 there. I hope this can be done. Thanks, Maiō T. (talk) 17:15, 18 June 2022 (UTC)[reply]

For the URL www.iihf.com/competition/609/, it can be converted to either of these:

  • https://stats.iihf.com/Hydra/609/
  • http://webarchive.iihf.com/competition/609/

I think webarchive is the best first choice since it has a header and looks like the intended page. It has to be verified, though, as sometimes it works for one, both, or neither. -- GreenC 17:55, 16 July 2022 (UTC)[reply]


@GreenC: Thank you, I almost forgot about this request. I only remembered it today when I needed to write a new one. Sorry about that. I checked here regularly for the first month to see if you had replied. Were there any problems with it? Maiō T. (talk) 15:46, 11 September 2022 (UTC)[reply]

Sometimes it takes me a while to get to the request, as it takes time to focus on it. No problems that I recall. -- GreenC 16:17, 11 September 2022 (UTC)[reply]
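
The conversion requested in this thread could be sketched roughly as follows. This is only an illustration, not the bot's actual code: the function name `iihf_candidates` is hypothetical, and the ordering (webarchive first) simply follows the preference stated above. Each candidate would still need to be checked live, since either, both, or neither may work.

```python
import re

# Matches only the two recoverable shapes: ".../competition/N/" and
# ".../competition/N/statistics.html", with N from 1 to 999.
_IIHF = re.compile(
    r"^(?:https?://)?www\.iihf\.com/competition/([1-9]\d{0,2})/(?:statistics\.html)?$"
)

def iihf_candidates(url):
    """Return candidate replacement URLs, or [] if the URL is not recoverable."""
    m = _IIHF.match(url.strip())
    if not m:
        return []
    n = m.group(1)
    return [
        f"http://webarchive.iihf.com/competition/{n}/",  # suggested first choice
        f"https://stats.iihf.com/Hydra/{n}/",            # drops statistics.html
    ]
```

For example, `iihf_candidates("www.iihf.com/competition/385/statistics.html")` yields the webarchive and Hydra forms for competition 385, while any other sub-page returns an empty list.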

wizards.com/Magic/Magazine

There seem to be around 200 dead links to http://www.wizards.com/Magic/Magazine/Article.aspx?x=... It appears that the relevant page has moved several times, most recently to https://magic.wizards.com/go/magazine/article.aspx?x=<the same thing>, which redirects to a URL in an entirely different format. (Although that link itself sometimes soft 404s)

There are probably a lot of other dead wizards.com links lurking, but this one has an obvious replacement pattern that I found. * Pppery * it has begun... 15:34, 22 August 2022 (UTC)[reply]

Looking at it more closely: The entire http://www.wizards.com/Magic path appears to be dead, as well as http://www.wizards.com/default.asp . Sometimes the same or a similar pattern replacement will work; other times the content appears to have been entirely removed from the site. * Pppery * it has begun... 15:51, 22 August 2022 (UTC)[reply]

Done for the soft-404 links, which was pretty complicated. They are all now archived or marked dead. The site changes so often, and has pages with content drift due to the sport scores of Magic: The Gathering, that I think it's best not to mess with moving URLs for now. I can go back and do it on another pass if it's thought urgent. The domain exists in about 1,000 articles. The run added 2,340 archive URLs and updated 1,548 URLs in the IABot database. I am adding a 'periodic required' note, since the soft-404 problem will continue each time they change the site. @Pppery: -- GreenC 17:08, 27 August 2022 (UTC)[reply]

Works for me. * Pppery * it has begun... 17:14, 27 August 2022 (UTC)[reply]
Periodic required.

lexico.com, oxforddictionaries.com

Lexico.com, formerly at oxforddictionaries.com (and askoxford.com before that), has been redirected to Dictionary.com, which does not provide the content that was available on Lexico, so citations need to be replaced with archives. Not sure what to do with {{Cite Lexico}}, but I assume subst'ing the transclusions and adding archive links is the way to go. Nardog (talk) 02:16, 26 August 2022 (UTC)[reply]

@Nardog: For the template see edit Special:Diff/1055780831/1106756789 .. it's a hack solution since it doesn't deal with missing archives or the ability to control timestamps, but it's better than nothing. Ideally every instance would be converted to a {{cite dictionary}}, a standardized format that regular tools can maintain without custom coding. For the rest, I think all three should be processed as dead. If no archive exists, add a {{dead link}}. -- GreenC 05:52, 30 August 2022 (UTC)[reply]
@GreenC: So... can you help? Nardog (talk) 05:45, 1 September 2022 (UTC)[reply]
Special:Diff/1098865079/1107817768 - Not done yet. -- GreenC 06:01, 1 September 2022 (UTC)[reply]
As for {{cite Lexico}}, I think it should be expanded to a CS1 template ({{cite web}} or {{cite dictionary}}) with the URL that would have been used as of the date provided in |access-date= or, if absent, when the template was inserted. Nardog (talk) 05:49, 1 September 2022 (UTC)[reply]

Done

@Nardog: If you see anything else, let me know. Thanks. -- GreenC 17:30, 2 September 2022 (UTC)[reply]

Thank you!
  • The entry for purple patch was in fact archived at [1], with an underscore instead of the plus sign (which IIRC simply redirected to the canonical URL). I assume most (all?) phrases are affected by this.
  • Were previous URL schemes (https://www.oxforddictionaries.com/definition/english/..., https://www.oxforddictionaries.com/definition/american_english/..., https://en.oxforddictionaries.com/definition/..., https://en.oxforddictionaries.com/definition/us/... ) not used for old transclusions (or transclusions with old access dates)? I believe they should be, at least in |url=, because otherwise a citation would say the source was retrieved before it existed.
  • developer.oxforddictionaries.com and premium.oxforddictionaries.com are still live, so (though these specific subdomains are rarely cited) you might want to exclude them or, perhaps preferably, stop assuming all subdomains are dead.
Nardog (talk) 21:55, 2 September 2022 (UTC)[reply]
  • There are 97 URLs with "+" and marked {{dead link}}. 35 are saved by converting to "_". Example
  • I'm not sure how to address the transclusions.
  • There are 4 URLs that need to be made live. You fixed 3 in Lexico, plus one more in Pendekar.
-- GreenC 02:53, 3 September 2022 (UTC)[reply]
Actually the one in Pendekar is a soft-404. -- GreenC 16:23, 3 September 2022 (UTC)[reply]
Oh, so you simply ignored |access(-)date=? That seems... inadvisable. Nardog (talk) 11:31, 4 September 2022 (UTC)[reply]
No, I didn't ignore it. However, there is no guarantee the Wayback retrieved one close to it. The problem is I don't know what you're talking about with "transclusion"; I honestly cannot follow what you're saying above at all. -- GreenC 15:57, 4 September 2022 (UTC)[reply]
Here's an example from Zymogen. It's a two-step process: first it converts {{OxfordDictionaries.com|access-date=2016-01-24|zymogen}} to {{Cite dictionary |url=https://web.archive.org/web/20160124000000/http://www.lexico.com/definition/zymogen |title=zymogen |dictionary=[[Lexico|Oxford Dictionaries]] UK English Dictionary |publisher=[[Oxford University Press]]}} .. the snapshot is the access-date with 6 trailing 0's. If you try that URL, the Wayback redirects to https://web.archive.org/web/20200322182724/https://www.lexico.com/definition/zymogen .. that's the final URL. This is typical; most of them ended up at 2020 and 2021. -- GreenC 16:14, 4 September 2022 (UTC)[reply]
An access date and an archive date are two separate things. The former is when the information was verified, the latter is when the archive was made. The resulting {{Cite dictionary}} should retain |access-date=2016-01-24 and say the dictionary is "Oxford Dictionaries", not "Lexico" (which didn't exist in 2016), much like the version before the bot edited it, and |url= should be set to https://www.oxforddictionaries.com/definition/english/.... Nardog (talk) 02:10, 5 September 2022 (UTC)[reply]
Alright, three issues:
  • |access-date=: I didn't transfer because I don't think it's important once the link is dead and archived. Some might disagree but I think it's a tradeoff with clutter and ease of reading comprehension.
  • |url=: The {{OxfordDictionaries.com}} was producing lexico.com and that's what the conversion did 1:1.
  • |dictionary=Lexico. I wrote code to use |dictionary=Oxford Dictionaries if the access-date is older than 2019-06-11, mirroring how the template worked, as you see in the above intermediary step. There was another bit of code, after the archive URL was finally settled on, that looked at the snapshot date of the archive, and if it was later than 2019-06-11 the name was changed to Lexico. As it should be, since both the URL and the archive URL are Lexico, not Oxford.
In theory I could restore the old template as it previously existed - according to logs there are 432 with a pre-2019-06-11 access-date - then re-run the conversion with the new assumptions programmed in, i.e. use the correct |access-date=, |url= and |dictionary=. I'll need to understand when to use the two variations of oxforddictionaries.com you listed above (i.e. www vs. en). And which ones to do this for: only those with an old access-date, or anything that uses a template name of "Oxford Dictionaries": {{OxfordDictionaries.com}}, {{Oxford Dictionaries}}, {{Cite Oxford Dictionaries}}, or anything with an old access-date and the Oxford name.. there are a number of permutations and assumptions here. -- GreenC 04:19, 5 September 2022 (UTC)[reply]
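
To make the mechanics discussed in this thread concrete, here is a minimal sketch of three of the steps: the "+" to "_" variant for phrase entries, the Wayback request URL built from the access date plus six zeros, and the choice of dictionary name around 2019-06-11. All function names are hypothetical, the `purple+patch` URL shape used in the usage note is an assumed example, and treating the cutoff date itself as "Lexico" is an assumption; none of this is the bot's actual code.

```python
from datetime import date

LEXICO_CUTOFF = date(2019, 6, 11)  # cutoff date quoted in the discussion

def underscore_variant(url):
    """Swap '+' for '_' in the last path segment only (phrase entries)."""
    head, sep, tail = url.rpartition("/")
    return head + sep + tail.replace("+", "_")

def wayback_request_url(original_url, access_date):
    """Build a Wayback URL whose timestamp is the access date plus six zeros."""
    stamp = access_date.strftime("%Y%m%d") + "000000"
    return f"https://web.archive.org/web/{stamp}/{original_url}"

def dictionary_name(when):
    """Pick the display name by comparing a date against the cutoff.

    Assumption: a date exactly on the cutoff counts as Lexico.
    """
    return "Oxford Dictionaries" if when < LEXICO_CUTOFF else "Lexico"
```

With the Zymogen example, `wayback_request_url("http://www.lexico.com/definition/zymogen", date(2016, 1, 24))` reproduces the intermediate URL shown above; the Wayback Machine then redirects to whichever snapshot it actually holds, which is why the access date and the archive date can differ.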

www.co.summit.oh.us

http://www.co.summit.oh.us/ now goes to https://www.eyemg.com/, the website of some web design firm. The correct address appears to be https://co.summitoh.net/. The link is used on Summit County, Ohio as well as a number of related articles. RTao (talk) 04:56, 30 August 2022 (UTC)[reply]

@RTao: this is used on 16 pages. Please do it manually rather than as a bot job, due to the work required. -- GreenC 06:00, 30 August 2022 (UTC)[reply]
@GreenC: Ah, I wasn't aware. I'll do that. Thanks for letting me know. RTao (talk) 06:10, 30 August 2022 (UTC)[reply]
Thanks. Plus you'll be able to do a better job with manual review. -- GreenC 14:58, 30 August 2022 (UTC)[reply]

FABLE

Not a direct request, but thought people here would be interested in User:FABLEBot/New URLs for permanently dead external links * Pppery * it has begun... 20:53, 31 August 2022 (UTC)[reply]

asp → aspx

I would need to fix URLs with .asp extension. References in several articles no longer display these pages. For example,
https://www.eurobasket.com/United-Kingdom/basketball-National-Team.asp?Age=16 is wrong, and
https://www.eurobasket.com/United-Kingdom/basketball-National-Team.aspx?Age=16 is correct. Interestingly, both versions (asp & aspx) work on non-European sites. Thank you for your efforts. Maiō T. (talk) 15:45, 11 September 2022 (UTC)[reply]

Hi Maiō T. Can you confirm this applies only to eurobasket.com, but to all of its URLs that contain .asp? Looks like around 6,700 pages. -- GreenC 16:27, 11 September 2022 (UTC)[reply]
@GreenC: No. I meant only those articles that contain the string "basketball-National-Team.asp" Maiō T. (talk) 16:38, 11 September 2022 (UTC)[reply]
Ah, glad I asked. That narrows it down to 88 pages. GreenC 17:52, 11 September 2022 (UTC)[reply]
Maiō T. , something has changed because both asp and aspx return content now. The content is different and I don't know which is preferred. -- GreenC 20:11, 14 September 2022 (UTC)[reply]
@GreenC: The "...National-Team.asp?Age=16" pages don't exist so the program redirects them to the main page (with adult men's national team). The "...National-Team.aspx?Age=16" pages are correct; they deal with the under-16 national team. So the "aspx" version is preferred. Maiō T. (talk) 12:29, 16 September 2022 (UTC)[reply]
OK it is done. -- GreenC 01:17, 17 September 2022 (UTC)[reply]
@GreenC: Thank you, good job! Maiō T. (talk) 12:32, 17 September 2022 (UTC)[reply]
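
For reference, the narrowed-down replacement agreed above can be expressed as a single regular expression. This is a sketch under the scope stated in the thread (only eurobasket.com URLs containing "basketball-National-Team.asp"); the function name is hypothetical and the bot's real implementation is not shown in this discussion.

```python
import re

# Only eurobasket.com URLs containing "basketball-National-Team.asp" are touched;
# the negative lookahead leaves URLs that already use ".aspx" unchanged.
_NATIONAL_TEAM = re.compile(
    r"(https?://www\.eurobasket\.com/\S*?basketball-National-Team\.asp)(?!x)"
)

def fix_national_team_urls(wikitext):
    """Append the missing 'x' so that .asp becomes .aspx; safe to run twice."""
    return _NATIONAL_TEAM.sub(r"\1x", wikitext)
```

Because of the lookahead the substitution is idempotent, so re-running it over pages that were already fixed changes nothing.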

Hello

I noticed that this link to 2012 Guyana census downloads a broken archive:

It should be replaced by this one:

(notice the capital B in "Population_By_Village")

I think it is present on most Guyana settlements pages (and maybe some other articles about Guyana), but I wasn't able to make a list.

Regards,  Şÿℵדαχ₮ɘɼɾ๏ʁ 16:30, 14 September 2022 (UTC)[reply]

User:SyntaxTerror, this is done; it edited 137 pages. There was one page, Guyana, that had an archive URL that was deleted. -- GreenC 20:13, 14 September 2022 (UTC)[reply]
Thanks GreenC.  Şÿℵדαχ₮ɘɼɾ๏ʁ 20:16, 14 September 2022 (UTC)[reply]

As of last week, it seems that Emporis has been shut down, and all of its links have gone dead. Every single link to the website now leads to an error page that looks like this. I'm not sure how many articles are affected by Emporis's shutdown, but I believe this issue affects thousands of links. – Epicgenius (talk) 13:57, 20 September 2022 (UTC)[reply]

@Epicgenius, Jklamo, and BD2412: I finished most of them last night per initial request at User_talk:BrownHairedGirl#Emporis.com_has_gone,_but_is_preserved. This will be the new "official thread". There's another 20% I need to custom program for, the Wayback Machine has them but they are kind of hidden from API view. In addition there is a request at Wikipedia_talk:WikiProject_Skyscrapers#Emporis_end to convert the {{Emporis}} template which I'll be working on. -- GreenC 15:12, 20 September 2022 (UTC)[reply]
Great work - barnstars and medals all around! BD2412 T 16:48, 20 September 2022 (UTC)[reply]
Great work, @GreenC. And there's that old 80:20 rule poking its annoying head in again to make more work for you. BrownHairedGirl (talk) • (contribs) 17:50, 20 September 2022 (UTC)[reply]
Right! Hah, whoever invented that rule... I'd much prefer 95/5, with the 5 being so hard you can safely skip it. -- GreenC 20:57, 20 September 2022 (UTC)[reply]

Report

  • Converted 1,204 {{Emporis}} to {{Cite web}}: Example
  • Added archive-url to 7,097 citations: Example
  • Wayback CDX trawling that saved 251 citations: Example

Anything else, let me know. -- GreenC 19:48, 22 September 2022 (UTC)[reply]
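
As a rough illustration of what "CDX trawling" can involve, the sketch below only builds a query URL for the Wayback Machine's CDX server, which lists captures under a URL prefix. The report above does not say how the bot actually queried it, so the parameter choices and the function name are assumptions, and the path in the usage note is a hypothetical prefix.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def cdx_query_url(url_prefix, limit=50):
    """Build a CDX query listing successful captures under a URL prefix."""
    params = {
        "url": url_prefix,
        "matchType": "prefix",       # everything beneath the prefix
        "output": "json",
        "filter": "statuscode:200",  # skip redirects and error captures
        "collapse": "urlkey",        # one row per distinct URL
        "limit": str(limit),
    }
    return CDX_ENDPOINT + "?" + urlencode(params)
```

A call such as `cdx_query_url("emporis.com/buildings/", 5)` returns a URL that can be fetched with any HTTP client; the returned rows would then be matched against the dead citation URLs.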

The domain is kalkionline.com. All links are dead, so anything tagged as url-status=live may be changed to dead. Kailash29792 (talk) 04:59, 24 September 2022 (UTC)[reply]

User:Kailash29792, thanks for the reminder! I just ran the first 20 articles. In Special:Contributions/GreenC_bot from Thudikkum Karangal to Magalir Mattum (1994 film). Can you take a look and provide any feedback before proceeding further? I see a bunch don't have archive.today available. I could add the {{dead link}} now, and go back later to add the archive if or when it becomes available. -- GreenC 03:35, 25 September 2022 (UTC)[reply]
Yeah, and I regret not having archived them before. Thankfully, the Internet Archive has a bunch of them. You may continue tagging the dead links while I manually replace the links. Kailash29792 (talk) 04:38, 25 September 2022 (UTC)[reply]
I can provide a list of the dead links, if that helps. -- GreenC 14:42, 25 September 2022 (UTC)[reply]
Sure. Kailash29792 (talk) 15:12, 25 September 2022 (UTC)[reply]

It seems the site is done with covering the entertainment side; none of the links covering films work. I therefore request that anything tagged as url-status=live be changed to dead. Kailash29792 (talk) 05:19, 29 September 2022 (UTC)[reply]

I processed sify.com in April 2021 (see Wikipedia:Link_rot/URL_change_requests/Archives/2021/April#sify.com) and got some things done. If I recall, the site has soft-404s and probably requires periodic reprocessing to find new soft-404s. Will run again. -- GreenC 01:42, 30 September 2022 (UTC)[reply]

Results

  • Articles edited: 2726
  • Added new archive URL: 3059
  • Switched |url-status=live to dead: 971
Periodic required.
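
The "switch |url-status=live to dead" step requested here and in the kalkionline.com request can be sketched as below. This is a simplified illustration that only handles non-nested citation templates and checks the |url= parameter for the domain; the function name is hypothetical and real bot code would need a proper wikitext parser and archive lookups.

```python
import re

_TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")  # innermost templates only
_URL = re.compile(r"\|\s*url\s*=\s*(\S+)")
_STATUS = re.compile(r"(\|\s*url-status\s*=\s*)live\b")

def mark_domain_dead(wikitext, domain):
    """Set url-status to dead in templates whose |url= points at the domain."""
    def fix(match):
        text = match.group(0)
        url = _URL.search(text)
        if not url or domain not in url.group(1):
            return text  # different site, or no |url= at all
        return _STATUS.sub(r"\1dead", text)
    return _TEMPLATE.sub(fix, wikitext)
```

Citations to other domains are left untouched, including ones that merely mention the domain in a title or archive URL.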

Gutenberg.net -> .org

Site changed to .org and old .net links no longer work. 73 -- GreenC 13:15, 30 September 2022 (UTC)[reply]

Done. -- GreenC 04:24, 4 October 2022 (UTC)[reply]

Baseballlibrary.com -> Baseballbiography.com

Baseballlibrary.com is dead, but Baseballbiography.com is the replacement. Exact same information available. Tons of links need to be updated to the new site.

I noticed that this link to baseball library is dead: http://www.baseballlibrary.com/ballplayers/player.php?name=Dean_Chance_1941 It should be replaced by this one: https://baseballbiography.com/dean-chance-1941/ Thebaseball10 (talk) 20:14, 13 October 2022 (UTC)[reply]

User:Thebaseball10: I actually went ahead and programmed the whole thing in, which took the better part of an evening. Then I noticed the new site has a lot less material than the old. Compare old with new. I don't think it's gaining anything by switching to the new site. Am I wrong? What do you think? -- GreenC 03:08, 14 October 2022 (UTC)[reply]
User talk:GreenC: I noticed the same, but it might be worth it to have a live website, though. The new site has a lot of the same info, missing some things, but at least it's live and searchable. Just my 2 cents. For me, I loved having a live website, and not having to wait ages to look through Wayback to find info on other players, etc. -- User:Thebaseball10
User:Thebaseball10: OK did what I could get. Roughly it converted about 1,000 links and added archives to about 100. -- GreenC 16:14, 16 October 2022 (UTC)[reply]
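
Judging only from the single Dean Chance example given in the request, the URL mapping might look like the sketch below. The slug rule (lowercase, underscores to hyphens) is an assumption generalized from that one pair, the function name is hypothetical, and, as noted above, the new page would still need checking because the new site carries less material.

```python
from urllib.parse import urlparse, parse_qs

def baseballbiography_url(old_url):
    """Guess the new URL from the old ?name= parameter, or return None."""
    parts = urlparse(old_url)
    if not parts.netloc.lower().endswith("baseballlibrary.com"):
        return None
    if parts.path != "/ballplayers/player.php":
        return None
    names = parse_qs(parts.query).get("name")
    if not names:
        return None
    # Assumed slug rule: lowercase and replace underscores with hyphens.
    slug = names[0].lower().replace("_", "-")
    return f"https://baseballbiography.com/{slug}/"
```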

ndb.nal.usda.gov

http://ndb.nal.usda.gov is dead. Special:LinkSearch/http://ndb.nal.usda.gov currently shows 352 links in all namespaces. Some pages are now at https://fdc.nal.usda.gov but not at the same path. Many of our broken links contain a so-called NDB number which can be searched for with some work. For example, Cauliflower currently has http://ndb.nal.usda.gov/ndb/search/list?qlookup=11135&format=Full. The search https://fdc.nal.usda.gov/fdc-app.html#/?query=11135 says "0 results" but there is a tab saying "SR Legacy Foods (1)" where "(1)" apparently indicates a search result. The tab has a link to the wanted page on "Cauliflower, raw": https://fdc.nal.usda.gov/fdc-app.html#/food-details/169986/nutrients. I only investigated this briefly and there may be better ways to find replacement links. PrimeHunter (talk) 14:02, 18 October 2022 (UTC)[reply]

User:PrimeHunter: I'm having trouble web scraping the search result URL because it uses JavaScript. I tried a headless browser, but the data in the "SR Legacy Foods" tab doesn't come through. So I found a convenient Datasets page, the file "SR Legacy - April 2019 (CSV – 6.1MB)" has a table mapping of the old ("11135") to new ("169986") so should be possible. I'll work on it. -- GreenC 17:03, 19 October 2022 (UTC)[reply]
Thanks. Good find. 156 of the 352 links start with http://ndb.nal.usda.gov/ndb/search/list?qlookup=nnnnn. There are also many like http://ndb.nal.usda.gov/ndb/foods/show/1903 in Amaranth grain. The citation title is "Cereals, whole wheat hot natural cereal, cooked with water, without salt". A search on that at https://fdc.nal.usda.gov/fdc-app.html#/ finds https://fdc.nal.usda.gov/fdc-app.html#/food-details/171668/nutrients with NDB number 8145 but no mention of 1903. I don't know how to handle such cases with a script. PrimeHunter (talk) 17:34, 19 October 2022 (UTC)[reply]
Some of the numbers don't map, the files are incomplete. -- GreenC 20:00, 19 October 2022 (UTC)[reply]
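
The lookup approach described above (old NDB number to new FDC id via the downloaded table) could be sketched like this. The dictionary stands in for the CSV, whose column layout is not given here; only the 11135 to 169986 pair comes from the discussion, and the function name is hypothetical. URLs of the `/ndb/foods/show/` form, and NDB numbers missing from the table, return None so they can be handled separately.

```python
from urllib.parse import urlparse, parse_qs

# Tiny stand-in for the mapping table in the "SR Legacy" CSV download.
NDB_TO_FDC = {"11135": "169986"}

def fdc_url(old_url, mapping=NDB_TO_FDC):
    """Map an old qlookup search URL to the new food-details URL, or None."""
    parts = urlparse(old_url)
    if parts.netloc != "ndb.nal.usda.gov" or parts.path != "/ndb/search/list":
        return None
    ndb_number = parse_qs(parts.query).get("qlookup", [None])[0]
    fdc_id = mapping.get(ndb_number)
    if fdc_id is None:
        return None  # number not in the (incomplete) table
    return f"https://fdc.nal.usda.gov/fdc-app.html#/food-details/{fdc_id}/nutrients"
```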

Results

PrimeHunter: If you see anything missed or that might be done yet by bot let me know! -- GreenC 20:53, 19 October 2022 (UTC)[reply]

Dinamalar Nellai

While the main site still works, the same cannot be said for its subsidiary which now seems to house a different website. All dinamalarnellai refs tagged as live should be changed to dead. Kailash29792 (talk) 06:28, 22 October 2022 (UTC)[reply]

This is one of the WP:JUDI sites. Needs to be usurped. Working on it now plus 26 other judi-usurped domains. -- GreenC 15:34, 22 October 2022 (UTC)[reply]
Done. -- GreenC 23:35, 22 October 2022 (UTC)[reply]

Warshipsww2.eu

This domain has been usurped or rendered unfit and is now serving German gambling spam. It's not related to WP:JUDI.

Lyndaship (talk) 11:45, 29 October 2022 (UTC)[reply]

Added to JUDI, it's easier to process as a batch, the process is functionally the same. Also added the title string "Roulette Blog" to check for. Thanks! -- GreenC 14:41, 29 October 2022 (UTC)[reply]
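
A title-string check of the kind mentioned above might look like this minimal sketch. It takes already-fetched HTML, so no network access is involved; the function and list names are hypothetical, and "Roulette Blog" is the only string taken from this thread.

```python
import re

SPAM_TITLE_STRINGS = ["Roulette Blog"]  # string named in the reply above

def looks_usurped(html, needles=SPAM_TITLE_STRINGS):
    """Return True if the page <title> contains any known spam string."""
    m = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    if not m:
        return False
    title = m.group(1).lower()
    return any(needle.lower() in title for needle in needles)
```

A match only flags the page as a candidate; a live check of the domain would still be needed before treating it as usurped.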

Unearthed Arcana article series on dnd.wizards.com

Looks like Wizards of the Coast pulled all Unearthed Arcana articles (https://dnd.wizards.com/articles/unearthed-arcana) from before 2020 (ex: Waterborne Adventures (2015), Psionics and the Mystic – Take Two (2016), Dragonmarks (2018), etc). I started to manually update this at List of Dungeons & Dragons rulebooks#Unearthed Arcana (since it seems all of them have been archived) but it's a lot of articles and I'm not sure what other Wikipedia articles use the UA articles as sources. Thanks! Sariel Xilo (talk) 20:29, 1 November 2022 (UTC)[reply]

How would you know if the article is pre-2020 vs post-2019? I guess pre-2020 if it redirects to https://dnd.wizards.com/news/archive?category=unearthed-arcana -- GreenC 02:24, 3 November 2022 (UTC)[reply]

It's only in 9 articles:

Good to know it's not as widespread as I feared (List of Dungeons & Dragons rulebooks#Unearthed Arcana has ~85 UA articles listed)! What I didn't realize is that Wizards didn't do redirects for all the post-2019 articles (ex: the Jan 2020 article original link goes to the UA archive redirect instead of to https://dnd.wizards.com/unearthed-arcana/subclasses-part-1). So I think everything with https://dnd.wizards.com/articles/unearthed-arcana is dead. Sariel Xilo (talk) 02:58, 3 November 2022 (UTC)[reply]
Alright it's done. -- GreenC 03:28, 3 November 2022 (UTC)[reply]

Twitter

Can/should archive links be added to sources from Twitter which don't already have them via a bot? Media coverage on the company is showing it is increasingly unstable and I'm a bit concerned about the potential future link rot. Thanks! Sariel Xilo (talk) 02:25, 11 November 2022 (UTC)[reply]

That would be a really big project that I don't want to get into unless there is evidence of a big link rot problem. Proactively, the Internet Archive should already be archiving Tweets into the Wayback Machine, so they'll be available if/when the links die. -- GreenC 02:59, 11 November 2022 (UTC)[reply]
Sounds good! Sariel Xilo (talk) 03:06, 11 November 2022 (UTC)[reply]