Wikipedia talk:Requests for comment/NOINDEX

Source: Wikipedia, the free encyclopedia.

Delay when noindex is removed?

How quickly would the major search engines (especially Google) typically take to discover that noindex has been removed from a page where the search engine has previously registered the noindex tag? It would be unfortunate if they wait a long time to revisit a noindexed page to test its current status. Could we do something to tell search engines to quickly visit and index a page after noindex is removed? PrimeHunter (talk) 14:06, 20 March 2012 (UTC)[reply]

It's pretty much instant. Google has a special arrangement; they use the RecentChanges feed to identify when something has been altered and needs an updated cache. Okeyes (WMF) (talk) 14:19, 20 March 2012 (UTC)[reply]
OK. I didn't know whether they also did it for noindexed pages. PrimeHunter (talk) 14:30, 20 March 2012 (UTC)[reply]
It's worth making sure that the RC feed will pass noindex information along to Google (and that they'll know what to do with it), but this sounds like it should work. Shimgray | talk | 20:28, 20 March 2012 (UTC)[reply]

Increase the new pages patrol backlog

Have the backlog offer the option to go back 30 days instead of just 15 days, so that pages that have slipped through the new pages patrol can be reviewed and marked as patrolled instead of never being found afterwards. Regards, Whenaxis (contribs) 21:11, 21 March 2012 (UTC)[reply]

I'm not sure I quite understand; the backlog is 30 days long by default. Do you mean the links on the top-right of the interface? Okeyes (WMF) (talk) 09:59, 22 March 2012 (UTC)[reply]
Yeah. And if the backlog maximum is already 30 days, is it possible to extend it so we can catch any leftover pages that have slipped through the new pages patrol? Whenaxis (contribs) 20:36, 22 March 2012 (UTC)[reply]
That's the plan! It will go up to 60 days for patrolled articles, and unlimited for unpatrolled as part of WP:NPT :). Okeyes (WMF) (talk) 21:06, 22 March 2012 (UTC)[reply]
Sounds good! The interface will be updated so it says, "Yellow highlights indicate pages that have not yet been patrolled. Please consider patrolling pages from the back of the unpatrolled backlog. Other options: 1 hour • 1 day • 5 days • 10 days • 15 days • 30 days • 60 days", right? Thank you, Whenaxis (contribs) 23:31, 22 March 2012 (UTC)[reply]
No, it's tied into WP:NPT, which reformats Special:NewPages. Do please give it a read and tell me on the talkpage what you think :). Okeyes (WMF) (talk) 00:02, 23 March 2012 (UTC)[reply]
Ah. I read it... makes sense now. No further questions :) Thanks, Whenaxis (contribs) 01:02, 23 March 2012 (UTC)[reply]
Neat! And if you have any questions about NPT or want to comment, do please leave any ideas you have there :). Okeyes (WMF) (talk) 16:57, 24 March 2012 (UTC)[reply]

Is this even possible?

NOINDEX magic word doesn't work in article space. Has this been changed? Or is there an implication that this will be changed based on this RfC? Gigs (talk) 19:34, 11 April 2012 (UTC)[reply]

I have confirmed that NOINDEX has no effect on articles. Some of the CSD templates are already using it, like blatant advertising and copyright infringement, and it's not doing anything; those articles are still in Google. Gigs (talk) 19:39, 11 April 2012 (UTC)[reply]
NOINDEX is currently disabled in mainspace; there is a new method which will allow it to be selectively used (via certain templates only), which will be enabled if this proposal succeeds. Shimgray | talk | 20:37, 11 April 2012 (UTC)[reply]
What he said :) Okeyes (WMF) (talk) 21:02, 11 April 2012 (UTC)[reply]
Can you elaborate on this "new method"? It sounds like vaporware. Do you have a link to relevant documentation or discussion of it? --MZMcBride (talk) 17:52, 12 April 2012 (UTC)[reply]
More of this technical detail needs to be up near the top of the RfC description, as well. This is a technical RfC without any technical detail given in its introduction. The policy question of "do we want spam and slander in google" is a non-question compared to the technical implementation details, where the devils might be hidden. Gigs (talk) 18:33, 12 April 2012 (UTC)[reply]
I'll poke Ian, the guy who worked out how to do it. Okeyes (WMF) (talk) 18:35, 12 April 2012 (UTC)[reply]
I talked with User:Catrope about this, and we arrived at a solution wherein we create a protected wikipage in the MediaWiki namespace that lists templates that should trigger noindex. Then, we use a parser hook to check the page's included templates against that list. If, when parsing a page, one of the templates on that list shows up, we flip the noindex header on. It's pretty straightforward. The tricky detail here is that when the list is edited, we'll have to make sure to flush any pages that include templates that were removed or added. I don't expect this to be a serious problem, since the list will probably change infrequently, and I suspect there will be some amount of process around deciding which templates to include. It can probably be done with a bot, which could also be set up to poke search engines to reindex the article sooner.
This comes from the assumption (correct me if I'm wrong) that the NOINDEX magic word is disabled in article space because it's so easy to use maliciously. Restricting the list of templates that can trigger the noindex header in article space to only those that already have a bunch of built-in oversight (CSD, AfD, etc.) should solve that problem. raindrift (talk) 19:31, 12 April 2012 (UTC)[reply]
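The template-list check raindrift describes above could be sketched as follows. This is an illustrative Python sketch only; the real implementation would be a MediaWiki parser hook written in PHP, and the page name and template names here are hypothetical, not the actual ones that would be used.

```python
# Sketch of the proposed mechanism: a protected page in the MediaWiki
# namespace lists the templates allowed to trigger noindex; at parse
# time, the page's transcluded templates are checked against that list.

# Hypothetical contents of a protected list page such as
# "MediaWiki:Noindex-templates", one template name per line.
NOINDEX_TEMPLATE_LIST = """
Template:Db-g11
Template:Db-g12
Template:Article for deletion
"""


def load_noindex_templates(list_page_text: str) -> set:
    """Parse the protected list page into a set of template names."""
    return {line.strip() for line in list_page_text.splitlines() if line.strip()}


def should_noindex(transcluded_templates, noindex_set) -> bool:
    """Flip the noindex header on if any transcluded template is on the list."""
    return any(t in noindex_set for t in transcluded_templates)


noindex_set = load_noindex_templates(NOINDEX_TEMPLATE_LIST)
# A page tagged with a CSD template gets noindexed; an ordinary page does not.
print(should_noindex(["Template:Db-g11", "Template:Infobox person"], noindex_set))  # True
print(should_noindex(["Template:Infobox person"], noindex_set))  # False
```

When the list page itself is edited, every page transcluding an added or removed template would need its parser cache flushed, which is the cache-invalidation wrinkle mentioned above.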
Ah, thanks. So a system similar to MediaWiki:Disambiguationspage, basically. Okay, that makes more sense.
Yes, the article namespace restriction is to prevent pages such as Barack Obama or Abortion from suddenly dropping out of search engine indices due to vandalism or carelessness or what-have-you with the __NOINDEX__ magic word. --MZMcBride (talk) 19:48, 12 April 2012 (UTC)[reply]


Next Generation Search

I'm working on specifying the next-generation search. One of the goals of the next-generation search engine is to minimize the edit-to-search time, so that the index goes live faster.