This article about a search engine vulnerability appeared on Slashdot, but it's so poorly written that I had to read it three times to figure out just what they're worried about. Here's my take on what it actually means.
Let's say you have a page that attracts a lot of search engine traffic. Some spammer wants a piece of that action, so they create a page on their own site that they want to substitute for yours. The spammer's site checks who is requesting that page, and if the visitor is the Google bot, it sends a 302 redirect response pointing at your page. Google reads your page and adds it to its index, but at the original URL, namely that of the spammer's page. People whose Google searches would otherwise turn up your page now see that index entry (which looks like an entry for your page), click on it, and are sent to the spammer's URL. Because the redirect only fires for the Google bot, an ordinary user hitting that URL sees the spammer's page instead of yours.
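To make the mechanics concrete, here's a toy simulation in Python. Everything in it is hypothetical: the URLs are made up, serve() stands in for both web servers, and naive_crawl() encodes my assumption about the Google bot (follow a 302 but file the content under the original URL; follow a 301 and file it under the target), which Google hasn't confirmed.

```python
# Toy model of the 302 cloaking trick. All URLs and functions are hypothetical;
# the crawler behaviour is an assumption, not Google's documented algorithm.

YOUR_URL = "http://victim.example/page"
SPAM_URL = "http://spammer.example/page"


def serve(url, user_agent):
    """Pretend web: return (status, payload), where payload is a body or a Location."""
    if url == SPAM_URL:
        # Cloaking: only the crawler is redirected; humans get the spam page.
        if "Googlebot" in user_agent:
            return 302, YOUR_URL
        return 200, "spammer's own content"
    if url == YOUR_URL:
        return 200, "your content"
    return 404, ""


def naive_crawl(url, user_agent="Googlebot/2.1"):
    """Return (index_url, content) under the ASSUMED naive indexing rule."""
    status, payload = serve(url, user_agent)
    if status in (301, 302):
        _, body = serve(payload, user_agent)  # follow one redirect hop
        # Assumption: a 301 is filed under its target, a 302 under the original URL.
        return (payload if status == 301 else url), body
    return url, payload


index_url, content = naive_crawl(SPAM_URL)
print(index_url, "->", content)        # http://spammer.example/page -> your content
print(serve(SPAM_URL, "Mozilla/5.0"))  # (200, "spammer's own content")
```

The index ends up pairing the spammer's URL with your content, while a searcher who clicks that URL in an ordinary browser gets the spammer's page.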
That's the scam. I'm not sure why it works better than the spammer simply making a duplicate copy of your page on their own server and serving that conditionally to the Google bot, but I'm willing to believe that the redirect game does work better for the moment. It can't work all that well, though, because the spammer still needs to get the Google bot to hit their page at their URL, and then they have to get their redirect to your page ranked higher than your real, directly served copy of your own page; that seems to depend on your PageRank being underrated somehow. It may be that Google does some honeypot-like things to detect the older type of scam, where the Google bot sees a different page from what the rest of the world sees (an actual page rather than a redirect), and punishes it harshly, which would make the redirect version preferable, even though a number of semi-legitimate Wikipedia "mirror" sites seem to get away with the older scam at least some of the time. Anyway, I don't imagine this latest scam will be hard for Google to detect and punish. I just hope they don't do something that breaks the legitimate uses of redirection and conditional page display.
UPDATE: Probably the easiest way for Google to solve the problem would be to treat temporary redirects that point out of the domain as if they were permanent redirects. Then (apparently; this depends on assumptions about how the Google bot works that I don't think Google is willing to confirm) it would index the page at the target URL ("yours" in the example) instead of the original ("spammer's") URL. That could break some systems that have a genuine temporary redirect to another domain owned by the same people, as might happen in some load-balancing situations, but on balance it seems like it would be an improvement.
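Here's a sketch of that rule as a Python function. It's my reading of the proposal, not anything Google has described; index_location() and same_host() are hypothetical names, and comparing exact hostnames is a simplification of "out of domain".

```python
from urllib.parse import urlsplit


def same_host(a, b):
    """Simplification: exact hostname match. A real crawler would probably compare
    registrable domains (so www.example.com and example.com count as one site)."""
    return urlsplit(a).hostname == urlsplit(b).hostname


def index_location(original_url, status, target_url):
    """Pick the URL to index a redirected page under, per the proposed fix.

    - 301 (permanent): index at the target, as before.
    - 302 (temporary) within the same host: keep the original URL, as before.
    - 302 to another host: treat it like a 301 and index at the target.
    """
    if status == 301:
        return target_url
    if status == 302:
        return original_url if same_host(original_url, target_url) else target_url
    return original_url  # not a redirect we handle; index where we fetched it


# The spammer's cross-domain 302 now files your page under your own URL...
print(index_location("http://spammer.example/page", 302, "http://victim.example/page"))
# ...while a same-site temporary redirect behaves as it did before.
print(index_location("http://shop.example/sale", 302, "http://shop.example/sale-week1"))
```

This also shows the trade-off: a load balancer that sends a 302 from one of its owner's domains to another would now be indexed at the target domain.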
The article linked above makes a big deal of the possibility that "innocent" sites can also cause you grief inadvertently. Part of the confusion comes from the article presenting that as the usual case and explaining the (much more threatening) spamming case only in terms of how it differs from the "innocent" one. The "innocent" case works like this: a Webmaster somewhere else, operating a more or less legitimate site, wants to link to you but also wants to track how many people click on the link. That's a common thing to do, and the usual way is for their link to point at a script on their own site, which counts the click and then sends a redirect to your page with either a 301 (permanent) or 302 (temporary) status. The claim is that 301 is the "right" way to do it, but I'm not convinced there's anything wrong with 302 for this purpose except that it triggers bad behaviour on Google's part. Anyway, if the other site is popular and uses 302, its "click counter" URL pointing at you might get indexed in preference to your own. Since the click counter sends visitors to your site anyway, that's no big deal; but if your site and the click counter both get indexed, Google might flag them as duplicates and penalize both. I'm dubious about how big a problem that really is; the spam scenario, where there's deliberate malice involved, seems much more serious to me. Both can be fixed by indexing pages at their final addresses instead of their redirect addresses, at least when the two addresses are on different domains.
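For the "innocent" case, a click counter of the kind described might look like this minimal Python sketch. The /out?to=... URL scheme, the click_counter() function, and the in-memory Counter are all made up for illustration; a real script would sit behind a web server and store its counts somewhere persistent. The only thing that matters for this discussion is the status code it sends.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

clicks = Counter()  # hypothetical in-memory tally of clicks per destination


def click_counter(request_url, permanent=True):
    """Handle a hypothetical /out?to=<destination> request.

    Counts the click, then answers with a redirect to the destination:
    301 if permanent is True, 302 otherwise. Returns (status, headers).
    """
    destinations = parse_qs(urlsplit(request_url).query).get("to")
    if not destinations:
        return 400, {}  # no destination given
    target = destinations[0]  # parse_qs has already percent-decoded it
    clicks[target] += 1
    return (301 if permanent else 302), {"Location": target}


print(click_counter("http://portal.example/out?to=http%3A%2F%2Fvictim.example%2Fpage"))
# (301, {'Location': 'http://victim.example/page'})
print(click_counter("http://portal.example/out?to=http://victim.example/page", permanent=False))
# (302, {'Location': 'http://victim.example/page'})
print(clicks["http://victim.example/page"])  # 2
```

Flipping permanent to False is the one-word difference at issue: the visitor lands in the same place either way, but under the assumed indexing behaviour a 302 lets portal.example's /out URL be indexed in place of yours.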