stevel

Updated spiders.txt Official Support Topic

Correct. This contribution is simply an update for the spiders.txt file that is included in the osC distribution. It adds new spiders and is optimized. No code is changed.

Hi Steve,

Thanks for the quick response. I just installed your contribution a few minutes ago by copying the spiders.txt file into my /catalog/includes directory and setting "Prevent Spider Sessions" to TRUE. Is there any way I can test that it is working OK? I don't want to "scare" away the spiders!

thanks,

Ray

Got Firefox? Install the "User Agent Switcher" extension and set the useragent to "Googlebot". That's how I test it. Be sure you close your browser session and reopen it to clear session cookies. Then try adding something to your cart.

 

Note that this does not prevent spiders from indexing your store. All it does is keep them from obtaining sessions.
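
For anyone without Firefox, the same check can be scripted. What follows is only a minimal Python sketch and is not part of the contribution: it requests a page with a chosen User-Agent and reports whether the response carries a session ID. It assumes the store uses the default session name "osCsid", and the function names are made up for illustration.

```python
import urllib.request

# Assumption: the store uses the default osCommerce session name.
SESSION_NAME = "osCsid"


def response_has_session(set_cookie_headers, body, session_name=SESSION_NAME):
    """Return True if a session cookie was set or a session ID appears in the page."""
    marker = session_name + "="
    in_cookie = any(marker in header for header in set_cookie_headers)
    return in_cookie or marker in body


def fetch_as(url, user_agent):
    """Fetch url with the given User-Agent; return (Set-Cookie headers, body text)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        cookies = response.headers.get_all("Set-Cookie") or []
        body = response.read().decode("utf-8", errors="replace")
    return cookies, body
```

To use it, call `fetch_as(store_url, "Googlebot")` on a page of your own store and pass the result to `response_has_session`. If Prevent Spider Sessions is working, the answer should be False for "Googlebot", while an ordinary browser User-Agent would normally be expected to get a session.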

Steve,

I actually use IE on both my PCs. Is there any other way to test?

One thing I just noticed while looking at my "Who's Online" is that there are four "Mozilla" bots from the same IP checking out various links. Three of them have a "yes" under the session column and are checking out products, while the fourth has a "no". The one with the "no" is viewing "/catalog/cookie_usage.php".

Is Who's Online an accurate way to gauge whether this is working OK or not?

thanks.

Ray

www.specopstactical.com

"Mozilla" is not a bot. Actually, if you see Mozilla there, you have no idea what it is, since just about every browser includes "Mozilla" in its UA.

 

I tried your store and the Prevent Spider Sessions is working fine.

Steve,

Thanks a million! I'm glad to hear that the mod is working fine. Now off to further osC refinements!

-Ray

Hi!

 

I like your list! :)

 

Could you please add the following:

  • findlinks/1.1-a8 (+http://wortschatz.uni-leipzig.de/findlinks/) also known as findlinks/1.1.1-a1 (+http://wortschatz.uni-leipzig.de/findlinks/)
  • ilse

Thanks for the great work!

 

- Jasper

Please post the complete user agent string as found in your access log. If "ilse" is the one I'm thinking of, it should already be covered by "crawl". findlinks is already there.
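
To illustrate why a short entry such as "crawl" can cover many spiders, here is a rough Python sketch of substring matching against a user agent. It assumes, as I understand the stock code, that the user agent is lowercased and each spiders.txt line is tested as a plain substring. The function name, the three-entry list and "ExampleCrawler" are invented for the example.

```python
def matching_entries(user_agent, spider_entries):
    """Return the spider entries that occur as substrings of the user agent.

    Assumption: entries are already lowercase, as in spiders.txt, and the
    user agent is lowercased before comparison.
    """
    agent = user_agent.lower()
    matches = []
    for entry in spider_entries:
        needle = entry.strip()
        if needle and needle in agent:
            matches.append(needle)
    return matches


# A tiny, made-up excerpt; the real spiders.txt is much longer.
entries = ["crawl", "findlinks", "googlebot"]

print(matching_entries(
    "findlinks/1.1-a8 (+http://wortschatz.uni-leipzig.de/findlinks/)", entries))
print(matching_entries("ExampleCrawler/1.0", entries))  # hypothetical agent
```

The first call matches on "findlinks" and the second on "crawl", which is why the complete user agent string is needed before deciding whether a new entry is required.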

I've got two stores, both are OSCMAX. Both have an updated SPIDERS.TXT in the /includes folder, both have "prevent spider sessions" set to TRUE, yet one store can't see any bots, and the other does. Anything else I need to change?


John Skurka

I don't know "OSCMAX". There isn't anything else in a standard osC store to change. What do you mean by "can't see any bots"?

 

You can add debug code to application_top.php to see if you can find out why one store is misbehaving.

 

Note that, unless you're on a Windows host, the case of the filename is important - it is looking for spiders.txt not SPIDERS.TXT.
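
If you want to rule out the filename case problem quickly, a small script can list any file whose name matches only when case is ignored. This is just a sketch in Python with an invented function name; point it at your own includes directory.

```python
import os


def find_case_variants(directory, expected="spiders.txt"):
    """Return names in directory that match expected only when case is ignored."""
    return [
        name
        for name in os.listdir(directory)
        if name.lower() == expected.lower() and name != expected
    ]
```

An empty list means no wrongly cased copy was found. A result such as ["SPIDERS.TXT"] means the file needs renaming on a case-sensitive host.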

I don't know "OSCMAX". There isn't anything else in a standard osC store to change. What do you mean by "can't see any bots"?

 

I check via "Who's On-Line" and one store shows bots, along with their names in red. The other store never shows any bots, always guests. The stores are identical (well, obviously there's something different). I also have a straight osC MS 2.2 store, which works fine as well. I checked the CHMOD settings, and all are identical as well. Hmmmm.

 

You can add debug code to application_top.php to see if you can find out why one store is misbehaving.

 

Note that, unless you're on a Windows host, the case of the filename is important - it is looking for spiders.txt not SPIDERS.TXT.

 

I understand the caps - it is lower case on the server, just wanted to emphasize it in the post.


John Skurka

Maybe there are no bots visiting the other store? If instead what you see is that there are visitors that are clearly bots but that have sessions, you have some further analysis to do to find out why. If you'll give me the URL of the store that is a problem, I can check to see if spiders get sessions.

If you'll give me the URL of the store that is a problem, I can check to see if spiders get sessions.

 

The store that works: www.atoolcrib.com

 

The store that doesn't: www.vehitronix.com

 

I know some of the "visitors" to Vehitronix are bots, based on the IP address of the visitor as reported in the Who's Online contrib.


John Skurka

I tried your site with my user agent set to "Googlebot" and I did not get a session. So whatever issue you have with the "Who's Online" feature, it isn't related to use of spiders.txt.

So whatever issue you have with the "Who's Online" feature, it isn't related to use of spiders.txt.

 

You got it - I overwrote the "WOL" Code with the most current version and everything is working correctly now! Thanks for your help in debugging this.


John Skurka

There would certainly be a problem with the 3/31 file but there shouldn't be with the newer ones. Please make sure that your spiders.txt does NOT contain the line:

 

ox/
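
A likely reason this line is harmful, assuming entries are matched as plain substrings of the lowercased user agent, is that "ox/" occurs inside "Firefox/", so Firefox visitors would be treated as spiders and get no session. The Python sketch below flags entries of that kind; the two sample browser strings and the function name are illustrative only.

```python
# Sample browser user agents, for illustration; extend with your own.
BROWSER_AGENTS = [
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.3) "
    "Gecko/20060426 Firefox/1.5.0.3",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
]


def overbroad_entries(spider_entries, browser_agents=BROWSER_AGENTS):
    """Return spider entries that also occur inside a known browser user agent."""
    flagged = []
    for entry in spider_entries:
        needle = entry.strip().lower()
        if needle and any(needle in agent.lower() for agent in browser_agents):
            flagged.append(needle)
    return flagged


print(overbroad_entries(["ox/", "googlebot", "crawl"]))  # prints ['ox/']
```

Running a check like this over a candidate spiders.txt before uploading it would catch any entry that matches real browsers.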

I have updated to the latest spiders.txt but have a spider at 64.124.140.15x that has been making 4 - 5 connections at a time, 24 hours a day, for the last few days, and is loading up the cart with each connection from what I can see in my Who's Online. Is there anything I can do about this?

I am hoping this is correct; I have never done anything with my access logs before. I downloaded the access log file, found the correct IPs, and this is what it says.

Is this the info you were asking for?

 

 

64.124.140.150 - - [17/May/2006:10:39:40 -0500] "GET /product_info.php?products_id=6129 HTTP/1.1" 200 50494 "-" "Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)"
64.124.140.151 - - [17/May/2006:03:40:39 -0500] "GET /product_info.php?products_id=3258 HTTP/1.1" 200 45109 "-" "Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)"
64.124.140.152 - - [17/May/2006:08:19:10 -0500] "GET /product_info.php?products_id=4023 HTTP/1.1" 200 27728 "-" "Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)"
64.124.140.153 - - [17/May/2006:08:10:01 -0500] "GET /product_info.php?products_id=4311 HTTP/1.1" 200 28744 "-" "Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)"
64.124.140.154 - - [17/May/2006:08:25:39 -0500] "GET /product_info.php?products_id=4309 HTTP/1.1" 200 28188 "-" "Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)"
64.124.140.176 - - [17/May/2006:14:37:50 -0500] "GET /index.php?cPath=63 HTTP/1.1" 200 41878 "-" "Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)"
64.124.140.177 - - [17/May/2006:14:43:57 -0500] "GET /privacy.php HTTP/1.1" 200 23373 "-" "Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)"
64.124.140.178 - - [17/May/2006:14:53:14 -0500] "GET /index.php?cPath=232 HTTP/1.1" 200 41679 "-" "Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)"
64.124.140.180 - - [17/May/2006:14:36:48 -0500] "GET /index.php?cPath=231 HTTP/1.1" 200 43138 "-" "Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)"
64.124.140.181 - - [17/May/2006:17:37:35 -0500] "GET /index.php?cPath=222 HTTP/1.1" 200 41866 "-" "Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)"
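
For anyone else who has not dug through an access log before, the user agent is the last quoted field on each line. Here is a short Python sketch that pulls it out and counts hits per user agent. It assumes the log lines follow the layout shown above (IP, timestamp, request, status, size, referer, user agent); the pattern and function names are my own.

```python
import re
from collections import Counter

# Pattern for lines laid out like the excerpt above.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def count_agents(lines):
    """Count requests per user agent, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            counts[match.group("agent")] += 1
    return counts


sample = [
    '64.124.140.150 - - [17/May/2006:10:39:40 -0500] '
    '"GET /product_info.php?products_id=6129 HTTP/1.1" 200 50494 "-" '
    '"Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)"',
    '64.124.140.177 - - [17/May/2006:14:43:57 -0500] '
    '"GET /privacy.php HTTP/1.1" 200 23373 "-" '
    '"Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)"',
]

for agent, hits in count_agents(sample).most_common():
    print(hits, agent)
```

On a full log, `count_agents(open(path))` would show which user agents are responsible for the most requests, which is the string to compare against spiders.txt.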

When I look up fatlens.com, it is an event ticket search site. I don't even know why they are on my site, which sells dog breed merchandise...
