stevel (Author) Posted May 3, 2008
Your English is fine. Ask your host where your web access logs are stored; this varies from host to host. You will want to find the log from the day of the access. It is a series of lines, one per access. Search it for the IP address and you should find one or more lines showing a GET access from that IP. When you find one, post a sample line here. Here's one from one of my sites, as an example:
66.249.70.76 - - [12/Apr/2008:00:05:53 -0400] "GET /pg-070708.php HTTP/1.1" 200 22812 "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Yours may look somewhat different, but it should be recognizable. The various stats and "who's online" additions don't give you this information. Also, I have found that the "who's online enhancements", etc., are often mistaken about bots having sessions. The line from the access log should have a clue about this too.
Steve
Contributions: Country-State Selector, Login Page a la Amazon, Protection of Configuration, Updated spiders.txt, Embed Links with SID in Description
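For anyone who would rather script the log search than do it by hand, here is a minimal sketch in Python. It assumes the log has one access per line starting with the client IP, as in the samples in this thread. The function name `find_accesses` is made up for illustration, and picking the user agent as "the last quoted field that isn't a dash" is a heuristic, not a full log-format parser.

```python
import re

def find_accesses(log_lines, ip):
    """Return (request, user_agent) pairs for log lines that start with ip."""
    hits = []
    for line in log_lines:
        # The client IP is the first field on the line in these logs.
        if not line.startswith(ip + " "):
            continue
        # Pull out every double-quoted field: request, referer, user agent, ...
        quoted = re.findall(r'"([^"]*)"', line)
        if not quoted:
            continue
        request = quoted[0]
        # Heuristic: the user agent is the last quoted field that is not "-".
        candidates = [field for field in quoted[1:] if field != "-"]
        user_agent = candidates[-1] if candidates else ""
        hits.append((request, user_agent))
    return hits

# Two sample lines copied from posts in this thread.
SAMPLE_LOG = [
    '66.249.70.76 - - [12/Apr/2008:00:05:53 -0400] "GET /pg-070708.php HTTP/1.1" 200 22812 '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '64.124.148.65 - - [04/May/2008:00:16:31 +0200] "GET /tower-crane-p-58.html HTTP/1.1" 200 27735 '
    '"-" "Mozilla/5.0 (compatible; FatBot 2.0; http://www.thefind.com/crawler)"',
]

for request, user_agent in find_accesses(SAMPLE_LOG, "64.124.148.65"):
    print(request, "|", user_agent)
```

To run it against a real log, pass the open file object instead of `SAMPLE_LOG`; the path to that file is whatever your host tells you.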
papillon Posted May 3, 2008
OK, I will ask my host and reply here with the results as soon as possible. Thanks!
papillon Posted May 6, 2008
OK, here's the line:
64.124.148.65 - - [04/May/2008:00:16:31 +0200] "GET /tower-crane-p-58.html HTTP/1.1" 200 27735 "-" "Mozilla/5.0 (compatible; FatBot 2.0; http://www.thefind.com/crawler)"
So you're right, it's FatBot, and it should be caught by "tbot", but I don't know why it's not working. Also, for the last two days the Yahoo bot has been crawling my site and it seems to be creating sessions too (or at least that's what Who's Online and Visitor Web Stats say). Googlebot, on the other hand, is not creating sessions. Any ideas?
By the way, just to be sure: spiders.txt has to be located in /includes, right?
stevel Posted May 6, 2008
It is not getting a session. If it were, you'd see the session ID in the URL. (Very few crawlers, if any, accept cookies.) Not only would "tbot" catch this, but so would the string "crawl". Yes, spiders.txt goes in /includes, and you must enable "Prevent Spider Sessions" in admin. If you'll post or PM me the URL of your store, I'll test it to make sure it's working. Otherwise, I'd say that everything is working the way it should.
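To see why both "tbot" and "crawl" would catch FatBot, here is a rough Python sketch of substring matching against a user agent. It assumes each spiders.txt entry is treated as a case-insensitive substring of the user agent, which is what the posts here imply. The helper name `matching_entries` and the three-entry list are illustrative only; this is not the store's actual code or the full spiders.txt.

```python
def matching_entries(user_agent, spider_entries):
    """Return the entries that occur as substrings of the user agent."""
    ua = user_agent.lower()
    return [entry for entry in spider_entries if entry and entry.lower() in ua]

# Illustrative subset of entries mentioned in this thread, not the real file.
ENTRIES = ["tbot", "crawl", "/bot"]

FATBOT = "Mozilla/5.0 (compatible; FatBot 2.0; http://www.thefind.com/crawler)"
BROWSER = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

print(matching_entries(FATBOT, ENTRIES))   # which entries flag FatBot
print(matching_entries(BROWSER, ENTRIES))  # a normal browser UA matches none
```

A user agent with no matching entry, like the MSIE string above, cannot be handled through spiders.txt at all.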
mr_absinthe Posted June 2, 2008
ScanAlert will already be detected as its UA includes the string "/bot".
Any idea why ScanAlert is still regularly able to start a session?
se2-scan02.scanalert.com
Name: Guest
ID: 0
IP Address: 209.67.114.33
User Agent: Mozilla/5.0 (compatible; MSIE 7.0; MSIE 6.0; ScanAlert; +http://www.scanalert.com/bot.jsp) Firefox/2.0.0.3
osCsid: 3ee6fd417750254a5b5782dd968dfff1
Absinthe Original Liquor Store
stevel Posted June 2, 2008
If ScanAlert accepts cookies (very rare for a spider) and came in with a session ID in the URL, then spiders.txt would be skipped. Another possibility is that the display you're looking at is mistaken. A third is that your store is not properly using spiders.txt. If you'll give me the URL of your store I can test it.
mr_absinthe Posted June 3, 2008
Very kind of you, my store is Absinthe Original.
stevel Posted June 3, 2008
From what I can tell, spiders.txt is not being used on your store. I switched the user agent to Googlebot and it still got a session. I did notice something with your store that I have seen with others: even on the first page, the links don't have a session ID in the URL, indicating that a cookie was set initially. I'd be curious to know how that was done. But in any event, the "prevent spider sessions" code is not running.
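The same kind of check can be scripted. The sketch below is Python using only the standard library. It fetches a page while presenting a Googlebot user agent, then looks for a session cookie or a session ID in the returned links. It assumes the session name is `osCsid`, as seen in the log lines in this thread. The function names are made up for illustration, the store URL is something you supply, and cookies set on an intermediate redirect would not be seen by this simple version.

```python
import re
import urllib.request

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SESSION_NAME = "osCsid"  # assumption: the session name seen elsewhere in this thread

def fetch_as(url, user_agent, timeout=10):
    """Fetch url with the given User-Agent; return (Set-Cookie values, body text)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        cookies = response.headers.get_all("Set-Cookie") or []
        body = response.read().decode("utf-8", errors="replace")
    return cookies, body

def session_evidence(cookies, body):
    """Report whether a session cookie was set or session IDs appear in the page."""
    cookie_set = any(c.strip().startswith(SESSION_NAME + "=") for c in cookies)
    ids_in_links = re.findall(SESSION_NAME + r"=([0-9a-zA-Z]+)", body)
    return {"cookie_set": cookie_set, "ids_in_links": sorted(set(ids_in_links))}
```

Usage would be along the lines of `session_evidence(*fetch_as(store_url, GOOGLEBOT_UA))`, where `store_url` is your own store's address. If "Prevent Spider Sessions" is working, you would expect no cookie and no IDs in the links.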
mr_absinthe Posted June 3, 2008
Hmm, that's strange, I've never seen Googlebot, Yahoo!, msnbot, Jakarta or you name it starting a session... but maybe I made a mistake somewhere in the settings or in a contribution's usage. This is in my Sessions:
Allow Auto Login - True
Session Directory - /tmp
Force Cookie Use - False
Check SSL Session ID - False
Check User Agent - False
Check IP Address - False
Prevent Spider Sessions - True
Recreate Session - False
It's been a while, but I believe that the missing session ID was achieved by a contribution called BR&R. I didn't really like the session ID in the URL, and switching Force Cookie Use to "True" wasn't an option, thus the BR&R.
Edited June 3, 2008 by mr_absinthe
stevel Posted June 3, 2008
I tried setting my user agent to Googlebot and I got a session.
mr_absinthe Posted June 3, 2008
Any idea what I might have wrong?
stevel Posted June 3, 2008
The settings don't help because you've clearly changed the code that handles session starting. You'd have to debug the code in application_top.php and follow the code flow.
Guest Posted July 7, 2008
This one has just started spidering my site and seems to be picking up sessions:
64.40.117.118 - - [07/Jul/2008:02:50:51 +0200] "GET /robots.txt HTTP/1.0" 200 367 www.mysite.co.uk "-" "Sphere Scout&v4.0 - scout at sphere dot com" "-"
Edited July 7, 2008 by perfectpassion
stevel Posted July 10, 2008
I suggest adding "sphere" to spiders.txt. I'll do that in the next revision.
Guest Posted July 10, 2008
Thanks, will do.
SpiceUp Posted July 16, 2008
The spiders at 82.99.30.52 and 82.99.30.13 have a session ID and keep adding stuff to the cart. Can someone tell me how to stop this? Thanks
Edited July 16, 2008 by SpiceUp
SpiceUp Posted July 16, 2008
OrgName: RIPE Network Coordination Centre
OrgID: RIPE
Address: P.O. Box 10096
City: Amsterdam
StateProv:
PostalCode: 1001EB
Country: NL
ReferralServer: whois://whois.ripe.net:43
NetRange: 82.0.0.0 - 82.255.255.255
CIDR: 82.0.0.0/8
NetName: 82-RIPE
NetHandle: NET-82-0-0-0-1
Parent:
NetType: Allocated to RIPE NCC
NameServer: NS-PRI.RIPE.NET
NameServer: NS3.NIC.FR
NameServer: SEC1.APNIC.NET
NameServer: SEC3.APNIC.NET
NameServer: SUNIC.SUNET.SE
NameServer: TINNIE.ARIN.NET
Comment: These addresses have been further assigned to users in
Comment: the RIPE NCC region. Contact information can be found in
Comment: the RIPE database at http://www.ripe.net/whois
RegDate: 2002-11-23
Updated: 2004-03-16
# ARIN WHOIS database, last updated 2008-07-15 19:10
# Enter ? for additional hints on searching ARIN's WHOIS database.
What do I add to my spiders.txt to stop this spider from having a session ID? Thanks
Edited July 16, 2008 by SpiceUp
stevel Posted July 16, 2008
That WHOIS information told you nothing useful. Neither does the IP. What is the line from the access log, including the user agent string?
SpiceUp Posted July 17, 2008
Is this what you need?
82.99.30.70 - - [16/Jul/2008:13:30:03 -0500] "GET /store/catalog/product_info.php?products_id=792&osCsid=5eb127c7df7b17cad70f1e422efd6e71 HTTP/1.0" 200 34628 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
stevel Posted July 17, 2008
Yes, that's it. Unfortunately, the user agent looks like a normal interactive user. You can add a "Deny from" entry to your .htaccess file (if your web host supports that) to block the 82.99.30.* IP range. spiders.txt won't help you here.
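Before blocking a whole range, it can be worth confirming that the addresses seen so far really fall inside it. A small Python sketch follows, assuming the 82.99.30.* wildcard corresponds to the network 82.99.30.0/24. The helper name `in_blocked_range` is illustrative, and the IP list is taken from the posts above.

```python
import ipaddress

# Assumption: "82.99.30.*" from the post above means this /24 network.
BLOCKED_RANGE = ipaddress.ip_network("82.99.30.0/24")

def in_blocked_range(ip):
    """True if the address falls inside the range to be blocked."""
    return ipaddress.ip_address(ip) in BLOCKED_RANGE

# Addresses reported earlier in this thread.
SEEN = ["82.99.30.52", "82.99.30.13", "82.99.30.70"]

for ip in SEEN:
    print(ip, in_blocked_range(ip))
```

Keep in mind that blocking a range this way also shuts out any genuine customers who happen to share it.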
SpiceUp Posted July 17, 2008
Thanks, I got it. Keep up the good work. :)
SpiceUp Posted July 24, 2008
What do I add to stop this Yahoo bot from getting a session ID?
74.6.17.163 - - [23/Jul/2008:08:38:16 -0500] "GET /store/catalog/login.php HTTP/1.0" 302 0 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
Thanks in advance
germ Posted July 24, 2008
Have you tried to disallow that page in your "robots.txt" file? :unsure:
If I suggest you edit any file(s) make a backup first - I'm not perfect and neither are you.
"Given enough impetus a parallelogramatically shaped projectile can egress a circular orifice." - Me -
SpiceUp Posted July 24, 2008
All the other Yahoo bots are OK; just this one gets a session ID.
Edited July 24, 2008 by SpiceUp
SpiceUp Posted July 24, 2008
Quote: Have you tried to disallow that page in your "robots.txt" file? :unsure:
Can you please tell me how to do that? Thanks
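As a rough illustration of what such a rule looks like and what it covers, here is a Python sketch using the standard library's robots.txt parser. The two-line rule set is an assumption built from the login.php path in the log line above, not the contents of anyone's actual robots.txt. The file itself would sit at the root of the site, and it only affects crawlers that choose to honour it.

```python
from urllib import robotparser

# Assumed rule: ask all crawlers to stay away from the login page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /store/catalog/login.php
"""

def build_parser(robots_text):
    """Parse robots.txt text and return a parser that can answer can_fetch()."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)
for path in ("/store/catalog/login.php", "/store/catalog/product_info.php"):
    print(path, parser.can_fetch("Yahoo! Slurp", path))
```

Note that this only asks the crawler not to request login.php; whether a crawler gets a session is still decided by spiders.txt and the "Prevent Spider Sessions" setting.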