warrenerjm Posted October 10, 2009
Hi Steve,
I can't access my logs at the moment, but a "spider?" has been on my site for the last 3 days filling a basket. All I can see is AToulouse-253-1-7-80.w92-146.abo.wanadoo.fr, which when I click on it goes to http://www.showmyip.com/?ip=92.146.202.80&get=nmap, and "geospoofing" comes up. I suspect you can't do anything until I can look at my logs, but has anyone else had this IP address show up doing the same thing?
Thanks,
Julie
Guest Posted November 21, 2009
Sorry, newbie question. Why do I need spiders.txt if I want my website indexed by search engines such as Google and Yahoo? Please help.
stevel (Author) Posted January 3, 2010
I wish this forum didn't keep expiring my subscriptions.
Andrei, you want spiders.txt so that Google and Yahoo can index your site without picking up session IDs, which would otherwise end up in the search results. It also prevents search engines from following "add to cart" links and filling up carts with items. It does not prevent any search engine from indexing your site - it helps it do so efficiently.
Julie, sorry I didn't see your post until now. If you're still having trouble, post again.
I just uploaded a new update, but it adds only one new string.
Steve
Contributions: Country-State Selector, Login Page a la Amazon, Protection of Configuration, Updated spiders.txt, Embed Links with SID in Description
Fiana Posted February 22, 2010
I've read the better part of this thread because I have a problem with my shop, but I haven't found the answer. Some, maybe even many, of my customers cannot log in or register, nor see the shopping cart.
I was told this is related to robots.txt, so I removed it, but the customers still have the same problem. After reading this thread, I believe it has more to do with spiders.txt, since it mentions 'google' and 'googlebot'. Is this correct?
For clarity I have added my spiders.txt underneath. I hope you can help me out with this!
Fiana
abachoabcdatos abcsearch acoon adsarobot aesop ah-ha alkalinebot almaden altavista antibot anzwerscrawl aol search appie arachnoidea araneo architext ariadne arianna ask jeeves aspseek asterias astraspider atomz augurfind backrub baiduspider bannana_bot bbot bdcindexer blindekuh boitho boito borg-bot bsdseek christcrawler computer_and_automation_research_institute_crawler coolbot cosmos crawler crawler@fast crawlerboy cruiser cusco cyveillance deepindex denmex dittospyder docomo dogpile dtsearch elfinbot entire web esismartspider exalead excite ezresult fast fast-webcrawler fdse felix fido findwhat finnish firefly firstgov fluffy freecrawl frooglebot galaxy gaisbot geckobot gencrawler geobot gigabot girafa goclick goliat google googlebot griffon gromit grub-client gulliver gulper henrythemiragorobot hometown hotbot htdig hubater ia_archiver ibm_planetwide iitrovatore-setaccio incywincy incrawler indy infonavirobot infoseek ingrid inspectorwww intelliseek internetseer ip3000.com-crawler iron33 jcrawler jeeves jubii kanoodle kapito kit_fireball kit-fireball ko_yappo_robot kototoi lachesis larbin legs linkwalker lnspiderguy look.com lycos mantraagent markwatch maxbot mercator merzscope meshexplorer metacrawler mirago mnogosearch moget motor msn msnbot muscatferret nameprotect nationaldirectory naverrobot nazilla ncsa beta netnose netresearchserver ng/1.0 northerlights npbot nttdirectory_robot nutchorg nzexplorer odp openbot openfind osis-project overture perlcrawler phpdig pjspide polybot pompos poppi portalb psbot quepasacreep rabot raven rhcs robi robocrawl robozilla roverbot scooter scrubby search.ch search.com.ua searchfeed searchspider searchuk seventwentyfour sidewinder sightquestbot skymob sleek slider_search slurp solbot speedfind speedy spida spider_monkey spiderku stackrambler steeler suchbot suchknecht.at-robot suntek szukacz surferf3 surfnomore surveybot suzuran synobot tarantula teomaagent teradex t-h-u-n-d-e-r-s-t-o-n-e tigersuche topiclink toutatis tracerlock turnitinbot tutorgig uaportal uasearch.kiev.ua uksearcher ultraseek unitek vagabondo verygoodsearch vivisimo voilabot voyager vscooter w3index w3c_validator wapspider wdg_validator webcrawler webmasterresourcesdirectory webmoose websearchbench webspinne whatuseek whizbanglab winona wire wotbox wscbot www.webwombat.com.au xenu link sleuth xyro yahoobot yahoo! slurp yandex yellopet-spider zao/0 zealbot zippy zyborg
shopgrl Posted March 5, 2010
I am having problems with the msn bot initiating session IDs when it visits my site. It is also getting into the shopping cart, which I tried to block using robots.txt, but it is ignoring that. I also have the updated spiders.txt file. So what do I do now? Should I just block every msn bot that does this?
stevel (Author) Posted March 30, 2010
Fiana, to determine what the problem is, we would need to see the records from your web server log for the users in question - these will contain the user agent string that is searched against spiders.txt. It did come to my attention yesterday that the string "ng/" was causing some legitimate web browsers to be tagged as spiders, so I have just uploaded an update that removes that string.
shopgrl, how do you know that it is msnbot and that it is getting session IDs? Can you show me lines from your access log that show this? I'll note that if msnbot already has a session ID and includes it in the URL, it will keep it unless you take additional actions.
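A rough illustration of why a short entry such as "ng/" is risky. It assumes the check is a case-insensitive substring search of the user agent against each spiders.txt entry; it is written in Python for illustration and is not the osCommerce PHP code. The function name and the first sample user agent are invented; the other two user agents are copied from the access log posted later in this thread, and "msn" is one of the entries in the list Fiana posted.

```python
def matching_entries(user_agent, spider_entries):
    """Return every entry that occurs as a substring of the user agent."""
    ua = user_agent.lower()
    return [e.strip().lower() for e in spider_entries
            if e.strip() and e.strip().lower() in ua]

entries = ["googlebot", "msnbot", "msn", "ng/"]

# Invented browser user agent, used only to show how "ng/" can match.
made_up_browser = "Mozilla/5.0 (compatible; ExampleMessaging/1.0)"

# Two user agents taken from the access log in this thread.
msie8_browser = ("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; "
                 "YPC 3.0.2; GTB6.4; .NET CLR 1.1.4322; .NET CLR 2.0.50727; "
                 ".NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; msn OptimizedIE8;ENGB)")
msnbot = "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"

for ua in (made_up_browser, msie8_browser, msnbot):
    print(matching_entries(ua, entries))
```

Under that assumption, any entry that happens to appear inside a real browser's user agent would cause that visitor to be treated as a spider, which is why short, generic strings are the ones most likely to need removing.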
Ken44 Posted March 30, 2010
Hi Steve,
Thanks for all your hard work keeping this list up to date.
I have also noticed MSN bot coming to my site with a session (although most of the time it is recognised as a spider). I saw it happen again this morning, so I have attached part of my logfile in the hope that it will help to solve the problem.
The spiders.txt in use when this logfile was created was dated 26/9/09 (long version). I updated to the latest spiders.txt this afternoon.
Regards,
Ken
Line 826: mysite.co.uk 207.46.199.40 - - [29/Mar/2010:02:16:48 +0000] "GET /product_info.php?products_id=162 HTTP/1.1" 200 8020 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 1110: mysite.co.uk 207.46.199.49 - - [29/Mar/2010:02:53:52 +0000] "GET / HTTP/1.1" 200 9106 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 1322: mysite.co.uk 207.46.13.51 - - [29/Mar/2010:03:59:20 +0000] "GET /tutorial_telescope.php HTTP/1.1" 200 13447 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 1327: mysite.co.uk 207.46.204.192 - - [29/Mar/2010:04:08:24 +0000] "GET /product_info.php?products_id=535 HTTP/1.1" 200 7055 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 1413: mysite.co.uk 207.46.195.231 - - [29/Mar/2010:04:41:24 +0000] "GET /index.php?cPath=31 HTTP/1.1" 200 5966 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 1422: mysite.co.uk 207.46.199.38 - - [29/Mar/2010:05:04:23 +0000] "GET /index.php?cPath=33 HTTP/1.1" 200 7972 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 1482: mysite.co.uk 207.46.13.93 - - [29/Mar/2010:05:19:55 +0000] "GET /product_info.php?products_id=800 HTTP/1.1" 200 7675 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 1552: mysite.co.uk 207.46.13.146 - - [29/Mar/2010:05:36:24 +0000] "GET /tutorial_binoculars.php HTTP/1.1" 200 11806 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 1632: mysite.co.uk 207.46.204.230 - - [29/Mar/2010:06:17:25 +0000] "GET /product.php?id_product=293 HTTP/1.1" 404 6453 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 2212: mysite.co.uk 65.55.106.135 - - [29/Mar/2010:07:39:32 +0000] "GET /robots.txt HTTP/1.1" 200 5183 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 2216: mysite.co.uk 207.46.204.191 - - [29/Mar/2010:07:41:55 +0000] "GET /index.php?cPath=9_10 HTTP/1.1" 200 8105 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 2317: mysite.co.uk 207.46.195.241 - - [29/Mar/2010:08:42:48 +0000] "GET /robots.txt HTTP/1.1" 200 5183 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 2415: mysite.co.uk 86.164.128.118 - - [29/Mar/2010:09:08:04 +0000] "GET /images/infobox/arrow_right.gif HTTP/1.1" 200 852 "http://www.mysite.co.uk/product_info.php?products_id=279" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; YPC 3.0.2; GTB6.4; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; msn OptimizedIE8;ENGB)"
Line 2436: mysite.co.uk 86.164.128.118 - - [29/Mar/2010:09:08:22 +0000] "GET /favicon.ico HTTP/1.1" 200 1150 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; YPC 3.0.2; GTB6.4; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; msn OptimizedIE8;ENGB)"
Line 2554: mysite.co.uk 207.46.13.135 - - [29/Mar/2010:09:15:49 +0000] "GET /product_info.php?products_id=475 HTTP/1.1" 200 7771 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 2984: mysite.co.uk 207.46.199.51 - - [29/Mar/2010:10:13:20 +0000] "GET /product_info.php?products_id=528 HTTP/1.1" 200 7373 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 3516: mysite.co.uk 207.46.204.185 - - [29/Mar/2010:10:26:25 +0000] "GET /product_info.php?products_id=838 HTTP/1.1" 200 7287 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 3546: mysite.co.uk 65.55.230.156 - - [29/Mar/2010:10:30:19 +0000] "GET / HTTP/1.1" 200 9071 "-" "msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)"
Line 3727: mysite.co.uk 207.46.195.230 - - [29/Mar/2010:10:36:55 +0000] "GET /product_info.php?products_id=255 HTTP/1.1" 200 7409 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 3738: mysite.co.uk 207.46.204.228 - - [29/Mar/2010:10:55:01 +0000] "GET / HTTP/1.1" 200 9095 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 3740: mysite.co.uk 207.46.199.193 - - [29/Mar/2010:10:59:32 +0000] "GET /robots.txt HTTP/1.1" 200 5183 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 3882: mysite.co.uk 207.46.204.241 - - [29/Mar/2010:11:04:57 +0000] "GET /product_reviews_info.php?products_id=304&reviews_id=24 HTTP/1.1" 200 7259 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 4829: mysite.co.uk 207.46.199.41 - - [29/Mar/2010:12:17:20 +0000] "GET /tutorial_telescope.php HTTP/1.1" 200 13434 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 5382: mysite.co.uk 207.46.204.230 - - [29/Mar/2010:12:58:51 +0000] "GET /index.php?cPath=1_7 HTTP/1.1" 200 8784 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 5635: mysite.co.uk 207.46.204.233 - - [29/Mar/2010:13:45:51 +0000] "GET /index.php?cPath=23_24 HTTP/1.1" 200 8269 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 6192: mysite.co.uk 65.55.106.228 - - [29/Mar/2010:14:47:53 +0000] "GET /robots.txt HTTP/1.1" 200 5183 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 6200: mysite.co.uk 207.46.199.37 - - [29/Mar/2010:14:56:03 +0000] "GET /index.php?cPath=1_17 HTTP/1.1" 200 7147 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 6670: mysite.co.uk 207.46.199.46 - - [29/Mar/2010:15:24:05 +0000] "GET /tutorial_glossary.php HTTP/1.1" 200 10966 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 6683: mysite.co.uk 65.55.207.101 - - [29/Mar/2010:15:29:11 +0000] "GET /robots.txt HTTP/1.1" 200 5183 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 6898: mysite.co.uk 207.46.13.43 - - [29/Mar/2010:15:54:10 +0000] "GET / HTTP/1.1" 200 9086 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 6947: mysite.co.uk 207.46.199.44 - - [29/Mar/2010:15:56:04 +0000] "GET /index.php?cPath=31_32 HTTP/1.1" 200 8031 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 7008: mysite.co.uk 207.46.199.48 - - [29/Mar/2010:16:12:05 +0000] "GET /product_info.php?products_id=143 HTTP/1.1" 200 7456 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 7081: mysite.co.uk 207.46.13.43 - - [29/Mar/2010:16:33:04 +0000] "GET /product_info.php?products_id=347 HTTP/1.1" 200 8096 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 7454: mysite.co.uk 207.46.13.132 - - [29/Mar/2010:17:11:03 +0000] "GET /index.php?manufacturers_id=22 HTTP/1.1" 200 8584 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 7457: mysite.co.uk 207.46.13.47 - - [29/Mar/2010:17:12:34 +0000] "GET /product_info.php?products_id=829 HTTP/1.1" 200 7005 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 7523: mysite.co.uk 207.46.199.53 - - [29/Mar/2010:17:14:36 +0000] "GET /telescopes-c-1.html HTTP/1.1" 404 6463 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 8343: mysite.co.uk 207.46.204.191 - - [29/Mar/2010:18:09:34 +0000] "GET /index.php?cPath=23 HTTP/1.1" 200 6178 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 9159: mysite.co.uk 207.46.13.139 - - [29/Mar/2010:19:11:04 +0000] "GET /product_info.php?products_id=188 HTTP/1.1" 200 7893 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 9462: mysite.co.uk 207.46.13.141 - - [29/Mar/2010:19:44:05 +0000] "GET /index.php?cPath=9_10 HTTP/1.1" 200 8052 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 9815: mysite.co.uk 65.55.207.27 - - [29/Mar/2010:20:07:27 +0000] "GET /robots.txt HTTP/1.1" 200 5183 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 10855: mysite.co.uk 207.46.204.229 - - [29/Mar/2010:22:20:46 +0000] "GET /robots.txt HTTP/1.1" 200 5183 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 10881: mysite.co.uk 65.55.207.75 - - [29/Mar/2010:22:59:35 +0000] "GET /robots.txt HTTP/1.1" 200 5183 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
stevel (Author) Posted March 30, 2010
Ken, what evidence do you have that msnbot is getting a session? I don't see that in your log.
Ken44 Posted March 30, 2010
stevel wrote: "What evidence do you have that msnbot is getting a session? I don't see that in your log."
Hi Steve,
I agree, I can't see anything in the log either. The only evidence I have is that I sometimes see MSN Bot listed in ‘Who's Online 3.5.4’ with a session. If I see it happen again I will try to get a screenshot.
Regards,
Ken
stevel (Author) Posted March 30, 2010 (Edited March 30, 2010 by stevel)
I don't trust that contribution. The only thing I would consider proof is either a session ID in the URL of the second page visited, or a visit to a page that requires a session.
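One way to look for that kind of proof in an access log, sketched in Python. It assumes log lines shaped like the ones Ken posted and a session parameter named osCsid (the name that appears in URLs elsewhere in this thread). The function name and the second sample line are invented for illustration. The sketch flags any spider request whose URL carries a session ID, which is a looser test than "the second page visited".

```python
import re

# Matches: "METHOD url PROTOCOL" status bytes "referer" "user agent"
LOG_RE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<url>\S+) [^"]*" \d+ \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def spider_requests_with_sid(log_lines, agent_marker="msnbot", sid_name="osCsid"):
    """Return URLs requested by the given spider that contain a session ID."""
    hits = []
    for line in log_lines:
        match = LOG_RE.search(line)
        if match is None:
            continue  # not an access-log line in the expected shape
        is_spider = agent_marker in match.group("agent").lower()
        has_sid = (sid_name + "=") in match.group("url")
        if is_spider and has_sid:
            hits.append(match.group("url"))
    return hits

sample = [
    # Copied from the log posted in this thread: no session ID in the URL.
    'mysite.co.uk 207.46.199.40 - - [29/Mar/2010:02:16:48 +0000] '
    '"GET /product_info.php?products_id=162 HTTP/1.1" 200 8020 "-" '
    '"msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    # Invented line showing what a hit would look like.
    'mysite.co.uk 192.0.2.1 - - [29/Mar/2010:02:17:00 +0000] '
    '"GET /index.php?cPath=31&osCsid=abc123 HTTP/1.1" 200 5966 "-" '
    '"msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
]

print(spider_requests_with_sid(sample))
```

An empty result for a real log would be consistent with the spider not carrying a session in its URLs; it would not rule out a cookie-based session.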
Ken44 Posted March 30, 2010
Hi Steve,
Here is a sample of MSN showing a session in ‘Who's Online 3.5.4’, and the associated logfile entries for products_id=790 (1 hour different):
mysite.co.uk 207.46.12.160 - - [30/Mar/2010:19:11:12 +0000] "GET /product_info.php?products_id=790 HTTP/1.1" 200 8185 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)"
mysite.co.uk 207.46.12.160 - - [30/Mar/2010:19:11:15 +0000] "GET /stylesheet.css HTTP/1.1" 200 26299 "http://www.mysite.co.uk/product_info.php?products_id=790" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)"
mysite.co.uk 207.46.12.160 - - [30/Mar/2010:19:11:15 +0000] "GET /images/js/jquery.js HTTP/1.1" 200 57254 "http://www.mysite.co.uk/product_info.php?products_id=790" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)"
mysite.co.uk 207.46.12.160 - - [30/Mar/2010:19:11:24 +0000] "GET /images/js/jquery.fancybox.js HTTP/1.1" 200 15976 "http://www.mysite.co.uk/product_info.php?products_id=790" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)"
mysite.co.uk 207.46.12.160 - - [30/Mar/2010:19:11:25 +0000] "GET /images/js/jquery.easing.js HTTP/1.1" 200 8097 "http://www.mysite.co.uk/product_info.php?products_id=790" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)"
mysite.co.uk 207.46.12.160 - - [30/Mar/2010:19:11:25 +0000] "GET /images/js/jquery.easydrag.handler.beta2.js HTTP/1.1" 200 4507 "http://www.mysite.co.uk/product_info.php?products_id=790" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)"
mysite.co.uk 207.46.12.160 - - [30/Mar/2010:19:11:26 +0000] "GET /images/js/fancybox.css HTTP/1.1" 200 4861 "http://www.mysite.co.uk/product_info.php?products_id=790" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)"
mysite.co.uk 207.46.12.160 - - [30/Mar/2010:19:11:28 +0000] "GET /product_thumb.php?img=images/10221.jpg&w=185&h=185 HTTP/1.1" 200 5727 "http://www.mysite.co.uk/product_info.php?products_id=790" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)"
The IP address 207.46.12.160 does belong to Microsoft. Registry Whois:
OrgName: Microsoft Corp
OrgID: MSFT
Address: One Microsoft Way
City: Redmond
StateProv: WA
PostalCode: 98052
Country: US
However, I cannot see any session created, so I guess ‘Who's Online 3.5.4’ is giving a false output.
Regards,
Ken
stevel (Author) Posted March 30, 2010
I don't know how it decides that there is a session for that IP - I have not looked at the code. Since there is no sID in the URL, msnbot would have to be accepting a cookie, and I very much doubt that. If you don't see msnbot filling up a cart, I would just ignore this.
Ken44 Posted March 30, 2010
Hi Steve,
It does appear that there is nothing to worry about, as I have never seen msnbot with an item in the shopping cart. I thought this was a spiders.txt problem because IP 207.46.12.160 is shown as a guest. I will take a look at the ‘Who's Online 3.5.4’ code and see if I can find the problem.
Thanks for your help.
Regards,
Ken
Lary_an Posted April 10, 2010
Hi Steve,
I think that Googlebot is still getting a session ID on my site. I have the latest spiders.txt (short) and I have "Prevent Spider Sessions" set to True. Google Webmaster Tools shows a lot of "unreachable" pages, and most of them have a session ID, like the ones below:
http://babyfavorsandgifts.com/product_reviews.php?osCsid=m7olfuopm0t8vcmit9huet8cu7
http://babyfavorsandgifts.com/product_reviews_info.php?osCsid=2025e1gnmm4jvcsucg15c9fkr7
http://babyfavorsandgifts.com/reviews.php?page=2&osCsid=q7kkphtapgqg7emvmaoabn1cg0
http://babyfavorsandgifts.com/reviews.php?page=3&osCsid=q7kkphtapgqg7emvmaoabn1cg0
http://babyfavorsandgifts.com/reviews.php?page=4&osCsid=q7kkphtapgqg7emvmaoabn1cg0
http://babyfavorsandgifts.com/reviews.php?page=5&osCsid=q7kkphtapgqg7emvmaoabn1cg0
http://babyfavorsandgifts.com/round-cake-pr-63.html?pID=63&osCsid=nc0upf01g50kdugiql3n7lal37
http://babyfavorsandgifts.com/toys-c-30.html?osCsid=5t0nu9jcjgj4truariuseq1fo2
http://babyfavorsandgifts.com/tricycle-diaper-cake-p-67.html?imagesID=420&osCsid=4jvjslhe1po00i9idtiurt3o34
http://babyfavorsandgifts.com/tricycle-diaper-cake-p-67.html?imagesID=521
http://babyfavorsandgifts.com/tricycle-diaper-cake-p-67.html?pID=67&osCsid=gikjf8ei9uu1jndrpp3709av80
http://babyfavorsandgifts.com/washcloth-cakes-p-117.html?osCsid=o4q93bmtnden799kgnn91h80r0
http://babyfavorsandgifts.com/washcloth-cakes-set-of-5-p-117.html?osCsid=0t23s3pk5muikpe2nho44s9ri3
http://babyfavorsandgifts.com/washcloth-cakes-set-of-5-p-117.html?osCsid=pslh69ftolqe8s2rvbik5qqg56
Am I missing something? I would really appreciate the help. Thanks.
stevel (Author) Posted April 10, 2010
I tested your site and spiders.txt is working correctly. What may have happened is that Google indexed your site before you enabled "Prevent Spider Sessions", so it indexed links with session IDs. Unfortunately, if a link comes in with an ID, the spiders.txt check is skipped and the session is recreated. To fix that there is a separate add-on, Spider Session Destroyer (I think), that requires changes to the .htaccess file. Even with that, it will take quite a while for the session IDs to be purged from the index.
Another approach you can take is to change the name of the session tag from osCsid to something else - that is done in includes/functions/sessions.php.
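The general idea behind cleaning up such links is to send the spider to the same URL without the session parameter. The Python sketch below only shows that URL rewrite; it is not the add-on's code (which works through .htaccess and is not reproduced in this thread), and the function name is made up. The sample URLs are taken from the list posted above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def strip_session_id(url, sid_name="osCsid"):
    """Return the URL with the session parameter removed, other parameters kept."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key != sid_name]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )

print(strip_session_id(
    "http://babyfavorsandgifts.com/reviews.php?page=2&osCsid=q7kkphtapgqg7emvmaoabn1cg0"
))
print(strip_session_id(
    "http://babyfavorsandgifts.com/toys-c-30.html?osCsid=5t0nu9jcjgj4truariuseq1fo2"
))
```

A redirect to the cleaned URL, issued only when the visitor's user agent matches spiders.txt, is presumably what gradually replaces the indexed links; whether that is done in PHP or in rewrite rules is an implementation choice.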
Lary_an Posted April 11, 2010
Hi Steve,
Thanks for the quick reply. That is definitely it: I looked more carefully at the pages and noticed some that I haven't had for months. I guess I have to wait for them to go away. Thanks again.
craigan Posted June 1, 2010
Is there a way I can add my own IP to the spiders file so that when I visit it thinks I'm a spider?
stevel (Author) Posted June 3, 2010
spiders.txt doesn't do anything with IPs - it looks at the "user agent" string. What I do to test it is install the "User Agent Switcher" add-on for Firefox, then select Googlebot as my new user agent.
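The same test can be scripted instead of done in a browser. The Python sketch below sends a request with a Googlebot-style user agent and checks whether the returned page contains links carrying a session ID. The function names are invented, osCsid is assumed to be the session parameter name (as in the URLs posted earlier), and the shop URL in the usage note is a placeholder.

```python
import urllib.request

# A commonly published Googlebot user agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def build_request(url, user_agent=GOOGLEBOT_UA):
    """Build a request that identifies itself with the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def page_has_session_links(html, sid_name="osCsid"):
    """True if the page source contains the session parameter anywhere."""
    return (sid_name + "=") in html

def spider_gets_session(url):
    """Fetch the page as a spider and report whether session IDs show up."""
    with urllib.request.urlopen(build_request(url), timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    return page_has_session_links(html)
```

Calling `spider_gets_session("http://www.example.com/index.php")` against your own shop (the URL here is a placeholder) should return False if spiders.txt is doing its job, under the assumptions above.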
craigan Posted June 3, 2010
stevel wrote: "spiders.txt doesn't do anything with IPs - it looks at the "user agent" string. What I do to test it is install the "User Agent Switcher" add-on for Firefox, then select Googlebot as my new user agent."
Thanks for your reply, Steve. I am trying to figure out a way to get my site to think I'm a spider and not assign a session ID when I visit the site. Back to the drawing board... :)
stevel (Author) Posted June 3, 2010
If you want this all the time, it's straightforward to add a test for your IP to application_top.php that deletes the session if present.
Francys Posted June 3, 2010
The proper configuration of Admin > Configuration > Sessions is this, isn't it?
Force Cookie Use: False
Check User Agent: True
Prevent Spider Sessions: True
Validate the client's browser user agent on every page request: True
stevel (Author) Posted June 3, 2010
Francys wrote: "The proper configuration of Admin>Configuration>Sessions it this isnt it ?? Force Cookie Use False Check User Agent True Prevent Spider Sessions True Validate the clients browser user agent on every page request True."
Well, yes - if you want to use spiders.txt at all, then you have to set "Prevent Spider Sessions" to True. The other settings have other purposes, though the settings you list here are what I would use.
Francys Posted June 14, 2010 (Edited June 14, 2010 by Francys)
stevel wrote: "Well, yes - if you want to use spiders.txt at all, then you have to set "Prevent Spider Sessions" to True. The other settings have other purposes, though the settings you list here are what I would use."
Thanks, stevel.
Regarding this topic: I have found that the bots from www.feedvalidator.org (the tool for checking whether feeds are compliant) are not recognized as bots when they reach the shop site, but as clients. The user agent is FeedValidator/1.3; I don't remember the rest, such as the IP and name.
Congratulations on spiders.txt, it's so useful.
I recently had a visitor that showed nothing on my website (no IP, no name, no user agent, and it said "reserved"). What kind of firewall is this? And where can we retrieve the logfiles of the visits without using cPanel?
Thanks,
Francys
stevel (Author) Posted June 14, 2010
You will need to ask your web hosting provider how to get at the access logs.
In general I don't add "validators" to the list because they usually don't go beyond the home page, and there is a non-zero cost to each additional entry. The primary purpose of spiders.txt is to prevent session IDs from appearing in search index results, and secondarily to prevent bots from following "add to cart" links. Validators don't tend to cause these problems.
Androider Posted July 2, 2010
Can someone please explain this to me? I downloaded this and put the spiders.txt file in the catalogue/includes folder, but on reading the readme.txt file I found this confusing:
"For example, "ebot" matches Googlebot, "nbot" matches msnbot."
What does this mean? Does it mean that because I have "ebot" in the spiders.txt file, Googlebot will not index my site for their search engine? Or is "Googlebot" a harmful robot that just takes my bandwidth and is not from the company Google? Please, someone explain this to me. I just wanted yandex.ru off my site because they were always there...
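The readme sentence describes substring matching: an entry identifies a spider if it appears anywhere in the visitor's user agent. The small Python sketch below shows that reading, assuming a case-insensitive comparison; the function name is made up and this is not the shop's own PHP code. As explained earlier in the thread, a match only means the visitor is not given a session; it does not block the visitor or stop a search engine from indexing the site.

```python
def is_spider(user_agent, entries):
    """True if any spiders.txt-style entry occurs inside the user agent."""
    ua = user_agent.lower()
    return any(entry.lower() in ua for entry in entries)

entries = ["ebot", "nbot"]

googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
msnbot = "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
browser = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2)"

print(is_spider(googlebot, entries))  # "ebot" is inside "googlebot"
print(is_spider(msnbot, entries))     # "nbot" is inside "msnbot"
print(is_spider(browser, entries))    # neither entry appears
```

Keeping a crawler such as yandex off the site entirely is a different job from what spiders.txt does.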