stevel

Updated spiders.txt Official Support Topic


Hi Steve

 

I can't access my logs at the moment, but something that may be a spider has been on my site for the last 3 days filling a basket. All I can see is the hostname AToulouse-253-1-7-80.w92-146.abo.wanadoo.fr; clicking on it goes to http://www.showmyip.com/?ip=92.146.202.80&get=nmap, where "geospoofing" comes up.

I suspect you can't do anything until I can look at my logs, but has anyone had this IP address show up doing the same thing please?

 

Thanks

Julie


Sorry, newbie question.

Why do I need spiders.txt if I want my website indexed by search engines such as Google and Yahoo?

Please help.


I wish this forum didn't keep expiring my subscriptions.

 

Andrei, you want spiders.txt so that Google and Yahoo can index your site without picking up session IDs, which then get added to the search results. It also prevents search engines from following "add to cart" links and filling up carts with items. It does not prevent any search engine from indexing your site - it helps it do so efficiently.

 

Julie, sorry I didn't see your post until now. If you're still having trouble, post again.

 

I just uploaded a new update, but only one new string.


I've read the better part of this thread because I have a problem with my shop, but I haven't found the answer.

Some, maybe even many, of my customers cannot log in or register, nor see the shopping cart. I was told this is related to robots.txt, so I removed it, but the customers still have the same problem. After reading this thread, I believe it has more to do with spiders.txt, which mentions 'google' and 'googlebot'. Is this correct?

For clarity, I have added my spiders.txt underneath.

I hope you can help me out with this!

 

Fiana

 

abacho

abcdatos

abcsearch

acoon

adsarobot

aesop

ah-ha

alkalinebot

almaden

altavista

antibot

anzwerscrawl

aol search

appie

arachnoidea

araneo

architext

ariadne

arianna

ask jeeves

aspseek

asterias

astraspider

atomz

augurfind

backrub

baiduspider

bannana_bot

bbot

bdcindexer

blindekuh

boitho

boito

borg-bot

bsdseek

christcrawler

computer_and_automation_research_institute_crawler

coolbot

cosmos

crawler

crawler@fast

crawlerboy

cruiser

cusco

cyveillance

deepindex

denmex

dittospyder

docomo

dogpile

dtsearch

elfinbot

entire web

esismartspider

exalead

excite

ezresult

fast

fast-webcrawler

fdse

felix

fido

findwhat

finnish

firefly

firstgov

fluffy

freecrawl

frooglebot

galaxy

gaisbot

geckobot

gencrawler

geobot

gigabot

girafa

goclick

goliat

google

googlebot

griffon

gromit

grub-client

gulliver

gulper

henrythemiragorobot

hometown

hotbot

htdig

hubater

ia_archiver

ibm_planetwide

iitrovatore-setaccio

incywincy

incrawler

indy

infonavirobot

infoseek

ingrid

inspectorwww

intelliseek

internetseer

ip3000.com-crawler

iron33

jcrawler

jeeves

jubii

kanoodle

kapito

kit_fireball

kit-fireball

ko_yappo_robot

kototoi

lachesis

larbin

legs

linkwalker

lnspiderguy

look.com

lycos

mantraagent

markwatch

maxbot

mercator

merzscope

meshexplorer

metacrawler

mirago

mnogosearch

moget

motor

msn

msnbot

muscatferret

nameprotect

nationaldirectory

naverrobot

nazilla

ncsa beta

netnose

netresearchserver

ng/1.0

northerlights

npbot

nttdirectory_robot

nutchorg

nzexplorer

odp

openbot

openfind

osis-project

overture

perlcrawler

phpdig

pjspide

polybot

pompos

poppi

portalb

psbot

quepasacreep

rabot

raven

rhcs

robi

robocrawl

robozilla

roverbot

scooter

scrubby

search.ch

search.com.ua

searchfeed

searchspider

searchuk

seventwentyfour

sidewinder

sightquestbot

skymob

sleek

slider_search

slurp

solbot

speedfind

speedy

spida

spider_monkey

spiderku

stackrambler

steeler

suchbot

suchknecht.at-robot

suntek

szukacz

surferf3

surfnomore

surveybot

suzuran

synobot

tarantula

teomaagent

teradex

t-h-u-n-d-e-r-s-t-o-n-e

tigersuche

topiclink

toutatis

tracerlock

turnitinbot

tutorgig

uaportal

uasearch.kiev.ua

uksearcher

ultraseek

unitek

vagabondo

verygoodsearch

vivisimo

voilabot

voyager

vscooter

w3index

w3c_validator

wapspider

wdg_validator

webcrawler

webmasterresourcesdirectory

webmoose

websearchbench

webspinne

whatuseek

whizbanglab

winona

wire

wotbox

wscbot

www.webwombat.com.au

xenu link sleuth

xyro

yahoobot

yahoo! slurp

yandex

yellopet-spider

zao/0

zealbot

zippy

zyborg


I am having problems with msnbot initiating session IDs when it visits my site. It is also getting into the shopping cart, which I tried to block using robots.txt, but it is ignoring it. I also have the updated spiders.txt file. So what do I do now? Should I just block every msnbot that does this?


Fiana, in order to determine what the problem is, we would need to see the record from your web server log for the users in question - this will have the user agent string that is used to search spiders.txt. It did come to my attention yesterday that the string "ng/" was causing some legitimate web browsers to be tagged as spiders, so I have just uploaded an update that removes that string.

 

shopgrl, how do you know that it is msnbot and that it is getting session IDs? Can you show me lines from your access log that show this? I'll note that if msnbot already has a session ID and includes it in the URL, it will keep it unless you take additional actions.
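
The user-agent lookup described above can be sketched roughly as follows. This is only an illustration in Python, not the shop's actual code: it assumes the check is a case-insensitive substring search of the user agent against each line of spiders.txt, which is consistent with short strings such as "ebot" matching Googlebot and "nbot" matching msnbot. The function names and the BROWSER string are invented for the example; BROWSER is a hypothetical user agent chosen only because it contains "ng/".

```python
def load_spider_strings(text):
    """Return the non-empty entries of a spiders.txt-style list, lowercased."""
    return [line.strip().lower() for line in text.splitlines() if line.strip()]


def matching_entry(user_agent, spider_strings):
    """Return the first entry found anywhere in the user agent, or None."""
    ua = user_agent.lower()
    for entry in spider_strings:
        if entry in ua:
            return entry
    return None


# A few entries for illustration; "ng/" is the string removed in the update.
OLD_LIST = load_spider_strings("ebot\nnbot\nslurp\nng/\n")
NEW_LIST = load_spider_strings("ebot\nnbot\nslurp\n")

MSNBOT = "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Hypothetical browser string, invented only because it contains "ng/".
BROWSER = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowsing/1.0"

if __name__ == "__main__":
    for ua in (MSNBOT, GOOGLEBOT, BROWSER):
        print(ua, "| old list:", matching_entry(ua, OLD_LIST),
              "| new list:", matching_entry(ua, NEW_LIST))
```

Because the match is a plain substring test, every extra short entry raises the chance that an ordinary browser is tagged as a spider, which is why a string like "ng/" can cause trouble.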


Hi Steve

 

Thanks for all your hard work keeping this list up to date.

I have also noticed msnbot coming to my site with a session (although most of the time it is recognised as a spider).

I saw it happen again this morning, so I have attached part of my logfile in the hope that it will help to solve the problem.

The spiders.txt (long version) in use when this logfile was created was dated 26/9/09. I updated to the latest spiders.txt this afternoon.

 

 

Regards

 

Ken

 

 

 

Line 826: mysite.co.uk 207.46.199.40 - - [29/Mar/2010:02:16:48 +0000] "GET /product_info.php?products_id=162 HTTP/1.1" 200 8020 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 1110: mysite.co.uk 207.46.199.49 - - [29/Mar/2010:02:53:52 +0000] "GET / HTTP/1.1" 200 9106 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 1322: mysite.co.uk 207.46.13.51 - - [29/Mar/2010:03:59:20 +0000] "GET /tutorial_telescope.php HTTP/1.1" 200 13447 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 1327: mysite.co.uk 207.46.204.192 - - [29/Mar/2010:04:08:24 +0000] "GET /product_info.php?products_id=535 HTTP/1.1" 200 7055 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 1413: mysite.co.uk 207.46.195.231 - - [29/Mar/2010:04:41:24 +0000] "GET /index.php?cPath=31 HTTP/1.1" 200 5966 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 1422: mysite.co.uk 207.46.199.38 - - [29/Mar/2010:05:04:23 +0000] "GET /index.php?cPath=33 HTTP/1.1" 200 7972 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 1482: mysite.co.uk 207.46.13.93 - - [29/Mar/2010:05:19:55 +0000] "GET /product_info.php?products_id=800 HTTP/1.1" 200 7675 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 1552: mysite.co.uk 207.46.13.146 - - [29/Mar/2010:05:36:24 +0000] "GET /tutorial_binoculars.php HTTP/1.1" 200 11806 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 1632: mysite.co.uk 207.46.204.230 - - [29/Mar/2010:06:17:25 +0000] "GET /product.php?id_product=293 HTTP/1.1" 404 6453 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 2212: mysite.co.uk 65.55.106.135 - - [29/Mar/2010:07:39:32 +0000] "GET /robots.txt HTTP/1.1" 200 5183 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 2216: mysite.co.uk 207.46.204.191 - - [29/Mar/2010:07:41:55 +0000] "GET /index.php?cPath=9_10 HTTP/1.1" 200 8105 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 2317: mysite.co.uk 207.46.195.241 - - [29/Mar/2010:08:42:48 +0000] "GET /robots.txt HTTP/1.1" 200 5183 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 2415: mysite.co.uk 86.164.128.118 - - [29/Mar/2010:09:08:04 +0000] "GET /images/infobox/arrow_right.gif HTTP/1.1" 200 852 "http://www.mysite.co.uk/product_info.php?products_id=279" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; YPC 3.0.2; GTB6.4; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; msn OptimizedIE8;ENGB)"
Line 2436: mysite.co.uk 86.164.128.118 - - [29/Mar/2010:09:08:22 +0000] "GET /favicon.ico HTTP/1.1" 200 1150 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; YPC 3.0.2; GTB6.4; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; msn OptimizedIE8;ENGB)"
Line 2554: mysite.co.uk 207.46.13.135 - - [29/Mar/2010:09:15:49 +0000] "GET /product_info.php?products_id=475 HTTP/1.1" 200 7771 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 2984: mysite.co.uk 207.46.199.51 - - [29/Mar/2010:10:13:20 +0000] "GET /product_info.php?products_id=528 HTTP/1.1" 200 7373 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 3516: mysite.co.uk 207.46.204.185 - - [29/Mar/2010:10:26:25 +0000] "GET /product_info.php?products_id=838 HTTP/1.1" 200 7287 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 3546: mysite.co.uk 65.55.230.156 - - [29/Mar/2010:10:30:19 +0000] "GET / HTTP/1.1" 200 9071 "-" "msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)"
Line 3727: mysite.co.uk 207.46.195.230 - - [29/Mar/2010:10:36:55 +0000] "GET /product_info.php?products_id=255 HTTP/1.1" 200 7409 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 3738: mysite.co.uk 207.46.204.228 - - [29/Mar/2010:10:55:01 +0000] "GET / HTTP/1.1" 200 9095 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 3740: mysite.co.uk 207.46.199.193 - - [29/Mar/2010:10:59:32 +0000] "GET /robots.txt HTTP/1.1" 200 5183 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 3882: mysite.co.uk 207.46.204.241 - - [29/Mar/2010:11:04:57 +0000] "GET /product_reviews_info.php?products_id=304&reviews_id=24 HTTP/1.1" 200 7259 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 4829: mysite.co.uk 207.46.199.41 - - [29/Mar/2010:12:17:20 +0000] "GET /tutorial_telescope.php HTTP/1.1" 200 13434 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 5382: mysite.co.uk 207.46.204.230 - - [29/Mar/2010:12:58:51 +0000] "GET /index.php?cPath=1_7 HTTP/1.1" 200 8784 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 5635: mysite.co.uk 207.46.204.233 - - [29/Mar/2010:13:45:51 +0000] "GET /index.php?cPath=23_24 HTTP/1.1" 200 8269 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 6192: mysite.co.uk 65.55.106.228 - - [29/Mar/2010:14:47:53 +0000] "GET /robots.txt HTTP/1.1" 200 5183 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 6200: mysite.co.uk 207.46.199.37 - - [29/Mar/2010:14:56:03 +0000] "GET /index.php?cPath=1_17 HTTP/1.1" 200 7147 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 6670: mysite.co.uk 207.46.199.46 - - [29/Mar/2010:15:24:05 +0000] "GET /tutorial_glossary.php HTTP/1.1" 200 10966 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 6683: mysite.co.uk 65.55.207.101 - - [29/Mar/2010:15:29:11 +0000] "GET /robots.txt HTTP/1.1" 200 5183 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 6898: mysite.co.uk 207.46.13.43 - - [29/Mar/2010:15:54:10 +0000] "GET / HTTP/1.1" 200 9086 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 6947: mysite.co.uk 207.46.199.44 - - [29/Mar/2010:15:56:04 +0000] "GET /index.php?cPath=31_32 HTTP/1.1" 200 8031 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 7008: mysite.co.uk 207.46.199.48 - - [29/Mar/2010:16:12:05 +0000] "GET /product_info.php?products_id=143 HTTP/1.1" 200 7456 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 7081: mysite.co.uk 207.46.13.43 - - [29/Mar/2010:16:33:04 +0000] "GET /product_info.php?products_id=347 HTTP/1.1" 200 8096 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 7454: mysite.co.uk 207.46.13.132 - - [29/Mar/2010:17:11:03 +0000] "GET /index.php?manufacturers_id=22 HTTP/1.1" 200 8584 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 7457: mysite.co.uk 207.46.13.47 - - [29/Mar/2010:17:12:34 +0000] "GET /product_info.php?products_id=829 HTTP/1.1" 200 7005 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 7523: mysite.co.uk 207.46.199.53 - - [29/Mar/2010:17:14:36 +0000] "GET /telescopes-c-1.html HTTP/1.1" 404 6463 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 8343: mysite.co.uk 207.46.204.191 - - [29/Mar/2010:18:09:34 +0000] "GET /index.php?cPath=23 HTTP/1.1" 200 6178 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 9159: mysite.co.uk 207.46.13.139 - - [29/Mar/2010:19:11:04 +0000] "GET /product_info.php?products_id=188 HTTP/1.1" 200 7893 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 9462: mysite.co.uk 207.46.13.141 - - [29/Mar/2010:19:44:05 +0000] "GET /index.php?cPath=9_10 HTTP/1.1" 200 8052 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 9815: mysite.co.uk 65.55.207.27 - - [29/Mar/2010:20:07:27 +0000] "GET /robots.txt HTTP/1.1" 200 5183 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 10855: mysite.co.uk 207.46.204.229 - - [29/Mar/2010:22:20:46 +0000] "GET /robots.txt HTTP/1.1" 200 5183 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
Line 10881: mysite.co.uk 65.55.207.75 - - [29/Mar/2010:22:59:35 +0000] "GET /robots.txt HTTP/1.1" 200 5183 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"


What evidence do you have that msnbot is getting a session? I don't see that in your log.

 

 

Hi Steve

 

I agree, I can’t see anything in the log either.

The only evidence I have is that sometimes I see MSN Bot in ‘Who's Online 3.5.4’ with a session.

If I see it happen again, I will try to get a screenshot.

 

Regards

 

Ken


I don't trust that contribution. The only thing I would consider proof is either a session ID in the URL of the second page visited, or a visit to a page that requires a session.
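
As a rough way to look for the first kind of proof, one could scan the access log for spider requests whose URL carries a session ID. The sketch below is in Python and assumes the log format shown in this thread and the session parameter name osCsid; the function name is made up for illustration, and the second sample line (IP and session ID) is invented to show what a positive hit would look like. It does not cover the second kind of proof, a visit to a page that requires a session.

```python
import re

# Matches the request and user-agent fields of the log lines shown in this thread.
LOG_RE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<url>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def requests_with_session_id(lines, agent_substring="msnbot", param="osCsid"):
    """Return URLs requested by a matching user agent that carry a session ID."""
    hits = []
    for line in lines:
        m = LOG_RE.search(line)
        if not m:
            continue  # not a request line in the expected format
        is_spider = agent_substring in m.group("agent").lower()
        has_session = (param + "=") in m.group("url")
        if is_spider and has_session:
            hits.append(m.group("url"))
    return hits


SAMPLE = [
    # Taken from the log excerpt earlier in this thread.
    'mysite.co.uk 207.46.199.40 - - [29/Mar/2010:02:16:48 +0000] '
    '"GET /product_info.php?products_id=162 HTTP/1.1" 200 8020 "-" '
    '"msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    # Invented line (hypothetical IP and session ID) showing what proof would look like.
    'mysite.co.uk 192.0.2.1 - - [29/Mar/2010:02:17:10 +0000] '
    '"GET /index.php?cPath=31&osCsid=abc123 HTTP/1.1" 200 5966 "-" '
    '"msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
]

if __name__ == "__main__":
    print(requests_with_session_id(SAMPLE))
```

An empty result for the real log lines would agree with the reading that msnbot is not being handed a session ID in the URL.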

Edited by stevel


Hi Steve

 

Here is a sample of MSN showing a session in ‘Who's Online 3.5.4’:

(attached image: msn3.jpg)

And the associated logfile entries for products_id=790 (the timestamps differ by 1 hour):

 

mysite.co.uk 207.46.12.160 - - [30/Mar/2010:19:11:12 +0000] "GET /product_info.php?products_id=790 HTTP/1.1" 200 8185 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2;  SLCC1;  .NET CLR 1.1.4325;  .NET CLR 2.0.50727;  .NET CLR 3.0.04506.648)"
mysite.co.uk 207.46.12.160 - - [30/Mar/2010:19:11:15 +0000] "GET /stylesheet.css HTTP/1.1" 200 26299 "http://www.mysite.co.uk/product_info.php?products_id=790" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2;  SLCC1;  .NET CLR 1.1.4325;  .NET CLR 2.0.50727;  .NET CLR 3.0.04506.648)"
mysite.co.uk 207.46.12.160 - - [30/Mar/2010:19:11:15 +0000] "GET /images/js/jquery.js HTTP/1.1" 200 57254 "http://www.mysite.co.uk/product_info.php?products_id=790" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2;  SLCC1;  .NET CLR 1.1.4325;  .NET CLR 2.0.50727;  .NET CLR 3.0.04506.648)"
mysite.co.uk 207.46.12.160 - - [30/Mar/2010:19:11:24 +0000] "GET /images/js/jquery.fancybox.js HTTP/1.1" 200 15976 "http://www.mysite.co.uk/product_info.php?products_id=790" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2;  SLCC1;  .NET CLR 1.1.4325;  .NET CLR 2.0.50727;  .NET CLR 3.0.04506.648)"
mysite.co.uk 207.46.12.160 - - [30/Mar/2010:19:11:25 +0000] "GET /images/js/jquery.easing.js HTTP/1.1" 200 8097 "http://www.mysite.co.uk/product_info.php?products_id=790" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2;  SLCC1;  .NET CLR 1.1.4325;  .NET CLR 2.0.50727;  .NET CLR 3.0.04506.648)"
mysite.co.uk 207.46.12.160 - - [30/Mar/2010:19:11:25 +0000] "GET /images/js/jquery.easydrag.handler.beta2.js HTTP/1.1" 200 4507 "http://www.mysite.co.uk/product_info.php?products_id=790" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2;  SLCC1;  .NET CLR 1.1.4325;  .NET CLR 2.0.50727;  .NET CLR 3.0.04506.648)"
mysite.co.uk 207.46.12.160 - - [30/Mar/2010:19:11:26 +0000] "GET /images/js/fancybox.css HTTP/1.1" 200 4861 "http://www.mysite.co.uk/product_info.php?products_id=790" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2;  SLCC1;  .NET CLR 1.1.4325;  .NET CLR 2.0.50727;  .NET CLR 3.0.04506.648)"
mysite.co.uk 207.46.12.160 - - [30/Mar/2010:19:11:28 +0000] "GET /product_thumb.php?img=images/10221.jpg&w=185&h=185 HTTP/1.1" 200 5727 "http://www.mysite.co.uk/product_info.php?products_id=790" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2;  SLCC1;  .NET CLR 1.1.4325;  .NET CLR 2.0.50727;  .NET CLR 3.0.04506.648)"

 

 

 

The IP address 207.46.12.160 does belong to Microsoft:

 

 

Registry Whois

 

OrgName: Microsoft Corp

OrgID: MSFT

Address: One Microsoft Way

City: Redmond

StateProv: WA

PostalCode: 98052

Country: US

 

 

 

However, I cannot see any session created, so I guess ‘Who's Online 3.5.4’ is giving false output.

 

 

Regards

 

Ken


I don't know how it tells that there is a session for that IP - I have not looked at the code. Since there is no sID in the URL, msnbot would have to be setting a cookie and I very much doubt that. If you don't see msnbot filling up a cart, I would just ignore this.


Hi Steve

 

It does appear that there is nothing to worry about, as I have never seen msnbot with an item in the shopping cart.

I thought this was a spiders.txt problem because IP 207.46.12.160 is seen as a guest.

I will take a look at the ‘Who's Online 3.5.4’ code and see if I can find the problem.

 

Thanks for your help

 

Regards

 

Ken


Hi Steve,

 

I think that Googlebot is still getting a session ID on my site. I have the latest spiders.txt (short) and I have "Prevent Spider Sessions" set to True.

Google Webmaster Tools shows a lot of "unreachable" pages, and most of them have a session ID, like the ones below:

http://babyfavorsandgifts.com/product_reviews.php?osCsid=m7olfuopm0t8vcmit9huet8cu7

http://babyfavorsandgifts.com/product_reviews_info.php?osCsid=2025e1gnmm4jvcsucg15c9fkr7

http://babyfavorsandgifts.com/reviews.php?page=2&osCsid=q7kkphtapgqg7emvmaoabn1cg0

http://babyfavorsandgifts.com/reviews.php?page=3&osCsid=q7kkphtapgqg7emvmaoabn1cg0

http://babyfavorsandgifts.com/reviews.php?page=4&osCsid=q7kkphtapgqg7emvmaoabn1cg0

http://babyfavorsandgifts.com/reviews.php?page=5&osCsid=q7kkphtapgqg7emvmaoabn1cg0

http://babyfavorsandgifts.com/round-cake-pr-63.html?pID=63&osCsid=nc0upf01g50kdugiql3n7lal37

http://babyfavorsandgifts.com/toys-c-30.html?osCsid=5t0nu9jcjgj4truariuseq1fo2

http://babyfavorsandgifts.com/tricycle-diaper-cake-p-67.html?imagesID=420&osCsid=4jvjslhe1po00i9idtiurt3o34

http://babyfavorsandgifts.com/tricycle-diaper-cake-p-67.html?imagesID=521

http://babyfavorsandgifts.com/tricycle-diaper-cake-p-67.html?pID=67&osCsid=gikjf8ei9uu1jndrpp3709av80

http://babyfavorsandgifts.com/washcloth-cakes-p-117.html?osCsid=o4q93bmtnden799kgnn91h80r0

http://babyfavorsandgifts.com/washcloth-cakes-set-of-5-p-117.html?osCsid=0t23s3pk5muikpe2nho44s9ri3

http://babyfavorsandgifts.com/washcloth-cakes-set-of-5-p-117.html?osCsid=pslh69ftolqe8s2rvbik5qqg56

 

Am I missing something?

 

I would really appreciate the help.

 

Thanks.


I tested your site and spiders.txt is working correctly. What may have happened is that Google indexed your site before you enabled "Prevent Spider Sessions" and it indexed links with session IDs. Unfortunately, if the link comes in with an ID, spiders.txt is skipped and it recreates the session. To fix that there is a separate add-on, Spider Session Destroyer (I think), that requires changes to the .htaccess file. Even that will take quite a while to purge session IDs from the index.

 

Another approach you can take is to change the name of the session tag from osCsid to something else - that is done in includes/functions/sessions.php
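
To make the clean-up idea concrete, here is a small Python sketch of the URL rewrite that any session-stripping redirect would need to compute: the same URL with the session parameter removed. It is not the add-on's implementation (which is said to work through .htaccess); the function name is invented for illustration and the parameter name osCsid is taken from the URLs posted above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_session_id(url, param="osCsid"):
    """Return the URL with the session parameter removed; other parameters keep their order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    for url in (
        "http://babyfavorsandgifts.com/reviews.php?page=2&osCsid=q7kkphtapgqg7emvmaoabn1cg0",
        "http://babyfavorsandgifts.com/toys-c-30.html?osCsid=5t0nu9jcjgj4truariuseq1fo2",
    ):
        print(strip_session_id(url))
```

A spider that requests an old indexed link with a session ID could then be sent a permanent redirect to the stripped URL, so the index gradually replaces the old entries.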


Hi Steve,

 

Thanks for the quick reply.

That is definitely it. I looked more carefully at the pages and noticed some that I have not had for months.

I guess I will have to wait for them to go away.

 

Thanks again.


Is there a way I can add my own IP to the spiders file so that when I visit it thinks I'm a spider?


spiders.txt doesn't do anything with IPs - it looks at the "user agent" string. What I do to test it is install the "User Agent Switcher" add-on for Firefox, then select Googlebot as my new user agent.
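
The same test can be scripted instead of using a browser add-on. The Python sketch below sends a request that presents Googlebot's user-agent string and then looks for session IDs in the returned HTML. The function names are invented for illustration, the parameter name osCsid is the one seen in this thread, and the shop URL is whatever you pass in. Treat it as a rough check only: it assumes session IDs would show up as osCsid=... in the page's links.

```python
import re
import urllib.request

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url, user_agent=GOOGLEBOT_UA):
    """Build a GET request that presents the given user-agent string."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def session_ids_in(html, param="osCsid"):
    """Return the distinct session IDs that appear in the page source."""
    return sorted(set(re.findall(re.escape(param) + r"=([A-Za-z0-9]+)", html)))


def fetch_and_check(url):
    """Fetch the page as 'Googlebot' and list any session IDs found in its HTML."""
    with urllib.request.urlopen(build_request(url), timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return session_ids_in(html)
```

Calling fetch_and_check on your own shop's front page should return an empty list if "Prevent Spider Sessions" is on and the user agent matches an entry in spiders.txt.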



Thanks for your reply, Steve. I am trying to figure out a way to get my site to think I'm a spider and not assign a session ID when I visit the site. Back to the drawing board... :)


The proper configuration of Admin>Configuration>Sessions is this, isn't it?

 

Force Cookie Use False

Check User Agent True

Prevent Spider Sessions True

Validate the clients browser user agent on every page request True.



Well, yes - if you want to use spiders.txt at all, then you have to set "Prevent Spider Sessions" to True. The other settings have other purposes, though the settings you list here are what I would use.



Thanks stevel

 

Regarding this topic: I have found that the bots from www.feedvalidator.org (the tool for checking whether feeds are compliant) are recognized as clients rather than bots when they reach the shop site. Check the user agent: FeedValidator/1.3. I don't remember the rest, such as the IP and name. Congratulations on spiders.txt, it's so useful.

I recently had a visitor for whom my website showed nothing (no IP, no name, no user agent; it said "reserved"). What kind of firewall does this?

Where can we retrieve the logfiles of visits without using cPanel?

 

thanks

Francys

Edited by Francys


You will need to ask your web host provider how to get at access logs.

 

In general I don't add "validators" to the list because they usually don't go beyond the home page. There is a non-zero cost to each additional entry. The primary purpose of spiders.txt is to prevent session IDs from appearing in search index results, and secondarily to prevent bots from following "add to cart" links. Validators don't tend to show this problem.


Can someone please explain this to me.

 

So I downloaded this and put the spiders.txt file in the catalogue/includes folder.

 

But upon reading the readme.txt file I found this to be somewhat confusing.

 

"For example, "ebot" matches Googlebot, "nbot" matches msnbot."

 

What does this mean?

Does this mean that because I have "ebot" in my spiders.txt file, Googlebot will not index my site on their search engine?

Or is "Googlebot" a harmful robot that just takes my bandwidth and is not from the company Google?

Please, someone explain this to me. I just wanted yandex.ru off my site because they were always there.
