stevel

Updated spiders.txt Official Support Topic


Christine,

 

There is a robots.txt contribution and, unlike spiders.txt, it does not need to be kept updated. The robots.txt file goes at the top level of your site, so that it can be found at the URL http://www.example.com/robots.txt for a site example.com. Note that you may have to adjust the contents of robots.txt to reflect the path you use for your store.

 

I recommend that you subscribe to the announcement topic so you can learn of updates.

 

By the way, I occasionally see others post updates of their own, but most of these seem to seriously misunderstand the way this file is used, with most of the additions being redundant or incorrect.


Hi Steve, and thank you also..

 

I can say this much.. I'm glad I'm learning about this now before my new and updated store opens up.. If I can prevent this sort of malicious behavior from happening.. I will.. :)

 

Thanks for all of your help Steve.. I've subscribed to the above link mentioned by you.. And will be keeping up to date with this for sure..

 

Oh, and I did have to do a little work on the robots.txt, being that I don't use the catalog folder.. Thanks for pointing that out. :thumbsup:

 

Christine


If it ain't broke, don't fix it! :)


Sigh - subscribing to the announcement topic doesn't do anything useful anymore. Harald took a major step backwards in usability by preventing contrib authors from updating the announcements topics. Instead, any new or changed contrib gets a new thread automatically added. It's now impossible to track individual contributions. You get all or none.

 

I understand the problem he was trying to solve, but in my view the cure is worse than the disease.

 

So perhaps what you should do is set a reminder for yourself once a month to check the contrib page for updates. Usually I update twice a month, but it depends on how many new spiders I see.


Thanks again Steve.. :)

 

I've already started a folder in my favorites in IE.. So I'm adding anything important that I need referring to, and also checking in on the latest contributions every day too.. :P

 

Thanks

 

Christine


If it ain't broke, don't fix it! :)


Cookies warning page in Firefox.

 

I updated the spiders.txt file and changed the "prevent spider sessions" setting to true. After changing the setting to false, I got a cookies warning box when trying to add a product to my cart. I am using Firefox, which is set to allow cookies in its preferences, but somehow I was unable to add a product.

 

Have I done something wrong in installing / applying this update? I saw several references in the forum that this does not prevent spiders from accessing websites, but I am still unsure of what I am doing. I saw that there are some words in the text file that will keep out Mozilla and Firefox users. Do I need to edit these?

 

I am very new to learning about spiders, robots and the like. Any help is greatly appreciated.

 

Thank you,

Leslie


Hi

 

I have "prevent spider sessions" set to true. Do I use the newest contribution (2/4/06) & replace my existing file in catalog/includes/spiders.txt? I am assuming this is a full list? This is the first time I have touched this file & it is as default. Once I have done this, do I need to check this contribution for updates every so many weeks & replace the file again?

 

I do not have a robots.txt, so do I create (upload) the file in catalog/ or in /? What amendments do I need to make in it, as I am a little confused? I am still amending my site before moving it all up a level.

 

Thanks for your help.

Julie


Yes, you replace the existing file includes/spiders.txt with the one in the latest contribution, which is a full list. I recommend checking once a month or so for updates.

 

robots.txt is a different file for a different purpose. That file goes in the home directory of your web site and is used by search engines. There is a contribution with a suggested robots.txt - the idea is that you want to ask robots to not bother with certain parts of your site, especially those that only an active customer would use (cart, login, my account, write review, etc.)
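
As a rough illustration of the idea (not the contents of the actual contribution), here is a small Python sketch that builds a robots.txt asking robots to skip customer-only pages and then checks it with the standard library parser. The script names and the `build_robots_txt` and `allowed` helpers are assumptions made up for this example; substitute the pages and store path from your own installation.

```python
from urllib.robotparser import RobotFileParser

# Assumed names of customer-only scripts; check them against your own store.
CUSTOMER_ONLY_SCRIPTS = [
    "shopping_cart.php",
    "login.php",
    "account.php",
    "product_reviews_write.php",
]


def build_robots_txt(store_path: str = "/") -> str:
    """Build robots.txt text for a store installed at store_path (hypothetical helper)."""
    prefix = "/" + store_path.strip("/")
    if not prefix.endswith("/"):
        prefix += "/"
    lines = ["User-agent: *"]
    lines += [f"Disallow: {prefix}{script}" for script in CUSTOMER_ONLY_SCRIPTS]
    return "\n".join(lines) + "\n"


def allowed(robots_txt: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if a well-behaved robot may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "http://www.example.com" + path)


if __name__ == "__main__":
    # Store in a catalog/ folder: the Disallow lines carry the /catalog/ prefix.
    rules = build_robots_txt("catalog")
    print(rules)
    for path in ("/catalog/index.php", "/catalog/shopping_cart.php"):
        print(path, "->", "crawl" if allowed(rules, path) else "skip")
```

The file itself still sits at the top level of the site; only the paths inside it change with the store folder. Keep in mind that robots.txt is only a request, so it affects well-behaved robots only.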


Thank you for this contribution. I discovered it after finding that MSN had a ton of links to my site with session id's. I do have a couple of questions.

 

1. Now that I have updated the spiders.txt, will the MSN links with id's eventually go away?

 

2. Is there anything I can do in the meantime to avoid customers getting mixed up and viewing each other's info?

 

 

Thanks again


The MSN links may or may not go away over time. See http://www.oscommerce.com/community/contributions,2819 for a contribution that will remove SIDs coming from search engine links. It is not foolproof, though.

 

One alternative I thought of, but have not tried, is to change the name that osC uses for the session ID parameter, "osCsid". If you change this to something else ("xyzsid" or whatever strikes your fancy), then the indexed links carrying SIDs will no longer attach to sessions. The risk is that any customers currently in the store will lose their sessions as soon as you make the change.
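
To make the reasoning concrete, here is a minimal Python sketch of why a renamed parameter would orphan the old links. It is not osCommerce code: the `session_id_from_url` helper, the example link, and the ID value are all made up for illustration, and it assumes the store simply ignores query parameters it does not recognize.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# A made-up link of the kind a search engine might have indexed.
INDEXED_LINK = "http://www.example.com/product_info.php?products_id=298&osCsid=abc123"


def session_id_from_url(url: str, session_name: str) -> Optional[str]:
    """Return the session ID carried under session_name in the URL, if any."""
    params = parse_qs(urlsplit(url).query)
    values = params.get(session_name)
    return values[0] if values else None


if __name__ == "__main__":
    # With the original name, the indexed link resumes the shared session.
    print(session_id_from_url(INDEXED_LINK, "osCsid"))
    # After renaming, the old parameter is just an unknown query argument.
    print(session_id_from_url(INDEXED_LINK, "xyzsid"))
```

Under that assumption the old links would still load the product page; they would just no longer pick up anyone's session.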


Thanks Steve. I might give that a try in the late hours when traffic is slow. I don't know much about this, so this may be a silly question, but would the links with the "osCsid" still work if I change to "xyZsid"?


i can give you only this from whos online .. i don't have other access ..

 

Active with Cart 01:09:06 Guest 195.250.198.45 20:52:14 22:01:20 /ask_a_question.php?products_id=298&language=It Yes Not Found

Thanks Steve

 

Sorry it has taken a while to reply to your reply.

 

Is this the contribution you would suggest for the robots.txt?

 

http://www.oscommerce.com/community/contri...l/search,robots

 

Thanks

Julie

Ok - add the line:

 

interseek

 

I'll add this to my next update.

 

i added interseek .. but Najdi.si bot is still not recognized ... and it is green .. all bots (zyborg, google, yahoo) are red in Who's online?


I can't help further unless I know what user agent string is being presented by this bot. According to the research I did, this bot includes the string "interseek" in its UA.

 

I'll comment that if the bot is following a URL with a session ID, it will keep the session even if listed in spiders.txt.


Gegor,

 

When that bot comes back, click the Customer profile link in whos online (I assume you're using whos online enhanced since you mentioned green vs red). That should give you the user agent. Get that and paste it here for Steve.

 

ed



It says it is a guest .. but I found the following FAQ on their homepage ..

 

Q: What is User Agent of Najdi.si spiders?

A: User agent of Najdi.si spiders is Mozilla/5.0 (compatible; Najdi.si/3.1)

 

.. this helps?

 

... btw. Najdi.si is the biggest Slovenian search engine ... so, it is important :)
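
For anyone following along, here is a small Python sketch of how this kind of check behaves, assuming each spiders.txt line is matched as a case-insensitive substring of the user agent (an assumption about the mechanism, not a copy of the store's code). The `is_spider` helper and the "najdi.si" entry are hypothetical, used only to show why "interseek" would not match the user agent quoted above.

```python
from typing import Iterable

NAJDI_UA = "Mozilla/5.0 (compatible; Najdi.si/3.1)"


def is_spider(user_agent: str, spider_entries: Iterable[str]) -> bool:
    """Return True if any non-empty entry appears inside the user agent string."""
    ua = user_agent.lower()
    for entry in spider_entries:
        needle = entry.strip().lower()
        if needle and needle in ua:
            return True
    return False


if __name__ == "__main__":
    # "interseek" does not occur in the quoted user agent, so no match.
    print(is_spider(NAJDI_UA, ["interseek"]))
    # A hypothetical entry taken from the user agent itself does match.
    print(is_spider(NAJDI_UA, ["interseek", "najdi.si"]))
```

If the matching really works this way, an entry has to be a piece of text that actually appears in the user agent the bot sends.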

Edited by gregy


Hello,

I am considering using this contribution, and it seems easy enough to implement. Just so I am clear as to its purpose, I am under the impression that the contribution is designed to prevent spiders from creating SID's, which show up when I am found on a google search. I am currently running the following contributions, hopefully they won't conflict:

1. Ultimate URL

2. Header tags

3. Who's online

 

Am I on the right track?

thanks.
