osCommerce


stevel
Pioneers (2,821 posts)
Everything posted by stevel

  1. Well, that's a bit odd. While the hostname has msnbot in it, the user agent string just looks like MSIE. Since spiders.txt only sees the user agent string, there's no way to detect that as a bot.
  2. That's not the user agent string. You want a line that looks something like this:

         220.181.7.44 - - [12/Apr/2010:02:32:03 -0400] "GET /robots.txt HTTP/1.1" 200 451 www.example.com "-" "Baiduspider+(+http://www.baidu.com/search/spider.htm)" "-"

     See the string that starts with "Baiduspider"? That's the user agent. If you're using awstats, you should be able to locate the access log. If you want to block Yandex entirely - and posts I have read suggest that is a good idea - add this to your robots.txt:

         User-agent: Yandex
         Disallow: /
  3. I would remove as13448 from spiders.txt. You can use robots.txt to slow down a spider - read the link I posted (the usual mechanism is the nonstandard Crawl-delay directive, which some crawlers honor but Google does not). AS13448.com is operated by Websense, a company that sells web filtering devices and services. Can you show me a line from your server log indicating an as13448.com IP address?
  4. No - you have to look at the user agent string from the server log and see what it has there. It may not have anything you can use to identify it if it is not a well-behaved bot. Is it causing trouble for you? Yes, you will still see the bots on Who's Online. From experience, I'd say to NOT trust what that says for whether or not the visitor has a session.
  5. spiders.txt does not block search engines from your site. What it does is prevent them from creating sessions, so that they cannot "add to cart" or go places only humans can go, and, most importantly, so that URLs in their index don't contain session IDs.

     When a "bot" visits your site, it (usually) supplies a user agent string that identifies it. Since a lot of bots have the string "ebot" in their user agent strings, that one entry detects all of them - Googlebot is just one. Similarly, "nbot" detects MSNbot and any other bot with "nbot" in its user agent string. These bots are not bad - in fact they are good - you want your site indexed. You just don't want them following "add to cart" links and leaving session IDs in URLs.

     If you actually want to block a bot, the first thing is to add an entry to robots.txt. All well-behaved bots will honor this. See this Wikipedia article for more info. I don't know if Yandex honors it - it probably does. You may have to visit its web site to see what to put in robots.txt.
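     The check itself is just a substring match against the lowercased user agent. Something along these lines (a simplified sketch of what the stock application_top.php code does when "Prevent Spider Sessions" is True - not the exact shipped code):

         // read spiders.txt and flag the visitor as a spider if any entry
         // appears as a substring of the lowercased user agent string
         $user_agent = strtolower(getenv('HTTP_USER_AGENT'));
         $spider_flag = false;
         if ($user_agent != '') {
           foreach (file(DIR_WS_INCLUDES . 'spiders.txt') as $spider) {
             $spider = trim($spider);
             if (($spider != '') && (strpos($user_agent, $spider) !== false)) {
               $spider_flag = true;   // no session will be created for this visitor
               break;
             }
           }
         }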
  6. You will need to ask your web host provider how to get at access logs. In general I don't add "validators" to the list because they usually don't go beyond the home page. There is a non-zero cost to each additional entry. The primary purpose of spiders.txt is to prevent session IDs from appearing in search index results, and secondarily to prevent bots from following "add to cart" links. Validators don't tend to show this problem.
  7. A blank page usually indicates a PHP syntax error. You may want to edit your php.ini file to permit PHP errors to be displayed on the page - offhand I don't recall what that setting is, but I'm sure a web search can find it. You may also want to check your server error logs to see if anything is reported there. I see lots of people are having difficulties with the AJAX-modified version of my contribution. I can't really help with those.
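     (The settings in question are most likely display_errors and error_reporting.) An equivalent test, without editing php.ini, is to add this temporarily at the top of the failing script - a debugging sketch, to be removed once the error is found:

         // show all PHP errors on the page while debugging; note that a
         // parse error in this same file will still produce a blank page -
         // only errors in files included after this point are displayed
         ini_set('display_errors', '1');
         error_reporting(E_ALL);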
  8. Well, yes - if you want to use spiders.txt at all, then you have to set "Prevent Spider Sessions" to True. The other settings have other purposes, though the settings you list here are what I would use.
  9. If you want this all the time, it's straightforward to add a test for your IP to application_top.php that deletes the session if present.
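     Something along these lines, placed in application_top.php after the session is started (an untested sketch - 203.0.113.7 is a placeholder for your own IP, and it assumes the stock $session_started flag and tep_session_destroy() helper):

         // for my own IP only: throw away any session so I always browse
         // the shop as a fresh, session-free visitor
         if ((getenv('REMOTE_ADDR') == '203.0.113.7') && ($session_started == true)) {
           tep_session_destroy();
         }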
  10. spiders.txt doesn't do anything with IPs - it looks at the "user agent" string. What I do to test it is install the "User Agent Switcher" add-on for Firefox, then select Googlebot as my new user agent.
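     Another way to verify the result, if you'd rather not rely on Who's Online, is to log the verdict for each visitor for a while - a throwaway sketch (/tmp/ua_test.log is an arbitrary path, and $spider_flag is the flag set by the spiders.txt check):

         // append one line per request: the user agent string and whether
         // it was classified as a spider; remove after testing
         error_log(getenv('HTTP_USER_AGENT') . ' => ' .
                   ($spider_flag ? 'spider' : 'human') . "\n", 3, '/tmp/ua_test.log');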
  11. Someone familiar with the AJAX version will have to help here.
  12. Yes - this contribution, as originally written, allows for multiple countries. When the customer changes the country using the dropdown for it, the page refreshes and the state field becomes a dropdown if zones are defined, or a text field if not. The AJAX version does this without a refresh, but I did not write that and am not familiar with it.
  13. Ok. The line:

          $country = "223";

      is forcing the country to be US (223 is the default osCommerce country ID for the United States). Try just removing it (or commenting it out) and see what happens. There should be code elsewhere that draws a pull-down list of countries to allow selecting the country - that may have been removed or commented out. Go look at the original edited sources from the contribution and compare them to what you have.
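      In other words, something like this (a sketch, assuming the stock tep_get_country_list() helper from includes/functions/html_output.php is still present in your build):

         // $country = "223";   // hard-coded United States - comment out or delete
         // draw a country pull-down instead, so the customer can choose
         echo tep_get_country_list('country');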
  14. I tested your site and spiders.txt is working correctly. What may have happened is that Google indexed your site before you enabled "Prevent Spider Sessions", and it indexed links with session IDs. Unfortunately, if a link comes in with an ID, spiders.txt is skipped and the session is recreated. To fix that there is a separate add-on, Spider Session Destroyer (I think), that requires changes to the .htaccess file. Even that will take quite a while to purge session IDs from the index. Another approach you can take is to change the name of the session tag from osCsid to something else - that is done in includes/functions/sessions.php.
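      The rename itself is a one-line change - the argument to tep_session_name() (a sketch; depending on your version the call may live in application_top.php instead, and 'myShopSid' is just an example name):

         // rename the session identifier; incoming links that still carry
         // the old osCsid parameter will no longer resurrect a session
         tep_session_name('myShopSid');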
  15. I guess that got removed from PWA. I do have it working with PWA but I did this years ago.
  16. Kira,

      The code you show doesn't look like any version of this contribution I have seen. The version I developed was last posted as "1.4.1", and it is what I would recommend you start with. It is a full version, though the edits are based on 2.2MS2. The later RC is not a lot different in the places where the edits are made.
  17. I don't know how it tells that there is a session for that IP - I have not looked at the code. Since there is no sID in the URL, msnbot would have to be setting a cookie and I very much doubt that. If you don't see msnbot filling up a cart, I would just ignore this.
  18. I don't trust that contribution. The only thing I would consider proof is to see either a session ID in the URL in the second page visited or a visit to a page that requires a session.
  19. Ken,

      What evidence do you have that msnbot is getting a session? I don't see that in your log.
  20. If you see FILENAME_PROTECTION, you did not edit admin/includes/filenames.php. As noted earlier in this thread, some web hosts do not allow PHP code to do the chmod to protect the file. If that is the case, you'll have to find another way to do the protection.
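      For reference, entries in admin/includes/filenames.php look like this (a hypothetical example - the actual script name comes from the contribution's install instructions):

         // map the symbolic name used in the code to the real script name
         define('FILENAME_PROTECTION', 'protection.php');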
  21. Fiana, in order to determine what the problem is, we would need to see the record from your web server log for the users in question - this will have the user agent string that is used to search spiders.txt. It did come to my attention yesterday that the string "ng/" was causing some legitimate web browsers to be tagged as spiders, so I have just uploaded an update that removes that string.

      shopgrl, how do you know that it is msnbot and that it is getting session IDs? Can you show me lines from your access log that show this? I'll note that if msnbot already has a session ID and includes it in the URL, it will keep it unless you take additional actions.
  22. Petre, I have integrated the contribution with sites running RC2a, but haven't updated the instructions. The basic concepts are unchanged.
  23. I wish this forum didn't keep expiring my subscriptions.

      Andrei, you want spiders.txt so that Google and Yahoo can index your site without picking up session IDs, which then get added to the search results. It also prevents search engines from following "add to cart" links and filling up carts with items. It does not prevent any search engine from indexing your site - it helps it do so efficiently.

      Julie, sorry I didn't see your post until now. If you're still having trouble, post again.

      I just uploaded a new update, but only one new string.
  24. As you say, the string "nbot" should pick up msnbot, and it does in my tests. I can't speak for those other mods.
  25. Ok, thanks. Odd that I can't find this in lists or discussions of known spiders.