stevel Posted June 22, 2007 (Author)

Check your access log for the accesses by the payment service and look at the user agent string. I was unaware that the presence of "java/" was a problem.

Steve
Contributions: Country-State Selector, Login Page a la Amazon, Protection of Configuration, Updated spiders.txt, Embed Links with SID in Description
kev@num Posted June 22, 2007

Thanks Steve. I'll have a little search and post back if I find anything out. :)
dca Posted July 7, 2007

Hello. Lately I have http://www.66-195-77-130.static.twtelecom.net/ regularly visiting and adding items to the shopping cart... a headache, really. I would just like this user not to get a session ID. How can I do that? What do I put in spiders.txt?

dca
stevel Posted July 7, 2007 (Author)

Nothing - spiders.txt is not for banning individual users. But what you can do is look for the following code in includes/application_top.php:

    if ($spider_flag == false) {
      tep_session_start();
      $session_started = true;
    }

Just before this, insert:

    $ip_address = tep_get_ip_address();
    if ($ip_address == '66.195.77.130') $spider_flag = true;

This will prevent that specific IP from getting a session. If you want to add more, just repeat the "if" line and specify another IP.

Steve
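If several addresses need blocking, repeating the "if" line works but gets unwieldy. A minimal sketch of the same idea using an array and in_array() - the IP values and the $banned_ips name are illustrative, and in the real application_top.php the address would come from tep_get_ip_address() rather than being hardcoded:

```php
<?php
// Illustrative list of IPs that should never receive a session.
$banned_ips = array(
    '66.195.77.130',
    '208.99.195.54',
);

// In application_top.php this would be tep_get_ip_address();
// hardcoded here so the sketch is self-contained.
$ip_address = '208.99.195.54';

// Treat a banned IP exactly like a spider: when $spider_flag is true,
// the block above never calls tep_session_start() for this request.
$spider_flag = in_array($ip_address, $banned_ips);

echo $spider_flag ? "blocked\n" : "allowed\n";
```

Run standalone, this prints "blocked". Unlisted addresses leave $spider_flag false, so normal visitors still get sessions.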
Victor Wise Posted July 7, 2007

Thanks Steve :) I just installed the new spiders.txt file :thumbsup:

Best Regards, Victor Wise
kunal247 Posted July 9, 2007

Hi Steve, I have this IP address creating sessions on my site all the time and adding random products:

    208.99.195.54 - - [09/Jul/2007:08:48:52 +0100] "GET /interactive-whiteboards-c-60.html?osCsid= HTTP/1.0" 200 46338 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)"

It used to crawl my site about 6 months ago and then stopped; now it seems to be back again in a big way. Can you help?

Kunal
kunal247 Posted July 9, 2007

Quoting Tiger: "Thanks Steve, I'll try to suss out what I need to do. I've had some bot-type behaviour on my site but it's getting a session ID - there's no user agent but it comes from IP 208.99.195.54. How would I know if this is a bot? It's not someone browsing the store, I'm sure."

Tiger, I am also getting the same IP address coming up on my site and adding products. It has been doing this for a while now. I am pretty convinced it is not a customer but a bot - you can tell by the kind of items it adds to its cart.

Kunal
stevel Posted July 9, 2007 (Author)

If it's a bot, it's trying hard to pretend it isn't one. You can't use spiders.txt for this, but you can use the IP test in application_top.php that I suggested earlier.

Steve
kunal247 Posted July 9, 2007

Steve, thanks for your response. Is there a way to add multiple IP addresses in the format you gave earlier? Also, can we somehow block unknown IP addresses, or refuse entry to the site if the IP address is unknown?

Kunal
stevel Posted July 9, 2007 (Author)

As I suggested earlier, just add another "if" line for another IP. I don't know what you mean by an "unknown" IP. You ALWAYS know the IP of the connection.

Steve
kunal247 Posted July 9, 2007 Share Posted July 9, 2007 (edited) As I suggested earlier, just add another "if" line for another IP. I don't know what you mean by an "unknown" IP. You ALWAYS know the IP of the connection. I have for the last 3 months an unknown IP address roaming on my site which creates sessions. 00:00:00 Guest unknown 17:22:16 17:22:16 Belkin F1U125UKIT - F1U125UKIT (Product) Yes Not Found The above line is from my Who is online and that is what I see. I have checked my access log too but on that it some how does not appear. appreciate any help on this as I cant get rid of it at all. Kunal Edited July 9, 2007 by kunal247 Quote Link to comment Share on other sites More sharing options...
stevel Posted July 9, 2007 (Author)

I don't know what triggers the "unknown" in that line, but it certainly isn't that the IP address is not known. Perhaps it's not finding a hostname translation, though these are unreliable in many cases. You'll have to study the actual access log to see what you can do about it.

Steve
tigergirl Posted July 26, 2007

Quoting Steve: "Ok, my error. You have to make TWO changes to tep_create_sort_heading in general.php. Change this:

    global $PHP_SELF;
    $sort_prefix = '';
    $sort_suffix = '';
    if ($sortby) {

to this:

    global $PHP_SELF;
    global $session_started;
    $sort_prefix = '';
    $sort_suffix = '';
    if ($sortby && $session_started) {
"

Hi Steve, I tried this code in includes/functions/general.php but the bots are still sorting products - did I do something wrong?

Thanks, Tiger

I'm feeling lucky today... maybe someone will answer my post! I do try and answer a simple post when I can, just to give something back.
PM me? - I'm not for hire
stevel Posted July 26, 2007 (Author)

If they have remembered URLs with sort values, they'll keep using them. But not displaying the sort values will help down the road.

Steve
tigergirl Posted July 27, 2007

Thanks Steve, I see. I wish my memory was as good as the bots'! I guess there's no way to wipe their memory? Not sure if this is off topic, but I tried to create a site map after I installed the sort code above. The xml-sitemaps bot must have gotten a SID, as it was sorting products and adding items to the cart. The site map is then useless, as it's full of sort links. Any way to stop it getting a SID?

Thanks again, Tiger
stevel Posted July 27, 2007 (Author)

There's a contribution, "Spider Session Remover", which uses rewrite rules in .htaccess to remove SIDs from incoming spider links, with the disadvantage that you have to name the spiders (Yahoo, Google, etc.). You could extend that to sort links.

A drastic option, but one that will kill all existing SIDs spiders have remembered, is to change the session ID name string; the default is osCsid. This is defined in the session code somewhere (I'm away from my sources).

As for the sort tags, one thing I did for another store was, after disabling the sort links for spiders, add a new parameter to the URL, such as &nn=1. If the incoming URL had a sort tag and NOT nn, it got a 404 response. (One could also do a 301 permanent redirect, but I have found some spiders ignore that.) This was custom code I no longer have handy, so I'll have to leave it as an exercise for the reader.

Steve
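The two ideas above can be sketched as mod_rewrite rules in .htaccess. This is a hypothetical fragment, not the actual Spider Session Remover code: the parameter names osCsid, sort, and nn match the defaults discussed in this thread, but the rules themselves are illustrative, and the first one simply drops the entire query string rather than splicing out just the SID as the contribution does.

```apache
RewriteEngine On

# 1. Known spiders arriving with a remembered osCsid: redirect them to
#    the same page with the query string cleared (the trailing "?" in
#    the substitution drops it).
RewriteCond %{HTTP_USER_AGENT} (Googlebot|Slurp|msnbot) [NC]
RewriteCond %{QUERY_STRING} (^|&)osCsid= [NC]
RewriteRule ^(.*)$ /$1? [R=301,L]

# 2. The "nn" marker trick: any URL carrying a sort parameter but NOT
#    the nn parameter gets a 404, so remembered sort links die out.
RewriteCond %{QUERY_STRING} (^|&)sort= [NC]
RewriteCond %{QUERY_STRING} !(^|&)nn= [NC]
RewriteRule ^ - [R=404,L]
```

With rules like these, links the store itself generates would need to carry nn=1 on sorted pages so legitimate customers are not caught by the second rule.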
tigergirl Posted July 29, 2007

Thanks Steve, I have Spider Session Remover but don't know how to adapt it for sort links. Also, how can I stop site map crawlers getting SIDs? I can't seem to find their user agent.

Thanks again, Tiger
stevel Posted July 29, 2007 (Author)

If crawlers are still getting SIDs, then perhaps you have not properly enabled "Prevent Spider Sessions". If you'll give me your store URL, I'll check. I'm sorry to say that giving you detailed instructions on eliminating sort links is beyond what I'll be able to do for you. If you're using the Spider Session Remover, the idea is to apply the same technique it uses to remove "osCsid" to the "sort" parameter.

Steve
tigergirl Posted July 30, 2007

Thanks again, I'll have another look at the code for the .htaccess file. I have set Prevent Spider Sessions to true, which works well - it's just that the ones crawling for site maps seem to get a SID. I'm assuming that's because they're not listed in spiders.txt, but I don't know their names anyhow.

Tiger
stevel Posted July 30, 2007 (Author)

How do you know they're crawling the site map? If you find the entries in your web server access log, there will be a user agent string. If the spider is well-intentioned, it will have a UA string that identifies itself (and isn't empty or pretending to be MSIE, for example).

Steve
tigergirl Posted July 31, 2007

I am trying to make a site map, so I enter my URL and they crawl the site in order to build it. Some have no user agent; they get SIDs, add stuff to the cart, see sort links, try to write reviews, etc. Most of the site map generators only allow 500 links, and when they do all the things I mentioned I always hit the maximum, yet end up with a site map full of links I don't want or need. Maybe I'm just using rubbish site map generators? Anyone know of any good ones?
stevel Posted July 31, 2007 (Author)

Oh, I get it now. The sitemap generator ought to supply a distinct user agent. If it doesn't, find another. However, I find that the All Products Page contribution works just fine for making it easy for spiders to walk the site.

Steve
tigergirl Posted August 2, 2007

I didn't know about "All Products", so I'll take a look. Thanks for the tip.

Tiger
imusorka Posted August 18, 2007

Hi Steve. Thanks for the file. Could you please clarify how much slower the page load is with the "large" file?
stevel Posted August 18, 2007 (Author)

I can't really say - I haven't measured it. I just figure it's a lot more data to read and loop through on every page access.

Steve