mugitty Posted March 7, 2003

Yeah, I know, you can't find that thread here. It's at: http://www.oscommerce.com/forums/viewtopic.php?t=36577 (forum rules: no questions in Tips & Tricks)

I noticed 2 entries in Burt's spider string and just want to ask if they are correct, as 1 has a space and 1 an @:

appie 1.1
[email protected]

Is the above how they should appear in the code?

And a clarification, if you don't mind... Is the ONLY thing being used for SID control the block below, with the spiders' string, added to html_output.php? Nothing at all from the other SID Killer code or in any other files?

    $spiders = array("almaden.ibm.com", "appie 1.1", "asterias2.0", "augurfind", "baiduspider",
      "bannana_bot", "bdcindexer", "docomo", "fast-webcrawler", "frooglebot", "geobot", "googlebot",
      "henrythemiragorobot", "infoseek", "sidewinder", "lachesis", "mercator", "moget/1.0",
      "nationaldirectory-webspider", "naverrobot", "ncsa beta", "netresearchserver", "ng/1.0",
      "osis-project", "polybot", "pompos", "scooter", "seventwentyfour", "slurp/si", "[email protected]",
      "steeler/1.3", "szukacz", "teoma", "turnitinbot", "vagabondo", "w3c_validator", "zao/0",
      "zyborg/1.0");
    foreach($spiders as $Val) {
      if (ereg($Val, strtolower($_SERVER["HTTP_USER_AGENT"]))) {
        // Edit out one of these as necessary depending upon your version of html_output.php
        //$sess = NULL;
        $sid = NULL;
        break;
      }
    }

Thanks :!:

... if you want to REALLY see something that doesn't set up right out of the box without some tweaking, try being a Foster Parent!
cpv1st Posted March 7, 2003

Yes, this would be helpful information... I have searched the archives and there is not really a definitive approach to this matter... I see all different types of scripts... Which one is the best? I see bad and good on Ian's SID Killer...

I am just a newbee....
mugitty Posted March 7, 2003

...sorry, 1 other question... Where can one go to check if this is working? I notice that if I check it at http://www.searchengineworld.com/cgi-bin/s.../sim_spider.cgi the SIDs show. Is there another place I can check?
cpv1st Posted March 7, 2003

Where can you go to check how many times a particular spider is hitting your website, and where can you go to see all the sessions?
mugitty Posted March 7, 2003

OK, found a sim that seems to actually sim one of the spiders in the list: http://www.webconfs.com/search-engine-spid...r-simulator.php

Without Burt's code: all kinds of SIDs appearing
With the code: no SIDs appear

I think I'm a happy camper :?: :)
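For anyone who would rather run the same kind of check locally than rely on an online simulator, here is a minimal sketch. It assumes the store's session parameter is named osCsid; adjust that if your session name differs. Both function names are hypothetical helpers written for this illustration and are not part of osCommerce.

```php
<?php
// Hypothetical helper: returns every link URL in $html that carries a
// session ID parameter. The default name 'osCsid' is an assumption.
function find_sid_links(string $html, string $sessionName = 'osCsid'): array
{
    // Matches href="...?osCsid=..." as well as "...&osCsid=..." and "...&amp;osCsid=..."
    $pattern = '/href\s*=\s*["\']([^"\']*(?:[?&]|&amp;)'
             . preg_quote($sessionName, '/')
             . '=[^"\']*)["\']/i';
    preg_match_all($pattern, $html, $matches);
    return $matches[1];
}

// Hypothetical helper: fetches $url while presenting $userAgent as the
// User-Agent header. Returns an empty string if the request fails.
function fetch_as(string $url, string $userAgent): string
{
    $context = stream_context_create(array(
        'http' => array('user_agent' => $userAgent, 'timeout' => 10),
    ));
    $html = @file_get_contents($url, false, $context);
    return $html === false ? '' : $html;
}
```

To use it, call fetch_as() on your own catalog URL once with a spider user agent string and once with a normal browser string, then pass each result to find_sid_links(). With the spider stopper in place, the spider fetch should come back with an empty list.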
cpv1st Posted March 7, 2003

I am not too sure who Burt is... Did you just insert some code, or was it code and Ian's SID Killer? Thanks in advance for the response, mugitty... you are always a lot of help...
mugitty Posted March 7, 2003

My understanding (but I'm waiting for someone to confirm it) is that there is NONE of Ian's SID Killer code used in this case and the ONLY change is in catalog/includes/functions/html_output.php (around line 54 in mine).

After:

    // Add the session ID when moving from HTTP and HTTPS servers or when SID is defined
    if ( (ENABLE_SSL == true ) && ($connection == 'SSL') && ($add_session_id == true) ) {
      $sid = tep_session_name() . '=' . tep_session_id();
    } elseif ( ($add_session_id == true) && (tep_not_null(SID)) ) {
      $sid = SID;
    }

add this:

    // Add spider stopper code
    $spiders = array("almaden.ibm.com", "appie 1.1", "asterias2.0", "augurfind", "baiduspider",
      "bannana_bot", "bdcindexer", "docomo", "fast-webcrawler", "frooglebot", "geobot", "googlebot",
      "henrythemiragorobot", "infoseek", "sidewinder", "lachesis", "mercator", "moget/1.0",
      "nationaldirectory-webspider", "naverrobot", "ncsa beta", "netresearchserver", "ng/1.0",
      "osis-project", "polybot", "pompos", "scooter", "seventwentyfour", "slurp/si", "[email protected]",
      "steeler/1.3", "szukacz", "teoma", "turnitinbot", "vagabondo", "w3c_validator", "zao/0",
      "zyborg/1.0");
    foreach($spiders as $Val) {
      if (ereg($Val, strtolower($_SERVER["HTTP_USER_AGENT"]))) {
        // Edit out one of these as necessary depending upon your version of html_output.php
        //$sess = NULL;
        $sid = NULL;
        break;
      }
    }
    // End spider stopper code
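A note for anyone reading this on a newer server: ereg() was deprecated in PHP 5.3 and removed in PHP 7, and it treats each entry as a regular expression, so the dots in an entry like "almaden.ibm.com" match any character. A rough equivalent using a plain substring test might look like the sketch below. The function name is_spider_user_agent is hypothetical and not part of osCommerce, and the spider list is shortened here only for illustration.

```php
<?php
// Hypothetical helper, not part of osCommerce: true when the user agent
// contains any of the (lower-case) spider substrings.
function is_spider_user_agent(string $userAgent, array $spiders): bool
{
    $userAgent = strtolower($userAgent); // lower-case once, outside the loop
    foreach ($spiders as $spider) {
        // strpos(haystack, needle) is a plain substring match, no regex involved
        if ($spider !== '' && strpos($userAgent, $spider) !== false) {
            return true;
        }
    }
    return false;
}

// Shortened list for illustration; use the full array from the post above.
$spiders = array("googlebot", "appie 1.1", "slurp/si", "teoma");

// The header may be missing entirely, so fall back to an empty string.
$userAgent = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : '';
if (is_spider_user_agent($userAgent, $spiders)) {
    $sid = NULL; // or $sess = NULL, depending on your version of html_output.php
}
```

The behaviour is meant to match the posted loop: a recognised spider gets no session ID appended to its links, and everyone else is untouched.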
burt Posted March 7, 2003

Quote (mugitty): "My understanding (but I'm waiting for someone to confirm it) is that there is NONE of Ian's SID Killer code used in this case and the ONLY change is in catalog/includes/functions/html_output.php (around line 54 in mine)"

Correct.

[EDIT: no commercial postings allowed]

And btw it is OK to ask questions in the Tips and Tricks Forum. See http://www.oscommerce.com/forums/viewtopic.php?t=33668 for more on that.
cpv1st Posted March 7, 2003

Burt, did you see this post? It suggests modifying the eregi check like this:

    // Add more Spiders as you find them. MAKE SURE THEY ARE LOWER CASE!
    $spiders = array("googlebot", "teomaagent", "zyborg", "gulliver", "architext", "fast-webcrawler",
      "slurp", "ask jeeves", "ia_archiver", "scooter", "mercator", "crawler@fast", "crawler",
      "infoseek sidewinder", "lycos_spider", "fluffy the spider", "ultraseek", "mantraagent", "moget",
      "t-h-u-n-d-e-r-s-t-o-n-e", "muscatferret", "voilabot", "sleek spider", "kit_fireball",
      "webcrawler");
    // get useragent and force to lowercase just once
    $useragent = strtolower(getenv("HTTP_USER_AGENT"));
    foreach($spiders as $Val) {
      // strpos takes the haystack (user agent) first, then the needle (spider name)
      if (!(strpos($useragent, $Val) === false)) {
        // found a spider, kill the sid/sess
        // Edit out one of these as necessary depending upon your version of html_output.php
        $sess = NULL;
        // $sid = NULL;
        break;
      }
    }

http://www.oscommerce.com/forums/viewtopic.php...+win2k&start=10

I am getting there. I never knew this was such a problem... Thanks, Burt and mugitty...
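One thing worth double-checking with any strpos-based version is the argument order and the comparison. strpos() takes the haystack (the user agent) first and the needle (the spider name) second, and a match at the very start of the string returns 0, which is why the result has to be compared against false with === or !==. A tiny illustration, using a sample Googlebot user agent string that is an assumption for the demo:

```php
<?php
// Sample user agent string, assumed for illustration only.
$userAgent = strtolower('Googlebot/2.1 (+http://www.googlebot.com/bot.html)');
$spider = 'googlebot';

// Arguments swapped: searches for the whole user agent inside the short
// spider name, which can only match if the two are identical.
$reversed = strpos($spider, $userAgent);

// Haystack first, needle second: finds "googlebot" at position 0.
$correct = strpos($userAgent, $spider);

// Position 0 is falsy, so compare against false explicitly.
$isSpider = ($correct !== false);
```

Here $reversed is false, $correct is 0, and $isSpider is true. A loose check such as if (strpos(...)) would wrongly treat the match at position 0 as a miss.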
Ajeh Posted March 7, 2003

Burt ... I love you! :onfire:

I needed a fast, quick fix, as Google has knocked a 5+ Gig bandwidth hit on a site in an hour or two's time the last 2 days.

This is nice, clean and takes a second to paste into various snapshots and seems to work happily.

Now the real test ... open a few more doors that the spider is allowed to touch and make sure they touch then get out ... 8)
burt Posted March 7, 2003

Quote (Ajeh): "Burt ... I love you! :onfire:"

Woohoo ! :oops: :twisted:

Quote (Ajeh): "This is nice, clean and takes a second to paste into various snapshots and seems to work happily."

I have this working on an MS1, and on various 2.2 (pre and post Nov) snapshots all the way back to June 2002, and I have soak tested this mod for hours on end. I've not seen any increase in server load, even when mimicking Google more than 100 times (at once)...
cpv1st Posted March 7, 2003

What did you think about Harold's security proposal listed here? It sounds good... but I think I might just go with Burt on this one.

http://www.oscommerce.com/forums/viewtopic.php...der=asc&start=0
mattice Posted March 7, 2003

Anybody with a lot of time on their hands? http://www.robotstxt.org/wc/active/html/index.html

"Politics is the art of preventing people from taking part in affairs which properly concern them"
Ajeh Posted March 7, 2003

Quote (mattice): "Anybody with a lot of time on their hands? http://www.robotstxt.org/wc/active/html/index.html"

I was playing there the other night, trying really hard to glean some new tips about the robots.txt file. Very handy site.
cpv1st Posted March 7, 2003

Ajeh... so did you use Burt's solution or the robots.txt solution or both? I am getting ready to go live and I want to get spidered, but I don't want Google to get stuck and I don't want them to ignore me either... Wasn't Burt's fix different from the robots.txt idea?
Guest Posted March 7, 2003

cpv1st - you can use the visitors contribution to see how many times each user and robot has visited your site: http://www.oscommerce.com/community/contributions,949
Ajeh Posted March 7, 2003

I have tested the code with both. When I have time to babysit the site that got so slammed I will remove the robots.txt file and see how it handles Google.

Unfortunately you don't know when the creatures are coming ... :shock:
wizardsandwars Posted March 7, 2003

There are approximately 2 weeks before the next Google crawl.

-------------------------------------------------------------------------------------------------------------------------
NOTE: As of Oct 2006, I'm not as active in this forum as I used to be, but I still work with osC quite a bit. If you have a question about any of my posts here, your best bet is to contact me through either Email or PM in my profile, and I'll be happy to help.
Ajeh Posted March 8, 2003

Anyone happen to know if this is a spider? :shock:

61.117.202.18

I never was good at resolving these things if I cannot do a quick /dns on IRC ... :D
toolcrazy Posted March 8, 2003

This is what I came up with: http://gnomovision.com/

Via this: http://61.117.202.18/

Steve
-------------------------
Guest Posted March 8, 2003

Well, that's an interesting site :shock:
Ajeh Posted March 8, 2003

I happened to be peeking at the whois on a client's computer due to this spider pounding he has been receiving, and there were 20+ of these entries.

I do not know whether that was just a bug in the whois ... this is an older snapshot ... or a hit by something.

I am not finding a major slam on the site, but there were so many entries from that address I was not sure what to think.
mugitty Posted March 8, 2003

From the APNIC Whois Database http://www.apnic.net/apnic-bin/whois2.pl

inetnum:  61.117.202.16 - 61.117.202.23
netname:  CWT-NET
descr:    BirdView Co.,LTD
country:  JP
admin-c:  TA802JP
tech-c:   MF1412JP
remarks:  This information has been partially mirrored by APNIC from
remarks:  JPNIC. To obtain more specific information, please use the
remarks:  JPNIC whois server at whois.nic.ad.jp. (This defaults to
remarks:  Japanese output, use the /e switch for English output)
changed:  [email protected] 20021107
remarks:  This information has been partially mirrored by APNIC from
remarks:  JPNIC. To obtain more specific information, please use the
remarks:  JPNIC whois server at whois.nic.ad.jp. (This defaults to
remarks:  Japanese output, use the /e switch for English output)
changed:  [email protected] 20030305
source:   JPNIC
Ajeh Posted March 8, 2003

So perhaps it's just a whois bug or a lamer? :shock:
Guest Posted March 8, 2003

From that webpage I would opt for lamer......
This topic is now archived and is closed to further replies.