
osCommerce

The e-commerce.

RE: New Spider


mugitty


Yeah, I know, you can't find that thread here - it's at:

http://www.oscommerce.com/forums/viewtopic.php?t=36577

(forum rules - no questions in Tips & Tricks)

 

I noticed 2 entries in Burt's spider string that I want to ask about, since one has a space and one an @:

Is the above how they should appear in the code?

 

And a clarification, if you don't mind...

 

So is the ONLY thing being used for SID control the block below, with the spiders string, added to html_output.php? Nothing at all from the other SID Killer code, and nothing in any other files?

$spiders = array("almaden.ibm.com", "appie 1.1", "asterias2.0", "augurfind", "baiduspider", "bannana_bot", "bdcindexer", "docomo", "fast-webcrawler", "frooglebot", "geobot", "googlebot", "henrythemiragorobot", "infoseek", "sidewinder", "lachesis", "mercator", "moget/1.0", "nationaldirectory-webspider", "naverrobot", "ncsa beta", "netresearchserver", "ng/1.0", "osis-project", "polybot", "pompos", "scooter", "seventwentyfour", "slurp/si", "[email protected]", "steeler/1.3", "szukacz", "teoma", "turnitinbot", "vagabondo", "w3c_validator", "zao/0", "zyborg/1.0");

    // Lowercase the user agent once; it may be missing on some requests
    $user_agent = isset($_SERVER["HTTP_USER_AGENT"]) ? strtolower($_SERVER["HTTP_USER_AGENT"]) : "";

    foreach ($spiders as $Val) {
        // ereg() was removed in PHP 7; a plain substring test also keeps the "."
        // in entries such as "almaden.ibm.com" from acting as a regex wildcard
        if (strpos($user_agent, $Val) !== false) {
            // Edit out one of these as necessary depending upon your version of html_output.php
            //$sess = NULL;
            $sid = NULL;
            break;
        }
    }

 

Thanks :!:

... if you want to REALLY see something that doesn't set up right out of the box without some tweaking,

try being a Foster Parent!
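The spider check in the block above boils down to a case-insensitive substring test of the user agent against the list. A minimal sketch of that test as a standalone function, assuming PHP 5 or later; the function name `is_spider()` is hypothetical and the list is shortened for illustration. It uses `stripos()` rather than `ereg()`, which was deprecated in PHP 5.3 and removed in PHP 7, and a plain substring search also stops the `.` in entries like "almaden.ibm.com" from matching any character.

```php
<?php
// Hypothetical helper: returns true if the user agent contains any entry
// from the spider list. stripos() does a case-insensitive substring search,
// so neither the list nor the user agent has to be lowercased first.
function is_spider($user_agent, array $spiders)
{
    foreach ($spiders as $spider) {
        if (stripos($user_agent, $spider) !== false) {
            return true;
        }
    }
    return false;
}

// Shortened list for illustration; use the full list from the post.
$spiders = array("almaden.ibm.com", "googlebot", "slurp/si", "teoma", "zyborg/1.0");

// Inside html_output.php the result would gate the session ID:
// if (is_spider($ua, $spiders)) { $sid = NULL; }
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
var_dump(is_spider($ua, $spiders));
```

Because `stripos()` folds case itself, a mixed-case entry added to the list later still matches.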


Yes, this would be helpful information. I have searched the archives and there is not really a definitive approach to this matter; I see all different types of scripts. Which one is the best? I see both bad and good reports on Ian's SID Killer.

I am just a newbie....


Sorry, one other question:

Where can one go to check if this is working? I notice that when I check it at http://www.searchengineworld.com/cgi-bin/s.../sim_spider.cgi, the SIDs still show. Is there another place I can check?


OK, I found a simulator that seems to actually simulate one of the spiders in the list:

 

http://www.webconfs.com/search-engine-spid...r-simulator.php

 

Without Burt's code: all kinds of SIDs appear.

With the code: no SIDs appear.

 

I think I'm a happy camper :?: :)
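A home-made alternative to the online simulators is to request a page yourself while sending a spider's user agent, then count the session IDs in the returned HTML. A rough sketch, assuming PHP with `allow_url_fopen` enabled and the default osCommerce session name `osCsid`; `count_session_ids()` and `fetch_as_spider()` are hypothetical helper names and the URL is a placeholder.

```php
<?php
// Count how many links in a page carry a session ID parameter.
// Assumption: the store uses the default osCommerce session name "osCsid".
function count_session_ids($html, $session_name = 'osCsid')
{
    return substr_count($html, $session_name . '=');
}

// Hypothetical check against your own store: request a page while sending a
// spider user agent via the HTTP stream context, and return the HTML
// (or false if the request fails).
function fetch_as_spider($url, $user_agent)
{
    $context = stream_context_create(array(
        'http' => array('user_agent' => $user_agent),
    ));
    return file_get_contents($url, false, $context);
}

// Example (replace the URL with one of your own catalog pages):
// $html = fetch_as_spider('http://www.example.com/catalog/index.php',
//                         'Googlebot/2.1 (+http://www.google.com/bot.html)');
// if ($html !== false) {
//     echo count_session_ids($html) === 0 ? "no SIDs\n" : "SIDs present\n";
// }
```

Running the same request once with a normal browser user agent gives a useful baseline to compare against.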


I am not too sure who Burt is. Did you just insert some code, or was it code plus Ian's SID Killer? Thanks in advance for the response, mugitty; you are always a lot of help.


My understanding (but I'm waiting for someone to confirm it) is that there is NONE of Ian's SID Killer code used in this case, and the ONLY change is in catalog/includes/functions/html_output.php (around line 54 in mine).

After

// Add the session ID when moving from HTTP and HTTPS servers or when SID is defined
   if ( (ENABLE_SSL == true ) && ($connection == 'SSL') && ($add_session_id == true) ) {
     $sid = tep_session_name() . '=' . tep_session_id();
   } elseif ( ($add_session_id == true) && (tep_not_null(SID)) ) {
     $sid = SID;
   }

add this:

// Add spider stopper code
   $spiders = array("almaden.ibm.com", "appie 1.1", "asterias2.0", "augurfind", "baiduspider", "bannana_bot", "bdcindexer", "docomo", "fast-webcrawler", "frooglebot", "geobot", "googlebot", "henrythemiragorobot", "infoseek", "sidewinder", "lachesis", "mercator", "moget/1.0", "nationaldirectory-webspider", "naverrobot", "ncsa beta", "netresearchserver", "ng/1.0", "osis-project", "polybot", "pompos", "scooter", "seventwentyfour", "slurp/si", "[email protected]", "steeler/1.3", "szukacz", "teoma", "turnitinbot", "vagabondo", "w3c_validator", "zao/0", "zyborg/1.0");

    // Lowercase the user agent once; it may be missing on some requests
    $user_agent = isset($_SERVER["HTTP_USER_AGENT"]) ? strtolower($_SERVER["HTTP_USER_AGENT"]) : "";

    foreach ($spiders as $Val) {
        // strpos() replaces ereg(), which was removed in PHP 7
        if (strpos($user_agent, $Val) !== false) {
            // Edit out one of these as necessary depending upon your version of html_output.php
            //$sess = NULL;
            $sid = NULL;
            break;
        }
    }
// End spider stopper code
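Setting `$sid = NULL;` is enough as long as the rest of the link-building function only appends the session ID when `$sid` is set. A minimal sketch of that final step, assuming your copy of html_output.php uses an `isset($sid)` test (versions differ, as the "edit out one of these" comment hints); `build_link()` is a hypothetical, heavily simplified stand-in, not the osCommerce `tep_href_link()` code.

```php
<?php
// Hypothetical, simplified stand-in for the end of the link builder: the real
// function also works out the server, path and other details first.
function build_link($page, $parameters, $sid)
{
    $link = $page;
    $separator = '?';
    if ($parameters !== '') {
        $link .= '?' . $parameters;
        $separator = '&';
    }
    // Once the spider check has set $sid to NULL, nothing is appended here.
    if (isset($sid)) {
        $link .= $separator . $sid;
    }
    return $link;
}

echo build_link('product_info.php', 'products_id=28', 'osCsid=abc123'), "\n";
// product_info.php?products_id=28&osCsid=abc123
echo build_link('product_info.php', 'products_id=28', NULL), "\n";
// product_info.php?products_id=28
```

This is also why the variable name matters: nulling `$sess` in a version that appends `$sid` (or vice versa) leaves the session ID in place.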


My understanding (but I'm waiting for someone to confirm it) is that there is NONE of Ian's SID Killer code used in this case and the ONLY change is in catalog/includes/functions/html_output.php (around line 54 in mine)

 

Correct. [EDIT: no commercial postings allowed]

 

And by the way, it is OK to ask questions in the Tips and Tricks forum. See http://www.oscommerce.com/forums/viewtopic.php?t=33668 for more on that.


Burt, did you see this post? It suggests doing the match with strpos(), like this:

// Add more Spiders as you find them.  MAKE SURE THEY ARE LOWER CASE!
$spiders = array("googlebot", "teomaagent", "zyborg", "gulliver", "architext", "fast-webcrawler",
"slurp", "ask jeeves", "ia_archiver", "scooter", "mercator", "crawler@fast",
"crawler", "infoseek sidewinder", "lycos_spider", "fluffy the spider", "ultraseek",
"mantraagent", "moget", "t-h-u-n-d-e-r-s-t-o-n-e", "muscatferret", "voilabot",
"sleek spider", "kit_fireball", "webcrawler");

// get useragent and force to lowercase just once (getenv() returns false if it is missing)
$useragent = strtolower((string) getenv("HTTP_USER_AGENT"));

foreach($spiders as $Val) {
   // strpos() takes the haystack (user agent) first and the needle (spider name) second
   if (strpos($useragent, $Val) !== false) {
     // found a spider, kill the sid/sess
     // Edit out one of these as necessary depending upon your version of html_output.php
     $sess = NULL;
     // $sid = NULL;
     break;
   }
}

 

http://www.oscommerce.com/forums/viewtopic.php...+win2k&start=10
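Two details decide whether a strpos() spider check like this one ever fires: `strpos()` takes the haystack first and the needle second, so the spider name must be searched for inside the user agent; and because only the user agent is lowercased, every list entry must itself be lowercase, as the comment in the snippet warns. A small demonstration; the user-agent strings are sample values for illustration only.

```php
<?php
// strpos(haystack, needle): the string searched comes first.
$useragent = strtolower("Googlebot/2.1 (+http://www.google.com/bot.html)");
$spider = "googlebot";

// Needle and haystack swapped: looks for the whole user agent inside "googlebot".
var_dump(strpos($spider, $useragent) !== false);          // bool(false)

// Correct order: looks for "googlebot" inside the user agent.
var_dump(strpos($useragent, $spider) !== false);          // bool(true)

// strpos() is case-sensitive, so a mixed-case entry never matches
// a lowercased user agent.
$fast_ua = strtolower("FAST-WebCrawler/3.8");
var_dump(strpos($fast_ua, "fast-WebCrawler") !== false);  // bool(false)
var_dump(strpos($fast_ua, "fast-webcrawler") !== false);  // bool(true)
```

The `!== false` comparison matters too: a match at position 0 returns `0`, which a loose `if (strpos(...))` would treat as "not found".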

 

I am getting there; I never knew this was such a problem. Thanks, Burt and mugitty!


Burt ... I love you! :onfire:

 

I needed a quick fix, as Google has knocked a 5+ GB bandwidth hit on a site in an hour or two's time for the last 2 days.

 

This is nice, clean and takes a second to paste into various snapshots and seems to work happily.

 

Now the real test ... open a few more doors that the spider is allowed to touch, and make sure it touches them and then gets out ... 8)


Quote: Burt ... I love you! :onfire:

Woohoo! :oops: :twisted:

Quote: This is nice, clean and takes a second to paste into various snapshots and seems to work happily.

I have this working on an MS1, and on various 2.2 (pre and post Nov) snapshots all the way back to June 2002, and I have soak tested this mod for hours on end.

I've not seen any increase in server load, even when mimicking Google more than 100 times (at once)...


Ajeh, so did you use Burt's solution, the robots.txt solution, or both? I am getting ready to go live, and I want to get spidered, but I don't want Google to get stuck and I don't want them to ignore me either.

Wasn't Burt's fix different from the robots.txt idea?


I have tested the code with both.

 

When I have time to babysit the site that got so slammed, I will remove the robots.txt file and see how it handles Google.

 

Unfortunately you don't know when the creatures are coming ... :shock:


There are approximately 2 weeks before the next Google crawl.

-------------------------------------------------------------------------------------------------------------------------

NOTE: As of Oct 2006, I'm not as active in this forum as I used to be, but I still work with osC quite a bit.

If you have a question about any of my posts here, your best bet is to contact me through either Email or PM in my profile, and I'll be happy to help.


Anyone happen to know if this is a spider? :shock:

 

61.117.202.18

 

I never was good at resolving these things if I cannot do a quick /dns on IRC ... :D


I happened to be peeking at the whois on a client's computer because of the spider pounding he has been receiving, and there were 20+ of these entries.

I do not know whether that was just a bug in the whois (this is an older snapshot) or a hit by something.

 

I am not finding a major slam on the site, but there were so many entries from that address I was not sure what to think.


From the APNIC Whois Database

http://www.apnic.net/apnic-bin/whois2.pl

 

inetnum: 61.117.202.16 - 61.117.202.23
netname: CWT-NET
descr: BirdView Co.,LTD
country: JP
admin-c: TA802JP
tech-c: MF1412JP
remarks: This information has been partially mirrored by APNIC from
remarks: JPNIC. To obtain more specific information, please use the
remarks: JPNIC whois server at whois.nic.ad.jp. (This defaults to
remarks: Japanese output, use the /e switch for English output)
changed: [email protected] 20021107
remarks: This information has been partially mirrored by APNIC from
remarks: JPNIC. To obtain more specific information, please use the
remarks: JPNIC whois server at whois.nic.ad.jp. (This defaults to
remarks: Japanese output, use the /e switch for English output)
changed: [email protected] 20030305
source: JPNIC
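To confirm that 61.117.202.18 falls inside the inetnum block of this record, and to repeat the check for other addresses that keep showing up, the range can be compared numerically. A small sketch using PHP's `ip2long()`; `ip_in_range()` is a hypothetical helper name. PHP's `gethostbyaddr()` does the reverse-DNS lookup (the equivalent of a quick /dns), but it needs network access, so it is only shown in a comment.

```php
<?php
// Hypothetical helper: is an IPv4 address inside an inetnum range?
function ip_in_range($ip, $range_start, $range_end)
{
    $ip_num = ip2long($ip);
    $start = ip2long($range_start);
    $end = ip2long($range_end);
    if ($ip_num === false || $start === false || $end === false) {
        return false; // at least one value is not a valid IPv4 address
    }
    return $ip_num >= $start && $ip_num <= $end;
}

// inetnum from the APNIC record above: 61.117.202.16 - 61.117.202.23
var_dump(ip_in_range('61.117.202.18', '61.117.202.16', '61.117.202.23')); // bool(true)
var_dump(ip_in_range('61.117.202.30', '61.117.202.16', '61.117.202.23')); // bool(false)

// Reverse DNS (needs network access): returns the host name, or the
// unchanged IP if there is no PTR record.
// echo gethostbyaddr('61.117.202.18'), "\n";
```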


Archived

This topic is now archived and is closed to further replies.
