henri Posted March 29, 2003
http://www.robotstxt.org/wc/active/html/slurp.html
I would try using 'slurp', but this should already be in my contribution.
Guest Posted March 29, 2003
I am using both spider killers, yours and the html_output version, and neither one seems to be catching Inktomi. I will check that link, thanks.
Guest Posted March 29, 2003
This is the list I am using:
$spiders = array("almaden.ibm.com", "appie", "arachnophilia", "arale", "inktomi", "araneo", "architext", "aretha",
  "ariadne", "arks", "aspider", "atn", "atomz", "auresys", "backrub", "bigbrother", "bjaaland", "blackwidow",
  "asterias2.0", "ahoy", "AlkalineBOT", "Anthill", "augurfind", "baiduspider", "blindekuh", "Bloodhound", "Ukonline",
  "borg-bot", "brightnet", "bspider", "cactvschemistryspider", "calif", "cassandra", "cgireader", "checkbot",
  "christcrawler", "churl", "cienciaficcion", "cmc", "Collective", "combine", "conceptbot", "CoolBot", "cosmos",
  "cruiser", "cusco", "cyberspyder", "deweb", "dienstspider", "digger", "diibot", "directhit", "dnabot",
  "download_express", "dragonbot", "dwcp", "e-collector", "ebiness", "eit", "elfinbot", "emacs", "emcspider",
  "esther", "fastcrawler", "roadrunner", "bannana_bot", "bdcindexer", "docomo", "fast-webcrawler", "frooglebot",
  "geobot", "googlebot", "henrythemiragorobot", "infoseek", "sidewinder", "lachesis", "mercator", "moget/1.0",
  "nationaldirectory-webspider", "naverrobot", "ncsa beta", "netresearchserver", "ng/1.0", "osis-project", "polybot",
  "pompos", "scooter", "inktomisearch", "seventwentyfour", "slurp/si", "slurp", "[email protected]", "steeler/1.3",
  "szukacz", "teoma", "turnitinbot", "vagabondo", "zao/0", "zyborg/1.0", "semanticdiscovery/0.1",
  "an-zyborg-g01.looksmart.com");
EDITED: to fit the screen.
Thanks, Linda
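For anyone wondering how a list like this is typically used: below is a minimal sketch, assuming the spider killer does a case-insensitive substring match of each entry against the visitor's user agent string. The function name is_spider() and the shortened list are made up for illustration; this is not the actual code from either contribution.

```php
<?php
// Sketch only: is_spider() is a hypothetical helper, not the contribution's real code.
function is_spider(string $user_agent, array $spiders): bool
{
    $agent = strtolower($user_agent);
    foreach ($spiders as $name) {
        // Lower-case both sides so mixed-case entries such as "AlkalineBOT" can still match.
        if ($name !== '' && strpos($agent, strtolower($name)) !== false) {
            return true;
        }
    }
    return false;
}

// Shortened example list; the user agent strings below are sample values.
$spiders = array("googlebot", "inktomi", "slurp", "AlkalineBOT", "teoma");

var_dump(is_spider("Googlebot/2.1 (+http://www.googlebot.com/bot.html)", $spiders)); // bool(true)
var_dump(is_spider("Mozilla/4.0 (compatible; MSIE 5.0; Windows NT 5.0)", $spiders)); // bool(false)
```

If the real code lower-cases only the user agent and not the list, mixed-case entries like "AlkalineBOT" or "CoolBot" would probably never match. With substring matching, very short entries such as "atn" or "eit" may also catch agents they were not meant for.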
henri Posted March 29, 2003
You can also ban the IP, if you have it (it works in a similar way). Perhaps they have another referer now.
drakonan Posted April 11, 2003
What makes Ian's spider killer in theory better than Burt's?
- - - - Sometimes, ignorance is bliss.
Ajeh Posted April 11, 2003
In theory, Burt's is better than Ian's ... :shock: However, Ian or Burt can explain the differences better than I can. There were some situations where Ian's caused erratic behavior.
Ajeh Posted April 11, 2003
Note: the newer snapshots include a spider killer.
Guest Posted April 11, 2003
Ian's has a problem under https connections, where Burt's does not 8) Ian is aware of the issue.
drakonan Posted April 11, 2003
Oh OK, I thought I'd read somewhere that while Ian's was better in theory, Burt's doesn't have the aforementioned problem...
What is this talk about user agents anyway? I vaguely understand the concept: each spider sends it to the web server when it requests a page? Does everyone send a user agent? Could we just make a list of the user agents that people use, and block anything that doesn't use one of these "good" user agents? (I tried looking up in Google exactly how it works, but wasn't very successful.)
Thanks
Ajeh Posted April 11, 2003
"Ian's has a problem under https connections, where Burt's does not 8) Ian is aware of the issue"
Thanks, Jeff ... I knew it was something along those lines, but I hate to state things incorrectly. 8)
Guest Posted April 11, 2003
A 'user agent' is simply a string that tells you what the 'user' is. Search engine spiders identify themselves as robots in it, so they are easy to spot.
There is a visitor status contribution that shows Id., Last Click, Access, IP Address, Browser Language, Site Language, Entry URI and Referer. The 'browser/language' field uses the user agent variable to display info on the user.
drakonan Posted April 11, 2003
So there are too many "good" user agents? I was thinking one could let through the ones that are users for sure, you know, all the browsers that are compatible with osCommerce. For example, if a browser sent the Mozilla string instead of, say, googlebot, then allow session IDs.
What about storing the session ID in a MySQL variable? Does that remove it from the hyperlink?
And Linda, you NEVER sleep.
Ajeh Posted April 11, 2003
I am trying to get the Attributes Sorter and Copier and the Quantity Controller out, so my hours are rather late, to say the least.
chfields Posted April 11, 2003
I am using Burt's SID killer. I tried the spider simulator mentioned earlier in the thread and found that it only shows the text of the index.html page. I do have an index.html page as a front door to my index.php page. When I use the URL mrsfieldsgoodies.com/index.php, it comes up with the text of the page. I have allprods; isn't that supposed to be what the spider "sees"? These are the results I get when using mrsfieldsgoodies.com/index.php:
Spidered Text : Mrs Fields Goodies Top » Catalog My Profile | Cart Contents | Checkout Categories African/African American HeritagBirds->Business and TravelCandles->Celebrate AmericaCherubs and ChildrenCountry DecorDecorative ClocksDevine InspirationDistinctive Oil BurnersDoll CollectionDolphin PlatesFar Eastern treasuresFashion WatchesFine JewelryFlowers and VasesGarden DecorGlass Creations->Glow in the DarkGolf NoveltiesGone FishinHome Decor->Hong TzeIncense and MoreJewlery Boxes and Cork SculptureKnifes and SwordsLiberty BronzeMajolica StyleMandarin IvoryMedieval LegendsMetal WorksMiniaturesMirrorsNative American HeritageNatures BeautyNautical NoveltiesNight LightsOcean AcrobaticsPhoto FramesToys and GamesUnicorn FantasyUnique MusicalsWater FountainsWhimsical WildlifeWindchimesWorks of Art Products Rooster Wall Plaque$9.95 Bill Blass Luggage Set$349.95 Clear Glass Carved Elephant$32.95 Alabastrite Praying Angel Photo Frame$12.95 Wood Candleholder with Tealights$14.95 Stained Glass Butterfly Wind Chime$12.95 Play-Doh on Keychain$4.99 Spun Glass Angel With Gold Wings$19.95 Earthworm Cat and Fishbowl With Fish$8.95 Patchwork Elephant - American Flag$12.95 Scented Pillar Designer Candle - Ginger & Lily$9.95 10K - Gold Lady's Sapphire Diamond Ring$49.95 Upcoming Products Date Expected Wood Candleholder with Tealights 06/14/2003 Ebony-Look African Mask 05/18/2003 9-Piece Ceramic Mini Cups and Saucers Set 05/10/2003 Spun Glass Sail Boat With Blue Base 05/08/2003 10-Piece
Porcelain Mini Tea Set 05/05/2003 Distressed White Metal Chandelier Candle Holder 05/02/2003 Angel Capiz Tea Light Holder 04/30/2003 18" Porcelain Victorian Doll - Rebecca 04/29/2003 18" Porcelain Doll - Maria Isabel 04/29/2003 12-Function Camping Knife 04/29/2003 Login Here E-Mail address: Password: Spidered Links : http://mrsfieldsgoodies.com/index.php http://mrsfieldsgoodies.com http://mrsfieldsgoodies.com/index.php https://host42.ipowerweb.com/~mrsfield//account.php http://mrsfieldsgoodies.com/shopping_cart.php https://host42.ipowerweb.com/~mrsfield//che...ut_shipping.php http://mrsfieldsgoodies.com/index.php/cPath/58 http://mrsfieldsgoodies.com/index.php/cPath/55 http://mrsfieldsgoodies.com/index.php/cPath/34 http://mrsfieldsgoodies.com/index.php/cPath/72 http://mrsfieldsgoodies.com/index.php/cPath/42 http://mrsfieldsgoodies.com/index.php/cPath/64 http://mrsfieldsgoodies.com/index.php/cPath/67 http://mrsfieldsgoodies.com/index.php/cPath/38 http://mrsfieldsgoodies.com/index.php/cPath/66 http://mrsfieldsgoodies.com/index.php/cPath/21 http://mrsfieldsgoodies.com/index.php/cPath/60 http://mrsfieldsgoodies.com/index.php/cPath/78 http://mrsfieldsgoodies.com/index.php/cPath/48 http://mrsfieldsgoodies.com/index.php/cPath/26 http://mrsfieldsgoodies.com/index.php/cPath/25 http://mrsfieldsgoodies.com/index.php/cPath/31 http://mrsfieldsgoodies.com/index.php/cPath/70 http://mrsfieldsgoodies.com/index.php/cPath/62 http://mrsfieldsgoodies.com/index.php/cPath/44 http://mrsfieldsgoodies.com/index.php/cPath/41 http://mrsfieldsgoodies.com/index.php/cPath/40 http://mrsfieldsgoodies.com/index.php/cPath/75 http://mrsfieldsgoodies.com/index.php/cPath/51 http://mrsfieldsgoodies.com/index.php/cPath/46 http://mrsfieldsgoodies.com/index.php/cPath/49 http://mrsfieldsgoodies.com/index.php/cPath/35 http://mrsfieldsgoodies.com/index.php/cPath/56 http://mrsfieldsgoodies.com/index.php/cPath/69 http://mrsfieldsgoodies.com/index.php/cPath/50 http://mrsfieldsgoodies.com/index.php/cPath/45 
http://mrsfieldsgoodies.com/index.php/cPath/80 http://mrsfieldsgoodies.com/index.php/cPath/32 http://mrsfieldsgoodies.com/index.php/cPath/76 http://mrsfieldsgoodies.com/index.php/cPath/52 http://mrsfieldsgoodies.com/index.php/cPath/53 http://mrsfieldsgoodies.com/index.php/cPath/39 http://mrsfieldsgoodies.com/index.php/cPath/24 http://mrsfieldsgoodies.com/index.php/cPath/54 http://mrsfieldsgoodies.com/index.php/cPath/22 http://mrsfieldsgoodies.com/index.php/cPath/27 http://mrsfieldsgoodies.com/index.php/cPath/43 http://mrsfieldsgoodies.com/index.php/cPath/59 http://mrsfieldsgoodies.com/index.php/cPath/23 http://mrsfieldsgoodies.com/index.php/cPath/33 http://mrsfieldsgoodies.com/index.php/cPath/71 http://mrsfieldsgoodies.com/index.php/cPath/61 http://mrsfieldsgoodies.com/product_info.p...roducts_id/1082 http://mrsfieldsgoodies.com/product_info.p...roducts_id/1082 http://mrsfieldsgoodies.com/product_info.p...products_id/827 http://mrsfieldsgoodies.com/product_info.p...products_id/827 http://mrsfieldsgoodies.com/product_info.p...roducts_id/1041 http://mrsfieldsgoodies.com/product_info.p...roducts_id/1041 http://mrsfieldsgoodies.com/product_info.p...products_id/943 http://mrsfieldsgoodies.com/product_info.p...products_id/943 http://mrsfieldsgoodies.com/product_info.p...products_id/113 http://mrsfieldsgoodies.com/product_info.p...products_id/113 http://mrsfieldsgoodies.com/product_info.p...products_id/670 http://mrsfieldsgoodies.com/product_info.p...products_id/670 http://mrsfieldsgoodies.com/product_info.p.../products_id/43 http://mrsfieldsgoodies.com/product_info.p.../products_id/43 http://mrsfieldsgoodies.com/product_info.p...products_id/190 http://mrsfieldsgoodies.com/product_info.p...products_id/190 http://mrsfieldsgoodies.com/product_info.p...products_id/164 http://mrsfieldsgoodies.com/product_info.p...products_id/164 http://mrsfieldsgoodies.com/product_info.p...products_id/838 http://mrsfieldsgoodies.com/product_info.p...products_id/838 
http://mrsfieldsgoodies.com/product_info.p...products_id/611 http://mrsfieldsgoodies.com/product_info.p...products_id/611 http://mrsfieldsgoodies.com/product_info.p...products_id/280 http://mrsfieldsgoodies.com/product_info.p...products_id/280 http://mrsfieldsgoodies.com/product_info.p...products_id/688 http://mrsfieldsgoodies.com/product_info.p...roducts_id/1274 http://mrsfieldsgoodies.com/product_info.p...products_id/893 http://mrsfieldsgoodies.com/product_info.p...roducts_id/1196 http://mrsfieldsgoodies.com/product_info.p...products_id/894 http://mrsfieldsgoodies.com/product_info.p...products_id/254 http://mrsfieldsgoodies.com/product_info.p...products_id/472 http://mrsfieldsgoodies.com/product_info.p...products_id/945 http://mrsfieldsgoodies.com/product_info.p...products_id/946 http://mrsfieldsgoodies.com/product_info.p...products_id/980 http://mrsfieldsgoodies.com/shipping.php http://mrsfieldsgoodies.com/privacy.php http://mrsfieldsgoodies.com/conditions.php http://mrsfieldsgoodies.com/contact_us.php http://mrsfieldsgoodies.com/allprods.php http://mrsfieldsgoodies.com/catalog_produc...with_images.php http://mrsfieldsgoodies.com/advanced_search.php https://www.paypal.com/xclick/business=chfi...rrency_code=EUR http://www.oscommerce.com
jackof Posted April 22, 2003
Hello there,
I'm using Stuart Owens's mod for killing sessions for spiders. I believe it's working fine. I am also using the 'User tracking with admin' mod to track visitors on my site. In the logs I found two or three entries from spiders: one from TEOMA.COM and one from inktomisearch.com. I believe I need to update the code provided by Stuart to handle these spiders. Has anybody updated their code with the latest and most important spiders? If so, please share it.
Thanks and regards,
Jack
tedmcdonald Posted April 30, 2003
I went to one of the spider test sites. Here is an example of the output:
Spidered Links : http://mywebsite.com/allprods.php http://mywebsite.com/advanced_search.php http://mywebsite.com/default.php/cPath/30 http://mywebsite.com/default.php/cPath/23 http://mywebsite.com/default.php/cPath/63 http://mywebsite.com/default.php/cPath/24 http://mywebsite.com/default.php/cPath/91 http://mywebsite.com/default.php/cPath/48 http://mywebsite.com/default.php/cPath/22 http://mywebsite.com/default.php/cPath/35 http://mywebsite.com/default.php/cPath/69 http://mywebsite.com/default.php/cPath/78 http://mywebsite.com/default.php/cPath/80 https://my-secure-site.com/ssl/account.php/...a34026722bcd729 http://mywebsite.com/shopping_cart.php https://my-secure-site.com/ssl/checkout_pay...a34026722bcd729 http://mywebsite.com/products_new.php http://mywebsite.com/product_info.php/products_id/419 http://mywebsite.com/product_info.php/products_id/419 http://mywebsite.com/shipping.php http://mywebsite.com/pdf.php http://mywebsite.com/privacy.php http://mywebsite.com/conditions.php http://mywebsite.com/contact_us.php http://mywebsite.com/gv_redeem.php http://mywebsite.com/affiliate_info.php https://my-secure-site.com/ssl/affiliate_af...a34026722bcd729 http://mywebsite.com/newsletter.php http://mywebsite.com/default.php/cPath/22 http://mywebsite.com/default.php/cPath/63_77 http://mywebsite.com/default.php/cPath/63 http://mywebsite.com/default.php/cPath/24 http://mywebsite.com/default.php/cPath/23 http://mywebsite.com/default.php/cPath/30 http://mywebsite.com/default.php/cPath/35_90 http://mywebsite.com/default.php/cPath/22_47 http://mywebsite.com/product_info.php/cPat...products_id/248 http://mywebsite.com/default.php/cPath/55 http://mywebsite.com/default.php/cPath/48 http://mywebsite.com/default.php/cPath/22_58 http://mywebsite.com/product_info.php/cPat...products_id/176 http://mywebsite.com/default.php/cPath/69 http://mywebsite.com/default.php/cPath/35 http://mywebsite.com/default.php/cPath/22_67 http://mywebsite.com/default.php/cPath/22
I am using Burt's SID Killer. It looks like the links that go to my secure server are still getting SIDs. Is there some way to stop this?
Best Regards,
Ted
karcher Posted April 30, 2003
For the last hour or so I have been reading these spider killer threads. I have installed Henri's contribution; painless, thanks. I did not add Burt's mod to html_output.php, as I assumed that one was enough. Please let me know if you think it is not.
I then went to http://www.webconfs.com/search-engine-spid...r-simulator.php and spidered the default.php file. It does turn up a link with a SID. I traced this to the catalog link on the breadcrumb trail in the header bar. Now I don't know where to begin checking whether that link uses the tep_href_link function, which I guess it needs to for the spider killer to work. I also went to the who's online page, and it still showed two guests from the appropriate IP address.
PS These threads have been a real eye opener. I didn't even think that this could be a problem. Thanks!
K
When the going gets tough, the tough get going.
Justin Posted June 13, 2003
"...I then went to http://www.webconfs.com/search-engine-spid...r-simulator.php and spidered the default.php file. It does turn up a link with a SID."

After struggling with this for a while, I discovered something that isn't being made very clear in a lot of posts here, probably because it's obvious to many and time-consuming to explain. Even after you make the suggested changes to your html_output.php file, you may still see links with session IDs when you check the site, depending on the spider simulator you use.

Why? The idea is to disable the SID for web spiders, not web browsers, and some spider simulators don't pass along a user-agent that matches anything in your list of spiders. For example, the simulator mentioned above seems to identify itself as a spider that is in my list, since it does not show any SIDs when testing my site. However, another spider simulator showed that I do have SIDs, because it just passes along the user-agent of the browser I'm using. In that case I'm supposed to see the SIDs, since I want SIDs for human beings using web browsers.

To make sure your change works regardless of the spider simulator you're using, look at the user-agent listed when you submit a site using the SearchEngineWorld tester. You should see something like "Mozilla," "Netscape," or whatever browser you happen to be using. Enter a word from this line in lower case (e.g., "mozilla" or "netscape") into the list of spiders in your modified html_output.php file. Make sure your cookies are turned off, and then test your site again using the above link. You should see that the SIDs are now gone, which means that your modification was successful. Be sure to go back and remove the "mozilla" (or whatever) entry you just added, or else legitimate customers may have trouble using your site.

I'm not sure if I've clarified things here or just made them muddier, but hopefully this will help others. I know I was confused until I discovered what was going on.
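The test described above can be sketched roughly as follows. This is only an illustration under assumptions: build_link() is a hypothetical stand-in for a link helper such as tep_href_link, the osCsid parameter name and the session value are assumed, and the real html_output.php logic may differ.

```php
<?php
// Sketch only: build_link() is a hypothetical stand-in, not the real tep_href_link().
function build_link(string $page, string $sid, string $user_agent, array $spiders): string
{
    $agent = strtolower($user_agent);
    foreach ($spiders as $name) {
        if ($name !== '' && strpos($agent, strtolower($name)) !== false) {
            return $page; // matched a spider entry: leave the session ID out of the URL
        }
    }
    return $page . '?osCsid=' . $sid; // ordinary browser without cookies: keep the SID
}

$spiders = array("googlebot", "slurp");
$browser = "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT 5.0)";

// A simulator that just forwards a browser user agent still gets a SID.
echo build_link("default.php", "abc123", $browser, $spiders), "\n"; // default.php?osCsid=abc123

// Temporarily adding "mozilla" to the list makes the SID disappear, proving the check works.
$test_list = array_merge($spiders, array("mozilla"));
echo build_link("default.php", "abc123", $browser, $test_list), "\n"; // default.php
```

The second call is the temporary test only. Leaving "mozilla" in the list would strip the SID for most real browsers too, which is why the entry has to come back out afterwards.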
mlulm Posted July 11, 2003
I'm using Burt's SID killer and it's working great, but I notice that I still get SIDs in the navigation bar. They are coming from the header.php and footer.php files. I don't think they are filtered through the html_output.php file. Is anyone else getting these?
henri Posted July 11, 2003
Yes, I figured this out yesterday too (Inktomi came by): :(
I initialized the spider killer too late. I wrote in the installation guide that you should include spider_configure.php at the end of application_top.php. This is unfortunately too late, because the links in the navigation bar have already been written by then.
The installation guide should have said: add the Spiderkiller lines to your application_top.php right after the configure.php include:
// include server parameters
require('includes/configure.php');
// Spiderkiller
require(DIR_WS_INCLUDES . 'spider_configure.php');
Sorry for this.
Henri
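To see why the position of the include matters, here is a toy sketch. It assumes the spider check simply sets a flag that later link-building code consults; make_link(), the flag, and the osCsid value are made-up names for illustration and are not taken from spider_configure.php.

```php
<?php
// Sketch only: make_link() and $is_spider are hypothetical, used to show the ordering problem.
function make_link(string $page, bool $is_spider): string
{
    return $is_spider ? $page : $page . '?osCsid=abc123';
}

$is_spider = false; // the spider check has not run yet

// The navigation bar link is built first, so it still carries the SID.
$nav_link = make_link('index.php', $is_spider);

// Stands in for including the spider check at the END of application_top.php.
$is_spider = true;

// Links built after the check are clean.
$body_link = make_link('index.php', $is_spider);

echo $nav_link, "\n";  // index.php?osCsid=abc123
echo $body_link, "\n"; // index.php
```

Running the check before any link is generated avoids the mismatch, which is the reason for moving the require up to just after configure.php.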
joekilo Posted July 30, 2003
Hello... I have added Header Controller, admin... and now Burt's SID killer. This is what I get when testing on the spider test:
Status 200 (return error code 0)
Spider url http://www.joekilo.com
User Agent Mozilla/4.0 (compatible; MSIE 5.0; Windows NT 5.0) 213.122.44.194
Referrer http://www.searchengineworld.com/cgi-bin/s.../sim_spider.cgi
Spider title Untitled Document
Spider meta desc No description available.
Spider meta keywords
It's stating "Untitled Document"... is this because I am not yet on a search engine? I did submit my URL to Google, but it's not showing there yet either. Any helpers, please?
jk
Guest Posted August 3, 2003
AHHHHHHHHHH
I have tried to install the code for the SID killer. I followed it word for word and pasted what I needed to where I needed to, and it's still popping up errors. Is there a way I can download a good html_output.php? Or can someone send me one that is already done?
Archived
This topic is now archived and is closed to further replies.