Blocked directories:
/graphics /images /inc /include and all subdirectories
All blocked directories are WRRC specific.
While I recommend a similar approach, you may want to avoid
blocking syndication images. The best way to do that is to keep
syndication images in separate directories that are not blocked.
Blocked files:
*.gif *.jpg *.jpeg *.png *.css *.inc *.js *.jpe *.wmz *.dll
*.class *frame.asp *nframe.asp *robots.txt
The blocked file names and extensions above are universal, with
the caveat that the frame-related items are named by each center.
The following file names and partial names are WRRC specific:
*sidebar.htm *top.htm nav.asp subsection.asp default.ida *vdir.htm
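To preview which URLs the file patterns above would catch, a short script can help. The following is a minimal sketch, not the WebTrends implementation: it assumes the patterns behave like case-insensitive shell-style wildcards applied to the file name only, and the function name is_blocked_file is hypothetical.

```python
from fnmatch import fnmatch

# Patterns copied from the lists above. The matching rule itself
# (case-insensitive glob on the file name only) is an assumption.
UNIVERSAL = ["*.gif", "*.jpg", "*.jpeg", "*.png", "*.css", "*.inc", "*.js",
             "*.jpe", "*.wmz", "*.dll", "*.class", "*frame.asp",
             "*nframe.asp", "*robots.txt"]
WRRC_SPECIFIC = ["*sidebar.htm", "*top.htm", "nav.asp", "subsection.asp",
                 "default.ida", "*vdir.htm"]


def is_blocked_file(path, patterns=UNIVERSAL + WRRC_SPECIFIC):
    """Return True if the file name in `path` matches any pattern."""
    # Keep only the part after the last slash and compare in lower case.
    name = path.rsplit("/", 1)[-1].lower()
    return any(fnmatch(name, pattern.lower()) for pattern in patterns)


if __name__ == "__main__":
    for url in ["/images/logo.GIF", "/pubs/index.htm", "/reports/desktop.htm"]:
        print(url, is_blocked_file(url))
```

Under these assumptions a leading wildcard such as *top.htm also catches a page named desktop.htm, so it is worth testing new patterns against a list of real page names before adding them.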
Blocked browsers:
Browser 1 -
Googlebot Gigabot bumblebee sitecheck appie Mercator ia_archiver
spider spyder obot crawler architext lycos infoseek surfbot slurp
scooter gulliver fido freecrawl linkbot webfind enigmabot happybot
Browser 2 -
adminshop Aleksika analyst answerbus aport archiver asterias Baiduspider bloglines Border Butch coast control DiamondBot DittoSpyder easydl
Enterprise_Search Environmental FAST FeedValidator FindAnISP
Browser 3 -
ABACH0Bot Acme.Spider AxmoRobot BravoBrian "Change Spider" CyberSpyder dtSearchSpider dumbot EMPAS_ROBOT freefind FyberSpider Gaisbot LNSpiderguy
"maxamine.com--robot" mozDex
Browser 4 -
ABACHOBot ArmadilloBot "Climate Spider" "ClimateArk Spider" CydralSpider Faxobot FiNDoBot Fluffy Jyxobot KATATUDO-Spider mobileGate-Spider "NLESE+USEPA" "NLESE USEPA" NokodoBot Ocelli OpidooBOT pipeLiner Scooter-3.2
Browser 5 -
documagix backrub netsite tarantula whacker www-collector
"webtrends link analyzer" microsoft_site_analyst keynote-agent arachnoidea
kit-fireball harvest excite inktomi verity xenu wisebot
Browser 6 -
polybot linkscan libby psbot sqworm lwp-trivial about Zyborg teoma-agent
EmailSyphon JennyBot BunnySlippers htdig DIIbot Ultraseek SiteSweeper
contype toCrawl "Microsoft URL Control" LinkWalker moget
Browser 7 -
webcollage gazz larbin_2.6.2 Gather Hloader "ask jeeves" cyberalert looksmart singingfish crawler918
"WIPO Spider" WISEnutbot Works "Xenu link sleuth" zippobot
Browser 8 -
FlashGet Forest hl_ftien_spider htmlparser "http://search.msn.com/msnbot.htm" ichiro iltrovatore Inet insumascout Jakarta jetbot joedog
k2spider larbin libwww linkalarm lwp microcomputers moxDex
Browser 9 -
msnbot MSNPTC naverbot netcraft npbot nuSearch nutch "NutchCVS/0.06-dev" PicSpider pompos Python pythonrpt RPT "search.ch" "sherlock/1.0"
sleipnir "Slurp/cat" "sohu-search" SpeedySpider Spider.NET
Browser 10 -
SpiderMan sygol teleport test URL_Spider_Pro vspider "vspider for EPA" "vspider+for+EPA" Water "Water Conserve Spider" webdup webharvester
WebTrends "WebTrends Link Analyzer" wget "Wget/1.8.2"
Browser 12 -
"Netcraft Web Server Survey" NetResearchServer "nuSearch Spider" Openbot picsearch "PlantyNet_WebRobot_V1.9" PortalBSpider Robot ScSpider
semanticdiscovery "Server Survey" spider.acont.de swish-e
Browser 13 -
Szukacz verio "Water Conservation Spider" WebZIP Scooter-3.2.NIV Scooter-3.2.PDF Scooter-3.2.SF0 SiteSpider "SpiderNet IE" StanleyWebSpider
strider Top10Ranking USyd-NLP-SPider Yahoo! "NLESE USEPA"
The above blocked browsers are universal and represent
the more active subset of spiders.
Note: WebTrends blocks on partial string matches, so a short string
can block more browsers than intended. Use short strings with care,
and enclose multi-word strings in quotation marks.
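The effect of partial matching and of quotation marks can be checked outside WebTrends. The sketch below is an illustration only: it assumes a case-insensitive substring test, which may differ from what WebTrends does internally, and the helper names and the "ContestBrowser/1.0" browser string are hypothetical.

```python
import shlex


def parse_filter(line):
    """Split a filter line into terms, keeping quoted strings together."""
    return shlex.split(line)


def matching_terms(user_agent, terms):
    """Return the terms found anywhere in the browser string.

    Assumes a case-insensitive substring test.
    """
    ua = user_agent.lower()
    return [term for term in terms if term.lower() in ua]


if __name__ == "__main__":
    # With quotation marks the multi-word name stays one term.
    print(parse_filter('Water "Water Conserve Spider" webdup'))
    # Without them it breaks into three separate, much broader terms.
    print(parse_filter("Water Conserve Spider"))
    # The short term "test" also matches an unrelated, made-up browser name.
    print(matching_terms("ContestBrowser/1.0", ["test", "wget"]))
```

Running candidate strings against a sample of browser names taken from the log files is a quick way to spot terms that are too short.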
Blocked users from DPPEA/WRR LAN: 207.4.183.*
These users are WRRC specific. Each Center will need to determine
its own blocked users.
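A wildcard such as 207.4.183.* covers the 256 addresses from 207.4.183.0 to 207.4.183.255, which is the same range as the network 207.4.183.0/24. As a minimal sketch for checking whether a visitor address falls inside the blocked range (the function name is_blocked_user is hypothetical, and each Center would substitute its own network):

```python
import ipaddress

# 207.4.183.* written as a /24 network: the first three octets are fixed.
BLOCKED_LAN = ipaddress.ip_network("207.4.183.0/24")


def is_blocked_user(address):
    """Return True if the IPv4 address lies inside the blocked LAN range."""
    return ipaddress.ip_address(address) in BLOCKED_LAN


if __name__ == "__main__":
    for addr in ["207.4.183.17", "207.4.184.1"]:
        print(addr, is_blocked_user(addr))
```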
Blocked domains:
*cyberalert* *SV-BOT21* *crawler918* *looksmart*