Author Topic: Search engine bots/crawlers  (Read 1729 times)

Offline Underdog

  • Administrator
  • Hero Member
  • *******
  • Posts: 1103
  • Kudos: 54
  • Gender: Male
    • Ask Us A Question
Search engine bots/crawlers
« on: December 31, 2011, 04:37:37 PM »
Here is a great read regarding search engine bots:

Introduction to robots.txt

Configuring this file can save bandwidth, keep bots from crawling specific files and/or directories, and perhaps stop spam bots from gathering info from some of those files/directories.

A good read for people running forums (a.k.a. webmasters).



Offline Underdog

Re: Search engine bots/crawlers
« Reply #1 on: January 04, 2012, 09:43:19 PM »

I don't think bots are automatically directed to the above file or forced to follow its rules.
Reputable companies such as Google do configure their bots to read and obey the instructions in robots.txt, though, so you may be able to save some resources by controlling what those bots do (e.g. no access to certain image directories).
 
On an Apache server, all traffic is forced to obey .htaccess rules (as long as the host allows them), and those files can be written to block IPs, deny access to certain files, etc.
Configured properly, this can definitely benefit a site: it may save bandwidth and make the site more secure.

Here are some basic examples of .htaccess file configuration.

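Disallow IP addresses (2 examples given):
Code: [Select]
# Apache 2.2 syntax: Allow rules are checked first, then Deny rules
Order allow,deny
Allow from all
# Placeholder addresses; swap in the IPs you want to block
Deny from 192.0.2.63
Deny from 198.51.100.21

...etc.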

Disallow access to a specific file:
Code: [Select]
# Refuse every web request for this one file (Apache 2.2 syntax)
<Files "config.inc.php">
  Order allow,deny
  Deny from all
</Files>
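
Along the same lines, a bot can be refused by its User-Agent string, so it gets turned away whether or not it bothers to read robots.txt. This is only a rough sketch in the same Apache 2.2 style as the examples above. It assumes mod_setenvif is enabled, and "BadBot" and "EvilScraper" are made-up names, not real crawlers.

```apache
# Tag any request whose User-Agent contains one of these strings (case-insensitive).
# "BadBot" and "EvilScraper" are hypothetical; use the agents you see in your own logs.
SetEnvIfNoCase User-Agent "BadBot" bad_bot
SetEnvIfNoCase User-Agent "EvilScraper" bad_bot

# Allow everyone, then deny the tagged requests
Order allow,deny
Allow from all
Deny from env=bad_bot
```

Keep in mind that a User-Agent string is easy to fake, so this only stops bots that identify themselves honestly. On Apache 2.4 the Order/Allow/Deny directives are only available through mod_access_compat.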

There are a few SMF modifications available on SMF.org that will add/configure some .htaccess files to thwart attackers/hackers and bad bots.
You may also be interested in some third-party scripts and services that offer some protection (although not recently updated): Bot Trap, Bot Trap2, CloudFlare
 
.htaccess files allow all sorts of control, e.g. redirecting specific IPs to other pages/URLs, password protecting directories, displaying specific HTML pages for page errors, etc.
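
To give an idea of what those look like, here are three separate snippets, one per feature. They are shown together only for brevity and are not meant to be pasted into one file as-is. All paths, the realm name and the IP address are placeholders I made up, and the sketch assumes your host allows these overrides and has mod_rewrite enabled.

```apache
# 1) Show a custom HTML page for "404 Not Found" errors
#    (/errors/404.html is a hypothetical path)
ErrorDocument 404 /errors/404.html

# 2) Password protect the directory this .htaccess file sits in
#    (the AuthUserFile path is a placeholder; create the file with the htpasswd tool)
AuthType Basic
AuthName "Members only"
AuthUserFile /full/path/to/.htpasswd
Require valid-user

# 3) Redirect one IP address to another page (needs mod_rewrite)
#    192.0.2.63 is a documentation placeholder address
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.63$
# Skip the target page itself, otherwise the redirect would loop
RewriteCond %{REQUEST_URI} !^/blocked\.html$
RewriteRule ^ /blocked.html [R=302,L]
```

Keep the password file outside the web-accessible folders if your host lets you, so it cannot be downloaded.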

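.htaccess files can overrule each other. Apache reads every .htaccess file from the top of the folder tree down to the folder being requested, so when two of them set the same thing, the one closest to the requested file wins.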

Offline Skhilled

  • Admin
  • Hero Member
  • *****
  • Posts: 708
  • Kudos: 32
  • Gender: Male
  • Retro Gamer!
Re: Search engine bots/crawlers
« Reply #2 on: January 06, 2012, 08:37:11 AM »
Great post. :)

 
