Webalizer referrer spamming

Precautions, cutback, elimination, prevention


This article is related to Webalizer - Server usage reports.

Preface


Referrer spam (also known as referral spam, log spam or referrer bombing[1]) is a kind of spamdexing (spamming aimed at search engines). The technique involves making repeated web site requests using a fake referrer URL to the site the spammer wishes to advertise. Sites that publish their access logs, including referer statistics, will then inadvertently link back to the spammer's site. These links will be indexed by search engines as they crawl the access logs, improving the spammer's search engine ranking. Except for polluting their statistics, the technique does not harm the affected sites.
Wikipedia


"... the technique does not harm the affected sites."


That claim is too optimistic. Referrer spam can harm the affected sites: search engines may lower your reputation, and clickable links may even lead your visitors, or yourself, to infected websites!


Preparation


You'll need access and full administrative rights to the following files and directories:

     
     /var/www/html                             # http(s) root
     /var/www/html/.htaccess                   # directive file, server behaviour
     /var/www/html/robots.txt                  # directive file, webcrawlers
     
     /var/www/html/OutputDir                   # Webalizer output directory
     /var/www/html/OutputDir/webalizer.current # Webalizer main database
     
     /etc/webalizer/webalizer.conf             # Webalizer configuration file
     
     /var/log/apache2/access.log               # Apache server access log file
     /var/log/apache2/error.log                # Apache server error log file
     

Precautions


Rule of thumb: never use so-called bulk submission services, even if they are offered for free. Sooner or later you will be confronted with a lot of scammers and spammers, first of all in your mailbox. Start with the big three (Bing, Yahoo!, Google) and always submit your websites in the ordinary manner.


Reputable search engines obey the robots.txt file. Instruct their crawlers not to crawl your statistics pages, so that they do not publish the links in their indexes.


Keep in mind: like .htaccess, robots.txt works like a batch file. Put this at the top of your robots.txt:

     
     User-agent: *
     ...
     Disallow: /OutputDir
     ...
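
To check that such a rule really keeps well-behaved crawlers away from the statistics pages, you can feed it to the robots.txt parser in Python's standard library. This is only a minimal sketch: `ROBOTS_TXT`, `is_crawlable` and the agent name `ExampleBot` are made up for illustration, and `/OutputDir` is the placeholder used in this article.

```python
from urllib.robotparser import RobotFileParser

# Sample rules mirroring the snippet above; "/OutputDir" is a placeholder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /OutputDir
"""

def is_crawlable(robots_txt, url_path, agent="ExampleBot"):
    """Return True if a crawler that obeys robots.txt may fetch url_path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url_path)

if __name__ == "__main__":
    for path in ("/OutputDir/index.html", "/index.html"):
        print(path, is_crawlable(ROBOTS_TXT, path))
```

Keep in mind that this only tells you what obedient crawlers will do. Spammers ignore robots.txt.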
     

Never use Webalizer's standard output directory '/webalizer'; spammers may search for it.
Move the content from '/webalizer' to your hard disk and copy it back into the new directory.


root@raspberry:~# nano /etc/webalizer/webalizer.conf


     OutputDir /var/www/html/myserverstats     # whatever to name it
     

Keep the META tag set to "noindex,nofollow".


     HTMLHead <meta name="robots" content="noindex,nofollow">
     

The LinkReferrer option determines whether entries in the referrer table are shown as plain text or as HTML links.


     LinkReferrer  no   # standard format, plain text
     HideURL       /OutputDir
     IgnoreURL     /OutputDir
     

Intermezzo


Understanding Apache's server access.log file


Read each log entry from left to right.

     
     303.202.101.321         # decimal IP address of the client
     ba04d64a                # hexadecimal IP address of the client
     
                                 or
                               
     303-202-101-321         # host name of the client (reverse DNS lookup),
     .client.example.com     # logged instead of the IP address if enabled

     - -                     # identd identity and authenticated user name;
                             # each returns a hyphen (-) if not available
                               
     [28/Dec/2017:10:34:12]  # time that the request was received
     
     "GET /pic.png HTTP/1.1" # request line: method, resource and protocol
     
     200                     # status code, the server sends back
     
     5867                    # size of the object returned, in bytes
     
     "http://spam.com.ua/"   # the referral link. Returns a hyphen (-) 
                             # if this information is not available
                              
     "Mozilla/5.0 (...)"     # the user agent. Returns a hyphen (-) 
                             # if this information is not available
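
To make the field order more tangible, here is a small sketch that splits one log line into named fields. It assumes Apache's standard "combined" log format; the names `COMBINED`, `parse_line`, `hex_to_ipv4` and the sample line are made up for illustration, and 203.0.113.7 is a documentation-only address.

```python
import ipaddress
import re

# Apache "combined" format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Split one combined-format line into named fields, or return None."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None

def hex_to_ipv4(hex_address):
    """Convert an 8-digit hexadecimal address to dotted decimal notation."""
    return str(ipaddress.IPv4Address(int(hex_address, 16)))

# Made-up sample line for illustration only.
SAMPLE = ('203.0.113.7 - - [28/Dec/2017:10:34:12 +0100] '
          '"GET /pic.png HTTP/1.1" 200 5867 '
          '"http://spam.com.ua/" "Mozilla/5.0 (...)"')

if __name__ == "__main__":
    fields = parse_line(SAMPLE)
    print(fields["host"], fields["status"], fields["referrer"])
    print(hex_to_ipv4("ba04d64a"))
```

The regular expression is deliberately simple: it does not handle escaped quotes inside the request or user agent, so treat a `None` result as "look at this line by hand".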
     

Cutback


You got infected, so what's next? First follow the prescription in ❸.
Then consult /var/log/apache2/access.log and /var/log/apache2/error.log to investigate.


Study the "Log Files" chapter of the Apache HTTP Server Version 2.4 documentation.
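
For a first overview of who is spamming you, it helps to count the referrer hosts in the access log. The following is a rough sketch, assuming combined-format lines; `referrer_hosts` and `SAMPLE_LOG` are made-up names and the sample lines are invented.

```python
from collections import Counter
from urllib.parse import urlsplit

def referrer_hosts(lines):
    """Count referrer host names in combined-format access log lines."""
    counts = Counter()
    for line in lines:
        # Quoted fields: parts[1] = request, parts[3] = referrer, parts[5] = agent
        parts = line.split('"')
        if len(parts) < 6:
            continue  # not a combined-format line
        try:
            host = urlsplit(parts[3]).hostname
        except ValueError:
            continue  # malformed referrer URL
        if host:  # "-" (no referrer) has no host name
            counts[host] += 1
    return counts

# Made-up sample lines using documentation-only addresses.
SAMPLE_LOG = [
    '203.0.113.7 - - [28/Dec/2017:10:34:12 +0100] "GET / HTTP/1.1" 200 5867 "http://spam.com.ua/" "Mozilla/5.0"',
    '203.0.113.8 - - [28/Dec/2017:10:34:13 +0100] "GET / HTTP/1.1" 200 5867 "http://spam.com.ua/offer" "Mozilla/5.0"',
    '198.51.100.2 - - [28/Dec/2017:10:35:01 +0100] "GET /pic.png HTTP/1.1" 200 120 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for host, hits in referrer_hosts(SAMPLE_LOG).most_common(10):
        print(hits, host)
```

On the server you would pass the open log file instead of the sample list, e.g. `referrer_hosts(open("/var/log/apache2/access.log", errors="replace"))`. Hosts that show up with many hits but are unknown to you are candidates for the block lists in this article.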


Limit the activities of bad bots with directives in your .htaccess. Note that this only reduces the bandwidth they consume!
You may limit access from various IPs, domains and top level domains (TLD).

Put this at the top of your .htaccess. Blocked clients receive error code 403 (Access forbidden).
In Apache 2.4, negated Require directives only take effect inside a <RequireAll> block.

     
     <Limit GET>
        <RequireAll>
           Require all granted
           Require not ip 111.222.333.444
           Require not ip 555.666.777
           Require not ip 888.999
           Require not host spam.com.ua
           Require not host info
        </RequireAll>
     </Limit>
     

If you also get spam in your forum, guestbook or any other board, add the same rules for POST requests. Again, the negated Require directives belong inside a <RequireAll> block.

     
     <Limit POST>
        <RequireAll>
           Require all granted
           Require not ip 111.222.333.444
           Require not ip 555.666.777
           Require not ip 888.999
           Require not host spam.com.ua
           Require not host info
        </RequireAll>
     </Limit>
     

Once you have changed the OutputDir, redirect unwanted visitors of the old directory to a harmless external page.

     
     Redirect /OutputDirOld https://duckduckgo.com/about
     ErrorDocument 403 https://duckduckgo.com/about
     

That's not all. Now we try harder and go to the root of the evil.


root@raspberry:~# nano /etc/webalizer/webalizer.conf


Scroll down and look for the IgnoreReferrer examples.
Here you can set whatever you want to keep out of the referrer statistics.
You can reject top level domains (TLD), domains, IPs and certain expressions appearing in domain names.

     
     IgnoreReferrer   *.ru                     # rejects a top level domain (TLD)
     IgnoreReferrer   westio.com               # rejects a domain
     IgnoreReferrer   essaydates.com           # rejects a domain
     
     IgnoreReferrer   casino                   # rejects any expression casino
     
     IgnoreReferrer   http://                  # rejects all non-https domains
     
     IgnoreReferrer   http://1                 # rejects non-https IP addresses
     IgnoreReferrer   http://2                 # ...
     IgnoreReferrer   ...                      # ...
     IgnoreReferrer   http://8                 # ...
     IgnoreReferrer   http://9                 # ...
     
I have never tested the following method:
     
     IgnoreReferrer   *                        # does this ignore every referrer?
     
     IncludeReferrer  *google.*                # can pass if it's a friendly one
     IncludeReferrer  *yahoo.*                 # (IncludeURL would match requested
     IncludeReferrer  *bing.*                  # URLs, not referrers)
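
Before you rely on such patterns, it is worth checking what they would actually match. The sketch below imitates the wildcard rule as I read it in the Webalizer README: a leading '*' matches at the end of the string, a trailing '*' matches at the start, and a pattern without a wildcard matches anywhere. That reading is an assumption, the function `matches` is made up for illustration, and Webalizer itself may behave differently in the details.

```python
def matches(pattern, referrer):
    """Approximate the documented wildcard rule for a single pattern."""
    if len(pattern) > 1 and pattern.startswith("*") and pattern.endswith("*"):
        raise ValueError("wildcards at both ends are not modelled here")
    if pattern.startswith("*"):
        return referrer.endswith(pattern[1:])     # leading '*': match at the end
    if pattern.endswith("*"):
        return referrer.startswith(pattern[:-1])  # trailing '*': match at the start
    return pattern in referrer                    # no wildcard: match anywhere

if __name__ == "__main__":
    # Made-up referrers for illustration only.
    for pattern, referrer in [
        ("*.ru", "http://example.ru"),
        ("*.ru", "http://example.ru/"),
        ("casino", "http://best-casino.example/"),
        ("http://1", "http://192.0.2.1/"),
        ("http://", "https://example.org/"),
    ]:
        print(pattern, referrer, matches(pattern, referrer))
```

If Webalizer matches the way this sketch assumes, a pattern like `*.ru` would miss referrers that end with a slash or a path, so compare the result with your own reports before trusting it.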
     

Elimination: mission impossible?


Quite a good question. In fact, it is only a matter of time until you get it solved.
Meanwhile you can tidy up manually. webalizer.current is a normal ASCII file.

Study it carefully from top to bottom until you know what you want to wipe out.
Make a backup first!
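
If you prefer a scripted backup over copying by hand, a timestamped copy next to the original is enough. This is a minimal sketch: the function name `backup_file` and the `.bak` naming scheme are my own choice, not something Webalizer prescribes.

```python
import shutil
import time
from pathlib import Path

def backup_file(path, stamp=None):
    """Copy path to '<name>.<stamp>.bak' in the same directory; return the copy."""
    source = Path(path)
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    target = source.with_name(f"{source.name}.{stamp}.bak")
    shutil.copy2(source, target)  # copy2 also keeps timestamps and permissions
    return target
```

Called as `backup_file("/var/www/html/OutputDir/webalizer.current")`, it leaves the original untouched and returns the path of the copy, so you can restore it if your manual edit goes wrong.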


root@raspberry:~# nano /var/www/html/OutputDir/webalizer.current


28-Dec 2017


Prevent referrer spam by the env=!dontlog directive


The SetEnvIf and SetEnvIfNoCase directives can be used in your global Apache (2.4) configuration file, e.g. if you get lots of visits from search engine spiders (bots), certain IPs or so-called referrer spammers.


I found out that this other method is very effective against referrer spam.

Please follow the internal link.
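
To give a rough idea of what this looks like, the sketch below generates the directives for a list of spam domains. `SetEnvIfNoCase` and `CustomLog ... env=!dontlog` are regular Apache directives; the function `dontlog_directives` is made up, and the default log path with `${APACHE_LOG_DIR}` assumes a Debian-style Apache installation.

```python
import re

def dontlog_directives(referrer_domains, log_path="${APACHE_LOG_DIR}/access.log"):
    """Build Apache 2.4 lines that keep matching referrers out of the access log."""
    lines = []
    for domain in referrer_domains:
        pattern = re.escape(domain)  # escape dots so they match literally
        lines.append(f'SetEnvIfNoCase Referer "{pattern}" dontlog')
    # Only requests without the dontlog variable are written to the log.
    lines.append(f"CustomLog {log_path} combined env=!dontlog")
    return lines

if __name__ == "__main__":
    # Domains taken from the examples in this article.
    print("\n".join(dontlog_directives(["spam.com.ua", "westio.com"])))
```

The generated CustomLog line is meant to replace the existing one in your virtual host, not to be added next to it, otherwise the requests get logged anyway. Since Webalizer only sees what is in access.log, requests that are never logged cannot pollute the statistics.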


Study Apache's access.log & error.log regularly.


04-Mar 2018


Set up the UFW firewall against referrer spam


ufw is a front end for iptables: a comfortable command line application for managing your personal iptables rules in Linux. The linked article covers the basic handling of this simple but effective personal firewall for IPv4 & IPv6.


Follow this link: Install & configure the so-called "ufw" | Uncomplicated firewall for Linux web servers.
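
Once ufw is running, blocking a spammer boils down to one `ufw deny from ...` command per address. The sketch below only builds these command lines and validates the addresses first; `ufw_deny_commands` is a made-up helper, and the addresses are documentation-only examples.

```python
import ipaddress

def ufw_deny_commands(sources):
    """Return 'ufw deny from ...' commands for IPv4/IPv6 addresses or networks."""
    commands = []
    for source in sources:
        # ip_network raises ValueError for malformed input such as "111.222.333.444"
        network = ipaddress.ip_network(source, strict=False)
        commands.append(f"ufw deny from {network.with_prefixlen}")
    return commands

if __name__ == "__main__":
    for command in ufw_deny_commands(["203.0.113.7", "198.51.100.77/24", "2001:db8::1"]):
        print(command)
```

Run the printed commands as root. ufw evaluates its rules in order, so if you already allow port 80 or 443 for everyone, the deny rules may have to be placed before that rule (for example with `ufw insert`), otherwise they never match.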


07-Jun 2018
