
Rule Language Help

WWW Shield is controlled by a file ~/.wwwshield/rules that defines the values of various configuration variables and the macros shown in the macro page. This page describes the language syntax. Literal text is shown in boldface, and meta tokens that stand for other syntax elements are italicized. Optional parts are enclosed in non-boldfaced square brackets [ ].

The general operation of WWW Shield is the application of macros to requests and HTML commands. The macro evaluator is called in three different cases:

  1. For a request header that is sent to a remote WWW server as the result of a browser operation, for a web page or an image. These headers typically contain lines such as ``referer: foo''. Each such line is converted to a variable; in this case a new variable ``referer'' is created with the value ``foo''. By inspecting, changing, or deleting variables, macros can affect the header, for example by concealing the page referer.

  2. For a reply header received from the remote WWW server in response to a request header. This works the same way as for request headers. For example, a macro could drop set-cookie lines.

  3. For every HTML instruction enclosed in angle brackets, such as ``<img src=url>''. Before running the macros the HTML instruction is converted to variables: the variable ``img'' is set but has no value, and the variable ``img.src'' is set to the value ``url''. Variable names may contain slashes, so ``/a'' is a perfectly valid variable name. This is the heart of macro operation because variables can be matched, changed, and deleted.
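
The conversion described in case 3 can be sketched in Python. This is only an illustration of the documented behavior, not WWW Shield's actual parser: the name tag_to_variables, the field pattern, and the choice to store names in lower case are assumptions made for the example.

```python
import re

# Field pattern (an assumption): a name, optionally followed by =value,
# where the value is double-quoted, single-quoted, or a bare word.
FIELD = re.compile(r"""([^\s=]+)(?:\s*=\s*("[^"]*"|'[^']*'|\S+))?""")


def tag_to_variables(tag: str) -> dict[str, str]:
    """Turn one HTML instruction such as '<img src=url>' into variables."""
    inner = tag.strip()[1:-1]  # drop the surrounding angle brackets
    fields = FIELD.findall(inner)
    if not fields:
        return {}
    # The first field is the command itself; it is set but has no value.
    command = fields[0][0].lower()
    variables = {command: ""}
    for name, value in fields[1:]:
        # Strip one pair of matching quotes around the value, if present.
        if len(value) >= 2 and value[0] in "\"'" and value[-1] == value[0]:
            value = value[1:-1]
        variables[f"{command}.{name.lower()}"] = value
    return variables
```

With this sketch, ``<img src=url>'' yields the variables img (empty) and img.src (url), and ``</A>'' yields the single variable /a.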

Conspicuously absent is the possibility of letting macros operate on plain text outside HTML instructions. Although such text can be killed if enclosed in paired HTML instructions, it cannot be edited. This is intentional; I do not want this tool to be used for censorship in the way of those ill-conceived cyber nannies supposed to protect children. I disapprove of censorship. A fine line to walk, I know.


Commands

Commands are statements that perform an operation. They must be terminated with a semicolon, except that a closing right brace } is never followed by a semicolon (in other words, like in C). Multiple commands (other than macros) can be combined into a single command by enclosing the list in curly braces, again like in C. Much of the syntax is inspired by C.

macro "name" expression [enable] { commands }

Macros are lists of commands that can be turned on and off. If turned on, they operate on outgoing page requests and incoming pages. If turned off, they have no effect. If enable is missing or true, the macro is enabled by default; if it is false the macro is disabled by default. The default can be changed in the macro page.

The name is a short name of the macro.

The expression is evaluated and shown as descriptive text in the macro page. Being an expression, it can use operators. By convention, the expression should have the form <B>Short description</B>\n\nLong description. Backslash-n (\n) inserts a paragraph break. If the expression contains a dollar sign ($), the text following the dollar sign up to the next blank, tab, or end-of-string is assumed to be the name of a variable that will be shown in a new row at this point, as a text-entry field. Any number of variables may be referenced in this way, at any point in the description. This is how macros are parameterized. Literal dollar signs can be escaped with a leading backslash. For example, if the expression contains the text foo $var bar, foo will be printed in one line, followed by a text entry field for the variable var, followed by a line bar. The variable var can then be changed by the user, and accessed in the macro commands. The variable must exist at this point, for example by having been assigned to with a statement such as var=""; before the macro definition. All this is merely a convention; there are no such things as local variables and formal parameter lists for macros.
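
The dollar-sign convention can be illustrated with a small Python sketch that splits a description into text rows and text-entry fields. It is not WWW Shield's own code: the function name description_rows and the ("text", ...) / ("field", ...) row representation are assumptions chosen for the example.

```python
import re

# An unescaped '$' followed by a name that runs up to the next blank,
# tab, or end of string, as described above.
FIELD_REF = re.compile(r"(?<!\\)\$([^ \t]*)")


def description_rows(description: str) -> list[tuple[str, str]]:
    """Split a macro description into text rows and variable fields."""
    rows = []
    pos = 0
    for match in FIELD_REF.finditer(description):
        text = description[pos:match.start()].strip()
        if text:
            # A backslash-escaped dollar sign is shown as a literal '$'.
            rows.append(("text", text.replace("\\$", "$")))
        rows.append(("field", match.group(1)))
        pos = match.end()
    text = description[pos:].strip()
    if text:
        rows.append(("text", text.replace("\\$", "$")))
    return rows
```

For the text foo $var bar this produces a text row foo, a field for var, and a text row bar, matching the example in the paragraph above.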

The commands are the commands to execute. The macro command is only allowed at the top level, not inside another command.

if (expression) command1
if (expression) command1 else command2

This is a conditional. If the expression evaluates to true, command1 is executed; otherwise command2 is executed if present. If an else is present, then command1 must be enclosed in curly braces.

[ global ] variable = expression

The value of expression is assigned to the variable variable. Global variables are permanent; all others remain valid only for the duration of the current request or page loading operation. See below for an explanation of variables.

[ global ] variable = [ expression_list ]

This is similar to a simple assignment, except that a list of values is assigned to the variable, which becomes an array variable. Array variables have a special meaning in some kinds of comparisons (see below) in that the comparison is done for every value in the list until one is true. This is useful for matching a list of URLs, for example. The expression_list is a comma-separated list of expressions.

delete variable

If the variable variable exists, it is deleted. This cannot be used for deleting HTML text from a web page. It is safe to delete nonexistent variables; in this case nothing happens. Typical usage: delete header.cookie.

drop

If the macro is called for a request or reply header, the entire header is deleted, which effectively cancels the request. If the macro is called for an HTML command, the command (everything in angle brackets) is omitted from the page.

kill

If the macro is called for a request or reply header, this works like drop. If the macro is called for an HTML command, the command and the end-command (which begins with a slash followed by the same command) and everything in between is omitted from the page. For example, if applied to the command ``<a href=foo>'', this command and all the text up to and including ``</A>'' is deleted. Obviously this should be used only on commands that have end-commands because otherwise the entire rest of the page is killed.
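
The difference between drop and kill on HTML commands can be sketched in Python. This is a simplified model, not WWW Shield's implementation: the function apply_action is hypothetical, it selects commands purely by name (in WWW Shield the macro decides), and it makes no attempt to handle nested commands of the same name.

```python
import re

TAG = re.compile(r"(<[^>]*>)")  # splits a page into text and <...> commands


def apply_action(page: str, target: str, action: str) -> str:
    """Remove every <target ...> command using 'drop' or 'kill' semantics."""
    out = []
    skipping = False  # True while inside a killed command
    for token in TAG.split(page):
        is_tag = token.startswith("<") and token.endswith(">")
        name = token[1:-1].split()[0].lower() if is_tag and token[1:-1].strip() else ""
        if skipping:
            # kill: discard everything up to and including the end-command.
            if name == "/" + target:
                skipping = False
            continue
        if name == target:
            # drop omits only this command; kill also starts skipping.
            if action == "kill":
                skipping = True
            continue
        out.append(token)
    return "".join(out)
```

Applied to a page fragment such as See <a href=foo>this ad</A> now., drop removes only the opening command and leaves the link text and </A> in place, while kill removes everything through </A>. Killing a command that has no end-command, such as img, discards the rest of the page, which is the pitfall mentioned above.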

insert expression

Insert the text expression before the header line or HTML command just matched. Macros that need to replace an HTML command or header line with a completely different one often combine drop or kill with insert.

append expression

Append the text expression after the header line or HTML command just matched.

abort

Abort the current transfer. This should be used sparingly because it may be surprising to the user of the browser, and may leave broken or empty image icons in the displayed page. It was introduced to suppress hidden connections to spying pages like Netscape's what's-related pages. It is best to use this only for header matches, not for HTML command matches.


Expressions

Expressions compute values, either for testing conditions or for insertions into the page. Operators have precedence: ``.'' binds most strongly, followed by relational operators, followed by ``?:'', followed by ``&&'', followed by ``||''. Parentheses may be used to group operators differently.

"text"
number
true
false

These provide constants. A pair of single quotes may be used instead of a pair of double quotes. Numbers are actually treated like strings, and may contain a sign and a decimal point, but no scientific notation. The true and false constants are useful for boolean operations; they are handled more efficiently internally.

variable

Returns the value of the variable; see below.

expression . expression

String concatenation. Since numbers are treated like strings, they may also be concatenated, just like any other expression. No blanks are inserted implicitly. Note that periods are also used for separating variable members; use blanks around the concatenation period to disambiguate.

expression operator expression

The two operands are compared, either numerically if both sides look like numbers, or alphabetically if not. The result is either true or false. If the right-hand expression is an array variable, the comparison is performed for every element of the array variable until one evaluates to true, in which case the result is also true. If all members of the array evaluate to false, the result is false. The following relational operators are supported:

==          equal
!=          not equal
<=          less than or equal
>=          greater than or equal
<           less than
>           greater than
=~          regular expression match
!~          regular expression mismatch
contains    substring of
!contains   not substring of
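
The comparison rules can be modeled with a short Python sketch: numeric comparison when both sides look like numbers, case-insensitive alphabetic comparison otherwise, and an any-element match when the right-hand side is an array. This is an illustration under assumptions, not the real evaluator: the names compare and looks_like_number are hypothetical, only the six ordering and equality operators are covered, and the exact definition of "looks like a number" (optional sign, digits, optional decimal point) is inferred from the description of constants.

```python
import operator
import re

OPS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

# Optional sign, digits, optional decimal point; no scientific notation.
NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)")


def looks_like_number(text: str) -> bool:
    return NUMBER.fullmatch(text) is not None


def compare(left: str, op: str, right) -> bool:
    """Compare left with right, where right is a string or a list of strings."""
    if isinstance(right, list):
        # Array variable: true as soon as one element compares true.
        return any(compare(left, op, item) for item in right)
    if looks_like_number(left) and looks_like_number(right):
        return OPS[op](float(left), float(right))
    # Alphabetic comparison, ignoring case.
    return OPS[op](left.lower(), right.lower())
```

Note how the numeric rule matters: "10" > "9" is true numerically, although "10" sorts before "9" alphabetically.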

All comparisons are case-insensitive; x and X compare equal.

WWW Shield also supports numeric operators. The operators and their precedence follow C rules. Again, use blanks around minus signs to make sure that they are not taken as part of a variable name. Minus signs are common in variable names, so a-b is a variable name and a - b is a subtraction.

*           multiply
/           divide
%           integer modulo
+           add
-           subtract
&           bitwise AND
^           bitwise XOR
|           bitwise OR
&&          boolean AND
||          boolean OR
<<          integer left shift
>>          integer right shift

operator expression

There are also three unary prefix operators, as known from C:

-           negative
!           boolean NOT
~           bitwise NOT

expression ? expression : expression

The result is the second expression if the first expression is true, or the third expression otherwise.


Variables

WWW Shield is based on the manipulation of variables. It distinguishes global variables, which remain valid at all times, from local variables, which exist only while processing a request or reply. Variables can be either simple (as in var) or structured (as in var.member). A valid variable name consists of an optional leading slash, followed by a letter, followed by any number of letters, numbers, minus signs, underscores, and periods.

Variables are not case-sensitive, so capitalization does not matter. This is because HTML commands are not case-sensitive either.
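
The naming rule and the case-insensitivity can be expressed as a small Python sketch. The regular expression is a direct reading of the rule above; the names is_valid_name and canonical, and the idea of lower-casing names to compare them, are assumptions made for the example rather than a description of WWW Shield's internals.

```python
import re

# Optional slash, one letter, then letters, digits, '-', '_', or '.'.
NAME = re.compile(r"/?[A-Za-z][A-Za-z0-9_.-]*")


def is_valid_name(name: str) -> bool:
    """Check a variable name against the rule stated above."""
    return NAME.fullmatch(name) is not None


def canonical(name: str) -> str:
    """Fold case so that IMG.SRC and img.src name the same variable."""
    return name.lower()
```

Under this reading, /a, img.src, header.user-agent, and a-b are all valid names, while a name that starts with a digit or contains a blank is not.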

To check whether a variable exists, use a command like if (var). To check whether a variable exists and has a certain value, use if (var == value) or some other relational operator.

Global Variables

The following table lists all standard global variables. They can be overridden with global assignment commands in the ~/.wwwshield/rules file, and new ones can be set. All macros always have access to these.

config.version    WWW Shield's version number.
config.host       The host that WWW Shield runs on, without domain.
config.domain     The domain of the host that WWW Shield runs on. This may be empty.
config.port       The port number to which WWW Shield answers. The default is 1188.
config.proxymode  If set, WWW Shield talks to another proxy, such as Squid, instead of sending requests directly to the Internet.
config.proxyhost  In proxy mode, the host (and optionally domain) the proxy runs on.
config.proxyport  In proxy mode, the port number of the proxy. The default is 80.

Request headers

Browser requests are parsed into members of the local header variable. The first four are from the first request line, and the rest is from the remainder of the request. POST bodies are not parsed. These variables exist only when sending the request but not when the reply is received because that is a different transaction.

request            A variable set by WWW Shield if a request header is being parsed. It has no value. Macros can use this to distinguish between requests and replies.
header.method      The request method, either GET or POST.
header.url         The URL being requested.
url                The URL being requested (for symmetry with reply headers).
host               The host name from the URL being requested.
header.protocol    The HTTP protocol identifier, typically HTTP/1.0 or HTTP/1.1.
header.referer     The previous page visited that contained the link now being followed.
header.user-agent  The name of your browser, such as Mozilla/4.05C-SGI etc.
header.host        The host that the page is being requested from. Some HTTP 1.0 browsers do not send this header.
header.accept      The MIME types of data that the browser is expecting, such as image/gif, image/x-xbitmap, image/jpeg.

There are more headers; see the HTTP standard. The macros are run when all header lines have been read from the browser, just before the request is sent to the proxy or server (unless the macros issue an abort command, in which case the request is cancelled).
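
How a browser request might be turned into the variables listed above can be sketched in Python. This is a reconstruction from the table, not WWW Shield's code: the function request_to_variables is hypothetical, the request is assumed to carry an absolute URL (as browsers send to a proxy), and www.example.com in the usage note is a placeholder.

```python
from urllib.parse import urlsplit


def request_to_variables(raw_header: str) -> dict[str, str]:
    """Parse a request header into the local variables described above."""
    lines = raw_header.strip().splitlines()
    # First request line: method, URL, and protocol identifier.
    method, url, protocol = lines[0].split(None, 2)
    variables = {
        "request": "",  # set, but has no value
        "header.method": method,
        "header.url": url,
        "url": url,
        "host": urlsplit(url).hostname or "",
        "header.protocol": protocol,
    }
    # Remaining lines such as "Referer: foo" become header.referer = "foo".
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            variables["header." + name.strip().lower()] = value.strip()
    return variables
```

For a request that begins with GET http://www.example.com/index.html HTTP/1.0 and carries a Referer line, the result contains header.method, header.url, url, host, header.protocol, and header.referer, so a macro-like step could conceal the referer simply by removing the header.referer entry before the header is reassembled.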

Reply headers

Server replies work the same way as browser requests, except that the header variable members are somewhat different. The variables remain valid while the page body is being parsed. The body is parsed only if the page type is HTML; image bodies do not contain HTML commands, but images still go through reply header parsing.

url                  A copy of the URL in the request header. (It is not part of the reply header.)
host                 A copy of the host name in the URL in the request header. (It is also not part of the reply header.)
reply                A variable set by WWW Shield if a reply header is being parsed. It has no value.
header.protocol      The HTTP protocol identifier, typically HTTP/1.0 or HTTP/1.1.
header.replycode     The HTTP reply code. 200 means OK, 404 means page not found, etc.
header.replymessage  A human-readable description of the reply code, such as OK.
header.date          The date when the request was processed by the server.
header.server        The server identification, such as Apache/1.2.0 PHP/FI-2.0b11.
header.content-type  The data type of the reply, such as text/html or image/jpeg.

Again, there may be others. The macros are executed when all header lines have been read.

Reply HTML commands

If the reply has the content type text/html, the body is also parsed. Every HTML command, typically beginning with <html> and ending with </html>, is parsed and turned into a variable, in this case named html and /html, respectively. (The slash is part of the name.) If the HTML command has fields, they are turned into members of the variable. For example, <IMG SRC=foo> sets the variable img, and also sets img.src to the value foo. The variables remain valid only while the command is being parsed. The macros are run whenever a command has been fully read (that is, when the trailing > is found).

When parsing an HTML page, the reply header variables are also still available (such as url), but the request header variables are not.



The URL of this page is http://www.bitrot.de/www_language.html. See the copyright notice.