WWW Shield is controlled by a file ~/.wwwshield/rules that defines the values of various configuration variables and the macros shown in the macro page. This page describes the language syntax. Literal text is shown in boldface, and meta tokens that stand for other syntax elements are italicized. Optional parts are enclosed in non-boldfaced square brackets [ ].
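The general operation of WWW Shield is the application of macros to requests, replies, and HTML commands. The macro evaluator is called in three different cases: when all header lines of a browser request have been read, when all header lines of a server reply have been read, and whenever an HTML command in the body of an HTML page has been fully read.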
Conspicuously absent is the possibility of letting macros operate on plain text outside HTML instructions. Although such text can be killed if enclosed in paired HTML instructions, it cannot be edited. This is intentional; I do not want this tool to be used for censorship in the way of those ill-conceived cyber nannies supposed to protect children. I disapprove of censorship. A fine line to walk, I know.
Commands are statements that perform an operation. They must be terminated with a semicolon, except that a closing right brace } is never followed by a semicolon (in other words, as in C). Multiple commands (other than macros) can be combined into a single command by enclosing the list in curly braces, again as in C. Much of the syntax is inspired by C.
| macro "name" expression [enable] { commands } |
Macros are lists of commands that can be turned on and off. If turned on, they operate on outgoing page requests and incoming pages. If turned off, they have no effect. If enable is missing or true, the macro is enabled by default; if it is false the macro is disabled by default. The default can be changed in the macro page.
The name is a short name of the macro.
The expression is evaluated and shown as descriptive text in the macro page. Being an expression, it can use operators. By convention, the expression should have the form <B>Short description</B>\n\nLong description. Backslash-n (\n) inserts a paragraph break. If the expression contains a dollar sign ($), the text following the dollar sign up to the next blank, tab, or end-of-string is assumed to be the name of a variable that will be shown in a new row at this point, as a text-entry field. Any number of variables may be referenced in this way, at any point in the description. This is how macros are parameterized. Literal dollar signs can be escaped with a leading backslash. For example, if the expression contains the text foo $var bar, foo will be printed in one line, followed by a text entry field for the variable var, followed by a line bar. The variable var can then be changed by the user, and accessed in the macro commands. The variable must exist at this point, for example by having been assigned to with a statement such as var=""; before the macro definition. All this is merely a convention; there are no such things as local variables and formal parameter lists for macros.
The commands are the commands to execute. The macro command is only allowed at the top level, not inside another command.
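As an illustration of the definition syntax above, here is an untested sketch of one parameterized macro. The macro name noimg, the variable pattern, and the host ads.example.com are made up for this example; only the syntax is taken from this page. The block carries no comments because this page does not define a comment syntax.

```
pattern = "ads.example.com";

macro "noimg" "<B>Drop ad images</B>\n\nDrops IMG commands whose source matches $pattern" true {
    if (img.src =~ pattern) drop;
}
```

The description ends in $pattern, so the macro page should show a text-entry field for that variable in its own row. The assignment comes before the macro definition because the variable must already exist at that point.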
| if (expression) command1 |
| if (expression) command1 else command2 |
This is a conditional. If the expression evaluates to true, command1 is executed; otherwise command2 is executed, if present. If an else is present, then command1 must be enclosed in curly braces.
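A small untested sketch of both forms follows. The variables hasref and status are hypothetical names chosen for the example. Note the braces around the first branch of the second statement, which the else requires.

```
if (header.referer) hasref = true;

if (header.replycode == 404) {
    status = "missing";
} else {
    status = "found";
}
```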
| [ global ] variable = expression |
The value of expression is assigned to the variable variable. Global variables are permanent; all others remain valid only for the duration of the current request or page loading operation. See below for an explanation of variables.
| [ global ] variable = [ expression_list ] |
This is similar to a simple assignment, except that a list of values is assigned to the variable, which becomes an array variable. Array variables have a special meaning in some kinds of comparisons (see below) in that the comparison is done for every value in the list until one is true. This is useful for matching a list of URLs, for example. The expression_list is a comma-separated list of expressions.
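The following untested sketch shows an array variable used on the right-hand side of a comparison. It assumes that the square brackets around the list are literal syntax. The variable adhosts, the macro name, and the host names are invented for the example.

```
global adhosts = [ "ads.example.com", "banners.example.net", "track.example.org" ];

macro "adhosts" "<B>Cancel requests to ad hosts</B>\n\nDrops any request whose host is in the adhosts list." {
    if (request) {
        if (host == adhosts) drop;
    }
}
```

The single comparison host == adhosts stands in for three separate tests, because the comparison is repeated for each list element until one is true.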
| delete variable |
If the variable variable exists, it is deleted. This cannot be used for deleting HTML text from a web page. It is safe to delete nonexistent variables; in this case nothing happens. Typical usage: delete header.cookie.
| drop |
If the macro is called for a request or reply header, the entire header is deleted, which effectively cancels the request. If the macro is called for an HTML command, the command (everything in angle brackets) is omitted from the page.
| kill |
If the macro is called for a request or reply header, this works like drop. If the macro is called for an HTML command, the command and the end-command (which begins with a slash followed by the same command) and everything in between is omitted from the page. For example, if applied to the command ``<a href=foo>'', this command and all the text up to and including ``</A>'' is deleted. Obviously this should be used only on commands that have end-commands because otherwise the entire rest of the page is killed.
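As an untested sketch, a macro along these lines would remove a paired command together with everything it encloses. SCRIPT is used only as an example of a command that has an end-command, and the macro name is made up.

```
macro "noscript" "<B>Remove scripts</B>\n\nDeletes SCRIPT commands and everything up to their end-commands." {
    if (script) kill;
}
```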
| insert expression |
Insert the text expression before the header line or HTML command just matched. Macros that need to replace an HTML command or header line with a completely different one often use drop or kill together with insert.
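The replace idiom could look like the following untested sketch, which swaps every IMG command for a short text marker. The macro name and the marker text are arbitrary, and the macro is disabled by default through the trailing false.

```
macro "imgtext" "<B>Replace images with a text marker</B>" false {
    if (img) {
        insert "[image]";
        drop;
    }
}
```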
| append expression |
Append the text expression after the header line or HTML command just matched.
| abort |
Abort the current transfer. This should be used sparingly because it may be surprising to the user of the browser, and may leave broken or empty image icons in the displayed page. It was introduced to suppress hidden connections to spying pages like Netscape's what's-related pages. It is best to use this only for header matches, not for HTML command matches.
Expressions compute values, either for testing conditions or for insertions into the page. Operators differ in precedence: ``.'' binds most strongly, followed by the relational operators, followed by ``?:'', followed by ``&&'', followed by ``||''. Parentheses may be used to group operators differently.
| "text"  number  true  false |
These provide constants. A pair of single quotes may be used instead of a pair of double quotes. Numbers are actually treated like strings, and may contain a sign and a decimal point, but no scientific notation. The true and false constants are useful for boolean operations; they are handled more efficiently internally.
| variable |
Returns the value of the variable; see below.
| expression . expression |
String concatenation. Since numbers are treated like strings, they may also be concatenated, just like any other expression. No blanks are inserted implicitly. Note that periods are also used for separating variable members; use blanks around the concatenation period to disambiguate.
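A brief untested sketch of the distinction between the two uses of the period. The variable banner is a hypothetical name. Periods without surrounding blanks select members of config, and periods with blanks concatenate.

```
banner = "WWW Shield " . config.version . " on " . config.host . ":" . config.port;
```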
| expression operator expression |
The two operands are compared, either numerically if both sides look like numbers, or alphabetically if not. The result is either true or false. If the right-hand expression is an array variable, the comparison is performed for every element of the array variable until one evaluates to true, in which case the result is also true. If all members of the array evaluate to false, the result is false. The following relational operators are supported:
| == | equal |
| != | not equal |
| <= | less than or equal |
| >= | greater than or equal |
| < | less than |
| > | greater than |
| =~ | regular expression match |
| !~ | regular expression mismatch |
| contains | substring of |
| !contains | not substring of |
All comparisons are case-insensitive; x and X compare equal. WWW Shield also supports numeric operators. The operators and their precedence follow C rules. Again, use blanks around minus signs to make sure that they are not taken as part of a variable name. Minus signs are common in variable names, so a-b is a variable name and a - b is a subtraction.
| * | multiply |
| / | divide |
| % | integer modulo |
| + | add |
| - | subtract |
| & | bitwise AND |
| ^ | bitwise XOR |
| | | bitwise OR |
| && | boolean AND |
| || | boolean OR |
| << | integer left shift |
| >> | integer right shift |
| operator expression |
There are also three unary prefix operators, as known from C:
| - | negative |
| ! | boolean NOT |
| ~ | bitwise NOT |
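The advice about blanks around minus signs can be illustrated with an untested sketch. All variable names here are invented. By the naming rule, a-b is a third variable, unrelated to a and b.

```
a = 10;
b = 3;
a-b = 99;

diff = a - b;
same = a-b;
```

Going by the rule stated above, diff receives the result of subtracting b from a, while same receives a copy of the separate variable a-b.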
| expression ? expression : expression |
The result is the second expression if the first expression is true, or the third expression otherwise.
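An untested sketch of the conditional operator follows. The variable kind is a hypothetical name. The parentheses are there for readability only, since relational operators bind more strongly than ``?:''.

```
kind = (header.content-type == "text/html") ? "page" : "other";
```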
WWW Shield is based on the manipulation of variables. It distinguishes between global variables, which remain valid at all times, and local variables, which exist only while processing a request or reply. Variables can be either simple (as in var) or structured (as in var.member). A valid variable name consists of an optional leading slash, followed by a letter, followed by any number of letters, numbers, minus signs, underscores, and periods.
Variables are not case-sensitive, so capitalization does not matter. This is because HTML commands are not case-sensitive either.
To check whether a variable exists, use a command like if (var). To check whether a variable exists and has a certain value, use if (var == value) or some other relational operator.
The following table lists all standard global variables. They can be overridden with global assignment commands in the ~/.wwwshield/rules file, and new ones can be set. All macros always have access to these.
| config.version | WWW Shield's version number. |
| config.host | The host that WWW Shield runs on, without domain. |
| config.domain | The domain of the host that WWW Shield runs on. This may be empty. |
| config.port | The port number to which WWW Shield answers. The default is 1188. |
| config.proxymode | If set, WWW Shield talks to another proxy, such as Squid, instead of sending requests directly to the Internet. |
| config.proxyhost | In proxy mode, the host (and optionally domain) the proxy runs on. |
| config.proxyport | In proxy mode, the port number of the proxy. The default is 80. |
Browser requests are parsed into members of the local header variable. The first four are from the first request line, and the rest is from the remainder of the request. POST bodies are not parsed. These variables exist only when sending the request but not when the reply is received because that is a different transaction.
| request | A variable set by WWW Shield if a request header is being parsed. It has no value. Macros can use this to distinguish between requests and replies. |
| header.method | The request method, either GET or POST. |
| header.url | The URL being requested. |
| url | The URL being requested (for symmetry with reply headers). |
| host | The host name from the URL being requested. |
| header.protocol | The HTTP protocol identifier, typically HTTP/1.0 or HTTP/1.1. |
| header.referer | The previous page visited that contained the link now being followed. |
| header.user-agent | The name of your browser, such as Mozilla/4.05C-SGI etc. |
| header.host | The host that the page is being requested from. Some HTTP 1.0 browsers do not send this header. |
| header.accept | The MIME types of data that the browser is expecting, such as image/gif, image/x-xbitmap, image/jpeg. |
There are more headers; see the HTTP standard. The macros are run when all header lines have been read from the browser, just before the request is sent to the proxy or server (unless the macros issue an abort command, in which case the request is cancelled).
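As an untested sketch of a request macro, the following would strip two headers before the request is sent on. The macro name is made up. header.cookie is the member used in the delete example on this page, and both deletes are safe even if the browser sent neither header.

```
macro "private" "<B>Hide referer and cookies</B>\n\nRemoves the Referer and Cookie headers from outgoing requests." {
    if (request) {
        delete header.referer;
        delete header.cookie;
    }
}
```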
Server replies work the same way as browser requests, except that the header variable members are somewhat different. The variables remain valid while the page body is being parsed. This happens only if the page type is HTML; image bodies do not contain HTML commands, but they go through reply header parsing nevertheless.
| url | A copy of the URL in the request header. (It is not part of the reply header.) |
| host | A copy of the host name in the URL in the request header. (It is also not part of the reply header.) |
| reply | A variable set by WWW Shield if a reply header is being parsed. It has no value. |
| header.protocol | The HTTP protocol identifier, typically HTTP/1.0 or HTTP/1.1. |
| header.replycode | The HTTP reply code. 200 means OK, 404 means page not found, etc. |
| header.replymessage | A human-readable description of the reply code, such as OK. |
| header.date | The date when the request was processed by the server. |
| header.server | The server identification, such as Apache/1.2.0 PHP/FI-2.0b11. |
| header.content-type | The data type of the reply, such as text/html or image/jpeg. |
Again, there may be others. The macros are executed when all header lines have been read.
If the reply has the content type text/html, the body is also parsed. Every HTML command, typically beginning with <html> and ending with </html>, is parsed and turned into a variable, in this case named html and /html, respectively. (The slash is part of the name.) If the HTML command has fields, they are turned into members of the variable. For example, <IMG SRC=foo> sets the variable img, and also sets img.src to the value foo. The variables remain valid only while the command is being parsed. The macros are run whenever a command has been fully read (that is, when the trailing > is found).
When parsing an HTML page, the reply header variables are also still available (such as url), but the request header variables are not.
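To show how command fields become variable members, here is an untested sketch that drops one-pixel images. It assumes the page writes them as IMG commands with WIDTH and HEIGHT fields, which would set img.width and img.height. The macro name is invented.

```
macro "webbugs" "<B>Drop one-pixel images</B>\n\nRemoves IMG commands whose width and height are both 1." {
    if (img.width == 1 && img.height == 1) drop;
}
```

For any command that is not an IMG, or an IMG without those fields, the variables do not exist and the comparisons are false, so the command passes through unchanged.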