Use Regular Expressions with Mailing Lists

Overview

Regular expressions (aka regex) are used within many programming languages to specify rules for matching strings of text. In other words, you can use regular expressions to match text in incoming email messages. The settings in which you use the regular expressions will tell the Mailing list software what to do with the messages that have matching text.

The settings in the Mailing list software will also accept non-regular expression text, so the software needs to be able to determine whether the text that is entered is a regular expression or not. To specify that a text is a regular expression, begin the line with a caret ( ^ ) character, also known as the circumflex accent when modifying another character.
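The caret convention can be sketched in Python as follows. This is a minimal illustration, not Mailman's actual code: the function name entry_matches is hypothetical, and the case-insensitive comparison is an assumption made for the example.

```python
import re


def entry_matches(entry: str, address: str) -> bool:
    """Return True if a filter entry matches an email address.

    Hypothetical helper: an entry that starts with '^' is treated as a
    regular expression; any other entry is compared as a literal address.
    """
    if entry.startswith("^"):
        # Regex entry. Inside the regex, the caret also anchors the
        # match to the start of the address.
        return re.search(entry, address, re.IGNORECASE) is not None
    # Literal entry (assumption: case-insensitive exact comparison).
    return entry.lower() == address.lower()


print(entry_matches("mary@alaska.edu", "Mary@alaska.edu"))    # literal entry
print(entry_matches(r"^.*@alaska\.edu$", "dan@alaska.edu"))   # regex entry
print(entry_matches(".*@alaska.edu", "dan@alaska.edu"))       # no caret: literal
```

In this sketch the third entry never matches a real address, because without the leading caret it is compared character for character rather than interpreted as a pattern.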


Regular Expression Syntax

Regular expressions use character pattern matching to find and capture the information that is needed. Several regular expression standards exist, e.g. POSIX Basic Regular Expressions (BRE), POSIX Extended Regular Expressions (ERE), and Perl Compatible Regular Expressions (PCRE). Learning the full nuance of regex is beyond the scope of this article; however, because Mailman is written in Python, it uses Python's regular expression syntax, which closely resembles the Perl-style syntax of PCRE. Individuals desiring to learn more can read Python's Regular Expression Operations and Regular Expression How-To articles, and practice or test their regular expressions by a wide variety of means, such as the Regular Expressions 101 website.

 

Escape special characters

Regular expressions are filled with special characters that mean something other than what they are. Subject lines, for example, often have square brackets in them, but square brackets have a special meaning in regular expressions: they match any one of the characters appearing between the brackets. So if you try to block subjects that begin with "[SPAM]" using this:

^Subject: [SPAM]

What you're really doing is blocking any subject that begins with S, P, A, or M. It won't even block the [SPAM] subjects, since those begin with square brackets.

To avoid the special meaning of the square brackets, or any other special character, put a backslash in front of them:

^Subject: \[SPAM\]

Special characters you're likely to run into are the period (.), asterisk (*), plus symbol (+), and question mark (?), as well as parentheses and square brackets. To match any of these literally, escape it with a backslash.
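The effect of escaping can be checked with a short Python sketch. The sample subject lines are made up for illustration; re.escape is the standard library function that adds the backslashes for you.

```python
import re

spam_subject = "Subject: [SPAM] Cheap watches"
sale_subject = "Subject: Sale ends today"

unescaped = r"^Subject: [SPAM]"    # brackets form a character class
escaped = r"^Subject: \[SPAM\]"    # brackets are matched literally

# The unescaped rule misses the real target and hits an innocent subject.
print(bool(re.search(unescaped, spam_subject)))   # False
print(bool(re.search(unescaped, sale_subject)))   # True

# The escaped rule behaves as intended.
print(bool(re.search(escaped, spam_subject)))     # True
print(bool(re.search(escaped, sale_subject)))     # False

# re.escape() builds the escaped form of a literal string.
literal_tag = re.escape("[SPAM]")
print(literal_tag)
```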

 

Optional matches

Often there will be slight variations on the text you're trying to match. For example, subjects sometimes begin with "Re:" and sometimes they don't. You can match optional text by surrounding that text with parentheses and putting a question mark after the parentheses. Parentheses mean treat this text as one item and the question mark means the previous item is optional.

^Subject: (Re: )?\[SPAM\]

This will match subjects that begin with "Re: [SPAM]" as well as subjects that begin with "[SPAM]".
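As a quick check, the following Python sketch runs the rule against a few made-up subject lines. Note that the question mark allows at most one "Re: ".

```python
import re

optional_re = re.compile(r"^Subject: (Re: )?\[SPAM\]")

subjects = [
    "Subject: [SPAM] Hello",          # no prefix
    "Subject: Re: [SPAM] Hello",      # one optional prefix
    "Subject: Re: Re: [SPAM] Hello",  # two prefixes: too many for "?"
    "Subject: Fwd: [SPAM] Hello",     # a different prefix
]

# Map each sample subject to whether the rule matches it.
results = {s: bool(optional_re.search(s)) for s in subjects}
for subject, matched in results.items():
    print(f"{matched!s:5}  {subject}")
```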

 

Any character

In regular expressions, the period means "any character." Wherever a period occurs in a regular expression, any character will match.

^Subject: \[SPA.\]

This will match subjects that begin with "[SPA", then have any one character, then a closing bracket. [SPAM], [SPAR], and [SPAT] will all match.

 

Any of a series of known characters

Square brackets will match if any one of the characters between the brackets matches. For example, if the only things we want to match are SPAM and SPAT, we could use:

^Subject: \[SPA[MT]\]

You can also use "A-Z" to match any uppercase letter, "a-z" to match any lowercase letter, and "0-9" to match any digit. Subsets can also be used, such as "A-D" or "1-2". Because dashes have a special meaning inside brackets (they mean a range of characters), if you want to actually match a dash inside square brackets, put the dash at the end of the list of valid characters. "[a-e3-4-]" will match either a, b, c, d, e, 3, 4, or a dash.
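The bracket rules above can be tried out in Python. This sketch uses made-up subject lines and tests the "[a-e3-4-]" class one character at a time.

```python
import re

# Only SPAM and SPAT are accepted inside the literal brackets.
tag = re.compile(r"^Subject: \[SPA[MT]\]")
print(bool(tag.search("Subject: [SPAM] offer")))   # True
print(bool(tag.search("Subject: [SPAT] offer")))   # True
print(bool(tag.search("Subject: [SPAR] offer")))   # False

# Two ranges plus a trailing dash, which is taken literally.
char_class = re.compile(r"[a-e3-4-]")
candidates = "abcdef12345-_"
matched = [c for c in candidates if char_class.fullmatch(c)]
print(matched)
```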

 

Any number of characters

More useful is to pair the period with the asterisk or plus symbol. The asterisk means any number of the previous item, including zero, and the plus symbol means any number of the previous item, but at least one of them.

If you are receiving messages that contain more than one “Re:” at the beginning of the subject, for example, the question mark won’t do: it will match a single “Re: ” if it exists, but not two in a row or three in a row.

^Subject: (Re: )*\[SPAM\]

That will match any number of the text “Re: ” preceding the “[SPAM]” text.

But you can also use the asterisk with the period to match any number of any character.

^Subject: .*\[SPAM\]

This will match any subject that contains “[SPAM]” anywhere within the subject, because the period matches any character in front of the text “[SPAM]” and the asterisk allows zero or more of those characters.
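The difference between the repetition operators can be seen in a small Python sketch. The subject lines are invented for the example, and the third rule (using the plus symbol) is added only to contrast "zero or more" with "one or more".

```python
import re

any_number_of_re = re.compile(r"^Subject: (Re: )*\[SPAM\]")   # zero or more "Re: "
at_least_one_re = re.compile(r"^Subject: (Re: )+\[SPAM\]")    # one or more "Re: "
anywhere = re.compile(r"^Subject: .*\[SPAM\]")                # anything before [SPAM]

replies = "Subject: Re: Re: Re: [SPAM] Hello"
plain = "Subject: [SPAM] Hello"
buried = "Subject: Hello [SPAM] world"

for name, rule in [("(Re: )*", any_number_of_re),
                   ("(Re: )+", at_least_one_re),
                   (".*", anywhere)]:
    hits = [bool(rule.search(s)) for s in (replies, plain, buried)]
    print(name, hits)
```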

 

Common use of regular expressions in Mailing Lists

Specify domains that are allowed to post

The most common use of regular expressions in a Mailing list is to specify entire domains that can, or cannot, post messages to the list. For example, to allow all addresses in the alaska.edu domain, even those addresses that are not subscribed, to post messages without moderation to your list, do the following.

  1. Go to the list's admin interface.
  2. Click Privacy Options.
  3. Click Sender filters.
  4. Scroll down to the non-member filters section, and find the List of non-member addresses whose postings will be automatically accepted (aka accept_these_nonmembers). Add the following regular expression in the field.
     
    ^.*@(.*\.)?alaska\.edu$
  5. Scroll to the bottom of the page and click the Submit Your Changes button.

The ^ specifies that the string is a regular expression. The .* specifies that any text that comes before the "@" part of the email address is acceptable. The "(.*\.)?" part specifies that any optional subdomains (e.g. uaa, uas, etc.) are acceptable. The "alaska\.edu" part specifies that the domain must end in "alaska.edu"; the \ specifies that the subsequent period is literal text to match and not the special "any character" period. The $ specifies the end of the string, so nothing may follow "alaska.edu". For example, mary@alaska.edu, dan@uaa.alaska.edu, and joe@uas.alaska.edu will all match the regular expression.
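Before saving a rule like this, it can help to test it against a few addresses. The Python sketch below does that; the last two addresses are made-up examples of lookalike domains that should not match.

```python
import re

allow_ua = re.compile(r"^.*@(.*\.)?alaska\.edu$")

addresses = [
    "mary@alaska.edu",               # bare domain
    "dan@uaa.alaska.edu",            # subdomain
    "joe@notalaska.edu",             # lookalike: no dot before "alaska"
    "eve@alaska.edu.example.com",    # lookalike: text after "alaska.edu"
]

# Map each address to whether the allow rule matches it.
allowed = {a: bool(allow_ua.search(a)) for a in addresses}
for address, ok in allowed.items():
    print(f"{ok!s:5}  {address}")
```

The lookalike cases show why the "\." before "alaska" and the "$" at the end both matter.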

Please be aware that spammers often forge the headers of spam email. If the forged From header is set to an email address ending in alaska.edu, that message will be able to get through to your list. If you are concerned about spam on your list, review the following KB articles for some options on securing your mailing list.

 

Specify domains that are not allowed to post

To ban an entire domain from being able to post to your list and have the mailing list software automatically discard all messages from that domain, do the following.

  1. Go to the list's admin interface.
  2. Click Privacy Options.
  3. Click Sender filters.
  4. Scroll down to the non-member filters section, and find the list of non-member addresses whose postings will be automatically discarded (aka discard_these_nonmembers). Add a regular expression similar to the following in the field, replacing somedomain\.tld with the domain you want to block (e.g. somespammer\.com).
     
    ^.*@somedomain\.tld$
  5. Scroll to the bottom of the page and click the Submit Your Changes button.

 

Ban all non-UA domains from posting

If you want to ban all non-UA email addresses that are not members of the mailing list from being able to send (post) messages to your list, do the following.

  1. Go to the list's admin interface.
  2. Click Privacy Options.
  3. Click Sender Filters.
  4. Scroll down to the non-member filters section, and find the list of non-member addresses whose postings will be automatically discarded (aka discard_these_nonmembers) setting. Add the following regular expression in the field.
     
    ^(?!.*@(.*\.)?alaska\.edu$)
  5. Scroll to the bottom of the page and click the Submit Your Changes button.

As you can see, this regular expression is a little more complex than the preceding ones. It uses a negative lookahead, which tells the mailing list software to match posting email addresses that don't end in "@alaska.edu" or a related subdomain (e.g. @uaa.alaska.edu, @uas.alaska.edu), and to discard messages from these addresses.
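The negative lookahead can be checked with a short Python sketch. The helper name is_discarded and the sample addresses are hypothetical; the sketch only shows which addresses the expression matches, not how Mailman itself processes the setting.

```python
import re

# "(?!...)" succeeds only when the pattern inside it does NOT match,
# so this rule matches every address that is not in alaska.edu.
non_ua = re.compile(r"^(?!.*@(.*\.)?alaska\.edu$)")


def is_discarded(address: str) -> bool:
    """Hypothetical helper: True if the discard rule matches the address."""
    return non_ua.search(address) is not None


for address in ["spammer@example.com",
                "mary@alaska.edu",
                "dan@uaa.alaska.edu",
                "joe@alaska.edu.example.com"]:
    print(f"{is_discarded(address)!s:5}  {address}")
```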

 

Ban all non-UA domains from subscribing

If you want to ban all non-UA email addresses from subscribing to your list, do the following.

  1. Go to the list's admin interface.
  2. Click Privacy Options.
  3. Scroll down to the Ban list subsection, and find the List of addresses which are banned from membership in this mailing list (aka ban_list) setting. Add the following regular expression in the field.
     
    ^(?!.*@(.*\.)?alaska\.edu$)
  4. Scroll to the bottom of the page and click the Submit Your Changes button.

As you can see, this regular expression is a little more complex than the preceding ones. It uses a negative lookahead, which tells the mailing list software to match subscribing email addresses that don't end in "alaska.edu", and to ban these email addresses.

 

Spam filter rules

Regular expressions can be used to filter out and discard some kinds of messages, often spam or abusive messages. In Privacy options... -> Spam Filters you can add Spam Filter Rules to hold, reject, or discard incoming messages.

These rules will apply to every header in the email. Where other filters apply just to email addresses, spam filter rules can match any text anywhere in the header, so you'll have to be careful of the rules you define.

Match a specific header

You will almost always want to match against a specific header, often the Subject line. Begin your regular expression with a caret (^) and then the header name to limit your match to that header. For example:

^Subject: subject to block

The caret matches the beginning of a line, and then "Subject: " matches that header. So this regular expression will only match lines that begin with "Subject: ". In email headers, that means the Subject of the message.
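The following Python sketch illustrates why anchoring to a header name matters. It assumes the headers are searched as one multi-line block of text with re.MULTILINE, so that the caret matches at the start of every line; the flags the mailing list software actually applies may differ, and the header values are made up.

```python
import re

# Sample headers in which only a non-Subject header contains the phrase.
headers = (
    "From: someone@example.com\n"
    "Subject: Meeting agenda\n"
    "X-Note: subject to block\n"
)

anchored = re.compile(r"^Subject: subject to block", re.MULTILINE)
unanchored = re.compile(r"subject to block", re.MULTILINE)

# The anchored rule ignores the X-Note header; the unanchored one does not.
print(bool(anchored.search(headers)))     # False
print(bool(unanchored.search(headers)))   # True

# When the Subject header itself carries the phrase, the anchored rule matches.
spam_headers = headers.replace("Meeting agenda", "subject to block")
print(bool(anchored.search(spam_headers)))   # True
```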

 

Other Uses

There are many more uses of regular expressions. This page was not designed to go into all the uses; it is only meant to give you a simple overview of how regular expressions are used to manage mailing lists. For more information on specific settings that may use regular expressions, look through the other mailing list help pages on this site. While not necessarily mentioning the term "regular expression," other pages may contain information on how to accomplish specific tasks by using regular expressions.

 

Settings that use regular expressions

The following settings in the mailing list software will accept regular expressions:

  • accept_these_nonmembers (Privacy Options category → Sender Filters subsection)
  • acceptable_aliases (Privacy Options category → Recipient Filters subsection)
  • ban_list (Privacy Options category → Subscription Rules subsection)
  • bounce_matching_headers (Privacy Options category → Spam Filters subsection)
  • discard_these_nonmembers (Privacy Options category → Sender Filters subsection)
  • header_filter_rules (Privacy Options category → Spam Filters subsection)
  • hold_these_nonmembers (Privacy Options category → Sender Filters subsection)
  • reject_these_nonmembers (Privacy Options category → Sender Filters subsection)

 

Need additional help or have issues

For support, requests may be submitted anytime by Requesting Support for the Mailing List service. Support Requests are worked by Priority based on the Impact and Urgency of need as well as the order they are received by the IT Employees with the knowledge and permissions to assist with the request.

For immediate assistance please review the Contact Us page for ways to contact the appropriate support group.
