This document describes how NewsSrv, starting from version 0.3.0, computes the string identifying a post from the subject line. It assumes a working knowledge of regular expressions, specifically Perl-compatible regular expressions as implemented by libPCRE.
In the configuration pages of NewsSrv, you'll find two new configuration groups: "Post rules" and "Post rules groups". "Post rules" lets you configure arbitrary regular expressions, and "Post rules groups" lets you combine these rules into so-called post rules groups, which can then be associated with GlobalGroups.
In addition to the regular PCRE syntax, these expressions can contain two special tokens: %s and %t. Both are equivalent to ([0-9]+), with the additional meaning that the text matched by %s is taken as the sequence number of the message in the post, and the text matched by %t as the total number of messages in the post. You can obtain a literal "%" by doubling it.
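The token expansion described above can be sketched in a few lines. This is only an illustration, not NewsSrv's actual code: it uses Python's `re` module as a stand-in for libPCRE, the function name `expand_tokens` is hypothetical, and the named groups `seq` and `total` are an assumption made here so that the two numbers can be told apart after a match (the text only says the tokens are equivalent to `([0-9]+)`).

```python
import re

# Assumed mapping: %s and %t become capturing groups, %% a literal percent sign.
_TOKENS = {
    "%s": "(?P<seq>[0-9]+)",    # sequence number of the message in the post
    "%t": "(?P<total>[0-9]+)",  # total number of messages in the post
    "%%": "%",
}


def expand_tokens(rule: str) -> str:
    """Expand %s, %t and %% in a post rule into plain regex syntax."""
    # A single left-to-right pass, so "%%s" yields a literal "%" followed by "s".
    return re.sub(r"%[st%]", lambda m: _TOKENS[m.group(0)], rule)


pattern = re.compile(expand_tokens(r"\[%s/%t\]"))
match = pattern.search("some file [01/38]")
print(int(match.group("seq")), int(match.group("total")))  # prints: 1 38
```

Doing the expansion in one pass keeps a doubled "%" from being mistaken for the start of a token.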
First, the subject line is matched against the regular expressions contained in the post rules group of the global group concerned. Expressions are tried in descending order of their "priority" value. As soon as a match is found, meaning that both %s and %t matched, the search for sequence/total stops, but the subject is still matched against the expressions marked "always".
Then, every capturing subpattern belonging either to that first match or to a match of one of the "always" expressions is removed from the subject, and the resulting string is used to identify the message's post.
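The two steps above can be sketched as follows. This is a rough illustration under several assumptions, not the NewsSrv implementation: Python's `re` stands in for libPCRE, the names `PostRule`, `strip_captures` and `identify` are hypothetical, each rule is searched once, and each rule is applied to whatever remains after earlier removals (the text does not say whether "always" rules see the original subject or the remainder). The third rule is written as `(\.[rp](ar|[0-9]{2}))`, the form consistent with the ".par", ".rar", ".r17" and ".p02" suffixes this document says it matches.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

_TOKENS = {"%s": "(?P<seq>[0-9]+)", "%t": "(?P<total>[0-9]+)", "%%": "%"}


@dataclass
class PostRule:
    expression: str
    priority: int
    always: bool = False

    def compile(self) -> "re.Pattern":
        # Expand %s, %t and %% before compiling (named groups are an assumption).
        return re.compile(re.sub(r"%[st%]", lambda m: _TOKENS[m.group(0)], self.expression))


def strip_captures(text: str, match: "re.Match") -> str:
    """Remove every span captured by a group of `match` from `text`."""
    spans = sorted(
        match.span(i) for i in range(1, match.re.groups + 1) if match.span(i) != (-1, -1)
    )
    kept, pos = [], 0
    for start, end in spans:
        if start > pos:
            kept.append(text[pos:start])
        pos = max(pos, end)  # nested or overlapping groups are removed only once
    kept.append(text[pos:])
    return "".join(kept)


def identify(subject: str, rules: List[PostRule]) -> Tuple[str, Optional[int], Optional[int]]:
    """Return (post_id, sequence, total) for a subject line."""
    remaining, seq, total = subject, None, None
    for rule in sorted(rules, key=lambda r: r.priority, reverse=True):
        if seq is not None and not rule.always:
            continue  # sequence/total already found: only "always" rules still apply
        match = rule.compile().search(remaining)
        if match is None:
            continue
        groups = match.groupdict()
        has_both = groups.get("seq") is not None and groups.get("total") is not None
        if not rule.always and not has_both:
            continue  # a regular rule only counts when both %s and %t matched
        if seq is None and has_both:
            seq, total = int(groups["seq"]), int(groups["total"])
        remaining = strip_captures(remaining, match)
    return remaining, seq, total


RULES = [
    PostRule(r".*(\[%s/%t\].*)", 5),
    PostRule(r"(\.part[0-9]{2})", 4, always=True),
    PostRule(r"(\.[rp](ar|[0-9]{2}))", 3, always=True),
    PostRule(r"(\.(nfo|md5|htm|txt))", 2, always=True),
]

subject = "[Dream-Anime]GenoCyber episode 4(SBC) [01/38] [D-A]GenoCyber - Stage 4(SBC)(DFE4FC57).part01.P01"
post_id, seq, total = identify(subject, RULES)
print(repr(post_id), seq, total)
```

In this sketch the space that preceded the removed text is left at the end of the identifier; whether NewsSrv trims it is not stated in this document.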
Some examples, now...
Always | Expression | Priority |
---|---|---|
no | .*(\[%s/%t\].*) | 5 |
yes | (\.part[0-9]{2}) | 4 |
yes | (\.[rp](ar|[0-9]{2})) | 3 |
yes | (\.(nfo|md5|htm|txt)) | 2 |
[Dream-Anime]GenoCyber episode 4(SBC) [01/38] [D-A]GenoCyber - Stage 4(SBC)(DFE4FC57).part01.P01
This string matches rule 1, and the capturing subpattern matches the substring
[01/38] [D-A]GenoCyber - Stage 4(SBC)(DFE4FC57).part01.P01
Then, none of the "always" rules match. Finally, the sequence
number will be 1, the total number of files for the post 38, and
the postID will be
[Dream-Anime]GenoCyber episode 4(SBC)
after removing the captured substrings. Which is fine... Every
message having the same postID will be considered part of this
post.
#Rice-Box@irc.enterthegame.com presents - yEnc "R-B__Yaiba - 06__XVID.part01.p02 [2/36]"
This matches rule 1, but the resulting string is not the same
for every message of the post, because of the ".part01.p02"
suffix. Luckily, the substring ".part01" matches rule 2, and thus
will be removed from the resulting string. The remaining ".p02"
matches rule 3 (as would ".par", ".rar", ".r17", etc.) and
will also be removed. Finally, the postID will be
#Rice-Box@irc.enterthegame.com presents - yEnc "R-B__Yaiba - 06__XVID"
which, once more, is fine :)
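The suffixes that rule 3 is said to match can be checked quickly. The snippet below is a small sanity check, assuming Python's `re` behaves like libPCRE for this pattern and assuming rule 3 has the form `(\.[rp](ar|[0-9]{2}))`, which is the reading consistent with the ".par", ".rar", ".r17" and ".p02" suffixes named above. The name `RULE_3` and the extra negative samples are illustrative only.

```python
import re

# Rule 3 in the form implied by the suffixes listed in the text (assumption).
RULE_3 = re.compile(r"(\.[rp](ar|[0-9]{2}))")


def is_rule3_suffix(suffix: str) -> bool:
    """True when the whole suffix is matched by rule 3."""
    return RULE_3.fullmatch(suffix) is not None


for suffix in [".par", ".rar", ".r17", ".p02", ".zip", ".p2", ".P01"]:
    print(suffix, is_rule3_suffix(suffix))
```

Matching is case-sensitive here, so an upper-case suffix such as ".P01" is not covered by this rule on its own; in the first example it disappears only because it lies inside the part captured by rule 1.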
That's all for the doc. You'll have to experiment a bit to find rules that suit the newsgroups you follow, or just use the default set!