May 1st, 2007

Extended holiday business

Since I had some spare days, I spent quite a lot of time reconfiguring things on my computers. I moved the mail system I had on the laptop to the server (I’m moving many things there, since I will soon replace my old Gericom), mostly by following this wonderful tutorial (the only change I made was to use procmail instead of Courier’s maildrop, because I already had my whole set of filters). When I tried to send mail to the outside, I was quite puzzled about why it didn’t work, until I discovered that Tele2 blocks outgoing SMTP traffic (port 25). I tried to use their server as a relay, but got lost in the TLS/SSL settings, which I couldn’t get right (both Postfix’s own settings and stunnel simply resulted in a successful connection but no response at all from the server to the HELO), and I gave up. I’ll try again later.

I made one further change to the tutorial’s configuration. My plan was to automate both the learning of new spam and the classification of mail, using Maildir folders. The “Spam” folder is dedicated to spam the user finds, and “Quarantine” to mail the filter considers spam. For the filtering, it was enough to add a couple of procmail rules (these then go in /etc/skel/.procmailrc so every new user benefits from them):

SPAM=$MAILDIR/.Quarantine/
PROBABLY_SPAM=$MAILDIR/.Quarantine/

# Mails with a score of 15 or higher are almost certainly spam (with 0.05%
# false positives according to rules/STATISTICS.txt). Let’s put them in a
# different mbox. (This one is optional.)
:0:
* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
$SPAM

# All mail tagged as spam (i.e. with a score higher than the set threshold)
# is moved to $PROBABLY_SPAM.
:0:
* ^X-Spam-Status: Yes
$PROBABLY_SPAM

(As you can see, SPAM and PROBABLY_SPAM are the same folder, because this keeps things simpler, but users can distinguish the two cases if they want to.) The auto-learning part was a little harder. To enforce the Maildir structure, I set it up in /etc/skel and made the change by hand, or almost by hand, for the already existing users (luckily there were few). The auto-learning is done by the attached script. Once put in the /etc/cron.daily directory, it does all the work. There are still a couple of things to adjust: spam messages could be removed once analysed (SpamAssassin recognizes messages it has already analysed, but still wastes time on them), and a big inbox (from which SpamAssassin learns the ham) can make it waste a lot of time. On a system with few users this is no problem, but I bet it could become one in production.
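The attached script is not reproduced here, so what follows is only a rough sketch of what such a daily job might look like, written in Python. It assumes that each user’s mail lives in ~/Maildir, that user-confirmed spam sits in the “.Spam” folder and that read mail left in the inbox counts as ham. `sa-learn --spam` and `sa-learn --ham` are SpamAssassin’s standard training commands; the function names and the /home layout are hypothetical.

```python
#!/usr/bin/env python3
"""Daily Bayes training sketch: spam from .Spam, ham from the inbox."""
import shutil
import subprocess
from pathlib import Path


def learn_commands(maildir: Path) -> list[list[str]]:
    """Build the sa-learn invocations for one Maildir (assumed layout)."""
    spam_dir = maildir / ".Spam" / "cur"   # mail the user filed as spam
    ham_dir = maildir / "cur"              # read mail left in the inbox
    return [
        ["sa-learn", "--spam", str(spam_dir)],
        ["sa-learn", "--ham", str(ham_dir)],
    ]


def train(maildir: Path) -> None:
    """Run the commands, skipping folders that do not exist."""
    for cmd in learn_commands(maildir):
        if Path(cmd[-1]).is_dir():
            subprocess.run(cmd, check=False)


def main() -> None:
    # Do nothing if SpamAssassin's trainer is not installed.
    if shutil.which("sa-learn") is None:
        return
    # Hypothetical layout: one Maildir per user under /home.
    for maildir in sorted(Path("/home").glob("*/Maildir")):
        train(maildir)


if __name__ == "__main__":
    main()
```

Note that a job in /etc/cron.daily runs as root, so which Bayes database gets trained depends on how SpamAssassin is set up (per-user or shared); the sketch does not address that.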

Then I fixed a couple of things that didn’t work. For example, my IPv6 tunnel on eliot was not working correctly: I could ping out, but nobody could ping in. I noticed Shorewall was filtering out all unusual protocols, including protocol 41, which is exactly the one used for IPv6-in-IPv4 tunnels. Luckily Shorewall has a “tunnels” configuration file where you set the hosts from which and to which you are tunnelling, and after that everything went fine. Mail from the university was blocked, so I found out why, and then started to receive again all that nice spam I love so much. Oh well, new material for SpamAssassin to learn from.

But most of the time I spent on this blog. As I said in the previous post, I chose to theme it with something simple and to my taste. I worked a little to ensure everything was XHTML-compliant (in fact the only problem was a “<br>” to change into “<br/>”, plus a few more things in my old posts), and spent the rest of the time choosing and installing plugins.

XHTML Validator: using tidy and/or xmllint, it checks the posts you write for XHTML correctness, checks that all your old posts are valid XHTML, shows you any errors, and offers to run tidy to correct them automatically. In fact, one of my pages choked on a badly encoded URL (it used bare ampersands instead of entities), but tidy did more harm than good, so I had to correct it by hand. Note that XHTML Validator validates only the content of posts, not the template, so be sure to validate that on your own.
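To make the ampersand and “<br>” problems concrete, here is a small Python sketch. It is not what the plugin does (that relies on tidy and xmllint) and it checks only XML well-formedness, not full XHTML validity. The URL and the helper name `is_well_formed` are made up for illustration.

```python
import html
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML once wrapped in a root element."""
    try:
        ET.fromstring(f"<div>{fragment}</div>")
        return True
    except ET.ParseError:
        return False


raw_url = "http://example.org/page?a=1&b=2"           # hypothetical URL
bad = f'<a href="{raw_url}">link</a>'                 # bare ampersand in attribute
good = f'<a href="{html.escape(raw_url)}">link</a>'   # & written as &amp;

print(is_well_formed(bad))                      # bare "&" breaks the parse
print(is_well_formed(good))                     # escaped version parses
print(is_well_formed("line one<br>line two"))   # unclosed <br> breaks the parse
print(is_well_formed("line one<br/>line two"))  # self-closed <br/> parses
```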

Google Sitemaps: Google proposes a collaborative way of crawling sites: you provide a sitemap in XML, so Google has less work to do (no link discovery by Googlebot) and checks your pages more frequently (I hope). Also, take a look at the webmasters’ tools section of Google. Man, they’re really doing a whole lot of work for us. Where’s their reward? Simple: did anybody tell you we live in the information society? And how much information is Google gaining through all this…?
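For the curious, a sitemap is just a list of URLs wrapped in a little XML. The plugin generates it for you; the Python sketch below only illustrates the shape of the file, assuming the sitemaps.org 0.9 namespace and a minimal entry with nothing but a location. The function name and the example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org protocol (assumed version: 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Return a minimal sitemap document listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes "&" in the text for us.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap(["http://example.org/", "http://example.org/?p=42"]))
```

Real sitemaps can also carry optional fields such as the last modification date; they are left out here to keep the sketch short.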

OpenID Registration: OpenID is a decentralized, open identity system, invented by LiveJournal’s creator. It lets you entrust your personal information and password to one site only; all further authentications to other websites then work automatically by passing through the trusted one, using a challenge-response method. This way, the site you want to log in to will never have your password, but will simply ask the OpenID site where you registered to confirm your identity on your behalf. In essence, it is an ingenious use of the existing web architecture to implement an automatic single sign-on system. So, through this good post (and, a couple of links later, this one too) I learned how to activate OpenID authentication on this blog and how to use the blog itself as my identifier (everything’s a URL, in the web vision, and so are you… aren’t you a URL yet? :-)). The plugin is necessarily hackish: WordPress does not provide a general authentication system (like the UserFolder concept in Zope), so the plugin hooks the places where you can log in (the login and comment forms) and tries to guess where to insert the right XHTML. With the standard theme it works well; otherwise you have to change the template by hand. I did so only for the comments, since the login page is untouched by this theme and could safely rely on the plugin hack; look for the “OPENID” string in the code to get an idea of what I had to add to enable it.

So, all in all… I admit I have added quite a lot of functionality in relatively little time, without major fuss, thanks to WordPress. So, while my opinion of the architecture behind it remains quite negative, I admit that the large user base surely makes it a valuable product, thanks to all the plugins and themes available. This doesn’t mean I’ll stop developing my own framework :-)

Posted by mattia as smtp, spam, wordpress at 3:36 PM CEST

April 12th, 2007

Doubts

Talking about DNS and reverse DNS in class the other day, one of the guys asked me what reverse DNS is good for. In fact, the only useful example I could come up with was the reverse lookup of the connecting host, checked against the name it gives in the SMTP HELO command, which is used to weed out lazy spammers. Is there anything else it is really useful for? I bet there is, but I can’t think of what!
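As a reminder of the mechanics, here is a Python sketch of the kind of check a mail server might run on a connecting host: look up the name its address points to, then verify that the name resolves back to the same address. The function names are mine, the forward lookup shown handles IPv4 only, and the resolver functions are passed in as parameters so the logic can be tried without touching the network.

```python
import ipaddress
import socket


def ptr_name(ip: str) -> str:
    """Name queried for a reverse lookup (under in-addr.arpa for IPv4)."""
    return ipaddress.ip_address(ip).reverse_pointer


def forward_confirmed(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """True if the reverse name of `ip` resolves back to `ip`."""
    try:
        hostname = reverse(ip)[0]         # address -> name (PTR record)
        addresses = forward(hostname)[2]  # name -> addresses (A records)
    except OSError:
        # No PTR record, or the name does not resolve: treat as a failure.
        return False
    return ip in addresses


print(ptr_name("192.0.2.25"))
```

A server could additionally compare `hostname` with the name announced in HELO; how strictly to do that is a policy choice the sketch leaves out.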

Another thing. Discussions about incredibly expressive grammars and fast parsers are always going on. For example, we have context-free grammars, and LL(1), LR(1), LALR(1) parsers (and so on) that manage to parse restricted subsets of them rapidly. OK, but is it just my impression, or is the cost of parsing a minor factor in the overall equation of compilation, with most of the time spent on optimization, parse tree traversals, bytecode interpretation, and so on? In that case, why don’t we use more powerful parsers (like Earley-style ones, whose variants run in linear time on most practical grammars and can parse any context-free grammar, even ambiguous ones, without modifying the grammar)? I suppose it is more worthwhile to write grammars easily and correctly than to try to adapt that C grammar to LALR(1) or the like to gain a small percentage of time…
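To show how little code the basic idea takes, here is a bare-bones Earley recognizer in Python. It is the plain textbook algorithm, not one of the optimized variants, it only answers yes or no instead of building a parse tree, and it assumes the grammar has no empty right-hand sides. The grammar representation (a dict of tuples) is my own choice for the sketch.

```python
def earley_recognize(grammar, start, tokens):
    """Return True if `tokens` can be derived from `start`.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples);
    any symbol that is not a key of `grammar` is a terminal.
    Assumes no empty right-hand sides, which keeps the completer simple.
    """
    # An item is (lhs, rhs, dot position, origin index).
    chart = [set() for _ in range(len(tokens) + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}

    for i in range(len(tokens) + 1):
        worklist = list(chart[i])
        while worklist:
            lhs, rhs, dot, origin = worklist.pop()
            if dot < len(rhs):
                symbol = rhs[dot]
                if symbol in grammar:
                    # Predictor: expand the nonterminal after the dot.
                    new_items = [(symbol, r, 0, i) for r in grammar[symbol]]
                    target = i
                elif i < len(tokens) and tokens[i] == symbol:
                    # Scanner: the terminal matches the next token.
                    new_items = [(lhs, rhs, dot + 1, origin)]
                    target = i + 1
                else:
                    continue
            else:
                # Completer: advance every item that was waiting for `lhs`.
                new_items = [
                    (l, r, d + 1, o)
                    for (l, r, d, o) in chart[origin]
                    if d < len(r) and r[d] == lhs
                ]
                target = i
            for item in new_items:
                if item not in chart[target]:
                    chart[target].add(item)
                    if target == i:
                        worklist.append(item)

    return any(
        lhs == start and dot == len(rhs) and origin == 0
        for (lhs, rhs, dot, origin) in chart[len(tokens)]
    )


# An ambiguous, left-recursive grammar that LL(1) and LALR(1) tools reject as is.
expr = {"E": [("E", "+", "E"), ("n",)]}
print(earley_recognize(expr, "E", "n + n + n".split()))  # True
print(earley_recognize(expr, "E", "n +".split()))        # False
```

The grammar is used exactly as written, with no left-factoring or precedence tricks, which is the convenience the paragraph above is arguing for.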

Posted by mattia as dns, grammars, smtp, spam at 10:45 PM CEST
