I suppose that if you pass by this blog, you’re a geek. I mean, a programmer. Excuse me, sometimes I forget the difference.
What difference?
I’m going off topic. I just wanted to underline one thing: when we validate a page as XHTML 1.0 Strict or 1.1 with the W3C Validator or similar validators, we haven’t done all the work.
Why? Because we often forget one thing: the Content-Type.
Content-Type is one of the HTTP headers sent by the server. To be exact, it specifies the MIME type of the object returned. On the GET call from the browser, we have the corresponding Accept header, which tells us which kinds of Content-Type the client can accept, so that the server can apply transformations to provide the best one. E.g.: if a browser knows nothing about SVG, it won’t send the value “image/svg+xml” in the Accept header (a value which tells us: we’re talking about an image, whose type is both svg and xml - in fact SVG is just an XML application, so this definition is correct). So, in theory, if the server finds a client that cannot handle SVG, it could convert the image locally into another, non-vector format and send that: this process is called content negotiation. The client is always right and, moreover, gets what it asks for. It all boils down, in the end, to the old, and somehow still valid, philosophy of the internet: dumb client, intelligent server.
Back to XHTML. As per the specs, XHTML 1.1 should not be sent with Content-Type (aka media type) text/html, which is the standard for all HTML, but with type application/xhtml+xml (as you can see, the syntax is analogous to image/svg+xml), and it seems that XHTML 2.0 must be sent with Content-Type application/xhtml+xml (I’m referring to the standard W3C/IETF terminology, look here for the exact meaning). So, it seems simple: we change the headers sent in whatever way our framework allows, and we’re fully compliant.
Well, no. Sadly, Internet Explorer, even the newborn 7 series, does not accept that content type, and shows the page without style (well, I couldn’t reproduce it under Wine, but I assure you it’s this way). The reason is explained here. I can’t say there’s no good reason for it, but the fact remains. So?
As I said, the Accept header in the request is there to tell the server which types the browser accepts, so that the server can adapt. But in this case it’s just too easy: valid XHTML 1.1 is also valid HTML, so a downgrade to text/html is always correct for every XHTML 1.1 document without any change (I admit I’m not completely sure this is true by the standard, but it surely is true in common browser implementations)! So, it’s enough to choose the reply’s Content-Type header according to the types listed in Accept.
(To tell the truth, normal content negotiation, which I cut down to the minimum here, cannot be applied as per the standard: otherwise you could end up sending application/xhtml+xml to Explorer anyway and text/html to Opera and Safari. This is because their Accept headers aren’t completely coherent with the philosophy behind them, and this is the result.)
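The cut-down negotiation described above can be sketched in a few lines of Python. This is only an illustration, not the code of any particular plugin: the function name choose_content_type is made up for this example, and the rule it applies (send application/xhtml+xml only when the browser lists it explicitly, text/html in every other case) is my reading of the workaround.

```python
def choose_content_type(accept_header: str) -> str:
    """Pick the media type for an XHTML 1.1 page from the Accept header.

    Hypothetical helper: it only looks for an explicit
    application/xhtml+xml entry and ignores wildcards such as */*,
    so browsers that never mention the type get text/html.
    """
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        # An explicit q=0 means "not acceptable", so honour it.
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            return "application/xhtml+xml"
    return "text/html"
```

Whatever value it returns then goes into the Content-Type header of the reply. Note that the q values of the other types are deliberately not compared, which is exactly the “cut down to the minimum” part.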
Notice one more thing: text/html and application/xhtml+xml are really different, if you understand the logic behind them. text/html doesn’t claim to be XML, application/xhtml+xml does. This means that if a text/html page isn’t correct XHTML, it’s okay anyway for a logical browser engine (spelled Gecko, or KHTML), whereas if an application/xhtml+xml page is not really XML (one common problem: the use of a bare, non-entity ampersand - & - thanks to the default PHP setting arg_separator.output; change it to &amp; to get a sensible result), then we have a syntax error. And, for example, Firefox/Mozilla is really picky about this: if you have a wrong XHTML page served as text/html, it will fall back to normal mode and parse it anyway (e.g., the ampersand error above isn’t even recognized by the W3C Validator!), but if you serve it as application/xhtml+xml, it shows you a menacing page with nothing but an error:

But this is useful. Lax HTML proliferated, making it harder and harder to write automatic tools correctly, just because browsers accepted it. But if they stop accepting it, because you’ve declared a strict media type, no more wrong code will go out on the net. Moreover, HTML is full of non-semantic, visually targeted tags, whereas XHTML stands on the semantic side. Since we now validate against a certain doctype, we make sure that we also stop using non-semantic tags. All in all, we need restrictions as much as power to make things go the right way, and since both can be automated, both are. And we’ll all be better off.
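The bare ampersand problem mentioned earlier is easy to reproduce with any strict XML parser, which is roughly what the browser becomes once the page is declared as application/xhtml+xml. Here is a small sketch using Python’s standard xml.etree.ElementTree; the helper name is_well_formed and the two sample links are invented for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Hypothetical links: the first uses a bare "&" as argument separator,
# the second escapes it as an entity.
bad = '<p><a href="page.php?a=1&b=2">link</a></p>'
good = '<p><a href="page.php?a=1&amp;b=2">link</a></p>'

print(is_well_formed(bad))   # the bare ampersand is rejected
print(is_well_formed(good))  # the escaped one is accepted
```

This checks well-formedness only, not validity against the XHTML doctype, so it is no replacement for a validator.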
To make a long story short: for WordPress you can find this über-simple plugin (and, like many über-simple things, it works well: KISS philosophy) which does exactly the browser-dependent media type workaround: XHTML Mime Plugin. And then, you’ll be 100% XHTML compliant. Ah, what a geek thing!
Posted by mattia as apache, wordpress, xhtml at 3:06 PM CEST
Since I had some spare days, I spent quite a lot of time reconfiguring things on my computers. I moved the mail system I had on the laptop to the server (I’m moving many things there, since I will soon replace my old Gericom), mostly by following this wonderful tutorial (the only change I made was to use procmail instead of Courier’s maildrop, because I already had my whole set of filters). When I tried to send mail to the outside, I was quite puzzled about why it didn’t work, until I discovered that Tele2 blocks outgoing SMTP traffic (port 25). I tried to use their server as a relay, but got lost in the TLS/SSL settings, which I couldn’t get right (both postfix’s own settings and stunnel simply resulted in a successful connection but no response at all from the server to the HELO), and I gave up. I’ll try again at a later time.
I made one further change to the tutorial’s configuration. My plan was to automate both mail classification and the learning of new spam through the use of Maildir folders. The “Spam” folder is dedicated to spam the user finds, and “Quarantine” to mail the filter considers spam. To do the filtering, it was enough to add a couple of procmail rules (these are then put in /etc/skel/.procmailrc so that every new user benefits from them):
SPAM=$MAILDIR/.Quarantine/
PROBABLY_SPAM=$MAILDIR/.Quarantine/
# Mails with a score of 15 or higher are almost certainly spam (with 0.05%
# false positives according to rules/STATISTICS.txt). Let’s put them in a
# different mbox. (This one is optional.)
:0:
* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
$SPAM
# All mail tagged as spam (e.g. with a score higher than the set threshold)
# is moved to “probably-spam”.
:0:
* ^X-Spam-Status: Yes
$PROBABLY_SPAM
(as you can see, SPAM and PROBABLY_SPAM are the same folder, because this way things are simpler, but the user can distinguish the two cases if he wants to). The auto-learning part was a little harder. To enforce the maildir structure, I opted for setting it up in /etc/skel and making the change by hand, or almost by hand, for already existing users (who were few, luckily). The auto-learning is done by the attached script. Once put in the /etc/cron.daily directory, it does all the work. There are still a couple of things to adjust: spam messages could be removed once analysed (spamassassin recognizes messages it has already analysed, but it still wastes time on them), and a big inbox (from which spamassassin learns the ham) can make it waste a lot of time. On a system with few users this is no problem, but I bet it could become one in production.
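To give an idea of what such a daily job has to do, here is a rough sketch in Python. It is not the attached script, just an illustration under some assumptions: every user keeps mail in ~/Maildir, the user-curated spam folder is .Spam, and the function name learn_commands is made up. It only builds the sa-learn command lines (--spam and --ham are real sa-learn options); actually running them, and deciding which Bayes database they should feed, is left out.

```python
from pathlib import Path


def learn_commands(home_root: Path) -> list[list[str]]:
    """Build one sa-learn command per folder to learn from.

    Assumed layout (not taken from the original script):
    <home_root>/<user>/Maildir/.Spam/cur  -> spam found by the user
    <home_root>/<user>/Maildir/cur        -> inbox, treated as ham
    """
    commands = []
    for maildir in sorted(home_root.glob("*/Maildir")):
        spam = maildir / ".Spam" / "cur"
        if spam.is_dir():
            commands.append(["sa-learn", "--spam", str(spam)])
        inbox = maildir / "cur"
        if inbox.is_dir():
            commands.append(["sa-learn", "--ham", str(inbox)])
    return commands


# Possible use from a cron job (commented out, since it needs sa-learn):
# import subprocess
# for cmd in learn_commands(Path("/home")):
#     subprocess.run(cmd, check=False)
```

The Quarantine folder is left alone on purpose, since it holds what the filter already believes to be spam; only the user’s own verdicts and the inbox are used for learning.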
Then I fixed a couple of things that didn’t work. E.g.: my ipv6 tunnel on eliot was not working correctly, since I could ping out but not ping in. I noticed Shorewall was filtering out all unusual protocols, including proto 41, which is exactly the one you use to open ipv6-over-ipv4 tunnels - luckily Shorewall has a “tunnels” configuration file where you set the hosts from which / to which you are tunnelling, and everything went fine. My university mail was blocked, so I found out why, and then started to get again all that nice spam I love so much. Oh, well, new material for spamassassin to learn from.
But most of the time, I spent on this blog. As I said in the previous post, I chose to theme it with something simple and nice to my taste. I worked a little to ensure everything was XHTML-compliant (in fact the only problem was a “<br>” to change into “<br/>”, plus a few more in my old posts), and spent the rest of the time on choosing and installing plugins.
XHTML Validator: through tidy and/or xmllint, it checks the posts you write for XHTML correctness, checks that all your old posts are valid XHTML, shows you any errors and offers to use tidy to correct them automatically. In fact, one of my pages choked on a badly encoded URL (it used ampersands without entities), but tidy did more harm than good, so I had to correct it by hand. Well. XHTML Validator validates only the content of the posts, not the template, so be sure to validate that on your own.
Google Sitemaps: Google proposes a collaborative way of crawling sites: you provide a sitemap in XML, so Google can do less work (no link discovery by Googlebot) and check your pages more frequently (I hope). Moreover, go to the webmasters’ tools section of Google. Man, they’re really doing a whole lot of work for us. Where’s their reward? Simple: did anybody tell you we live in the information society? And how much information is Google gaining through all this…?
OpenID Registration: OpenID is a decentralized, open identity system, invented by LiveJournal’s creator. It allows you to entrust your personal information and passwords to one site only; all further authentications toward other websites then work automatically by passing through the trusted one, using a challenge-response method. This way, the site you want to log in to will never have your password, but will simply ask the OpenID site where you registered to confirm your identity on your behalf. Substantially, an ingenious use of the existing web architecture to implement an automatic single sign-on system. So, through this good post (and after a couple of links, this one too) I learned a way to activate OpenID authentication on this blog and how to use the blog itself as my identifier (everything’s a URL, in the web vision, and so are you… aren’t you a URL yet? :-)). The plugin is necessarily hackish: WordPress does not provide a general authentication system (like the UserFolder concept in Zope), so the places where you can log in (the login and comments forms) are trapped, and the plugin tries to add the correct XHTML in the correct place by guessing. With the standard theme it works well, otherwise you have to change the template by hand. I did it only for the comments, since the login page was untouched by this theme and could safely use the plugin hack; look for the “OPENID” string in the code to get an idea of what I had to add to enable it.
So, all in all… I admit I have added quite a lot of functionality in relatively little time, without much fuss, thanks to WordPress. So, even if my opinion of the architecture behind it remains quite negative, I admit the great user base surely makes it a valuable product, thanks to all the plugins and themes supported. This doesn’t mean I’ll stop developing my own framework :-)
Posted by mattia as smtp, spam, wordpress at 3:36 PM CEST
Ok, I wrote the template of my blog the way I liked. I hadn’t considered that I would be using WordPress, and it showed. So, I got the first theme that gave me some good vibes and Real Simplicity ™, and used that.
One day RedFrame will be out, and I’ll use that. One day.
Posted by mattia as wordpress at 6:41 PM CEST
Oh, what a nice start for a techie blog. Insulting one of the most used blogging technologies, which has given us all that semantic stuff almost for free.
Ok, now, please: taking a nice XHTML template, filling it with PHP which spits out more XHTML that perhaps doesn’t conform at all to your idea of what you want to put inside, and splitting it into many files, is not doing stuff in a semantic way, with a clear separation of concerns.
In ZPT, syntax is separated from semantics. In PHP it is not. I want something more: we’re in 2007, it’s time to change our way of thinking about web design.
But I’ll ramble more about this another time. It’s a bit too late now.
Posted by mattia as php, wordpress, xhtml, zpt at 2:02 AM CEST