What goes on in my registry when…

Ever asked yourself what actually happens when you perform a certain action in a certain program? Which files are read, created or updated, and what goes on in the registry?

The Sysinternals Process Monitor tool shows you exactly this. Filter the events any way you need and you will be able to pin down the relevant registry changes quickly. I just used it to identify the registry changes made when I changed the cache settings of IE, so that a registry update file can be made and distributed to anyone who needs the same settings. It took about 10 minutes to learn to use Process Monitor and identify the key in question.

Magpie RSS/Snoopy problem finally solved

MagpieRSS is great – I like it a lot, and it does almost exactly what you need to aggregate RSS feeds from PHP. MagpieRSS has one Achilles heel: Snoopy. MagpieRSS relies on Snoopy as the HTTP client (the browser component) that fetches feeds, whether RSS or Atom formatted, from web servers.

The latest Snoopy release was in 2005 (as was the latest MagpieRSS), which could indicate that the development community has abandoned the code. The problem with Snoopy is that it has a low tolerance for how carriage-return and linefeed characters are used in the headers of HTTP responses from web servers. For some requests that means certain files, though rich in content, appear empty if they are fetched using Snoopy.

At first I tried downloading the latest Snoopy release from SourceForge. It seemed to do the trick for some feeds, but apparently not all. Arnold Daniels came to my rescue. He has devised a wrapper – on the “outside” it appears to be Snoopy – but on the inside PHP’s cURL library has taken the place of a lot of the original Snoopy code. Give his alternative library a spin if you are using MagpieRSS for anything that should be somewhat resistant to deviations from standards.

Generate PDF files from PHP

For a web project I needed to create invoices, to be downloaded or e-mailed to clients in PDF format. The platform is PHP, and the documentation on this almost exclusively describes PDFLib for the purpose. I am sure PDFLib is fine, and the tutorials that come with the package look really good.

But..

You pay for PDFLib – I don’t mind paying for software, you should pay for usable high-quality stuff, which I am sure PDFLib is. US$ 995 is a little steep though – compared to free. The FAQ however mentions a few free alternatives. Right now I am looking into FPDF – which seems to do the trick.

Actually I found out that my previous ISP (which invoices its customers only by PDF invoices via e-mail) uses FPDF. This tells me that FPDF is more than ready for production. I’ll let you know if something disappoints me – although I would hate to bash a truly free initiative such as this one. Oh – and by the way, it has another edge over PDFLib: it is written in PHP and does not need server reconfiguration to work. If you want a head start, there is quite an extensive list of code examples available.
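
To give an idea of how little code an invoice takes, here is a minimal sketch. The helper names invoice_total() and render_invoice() and the layout of the line items are my own assumptions for illustration; only AddPage(), SetFont(), Cell() and Output() are FPDF methods. Treat it as a starting point, not as the way FPDF has to be used.

```php
<?php
// Sketch only: invoice_total() and render_invoice() are hypothetical helpers,
// not part of FPDF. $pdf is expected to be an FPDF instance.

// Sum quantity * unit price over all invoice lines.
function invoice_total(array $lines): float
{
    $total = 0.0;
    foreach ($lines as $line) {
        $total += $line['qty'] * $line['unit_price'];
    }
    return $total;
}

// Write a heading, one row per invoice line and a total row onto a new page.
function render_invoice($pdf, string $invoiceNo, array $lines): void
{
    $pdf->AddPage();
    $pdf->SetFont('Arial', 'B', 16);
    $pdf->Cell(0, 10, 'Invoice ' . $invoiceNo, 0, 1);   // ln = 1: move to next line

    $pdf->SetFont('Arial', '', 12);
    foreach ($lines as $line) {
        $amount = $line['qty'] * $line['unit_price'];
        $pdf->Cell(100, 8, $line['text']);
        $pdf->Cell(30, 8, (string) $line['qty'], 0, 0, 'R');
        $pdf->Cell(40, 8, number_format($amount, 2), 0, 1, 'R');
    }

    $pdf->SetFont('Arial', 'B', 12);
    $pdf->Cell(130, 8, 'Total');
    $pdf->Cell(40, 8, number_format(invoice_total($lines), 2), 0, 1, 'R');
}

// Usage, assuming fpdf.php is on the include path:
//   require 'fpdf.php';
//   $pdf = new FPDF();
//   render_invoice($pdf, '2007-001', [
//       ['text' => 'Hosting', 'qty' => 2, 'unit_price' => 49.5],
//   ]);
//   $pdf->Output();   // with no arguments the PDF is sent to the browser
```

Keeping the arithmetic in its own function means the totals can be checked without generating a PDF at all.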

How does my website look in ALL the other browsers?

Wanna know how Mac and Linux users running rare browsers view your website? Then check out http://browsershots.org/. Queue up your URL and, depending on the current load, the screenshots will begin pouring in over the next couple of minutes. Why, oh why did we have to wait so many years for a service like this?

The shots are of course static, so it is not possible for you to actually check how your site performs and functions without running the actual browser on the actual OS you want to test. It can however save you A LOT of time, getting rid of the most common errors early in the layout debugging process.

The site features quite a few languages, and more are being added all the time.

MySQL query/indexing tip

I originally learned about this MySQL shortcoming in the book High Performance MySQL, and luckily I remembered it for an issue that struck me today.

I have two tables – one containing blog-posts, and one containing a record for each tag on these posts. Now the requirement was to show a blog post AND to show the five latest related posts. Related posts in this scenario are other posts that share one or more tags with this post.

Now, it does not take one second to find a blog post, and it does not take one second to find the tags related to it. And actually it does not even take a second to find a list of posts having a given list of tags. The reason all these requests take less than a second is that the tables have been supplied with efficient indexes – the table of blog posts has more than 3.5 million records and there are about 700,000 tags.

The problem was that when we tried to generate a list of related posts, the query took up to twelve seconds to run. It sucked and looked like this:

SELECT distinctrow i.id, i.item_title, i.item_url
FROM rss_items_master i
INNER JOIN rss_item_tag t ON i.id=t.item_id
WHERE t.item_tag IN (SELECT item_tag FROM rss_item_tag WHERE item_id=$itemid)
AND t.item_id != $itemid
ORDER BY i.item_date DESC
LIMIT 5;

In plain English, we request a list of items other than the specific post that have at least one tag matching one of the tags from the specific post. The nested SELECT statement produces a list of tags to look for, and the outer SELECT statement pulls blog posts carrying tags found in the nested SELECT. This does not perform. The book's point, as I remember it, is that MySQL handles this kind of subquery poorly: instead of running the nested SELECT once and reusing its result, it tends to evaluate IN (SELECT ...) as a dependent subquery for each candidate row of the outer query, so the indexes on rss_item_tag and rss_items_master are not used the way you would expect (I did not bother figuring out exactly which one was ignored). To fix it, I split the SQL in two (PHP example):

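// Step 1: fetch the tags of the original post.
$itemid = (int) $itemid; // make sure the id is numeric before it goes into SQL
$sql_tags = "SELECT item_tag FROM rss_item_tag WHERE item_id=$itemid;";
$arr_tags = query_array($sql_tags, $link);
// $tag_o will be an array of escaped tags from the original post.
$tag_o = array();
foreach ($arr_tags as $tag_i) {
    $tag_o[] = mysql_real_escape_string($tag_i[0]);
}
// Join with "','" so that '$tags' in the next query expands to 'tag1','tag2',...
$tags = implode("','", $tag_o);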

Now $tags contains a list of tags to look for – actually it contains exactly what the nested SELECT part produced in the slow query. Together, these two queries actually complete in less than a second:

select distinctrow i.id, i.item_title, i.item_url
from rss_items_master i inner join rss_item_tag t on i.id=t.item_id
and t.item_tag in ('$tags')
and t.item_id != $itemid
order by i.item_date desc
limit 5;

So even though the SQL has been split in two, even though we pass a lot of information through variables that we have to build from the first query, and even though we added quite a lot of code – this outperforms the first single query by a factor larger than 10 – simply because it enables MySQL to use the indexes it should.
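
The same two-step idea can also be written with placeholders instead of hand-escaped strings. The sketch below is an assumption-laden variant, not what runs on my site: build_related_query() is a hypothetical helper, and it assumes you run the result through a prepared statement (PDO in the usage comment). Table and column names are the ones from the queries above.

```php
<?php
// Sketch only: build_related_query() is a hypothetical helper.
// It returns [sql, params] for the second query, given the tags fetched by the first.

function build_related_query(array $tags, int $itemId): array
{
    if (count($tags) === 0) {
        // An empty IN () list is a syntax error in MySQL: nothing to run.
        return [null, []];
    }

    // One "?" per tag, e.g. "?, ?, ?" for three tags.
    $placeholders = implode(', ', array_fill(0, count($tags), '?'));

    // item_date is selected as well because some MySQL versions reject
    // ORDER BY on a column that is missing from a DISTINCT select list.
    $sql = "SELECT DISTINCT i.id, i.item_title, i.item_url, i.item_date
            FROM rss_items_master i
            INNER JOIN rss_item_tag t ON i.id = t.item_id
            WHERE t.item_tag IN ($placeholders)
              AND t.item_id != ?
            ORDER BY i.item_date DESC
            LIMIT 5";

    // Parameters in the same order as the placeholders: tags first, then the id.
    $params = array_values($tags);
    $params[] = $itemId;

    return [$sql, $params];
}

// Usage, assuming $pdo is a connected PDO instance and $tags came from step one:
//   list($sql, $params) = build_related_query($tags, $itemid);
//   if ($sql !== null) {
//       $stmt = $pdo->prepare($sql);
//       $stmt->execute($params);
//       $related = $stmt->fetchAll();
//   }
```

The point of the split stays the same: MySQL gets a plain list of values in the IN clause, and the placeholders only change how those values are handed over.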

For more MySQL performance tips, check out MySQL Performance Blog.

Attention Profiles may filter news better

Amazon profiles you by the books you buy. Google sponsored links are chosen based on the contents of your Gmail e-mails and your general website interests. All “Big Brother” considerations aside – this is clever – or at least technically interesting. There is a risk of evil marketeers profiling you down to the bone, but on the other hand you probably only see stuff that is interesting to you – as opposed to receiving a load of Viagra spam e-mails. I know – the fact that something really sucks does not make something that sucks a bit less excellent.

A group is in the process of specifying the APML format. APML is short for Attention Profile Markup Language and will specify a file format for expressing your interests in ranked order. OPML can already be considered an APML subset describing your feed subscriptions, but APML is more of an aggregation of your interests, also including e-mail, browser history and bookmarks.

An important thing about your attention profile is that it has value – to you as well as to others. Check out AttentionTrust for more.

Light-weight XML editor

I was looking for an XML editor to perform a few simple tasks. Just as when I worked heavily with XML a few years ago, the Windows XML editor of choice is apparently still Altova XML Spy. You can get it to do anything with XML (except – I think – transform XML to coffee). Negative minds would probably call it bloated and expensive. Positive minds would look for a cheap, agile alternative.

Using the Open Source Alternative site it is possible to find applications that do roughly the same as commercial products. For the XML Spy query, the result is XML Copy Editor – a SourceForge application that has currently reached version 1.0.9. My needs were met as it:

  1. color-codes XML
  2. beautifies/pretty-prints XML with consistent indentation to increase readability
  3. opens files directly from URLs

It also performs XSLT transformations and has a small set of other features you may or may not need. Before you venture into extensive license bill-paying, you may want to try out this nifty little XML editor.

Why are Wikis so darn ugly?

I am a Wiki-virgin. Not in the sense of being a wiki-reader/contributor, but today I set up my first Wiki, which is about to go online. Wikis based on the MediaWiki software (which is also used for Wikipedia) are actually skinnable, but as content usually is king in Wiki land, very few operators make an effort to change the skin before going online. The well-known default Monobook skin is used almost everywhere (you know – the one where the monochrome image of a book is the backdrop).

This has one obvious benefit: People recognize it as being a Wiki-type site immediately. Indeed a strong argument to keep things as they are. I may be running my mouth off as the green first-day Wiki operator, and maybe I have actually used Wiki-based sites that I did not recognize because they used some great unknown skin. But doesn’t the Wiki deserve its own Kubrick skin – the one that was considered the de facto blog look a few years back? People would look at the skin (which is beautiful) and say “Ahh – that’s a weblog”. These days I (and probably a few others) say “Yuck – it’s a Wiki”.

Michael Heilemann did such a great job with Kubrick. Isn’t it time for Wikis to have their own updated, slick skin? Do you have it or know where to get it? There is one Wiki skin I like: Cavendish. I would love to be able to create one myself, but my graphic skills are unfortunately only focused around my eyes – not in my hands.

Rebooting for the next couple of days

Thursday and Friday I will be at Reboot 9 – it’s the third time I attend – last time was two years ago, and the first time was actually at the first Reboot conference – pre-bubble, Y2K and all. This year’s theme is “Human”. The focus is rarely technology, but future ways of thinking and working. This year one of the headliners is Dave Winer – a talk on Friday that I look forward to. I also expect this to be another great networking opportunity.

Windows ‘amp tip

I was annoyed – but not anymore. I have been developing for Apache/MySQL on my laptop for a while, and in the beginning I would start the Apache and MySQL services automatically as I booted Windows. Recently I set out to minimize the boot time of my Windows installation – and that included minimizing the number of services starting automatically – including Apache and MySQL. To start these two services, I enabled the System Tray icons for them, and when I wanted to start the services I would right-click the MySQL monitor and choose “Start instance”, and then left-click the Apache icon and click “Start” in the submenu.

This seems fairly simple, but if you have ever tried to navigate icons in the System Tray before everything is done loading, you will know that right-clicking something may give long delays. Sometimes submenus disappear, and you don’t know whether your click was never registered or whether it was registered and simply got no response. So the tip is simple and low tech – perform the same thing in a .bat file:

startamp.bat:

net start mysql
net start apache2

stopamp.bat:

net stop apache2
net stop mysql

Put these on your desktop or quickstart menu..