It’s almost too easy with a Raspberry Pi

No Gravatar

I am part of Overskrift.dk – a local social media monitoring service built on a LAMP stack. We are the server itself, but also quite a few older laptops. They do have a couple of downsides though:

  1. Big (-ish)
  2. Clunky
  3. Noisy
  4. Power consuming
  5. Have mechanical harddisks


They each need PHP, cUrl and a MySQL client library installed in order to function – but then they are actually able to support our aggregation tasks quite nicely.

So the other day when a harddisk crashed, I came to think of the Raspberry Pi (popularly dubbed the $25 computer) – It actually set me back DKK 638,- including P&P from a danish webshop (a little more over $100), but that was only because I insisted on a cabinet, a power supply and an SD card. Still I can get three of these for the same price as 1 very low-end netbook. An our after the postman delivered the envelope, it was up and running aggregating away.

Raspberry Pi

From my laptop I downloaded the Raspian “wheezy” SD card image – it downloads a .zip file to unzip into a .img-file. On Mac and Linux the image can easily be copied to the SD card (but taking about 20 minutes). I used this process.

Once the image was downloaded I moved the SD card to theRaspberry Pi unit, plugged a keyboard into one of the USB ports, connected my TV through the HDMI connector and powered up. First boot took about 30 seconds and took me directly to a very nice configuration UI, I set the locale and timezone and asked the ssh-daemon to start on boot.

raspi-config
The Raspi Config UI

Next step was to shut down and move the box over to a LAN connection by my modem. Now only the LAN connection and the power supply was connected.

Coming from the Ubuntu world, installing PHP, cUrl and the MySQL client libraries was a question of logging on,  running

sudo apt-get update
sudo apt-get install php5 curl php5-curl php5-mysql mysql-client

Now I could simply copy my PHP code to the filesystem and set up cron jobs just as I would in Ubuntu.

UPDATE 2014-02-21: It has been almost a year since we started using Raspberry PI’s for our aggregation purposes. Since then, we’ve added a couple of pi’s for this specific purpose and retired all other aggregations machines, probably saving quite a bit on our power bill.

 

A valuable lesson in the Facebook API

No Gravatar

So – A client wanted to enable users to pick a friend from Facebook and buy them a birthday-present on their e-Commerce site.

Being sort of a n00b in the world of Facebook i thought Hey – that’s just a question of creating a Facebook App and list the friends using FQL that would pick friends and their birthdays. It looks something like:

SELECT uid,name,first_name,middle_name,last_name,sex,birthday_date 
FROM user 
WHERE uid IN 
  (SELECT uid2 FROM friend WHERE uid1 = me()) 
  AND birthday_date <> '' 
ORDER BY substr(birthday_date, 0,5)

I did it – and it worked. For me. And for me only. Huh? Yep – just for me.

 

 

Permission seemed adequate. It was indeed a mystery, until i actually *RTFM*. The top of this screen clearly states:

Use these settings to enter the set of permissions your app requires when displayed in App-center or using Auth Referrals

which apparently has nothing to do with what my app could do at the website. My click handler to perform the login now looks like this:

$('.eg_facebook_login').live('click', function() {
 FB.login(function(response) {
  if (response.authResponse) {
   FB.api('/me', function(response) {
     // .. get fb friend list
   });
  } else {
   console.log('User cancelled login or did not fully authorize.');
  }
 }, {scope:'user_about_me,user_birthday,friends_about_me,friends_birthday,friends_photos});
});

Turns out FB.login is unaware of the permissions set in my app, so I specifically have to ask for them at login using the scope-part of the opts-object. FB.login actually has documentation.

Twitter OAuth with PHP and cURL multi

No Gravatar

I am maintaining a website that aggregate a large number of Twitter accounts’ tweets. For this purpose we’ve been given one of the accounts with the high API request rate limit – usually it is possible to call the Twitter API’s 150 times every hour. We have a rate limit of 20.000 hourly requests and we use most of them to keep up a near real time picture of the activity of the user base.

Recently Twitter disabled regular basic HTTP authentication supporting only OAuth – which generally is a good thing, In this case we simply read the RSS-version of the user_timeline and OAuth seems overkill for reading a simple public time line.¬†Aaaanyways – we are using PHP with cURL multi and OAuth introduces a bit of extra code. Sure – there are plenty of OAuth API’s and OAuth API’s for Twitter out there, but the specific combination of PHP cURL multi and GET’ing user_timelines required a combination of Google’ing and a bit of guesswork.

Preparations

– First¬†Register your application with Twitter – You’ll need a couple of tokens and secrets from your app-page and from the specific token page.
– If you need to make more than 150 requests per hour, apply for¬†whitelisting here. ¬†It’ll take a while for Twitter to process your request.
– Make sure you have cURL for PHP installed

Specifically preparing the GET connections for cURL presented a challenge – this piece of code did the trick for us:

<?php
// $url_array is an array of Twitter RSS-feed URL's that we need to read
$mh = curl_multi_init();
foreach($url_array as $i => $item) {
	// Keys and secrets from http://dev.twitter.com/apps/
	$consumer_secret = '<get your own from http://dev.twitter.com/apps/_your_app_id_/>';
	$token_secret = '<get your own from http://dev.twitter.com/apps/_your_app_id_/my_token>';

	// Build ud an array of OAuth parameters
	$params = Array();
	$params['oauth_consumer_key'] = '<get your own from http://dev.twitter.com/apps/_your_app_id_/>';
	$params['oauth_token'] = '<get your own from http://dev.twitter.com/apps/_your_app_id_/my_token>';
	$params['oauth_signature_method'] = 'HMAC-SHA1';
	$thetime = time();
	$params['oauth_timestamp'] = $thetime;
	$params['oauth_nonce'] = SHA1($thetime);
	$params['oauth_version'] = '1.0';

	// Sort the array alphabetically
	ksort($params);

	// Build the paramter string for the GET request
	$concatenatedParams = '';
	foreach($params as $k => $v)
	{
	  $k = urlencode($k);
	  $v = urlencode($v);
	  $concatenatedParams .= "{$k}={$v}&";
	}
	$unencodedParams = substr($concatenatedParams,0,-1);
	// URL-encode the parameters for signature use
	$concatenatedParams = urlencode(substr($concatenatedParams,0,-1));
	$signatureBaseString = "GET&".urlencode($item['url'])."&".$concatenatedParams;

	$params['oauth_signature'] = base64_encode( hash_hmac('sha1', $signatureBaseString, $consumer_secret."&".$token_secret, true) );

	// Initiate a new cURL connection and set up it's URL
	$conn[$i] = curl_init();
	curl_setopt($conn[$i], CURLOPT_URL, $item['url'] . '?' . $unencodedParams);
	curl_setopt($conn[$i], CURLINFO_HEADER_OUT, 1);

	// Build the HTTP header for the request
	$curlheader = Array();
	$curlheader[]='Content-Type: application/x-www-form-urlencoded';
	$curlheader[] = 'Authorization: OAuth';
	foreach($params as $ky=>$va) {
		$curlheader[] = "$ky=$va,\n";
	}

	// Initiate a new cURL connection and assign URL + HTTP header to it
	$conn[$i] = curl_init($item['url']);
	curl_setopt($conn[$i], CURLOPT_HTTPHEADER, $curlheader);
	curl_setopt($conn[$i], CURLOPT_HTTPGET, 1);

	// Add the connection to the cURL multi handle
	curl_multi_add_handle ($mh,$conn[$i]);
}

// Now execute curl_multi_exec on the $mh handle
This can no doubt be improved and secured in many ways, but since this specific example runs from as a cron-job/a PHP CLI Рnot on a server that replies to  inbound connections Рthe keys are not secured any further in this implementation.

Testing: WP Super Cache

No Gravatar

Seems like a very cool plugin to enhance the userexperience – performance wise, plus it seems to prepare WordPress blogs for high traffic spikes.

Bug found: Links to posts from archives seems to cache PHP-code – not the resulting HTML. There’s gotta be a simple fix. Was a bit quick there. The bug is in my theme – not in WP Super Cache.

Edit: Also – the issues may have to do with my host who seems to mess with their user database right now..

Magpie RSS/Snoopy problem finally solved

No Gravatar

MagpieRSS IS great – I like it a lot, and it almost does exactly what you need to aggregate RSS feeds from PHP. MagpieRSS has one achilles heel in Snoopy. MagpieRSS relies on Snoopy as the HTTP-client (browser-component) that fetches feeds from websites – be it RSS or Atom formatted – It fetches files from a webserver.

The latests Snoopy release was in 2005 (as was the latest MagpieRSS), which could indicate that the development community has abandoned the code. The problem with Snoopy is that it has low tolerance to the use of carriage-return and linefeed characters in the header of HTTP responses from webservers. For some requests that means certain files, though rich in content, appear empty if they are fetched using Snoopy.

At first I tried downloading the latest Snoopy release from Sourceforge. It seemed to do the trick for some feeds, but aparently not all. Arnold Daniels came to my rescue. He has devised a wrapper – on the “outside” it appears to be Snoopy – but on the inside PHP’s Curl library has taken the place of a lot of the original Snoopy code. Give his alternative library a spin if you are using MagpieRSS for anything the should be somewhat resistant to deviations from standards.

Generate PDF files from PHP

No Gravatar

For a web project I needed to create invoices, to be downloaded or e-mailed to clients in PDF format. The platform is PHP and the documentation for this almost exclusively describe PDFLib for this purpose. I am sure PDFLib is fine, and the tutorials that come with the package looks really good.

But..

You pay for PDFLib – I don’t mind paying for software, you should pay for usable high-quality stuff, which I am sure PDFLib is. US$ 995 is a little steep though – compared to free. The FAQ however mentions a few free alternatives. Right now I am looking into FPDF – which seem to do the trick.

Actually I found out that my previous ISP (invoicing their customers only by PDF invoices via E-mail) use FPDF. This tells me that FPDF is more than ready for production. I’ll let you know if I something disappoints me – although I would hate to bash a truly free initiative such as this one. Oh – and by the way it has another edge to PDFLib – It is written in PFP and does not need server reconfiguration to work. If you want a head start, there are quite an extensive list of code examples available.

MySQL query/indexing tip

No Gravatar

I originally learned about this MySQL shortcoming in the book High Performance MySQL, and luckily I remembered it for an issue that struck me today.

I have two tables – one containing blog-posts, and one containing a record for each tag on these posts. Now the requirement was to show a blog post AND to show the five latest related posts. Related posts in this scenario are other posts that share one or more tags with this post.

Now, it does not take one second to find a blog post, and it does not take one second to find tags related to it. And actually it does not even take a second to find a list of posts having a list of tags. The reason why all these requests take less than a second is because the tables have been supplied with efficient indexes – the table of blogposts has more than 3.5 million records and there are about 700.000 tags.

The problem was that when we tried to generate a list of related posts, the query took up to twelve seconds to run. It sucked and looked like this:

SELECT distinctrow i.id, i.item_title, i.item_url
FROM rss_items_master i
INNER JOIN rss_item_tag t ON i.id=t.item_id
WHERE t.item_tag IN (SELECT item_tag FROM rss_item_tag WHERE item_id=$itemid)
AND t.item_id != $itemid
ORDER BY i.item_date DESC
LIMIT 5;

In plain english we request a list of items that does not match the specific post, but they must have at least one tag that match one of the tags from the specific post. The nested SELECT-statement produce a list of tags to look for, and the outer SELECT statement pulls blog posts containing tags found in the nested SELECT. This does not perform, because only one index can be used per query – so either the index from the rss_item_tag table or from the rss_items_master table is ignored (i did not bother figuring out which one was ignored). To fix it, I split the SQL in two (PHP example):

$sql_tags = "SELECT item_tag FROM rss_item_tag WHERE item_id=$itemid;";
$arr_tags = query_array($sql_tags, $link);
// $tag_o will be an array of tags from the original post.
$tag_o = Array();
foreach($arr_tags as $tag_i) {
array_push($tag_o, mysql_real_escape_string($tag_i[0]));
}
$tags = implode("','", $tag_o);

Now $tags contain a list og tags to look for – actually it contains exactly what the nested SELECT-part did in the slow query. Now these two queries will actually complete in less than a second:

select distinctrow i.id, i.item_title, i.item_url
from rss_items_master i inner join rss_item_tag t on i.id=t.item_id
and t.item_tag in ('$tags')
and t.item_id != $itemid
order by i.item_date desc
limit 5;

So even though the SQL has been split in two, and even though we are passing a lot of information through variables that we have to build from the first query and added quite a lot of code – this outperforms the first single query by a factor larger than 10 – simply because it enables MySQL to use the indexes it should.

For more MySQL performance tips, check out MySQL Performance Blog.

Windows ‘amp tip

No Gravatar

I was annoyed – but not anymore. I have been developing for Apache/MySQL on my laptop for a while, and in the beginning I would start the Apache and MySQL services automatically as I booted Windows. Recently I embarked to minimize the boot time of my Windows installation – and that included minimizing the services starting automatically – including Apache and MySQL. To start these two services, I enabled the System Tray icons for the two services and when I wanted to start the services I would right click the MySQL monitor and choose “Start instance”, and then left-click the Apache icon and click “Start” in the submenu.

This seems fairly simple, but if you try to navigate icons in the System Tray before everything is done loading, you will know that right-clicking something may give long delays, sometimes sub-menues disapear and you don’t know if it was because of your click, which it did not respond to or if your click was actually registered. So the tip is simple and low tech – perform the same thing in a .bat file:

startamp.bat:

net start mysql
net start apache2

stopamp.bat:

net stop apache2
net stop mysql

Put these on your desktop or quickstart menu..