Use pv as your MySQL import progress indicator

There is only one relief for the pain of a long-running command: a progress indicator. But commands may or may not show their progress, and that is by design – each tool is good at one thing, and through regular pipes the data can pass through a lot of steps. Processing a file may take time, but how much? Here is a description of how you would add a progress indicator to a MySQL data import.

In my case, I had an 11-gigabyte zipped MySQL database dump that needed to be restored. The creates and inserts by themselves do not take up a lot of time, but the amount of data was huge and there was no way of telling how complex the indexes were (and they would add to the time needed for the restore). This could take 1 hour, 3 hours or 3 days – I had no idea. To restore, I would simply do:

gunzip -c databasebackup-file.sql.gz | mysql -uxxx -p databasename

In my search for answers, quite a few of the search results mentioned pv – which you may have encountered, as some services actually use similar output to display progress, like:

1.43GiB 1:43:13 [ 392KiB/s] [=========> ] 13% ETA 10:56:22

The restore above has processed 1.43 gigabytes of data so far, which took 1 hour, 43 minutes and 13 seconds. The current throughput is 392 KiB of processing per second. We are 13% complete, and based on the historic processing performance it is estimated that the restore will take almost 11 more hours (ETA) to complete.

As the file is restored, the numbers are updated once a second. In this particular case, some statements may run longer than others, initially leading to quite a few "jumps" up and down in the ETA, but as time passes the bumps even out and the ETA becomes more accurate. This restore actually ended up taking just over 4 hours, but the percent indicator alone gives a great deal of usable insight.

So how is this done?

It is quite simple actually – instead of running

gunzip -c databasebackup-file.sql.gz | mysql -uxxx -p databasename

you would run

pv databasebackup-file.sql.gz | gunzip -c | mysql -uxxx -p databasename

The indicator starts running right away – even though you have to enter the database user's password before the restore can begin. That may mess up the terminal output briefly, but after that you are good to go. Useful switches include:

-I, --fineta to display the estimated local time of completion instead of the remaining time

-a, --average-rate to show the average data throughput instead of the current rate

If you use pv for a database export, the size of the export is unknown until the export is complete, so pv cannot calculate an ETA. But if you know the expected size, you can tell pv about it using the -s SIZE (--size SIZE) argument, which helps it calculate the ETA and percentage complete. Read the man page for further tips on using pv.
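As a side note, for imports you can often recover the size to feed -s from the dump itself: gzip records the uncompressed size (modulo 4 GiB) in its trailer, so gzip -l can read it back for dumps under 4 GiB. A small self-contained sketch – the file name is a stand-in created just for the demonstration:

```shell
# Create a small stand-in for the real dump file
printf 'SELECT 1;\n%.0s' $(seq 1 1000) | gzip > demo.sql.gz

# gzip -l prints compressed and uncompressed sizes; field 2 of line 2 is the
# uncompressed byte count (only reliable for files under 4 GiB)
size=$(gzip -l demo.sql.gz | awk 'NR==2 {print $2}')
echo "uncompressed size: $size bytes"

# With the size known, pv can show percentage and ETA even sitting mid-pipe:
#   gunzip -c demo.sql.gz | pv -s "$size" | mysql -uxxx -p databasename
rm demo.sql.gz
```

Placing pv after gunzip like this measures uncompressed throughput, which is why it needs to be told the total size explicitly.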

This is just one example of using pv to display the progress of an operation. You can easily use it for any operation involving a pipe, and even when copying large files or synchronizing with rsync.

It’s almost too easy with a Raspberry Pi

I am part of a local social media monitoring service built on a LAMP stack. Besides the server itself, we also run quite a few older laptops. They do have a couple of downsides though:

  1. Big (-ish)
  2. Clunky
  3. Noisy
  4. Power consuming
  5. Have mechanical hard disks

They each need PHP, cURL and a MySQL client library installed in order to function – but then they are actually able to support our aggregation tasks quite nicely.

So the other day when a hard disk crashed, I came to think of the Raspberry Pi (popularly dubbed the $25 computer). It actually set me back DKK 638,- including P&P from a Danish webshop (a little over $100), but that was only because I insisted on a case, a power supply and an SD card. Still, I can get three of these for the price of one very low-end netbook. An hour after the postman delivered the envelope, it was up and running, aggregating away.

Raspberry Pi

From my laptop I downloaded the Raspbian “wheezy” SD card image – it comes as a .zip file that unzips into a .img file. On Mac and Linux the image can easily be copied to the SD card (though it takes about 20 minutes). I used this process.
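The copy itself is typically a single dd invocation. Below is a hedged sketch that copies between scratch files so it is safe to run anywhere; for a real SD card, of= would point at the card's device node (e.g. /dev/sdX on Linux, found with lsblk, or /dev/rdiskN on a Mac, found with diskutil list), and getting that name wrong can destroy a disk:

```shell
# Stand-in for the unzipped Raspbian image (4 MB of zeroes, just for the demo)
dd if=/dev/zero of=wheezy-demo.img bs=1M count=4 2>/dev/null

# The actual copy step; on a real card this would be something like
#   sudo dd if=raspbian-wheezy.img of=/dev/sdX bs=1M
dd if=wheezy-demo.img of=sdcard-demo.img bs=1M 2>/dev/null

# Verify the copy byte for byte
if cmp -s wheezy-demo.img sdcard-demo.img; then result="copy verified"; fi
echo "$result"
rm wheezy-demo.img sdcard-demo.img
```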

Once the image was written, I moved the SD card to the Raspberry Pi unit, plugged a keyboard into one of the USB ports, connected my TV through the HDMI connector and powered up. First boot took about 30 seconds and took me directly to a very nice configuration UI, where I set the locale and timezone and asked the ssh daemon to start on boot.

The Raspi Config UI

Next step was to shut down and move the box over to a LAN connection by my modem. Now only the LAN cable and the power supply were connected.

Coming from the Ubuntu world, installing PHP, cURL and the MySQL client libraries was just a question of logging in and running

sudo apt-get update
sudo apt-get install php5 curl php5-curl php5-mysql mysql-client

Now I could simply copy my PHP code to the filesystem and set up cron jobs just as I would in Ubuntu.
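Such a cron job might look like the sketch below. The script path and schedule are hypothetical examples, and the entry is written to a scratch file here rather than actually installed:

```shell
# Hypothetical aggregation job: run every 5 minutes, appending output to a log
cat > demo-crontab <<'EOF'
*/5 * * * * /usr/bin/php /home/pi/aggregate.php >> /home/pi/aggregate.log 2>&1
EOF

# On the real box this would be installed with: crontab demo-crontab
entries=$(wc -l < demo-crontab)
echo "demo crontab has $entries entry"
rm demo-crontab
```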

UPDATE 2014-02-21: It has been almost a year since we started using Raspberry Pis for our aggregation purposes. Since then, we’ve added a couple more Pis for this specific purpose and retired all other aggregation machines, probably saving quite a bit on our power bill.


Workaround to Facebook’s Graph API Search locale change

So Facebook decided to change the meaning of the locale parameter to the search parts of the Graph API – and label the decision “by design” after 2 months of thinking. Not “design change” or “bug we don’t feel like fixing” – just “if it worked before, that was a mistake”. The handling of the problem is described in this bug report.

If you do not use the search API you probably don’t care, but if you do, this change could be fatal to your searches.

The problem

I would like to know what Facebook users in my country (Denmark) write about my local branch of the phone company Telia. To find out, I would ask the graph:

The locale-parameter used to ensure that I only got results from Danish Facebook sources. At least that was the case until December 5, 2012. After that, the locale-parameter changed role: now it just localizes certain field identifiers in the JSON reply. This is useless, and all of a sudden I get heaps of irrelevant results from users in other countries in other languages. Facebook remains silent about what to do instead.

The workaround

For my projects I’ve used the following workaround. It’s Q&D, ugly and it takes up unnecessary resources, both for my own service and for Facebook, but it has been tested to work. One known drawback: if the same Facebook user mentions the search term more than once, only the last post is returned.

First – perform the search as always – in simple PHP (make sure to add your own validations):

$searchterm = 'telia';
$geturl = "" . urlencode($searchterm) . "&type=post";
$fb_reply = file_get_contents($geturl, 0, null, null);
$fb_arr = json_decode($fb_reply, TRUE);
$reply_data = $fb_arr['data'];

Now $reply_data contains an array of posts – something like:

 [0] => Array
    (
      [id] => 10002333417237_486897434763199
      [from] => Array
        (
          [name] => some name
          [id] => 100027398117237
        )
      [message] => Missing home
    )

Look at the [from] section to figure out who posted the message. You are actually going to need all the from->id’s from all posts in the reply from Facebook. Why? Because you want to extract each sender’s locale and filter out any posts that do not match the locale you require. I use something like this to create an array of Facebook uid’s:

$users = array();
foreach ($reply_data as $pkey => $post) {
  $uid = $post['from']['id'];
  $users[$uid] = $pkey;
}

Next we ask Facebook about the users using FQL. Actually we only want to know about users if they have our locale:

$FQL = sprintf("SELECT uid, name, locale FROM user WHERE uid IN (%s) AND locale = 'da_DK';", implode(',', array_keys($users)));
$fql_query = "" . urlencode($FQL);
$fql_result = file_get_contents($fql_query, 0, null, null);
$fql_users = json_decode($fql_result, TRUE);

Since $users is a one-to-one mapping from users to matching posts, we’ll “join” the filtered user list with the $users array and have what we wanted: a list of posts from users with the requested locale:

$valid_posts = array();
foreach ($fql_users['data'] as $dk_user) {
  $uid = $dk_user['uid'];
  $postkey = $users[$uid];
  $valid_posts[] = $reply_data[$postkey];
}

The $valid_posts array now contains posts matching the needed locale – somewhat like before December 5th, 2012.
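The duplicate-post drawback mentioned earlier can be removed by letting each uid map to an array of post keys instead of a single key. A hedged sketch on made-up data, reusing the variable names from above:

```php
<?php
// Made-up stand-in for the Graph API search reply used above
$reply_data = array(
    array('id' => 'p1', 'from' => array('id' => '42', 'name' => 'a'), 'message' => 'first mention'),
    array('id' => 'p2', 'from' => array('id' => '42', 'name' => 'a'), 'message' => 'second mention'),
    array('id' => 'p3', 'from' => array('id' => '7',  'name' => 'b'), 'message' => 'other locale'),
);

// Collect ALL post keys per uid instead of overwriting on duplicates
$users = array();
foreach ($reply_data as $pkey => $post) {
    $users[$post['from']['id']][] = $pkey;
}

// Pretend the FQL locale filter only returned uid 42
$fql_users = array('data' => array(array('uid' => '42')));

// "Join" as before, but walk every post key stored for the user
$valid_posts = array();
foreach ($fql_users['data'] as $dk_user) {
    foreach ($users[$dk_user['uid']] as $postkey) {
        $valid_posts[] = $reply_data[$postkey];
    }
}
// $valid_posts now holds both posts from uid 42
```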


I wonder why Facebook made the decision they did. It is just asking for workarounds like this one. Instead of one Facebook Graph request I now make two, AND Facebook gets to send me a lot more data than I need. The data is still public (for now anyway) – let’s see if it stays that way.

A valuable lesson in the Facebook API

So – A client wanted to enable users to pick a friend from Facebook and buy them a birthday-present on their e-Commerce site.

Being sort of a n00b in the world of Facebook, I thought: hey, that’s just a question of creating a Facebook App and listing the friends using an FQL query that picks friends and their birthdays. It looks something like:

SELECT uid,name,first_name,middle_name,last_name,sex,birthday_date 
FROM user 
WHERE uid IN (SELECT uid2 FROM friend WHERE uid1 = me()) 
  AND birthday_date <> '' 
ORDER BY substr(birthday_date, 0, 5)

I did it – and it worked. For me. And for me only. Huh? Yep – just for me.



Permissions seemed adequate. It was indeed a mystery, until I actually *RTFM*. The top of the permissions screen clearly states:

Use these settings to enter the set of permissions your app requires when displayed in App-center or using Auth Referrals

which apparently has nothing to do with what my app could do at the website. My click handler to perform the login now looks like this:

$('.eg_facebook_login').live('click', function() {
  FB.login(function(response) {
    if (response.authResponse) {
      FB.api('/me', function(response) {
        // .. get fb friend list
      });
    } else {
      console.log('User cancelled login or did not fully authorize.');
    }
  }, {scope: 'user_about_me,user_birthday,friends_about_me,friends_birthday,friends_photos'});
});

Turns out FB.login is unaware of the permissions set in my app settings, so I specifically have to ask for them at login using the scope part of the opts object. FB.login actually has documentation.

Twitter OAuth with PHP and cURL multi

I am maintaining a website that aggregates a large number of Twitter accounts’ tweets. For this purpose we’ve been given one of the accounts with the high API request rate limit – usually it is possible to call the Twitter API 150 times every hour. We have a rate limit of 20,000 hourly requests and use most of them to keep up a near real-time picture of the activity of the user base.

Recently Twitter disabled regular basic HTTP authentication, supporting only OAuth – which generally is a good thing. In this case we simply read the RSS version of the user_timeline, and OAuth seems overkill for reading a simple public timeline. Aaaanyways – we are using PHP with cURL multi, and OAuth introduces a bit of extra code. Sure, there are plenty of OAuth APIs and OAuth APIs for Twitter out there, but the specific combination of PHP, cURL multi and GET’ing user_timelines required a combination of Google’ing and a bit of guesswork.


– First, register your application with Twitter – you’ll need a couple of tokens and secrets from your app-page and from the specific token page.
– If you need to make more than 150 requests per hour, apply for whitelisting here. It’ll take a while for Twitter to process your request.
– Make sure you have cURL for PHP installed

Specifically preparing the GET connections for cURL presented a challenge – this piece of code did the trick for us:

// $url_array is an array of Twitter RSS-feed URL's that we need to read
$mh = curl_multi_init();
foreach ($url_array as $i => $item) {
	// Keys and secrets from your Twitter app-page
	$consumer_secret = '<get your own>';
	$token_secret = '<get your own>';

	// Build up an array of OAuth parameters
	$params = Array();
	$params['oauth_consumer_key'] = '<get your own>';
	$params['oauth_token'] = '<get your own>';
	$params['oauth_signature_method'] = 'HMAC-SHA1';
	$thetime = time();
	$params['oauth_timestamp'] = $thetime;
	$params['oauth_nonce'] = SHA1($thetime);
	$params['oauth_version'] = '1.0';

	// Sort the array alphabetically by key, as the OAuth spec requires
	ksort($params);

	// Build the parameter string for the GET request
	$concatenatedParams = '';
	foreach ($params as $k => $v) {
		$concatenatedParams .= urlencode($k) . '=' . urlencode($v) . '&';
	}
	$unencodedParams = substr($concatenatedParams, 0, -1);
	// URL-encode the whole parameter string for signature use
	$concatenatedParams = urlencode($unencodedParams);
	$signatureBaseString = "GET&" . urlencode($item['url']) . "&" . $concatenatedParams;

	$params['oauth_signature'] = base64_encode(hash_hmac('sha1', $signatureBaseString, $consumer_secret . "&" . $token_secret, true));

	// Build the HTTP headers, folding all OAuth parameters into a single
	// Authorization header as key="url-encoded value" pairs
	$authPairs = Array();
	foreach ($params as $ky => $va) {
		$authPairs[] = $ky . '="' . urlencode($va) . '"';
	}
	$curlheader = Array();
	$curlheader[] = 'Content-Type: application/x-www-form-urlencoded';
	$curlheader[] = 'Authorization: OAuth ' . implode(', ', $authPairs);

	// Initiate a new cURL connection and assign URL + HTTP header to it
	$conn[$i] = curl_init($item['url'] . '?' . $unencodedParams);
	curl_setopt($conn[$i], CURLINFO_HEADER_OUT, 1);
	curl_setopt($conn[$i], CURLOPT_HTTPHEADER, $curlheader);
	curl_setopt($conn[$i], CURLOPT_HTTPGET, 1);

	// Add the connection to the cURL multi handle
	curl_multi_add_handle($mh, $conn[$i]);
}

// Now execute curl_multi_exec on the $mh handle
This can no doubt be improved and secured in many ways, but since this specific example runs as a cron job (PHP CLI) – not on a server that replies to inbound connections – the keys are not secured any further in this implementation.
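The execution step itself can be sketched like this. To keep the example self-contained and runnable without network access or Twitter credentials, it fetches two local file:// URLs instead of the signed Twitter URLs built above; the drain loop is the same either way:

```php
<?php
// Create two local files so the demo works without network access
$dir = sys_get_temp_dir();
file_put_contents("$dir/pvdemo_a.txt", "hello");
file_put_contents("$dir/pvdemo_b.txt", "world");
$url_array = array("file://$dir/pvdemo_a.txt", "file://$dir/pvdemo_b.txt");

$mh = curl_multi_init();
$conn = array();
foreach ($url_array as $i => $url) {
    $conn[$i] = curl_init($url);
    curl_setopt($conn[$i], CURLOPT_RETURNTRANSFER, 1);
    curl_multi_add_handle($mh, $conn[$i]);
}

// Drive all transfers until no handle is active any more
$active = null;
do {
    $status = curl_multi_exec($mh, $active);
    if ($active) {
        curl_multi_select($mh); // sleep until there is activity, don't busy-loop
    }
} while ($active && $status == CURLM_OK);

// Collect the replies and clean up
$replies = array();
foreach ($conn as $i => $ch) {
    $replies[$i] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);

unlink("$dir/pvdemo_a.txt");
unlink("$dir/pvdemo_b.txt");
```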

Ubuntu 10.04 – up with only one quirk so far

This morning I let my Ubuntu 9.10 installation upgrade itself to 10.04. The upgrade itself ran fine with only a few interruptions where I had to confirm or deny changes to configuration files for PHP etc.

As I landed at the office and hooked up the laptop (Thinkpad T61P), however, my external monitor was flickering a lot. I tried booting into Windows and everything was fine. I Google’d and found this post on Ubuntu Forums.

The post suggests making changes to menu.lst in /boot/grub. That is a somewhat short-term solution, because the problem will probably reappear at the next kernel upgrade (which could be just around the corner). A fix may be on its way in an actual Ubuntu update, but I chose a different approach which may also seem hack’ish – it should be more robust though.

Disclaimer: If you are not sure what you are doing, be very careful. Misspellings or deviations from the description below may lead to an unbootable system. This worked for me but may not work for you. Make sure to back up any files you change.

OK? Let’s go..

First start a terminal (ALT-F2, enter “gnome-terminal”):

cd /etc/grub.d


sudo nano 10_linux

which will open an editor. Around line 96 inside the linux_entry-function is a line that looks like this:

linux    ${rel_dirname}/${basename} root=${linux_root_device_thisversion} ro ${args}

and it should be changed to:

linux    ${rel_dirname}/${basename} root=${linux_root_device_thisversion} ro radeon.modeset=0 ${args}

Having done this, you should ask GRUB to rebuild its configuration files with:

sudo update-grub

Reboot and go on without the flicker..

Will ideas of the Massachusetts law spread?

On reddit, I found this article regarding a new law in Massachusetts that is to increase the security of personally identifiable information.

If you have personally identifiable information (PII) about a Massachusetts resident, such as a first and last name, then you have to encrypt that data on the wire and as it’s persisted. Sending PII over HTTP instead of HTTPS? That’s a big no no. Storing the name of a customer in SQL Server without the data being encrypted?  No way, Jose. You’ll get a fine of $5,000 per breach or lost record. If you have a database that contains 1,000 names of Massachusetts residents and lose it without the data being encrypted that’s $5,000,000. Yikes.

The idea is both good (yay – better security for anyone registering on a website) and bad (expensive). As a consultant in a country not covered by the law I couldn’t care less, but is this in general a good idea? Securing data and sending sensitive information across secure connections is always a good idea, and similar rules could hit the EU soon. Documenting how you secure the PII? Who can argue with that being a good idea? As an IT consultant you won’t hear me cry myself to sleep.

But the proportions of the legislation seem unclear. If you log on to my website (securely of course), and I feed you the following pages through clear-text HTTP – can I be fined if your name appears on a page, as in “Reader XYZ is logged in”?

I guess so.

It strikes me that there is no level of sensitivity defined – anything considered personal must be secured. As a legislator, it seems very easy to do this when you don’t have to pick up the bill.

If this kind of legislation should hit Europe, I hope someone would elaborate a bit on the dos and don’ts:

  • Are all kinds of data included? Can I be fined if you, as a Massachusetts citizen, post a comment to this and include your name? (the database is secured, but nothing is sent through HTTPS)
  • Would it make sense to allow users to waive their rights, and thereby allow them to work with online applications that are not 100% secure?

Karmic Koala – post install notes

Installed Ubuntu 9.10 a few days ago. Everything seemed OK (very, very nice actually) except for a little network performance glitch that may also bother you. Canonical has once again taken Linux one step further away from the status of geek challenge towards a user-friendly alternative to other OSes.

One thing bothered me though – general network responsiveness seemed seriously degraded. Digging around I found that IPv6 had taken precedence over IPv4 in a few ways – one being the DNS lookup sequence. Apparently all lookups were attempted through IPv6 first. My router and network are in no way configured for IPv6, so every connection attempt to an uncached host would have to wait for the IPv6 timeout. Two things gave me the responsiveness back:

  1. Disable IPv6 at the OS level
  2. Disable IPv6 in Firefox (should give more responsiveness for any OS not running on an IPv6 network)

Maybe you would think that the first step would be enough, but Firefox seems more responsive after step 2. Follow these steps to disable IPv6 in Firefox.

Ubuntu 9.10 introduces the GRUB2 bootloader on clean installations only. The upgrade process from, say version 9.04, will not upgrade GRUB and will leave you with the previous version.

To disable IPv6 OS-wide, I followed this sequence and voila – responsiveness returned to my netbook (and soon to my other Ubuntu installations).
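For reference, on many kernels the OS-wide part can also be done through sysctl; whether these keys exist depends on your kernel version, so treat this as a sketch rather than the exact sequence I followed. The entry is written to a scratch file here so the snippet is safe to run anywhere:

```shell
# The two sysctl keys commonly used to switch IPv6 off system-wide.
# For real use, append them to /etc/sysctl.conf and apply with: sudo sysctl -p
cat > demo-sysctl.conf <<'EOF'
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
EOF
keys=$(grep -c 'disable_ipv6' demo-sysctl.conf)
echo "wrote $keys keys"
rm demo-sysctl.conf
```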

2009-11-22 update: Had to downgrade to 9.04 again – no matter what I did or which forums I visited, I could not get my Huawei 3G modem to connect.

IE7 content disappears

This is not a complaint – it is the solution to an issue that has been puzzling me over the weekend.

IE8 has been out for a long time – which does not necessarily mean that corporations have adopted it yet. Some large corporations tend to make their business depend on specific browser versions, not allowing them to upgrade the browsers on their PCs without extensive testing and sometimes expensive application upgrades. I know of at least 5 Danish companies who still don’t dare to take the plunge from IE6 (!)

I usually develop web stuff having booted Ubuntu running Firefox – then once in a while boot Windows and test with the latest IE, Safari, Chrome and Firefox. I leave it to others to test on Macs. If I am unsure how something renders on other platforms, I can recommend one of the cool services that render a specific URL in any browser on any platform and let you see the screenshots after a while.

In this case the client is at IE7. The layout has a header, a content area and a footer – very simple stuff. It looks fine in all browsers – except for IE7. For this particular browser the content section wouldn’t display. IE7 did allocate the space for the area, but it was left blank. This article gave a few tips:

  1. add

     * { zoom: 1.0; }

     to the css (this did not make a difference for me)

  2. add

     position: relative;

     to the #wrapper in the css
Tip 2 made all the difference. #wrapper is the div that wraps around all the content on the page. Not knowing the inner workings of IE7, I can only guess why this fixed the issue – I just hope Google finds this post for you if you run into the same problem.

Windows 7 – First learning points

Installed Windows 7 RC2 for the first time yesterday.

In Virtual Machine

First installation was in a VirtualBox – which worked fine except for the VirtualBox graphics adapter not being accelerated, which meant Solitaire ran extremely slowly. Apart from that – no problems.

Dual/triple booting

Once the gut feeling was OK, I made room for Windows 7 on my laptop, which already holds a Windows XP and an Ubuntu 9.04 partition. Installing Windows 7 leaves the existing OSes alone, but the boot manager only supports Windows file systems. My old GRUB bootloader would let me choose between booting Linux or Windows, but after the Windows 7 install I could no longer boot Linux. The partition was still on the disk, just not in the MS bootloader. A bunch of hours later that was fixed – using the Ubuntu installer, GRUB was installed again. I am still not quite satisfied though; from GRUB I can boot into the MS bootloader, so I now have two steps:

  1. Choose between booting Linux and Windows and (if choosing Windows)
  2. Choose between Windows XP and 7

I’ll try to fix this soon, but the priority of fixing this just dropped as I can now boot into all 3 OS’es.

One other thing: 3G USB modem

The last thing I needed was to get my Huawei 3G modem from the carrier 3 to work. In short: it didn’t – and it may not be because of Windows 7. Apparently the modem setup needs some Adobe Flash stuff, and the drivers on the modem’s flash memory are not Flash 10 compatible. 3 in Denmark knows about the issue and offers a download to fix it – it did the trick for me.

This post is my first written in Windows 7. Will post further findings..

Oh – and the wallpapers are beeeee-aaaaa-utiful!