Category: Code & Development

Blog posts primarily relating to bits of code, especially any tutorials or projects I put together.

The CASE of the Conditional Records

I’m working on a change request for a web service tied to an application we support at work, and I ran into one requirement (out of many) which I didn’t immediately know how to resolve. Maybe it was just the way it was worded in the spec, but it had me scratching my head for a good few hours before I got to the heart of the matter.

This post is mostly a reminder to myself to take a step back, and work through a problem in a notebook or similar; go back and forward a few times, from different angles until the puzzle is clear and you can find a way to approach it. Once I worked through it, and figured out exactly what was being asked for, the problem itself turned out to be significantly easier than I had first imagined.

The Problem

I have two linked tables of journey information: 1 for the journey record, one for each leg of that journey. For each leg, there could be up to 2 records, the planned leg (P), and the actual leg (A). When we return the results of the query, we want a row combining the flight information and the leg information, for every leg. The kicker was we never want to return a P leg record, when an A leg record existed. But we don’t know if an A leg record exists before we run the query.

[Side note – I am massively simplifying the data structure, and some of the problem for clarity and confidentiality]

 ---------------------------    ------------------------------------ 
| journey_id | journey_date |  | journey_id | leg_number | leg_type |
 ---------------------------    ------------------------------------ 
|     1      |  01/07/2015  |  |     1      |      1     |    P     |
 ---------------------------    ------------------------------------ 
|     2      |  01/07/2015  |  |     1      |      1     |    A     |
 ---------------------------    ------------------------------------ 
|     3      |  01/07/2015  |  |     1      |      2     |    P     |
 ---------------------------    ------------------------------------ 

So from the above data we want to return the following:

 ------------------------------------------------------- 
| journey_id | journey_date |  leg_number  |  leg_type  |
 ------------------------------------------------------- 
|     1      |  01/07/2015  |       1      |     A      |
 ------------------------------------------------------- 
|     1      |  01/07/2015  |       2      |     P      |
 ------------------------------------------------------- 

And not this, which a straightforward JOIN would produce.

 ------------------------------------------------------- 
| journey_id | journey_date |  leg_number  |  leg_type  |
 ------------------------------------------------------- 
|     1      |  01/07/2015  |       1      |     P      |
 ------------------------------------------------------- 
|     1      |  01/07/2015  |       1      |     A      |
 ------------------------------------------------------- 
|     1      |  01/07/2015  |       2      |     P      |
 ------------------------------------------------------- 

Obviously we need a WHERE clause in there, but like I said, we don’t know what records exist for each leg, so most common clauses aren’t going to give desirable results.

Effectively we are conditionally returning a record only if a certain other, nearly identical record in the same table does not exist – otherwise we want to return that record instead.

Additional Restraints

  1. Everything has to come from running a single SQL query against a table/set of tables. We can’t utilise Oracle packages or procedures. The query is embedded in the application and called as a prepared statement (to feed in the “real” filtering criteria later). We can use functions, but should be mindful of the performance implications
  2. We can’t do any “post-processing” of the data in the web service application code. Well, we could, but it’d be costly and clunky to do in the current framework.
  3. Everything should (obviously) be as performant as possible.

Solution

First Attempt

An infrequently used, but powerful, part of SQL is the CASE construct:

CASE
	WHEN foo THEN bar
	ELSE baz
END

By utilising a CASE statement and some subqueries, I could check if a row was in the actual legs of a journey, or whether it was a planned leg and there was no matching actual leg:

SELECT  j.journey_id, j.journey_date,
		jl.leg_number, jl.leg_type
FROM    journies j, journey_legs jl
WHERE
(CASE
	WHEN (jl.journey_id, jl.leg_number) IN
	     (SELECT journey_id, leg_number
	         FROM flight_legs
	        WHERE journey_id = fl.journey_id
	          AND leg_number = fl.leg_number
	          AND leg_type = 'A') THEN
	 (1)
	WHEN ((fl.journey_id, fl.leg_number) NOT IN
	     (SELECT journey_id, leg_number
	         FROM flight_legs
	        WHERE journey_id = fl.journey_id
	          AND leg_number = fl.leg_number
	          AND leg_type = 'A') AND fl.leg_type = 'P') THEN
	 (1)
	ELSE
	 (0)
END) = 1;

Execution time: 1.8 seconds

Second Attempt

A nice start, the above query returns what we want, but it’s a bit bloated. Do we need the first subquery at all? No, it turns out:

SELECT  j.journey_id, j.journey_date,
		jl.leg_number, jl.leg_type
FROM    journies j, journey_legs jl
WHERE
(CASE
	WHEN (jl.leg_type = 'A') THEN
	 (1)
	WHEN ((fl.journey_id, fl.leg_number) NOT IN
	     (SELECT journey_id, leg_number
	         FROM flight_legs
	        WHERE journey_id = fl.journey_id
	          AND leg_number = fl.leg_number
	          AND leg_type = 'A') AND fl.leg_type = 'P') THEN
	 (1)
	ELSE
	 (0)
END) = 1;

Execution time: 1 second

Final Attempt

Awesome, we are getting the data we want, the query is running at an acceptable pace for the number of records we have… but it’s kinda clunky looking. We want to try simplify it if possible so it’s easier for anyone who comes after us to maintain. Is it possible to rewrite this to take out the CASE, and make it more readable? Of course:

SELECT  j.journey_id, j.journey_date,
		jl.leg_number, jl.leg_type
FROM    journies j, journey_legs jl
WHERE (jl.leg_type = 'A' OR
       (jl.leg_type = 'P' AND
        (jl.flight_number, jl.leg_order) NOT IN
       	(SELECT journey_id, leg_order
             FROM journey_legs
            WHERE journey_id = jl.journey_id
              AND leg_order = jl.leg_order
              AND leg_type = 'A')));

Execution time: 0.85 seconds

That’s a bit better for anyone who maybe isn’t familiar with CASE to understand, and has a bit of hierarchy to it so you can mostly see at a glance what it’s doing. I don’t think we can get rid of the subquery, but sometimes these things are unavoidable.

Synchronising GitHub and an Internal Git server

Note: I found this mini How-To while having a clean-up of my GitHub repositories. I figured it would be worth sharing on my blog. Hopefully it is of use to someone. If you want to play around with the steps, but don’t want to use one of your existing projects, you can use this repository.


The Problem

  1. I have my repository hosted on GitHub
  2. I have an internal Git server used for deployments
  3. I want to keep these synchronised using my normal workflow

Getting Started

Both methods I’ll describe need a “bare” version of the GitHub repository on your internal server. This worked best for me:

cd ~/projects/repo-sync-test/
scp -r .git user@internalserver:/path/to/sync.git

Here, I’m changing to my local working directory, then using scp to copy the .git folder to the internal server over ssh.

More information and examples this can be found in the online Git Book:

4.2 Git on the Server – Getting Git on a Server

Once the internal server version of the repository is ready, we can begin!

The Easy, Safe, But Manual Method:

        +---------+          +----------+       /------>
        | GitHub  |          | internal | -- deploy -->
        +---------+          +----------+       \------>
             ^                     ^
             |                     |
             |     +---------+     |
             \-----|   ME!   | ----/
                   +---------+

This one I have used before, and is the least complex. It needs the least setup, but doesn’t sync the two repositories automatically. Essentially we are going to add a second Git Remote to the local copy, and push to both servers in our workflow:

In your own local copy of the repository, checked out from GitHub, add a new remote a bit like this:

git remote add internal user@internalserver:/path/to/sync.git

This guide on help.github.com has a bit more information about adding Remotes.

You can change the remote name of “internal” to whatever you want. You could also rename the remote which points to GitHub (“origin”) to something else, so it’s clearer where it is pushing to:

git remote rename origin github

With your remotes ready, to keep the servers in sync you push to both of them, one after the other:

git push github master
git push internal master
  • Pros: Really simple
  • Cons: It’s a little more typing when pushing changes

The Automated Way:

        +---------+         +----------+        /------>
        | GitHub  | ======> | internal | -- deploy -->
        +---------+         +----------+        \------>
             ^
             |
             |              +---------+
             L------------- |   ME!   |
                            +---------+

The previous method is simple and reliable, but it doesn’t really scale that well. Wouldn’t it be nice if the internal server did the extra work?

The main thing to be aware of with this method is that you wouldn’t be able to push directly to your internal server – if you did, then the changes would be overwritten by the process I’ll describe.

Anyway:

One problem I had in setting this up initially, is the local repositories on my PC are cloned from GitHub over SSH, which would require a lot more setup to allow the server to fetch from GitHub without any interaction. So what I did was remove the existing remote, and add a new one pointing to the https link:

(on the internal server)
cd /path/to/repository.git
git remote rm origin
git remote add origin https://github.com/chrismcabz/repo-syncing-test.git
git fetch origin

You might not have to do this, but I did, so best to mention it!

At this point, you can test everything is working OK. Create or modify a file in your local copy, and push it to GitHub. On your internal server, do a git fetch origin to sync the change down to the server repository. Now, if you were to try and do a normal git merge origin at this point, it would fail, because we’re in a “bare” repository. If we were to clone the server repository to another machine, it would reflect the previous commit.

Instead, to see our changes reflected, we can use git reset (I’ve included example output messages):

git reset refs/remotes/origin/master

Unstaged changes after reset:
M   LICENSE
M   README.md
M   testfile1.txt
M   testfile2.txt
M   testfile3.txt

Now if we were to clone the internal server’s repository, it would be fully up to date with the repository on GitHub. Great! But so far it’s still a manual process, so lets add a cron task to stop the need for human intervention.

In my case, adding a new file to /etc/cron.d/, with the contents below was enough:

*/30 * * * * user cd /path/to/sync.git && git fetch origin && git reset refs/remotes/origin/master > /dev/null

What this does is tell cron that every 30 minutes it should run our command as the user user. Stepping through the command, we’re asking to:

  1. cd to our repository
  2. git fetch from GitHub
  3. git reset like we did in our test above, while sending the messages to /dev/null

That should be all we need to do! Our internal server will keep itself up-to-date with our GitHub repository automatically.

  • Pros: It’s automated; only need to push changes to one server.
  • Cons: If someone mistakenly pushes to the internal server, their changes will be overwritten

Credits

Setting Up Chef

I just finished setting up Chef, to have a play around with this DevOps stuff I keep hearing about. While Chef is quite well documented, I found myself struggling in places where things weren’t quite clear enough. So naturally, I’m posting how I got myself up and running.

[Note: I haven’t actually done anything with this setup yet, other than get it working.]

Step One: Get A Server

There are 2 parts to a Chef install: client and server. You can run them all on one machine, but given how much Chef slows down my Joyent VM, I’d suggest keeping it off of your day-to-day workstation.

I used my Joyent credit to setup a new Ubuntu 12.04 64-bit server. Chef server only supports Ubuntu or RedHat/CentOS 64-bit. Once the server was provisioned, I followed this 5-minute guide to lockdown the server enough for my needs (this being just an experiment and all…)

Step Two: Set the Server FQDN

Once the server is prepared, make sure it has a resolvable, fully qualified domain name before going any further. While the Chef docs make mention of this, they do so after the rest of the setup instructions. This was one area I was banging my head against for ages, wondering why the built-in NginX server wasn’t working.

Setting the hostname on my Joyent VM was a case of running:

    $ sudo hostname 'chef.example.com'
    $ echo "chef.example.com" | sudo tee /etc/hostname

As I wasn’t on the same network as my Chef server, I added a DNS A record to match the server FQDN.

Step Three: Install Chef Server

This bit was really easy, probably the easiest part of the whole setup. In short: download the latest Chef Server package for your platform, install the package, run the reconfigure tool. In my case, this was:

    $ wget https://opscode-omnibus-packages.s3.amazonaws.com/ubuntu/12.04/x86_64/chef-server_11.0.10-1.ubuntu.12.04_amd64.deb
    $ sudo dpkg -i chef-server_11.0.10-1.ubuntu.12.04_amd64.deb
    $ sudo chef-server-ctl reconfigure

The Chef installer will whirr away, using Chef to setup your new installation automatically. How cool is that?

Step Four: Copy Server Certificates to Your Workstation

This wasn’t mentioned anywhere I could see, but I figured it out from some snippets written around the web. To successfully setup the Chef client, you need some security certificates from your new server. I used SCP from my local PC:

    $ scp user@chef.example.com:/etc/chef-server/admin.pem ~/tmp/
    $ scp user@chef.example.com:/etc/chef-server/chef-validator.pem ~/tmp/

If you find you don’t have permission to copy directly from their default location, SSH to the server and sudo copy them to somewhere you can.

Step Five: Install the Chef Client

Now we should be armed with everything we need to install the client tools. I’m using the Debian-derived Crunchbang, but any *NIX-based OS should be roughly the same as below. If you’re on Windows, I’m afraid you’re on your own.

Run the “Omniinstaller” for Chef:

    $ curl -L https://www.opscode.com/chef/install.sh | sudo bash

Create a .chef folder in your home directory, and add the certificates copied from the server

    $ mkdir ~/.chef
    $ cp ~/tmp/*.pem ~/.chef

Configure Knife (the main Chef CLI utility):

    $ knife configure --initial
    WARNING: No knife configuration file found
    Where should I put the config file? [/home/chris/.chef/knife.rb] /home/chris/.chef/knife.rb
    Please enter the chef server URL: [https://localhost:443] https://chef.example.com:443
    Please enter a name for the new user: [chris]
    Please enter the existing admin name: [admin]
    Please enter the location of the existing admin's private key: [/etc/chef-server/admin.pem] /home/chris/.chef/admin.pem
    Please enter the validation clientname: [chef-validator]
    Please enter the location of the validation key: [/etc/chef-server/chef-validator.pem] /home/chris/.chef/chef-validator.pem
    Please enter the path to a chef repository (or leave blank):
    Creating initial API user...
    Please enter a password for the new user:
    Created user[chris]
    Configuration file written to /home/chris/.chef/knife.rb

Test Knife by listing all users:

    $ knife user list
    admin
    chris

Wrap Up

That’s it! You now have a working Chef installation. Or at least, I do. Steps two and four are the steps I had to hunt out and piece together myself to get Chef up and running. Everything else is more or less as documented.

All that’s left to do now is figure out how to use Chef!

Learning New Stuff: Vim

It started as a throwaway comment:

But like all interesting ideas, it took root in my brain, and I started to wonder “no, really, what if…”

This morning I woke up deciding to finally learn Vim. It’s been on my “to-do” list for quite a while – years, in fact – my old boss used to be a bit of a Vim guru, who could edit file on the Linux boxes 10x faster than me using GEdit, which I used to marvel at as a junior.

Beyond the “code from my iPad” idea above, I have a couple more reasons for learning Vim:

  • Vim is near universal. It runs on pretty much any platform, and it’s standard on virtually any Linux distro
  • Since switching back to Linux, I spend half my time in the CLI anyway – Vim cuts down the need to switch away to something like Sublime
  • Having a CLI-based workflow reduces the amount of “stuff” I need to set-up on a computer to write some code

But coming back to the iPad, I’m quite keen to get this working because I’ll be moving house soon, and it’s more than likely I won’t have space for my current PC set-up, and a new laptop is out of my budget for now. So if this works it will let me keep tinkering on projects even without a “proper” computer. I’ve been in this position before, so I know the limitations. This time, however, I’m using an iPad Mini and a Logitech Bluetooth keyboard case.

My Set-up/Learning Resources

To get myself up and running, I’ve installed and configured Vim on my PC, and will install GVim on my work laptop. I’m going to try and use it in place of Sublime Text Editor as much as I can, but give no promises when it comes to anything work-related, as that generally needs done fast.

I’ve fired up a Ubuntu box on Joyent (starting with the first 5-minutes post I linked to last week), and replicated my PC’s Vim configuration, so I can start working from the iPad straight away. I installed Git, and authorised the box with my Github account, so I can push and pull to my heart’s content.

As I’m just starting out, the configuration I’ve gone for is very basic and minimal, but giving me room to grow as I get more experienced. Essentially, I’ve installed Pathogen, and applied their recommended beginner defaults.

For learning resources, I’ve found there’s a heck of a lot out there for Vim. So much so, I’ve chosen to limit myself for now so I don’t get overwhelmed.

  • The Pragmatic Programmers books are somewhere I always check when I want to learn something new. I find their books to be practical, well written, and informative. True to form, Practical Vim (from what I’ve read so far) is an excellent introduction to “real world” Vim.
  • Vim Adventures is a neat, interactive learning tool, which turns learning Vim into an 16-bit adventure game.
  • VimCasts has 50 free screencasts, dealing – unsurprisingly – with learning Vim

Feel free to suggest some more good resources though – as I get more experience I’ll need to branch out into other areas!

Licensing Your Open Source Code

I received an email from a developer the other day, who had forked the repository for my “IIS Express Here” shell extension on GitHub. He had noticed there was no license information available in the project, so asked if I could either add a license, or give him written permission to adapt my code and share it to others (as is the spirit of GitHub and OSS).

To be honest, this wasn’t something I’d thought about before, and was a bit of an oversight on my part. I’d not really considered the need to add explicit licenses to my repositories. After all, the code is out there anyway – it’s open to use on GitHub, and I’ve often shared it on this blog… if someone wanted to copy the code, they could, right?

Unfortunately, this creates a grey-area, which some are naturally uncomfortable with. Can I use this code in something else? Can I modify it at all? Do I have to pay royalties if I do?

But licensing is hard, isn’t it? All the different types, with different caveats, liabilities, and legal mumbo-jumbo… well, yes, it can be hard. The good folks at GitHub have a solution: ChooseALicense.com is attempting to demystify open source licenses so you can pick the right one for your project. More than this, when you create a new repository on GitHub, the site will ask if you want to add a template license during the initialisation process:

repo_licenses

Coming back to the developer who emailed me – I mailed him back to let him know that IIS Express Here is now licensed under the MIT license. This fits best with how I see the code and projects I share on this blog (unless noted otherwise) – free for anyone else to use, but with no warranty, so if something goes wrong then I’m not liable and it’s not my responsibility to fix it. This license has also been added to Pseudonym. I haven’t got around to updating all of my repos with licenses, as I’m evaluating each one in turn, based on my goals and even whether the project is going to archived.

ChooseALicense.com

Pseudonym for Jekyll

Earlier this summer I started messing around with Jekyll. In the end I didn’t migrate to it, but I at least gained a little knowledge.

One way that I tried to learn more about how Jekyll works, was by creating a test site and building a theme. That theme sat locked away in a private Github repository. Until now.

I’ve just opened up the source for Pseudonym, the name I gave to the project. It’s really very basic, but I hope at least somebody gets some use out of it.

You can find the demo site at http://dev.pseudonym.xxx/ (yes, I own a .XXX domain), and source can be found in the Github repository, here.

dev.pseudonym.xxx

A few notes and attributions (some copied from the readme)

  • I was also experimenting with Grunt.js. There’s a rudimentary gruntfile included, but the setup is incomplete. It was designed to build, concat and minify LESS files into CSS, various JavaScript files, and generate multiple sizes of the header images.
  • The theme is slightly responsive, but more work needs to be done here. Only tested on/targeted at desktop, iPad Mini, and my Nexus 4.
  • Header images are from Unsplash
  • Icons are by Font Awesome
  • I was messing around about with some newer CSS properties – the theme uses CSS columns in many places. While I’ve made it as cross-browser as I could in the time I spent on it, IE doesn’t look as pretty.
  • I’ve used Zepto, with a fallback to jQuery for IE.

Run Coder for Raspberry Pi on Your Linux PC

That cool little “Coder for Raspberry Pi” project from Google which I linked to earlier doesn’t just run on Raspberry Pi. You can run it on any old Linux PC (Mac works too, but the instructions are slightly different).

I set it up in less than 2 minutes using these commands (note that I’m running Debian Sid):

    sudo useradd -M pi
    sudo apt-get install redis-server
    cd ~/projects
    git clone https://github.com/googlecreativelab/coder.git
    cd coder/coder-base
    npm install
    npm start

Node.js is also a requirement, so if you don’t have that, you’ll need to install that at step 2 as well.

Once everything is up and running, point your browser at https://localhost:8081/. You’ll need to specify a password the first time you run Coder, after which you’ll be able to try the environment out. It’s pretty neat, and the sample clone of Asteroids is quite addictive!

Developers and “Ring Rust”

Skills are much like muscles: if you don’t use them for a while they start to atrophy. They say you never forget how to ride a bike, but there are many skills where you will forget things if you don’t do them frequently. The collection of skills needed to be a developer are no exception to the rule.

I’m somewhat speaking from experience here; my current role and workload has removed me from day-to-day development work for about a full year now. I still need to dive in to the code base every day to research issues or change requests, but actually writing something is quite rare these days. I’m aware of the skills problem, and I’ll describe below how I’m trying to address it, but never the less I’ve been self-concious enough about it I’ve recently found myself resisting taking on development tasks. I know it’ll take me a lot longer to get up to speed and complete as one of the developers who’re working on the application every day, and the time-scales involved are usually very tight. It’s a vicious circle: I’m rusty because I’m not doing development, but I’m avoiding development because I’ve been away from it for too long. In the corporate world it’s very easy to get rail-roaded into a niche – and incredibly hard to get out of it.

Time away for a developer is exacerbated by the speed in which technology and techniques moves forward in our industry. What was cutting edge a year-ago is old-hat today, and may even be something you’re encouraged not to do any more. If you haven’t been practising and keeping up developments then you may not be aware and get yourself into all sorts of bother.

So what can you do?

Read. Lots.

Subscribe to a load of developer sites and blogs in Feedly, for one source, but a more convenient way I’ve found to stay on top of things is using Flipboard:

  • Follow other developers on Twitter (actually, you don’t have to, but it’s nice to), and create/add them to a list, such as “Developers & News“.
  • Within Flipboard, add your Twitter account if you haven’t already.
  • Still within Flipboard, go to your Twitter stream. Tap your name at the top and select “Your Lists.”
  • Open the relevant list, then tap the subscribe button.

Your list will be added to your Flipboard sources and you’ll have an always-up-to-date magazine of what’s happening. The reason I suggest Flipboard is that it grabs the link in a tweet, pulls in the article, and will try to reformat it into something you can easily flip through. It makes reading on a tablet so much more enjoyable. Some of the links you get will not be relevant, but a large amount of it will be gold. I try to set aside 30 minutes a day to go through at least the headlines. If work is exceptionally busy I’ll aim for twice a week. Saving to a “Read it Later” service like Pocket is useful for storing the most interesting articles.

What about books? Yes, by all means, read plenty of technical books. They’re usually in far more depth than even the best online article. With tablets, eReaders, and eBooks, the days of thick tomes taking up lots of space are behind us, and no longer a major concern (at least for me). There is however, one major issue with books – they take a long time to write, and are often out of date quickly. The technology might have moved on by the time the book is published. Schemes such as the Pragmatic Programmer’s “Beta Book” scheme help a lot here – releasing unfinished versions of the book quickly and often, to iron out problems before publishing. Of course, you also need to be aware of the topic to be able to pick out a book about it!

Be Curious. Experiment.

Reading all the material in the world will not help you anywhere near as much as actually doing something. The absolute best thing you could do would be to develop side projects in your spare time. Admittedly, if you’re busy, time can be at a premium! Probably a good 99% of side projects I start lie unfinished or abandoned, simply for lack of time. So instead, I perform small experiments.

Curious about something? Do something small to see how it works, or “what happens if…”. Personal, recent, examples would be:

  • Looking into static site generators, and as a result, learning about Jekyll, Github pages for hosting… and as a result of trying out Jekyll templates I brushed up on Responsive Web Design, looked into Zepto, and fell in love with Less.
  • Trying out automating development workflows – installed Node.js (which then allowed me to run this), setup some basic Grunt.js tasks, Imagemagick batch processing, and some more Less.
  • Running Linux as my primary OS, and no Windows partition to fall back on – so in at the deep-end if something goes wrong… but it’s helped me brush up on my MySQL and Apache admin skills again, as well as generally working with the command-line again. The other week I fixed someone’s VPS for them via SSH  – something I would have struggled to do only a few weeks ago. In case you’re interested: the disk was filling up due to an out of control virtual host error log, which I had to first diagnose, and then reconfigure logrotate to keep the site in check.

An earlier example, from before I was entirely away from development: I wanted to see what was different in CodeIgniter 2, so I made a very small app. My curiosity then extended into “how does Heroku work?” – so I deployed to Heroku. I couldn’t pay for a database I knew how to work with, so I tried out a little bit of MongoDB. Then it was the Graph API from Facebook… so again, I extended the application, this time with the Facebook SDK.

Little experiments can lead to a lot of learning. I would never claim to be an expert in any of the technologies I mention, but neither am I ignorant.

Shaking it Out

I’d still need a major project to focus on and really shake off the “ring rust,” to get back up to full development potential, but I’m pretty confident it wouldn’t take as long as if I hadn’t been working on the trying to keep my skills as fresh as I can.