The return of the Dull Stack Engineer

Every day we get emails from recruiters who are looking for a devops engineer to help their customer.

When we ask them whether they are looking for a developer or an operations person, they have no clue what to answer, because a DevOps engineer doesn't exist.

DevOps Engineer is not a title

The DevOps community established long ago that DevOps is not a job title. It's not a Java developer running production, and it's not an Ops girl patching Java code. Neither is it the person in charge of tooling. And it certainly isn't an engineer working in a DevOps team.

But for a lot of recruiters, and for some IT professionals who are still working in a traditional top-down environment and want to become more agile, a #devops engineer simply sounds better than the senior Linux engineer they are typically looking for.

The Myth of the Full Stack Engineer

When that mythical all-in-one DevOps engineer, who could both write code and design, run, and manage platforms, couldn't be found, they narrowed their search.

They started looking for a Full Stack Engineer, who pretty much needed to be capable of the same. Or not... because most so-called full stack engineers are actually MEAN stack engineers: engineers who typically barely understand the one stack they really work with, the MEAN stack, composed of MongoDB, Express.js, AngularJS, and Node.js.

What a full stack engineer really means is an engineer who has experience with all the components of the stack their application touches: from the Linux kernel to the networking, the middleware and the database components, all the way up to the web server and the frontend application.

But how many people do you know who have their names in the Linux kernel, can performance-tune a SQL stack, debug a Java stack trace, and also build customer-facing web applications?

Indeed, we don't know that many people with those skills or experiences. And the people who do tick all the checkboxes have their experience stretched over multiple decades... they didn't do all of those things at the same time.

So both the concept of the Full Stack Engineer and the #devops engineer come with a common problem: those people really don't exist...

The Knowledge is in the team

Building cross-functional application teams is about getting people with all of those skills to collaborate with and understand each other. It's not about one human being doing all the things.

Hype Driven Development

The second pattern we've seen is the concept of hype-driven, conference-driven, or CV-driven development: three variants of the same problem.

A developer forgets about the operational requirements of the application being built, forgets about the non-functional requirements, and jumps on the latest fancy technology they want to learn: either because they just heard about it at a conference, because the tool is being heavily hyped, or simply because they want to build experience with it so they can add it to their CV and land that next job.

Tools often need a couple of releases to mature before they are ready for production use, and before they are stable enough to trust with your data. It's not uncommon for frequently used tools to lack functionality that is crucial in a production environment, such as support for good metrics and monitoring, or consistent data snapshots for backups.

But they jump on the hype... take the fancy new tool and act like the cool kids. The fact that this new tool doesn't even come close to solving the business need they have, the fact that it often actually widens the gap between developers and operations people, isn't important to them.

We fall into the trap and forget the most important #DevOps principle: Collaboration to bring business value.

Boring is Powerful, Boring is Stable

The container movement of the past four years is sadly a good example of this tendency to be driven by hype. Many organizations have jumped on Docker and other container tools, thinking that adopting a tool will make them more "DevOps" and that using microservices will let them gain speed and innovate faster.

Bridget Kromhout is often quoted as saying "Putting your monolith in a container does not make it a microservice", yet that is the default adoption pattern for a lot of enterprises. They believe containers are going to save them and solve all their development problems.

This isn't what #devops is about. DevOps was never about fancy new tools. Have we all forgotten that one of the key principles of agile is "individuals and interactions over processes and tools"?
DevOps was from the start about bringing value to the end user, the business, the customer.

And yes, new technology is fun, but that doesn't mean it's stable or performant. As Jon Topper put it at DevOpsDays London:

"Boring is Powerfull, boring is stable"

A boring platform often means there aren't any bleeding edges around it, and it often means it is more stable. Think about an engineer who is on call. They want to be bored. They do not want to be paged every other hour.

And here is the Dull Stack Engineer

So imagine a number of people at a conference discussing this problem, joking about those Full Stack Engineers with their fancy new tools who fail to collaborate, and by accident saying "Dull Stack Engineer".

This is when I realized that this is actually what we want: people who care about the business and about collaborating with their peers, rather than about jumping on the latest and shiniest new tooling. A Dull Stack Engineer.

The concept, born on a swing at a conference in Bucharest, quickly started to live its own life. More engineers started taking pictures of themselves on a swing and endorsing others for Dull Stack Engineering on LinkedIn.

Building stable platforms that bring value to your business can happen with boring old tools; there's nothing wrong with your team if they don't go for fancy containers but stick with trusted automation in VMs or even on bare metal.

Because DevOps is about collaborating with your peers, not about tools, and once we realize that... the on-call engineers might be bored again :)

AWS Support and leaked credentials

Once you have enough people each working in multiple accounts it becomes a waiting game until you eventually get the dreaded “Your AWS account 666 is compromised.” email. As someone who’s been using AWS since S3 arrived, this is the first time I’ve encountered this, so I thought I’d write up some notes about what actually happens.

First comes the easy, but not recommended, part of the whole experience: push some credentials to GitHub. You get bonus points, well, faster discovery anyway, if you use the perfect combination of export AWS_ACCESS_KEY and export AWS_SECRET_ACCESS_KEY, or a literal add of your .aws/credentials file. As all AWS access keys begin with ‘AKIA’ I assume they scan the commit firehose for that string.
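
To be explicit about what that looks like, here’s the sort of commit content their scanners presumably match on; the values below are placeholders, not real keys:

    # never commit anything like this; the AKIA prefix is what gets spotted
    export AWS_ACCESS_KEY_ID=AKIAXXXXXXXXXXXXXXXX
    export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx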

Once the AWS scanning systems detect an access key in the wild, which in our case took them mere minutes, you’ll receive an email to your account’s contact address, and if you’ve got Enterprise Support, a phone call too. The email looks like this:

Amazon Web Services has opened case 111111111 on your behalf.

The details of the case are as follows:

Case ID: 111111111
Subject: Your AWS account 6666666666666 is compromised
Severity: Urgent
Correspondence: Dear AWS Customer,

Your AWS Account is compromised! Please review the following notice and
take immediate action to secure your account.

Your security is important to us. We have become aware that the AWS
Access Key AKIATHEEVILHASESCAPED (belonging to IAM user
"ohgodpleasenotme") along with the corresponding Secret Key is publicly
available online at
https://github.com/deanwilson/aws-creds-test-repo/blob/11b1111d1

If, like me, it’s the first time you’ve seen one of these you might experience a physical flinch. Once the initial “Oh god, think of the Lambdas!” has passed and you read on, you’ll find the email is both clear and comprehensive. It guides you through checking your account’s activity, deleting the root account’s keys (which you don’t have, right?) and all AWS access keys in that account that were created before the breach happened. Because we have quite good role and account isolation we made the repository private for future investigation, confirmed we don’t have root access keys, and forced a rotation (i.e. deleted all existing keys in the account).
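
If you’re doing that forced rotation from the command line, a minimal sketch, reusing the IAM user and key from the notification above, looks something like this:

    # list the exposed user's access keys, then delete the leaked one
    aws iam list-access-keys --user-name ohgodpleasenotme
    aws iam delete-access-key --user-name ohgodpleasenotme \
        --access-key-id AKIATHEEVILHASESCAPED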

A little while later we received a follow-up phone call and email from AWS support to check in and ensure we were OK and had actioned the advice given. The second email reiterates the recommendation to rotate both the root password / access key and all AWS access keys created before the breach happened. You’ll also get some handy recommendations for awslabs’ git-secrets and a Trusted Advisor / CloudWatch Events based monitoring mechanism to help detect this kind of leak.
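
If you’ve not used git-secrets before, wiring it into a repository is only a couple of commands; a minimal sketch:

    # register the AWS credential patterns and install the pre-commit hooks
    git secrets --register-aws
    git secrets --install
    # scan the existing repository contents for anything already committed
    git secrets --scan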

All in all this is actually a well-handled process for what could be an amazingly painful and stress-filled event. While some of this communication may vary for people without Enterprise Support, I’d assume at the very least you’d still get the emails. If you have good key rotation practices this kind of event becomes just another Trello card rather than an outage, and now that we’ve seen one of these it’s an easy scenario to mock up and add to our internal game day scenarios.

Quarterly SRE Health check

At $WORK I’m one of the people responsible for our SRE community, and in addition to the day-to-day mechanics of ensuring everyone is willing and able to meaningfully contribute, I’ve been looking at ways to gain a higher-level, people-focused view of how they’re feeling about their role. With our move to quarterly missions now department wide, it seemed like the perfect time to try our first “Quarterly SRE Health check”.

As someone more comfortable asking about system implementation than personal feelings, I heavily ‘borrowed’ the great questions from How to Tell If You’re a Great Manager and reworded some of them a little. For our initial survey I ended up with fourteen questions; all except the last were multiple choice with the canned answers ‘Strongly Disagree’, ‘Disagree’, ‘Neutral’, ‘Agree’, and ‘Strongly Agree’.

  • I know what is expected of me at work
  • I have the materials and equipment I need to do my work
  • At work, I have the opportunity to do what I do best every day
  • In the last seven days, I received recognition or praise for doing good work
  • My line manager, or someone at work, cares about me as a person
  • There is someone at work who encourages my development
  • At work, my opinions seem to count
  • The mission/purpose of my program makes me feel my job is important
  • My co-workers are committed to doing high-quality work
  • I work with people I consider close friends
  • In the last three months someone at work spoke to me about my progress
  • In the last six months, I had opportunities at work to learn and grow
  • It’s OK to make mistakes in my role
  • Any comments or feedback you would like to leave (free form response)

In order to keep the implementation as simple as possible, the answers being the only thing that actually mattered, I created a Google Form, added a disclaimer explaining what the questions were for, who would see them, and that it was completely anonymous, and then emailed it to the SRE teams. Each of the finished questions looked like this:

A question with 5 checkbox answers

I then closed the tab and waited a few days to see what would happen. I’m fortunate to work somewhere that’s very open and willing to participate in this kind of engagement attempt, and within a few days 75% of our SREs had responded. A quick Slack reminder and a few days later, the poll was closed and a meeting was scheduled to go over the results. After the initial read-through and presentation to the department heads, we’ve spotted positive trends to encourage and a couple of early warning signs to address. Considering the information it brings, and the small time investment required, it seems worth doing again, with a couple of question changes, at the end of the next quarter.

Pi-hole – the first two weeks

It all started with someone trying to show me an article on their mobile phone. As an adblock user I’d forgotten how bad the world was, with all the screen space being stolen by pop-unders and pop-overs, and I was soon done. After mulling it over for a little while I decided to use it as a flimsy excuse to buy another Raspberry Pi and trial running Pi-hole, ‘A black hole for internet advertisements’.

With the joy of Amazon Prime ensuring I’d have all the required pieces (Pi, SD card, case, and power supply) the very next day, I downloaded the newest Raspbian Lite image and went off to do something else. From the hardware arriving, it took less than 30 minutes to go from using the very pretty https://etcher.io/ to burn the image (wrapped up with flu, I didn’t trust myself to use dd correctly) to having Pi-hole installed and running. The documentation, initial user experience, and web admin are all very well done and provide a very smooth first encounter. I moved my iPad over to using it, manually, as Virgin Media routers don’t allow you to set the DNS servers via their built-in DHCP functionality(!), and instantly noticed empty spaces on a few of the tabs I reloaded.
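
For reference, the install itself is the one-step script from the Pi-hole documentation (it pipes a script from the internet into bash, so read it first if that makes you twitch):

    # official one-step installer from the Pi-hole docs
    curl -sSL https://install.pi-hole.net | bash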

Number of DNS queries graph

I’ve been running Pi-hole for a few weeks now, and on a normal day it’s blocking between 7% and 12% of DNS queries, which is a nice little saving on bandwidth. The number rises significantly when family visits with all their horrible malware-covered devices, and other than adding it to my apt-get wrappers for security patching it’s been a great, zero-touch little service so far.
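
That patching amounts to very little; a sketch of the kind of thing such a wrapper runs (pihole -up is Pi-hole’s own updater):

    # patch the OS packages, then Pi-hole itself
    sudo apt-get update && sudo apt-get -y upgrade
    pihole -up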

Swapping Pragmatic Investment Plans for OKRs

For a number of years I’ve maintained what I call my Pragmatic Investment Plan (PiP). It’s a collection of things that ensure I invest at least a little time each quarter in my career and industry. While it’s always been somewhat aspirational, in that I don’t often complete everything, it does give me a little prod every now and again and stops me becoming too stagnant. My first few PiPs were done annually, but after I started seeing ever-decreasing numbers of completed items I moved to quarterly and had a lot more success.

While I’ve been laid up with flu I’ve had time to think, and a brief review of the items on my recent PiPs shows a distinct lack of cohesion. They’ve always been more focused on motion than on progress, so I’ve decided to make a change for a quarter and try a different approach, using the Objectives and Key Results (OKR) format.

If you’re looking for a concise introduction to OKRs, Introduction to OKRs from O’Reilly is an excellent little read. In essence, Objectives and Key Results is a framework / process for defining and tracking objectives and their outcomes. It’s supposed to help with setting, communicating, and monitoring quarterly goals and results. OKRs seem quite handy for keeping the scope focused on the work that will provide the most value, which is what I’m looking for. So instead of my old PiPs, which have looked like this for the last 15 years:

A list of objectives from a previous PiP

I’ve decided to do three separate OKRs, each with its own objective, running for a month each. This should provide enough time to trial the approach, and allow a little tweaking and refinement, while being short enough that if it’s not working for me I can try something else. I’ve mocked up a small, slightly too vague example, based on the seemingly most common OKR format.

A 4 section OKR template with mock values

It’s been a decade and a half since I started using PiPs to keep myself honest, and as my other commitments have grown I no longer have the time I used to for ad hoc, exploratory experimentation. I’m hoping this new approach will help focus my limited spare time on providing worth and progress rather than just motion. And if it doesn’t work? I can always go back.

Respect can be a local currency

In the IT industry we are reputed to be serial job hoppers. While this may seem a little unfair, if it applies to you then you should consider where you’re spending your limited additional time and effort. First, a disclaimer: you need to invest enough time and effort into your current job to stay employed.

Now that’s out of the way, let’s look at our normal days. All those extra hours and all that hard work you put in every day? That’s all local currency. In the best case your current employer and co-workers will appreciate it, you’ll be recognised for your outcomes and ability, and hopefully be considered an integral part of the team. But that’s almost as far as it goes. No one outside your employer, and depending on its size and organisation maybe not even everyone within it, will ever know that you pulled a 70-hour week to get that one release done and dusted, or stepped up and handled a Sunday emergency. It’s possible that a small part will transfer into the wider world, typically as references, LinkedIn praise, and the like, but most of it won’t go with you when you change roles.

If you’re someone who likes to change jobs, whether short permanent roles or contracting, you should carefully consider the balance between local and remote respect. Writing blog posts and articles, releasing open source projects, giving presentations: these things have value in the wider world as well as, hopefully, at work, and may serve you better in reaching your career goals. Some companies are wonderful at unifying these two threads, but at the end of the day it’s your career and you need to deliberately weigh the options.

All of those possible career value-adds eat into your time, and not everyone is in a position to do all or even some of them, but where you can, it helps to build up a portfolio of subjects larger than your day job and makes future interviews more about culture than demonstrating technical minutiae. Nothing beats a pre-warmed audience, especially one that already uses your code or reads your blog.

This has been on my mind recently as my working hours creep up and my personal projects wither, and I think it’s something worth taking a moment to deliberately consider every few quarters.

Accessing an iPad’s file system from Linux

Despite using Linux on pretty much every computer I’ve owned for the last 20 years, I’ve made an exception when it comes to tablet devices and adopted an iPad into my life as a commute-friendly “source of all books”. Over time it’s been occasionally pressed into service as a camera, and I recently realised I’ve never backed any of those photos up. “That’s something easy to remedy,” I naively thought as I plugged my iPad into a laptop and watched as it didn’t appear as a block device.

While there are many pages on the internet that explain parts of the process of accessing your iPad’s file system from Linux, it was awkward enough to piece together that I decided to summarise my own commands in this post for future me. I used the following commands on a Fedora 28 install to access an iPad Air 2.

First add the software needed to make the connection work:

    # install the required packages (on fedora)
    sudo dnf install ifuse libimobiledevice-utils

Once this is installed, unlock the iPad and run idevicepair pair to pair it with your Linux host. You should see a message saying that pairing was successful. Now that we have access to the device, let’s get at its file system. Create the mount point and make the current user the owner:

    sudo install -d /mnt/ipad -o $USER

Finally, mount the iPad so we can access its file system:

    ifuse /mnt/ipad

    ls -alh /mnt/ipad/

If this fails, ensure the fuse kernel module is loaded by running lsmod, and run modprobe fuse if it isn’t. Once you’ve finished exploring, don’t forget to umount /mnt/ipad to release the iPad.
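
In command form, the check and the cleanup look something like this (assuming the /mnt/ipad mount point from above):

    # verify the fuse kernel module is loaded, loading it if necessary
    lsmod | grep fuse || sudo modprobe fuse
    # release the iPad once you're done exploring
    umount /mnt/ipad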

Slightly Shorter Meetings

A few jobs ago, as the number of daily meetings increased, I picked up a tiny meeting tweak that I’ve carried with me and deployed at each place I’ve worked since: end all meetings five minutes early. Instead of half past, end at 25 past, and instead of on the hour (complex maths ahead), end at 55 past.

My reasoning is simple and selfish: I hate being late for things. This approach gives people time to get to their next meeting.

11.5 Factor apps

Each time someone talks about the 12 Factor Application a weird feeling overcomes me... because I like the concept, but it feels awkward, as if someone with zero operational experience wrote it. And the devil is in a really small detail.

That detail is Part III, Config... For me (and a lot of other folks I've talked to about this topic), using environment variables (as in real environment variables) is just one of the worst ideas ever. Environment variables are typically set manually, or from a script that is being executed, and there's little or no trace that lets you quickly see how a specific config value was set.

Imagine I launch an app with an environment variable X=foo, then my colleague stops that app and launches it with X=bar. The system's integrity has not changed, no config or binaries have been modified, but the application's behaviour could have completely changed.
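
A contrived sketch of the problem (the app and variable are hypothetical):

    # nothing on disk changes between these two runs,
    # yet the application's behaviour may differ completely
    X=foo ./app    # my launch
    X=bar ./app    # my colleague's relaunch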

Sure, I can go into /proc/<pid>/environ to find the parameters an app was launched with... but I want to know what state the system is supposed to be in, and have tooling around that which verifies the system is indeed in that state.
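
For the record, digging that launch environment out of a running process looks like this (the PID is hypothetical; the file is NUL-separated):

    # print the environment process 12345 was started with, one variable per line
    tr '\0' '\n' < /proc/12345/environ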

https://12factor.net/config claims that config files not stored in revision control are a huge step forward from hardcoded config in code, but complains that such config files tend to end up scattered all over the place. This obviously feels written by someone who never experienced the power of modern configuration management. Templates that are dynamically populated, or configs that are even calculated on the spot depending on the environment an application lives in, are the norm in modern infrastructure as code... yet people seem to think that managing an environment variable would be less painful.
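
As a minimal sketch of what that templating gives you (the paths and values here are hypothetical; real setups would use Puppet, Chef, or Ansible templates held in version control):

    # render a version-controlled template into the config the app actually reads,
    # so every redeploy produces the same, traceable result
    sed 's/{{db_host}}/db.prod.internal/' templates/app.conf.tmpl > /etc/app/app.conf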

I feel the idea of what they wanted to achieve was good, but the implementation they suggest is foobar. I don't ever want critical config of an application (like whether it is talking to the PROD or DEV database) to be set from an environment variable that can be casually modified. This has to come from an automated system.

This kind of config should be idempotent, one should be able to trace back where it came from and who modified it (version control), and every redeploy of the application should end up with the same result. It can even be dynamic (service discovery), but an environment variable is the last place a config value deserves to live.

So please, let's stop calling it the 12 Factor application... and call it the 11.5 Factor application.

Rediscovering Age of Kings

About a year ago, I decided it had been long enough since I last wasted significant amounts of time playing computer games that I could buy a gaming machine and play for a sensible amount of time without impacting other demands on my time. I looked at all of the current-generation consoles and, to be honest, I was put off by the price of the games. I’m aware of the Steam sale, and considering it’s been a decade since I played anything seriously (I still miss you, Left 4 Dead 2), my plan was to quickly recoup the extra cost of a gaming PC by sticking to the best games of a few years ago.

Other than obsessively 100-percenting a handful of Lego games (Lego Spider-Man! Lego Quasar!) I’ve not really played anything new; instead I have an overly powerful, at least for its current usage, machine that is essentially now for Age of Empires II. I had fond memories of the game from when I was a kid, and after looking up a few things about it I discovered that there’s actually a seriously skilled community keeping the game alive.

I’m only a passive observer, I played one online game and my god was it embarrassing, but there’s an amazing amount of Twitch content from a number of community games and even sponsored competitions. Most of the material I’ve seen has been cast by Zero Empires or T90Official, and there is currently the Escape Champions League tournament (with a 60k USD prize) that’s showcasing some amazing team play. It’s great to see such an awesome old game still going strong.