Wednesday, December 13, 2017

What's in a URL?

The Internet, as we know it, has been slowly developed since the 1950s. Originally designed by the Department of Defense and called ARPANET, it was meant to keep communications flowing across a web of interconnected machines even if some of them were destroyed in a nuclear attack. Now, over sixty years later, it serves up porn and endless YouTube videos, but I digress…

By its nature, the web was designed to be stateless. When a client such as a browser makes a request to a server, the client and server are only connected for the duration of this, hopefully short, exchange. A user types www.msn.com into their browser, the browser makes a request of the servers over at MSN, and MSN sends back a page full of HTML which is then displayed in the user’s browser. Immediately after this exchange, neither the browser nor the server retains any knowledge of the other.

If a user just wanted to click on a single page, this would be fine. However, oftentimes state matters. A lot. For example, if I were on Amazon’s site looking at hover boards suitable for an adorable nine-year-old girl with curly hair (it helps to be oddly specific), I could type hover board in the search bar. Amazon’s servers can send me a whole bunch of hover boards to look at, but my next request to Amazon’s servers is going to need to let Amazon know which hover board I want to see.

When I click on any of the links above, the URL shown in my browser looks something like:

While the link itself is not very human readable, it guides the request to a certain script on Amazon’s server and passes in some parameters via the request’s querystring. When the server receives the request, based on the shape of the route and the querystring parameters, it knows which hover board should be displayed. State is maintained, at least somewhat, by generating URLs from the search request that carry the right querystring parameters to guide my next action. As a consequence, I can copy the URL from my browser and email it to my wife. She can bypass the search and go straight to the exact hover board I was eyeing.
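To make that concrete, here is a minimal sketch in Node of a server reading the querystring to decide which product to render. This is not Amazon's actual code; the route and the "id" parameter are stand-ins for whatever they really use.

```javascript
// Hypothetical example: the querystring carries the state the server needs.
const http = require('http');
const url = require('url');

http.createServer((req, res) => {
  const { pathname, query } = url.parse(req.url, true); // true = parse the querystring
  if (pathname === '/product' && query.id) {
    // e.g. GET /product?id=hoverboard-42&ref=search
    // (a real app would escape this value before rendering it)
    res.end(`<html><body>Here is the page for product ${query.id}</body></html>`);
  } else {
    // The search results would be a list of links like /product?id=...
    res.end('<html><body>Search results go here</body></html>');
  }
}).listen(3000);
```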

There are other mechanisms to maintain state. Sometimes web applications store data between requests on the client’s device using a cookie or session. These objects cannot be easily seen by the end user. When a user logs into a site like Amazon, it is common practice to return a token which stands in for the user’s name and password on subsequent requests. The token is stored on the client, but never exposed in the URL. Most of the time, the token rides along on future requests in an Authorization header. While an advanced user can find the token, this approach keeps the password from being cached on the local machine. The token is also never placed directly in the URL, so it cannot be accidentally sent to another user who could then log in as an authenticated user and make unauthorized purchases.
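A rough sketch of that pattern might look like the following. The endpoint paths and the use of sessionStorage are illustrative assumptions, not Amazon's implementation; the point is that the token lives on the client and travels in a header, never in the URL.

```javascript
// Illustrative only: endpoints and storage choice are assumptions.
async function login(username, password) {
  const res = await fetch('/api/login', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ username, password }),
  });
  const { token } = await res.json();
  sessionStorage.setItem('token', token); // kept on the client, never in the URL
}

function getOrders() {
  // The token rides along in the Authorization header on subsequent requests.
  return fetch('/api/orders', {
    headers: { Authorization: `Bearer ${sessionStorage.getItem('token')}` },
  }).then((res) => res.json());
}
```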

The point here is that some things belong in the URL and some things clearly do not. From Amazon’s perspective, me copying and pasting a product page is a good thing. More eyeballs means more sales. In the olden days, each URL corresponded to a specific script that would be executed. The web application routed each request to the right script, which dynamically generated HTML and returned it to the user. However, more modern web architecture, when done correctly, usually revolves around a Single Page App (SPA).

A SPA contains all the magical JavaScript libraries, CSS stylesheets, and static text the user will ever need. The portions that are generated dynamically still go through the same request/response cycle as before, but now the exchange is much lighter, with small JSON payloads going back and forth instead of resending all of the CSS, HTML, JavaScript, etc. needed to render a page.

Clicking on links within a SPA often changes the URL in the browser. Although there is no flicker as the page changes, an internal router can read the querystring in the URL to know which product to render, if the example above were written as a SPA (it’s not). Having the URL change even though the page is not truly reloading has the advantage that a user can send a link to another user and have them land in the exact same location.
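A bare-bones sketch of that idea, without any particular framework, might look like this. The element id and the querystring parameter are invented for the example; real apps usually lean on a router library.

```javascript
// Update what is on screen based on the current URL's querystring.
function render() {
  const params = new URLSearchParams(window.location.search);
  const productId = params.get('id');
  document.getElementById('app').textContent = productId
    ? `Showing product ${productId}`
    : 'Showing search results';
}

// Change the URL without a full page reload, then re-render.
function navigateTo(path) {
  window.history.pushState({}, '', path);
  render();
}

window.addEventListener('popstate', render);         // back/forward buttons
window.addEventListener('DOMContentLoaded', render); // refreshes and shared links
```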

I have written a few SPAs recently and hosted them on Amazon Web Services’ (AWS) S3 service. S3 provides highly scalable, highly available object storage. With a minor configuration change, S3 can host a static website. The best part is, as of this writing, it costs about 2.4 cents per gigabyte per month to host content there. Since my SPA needed a place to live where I could access it from anywhere, and 2.4 cents a month seemed like a reasonable enough price to pay, that’s where I put it.
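That "minor configuration change" is a checkbox in the S3 console, but for completeness, here is roughly the same thing done with the Node AWS SDK. The bucket name is a placeholder.

```javascript
// Enable static website hosting on an existing bucket.
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

s3.putBucketWebsite({
  Bucket: 'my-spa-bucket', // placeholder name
  WebsiteConfiguration: {
    IndexDocument: { Suffix: 'index.html' },
  },
}).promise().then(() => console.log('Static website hosting enabled'));
```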

Except, there was one hitch… Serving up the index page worked like a charm. But anything beyond the index page triggered an error. Pressed with other things, I didn’t dig too far into it at the time, but I eventually realized what was happening. Any path beyond the base URL caused S3 to look for an object that did not exist. My SPA made use of the querystring and expected to route things internally. Landing on the base URL and clicking around in the app worked fine. Refreshing the browser or sending a link to someone else failed.

Fortunately, I am not the first person to run into this, and I probably will not be the last. There is an easy solution available. While you can use any domain registrar alongside AWS resources, I would recommend the AWS Route 53 service. It makes life a lot easier and costs roughly the same as the others.

For the domain you want to forward to your static website, log into your AWS account. Make sure that your region is set to N. Virginia (us-east-1). There is a weird quirk with AWS Certificate Manager (ACM): certificates used with CloudFront must be issued in us-east-1. Regardless of which region the rest of your infrastructure resides in, an ACM certificate from us-east-1 will still work. Go to ACM.


Click Request a certificate. Provide the domain name the user will put in their browser. Click Next.


On the next screen, if you have the domain registered to you, it is probably easier to use email validation. I have been using email validation and receiving the email in a few seconds. After approving the email, the certificate is issued and it is on to the next step…

From the AWS Console, select CloudFront. CloudFront is an Amazon service that enables caching of static (or in most people’s case, near static) assets at “edge locations”. Most of my static content is stored somewhere in Oregon, but I currently (but not for long!) live in Texas. When I make a request to my static content located in Oregon, it has to go through several hops that cost precious milliseconds and degrade the overall performance of my site. By caching my content at locations throughout the United States, my request goes through fewer hops making the initial request much faster. Remember, in web development, every millisecond counts!

Create a new distribution by clicking the “Create Distribution” button. Select a Web distribution by clicking “Get Started” in that section.

For the Origin Domain Name, select the bucket that is hosting your static content. A drop-down listing your buckets should appear when you click the field. Under Viewer Protocol Policy, I like to select Redirect HTTP to HTTPS. Browsers will not allow a page served over HTTPS to make calls to APIs over plain HTTP (mixed content), so it is best to keep everything on HTTPS. By selecting the redirect, any user who does not specifically type https://www.yourwebsite.com will automatically be pushed to the HTTPS version of the site. This is seamless to the user and allows the website to function properly alongside APIs, which probably should be communicating over HTTPS anyway.

Under Distribution Settings, make sure to provide the Alternate Domain Name for which you just requested a certificate via ACM. Select the radio button for Custom SSL Certificate and find the newly created cert. For Price Class, take a look at the options. If you are in the US and do not anticipate taking Asia by storm, it might make sense to choose another option instead of the default All Edge Locations. Your mileage may vary.

It may take several minutes for the CloudFront distribution to be ready. Once it is, you will want to click on the “Error Pages” tab from the distribution. Also, take note of the Domain Name, in this case d209zgrmi...cloudfront.net. Asking users to remember that is an exercise in futility, so we will go to Route 53 and make sure our custom domain name goes to this distribution.


On the Error Pages tab, create two entries. You will want 403s and 404s to redirect to your /index.html with an error caching TTL of 0 and a 200 response code.
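For reference, those two console entries correspond roughly to the CustomErrorResponses portion of the distribution's configuration in the CloudFront API (the rest of the distribution config is omitted here):

```javascript
// Send 403s and 404s back to the SPA's index page with a 200 and no error caching.
// This object would sit inside a full DistributionConfig passed to the CloudFront API.
const customErrorResponses = {
  Quantity: 2,
  Items: [
    { ErrorCode: 403, ResponsePagePath: '/index.html', ResponseCode: '200', ErrorCachingMinTTL: 0 },
    { ErrorCode: 404, ResponsePagePath: '/index.html', ResponseCode: '200', ErrorCachingMinTTL: 0 },
  ],
};
```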
Now go to Route 53. Click “Hosted zones” on the left-hand nav. Click the domain name you have registered and create two record sets. One will be an A record for IPv4 and one will be an AAAA record for IPv6; both will be alias records pointing at the distribution.
Set the Alias Target to the domain provided by your CloudFront distribution and save the record sets. It may take a few minutes to propagate, but shortly your users can type in your URL and be taken to the SPA hosted on S3 and cached throughout the country at edge locations. If the user refreshes the page when they are not on the root URL, instead of getting an error, the SPA’s internal router will handle the route as expected. Additionally, users can copy and paste the URL and share non-sensitive information, because we are smart enough to know not to put tokens or secure data in the querystring :) Recipients of the URL will be able to load the SPA and be taken to the appropriate page.
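If you prefer scripting it, the same two record sets can be created through the Route 53 API. In this sketch the hosted zone ID, domain, and distribution domain are placeholders; Z2FDTNDATAQYW2 is the fixed hosted zone ID that CloudFront aliases use.

```javascript
const AWS = require('aws-sdk');
const route53 = new AWS.Route53();

const aliasTarget = {
  HostedZoneId: 'Z2FDTNDATAQYW2',          // CloudFront's alias hosted zone ID
  DNSName: 'dXXXXXXXXXXXX.cloudfront.net', // your distribution's domain name
  EvaluateTargetHealth: false,
};

route53.changeResourceRecordSets({
  HostedZoneId: 'ZXXXXXXXXXXXXX',          // your hosted zone's ID
  ChangeBatch: {
    Changes: ['A', 'AAAA'].map((type) => ({ // one record for IPv4, one for IPv6
      Action: 'UPSERT',
      ResourceRecordSet: {
        Name: 'www.yourwebsite.com.',
        Type: type,
        AliasTarget: aliasTarget,
      },
    })),
  },
}).promise().then(() => console.log('Alias records created'));
```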

One last note: if a change is made to the content of the SPA in S3, it can take up to 24 hours to propagate to all CloudFront edge locations. If you are making a critical UI change and don’t want your users to have to wait a day to see it, do not worry. After making the change and modifying the contents of your S3 bucket, go back to CloudFront and click on your distribution. You will see a tab labeled “Invalidations”. Click that tab! Click “Create Invalidation” and for the Object Paths, you can enter /*. Once you have invalidated all objects in the cache, your users will see your changes on their next refresh. While invalidations are not free, the first 1,000 invalidation paths each month come at no charge, so plan accordingly.
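The same invalidation can be kicked off from the Node AWS SDK if clicking through the console gets old. The distribution ID here is a placeholder.

```javascript
const AWS = require('aws-sdk');
const cloudfront = new AWS.CloudFront();

cloudfront.createInvalidation({
  DistributionId: 'EXXXXXXXXXXXXX',          // placeholder
  InvalidationBatch: {
    CallerReference: Date.now().toString(),  // must be unique per request
    Paths: { Quantity: 1, Items: ['/*'] },   // invalidate everything
  },
}).promise().then((data) => console.log('Invalidation started:', data.Invalidation.Id));
```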

Wednesday, November 15, 2017

Secrets in a Serverless World

I try not to get too enamored with any given technology. Technology is a fickle mistress and changes often. It is far better to try and remain objective and bring a pragmatic approach to technology. But I can’t help myself, I am smitten with Lambdas. I love them. They scale horizontally beautifully and effortlessly. More importantly, they turn themselves off and stop billing when idle. What more could you want?

However, for all the good they provide straight out of the box, there needs to be a shift in the developer’s thinking to fully utilize them. At some point, a Lambda is going to need a password or token that needs to be secure. In the old world order, using servers, it is not uncommon to have a hidden file that is deliberately placed in the .gitignore so it is never checked in. A developer working locally can modify the file with some placeholder values and then, post deployment, a trusted person can ssh into the server and modify the file if need be.

This method of keeping secrets is a bit of a conundrum for Lambdas. They are designed to be stateless. If a developer were to check a password or token into git, even in a private repo, it greatly increases the number of people with access to that sensitive information. The fewer people who have database access to a production system, the better. Why advertise it by keeping a record of it in source control?

There are some ways to handle this situation. Lambdas accept environment variables, set right from the console or deployed with a framework such as serverless. The values are available to the code:


Unfortunately, anyone with access to the AWS console can see these values in clear text. Given that I have code in a public repo running against my AWS account with elevated permissions capable of creating infrastructure and running all kinds of jobs on my dime, I care a lot about keeping it secure. Anyone who knows my secret can make a post to my Lambda and do all kinds of nefarious things. In the screenshot above, the variable for GIT_SECRET looks pretty random because I encrypted the field.

My requirements for secrets were:
  1. All secrets had to be secure. Secure enough for me to keep them in a public repo without worrying about someone doing something bad with my code that would wind up costing me a lot of money.
  2. Secrets had to be stored in git. I like the idea of everything being consolidated in a single repo. If the secret changes, I want an audit trail of who changed it and when.
  3. I did not want to rely on AWS IAM permissions to make sure users could not find secrets stored on S3 or on the Lambda console itself.

Given my requirements, the solution is not too difficult. I created a KMS key that I used to encrypt my values. Once encrypted, I store them in git. When the Lambda spins up, it decrypts the secret and keeps it in memory. I was also careful to never log the secret which would defeat the whole purpose.

WARM LAMBDA vs COLD LAMBDA

While Lambdas are supposed to be stateless, it turns out they can retain state between invocations when the container is reused. I have seen the Lambdas I use for RESTful interfaces take up to 2.5 seconds on an initial call. Subsequent calls then take less than 100 milliseconds. The reason? The Lambda is warm. While 2.5 seconds is an eternity in web time, the good news is that the more the Lambda is used, the more likely it is to be warm and fast.

In my code, I create a variable that is global in scope and instantiated with a null value. When the Lambda has a task, it first takes a look at my secret variable. If the variable is null, it knows it needs to first set it to the decrypted value. If the variable is not null, the Lambda can keep on processing.

Lines 156 - 179 handle all of this logic here:
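A minimal sketch of that pattern, assuming the Node.js AWS SDK and a KMS-encrypted environment variable like the GIT_SECRET mentioned earlier, looks something like this (an illustration of the idea, not the code referenced above):

```javascript
// Lazily decrypt a KMS-encrypted environment variable and cache the plaintext
// for as long as the Lambda container stays warm.
const AWS = require('aws-sdk');
const kms = new AWS.KMS();

let decryptedSecret = null; // module scope: survives across warm invocations

function getSecret() {
  if (decryptedSecret) {
    // Warm Lambda: the secret was already decrypted on a previous invocation.
    return Promise.resolve(decryptedSecret);
  }
  // Cold Lambda: decrypt the base64-encoded ciphertext stored in the env var.
  return kms.decrypt({ CiphertextBlob: Buffer.from(process.env.GIT_SECRET, 'base64') })
    .promise()
    .then((data) => {
      decryptedSecret = data.Plaintext.toString('utf8');
      return decryptedSecret; // never log this value
    });
}

exports.handler = (event, context, callback) => {
  getSecret()
    .then((secret) => {
      // ...use the secret to verify the request, call GitHub, etc....
      callback(null, { statusCode: 200, body: 'ok' });
    })
    .catch(callback);
};
```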

Lambdas offer so much in terms of flexibility, scalability, and cost control. They do, however, change the way certain things are done versus dedicated servers. Security measures should never be sacrificed, but with a little thinking and creativity, developers can keep their secrets secure. While not everything about them is intuitive, the benefits Lambdas bring make the adjustment well worth it.

Monday, November 6, 2017

Synergy

“All other things being equal, the simplest solution is the best.” - Me

I started my career in earnest with a consulting firm that was a part of an accounting firm in what was known collectively as the Big Six. As the consulting firm was still a part of an accounting firm, we were subject to some of the rules that governed the people in the accounting and audit side. One of these rules was that we needed to collect a certain amount of “Continuing Education” credits every year. To this day, the concept of continuing education remains important to me and I try to do at least one side project a year.

The side projects allow me to act as product owner, engineer, and architect. I get to explore technologies that I would never have touched otherwise. Through side work, I have gained some level of proficiency with React and Webpack, experimented with Mongo, and learned NodeJS. Some of my earlier work is still up on github and I cringe at the repos holding the code. My first side project did not have the .idea folder in the .gitignore, did not minify my React, did not follow the flux pattern, and there are plenty of other things about it that I can criticize.

On the other hand, it was the first thing I had ever done with React. I also integrated Server Sent Events so that the browser stayed current with what was on the server without the user needing to refresh. A couple of years ago, that was pretty cutting edge, so it is not a total waste.

I had originally hosted my web application on Amazon Web Services (AWS), taking advantage of the free tier. In two months, my daughter and I raised over $2,000 for the Juvenile Diabetes Research Foundation by selling ice cream and collecting donations. I also had a nice site that I could use to walk through working examples of code that was 100% my intellectual property. Although I was out of the ice cream business, I liked having it up and running as a demo of what I could pull off in my spare time.

A year went by and I realized I was spending $35 a month to host services I was not really using. I ported everything out of AWS and placed it on Digital Ocean, where a modest $5-a-month Linux server was more than capable of handling the minuscule traffic I expected on the site.

It was during this port I realized that the code I had placed in github had “drifted” from the code that was on my server. I was using AWS Elastic Beanstalk to deploy my code and instead of pushing to github and then doing a deploy, I skipped a step and just deployed. Since it was just me working on it, it was not the end of the world, but it was a really bad habit. I was working on another side project and vowed to be more diligent.

To help ensure that what was in GitHub represented what was live, I wrote a really small hook on my server in Digital Ocean. Its purpose was to receive a message from GitHub, make sure the message was authentic, and, if it was legitimate, grab the latest copy of the code and restart itself. With a single controller and a five-line shell script, I formed the basis of my DevOps philosophy. With this side project, I would write a unit test, get it to pass locally, commit the code, and voilà, it was released to my server.
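The original controller and shell script are not reproduced here, but a sketch of the same idea, assuming Express and a hypothetical redeploy.sh on the server, would be:

```javascript
// Verify GitHub's HMAC signature, then pull the latest code and restart.
const crypto = require('crypto');
const { execFile } = require('child_process');
const express = require('express');

const app = express();
const SECRET = process.env.WEBHOOK_SECRET; // the shared secret configured in GitHub

app.post('/github-hook', express.raw({ type: 'application/json' }), (req, res) => {
  const expected = 'sha1=' + crypto.createHmac('sha1', SECRET).update(req.body).digest('hex');
  const actual = req.headers['x-hub-signature'] || '';
  const authentic =
    actual.length === expected.length &&
    crypto.timingSafeEqual(Buffer.from(actual), Buffer.from(expected));
  if (!authentic) {
    return res.status(401).send('Not GitHub'); // ignore anything we cannot verify
  }
  res.status(200).send('OK');
  execFile('./redeploy.sh'); // e.g. git pull, npm install, restart the service
});

app.listen(8080);
```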

A short while later, I was introduced to a, um, different DevOps philosophy. My company’s idea was to have dedicated Jenkins servers poll a git repository. From there, Jenkins would produce an artifact based on a script not checked in anywhere. The artifact would then get zipped and sit in Sonatype. Immediately after being placed in Sonatype, Rundeck would kick off a script sitting in a completely different repository. That script would pull down the artifact that had just been created, unzip it, and do whatever the deploy script required. In short, there were a ton of moving parts.

At times, Sonatype ran out of disk space, jamming the whole deploy process. Every day at 4 o’clock, Jenkins would get flooded with jobs, delaying deployments by fifteen to twenty minutes. Occasionally, a new package would need to be installed on the Jenkins server, requiring an act of both Congress and God. There were at least fifteen dedicated Jenkins servers, all with different libraries at different versions, and either over capacity, causing delays, or underutilized, wasting money.

The deployment process spread what I considered a single unit of work into multiple repositories. All of it was wildly undocumented, but pushing a change could involve touching 3-4 repos. I kept on looking at my five lines of bash and wanting to go back to that.

In AWS land, there are multiple tutorials on how to create a Lambda or a Data Pipeline. Most will walk the developer through a couple of clicks on the console and it might seem pretty quick and easy, but DON’T DO IT! Everything that is introduced to the AWS environment should be auditable, capable of being rolled back, and kept under source control. Keeping with my KISS philosophy, I wanted to take the ease of laptop ops, but force deployments through git. After thinking about it for the better part of a year, this is what I came up with:
[Diagram: Synergy.png]

When a developer commits code to GitHub, a webhook is kicked off. A Lambda in my sandbox environment inspects the content of the hook. If the payload’s signature matches the one computed with the shared secret, the request is considered legitimate and GitHub receives a 200. If it does not match, GitHub receives a 401 Unauthorized response. If the branch that got committed matches the name sandbox, a message is published to an SNS topic that triggers the deployment Lambda. If not, the branch name is compared to test and master. If it matches one of these, the request is forwarded to the appropriate endpoint in a different AWS environment. Otherwise, the branch is considered a feature branch and is not deployed.
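A stripped-down sketch of that branch-routing logic might look like the following. The topic ARN, environment variable names, and the assumption of an API Gateway proxy event are illustrative; the actual Synergy code is on GitHub as noted at the end of the post.

```javascript
// Decide what to do with a verified GitHub push based on the branch name.
const AWS = require('aws-sdk');
const sns = new AWS.SNS();

exports.handler = (event, context, callback) => {
  const payload = JSON.parse(event.body);
  const branch = (payload.ref || '').replace('refs/heads/', ''); // e.g. "sandbox"

  if (branch === 'sandbox') {
    // Deploy in this account: hand the work to the deployment Lambda via SNS.
    sns.publish({
      TopicArn: process.env.DEPLOY_TOPIC_ARN,
      Message: JSON.stringify({ repo: payload.repository.full_name, branch }),
    }).promise()
      .then(() => callback(null, { statusCode: 200, body: 'queued' }))
      .catch(callback);
  } else if (branch === 'test' || branch === 'master') {
    // Forward the hook to the listener in the matching AWS environment (omitted).
    callback(null, { statusCode: 200, body: 'forwarded' });
  } else {
    // Anything else is treated as a feature branch: no deployment.
    callback(null, { statusCode: 200, body: 'ignored' });
  }
};
```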

The deploy Lambda pulls the code at the specified branch from git. It then looks for a deploy.sh script and attempts to run it. The stderr and stdout of the deploy script are captured in DynamoDB and also streamed to any browsers that are receiving updates for the repo. A developer who just pushed can “follow” the deployment process. The deploy Lambda comes equipped with:

  • Amazon Linux
  • 1.5 Gigs of RAM
  • 512MB of disk space
  • Java 8
  • Node 6.10
  • AWS-CLI
  • Git
  • Serverless 1.2

So long as the process can be completed in under five minutes, all is well. I built a “Build Board” to provide monitoring across all the environments. The Build Board was deployed using this process. Its deploy script simply does an npm install and a webpack build, then uploads the contents of the build directory to an S3 bucket dedicated to static website hosting. The S3 bucket was defined as part of my ecosystem in a CloudFormation template. All of my infrastructure, code, build instructions, and secrets are stored in one (or two) repositories.

Here is a quick demo showing how I deployed a “hello world” Lambda to three different environments, all by using git. GitHub itself offers various controls that can help restrict deployments for a truly controlled and gated experience. The branches test and master can be protected branches, forcing a pull request before allowing a merge. Contributor permissions may be set up so that a product owner’s approval is required before the merge/deployment. While there are some limitations to this approach (namely that the build process must finish within five minutes and cannot exceed 512MB of disk space), this tool should work for most small services. It scales up to allow for lots of deployments during the day, but does not cost anything when sitting idle.

I plan on using this as I build out my own product. Your mileage may vary, but all the code is under the MIT License and freely available on GitHub.


Wednesday, November 1, 2017

Austin's Dirty Little Secret... It Kinda Sucks

“Regrets? I’ve got a few,
But then again, too few to mention.” - Frank Sinatra, “My Way”

I had a guilty pleasure of reading a professional blowhard/recruiter’s blog. This wizard of the recruiting industry and judge of technical talent, despite having never worked in the tech business, only works with elite engineers and only has amazing opportunities. His yarns usually center around PhDs in machine learning being highly sought after by local well-funded startups. Anyhoo, in one of his posts the author bragged about telling someone to “pack up the family and move to Austin.” (Source) Without a job. The implication being that the risk he faced moving to Austin jobless would be more than made up for by the potential reward. Almost the modern-day equivalent of, “There’s gold in them thar hills!”

I don’t believe in crying over spilled milk, but this was some seriously crappy advice. I have worked professionally in Los Angeles, Portland, Seattle, Salt Lake City, Baltimore, Dallas, San Francisco, and Melbourne, Australia. Very recently, a former co-worker I have known for well over ten years asked my opinion about moving from Seattle to Austin. I let him know, in no uncertain terms, that Austin really sucks. Here are some things you will never see on this blowhard’s blog.

THE COST OF LIVING MYTH

In today’s connected world, it is super easy to get data. I have gone through various periods of absolute infatuation with the website/app Zillow. With it, I can quickly and easily see the price of real estate, see pictures, and get a 1-10 rating on the nearby schools. This app convinced me that living in California again was an absolute pipe dream. However, Austin home prices on Zillow seem much lower than equivalent home prices in my adopted home of Seattle. Do not be deceived!

Property taxes vary a bit from location to location throughout Seattle, but I paid roughly 0.9% on my home in Redmond. It didn’t click with me, when I was looking at homes that were 33% cheaper in Austin, that my monthly payment would be almost exactly the same because property taxes in Austin are an eye-popping 3.1%. The data was all there, but I willfully ignored it until it was too late. While the overall price of a home may be significantly lower, the property taxes are so much higher that it becomes a wash.

When this conversation comes up with Austinites, they are quick to point out that Texas does not have a state income tax. Well, neither does the great State of Washington. The next question is, “Well, where do they get their money?” I can’t answer that question, but I do know the facts about Washington: no state income tax, a sales tax about 1% higher (with groceries and staples excluded), and much lower property taxes. When looking at real estate, please do not make my mistake; be sure to factor in these insane property taxes.

And on top of real estate costs equivalent to those in my adopted home of Seattle, I have found the salaries to be 30-40% lower. Part of me always thought that if I didn’t find a full-time role right away, I could always go back to contracting. In Seattle, the contract rates paid more than the full-time rates, as they should. With contracting, there are inherent risks, no benefits, and no vacation. Yet in Austin, the contract rates pay far less than the already low, low salaries.

So, equivalent cost. Lower salaries. What more could you want?

AN ABBREVIATED LIST OF REAL THINGS THAT HAVE HAPPENED TO ME PERSONALLY OVER THE LAST FOUR YEARS…

Maybe it’s not fair to blame all of this on Austin, but this has never happened to me outside this market:

  • Shown up to a formal interview. Been interviewed by multiple people as scheduled. One person on the interview loop does not show up and I am left in a room by myself for an hour. TWICE.
  • Interviewed with a small consulting firm and was asked to spend the day doing work, for free, for the company as part of the process.
  • Accepted an offer from a hiring manager only to have the hiring manager quit within six weeks. TWICE.
  • Worked at a “startup” founded in 1997 that routinely claimed it would be the next Facebook, but in the span of a year half the company either quit or got fired.
  • Worked at a startup that was in business for two years. The company consisted of twenty people and had a CEO, COO, CTO, an Enterprise Vice President, two regular Vice Presidents, and two directors. 25% of the employees were direct relatives of the CEO. The company never made a dime in revenue. To this day, I am pretty sure the whole thing was a scam to bilk a rich, clueless investor, but this one company has now spawned two separate companies trying to peddle the same dumb idea as part of the broader Austin Software Ponzi™ that runs rampant here. Rich, clueless investors in fear of missing out give money hand over fist to non-technical managers who promise to make them billions in tech. They hire their friends and family, produce nothing, then attempt to sell out. Sometimes they do, sometimes they don’t. If they don’t, they just leech onto the next clueless investor and the whole cycle repeats itself.

  • Gone through three rounds of interviews. Received an offer for $30k less than my current salary, a number I had been upfront about with the internal recruiter on our first call. After I told her that the whole experience was extremely unprofessional and a waste of everyone’s time, and declined the offer, she decided to call my current employer for a “salary verification.”
  • Stated every step of the way through the interview process that the only reason I was looking for a new job was that I had a family health situation making it difficult to keep up my current high travel requirements. Multiple times, I asked about the travel requirements. Accepted the job, took a 40% pay cut, and then was asked to travel just as much as in my previous job.

OF COURSE, YOUR MILEAGE MAY VARY

Should no one move to Austin? No, of course not. But anyone doing so should do it with their eyes wide open. The cost of living here is really no better than in larger cities such as Seattle that have more job opportunities and higher salaries. As I write this, I am sitting across the street from the Capital Factory, Austin’s largest tech incubator. Every single “CEO” in there truly believes they are going to have a unicorn ($1 billion) valuation. All believe they will follow the four steps from Eric Cartman’s meme to a tee. Most assuredly, it will not happen for the vast majority. While cronyism and nepotism are rampant just about everywhere, they are especially bad here in Austin.

If you are a young buck looking to make it in the tech scene and willing to job hop for a few years to gain experience, it might be well worth your time. If you are mid-career, coming here for an unknown could very well be career suicide. Take heed.

Wednesday, October 25, 2017

The Paradox of Competency

It has been argued that the most difficult thing to do in sports is to hit a ball off of a major league pitcher. I will respectfully disagree. Once every four years, I stare mesmerized at the image of young women competing at the Summer Olympics in the vault competition. The girls sprint down a narrow strip, jump onto a launching pad, get sprung ten feet in the air, land on a narrow apparatus, then push themselves back into the air from a handstand, perform multiple spins and twists, and are expected to nail the landing. For the life of me, I cannot think of anything that could be more difficult.

And yet… Should one of these ladies hop or slightly wave an arm, you can audibly hear the commentator suck in their breath and say, “Oooooohhh, it was a good jump but there was that little hop at the end. The judges are definitely going to deduct for that.”

Perhaps one in a million young women has the requisite talent and training to accomplish this mind-blowing feat. These girls spend literally years of their lives training and preparing for this moment. All those years, all those workouts, all those sessions in the gym come down to those few seconds in front of the judges. Their goal is to do the near impossible and to make the impossible look easy.

No commentator has ever said, “Wow! That was an amazing vault. She did three somersaults with a twist! Who cares if she hopped a little on the landing? It was still incredible!”

I think the judges are so adamant that the girls make it look easy because we all implicitly know that what they are doing is both difficult and dangerous. Making it look easy is part of the competition. On the opposite end of the spectrum, software engineers are often judged by people who know little about software engineering. In their minds, to be a good software engineer you need to make the job look hard.

If an engineer takes a look at a difficult problem, thinks about it for a while, quietly sits and cranks out some code, writes a few tests, and then silently checks it in (the equivalent of a young woman nailing the perfect vault), most managers will think that it was easy and be sadly unimpressed with the accomplishment.

On the other hand, if the same engineer had called countless design sessions, stayed some late nights, checked in code and watched it fail, reworked it several times, and then after a few weeks got it to work, the manager judging the engineer would probably give him high marks. This would be the equivalent of the gymnast flailing around in the air and landing on her butt, yet receiving a perfect ten.

I like to call this the “Paradox of Competency”. In most pursuits, accomplishing a task with grace and style is seen as a good thing. Sadly, in this godforsaken industry, most managers feel developers are interchangeable parts. To a bad manager, a software engineer is like the proverbial monkey randomly hitting keys on a keyboard until it hammers out a work of Shakespeare. Given enough time, that monkey will do it by accident. In fact, I have worked on plenty of systems that looked like they were developed by monkeys banging on keyboards.



THE NASH EQUILIBRIUM

Mathematician John Nash, the subject of the movie “A Beautiful Mind”, won the Nobel Prize in Economics for his work on the Nash Equilibrium. It is a framework for evaluating the decision-making process of parties involved in a non-cooperative game. In layman’s terms, it is most often illustrated with the “prisoner’s dilemma”.


Each prisoner would be best off confessing while the other one stays quiet. Both prisoners, unfortunately, know the stakes: if they both stay quiet they each serve only a year, but neither can trust the other not to talk, so both confess and face ten years of hard time. In this framework, each prisoner has to take into account the decision-making process of the other participant, and they settle on the most likely, least bad outcome. According to the Economist, this framework helps to describe why “the decisions that are good for the individual can be terrible for the group.” (Source)

While I am a student of applied game theory and a lover of all things rational, I once joked that I had to leave the consulting industry because I had ethical problems. Mainly, I have ethics. Sadly, I have started to realize that game theory and questionable ethics mix quite often in software companies, especially in the presence of a non-technical manager. Mainly, doing a competent job and creating simple, maintainable solutions is not rewarded. Effort and complexity are rewarded. It is, therefore, in the best interest of the individual to make their solution as complicated as possible. As the Nash Equilibrium predicts, what is good for the individual is bad for the organization.

I worked in a group dominated by the Comic Book Guy. Well over forty, never had a family or wife. As far as I know, never had a girlfriend. The only thing he did, to the best of my knowledge, outside of work was collect comic books and talk about collecting comic books. He was an absolute delight to work with. He, somehow, had managed to convince just about the entire company that he was the ONLY person who could possibly understand all the data going into the data warehouse. Therefore, however he wanted to design the data warehouse was up to him.

With two members of the crack team of architects dedicated to him, two consulting firms, countless dollars, endless resources, and two full years to create the data warehouse, it is not complete. With less than 50 gigabytes of data (not a big amount in terms of data warehousing), it is not uncommon for queries to run for over an hour. By any reasonable measure, the data warehouse in question is an absolute disaster. And yet… The Comic Book Guy is still calling the shots. Our shared manager went out of his way to call out how good a job he and his small crew were doing given that they were new to Redshift. They had been working with Redshift for two years at that point.

On the other hand, my small crew and I pulled together a platform that would hopefully lead to new revenue. We did it in two months using technology we had never used before. The CEO announced it, PR Newswire carried a press release for it, and not a word was said to the two guys who actually wrote it. We made a mistake. We made it look easy.

Like the gymnasts hurling themselves at the vault, writing software is not easy. Making it look easy, sticking to schedules, and actually delivering should be rewarded like the gymnast who sticks the landing. Hopefully, the non-technical managers of the world can do a better job evaluating teams and eliminate the Paradox of Competency.

Wednesday, October 18, 2017

WTF is DevOps?

In the late nineties, I can remember having a conversation with my future wife about the number of pillows on her bed.

“What are all these pillows for?”

I can understand the pillow for sleeping on. That I got. But there must have been at least ten pillows on the bed.

Patiently, she explained. “They are decorative pillows.”

“So you don’t sleep on them?”

Her patience was starting to wear thin. “No, they are decorative.”

The next week, we were sitting on the couch watching TV when a promo for “The Man Show” came on. Jimmy Kimmel was doing a monologue with just a clip for the promo. In the monologue he said, “Hey, ladies, what’s with all the pillows?”

I absolutely lost it as the timing on the promo and our recent conversation was just uncanny. To end my laughter at her expense, Julie hit me in the head with a pillow. It only made me laugh harder, but in the same vein…

WHAT’S WITH ALL THESE ENVIRONMENTS?

The second most read thing I have ever written was entitled “SharePoint is a Colossal Piece of Shit and Should Not Be Used by Anyone”.  One of my chief complaints is that SharePoint does not have a good way to migrate code between environments. Most SharePoint implementations have a single environment, production.

Typically, when writing software, there are multiple environments. Code moves from a developer’s laptop to a development environment where the code is then mixed in with everyone else’s changes. Tests are performed. If everything passes, it is moved to a testing environment where business users can take a look at the new feature or verify a bug has been squashed. If everything goes well, there is a final move from testing to production. The goal is to verify along the way that the amount of change an end user sees is limited, that new code is thoroughly tested before it reaches an end user, and that nothing breaks during the deployment of the code itself.

When it’s just one software engineer writing code all by herself, having multiple environments would probably be overkill. Our heroine could introduce a new feature or squash a bug on her laptop, test it, and then push it out to a production server. In fact, that’s what I have typically done on a lot of my hobby work. I have a copy that runs on my laptop and I have a server in Amazon Web Services (AWS) or Digital Ocean. I code, test, and push.

However, when working with a team, it is a bit more complicated. What if I make a change to the code and it works, and then the guy sitting across from me makes a change and it works too? Sadly, when our code is merged together, weird stuff starts happening. On a big software project, the code base becomes a living organism that is constantly changing. It is not uncommon to start working on a feature and have it take over a week before the resulting changes are merged back into the main code base. During that week, numerous changes will probably have been made by other team members.

Although each individual developer should be responsible for unit testing their code, how all the changes are going to work together is usually an unknown. Sadly, software is usually so fragile that every change can cause a ripple effect. Therefore, every time a change is introduced, it is a best practice to run a suite of regression tests to verify that the stuff that used to work actually still does work after the change has been introduced.

It is also at the point when new code moves from the developer’s laptop to the development environment that issues can arise, because the hardware and software the developer works on are now different from the integrated environment. A developer may work on a Mac or a PC, but have their code run on a Linux box. Not only are all the changes now tested together, but the deployment mechanism and the differences between operating systems and hardware are tested as well.

It is not uncommon for developers to test edge cases and create users like “Daffy Duck” in a development environment. The development environment exists to merge code together, test deployments, and make sure everything is working before getting the business users involved. The data quality, and the ability for a business user to actually use this environment, are usually fairly limited. Additionally, it is not uncommon to make multiple deployments to this environment before calling a feature done. Sure, every time a developer pushes code they really think they are done, but it never works that way.

In test, it is not uncommon to have more production-like data. Data can be sanitized of Personally Identifiable Information (PII) and pulled back into the test environment so that business users can run tests and have the system behave more like what they are used to in production. When a deployment goes from dev to test, the change set should have been thoroughly tested and there should be high confidence that the code will work as expected. Of course, business users may change their minds, in which case development starts all over again on the engineer’s laptop, gets pushed to dev, and then to test before the business user sees it again.

Once the product owner or business users sign off, the code can move from test to prod. Usually this is done in a specific maintenance window and the user will never know about all the work that went into writing and testing a feature. If this all seems like overkill, plenty of bugs and unintended consequences are usually caught during the process and it allows multiple developers to work on the same code base simultaneously.

LAPTOP OPS

There are services such as Heroku or AWS Elastic Beanstalk which will allow a user to deploy an application from their laptop. If the user goes to the command line (where the real work is done) and types, “git push heroku master”, the contents of the local git repository are teleported to a Heroku dyno. A few minutes later, any changes that were made are now in effect in that environment.

This is not DevOps. It is laptop ops. It might work well for one lonely developer beavering away on a code base by themselves, but… What if I make some changes to my application and the guy sitting a few desks away from me makes some changes too? What happens if we type the magical “eb deploy” incantation at the same time? How would we know whose changes just stomped on top of the previous deploy? Short answer: we really don’t. While it may be a fun way to prototype, way too much can go wrong when working with a team.


FINALLY, DEVOPS DEFINED

So I attended DevOpsDays in Austin this year. The first thing every speaker did was try to define DevOps. So here’s my checklist for my personal definition of DevOps:

  • All code is kept in a source control repository
  • Developers can clone the repository and get a local copy of the application working with minimal headache
  • All third party dependencies are clearly marked, kept out of source control, locked at a specific version, and can be quickly installed locally
  • Any change is tracked and fully auditable
  • Deployments of changes happen in a controlled and gated manner

In my dream DevOps world, no one would deploy code from their laptop. The more I thought about it, the more I liked the idea of a merge into git launching a build. Popular git platforms like the near-ubiquitous GitHub have controls built into them that prevent merging directly into a branch, only allow certain users to approve a merge, allow for private repositories, and offer other safety/security controls. At the time, I was working with a Frankenstein’s monster of Stash, Sonatype, Jenkins, and Rundeck, cobbled together by a crack team of architects, that took at least half an hour to do a simple deployment. After the deployment was approved by God Himself. And Congress.

THE CODE PUSH PYRAMID

On April 14, 2017; I was sitting with a small team late at night. It was Good Friday and the rest of the company went home at 1:00. We were just getting started. Someone above my pay grade had drawn a line in the sand and declared some level of feature parity would be available in the Data Warehouse that day.

The day started optimistically. A deployment was approved and a scant half hour later, it was in production. But it didn’t do what it was expected to do. I pulled together a script that set the production environment back to the way it was and waited for the next deployment. That didn’t work either. I ran my script. And so it went. Until 11:00 that night. It occurred to me somewhere around noon that every change made in dev barely touched the ground in test before it was shot straight to production. I had an epiphany. We were testing in production!


I tried my hardest to convey to a team of people unfazed by the fact that every change was going straight to production that this was a bad idea. I coined the phrase “The Code Push Pyramid (™)”. The idea is that lots of changes get introduced in dev. Once tested together, they can move to test. Nowhere near as many changes make it from dev to test. Even fewer changes make it from test to prod. That’s the DevOps way.

I grew increasingly frustrated with the highly inefficient processes governing every aspect of software development from the tools and frameworks we could use, to the deployments using Frankenstein’s monster, and the micromanagement we encountered along the way. What made it all the worse was that for all of our vaunted process, we TESTED IN PRODUCTION! I have personally dropped tables or restaged data too many times to count IN PRODUCTION. At some point, I offered that we could save a lot of money and time if we just got rid of all the lower environments and deployed straight to production as that was where we did our testing anyway.

It came to my attention that we tested in production because we had PII data. With all the attention given to the Equifax data breach, protecting PII data seems reasonable enough. I immediately proposed that we should figure out which fields contain PII and replace it with mock data in the lower environments. That way, we could still test and protect PII. This practice is fairly common in the industry. A crack team of architects is still working on this solution, using AWS Kinesis (trust me, this is a great tool but makes absolutely no sense for this scenario) as we speak.

#SYNERGY

Before we broke up the long running and inefficiently used Docker application into Lambdas, my co-worker, The Kid, kept on advocating that we should be using Lambdas. He was looking at it from a “this could solve all of our problems in a very cost effective and fast way” kind of perspective. I was looking at it from a “how in the world am I going to deploy this on Frankenstein’s monster” sort of way. I created a little cheer I used to say every time The Kid brought up the subject of Lambdas.

“You say Lambda, I say no!”

(pause)

“Lambda!”

(pause)

“No!”

“You say Lambda, I say no! Lambda! No!”

And repeat. Maybe you had to be there, but it was kind of funny. The entire time I was doing my “no lambda” chant, I was working like mad to find a way to deploy them. Something better than Frankenstein’s monster. From a laptop ops perspective, I loved using an open source (read: free) framework called serverless. My hope was to introduce this framework to the monster and make it suck just a little bit less. The crack team of architects said no.

So here I am, months later, and I have come up with my own framework for deploying things to AWS in a manner that meets my definition of DevOps. I have a GitHub webhook pointing to a Lambda. The Lambda reads the branch that has been merged, and the branch determines which environment it will be deployed to. If the branch does not match the config file, the service assumes it is a feature branch and no deployment is made. The Lambda then clones the repository and runs a deploy script kept within the repo. I built a build board and deployed it with the platform; it shows the current state of every repository and streams the output in real time to the browser, so anyone who is interested can follow along at home. I started calling it “Synergy” and I guess the name stuck. I hope to have a demo up next week.

Hopefully, anyone reading this far has a better understanding of what DevOps is and why we need all of those environments. Now, twenty years later, can someone please tell me why the ladies need all those pillows?