Allma

Incidentally #005

Welcome to Incidentally, Allma’s publication. We interview engineering and reliability leaders, founders, and makers about the secrets and tools they use to scale their systems and teams. We share incident collaboration stories and learnings, in solidarity and with openness. Our goal is to be actionable enough for you to implement and high-level enough to be broadly applicable; your feedback is always sought.

From video games to SRE: Shawn Hansen shares lessons and advice on establishing a path in tech and building reliability from the ground up

Shawn Hansen is an experienced Site Reliability Engineer and has had a diverse history in tech. After getting his start doing QA for video games, Shawn enrolled in The Flatiron School and went on to get roles at AppNexus and Knotch, picking up skills in reliability along the way. Shawn sat down with Allma to discuss his path into the tech sphere and share his advice on establishing a culture of incident collaboration and reliability at your company.

An interesting path into tech

Hi, I’m Shawn Hansen. My path to engineering was quite odd; to make a long story short, I played a lot of video games as a kid, and was able to start getting paid for it in a QA-type role as I got older.

Unfortunately, I was semi-professional right before people started to make a lot of money playing video games, which is somewhat of a regret of mine.

While I was working with a third-party company that did QA for games, the people I was working with encouraged me to try programming because they noticed that I was analytical and always wrote really detailed reports when reviewing games, finding bugs, etc. I wanted to give it a try, so I started out purely by teaching myself online before I moved to a course that was more structured. That’s how I came to The Flatiron School.

Discovering The Flatiron School

When I was fortunate enough to come across Flatiron, I was still fully self-taught and just doing programs online like Codecademy and Khan Academy. However, both are more focused on general educational topics that aren’t necessarily all about programming, and I was ready to dive into the nitty-gritty. I decided to google around to try and find different services or opportunities, and I came across The Flatiron School, a bootcamp to learn more about backend programming and coding. When I initially found it, the deadline to hand in an application had already passed, but I saw you could still hit submit on the website. I just went for it and poured my heart into my answers.

Tech was not my first choice by any means, but at some point in my journey, I realized that I had the mind for it. Flatiron was my chance to give it my all and apply myself to pursue a career in tech.

From there it was all a rush: I heard back on a Friday, did the entire interview process in one day, and then started the program on the following Monday. I was very fortunate to get into the Flatiron School through a special program called the New York Tech Talent Pipeline for people who came from lower income families, so I was able to attend for free. Of course it was a big deal for me, but it was also a big deal for my family that I was able to get in and placed in a position; I’m one of the first people in my family to make more than minimum wage. The Flatiron community is on the smaller side, and getting to learn from such talented and influential people made it such an amazing program. I’m grateful for it every day.

Diving into the tech world

Once I finished at Flatiron, my first job was at AppNexus as an intern. Funnily enough, I didn’t use most of the skills I had just acquired, but the learnings, logic, and processing knowledge carried over.

It was really like someone saying, “okay, so you're going to use some of the foundation principles from everything that you just learned, and otherwise you're gonna learn all new stuff”... I personally think this is a good description of software engineering as a whole.

This demonstrated how important it is to be a good learner to be a good software engineer, or any kind of tech person, really. I had to use these skills because AppNexus was one of the most complex environments that I could be exposed to as a junior with no college degree just coming out of the Flatiron bootcamp. Overall, it was a huge culture shock for me.

Dealing with Imposter Syndrome in tech

I deal with, and have had to deal with, some degree of imposter syndrome every day I’ve been an engineer because of the path I took to get here. People who go to bootcamps tend to face this more, because there is always that worry that you don’t know as much because you don’t have that engineering degree like most other engineers, that there might be gaps in your education. The workforce can also be difficult, because not every organization uses the same rubric; you can be amazing somewhere, but then change jobs and be absolutely terrible or strictly mediocre somewhere else.

My ability to communicate well has helped me a lot, which is something I didn’t know I had until people kept telling me I had good communication skills “for an engineer”. Being able to communicate ideas, questions, problems, and so on has been key to getting where I am today and to dealing with some of the imposter syndrome, because I am more willing to ask for help and explain my own thinking.

Getting my start in reliability

This may sound facetious, but I think my natural disposition and anxiety continue to draw me towards reliability. I can get to sleep at night, but I will wake up in the morning and think about how certain things are a problem waiting to happen. I got my first reliability role because they were looking for someone to help build modern visualizations for metrics and dashboards, and after that job I stuck with it. Working in DevOps and reliability allowed me to get some of the background experience that I felt I was missing from my education at Flatiron. I got to learn containerization, more about architecture, big data, and all of these great technologies, while also leveraging my experience at AppNexus. Of course Google’s SRE book is the bible, hands down, but I’ve gotten to see the nuances in SRE because everyone will always have slightly different definitions and ways SRE and DevOps overlap.

Reliability philosophy

SRE and reliability can be tough topics because proving the business value and need for reliability can be an uphill battle. I always like to start out with a general audit to get a lay of the land, for instance, a survey to see if there’s a baseline of reliability to start out with. Then, I go through and try to see what systems and processes I can observe and monitor as another good starting point. When it comes to optimizing this foundation, starting small is great, like getting at least one application or microservice fully observable. It’s all about getting a first-level safety net, because with so many different pieces running, it’s super easy to not know what is helping you or hindering you when you have no loops, beeps, or dashboards, and no phone to blow up when there are problems: all of the things that can be annoying but are also necessary.

It is all about bridging the business need for reliability and the quality that reliability brings out of the business.

The reliability sector is constantly developing as people understand what it is and what the roles of SREs are within it. For me, being an SRE is about going through and understanding what kind of technical hoops older systems may be jumping through and cleaning up any redundant, complicated, or tedious systems that could easily lead to an incident. It’s about keeping a pulse on things and monitoring so you can make new suggestions. At the end of the day, the most important metric will always be uptime.

It’s like working in the pit crew for NASCAR: nobody wants to stop for very long to risk losing their place, losing time, or losing their business.

You have to keep improving, and you may have to change the wheels to make them more reliable, tune up the engine… you never know until you see the inside.
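Since uptime is the metric Shawn singles out, it can help to see how an uptime percentage translates into actual minutes of downtime. The following is a minimal sketch, not something from the interview: the function names `availability` and `allowed_downtime`, the 30-day window, and the 99.9% target are all illustrative assumptions.

```python
from datetime import timedelta


def availability(window: timedelta, downtime: timedelta) -> float:
    """Return uptime as a percentage of the observation window."""
    if window <= timedelta(0):
        raise ValueError("observation window must be positive")
    if downtime < timedelta(0) or downtime > window:
        raise ValueError("downtime must be between zero and the window length")
    # Dividing one timedelta by another yields a plain float ratio.
    return 100.0 * (1 - downtime / window)


def allowed_downtime(target_percent: float, window: timedelta) -> timedelta:
    """Return how much downtime a target uptime percentage permits in a window."""
    if not 0 <= target_percent <= 100:
        raise ValueError("target must be a percentage between 0 and 100")
    return window * (1 - target_percent / 100.0)


if __name__ == "__main__":
    # Hypothetical target and window, chosen only for illustration.
    month = timedelta(days=30)
    budget = allowed_downtime(99.9, month)
    print(f"99.9% over 30 days allows about {budget.total_seconds() / 60:.1f} minutes down")
```

Framing uptime as a time budget like this is one way to connect the reliability conversation to the business cost of being down.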

Where to start - lessons and advice for investing in reliability

Honestly, the most productive way to start is to first figure out if reliability is essential to your business, because sometimes it’s not worth the effort, research, and resources required (which, I know, sounds counterintuitive for me to say as an SRE). It’s when you’re losing too much business due to downtime, problems, etc. that you need to do something. I wouldn’t even hire an SRE right away; start with a current engineer, like someone who is full-stack or a DevOps type, to undertake an audit of your different systems. Once you see where you stand, it’s helpful to also review the incidents you’ve had in the past, which can be anything from an outage to something even bigger. Finally, evaluate the frequency of these incidents and then see if it’s worth spending existing resources to pursue some early forms of reliability. Look at your cost metrics through whichever cloud provider you currently use (like AWS, for example) to get information on your infrastructure and see if there are obvious errors. If these errors are too much to handle or the team is too small to pull in some other resources, it would then be a great time to consider either a consultant or a full-time SRE. At some point, reliability will be right for everyone because you’re going to reach a volume or scale that needs some degree of monitoring, scaling, or observability, because otherwise your website could suddenly be down for a month.
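The review of past incidents described above can start as something very small. Here is a rough sketch, assuming you can export each incident's start and resolution time from wherever you track them; the `Incident` record and both helper functions are hypothetical names for illustration, not part of any particular tool.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record: when it started and when it was resolved."""
    started: datetime
    resolved: datetime


def incidents_per_month(incidents: Iterable[Incident]) -> Counter:
    """Count incidents by the calendar month (YYYY-MM) in which they started."""
    return Counter(i.started.strftime("%Y-%m") for i in incidents)


def total_downtime(incidents: Iterable[Incident]) -> timedelta:
    """Sum the duration of every incident, assuming incidents do not overlap."""
    return sum((i.resolved - i.started for i in incidents), timedelta(0))


if __name__ == "__main__":
    # Made-up example data, only to show the shape of the input.
    history = [
        Incident(datetime(2022, 1, 4, 9, 0), datetime(2022, 1, 4, 9, 45)),
        Incident(datetime(2022, 1, 19, 22, 0), datetime(2022, 1, 20, 1, 0)),
        Incident(datetime(2022, 3, 2, 14, 0), datetime(2022, 3, 2, 14, 20)),
    ]
    for month, count in sorted(incidents_per_month(history).items()):
        print(month, count)
    print("total downtime:", total_downtime(history))
```

Whether the resulting frequency justifies a dedicated reliability investment is still a judgment call for the business; the numbers only make that conversation concrete.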

Hot take: the concept of micromanagement

This may be a controversial thought, but I have always thought that micromanagement can have utility in certain use cases. There is a degree of observability involved when it comes to monitoring progress, but is it truly “micromanagement” if there isn’t a human manager over your shoulder demanding things? Being able to transparently see where things are at all times and micromanage digital infrastructure is important to me. Micro actions prevent macro incidents!

Something that I had to learn, and that I still see people get stuck on, is knowing how and when to fail fast. Now, I have my own mental maps. For instance, I’ll give something 15 minutes, and if I feel like I’m not learning anything or catching the trail, then I will reach out for help. It’s about being open and honest with yourself, because ultimately it’s better to escalate when you’re stuck on something and get an answer in 5 minutes than to spend 8 hours on the same question. It is very hard, especially if you’re suffering from imposter syndrome or you’re a junior engineer, but if anything, it’s better to reach out too often than not enough!

Recommendations

One of my favorite board games right now is Sheriff of Nottingham, a fun little game that is great for the whole family.

The other one I’ve been into recently is just called The Farming Game, which you can play with the whole family as well, but, as a warning, it is a brutal, difficult simulation of real-life farming. It’s very competitive, like Settlers of Catan meets farming with some added cruelty thrown in. Sounds fun, right?

Shawn Hansen

Shawn is a Site Reliability Engineer at Electric.ai.
