Welcome to Allma’s Incidentally, a newsletter where we interview engineering leaders about how they’ve scaled their companies and the secrets and systems they use to build and evolve reliability practices on their teams. Our goal is to share lessons that are actionable enough for you to implement and high-level enough to apply to your own team. Your feedback is always welcome.
Founder of Architect.io on Scaling from a Startup to a Corporation and Back
David Thor is an experienced engineer and now the founder of Architect.io. David previously worked at several big companies and startups, one of which was later acquired by Facebook. If anyone has a lot of experience scaling from a small startup to a huge company and back again, it’s David Thor. Allma sat down with David to talk about everything from driver’s licenses to protecting sensitive data to his future plans as a new startup founder.
Starting his Journey Into Engineering
Hi, my name is David Thor. I got into engineering through playing video games and tinkering with personal computers growing up. I ended up going to Northeastern to study computer science and math, and I loved the gratification of being able to build things from the ground up with nothing more than my computer.
After graduating, I jumped into working on larger teams at a few big companies, like PayPal and TripAdvisor, before dipping my toes into the startup community. I quickly caught the startup bug and found that I like building teams and scaling them from zero to 60 just as much as I like building software itself. That process proved more interesting than I could have anticipated, and a highlight was my work at Confirm.io.
The Journey with Confirm.io: From Four Employees to Facebook
Before Confirm, I ran a small dev shop with about four employees until I was asked to merge it into this budding startup. Several repeat entrepreneurs told us we could build Confirm into a truly innovative SaaS product in the forensic authentication market and be acquired within 1,000 days. They brought in a lot of funding, my business, and another business that focused more on the forensic side of security features. I was tasked with building the team, patterns, and processes to turn our domain knowledge into an API-first cloud application, which was right in my wheelhouse. However, it was a totally new experience learning about computer vision, the security features on a driver’s license (which I now know way more about than I ever thought I would), and how to build malleable AI models that could handle the nuances of how people take pictures with their phones.
Sure enough, we ended up being acquired by Facebook in about 970 days.
It was impressive that our executive team was able to pull that off. Along the way, I got the invaluable experience of growing a team from 4 to 15 engineers working on the product, and then turning a transactional SaaS business into a critical tool for Facebook’s high-risk surface.
Establishing the Product: Managing and Implementing a Vision
At Confirm, we were very fortunate in two respects when it came to establishing our product. The first was that our executive team set up a stellar partnership right out of the gate with a company that manufactures the driver’s licenses for most of the United States today.
Setting up a partnership with a company like that gave us unique insight into the security features of driver’s licenses and let us work collaboratively to build models that could detect whether those features were present in pictures.
The second was that, in addition to my dev shop, they had acquired a small business that sold tools to bars and liquor stores to help them identify security features on licenses using higher-end equipment. That gave us a lot of insight into where the security features were and how to build the product around them, insight we probably wouldn’t have found with a Google search.
Building APIs is something I’ve been doing for a long time, but figuring out how to set up the back-of-house processes that would power the training and experimentation pipelines for artificial intelligence and machine learning was new to us.
We built out two big teams, and we wanted to drive autonomy, rapid experimentation, and rapid releases.
The minute we saw improvements, we wanted to get them out there, because this was an area where even the lowest acceptable bar for the product had to be extraordinarily high. Doing something just okay would have left a lot of licenses misidentified and let a lot of fraudulent driver’s licenses through what should be a highly secure wall. In the end, we evolved into two independent teams that had to find the touch points to connect with each other while keeping the ability to move fast and autonomously.
Scaling and Reliability
Starting with the team side: circa 2015-2016, tooling for AI training and modeling was still relatively new. Since we were dealing with sensitive data, we actually weren’t able to train on much of the traditional hardware you would find at AWS or other cloud providers. In fact, many of our partnerships led us to try to platform-ize our pipeline so that we could run the entire training process inside customer environments, never letting potentially sensitive data, the kind hackers would target, reach our servers.
We knew from day one that we didn’t want to take on that risk exposure.
You were never going to see a log or dump of licenses on our servers. We didn’t want to expose our customers or their end users to that kind of risk, so from day one we made not storing sensitive data a core principle and philosophy of our business.
That idea shaped how we operationalized our team and how we thought about reliability. We never relied on the thought that “we’re up 99.99% of the time”; it was about accounting for the other 0.01% and for what would happen if we were attacked maliciously. At the end of the day, we needed to be able to answer all of those questions with “No, nothing bad can really happen.”
Incidents and Sensitive Data
In our case, because of the way we designed the system, we knew there wasn’t data sitting around for hackers to find, but we also knew people would try to find out whether there was. We wanted to make sure we had anomaly detection and analytics at every point of entry into our cloud environment. At the time, this was really difficult because, regrettably, much of what we were doing was DIY; today, there are plenty of vendors in this category. Attacks on API products and cloud applications are a problem many teams share, which gives vendors a real opportunity to get very good at anomaly detection, error reporting, point-of-entry management, auditing, and securing cloud environments in general. I’d say it’s fruitless to try to reinvent the wheel there these days. Let vendors take what they’ve learned from securing other companies’ products and bring it to you. But back then, we did constant audits to make sure we were bulletproof.
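The interview doesn’t say how Confirm’s DIY anomaly detection worked. Purely as an illustration of the general idea, here is a minimal sketch of one common approach: track per-interval request counts at a single point of entry and flag a new count whose z-score against a rolling baseline exceeds a threshold. The class name `EntryPointMonitor`, the window size, and the 3-sigma threshold are hypothetical choices, not Confirm’s system.

```python
from collections import deque
from statistics import mean, pstdev


class EntryPointMonitor:
    """Hypothetical sketch: flags unusual request volume at one point of entry.

    Keeps a rolling window of per-interval request counts and flags a new
    count whose z-score against that window exceeds `threshold`.
    Only spikes (counts above the baseline) are flagged.
    """

    def __init__(self, window: int = 60, threshold: float = 3.0, min_history: int = 10):
        self.history = deque(maxlen=window)  # recent per-interval counts
        self.threshold = threshold
        self.min_history = min_history  # don't judge until we have a baseline

    def observe(self, count: int) -> bool:
        """Record one interval's request count; return True if it looks anomalous."""
        anomalous = False
        if len(self.history) >= self.min_history:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            if sigma == 0:
                # Perfectly flat baseline: any increase at all is unusual.
                anomalous = count > mu
            else:
                anomalous = (count - mu) / sigma > self.threshold
        # Every count joins the baseline, so sustained legitimate growth is
        # eventually absorbed (at the cost of some sensitivity during an attack).
        self.history.append(count)
        return anomalous


if __name__ == "__main__":
    monitor = EntryPointMonitor(window=30, threshold=3.0)
    for c in [100, 104, 98, 101, 97, 103, 99, 102, 100, 96]:
        monitor.observe(c)  # build a baseline; nothing is flagged yet
    print(monitor.observe(101))  # False: within normal variation
    print(monitor.observe(400))  # True: far above the baseline
```

A real deployment would run one monitor per entry point (and likely per client or API key) and feed flags into alerting; this is the kind of plumbing the vendors mentioned above now provide.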
Transitioning to Facebook
When Confirm was acquired by Facebook, the biggest shift by far was that we went from operating a customer-facing business to being a product exclusively for Facebook almost overnight. That change completely reshaped our motivation for building the product.
After the acquisition, we no longer worried about revenue coming in; it was all about the product’s performance and scalability.
We had access to many new resources, both in staffing and in the sheer amount of computing hardware at our disposal to train models and improve our API performance. We were now at one of five companies with that level of computing power available, and all of us were very excited.
The other big change was that the expectations for AI systems were much higher. Because Facebook does a lot of modeling on a ton of data, the expectation is that your models have very high efficacy, on the order of a 99.9% success rate or higher. When you think about it, Facebook has 2 billion users, so even 0.1% of that is still 2 million people. For us, the worst thing that could happen was a bad actor getting through, which meant we needed to reject more people than Facebook was typically comfortable with. However much we invested in the AI, we had to convince them that we were not going to reach a 99.9% success rate. We did a lot of educating on the fact that we were a security product, even though we used AI tools, so the expectations had to be different.
Another big shift in going from startup mode to a larger company was how much time all the required communication takes. We had no shortage of work to keep us busy, but convincing people of the reality of the metrics took time. We didn’t really start the support side of that engagement until later in my time at Facebook.
The Birth of Architect.io
At the beginning of 2019, I left my job to start a new business that I had been conceptualizing for a while. The core idea was to solve problems I had seen at my dev shop and at Confirm when selling to many customers, both through a hosted API and through an API deployed to each customer. We had a tough time managing the discrepancies between customer deliverables, because every time we deployed the application, each customer needed something a little different.
*If we had six customers, we would have six unique, independent products to maintain and contribute to. That quickly became a burden for us, because the added complexity made it hard to create new services.*
With Architect, we’re giving developers more control to create new services with less friction, without having to think constantly about where those services will be deployed. You describe your container and its dependencies, and our system automatically resolves, deploys, connects, and secures them while accounting for the preferences of each specific environment. We just released our DevOps-as-a-Service product into open beta this month!
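Architect’s actual resolution logic isn’t described here. As a rough illustration of one piece of the idea, working out what to deploy first once each service and its dependencies have been described, here is a minimal sketch using a topological sort from Python’s standard-library `graphlib` module (Python 3.9+). The `deployment_order` function and the service names are hypothetical; this is not Architect’s implementation or API.

```python
from graphlib import CycleError, TopologicalSorter


def deployment_order(services: dict[str, list[str]]) -> list[str]:
    """Hypothetical sketch: order services so every dependency is deployed
    before the services that depend on it.

    `services` maps a service name to the names of the services it depends on.
    """
    # Every dependency must itself be described, or we can't deploy it.
    missing = {dep for deps in services.values() for dep in deps} - set(services)
    if missing:
        raise ValueError(f"undeclared dependencies: {sorted(missing)}")
    try:
        # TopologicalSorter expects node -> predecessors, which is exactly
        # "service -> the services it needs first".
        return list(TopologicalSorter(services).static_order())
    except CycleError as err:
        # err.args[1] holds the nodes that form the cycle.
        raise ValueError(f"circular dependency: {err.args[1]}") from err


if __name__ == "__main__":
    stack = {
        "frontend": ["api"],
        "api": ["db", "cache"],
        "db": [],
        "cache": [],
    }
    print(deployment_order(stack))  # db and cache first, then api, then frontend
```

Ordering is only the first step; a fuller system would also wire each service’s connection details to its dependencies and apply per-environment settings, which this sketch leaves out.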
Watching a New Company Grow
It’s been great to spend the last 18 months working on one product instead of moving to a new one or a different customer every three months, as I did with my dev shop. It’s been enormously satisfying not just to start my own product business, but to start one that lays the groundwork to help so many developers like me collaborate more effectively. Over the past year, I found a cofounder, brought on our first two engineers, and really got into doing DevOps in an automated fashion. I’m going to get to experience the in-between stages I missed when I jumped from a three-year-old startup to a giant corporation. That’s the adventure I’m excited to get into.
While we’ve been in quarantine, I’ve been playing a ton of Dominion. It’s a deck-building card game, and it’s been really fun to play, especially since I got my wife and a couple of friends into it.
When it comes to reading, I’m actually much more of a fan of blogs than books; one of my favorites is “Stratechery.” It takes opinionated views on practices, principles, and patterns in technology and breaks them down into interesting conversational topics. I highly recommend that one as well.
Continue the conversation: join the incident collaboration Slack community.
Allma is a tool built with incident best practices baked in, designed for everyone in your organization to collaborate on incidents.