How ARInspect Is Building Their Inspection Platform Using Microservice Architecture

Inspections are the front-line defense against disasters like the water crisis in Flint, Michigan.

After seeing that crisis, the team at ARInspect began working on an augmented robotic process automation (RPA) platform for Environmental Health and Safety programs.

Their platform includes smart scheduling and dispatch, dynamic inspection checklists, automated workflows, and more. The team is using next-generation technologies (like wearables) and data insights from the platform to make inspections more frequent and insightful.

We spoke with CTO Sonia Sanghavi to learn how they’re building their platform.

In this article, Sonia shares why she joined the company, how she is implementing microservice architecture, and the pros and cons of streaming architecture.

At what stage in the product lifecycle did you join ARInspect?

I’ve been with the company for a little under two years. I started in a consulting/advisory position when they were first demoing the product.

I then joined full-time as CTO and am currently in the process of building beyond the minimum viable product (MVP). We’re setting up the product and roadmap and moving towards the target architecture.

While I was working at Capital One, the ARInspect team built out a production-ready MVP. But they were having some issues around scalability and volume.

You’re not used to seeing that volume of data analytics and then making it available in an offline fashion, especially on a mobile platform where your resources and memory are limited.

But I saw the vision and we started building our next generation application.

When I came on board, we started building out a new team, building out the architecture, implementing the architecture, and taking it towards the target state while still keeping our current clients in production.

How do you keep architecture simple and scalable within a microservice architecture?

The overall theme I like to follow when I start solutioning a business problem is to keep it simple and to make it available as soon as possible to the end users.

Keep it fast and keep it simple.

To keep things simple, I first:

  1. Understand the scaling requirements and the SLAs.
  2. Determine the size of data we’re working with.
  3. Determine how critical the application is.

You need to understand the domain and everything else that comes with it, of course. But from an overall reference architecture standpoint, those are things I like to look at.

In my previous work at Capital One and an ad tech company, I saw classic cases for a streaming platform.

When I moved to ARInspect, I wanted streaming. I want to stream everything. I want to make everything asynchronous.

But it’s not yet the use case for a streaming platform.

We still need to design the architecture in a specific way, because our services are asynchronous, small, and domain-specific.

There are divisions, and they have references to each other, but each division is responsible for only doing certain things.

That’s how I look at a microservice architecture.

Most people implement REST services. That’s certainly part of it. You have your UI, client, backend, and database.

But how do you separate out the domains?

Let’s go back to the classic case of inspections. When you conduct inspections, you have notifications, you have different microservices, you have enforcements and post-processing steps that come after you have submitted the inspection.

The idea is to understand the domain and divide it into different context-bound services.

That’s the good thing about microservices: smaller engines run and interact, but they stay independent of each other. So even if one of the notification services goes down, inspections can still be conducted.
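As a rough illustration of that independence, here is a minimal Python sketch. It assumes a tiny in-process `EventBus` as a stand-in for a real message broker; the class names, the topic name, and the handlers are all hypothetical and not taken from ARInspect's codebase. The point is only that a failing notification consumer does not stop the inspection from being recorded or other consumers from running.

```python
from collections import defaultdict


class EventBus:
    """Tiny in-process stand-in for a message broker (hypothetical)."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self.failures = []

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self._handlers[topic]:
            try:
                handler(event)
            except Exception as exc:
                # A failing consumer is recorded, not propagated to the producer.
                self.failures.append((handler.__name__, str(exc)))


class InspectionService:
    """Records inspections and announces them; knows nothing about consumers."""

    def __init__(self, bus):
        self._bus = bus
        self.submitted = []

    def submit(self, inspection_id):
        self.submitted.append(inspection_id)
        self._bus.publish("inspection.submitted", {"inspection_id": inspection_id})


def notify(event):
    # Simulates the notification service being unavailable.
    raise RuntimeError("notification service is down")


enforcement_queue = []


def enforce(event):
    enforcement_queue.append(event["inspection_id"])


bus = EventBus()
bus.subscribe("inspection.submitted", notify)
bus.subscribe("inspection.submitted", enforce)

inspections = InspectionService(bus)
inspections.submit("INSP-1")
```

After `submit` returns, the inspection is stored and the enforcement step has run, while the notification failure sits in `bus.failures` for monitoring to pick up.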

We map out which domains depend on each other and build according to the domain need – not according to just the technology needs.

Secondly, we built our DevOps around microservices.

As you horizontally scale the microservices, it’s a beautiful world of smaller services doing their own little thing.

However, problems occur when one of the servers goes down.

We set up a monitoring system and an auto-scaling system around a microservice architecture and built a CI/CD pipeline around it, because you want to work differently according to the unique needs of the services.

That’s where I think you want to architect a microservice-driven solution. But you need to figure out the process of how to get there.

How will you design and implement your CI/CD pipeline so you can actually use the advantages of the microservice architecture?

Same thing with delivery.

Building out five to six services with a lean team can also be an issue. How do you decide?

Let’s say I know my end goal is to build these 10 services. The key is to build out the interfaces under the hood so that, as the services expand or grow or disconnect, it doesn’t impact the end client.
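One way to picture "building out the interfaces" is the sketch below. It is an assumption-laden example, not ARInspect's design: `Notifier`, `InProcessNotifier`, `RemoteNotifier`, `complete_inspection`, and the URL are hypothetical names. The caller depends only on the interface, so the implementation can move from in-process code to a separate service without the caller changing.

```python
from typing import Protocol


class Notifier(Protocol):
    """The interface the rest of the application codes against."""

    def send(self, recipient: str, message: str) -> str: ...


class InProcessNotifier:
    """Interim step: notification logic still lives inside the main application."""

    def send(self, recipient: str, message: str) -> str:
        return f"in-process -> {recipient}: {message}"


class RemoteNotifier:
    """Target state: stand-in for a client of a separate notification service."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url

    def send(self, recipient: str, message: str) -> str:
        # A real client would make a network call here; omitted in this sketch.
        return f"{self.base_url} -> {recipient}: {message}"


def complete_inspection(inspection_id: str, notifier: Notifier) -> str:
    # Caller code is identical regardless of which implementation is wired in.
    return notifier.send("compliance-manager", f"Inspection {inspection_id} submitted")


interim = complete_inspection("INSP-1", InProcessNotifier())
target = complete_inspection("INSP-1", RemoteNotifier("http://notifications.internal"))
```

Swapping `InProcessNotifier` for `RemoteNotifier` is a wiring change, which is what makes an interim architecture workable while clients stay in production.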

While everybody wants to go into target architecture mode from day one, I like to focus on interim steps before getting there.

We have production clients, so we can’t just turn off the switch. We have to look at our current architecture, determine our target, and decide the interim architecture until we get there.

There are times in my career where I’ve gone high risk and just turned it on and off. But because we are in production, we went to an interim approach of splitting out the services and scaling horizontally.

Running a lean team, you must sacrifice in certain areas. How do you decide where to prioritize?

Denormalization of data

The number one thing we sacrifice in a microservice architecture is normalized data. We accept denormalization.

Data duplication is a big sacrifice.

You might have one service that is responsible for notifications. It stores email addresses and metadata about what it needs.

But there could be duplicated information in another service. Because you want that self-contained black box service, you might store similar data in two places. That means sacrificing updates.

For example, if someone updates an email address, how do we notify all the other services that they need to update themselves?

The level of complexity increases but it helps in the long run because if one of the services goes down, you don’t have to worry about notifications not working.
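A common answer to the "how do we notify the other services" question is to publish a change event that each service uses to refresh its own copy. The sketch below assumes that approach; the source does not say which mechanism ARInspect uses, and `UserService`, `NotificationService`, and the event shape are hypothetical.

```python
class UserService:
    """Owns the authoritative user record and announces changes."""

    def __init__(self):
        self.emails = {}
        self._subscribers = []

    def on_email_changed(self, callback):
        self._subscribers.append(callback)

    def set_email(self, user_id, email):
        self.emails[user_id] = email
        event = {"type": "user.email_changed", "user_id": user_id, "email": email}
        for callback in self._subscribers:
            callback(event)


class NotificationService:
    """Keeps its own copy of the email so it can work if UserService is down."""

    def __init__(self):
        self.email_copy = {}

    def handle_email_changed(self, event):
        self.email_copy[event["user_id"]] = event["email"]

    def recipient_for(self, user_id):
        # Reads only local data; no call back to UserService.
        return self.email_copy[user_id]


users = UserService()
notifications = NotificationService()
users.on_email_changed(notifications.handle_email_changed)

users.set_email("u1", "old@example.com")
users.set_email("u1", "new@example.com")
```

The duplicated address is the cost; the benefit is that `recipient_for` never depends on another service being up.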

Maintenance

Another area that we sacrifice is maintenance. That’s why I emphasized setting up the CI/CD pipeline and the monitoring pipeline.

The more services you have, the more maintenance and versioning you must deal with. Keeping a close eye on all of that and setting up automated processes from the get-go will help you solve some of those disadvantages of a microservice architecture.

Some of your architecture is streaming. What technologies or services are you using to facilitate that? What are some of the pros and cons when it comes to streaming architectures?

As of right now, within ARInspect we have an event-driven architecture because the data coming in from our inspections is not high volume. So we’re not using a streaming platform on that side of things.

But we are using a streaming process as we connect to different IoT devices or weather devices.

Coming back to some of the actual implementation I’ve used, it has been mostly around Kafka.

I’ve used Apache Kafka as the streaming platform in conjunction with Spark.

Spark is now a full-blown streaming platform, but when I started working with it about five years ago, it was still doing some batch processing. But I’ve used Spark for the streaming side.

A situation where you’d want to use streaming is when you want near real-time analytics or want to give instant results to the users. Plus, streaming enables you to have multiple consumers. Just like a broadcast.

Event-driven architectures have been around for a really long time. They used to be called enterprise integration, and you had message queues. It’s ultimately a message going from a producer to a consumer with a broker in the middle.

Where streaming is different from message queues is that the producer of the information is not invested in the consumer of the information.

Coming back to the case of ARInspect, when an inspection is submitted, the end client and/or compliance manager needs to know. Someone needs to get notified.

But the person submitting the inspection doesn’t really care where it ends up. They don’t care about the post-processing that’s going on or who else is consuming and using the information.
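The difference can be sketched with two toy data structures. This is a simplified model, not the Kafka API: `WorkQueue` and `EventLog` are hypothetical classes. In the queue, a message is handed to one receiver and then it is gone; in the log, each named consumer keeps its own read position, so any number of consumers can read the same record without the producer knowing about them.

```python
from collections import deque


class WorkQueue:
    """Queue semantics: each message is delivered to exactly one receiver."""

    def __init__(self):
        self._items = deque()

    def send(self, message):
        self._items.append(message)

    def receive(self):
        return self._items.popleft() if self._items else None


class EventLog:
    """Log semantics: records stay put; each consumer tracks its own offset."""

    def __init__(self):
        self._records = []
        self._offsets = {}

    def append(self, record):
        self._records.append(record)

    def poll(self, consumer):
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:]
        self._offsets[consumer] = len(self._records)
        return batch


queue = WorkQueue()
queue.send("INSP-1 submitted")
first_receiver = queue.receive()
second_receiver = queue.receive()  # nothing left for a second receiver

log = EventLog()
log.append("INSP-1 submitted")
for_notifications = log.poll("notifications")
for_risk_scoring = log.poll("risk-scoring")
notifications_again = log.poll("notifications")  # already caught up
```

Adding a third consumer to the log later requires no change to whoever calls `append`, which is the "broadcast" property described above.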

You may not want to use streaming when you have a lot of data and it can only be processed if it comes in a certain order.

When ordering comes into play, you don’t take full advantage of the processing power of a streaming engine or a streaming job.
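To see why ordering costs parallelism, consider key-based partitioning, the usual way log-based platforms such as Kafka preserve order: records with the same key go to the same partition, and order is only guaranteed within a partition. The sketch below is a simplified model under that assumption; `partition_for` and `route` are hypothetical helpers, and CRC32 is only a stand-in for a real partitioner's hash function.

```python
import zlib


def partition_for(key, num_partitions):
    # Stable hash so the same key always maps to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions):
    """Distribute (key, payload) events across partitions, keeping arrival order."""
    partitions = [[] for _ in range(num_partitions)]
    for key, payload in events:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return partitions


events = [
    ("INSP-1", "started"),
    ("INSP-2", "started"),
    ("INSP-1", "submitted"),
    ("INSP-2", "submitted"),
    ("INSP-1", "reviewed"),
]

# With several partitions, order holds per inspection, not across inspections.
partitions = route(events, 3)

# Requiring one global order means a single partition and a single reader.
single = route(events, 1)
```

With one partition the full sequence is preserved, but only one consumer can work through it at a time, which is the lost processing power mentioned above.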

Another use case where it doesn’t work very well is when you need guaranteed delivery.

In terms of architecture, it’s a command pattern. So if you’re producing something with the intention to make it go to a particular client, and only sending it to another client after they process it (like in a workflow process), then streaming is not the framework for you.

You could still stand up Kafka and make it work. But you’re creating an entire infrastructure of a distributed streaming environment and using only 1% of it.

I’ve seen people do it that way. They stand up Kafka and allow a lot of consumers to take advantage of the streaming infrastructure.

But it doesn’t make sense. You’re getting speed, but you’re still writing a lot of code and your consumers care about the timestamps or the ordering of events.

I’m always a big fan of streaming frameworks because they keep the parts of the system distinct. No one cares whether the other components are upstream or downstream. They just care about getting what they need, processing it, and spitting it right back out.

Do you look for people with specific backgrounds or knowledge of specific programming languages that work well with Kafka or Spark? Or do you feel like you can mentor and train new team members?

I’ve worked on teams that have functioned both ways.

For example, when we were entertaining the notion of Spark, I only had Java developers on my team.

A bit of background on Spark:

Spark is a framework. You can write your code in Scala, Java, and Python and they can all run under the same Spark distributed environment.

Ultimately, I leave the decision of what we use to the team.

I like to keep things simple.

I know Scala is the way to go to build a Spark job because it takes me four lines of code versus writing 15 lines of code in Java and then debugging and unit testing it.

But if the team collectively says, “we don’t want to do it,” I’m like, okay don’t do it. That’s where I try and encourage the team to come up with what they want to do and what makes them most productive.

If the team is open to learning, then I become a champion for learning. We find the resources. We find training. We find other teams in the company that have this knowledge.

The easiest way to learn something new is to find someone in the company who’s done it before. I’ve worked in environments where other teams that focused on Scala did code reviews for our team.

Yes, there are articles, but when you can pick up the phone and talk to someone you actually feel better. I love that energy and sense of collaboration.

I always encourage any team member to do what makes them happy when it comes to coding. But it has to be a team decision, not an individual decision.

It depends on the team, and I’ve been fortunate to work with wonderful people who are always enthusiastic about learning.

I love bringing in newer technologies and newer libraries and learning from them.

Is there anything you’d like to highlight about your tech stack?

First, we use Kotlin Multiplatform. When we started redesigning our MVVM architecture on the mobile side, we thought about Kotlin Multiplatform and how we would scale it to other operating systems and mobile platforms like Android, iOS, or Windows tablets.

It’s definitely bleeding-edge. It’s something I don’t do very often, but I have a great lead on my team who’s a big Android fan. He brought it in and everybody rallied with him to bring that to life.

The second part is our event-driven system in the backend. We predict the compliance scores and the risk scores with other external data. We’re excited about that platform and excited about all the inspectors using it.

I’m also excited at how we’ve put our product on handheld and mobile platforms. It opens us up to many IoT devices that conform to inspections.

For example, say you have a UV ray tester or a water sampler detector that can connect to your smartphone, or an RFID or barcode reader that you can attach to it. You can take advantage of all those IoT and Bluetooth-enabled devices within your mobile apps.

That part fascinates me and brings me back to my hardware days.

There are so many use cases. I’m excited to continue integrating with drones, wearables, and other devices so we can enhance inspections through our platform.

Interviewer: Aswin John
