They say data is king. But that’s only true if you can get insights from the data that guide your choices.
To get those insights, you need the right database. But choosing between database options isn’t easy. You have to know your data, your users, and the purpose of the data.
Brighthive, a public benefits startup, moved from a relational database to a graph database. Many factors went into deciding which one to use.
In this interview, Brighthive’s former VP of Engineering, Greg Mundy, discusses three key points:
- The differences between various databases
- Tips for picking the best one for your particular use
- The learning curve of transitioning from a relational database to a graph database
This conversation has been edited for clarity and length.
Can you tell us about Brighthive?
I would say we’re a mission-driven organization whose goal is to help organizations discover the power of their combined data. We do this by providing a data collaboration platform where they can safely share data with each other, which increases the value and impact of that data.
We work in a space where organizations often don’t know how to collaborate on or share data well. Our platform puts the business and technical pieces of data collaboration in place, which makes the process a lot easier and more transparent for customers.
Can you talk through some of the differences between traditional relational databases and graph databases?
One thing I’ve been a huge advocate for is using the right tool for the right job. With that mindset, we use a variety of database technologies at Brighthive.
Our main decision-making points include:
- The type of data we’re working with
- Where the data lies
- How the data is structured
- The needs of the end users
If we need to store large amounts of unstructured data cost-effectively, we tend to go for a NoSQL option. We also use AWS for a lot of our infrastructure. Our data is often unstructured, and it’s sometimes really hard to put a schema on it, so we use Amazon DynamoDB for that kind of storage.
Most of the time, there isn’t a need to do analytics or sophisticated queries on the data. It’s often just stored for a bulk pull. By using technologies like DynamoDB, we’re able to store this data in a very efficient and cost-effective way.
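As a rough illustration of this store-then-bulk-pull pattern, here is a minimal Python sketch. It assumes the boto3 DynamoDB `Table` resource interface (`batch_writer()`, `put_item(Item=...)`, and `scan()` paginated with `ExclusiveStartKey`/`LastEvaluatedKey`). The key name `record_id`, the sample records, and the `InMemoryTable` stand-in are hypothetical, included only so the sketch runs without AWS credentials.

```python
from typing import Any, Dict, Iterable, Iterator, Optional


def store_records(table: Any, records: Iterable[Dict[str, Any]]) -> None:
    """Write schemaless records; only the key attribute is required.

    `table` is expected to behave like a boto3 DynamoDB Table resource,
    e.g. boto3.resource("dynamodb").Table("raw-records") (hypothetical name).
    """
    with table.batch_writer() as batch:
        for record in records:
            batch.put_item(Item=record)


def bulk_pull(table: Any) -> Iterator[Dict[str, Any]]:
    """Read every item with a paginated Scan; no query or index is needed."""
    kwargs: Dict[str, Any] = {}
    while True:
        page = table.scan(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            break
        kwargs["ExclusiveStartKey"] = last_key


class InMemoryTable:
    """Hypothetical local stand-in for a DynamoDB table (illustration only)."""

    def __init__(self, key: str, page_size: int = 2) -> None:
        self.key = key
        self.page_size = page_size
        self.items: Dict[Any, Dict[str, Any]] = {}

    def batch_writer(self) -> "InMemoryTable":
        return self

    def __enter__(self) -> "InMemoryTable":
        return self

    def __exit__(self, *exc: object) -> None:
        return None

    def put_item(self, Item: Dict[str, Any]) -> None:
        self.items[Item[self.key]] = Item

    def scan(self, ExclusiveStartKey: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        keys = sorted(self.items)
        start = 0
        if ExclusiveStartKey is not None:
            start = keys.index(ExclusiveStartKey[self.key]) + 1
        page_keys = keys[start:start + self.page_size]
        page: Dict[str, Any] = {"Items": [self.items[k] for k in page_keys]}
        # Mimic DynamoDB: signal that more pages remain.
        if start + self.page_size < len(keys):
            page["LastEvaluatedKey"] = {self.key: page_keys[-1]}
        return page


# Records with different shapes can live in the same table.
table = InMemoryTable(key="record_id")
store_records(table, [
    {"record_id": "a1", "source": "csv", "rows": 120},
    {"record_id": "b2", "source": "api", "payload": {"status": "ok"}},
    {"record_id": "c3", "notes": "no fixed schema"},
])
items = list(bulk_pull(table))
print(len(items))  # 3
```

Because the functions only rely on the table's interface, the same code could be pointed at a real boto3 `Table` without changes, under the assumptions above.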
For anything related to data analytics or data warehousing, we would most likely choose a solution like Amazon Redshift, because that’s what Redshift is for: it provides the data warehouse backend we use for simple to moderately complex analytics.
Structured, relational data.
When we have highly structured relational data, we opt for PostgreSQL or something similar. Amazon Aurora is a common choice since it essentially emulates the behavior of a PostgreSQL database, but at a fraction of the cost.
Graph databases are really important to us, but in a slightly different way. Our graph databases are the technology that enables our platform to unlock insights and provide visibility into data.
Our customers have this craving to understand not just what data they have but also how they can use that data. Both pieces are critical for responsible data sharing.
By using a graph database, we’re able to construct a knowledge graph that connects people within a “collaboration” to the organizations they’re affiliated with, and links those organizations to the data they contribute. This capability is ideal for use cases that specify when the data should and shouldn’t be used. That kind of visibility and transparency brings a lot more sanity to the process of collaborating on data.
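To make the shape of such a knowledge graph concrete, here is a small, hypothetical sketch in plain Python, with no graph database involved. People, organizations, and datasets are nodes; `MEMBER_OF` and `CONTRIBUTED` edges connect them; and each dataset carries a set of permitted uses. All node names, edge labels, and the `allowed_uses` policy field are illustrative assumptions, not Brighthive’s actual data model.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical knowledge graph stored as adjacency lists:
# (source node, relationship label) -> list of target nodes.
edges: Dict[Tuple[str, str], List[str]] = {}
# Hypothetical per-dataset usage policy: the purposes each dataset may be used for.
allowed_uses: Dict[str, Set[str]] = {}


def add_edge(src: str, label: str, dst: str) -> None:
    """Add a directed, labeled edge from src to dst."""
    edges.setdefault((src, label), []).append(dst)


# A made-up collaboration: people -> organizations -> contributed datasets.
add_edge("alice", "MEMBER_OF", "workforce_board")
add_edge("bob", "MEMBER_OF", "community_college")
add_edge("workforce_board", "CONTRIBUTED", "job_placements")
add_edge("community_college", "CONTRIBUTED", "enrollments")
allowed_uses["job_placements"] = {"program_evaluation"}
allowed_uses["enrollments"] = {"program_evaluation", "research"}


def contributions_allowed_for(person: str, purpose: str) -> List[str]:
    """Walk person -> organization -> dataset, keeping datasets whose policy allows `purpose`."""
    results: List[str] = []
    for org in edges.get((person, "MEMBER_OF"), []):
        for dataset in edges.get((org, "CONTRIBUTED"), []):
            if purpose in allowed_uses.get(dataset, set()):
                results.append(dataset)
    return sorted(results)


print(contributions_allowed_for("bob", "research"))    # ['enrollments']
print(contributions_allowed_for("alice", "research"))  # []
```

Each answer is a walk along edges, which is the kind of question a graph database is built to answer directly.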
Our team uses Neo4j as our graph database. Of the languages it supports, we use Cypher to write our queries.
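For readers new to Cypher, here is a hedged sketch of how a person-to-organization-to-dataset lookup might be written as a Cypher query and issued from Python. The node labels (`Person`, `Organization`, `Dataset`), relationship types (`MEMBER_OF`, `CONTRIBUTED`), and properties (`name`, `allowed_uses`) are hypothetical. `driver` is assumed to be created with the official `neo4j` Python package (5.x), whose `Driver.execute_query` takes query parameters as keyword arguments and returns records, a summary, and keys.

```python
from typing import Any, List

# Cypher path pattern: (person)-[:MEMBER_OF]->(org)-[:CONTRIBUTED]->(dataset).
# Labels, relationship types, and property names are illustrative assumptions.
CONTRIBUTIONS_QUERY = """
MATCH (p:Person {name: $name})-[:MEMBER_OF]->(o:Organization)-[:CONTRIBUTED]->(d:Dataset)
WHERE $purpose IN d.allowed_uses
RETURN d.name AS dataset
ORDER BY dataset
"""


def fetch_allowed_contributions(driver: Any, name: str, purpose: str) -> List[str]:
    """Run the query and return matching dataset names.

    `driver` is expected to be a neo4j.Driver, e.g.
    GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))
    (placeholder URI and credentials).
    """
    records, _summary, _keys = driver.execute_query(
        CONTRIBUTIONS_QUERY, name=name, purpose=purpose, database_="neo4j"
    )
    return [record["dataset"] for record in records]
```

The query reads almost like the relationship it describes, which is a large part of Cypher’s appeal compared with expressing the same path through join tables.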
How do you decide between graph databases and relational databases?
In my opinion, graph databases are great, but they’re not a silver bullet for every kind of data function. It’s important to understand their limitations.
The one thing we tend to ask ourselves when deciding between a graph database solution and something else is: what are the parameters of the data?
For example, if the data isn’t well structured, doesn’t have many relationships, and the connections between its elements aren’t critical, then a graph database may not be the best fit.
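One way to make these rules of thumb explicit is a small decision helper. The sketch below is a hypothetical, simplified encoding of the criteria mentioned in this interview (how structured the data is, how much the relationships matter, whether users need analytics, and cost-effective bulk storage otherwise); the field names and the order of the checks are assumptions for illustration, not Brighthive’s actual process.

```python
from dataclasses import dataclass


@dataclass
class DataProfile:
    structured: bool            # Does the data fit a fixed schema?
    relationship_centric: bool  # Are connections between records what users mainly ask about?
    analytics_workload: bool    # Will users run warehouse-style aggregate queries?


def suggest_store(profile: DataProfile) -> str:
    """Return a rough storage suggestion following the interview's rules of thumb."""
    if profile.relationship_centric:
        return "graph database (e.g. Neo4j)"
    if profile.analytics_workload:
        return "data warehouse (e.g. Amazon Redshift)"
    if profile.structured:
        return "relational database (e.g. PostgreSQL or Aurora)"
    # Unstructured data that is mostly stored and pulled in bulk.
    return "NoSQL key-value store (e.g. DynamoDB)"


print(suggest_store(DataProfile(structured=True, relationship_centric=False, analytics_workload=False)))
# relational database (e.g. PostgreSQL or Aurora)
```

Real decisions also weigh factors this toy ignores, such as where the data lives and the specific needs of end users.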
Was there a learning curve for your engineering team to adjust to graph databases?
Most of my development team had never worked with graph databases before we decided to go this route. There is somewhat of a learning curve when it comes to graph databases, particularly if you’ve been working in the realm of relational databases for a long time. But for the most part they’re pretty simple structures and have been very intuitive for the team.
These are concepts you see all the time in computer science: a graph made up of nodes and edges. The concept isn’t very complicated. The challenge is figuring out how to apply it to an actual production problem.
Once our engineers got a sense of how graph databases work, they started to realize that it was actually a lot easier to get insights from the graphs than from super complicated joins in relational databases.
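To see where those joins come from, here is a hedged sketch that uses Python’s built-in `sqlite3` module as a stand-in for a relational database such as PostgreSQL. It stores person, organization, and dataset relationships in join tables (all table and column names are hypothetical). As written, answering “which datasets did this person’s organizations contribute?” takes four joins, and every extra hop in the question adds more; in a graph query language such as Cypher, the same question is a single path pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE organization (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT);
    -- Relationships live in join tables rather than as first-class edges.
    CREATE TABLE membership (person_id INTEGER REFERENCES person(id),
                             org_id INTEGER REFERENCES organization(id));
    CREATE TABLE contribution (org_id INTEGER REFERENCES organization(id),
                               dataset_id INTEGER REFERENCES dataset(id));

    INSERT INTO person VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO organization VALUES (10, 'workforce_board'), (20, 'community_college');
    INSERT INTO dataset VALUES (100, 'job_placements'), (200, 'enrollments');
    INSERT INTO membership VALUES (1, 10), (2, 20);
    INSERT INTO contribution VALUES (10, 100), (20, 200);
    """
)

# Two hops (person -> organization -> dataset) already need four joins here.
rows = conn.execute(
    """
    SELECT d.name
    FROM person AS p
    JOIN membership AS m ON m.person_id = p.id
    JOIN organization AS o ON o.id = m.org_id
    JOIN contribution AS c ON c.org_id = o.id
    JOIN dataset AS d ON d.id = c.dataset_id
    WHERE p.name = ?
    ORDER BY d.name
    """,
    ("alice",),
).fetchall()
print(rows)  # [('job_placements',)]
```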
Over time, some of our engineers got really hooked on the tech, and the question became, “When should we not use a graph database?”
As a whole, our team did face a learning curve when it came to managing a graph database in production. But once we got the hang of things, it boosted our ability to unlock insights through our knowledge graph.