What was the scope of their involvement?
We talked to Caserta about our goals, and they did a week-long dive into our data lake. We went through potential use cases, and they helped design and architect the data lake on AWS. We also discussed different processing layers and orchestration and scheduling tools, which gave us a comprehensive platform blueprint. We considered pre-made data lake platforms, but they weren't flexible enough, and Caserta came up with an excellent outline.
Once we agreed on a proposed blueprint, Caserta provided a few consultants to work with our in-house team to build the infrastructure and write the code. They helped train our team members to get them up to speed on our project. After that, we began to generate our first set of analyses for our portfolio managers.
As we worked on the project, Caserta suggested we use Databricks as our Spark layer to process the larger datasets. We wanted to be able to scale, so they recommended elastic compute so we could easily resize the cluster of Spark machines. We also use AWS Redshift, and our processing generates parquet files that are stored on Amazon S3. We use MicroStrategy to manage our internal components, and Caserta uses Tableau for the interactive visuals that our analysts and portfolio managers want. They also recommended Airflow as our orchestration server in the cloud.
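To give a sense of how those pieces fit together, here is a minimal sketch of a daily Airflow DAG that submits a Spark run to Databricks, which reads raw data from the lake and writes parquet output back to S3. The names in it (bucket, job file, cluster spec) are illustrative assumptions, not our actual configuration.

```python
# Hypothetical Airflow DAG sketch: all names (DAG id, bucket, job file,
# cluster spec) are illustrative, not taken from the actual engagement.
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import (
    DatabricksSubmitRunOperator,
)

with DAG(
    dag_id="daily_market_data_pipeline",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Submit a one-off Spark run to Databricks. The job script (stored on S3)
    # processes the raw datasets and writes curated parquet files back to S3,
    # where Redshift and the BI tools can pick them up.
    process_market_data = DatabricksSubmitRunOperator(
        task_id="process_market_data",
        databricks_conn_id="databricks_default",
        json={
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 4,  # elastic: sized per run, not a fixed cluster
            },
            "spark_python_task": {
                "python_file": "s3://example-bucket/jobs/process_market_data.py",
            },
        },
    )
```

The benefit of this setup is that each scheduled run spins up its own right-sized cluster, so compute scales with the dataset instead of sitting idle.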
What is the team composition?
They provide two expert resources in Python and data analytics, and they bring in other specialists depending on the phase of the project. We also interact with Joe (Founder, Caserta).
How did you come to work with Caserta?
Joe was the keynote speaker at a conference a while back, and he presented a great approach to handling alternative data. He knew how to work with larger datasets and build data lakes, so we got in contact and eventually hired them.
How much have you invested with them?
We’ve spent around $700,000.
What is the status of this engagement?
We partnered in November 2017, and the work is ongoing.