Big Compute or Big Data?
This question comes up fairly regularly, so I thought it would be interesting to share my understanding in the hope that it helps you make the right decision.
Both are enablers, and they create opportunities through different approaches. When the problem is well understood and the same algorithm runs many times with different parameters, Big Compute is definitely an approach to consider. When we know our input data and are experimenting with various algorithms, Big Data is a clear winner.
That said, let’s make this more concrete.
Big Compute shines at large scale. Easily parallelizable workloads are the best use cases, because they can be broken into independent tasks. This is where we gain the most from large numbers of compute cores. Big Compute is all about executing any software package, written in any language, by passing in variables. This creates an amazing opportunity for developers to optimize their code to be extremely efficient. Optimizations range from concurrency and memory management to limiting IOPS and tuning network communication. Typical scenarios include well-known workloads such as Monte Carlo simulations, rendering, and workflows.
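To picture what "independent tasks driven by parameters" means, here is a minimal Python sketch of a Monte Carlo estimate of pi. The names estimate_pi and run_sweep are hypothetical, and the thread pool is only a stand-in for whatever scheduler a real Big Compute service would use to spread tasks across cores or nodes.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def estimate_pi(seed, samples):
    """One independent task: the algorithm is fixed, only the parameters change."""
    rng = random.Random(seed)  # private generator, so tasks share no state
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


def run_sweep(param_sets, max_workers=4):
    """Run every parameter set as its own task and collect the results in order."""
    # Assumption: a local thread pool stands in for a real job scheduler.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda params: estimate_pi(*params), param_sets))


# Eight tasks that differ only by seed; each could run on a different machine.
param_sets = [(seed, 20_000) for seed in range(8)]
estimates = run_sweep(param_sets)
pi_estimate = sum(estimates) / len(estimates)
```

Because no task depends on another, adding cores simply means running more of them at once, and each task is free to be tuned as aggressively as its author likes.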
Big Data is all about empowering us to experiment with our data by providing tools, query languages and scripting capabilities that are geared toward giving us a lot of agility. Tinkering with algorithms is the perfect use case. We know our data, and want to extract insights from it. This means that we’re going to clean it, shape it and question it. Big Data is built for this; it makes it possible to iterate through multiple versions of our algorithms in a way that’s difficult with Big Compute.
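The clean, shape and question loop can be sketched in a few lines of plain Python. This is only an illustration: the sensor readings are made up, the helper names clean and shape are hypothetical, and a real Big Data platform would offer a query or scripting language for the same steps at much larger volumes.

```python
from statistics import mean, median

# Made-up input data; the point is that it stays fixed while the questions change.
raw_readings = [
    {"sensor": "a", "value": "21.5"},
    {"sensor": "a", "value": ""},      # missing reading
    {"sensor": "b", "value": "19.0"},
    {"sensor": "a", "value": "22.5"},
    {"sensor": "b", "value": "95.0"},  # suspicious spike
    {"sensor": "b", "value": "20.0"},
]


def clean(rows):
    """Drop rows with no value and convert the rest to floats."""
    return [
        {"sensor": row["sensor"], "value": float(row["value"])}
        for row in rows
        if row["value"]
    ]


def shape(rows):
    """Group values by sensor so they are easy to question."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["sensor"], []).append(row["value"])
    return grouped


# Candidate "algorithms" to try against the same data; swapping one is a one-line change.
candidates = {"mean": mean, "median": median, "max": max}

grouped = shape(clean(raw_readings))
results = {
    name: {sensor: fn(values) for sensor, values in grouped.items()}
    for name, fn in candidates.items()
}
```

Comparing results["mean"] with results["median"] for sensor "b" shows how much the spike skews the average, which is exactly the kind of quick question this style of tooling is meant to make cheap.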
So now that we’ve nailed this down, which is right for your workload?
Share your thoughts in the comments below.