Open Source Cloud Guide

Big Data

What is big data?

Big data is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software. [1]

Why is this important for hybrid cloud developers?

Data is growing larger, more complex, and faster-moving than ever, a trend largely spurred by the widespread adoption of cloud computing.

With so many computers around the world connecting and transmitting data via the cloud, the challenge for developers is how to harness enormous amounts of data in a meaningful way.

Solution Sketch

  • Batch data processing with Spark
  • Real-time data processing with Kafka
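
The two processing styles listed above differ mainly in when results become available. The following is a minimal sketch in plain Python, not Spark or Kafka code: the function names `batch_word_count` and `streaming_word_count` are hypothetical, and the in-memory list of log lines stands in for a stored data set (batch) or a topic of incoming events (streaming).

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def batch_word_count(records: List[str]) -> Dict[str, int]:
    """Batch style: the whole data set is available, so count it in one job."""
    counts: Counter = Counter()
    for line in records:
        counts.update(line.lower().split())
    return dict(counts)


def streaming_word_count(events: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Streaming style: keep running state and emit a snapshot after each event."""
    counts: Counter = Counter()
    for event in events:
        counts.update(event.lower().split())
        # A result is available immediately, before the stream has ended.
        yield dict(counts)


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    logs = ["error disk full", "info job started", "error network down"]

    # Batch: one answer after all records are read.
    print(batch_word_count(logs)["error"])

    # Streaming: an updated answer after every record.
    for snapshot in streaming_word_count(iter(logs)):
        print(snapshot.get("error", 0))
```

In a real deployment, a Spark job would typically play the role of the batch function over distributed storage, and a Kafka consumer would feed events into logic resembling the streaming function. The sketch only illustrates the difference in processing model.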

Limitations

  • TBD

Key open source projects

Cloud comparison

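| Component         | IBM Cloud        | GCP      | AWS | Azure      | Operator       |
|-------------------|------------------|----------|-----|------------|----------------|
| Managed Spark     | Analytics Engine | Dataproc | EMR | HDInsight  | Spark Operator |
| Stream Processing | Event Streams    | Pub/Sub  | MSK | Event Hubs | Kafka Operator |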

Additional resources

Blogs

Videos

Tutorials