When it comes to loading data into Apache Hadoop™, the de facto choice for bulk loads from leading relational databases is Apache Sqoop™. After entering the Apache Incubator in 2011, it quickly saw widespread adoption and development, graduating to a Top-Level Project (TLP) in 2012.
In StreamSets Data Collector (SDC) 2.7 we added capabilities that enable SDC to behave almost identically to Sqoop. Customers can now use SDC to modernize Sqoop-like workloads, performing the same load functions while gaining the ease of use and flexibility that SDC delivers.
Clarke: How to Convert Apache Sqoop™ Commands Into StreamSets Data Collector Pipelines
Three months into my journey here at StreamSets, I’ve had a chance to talk with many of our customers and prospects to understand how they are using the open source StreamSets Data Collector (SDC) across a number of different use cases. As it turns out, behind the technical problems they solve in areas such as cybersecurity, IoT, or plain old data lake ingestion lies a treasure trove of value that IT teams realize as part of a typical deployment. While this is not an exhaustive list, let’s take a quick look at some of the more common benefits our customers call out.
Clarke: Straight from Our Customers: The Benefits of Modern Ingestion