
StreamSets Data Integration Blog

Where change is welcome.

Synchronize HDFS Data into S3 Using the Hadoop FS Standalone Origin

By July 10, 2018

Introduction: from HDFS Data to S3

I am very excited to announce the new Hadoop FS Standalone origin in StreamSets Data Collector. Data Collector has long supported the Hadoop FS origin, but only in cluster mode. The Hadoop FS (HDFS) Standalone origin does not need MapReduce or YARN installed, and it can run in multithreaded mode, with each thread reading one file at a time so that several files are processed in parallel.
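To make the "one file per thread" idea concrete, here is a minimal Python sketch of the general pattern. It is not how Data Collector implements the origin: it reads from a local temporary directory instead of HDFS, and the names `read_file`, `read_directory`, `num_threads`, and `demo` are all hypothetical, chosen only for illustration.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_file(path: Path) -> tuple[str, int]:
    """Read one file from start to finish and return its name and line count."""
    with path.open("r", encoding="utf-8") as handle:
        return path.name, sum(1 for _ in handle)


def read_directory(directory: Path, num_threads: int = 4) -> dict[str, int]:
    """Read every file in `directory`; each worker thread handles one file at a time."""
    files = sorted(p for p in directory.iterdir() if p.is_file())
    # The pool hands each file to a free worker, so at most `num_threads`
    # files are being read at any moment.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return dict(pool.map(read_file, files))


def demo() -> dict[str, int]:
    """Create three small files in a temporary directory and read them with two threads."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for i in range(3):
            (root / f"part-{i}.txt").write_text("a\nb\n" * (i + 1), encoding="utf-8")
        return read_directory(root, num_threads=2)


if __name__ == "__main__":
    print(demo())
```

In this sketch the thread count caps how many files are open at once, which is the same trade-off a multithreaded origin exposes: more threads mean more files in flight, at the cost of more memory and more concurrent connections to the file system.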
