Synchronize HDFS Data into S3 Using the Hadoop FS Standalone Origin
Introduction: from HDFS Data to S3
I am very excited to announce the new Hadoop FS Standalone origin in StreamSets Data Collector 3.2.0.0. Data Collector has long supported the Hadoop FS origin, but only in cluster mode. The Hadoop FS (HDFS) Standalone origin does not need MapReduce or YARN installed, and it can run in multithreaded mode: each thread reads one file at a time, so several files are processed in parallel.
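To make the "one file per thread at a time" model concrete, here is a minimal conceptual sketch in Python. It is not Data Collector's implementation and uses no StreamSets or HDFS API. Local temporary files stand in for HDFS files, and the names `read_file` and `read_files_in_parallel` are hypothetical. A fixed pool of threads receives one task per file. Because a pool thread runs one task at a time, each thread reads exactly one file at a time while up to `num_threads` files are read concurrently.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_file(path: str) -> tuple[str, int]:
    """Hypothetical per-file task: read one whole file, return (path, line count)."""
    with open(path, "r", encoding="utf-8") as f:
        return path, sum(1 for _ in f)


def read_files_in_parallel(paths: list[str], num_threads: int = 4) -> dict[str, int]:
    """Read many files with a fixed number of threads.

    One task is submitted per file, and each pool thread runs one task at a
    time, so every thread handles one file at a time while up to
    `num_threads` files are read concurrently.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return dict(pool.map(read_file, paths))


if __name__ == "__main__":
    # Local files stand in for HDFS files in this sketch.
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(5):
            p = os.path.join(tmp, f"part-{i}.txt")
            with open(p, "w", encoding="utf-8") as f:
                f.write("record\n" * (i + 1))
            paths.append(p)
        counts = read_files_in_parallel(paths, num_threads=2)
        for p in paths:
            print(os.path.basename(p), counts[p])
```

Handing out whole files, rather than splitting a file across threads, keeps each file's records in order within a single thread. In the sketch, the number of threads is the only knob that controls parallelism.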