The DataOps Blog

Where Change Is Welcome

Synchronize HDFS Data into S3 Using the Hadoop FS Standalone Origin

By July 10, 2018

Introduction

I am very excited to announce the new Hadoop FS Standalone origin in StreamSets Data Collector 3.2.0.0. Data Collector has long supported the Hadoop FS origin, but only in cluster mode. The new Hadoop FS (HDFS) Standalone origin does not require MapReduce or YARN to be installed, and it can run in multithreaded mode, where the threads work in parallel and each thread reads one file at a time.
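
To make the "one file per thread, in parallel" idea concrete, here is a minimal conceptual sketch in Python. It is not how Data Collector is implemented, and it reads from a local directory rather than HDFS. The names `read_file` and `read_directory` are hypothetical and exist only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_file(path: Path) -> tuple[str, int]:
    """Read a single file from start to finish; return its name and line count."""
    with path.open("r", encoding="utf-8") as handle:
        return path.name, sum(1 for _ in handle)


def read_directory(directory: Path, num_threads: int) -> dict[str, int]:
    """Process every file in `directory` using a pool of `num_threads` workers.

    Each worker thread handles one whole file at a time, so at most
    `num_threads` files are being read at any moment.
    (Illustrative only: a local directory stands in for an HDFS path.)
    """
    files = sorted(p for p in directory.iterdir() if p.is_file())
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return dict(pool.map(read_file, files))
```

Calling `read_directory(Path("/some/dir"), 4)` would read up to four files concurrently. Raising the thread count only helps while there are more files waiting than threads available.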