Install Additional Stage Libraries

Install additional stage libraries to use stages that are not included in the core installation of Data Collector. This step is optional, but after completing a core installation you generally need additional stage libraries to process data.

Important: You must perform additional steps to install the MapR stage libraries, as described in MapR Prerequisites.

For a complete list of the stages installed with each stage library, see Available Stage Libraries.

You can install additional RPM stage libraries using the Data Collector command line program.

You can install additional tarball stage libraries using the Package Manager within Data Collector or using the Data Collector command line program.

Installing for RPM

Use the following commands to install additional stage libraries for an RPM installation:

To install one or more stage libraries:
Use the following command to install the stage libraries downloaded to the current directory:
yum localinstall <libraryID>-<version>-1.noarch.rpm,<libraryID>-<version>-1.noarch.rpm,...
Use the full file name of each library package that you want to install, separating the file names with commas. Do not include spaces within the comma-separated list.
For example, to install the Amazon S3 origin and destination, as well as the Kudu destination for Data Collector version 2.6.0.0, use the following command:
yum localinstall streamsets-datacollector-aws-lib-2.6.0.0-1.noarch.rpm,streamsets-datacollector-apache-kudu_1_0-lib-2.6.0.0-1.noarch.rpm 
To list the stage libraries installed on the current Data Collector:
Use the following command:
yum list installed | grep streamsets
To uninstall libraries when necessary:
Use the following command:
yum remove <libraryID>,<libraryID>,...
Use the full name of each library that you want to uninstall, separating the names with commas. Do not include spaces within the comma-separated list.
For example, to uninstall the Amazon S3 origin and destination, use the following command:
yum remove streamsets-datacollector-aws-lib
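
The package file names above follow the <libraryID>-<version>-1.noarch.rpm pattern, so the install command can be assembled from a version and a list of library IDs. The following is a minimal sketch under that assumption. The function names rpm_file_name and rpm_file_list are hypothetical helpers, not part of Data Collector, and the script only prints the command so that you can review it before running it:

```shell
# Build the package file name for one stage library, following the
# <libraryID>-<version>-1.noarch.rpm pattern described above.
# (rpm_file_name is a hypothetical helper name.)
rpm_file_name() {
    printf '%s-%s-1.noarch.rpm' "$1" "$2"
}

# Join the package file names for several libraries with commas and no
# spaces. The first argument is the version, the rest are library IDs.
# (rpm_file_list is a hypothetical helper name.)
rpm_file_list() {
    version=$1
    shift
    list=
    for id in "$@"; do
        list="${list:+$list,}$(rpm_file_name "$id" "$version")"
    done
    printf '%s\n' "$list"
}

# Print (rather than run) the resulting command so it can be reviewed first.
echo "yum localinstall $(rpm_file_list 2.6.0.0 \
    streamsets-datacollector-aws-lib \
    streamsets-datacollector-apache-kudu_1_0-lib)"
```

For the two libraries shown, the printed command matches the Amazon S3 and Kudu example above.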

Installing for Tarball Using the Package Manager

You can use the Package Manager within Data Collector to install additional tarball stage libraries.

Complete one of the following steps to display the Package Manager:

  • Click the Package Manager icon.

  • Click Add/Remove Stages in the stage library when viewing a pipeline in the pipeline canvas.

The Package Manager lists all available stage libraries, displaying a check mark next to each installed library. You can filter the stage libraries by type or you can search for a stage library in the list.

For example, the following image displays three installed stage libraries:

To install an additional stage library, click the More icon for the library, and then click Install. To install multiple stage libraries at once, select the libraries in the list and then click the Install icon. Confirm that you want to install the libraries, and then restart Data Collector for the changes to take effect.

To uninstall a stage library, click the More icon for the library, and then click Uninstall. To uninstall multiple stage libraries at once, select the libraries in the list and then click the Uninstall icon. Confirm that you want to uninstall the libraries, and then restart Data Collector for the changes to take effect.

Note: If Data Collector does not have internet connectivity, you can view the installed stage libraries and uninstall them. However, you cannot view the full list of available stage libraries or install an additional stage library.

For information about the stages installed with each stage library, see Available Stage Libraries.

Installing for Tarball Using the Command Line

You can use the stagelibs command to install additional tarball stage libraries.

The stagelibs command requires curl version 7.18.1 or later and the sha1sum utility. Verify that both are installed on the machine before running the command.
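
One way to verify these prerequisites is a short shell check such as the sketch below. The function names version_ge and check_stagelibs_prereqs are hypothetical helpers, not part of Data Collector. The check assumes that the first line of the curl --version output has the form "curl <version> ...", which is how curl normally reports its version:

```shell
# Succeeds when dotted version $1 is greater than or equal to dotted
# version $2. Compares numeric components left to right; a missing
# component counts as 0. (version_ge is a hypothetical helper name.)
version_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}
        y=${b%%.*}
        [ -n "$x" ] || x=0
        [ -n "$y" ] || y=0
        if [ "$x" -gt "$y" ]; then return 0; fi
        if [ "$x" -lt "$y" ]; then return 1; fi
        # Drop the component just compared; stop when no dot remains.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0
}

# Reports whether curl (7.18.1 or later) and sha1sum are available.
# (check_stagelibs_prereqs is a hypothetical helper name.)
check_stagelibs_prereqs() {
    for tool in curl sha1sum; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            return 1
        fi
    done
    # Take the second field of the first line and cut any non-numeric suffix.
    curl_version=$(curl --version | awk 'NR == 1 { print $2 }' | sed 's/[^0-9.].*$//')
    if ! version_ge "$curl_version" 7.18.1; then
        echo "curl $curl_version is older than 7.18.1"
        return 1
    fi
    echo "ok: curl $curl_version and sha1sum found"
}

check_stagelibs_prereqs || echo "install the missing prerequisites before running stagelibs"
```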

Use the following commands to install additional tarball libraries:
To view the list of available libraries:
Use the following command:
$SDC_DIST/bin/streamsets stagelibs -list
This provides a list of all available stage libraries and whether they are already installed. For more information about the stages installed with each stage library, see Available Stage Libraries.
To install one or more stage libraries:
Use the following command:
$SDC_DIST/bin/streamsets stagelibs -install=<libraryID>,<libraryID>,...
Use the full name of each library that you want to install, separating the names with commas. Do not include spaces within the comma-separated list.
For example, to install the Amazon S3 origin and destination, as well as the Cassandra destination, use the following command:
$SDC_DIST/bin/streamsets stagelibs -install\
=streamsets-datacollector-aws-lib,streamsets-datacollector-cassandra_3-lib
When successful, the command reports each download and confirms each installed library. For example, installing the aws, jdbc, and rabbitmq stage libraries produces output similar to the following:
Downloading: https://archives.streamsets.com/datacollector/<version>/tarball\
/streamsets-datacollector-aws-lib-<version>-SNAPSHOT.tgz
######################################################################## 100.0%
Downloading: https://archives.streamsets.com/datacollector/<version>/tarball\
/streamsets-datacollector-jdbc-lib-<version>-SNAPSHOT.tgz
######################################################################## 100.0%
Downloading: https://archives.streamsets.com/datacollector/<version>/tarball\
/streamsets-datacollector-rabbitmq-lib-<version>-SNAPSHOT.tgz
######################################################################## 100.0%

Stage library streamsets-datacollector-aws-lib installed
Stage library streamsets-datacollector-jdbc-lib installed
Stage library streamsets-datacollector-rabbitmq-lib installed
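
Because the -install option takes a comma-separated list with no spaces, it can help to build the argument from individual library IDs in a script. The following is a minimal sketch of that idea. The function name build_install_arg is a hypothetical helper, not part of Data Collector, and the script only prints the final command instead of running it:

```shell
# Build the -install=<libraryID>,<libraryID>,... argument from the given
# library IDs. Fails when no IDs are given or when an ID is empty or
# contains whitespace. (build_install_arg is a hypothetical helper name.)
build_install_arg() {
    [ "$#" -gt 0 ] || { echo "no library IDs given" >&2; return 1; }
    arg=
    for id in "$@"; do
        case $id in
            ''|*[[:space:]]*)
                echo "invalid library ID: '$id'" >&2
                return 1
                ;;
        esac
        arg="${arg:+$arg,}$id"
    done
    printf '%s%s\n' '-install=' "$arg"
}

# Print the command for review; $SDC_DIST is left unexpanded on purpose.
printf '%s %s\n' '$SDC_DIST/bin/streamsets stagelibs' \
    "$(build_install_arg streamsets-datacollector-aws-lib streamsets-datacollector-jdbc-lib)"
```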
To generate the command required to perform the current installation (optional):
You can use the stagelibs command to generate an install command for all of the stage libraries installed on the current Data Collector. This allows you to easily replicate the installation elsewhere.
For example, say you installed three libraries above, and then installed another two. You can generate the command required to install all five libraries on additional machines.
To generate an installation script based on the current Data Collector installation, use the following command:
$SDC_DIST/bin/streamsets stagelibs -installScript
The command returns an install command, such as the following:
=================================================================================
streamsets stagelibs -install=streamsets-datacollector-apache-kafka_0_8_1-lib,\
streamsets-datacollector-aws-lib,streamsets-datacollector-basic-lib,\
streamsets-datacollector-cdh_kafka_1_3-lib,streamsets-datacollector-jdbc-lib,\
streamsets-datacollector-jython_2_7-lib,streamsets-datacollector-rabbitmq-lib
=================================================================================
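
To reuse the generated command in a script, the separator lines and line continuations can be stripped so that the command fits on a single line. The sketch below assumes that the output has exactly the format shown above: separator lines made only of = characters, and a command wrapped with trailing backslashes. The function name extract_install_command is a hypothetical helper, not part of Data Collector:

```shell
# Read -installScript style output on stdin and print the install command
# as one line: drop the ===== separator lines and any empty lines, strip
# trailing backslashes, and join the remaining lines.
# (extract_install_command is a hypothetical helper name.)
extract_install_command() {
    grep -v '^=*$' | sed 's/\\$//' | tr -d '\n'
    echo
}

# Demonstration on sample text in the format shown above.
extract_install_command <<'EOF'
=================================================================================
streamsets stagelibs -install=streamsets-datacollector-aws-lib,\
streamsets-datacollector-jdbc-lib
=================================================================================
EOF
```

In practice, you would pipe the output of the -installScript command into the function instead of using sample text. Because the generated command starts with streamsets rather than a full path, you might need to adjust the path on the target machine.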
To uninstall libraries when necessary:
Use the following command:
$SDC_DIST/bin/streamsets stagelibs -uninstall=<libraryID>,<libraryID>,...
Use the full name of each library that you want to uninstall, separating the names with commas. Do not include spaces within the comma-separated list.

Available Stage Libraries

The following table describes the stages installed with each stage library:
Stage Library Name Included Stages
streamsets-datacollector-apache-kafka_0_8_1-lib For Kafka version 0.8.1.
Includes:
  • HTTP to Kafka origin
  • Kafka Consumer origin
  • SDC RPC to Kafka origin
  • UDP to Kafka origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_0_8_2-lib For Kafka version 0.8.2.
Includes:
  • HTTP to Kafka origin
  • Kafka Consumer origin
  • SDC RPC to Kafka origin
  • UDP to Kafka origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_0_9-lib For Kafka version 0.9.
Includes:
  • HTTP to Kafka origin
  • Kafka Consumer origin
  • SDC RPC to Kafka origin
  • UDP to Kafka origin
  • Kafka Producer destination
streamsets-datacollector-apache-kafka_0_10-lib For Kafka version 0.10.
Includes:
  • HTTP to Kafka origin
  • Kafka Consumer origin
  • SDC RPC to Kafka origin
  • UDP to Kafka origin
  • Kafka Producer destination
streamsets-datacollector-apache-kudu_1_0-lib For Kudu version 1.0.x.

Includes the Kudu destination.

streamsets-datacollector-apache-kudu_1_1-lib For Kudu version 1.1.x.

Includes the Kudu destination.

streamsets-datacollector-apache-kudu_1_2-lib For Kudu version 1.2.x.

Includes the Kudu destination.

streamsets-datacollector-apache-kudu_1_3-lib For Kudu version 1.3.x.

Includes the Kudu destination.

streamsets-datacollector-apache-solr_6_1_0-lib For Apache Solr version 6.1.

Includes the Solr destination.

streamsets-datacollector-aws-lib For Amazon Web Services 1.10.
Includes:
  • Amazon S3 origin
  • Kinesis Consumer origin
  • Amazon S3 destination
  • Kinesis Firehose destination
  • Kinesis Producer destination
streamsets-datacollector-azure-lib For Microsoft Azure Data Lake Store.

Includes the Azure Data Lake Store destination.

streamsets-datacollector-basic-lib Installs automatically with the core installation.
Includes the following origins:
  • CoAP Server
  • Directory
  • File Tail
  • HTTP Client
  • HTTP Server
  • MQTT Subscriber
  • SDC RPC
  • SFTP/FTP Client
  • TCP Server
  • UDP Source
  • WebSocket Server

Includes all processors except the Groovy Evaluator, Jython Evaluator, HBase Lookup, Redis Lookup, and Spark Evaluator.

Includes the following destinations:
  • CoAP Client
  • HTTP Client
  • Local FS
  • MQTT Publisher
  • SDC RPC
  • To Error
  • Trash
  • WebSocket Client
Includes the following executors:
  • Email
  • Pipeline Finisher
  • Shell
streamsets-datacollector-bigtable-lib For Google Cloud Bigtable.

Includes the Google Bigtable destination.

streamsets-datacollector-cassandra_3-lib For Cassandra 1.2, 2.x, and 3.x.

Includes the Cassandra destination.

streamsets-datacollector-cdh_5_2-lib

For the Cloudera CDH version 5.2 distribution of Hadoop.

Includes:

  • Hadoop FS origin for cluster mode pipelines
  • HBase Lookup processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Solr destination
  • HDFS File Metadata executor
  • MapReduce executor
streamsets-datacollector-cdh_5_3-lib

For the Cloudera CDH version 5.3 distribution of Hadoop.

Includes:

  • Hadoop FS origin for cluster mode pipelines
  • HBase Lookup processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Solr destination
  • HDFS File Metadata executor
  • MapReduce executor
streamsets-datacollector-cdh_5_4-lib

For the Cloudera CDH version 5.4 distribution of Hadoop.

Includes:

  • Hadoop FS origin for cluster mode pipelines
  • HBase Lookup processor
  • Hive Metadata processor
  • Spark Evaluator processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
streamsets-datacollector-cdh_5_5-lib

For the Cloudera CDH version 5.5 distribution of Hadoop.

Includes:

  • Hadoop FS origin for cluster mode pipelines
  • HBase Lookup processor
  • Hive Metadata processor
  • Spark Evaluator processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
streamsets-datacollector-cdh_5_7-lib

For the Cloudera CDH version 5.7 distribution of Hadoop.

Includes:

  • Hadoop FS origin for cluster mode pipelines
  • HBase Lookup processor
  • Hive Metadata processor
  • Spark Evaluator processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
  • Spark executor
streamsets-datacollector-cdh_5_8-lib

For the Cloudera CDH version 5.8 distribution of Hadoop.

Includes:

  • Hadoop FS origin for cluster mode pipelines
  • HBase Lookup processor
  • Hive Metadata processor
  • Spark Evaluator processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
  • Spark executor
streamsets-datacollector-cdh_5_9-lib

For the Cloudera CDH version 5.9 distribution of Hadoop.

Includes:

  • Hadoop FS origin for cluster mode pipelines
  • HBase Lookup processor
  • Hive Metadata processor
  • Spark Evaluator processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
  • Spark executor
streamsets-datacollector-cdh_5_10-lib

For the Cloudera CDH version 5.10 distribution of Hadoop.

Includes:

  • Hadoop FS origin for cluster mode pipelines
  • HBase Lookup processor
  • Hive Metadata processor
  • Spark Evaluator processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
  • Spark executor
streamsets-datacollector-cdh_5_11-lib

For the Cloudera CDH version 5.11 distribution of Hadoop.

Includes:

  • Hadoop FS origin for cluster mode pipelines
  • HBase Lookup processor
  • Hive Metadata processor
  • Spark Evaluator processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Solr destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
  • Spark executor
streamsets-datacollector-cdh_5_4-cluster-cdh_kafka_1_2-lib

For the Cloudera version 5.4 distribution of Apache Kafka 1.2.

Includes the Kafka Consumer origin for cluster mode pipelines.

streamsets-datacollector-cdh_5_4-cluster-cdh_kafka_1_3-lib

For the Cloudera version 5.4 distribution of Apache Kafka 1.3.

Includes the Kafka Consumer origin for cluster mode pipelines.

streamsets-datacollector-cdh_5_5-cluster-cdh_kafka_1_3-lib

For the Cloudera version 5.5 distribution of Apache Kafka 1.3.

Includes the Kafka Consumer origin for cluster mode pipelines.

streamsets-datacollector-cdh_5_7-cluster-cdh_kafka_2_0-lib

For the Cloudera version 5.7 distribution of Apache Kafka 2.0.

Includes the Kafka Consumer origin for cluster mode pipelines.
streamsets-datacollector-cdh_5_8-cluster-cdh_kafka_2_0-lib

For the Cloudera version 5.8 distribution of Apache Kafka 2.0.

Includes the Kafka Consumer origin for cluster mode pipelines.

streamsets-datacollector-cdh_5_9-cluster-cdh_kafka_2_0-lib

For the Cloudera version 5.9 distribution of Apache Kafka 2.0.

Includes the Kafka Consumer origin for cluster mode pipelines.

streamsets-datacollector-cdh_5_10-cluster-cdh_kafka_2_1-lib

For the Cloudera version 5.10 distribution of Apache Kafka 2.1.

Includes the Kafka Consumer origin for cluster mode pipelines.

streamsets-datacollector-cdh_5_11-cluster-cdh_kafka_2_1-lib

For the Cloudera version 5.11 distribution of Apache Kafka 2.1.

Includes the Kafka Consumer origin for cluster mode pipelines.

streamsets-datacollector-cdh_kafka_1_2-lib For the Cloudera distribution of Apache Kafka 1.2 (0.8.2.0).
Includes:
  • HTTP to Kafka origin
  • Kafka Consumer origin
  • SDC RPC to Kafka origin
  • UDP to Kafka origin
  • Kafka Producer destination
streamsets-datacollector-cdh_kafka_1_3-lib For the Cloudera distribution of Apache Kafka 1.3 (0.8.2.0).
Includes:
  • HTTP to Kafka origin
  • Kafka Consumer origin
  • SDC RPC to Kafka origin
  • UDP to Kafka origin
  • Kafka Producer destination
streamsets-datacollector-cdh_kafka_2_0-lib For the Cloudera distribution of Apache Kafka 2.0 (0.9.0).
Includes:
  • HTTP to Kafka origin
  • Kafka Consumer origin
  • SDC RPC to Kafka origin
  • UDP to Kafka origin
  • Kafka Producer destination
streamsets-datacollector-cdh_kafka_2_1-lib For the Cloudera distribution of Apache Kafka 2.1 (0.10.0).
Includes:
  • HTTP to Kafka origin
  • Kafka Consumer origin
  • SDC RPC to Kafka origin
  • UDP to Kafka origin
  • Kafka Producer destination
streamsets-datacollector-cdh_spark_2_1_r1-lib For the Cloudera distribution of Spark 2.1.
Includes:
  • Spark Evaluator processor
  • Spark executor
streamsets-datacollector-elasticsearch_5-lib For Elasticsearch 1.x, 2.x, and 5.x.

Includes the Elasticsearch origin and destination.

streamsets-datacollector-groovy_2_4-lib For Groovy version 2.4.

Includes the Groovy Evaluator processor.

streamsets-datacollector-hdp_2_2-lib For the Hortonworks version 2.2 distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • HTTP to Kafka origin
  • Kafka Consumer origin
  • SDC RPC to Kafka origin
  • UDP to Kafka origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Kafka Producer destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
streamsets-datacollector-hdp_2_3-lib For the Hortonworks version 2.3 distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • HTTP to Kafka origin
  • Kafka Consumer origin
  • SDC RPC to Kafka origin
  • UDP to Kafka origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Kafka Producer destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
streamsets-datacollector-hdp_2_4-lib For the Hortonworks version 2.4 distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • HTTP to Kafka origin
  • Kafka Consumer origin for standalone and cluster mode pipelines
  • SDC RPC to Kafka origin
  • UDP to Kafka origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Kafka Producer destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
streamsets-datacollector-hdp_2_5-lib For the Hortonworks version 2.5 distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • HTTP to Kafka origin
  • Kafka Consumer origin for standalone and cluster mode pipelines
  • SDC RPC to Kafka origin
  • UDP to Kafka origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Kafka Producer destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
streamsets-datacollector-hdp_2_6-lib For the Hortonworks version 2.6 distribution of Apache Hadoop.
Includes:
  • Hadoop FS origin for cluster mode pipelines
  • HTTP to Kafka origin
  • Kafka Consumer origin for standalone and cluster mode pipelines
  • SDC RPC to Kafka origin
  • UDP to Kafka origin
  • HBase Lookup processor
  • Hive Metadata processor
  • Flume destination
  • Hadoop FS destination
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination
  • Kafka Producer destination
  • HDFS File Metadata executor
  • Hive Query executor
  • MapReduce executor
streamsets-datacollector-influxdb-0_9-lib For InfluxDB version 0.9 or greater.

Includes the InfluxDB destination.

streamsets-datacollector-jdbc-lib

For JDBC access to databases.

Includes:
  • JDBC Multitable Consumer origin
  • JDBC Query Consumer origin
  • Oracle CDC Client origin
  • JDBC Lookup processor
  • JDBC Tee processor
  • JDBC Producer destination
  • JDBC Query executor
streamsets-datacollector-jms-lib For Java Message Service (JMS).

Includes the JMS Consumer origin.

streamsets-datacollector-jython_2_7-lib For Jython version 2.7.

Includes the Jython Evaluator processor.

streamsets-datacollector-mapr_5_0-lib For MapR version 5.0.

Includes the MapR FS destination.

streamsets-datacollector-mapr_5_1-lib For MapR version 5.1.

Includes:

  • MapR DB JSON origin
  • MapR FS origin for cluster mode pipelines
  • MapR Streams Consumer origin for standalone and cluster mode pipelines
  • HBase Lookup processor
  • Hive Metadata processor
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination using the MapR library
  • MapR Streams Producer destination
  • MapR DB destination
  • MapR DB JSON destination
  • MapR FS destination
streamsets-datacollector-mapr_5_2-lib For MapR version 5.2.

Includes:

  • MapR DB JSON origin
  • MapR FS origin for cluster mode pipelines
  • MapR Streams Consumer origin for standalone and cluster mode pipelines
  • HBase Lookup processor
  • Hive Metadata processor
  • Spark Evaluator processor
  • HBase destination
  • Hive Metastore destination
  • Hive Streaming destination using the MapR library
  • MapR Streams Producer destination
  • MapR DB destination
  • MapR DB JSON destination
  • MapR FS destination
streamsets-datacollector-mapr_spark_2_1_mep_3_0-lib For the MapR distribution of Spark 2.1.
Includes:
  • Spark Evaluator processor
  • Spark executor
streamsets-datacollector-mongodb_3-lib For MongoDB 3.0.

Includes the MongoDB and MongoDB Oplog origins, and the MongoDB destination.

streamsets-datacollector-mysql-binlog-lib For MySQL binary logs.

Includes the MySQL Binary Log origin.

streamsets-datacollector-omniture-lib For Omniture.

Includes the Omniture origin.

streamsets-datacollector-rabbitmq-lib For RabbitMQ version 3.5.6.

Includes the RabbitMQ Consumer origin and RabbitMQ Producer destination.

streamsets-datacollector-redis-lib For Redis versions 2.8 and 3.0.
Includes:
  • Redis Consumer origin
  • Redis Lookup processor
  • Redis destination
streamsets-datacollector-salesforce-lib

For Salesforce.

Includes:
  • Salesforce origin
  • Salesforce Lookup processor
  • Salesforce destination
  • Wave Analytics destination
streamsets-datacollector-stats-lib Dataflow Performance Manager (DPM) requires that the statistics stage library be installed on each registered Data Collector.