
The DataOps Blog

Where Change Is Welcome

Announcing the Newest Version of Data Collector

Posted in Engineering on September 1, 2016

It’s been a busy summer here at StreamSets: we’ve been enabling exciting use cases for our customers, partners, and the community of open-source users all over the world. We are excited to announce the newest version of the StreamSets Data Collector.

This version has a host of new features and over 100 bug fixes.

Download it now.

New Features

  • Whole file transfer – You can now use the Data Collector to bring any type of binary data into your data lake in Hadoop, or move files between on-prem or cloud systems.
  • High throughput writes to the S3 destination – When writing files to multiple partitions in S3, throughput scales linearly with the number of threads allocated in the thread pool.
  • Enterprise security in the MongoDB origin and destination including SSL and login credentials.
  • Enterprise security in the Solr destination including Kerberos authentication.
  • Simplified getting started for the MapR integration – Run a simple command to automatically configure MapR binaries with the Data Collector and get up and running in seconds.
  • Our MapR integrations now support a powerful Data Collector feature: automatic updates to Hive/Impala schemas based on changing data.
  • HTTP Client/Lookup processor can now add response headers to the data record.
  • Field Converter processor (now called Field Type Converter) can now convert fields en masse by field name or by data type.
  • New List Pivoter processor that pivots List datatypes.
  • New JDBC Lookup processor that performs in-stream lookups/enrichment from relational databases.
  • New JDBC Tee processor that writes data to relational databases and reads back additional columns to enrich the record in-stream.
  • Reading from JDBC sources no longer requires WHERE or ORDER BY clauses.
  • The HTTP origin now supports reading from paginated web pages, one-shot batch transfers, and compressed and archive files.
  • Smaller installer packages – We previously introduced a small Core tarball that lets you install individual stages manually. We’ve now extended this concept to our RPM packages: you can install the smaller Core RPM Data Collector package, then add individual stages as needed.
  • Updates to the Kafka Consumer to generate a record per message (datagram) for collectd, netflow and syslog data.
  • Updated versions on the following integrations: Apache Kafka 0.10, Cassandra 3.x, Cloudera CDH 5.8, Elasticsearch 2.3.5.
  • New expression language (EL) functions to trim the time portion of Date/Time fields.
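To give a feel for what the new JDBC Lookup processor does, here is a minimal Python sketch of in-stream lookup enrichment, with an in-memory SQLite table standing in for the relational database (the table, column, and field names here are hypothetical, not part of the product):

```python
import sqlite3

# Hypothetical reference table standing in for a JDBC source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])

def enrich(record, conn):
    """Look up the customer name for a record's customer_id and merge it in,
    the way a lookup processor enriches each record as it flows through."""
    row = conn.execute(
        "SELECT name FROM customers WHERE id = ?",
        (record["customer_id"],),
    ).fetchone()
    return {**record, "customer_name": row[0] if row else None}

# A small "stream" of records to enrich.
stream = [{"order": 101, "customer_id": 1},
          {"order": 102, "customer_id": 2}]
enriched = [enrich(r, conn) for r in stream]
# enriched[0]["customer_name"] == "Acme"
```

In the Data Collector itself this is configured in the pipeline UI rather than in code; the sketch only illustrates the lookup-and-merge pattern the processor applies per record.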

Download the Data Collector to get started now. Visit Documentation for more details.

