Spark Streaming + Data Collector + Secure Kafka
When we first introduced cluster streaming mode several years ago, with Apache Spark Streaming 1.3 and Apache Kafka 0.8, Kafka didn’t support security features such as TLS (transport encryption and authentication) and Kerberos (authentication). Spark 2.1 introduced an updated Kafka connector that supports these features when used with Kafka 0.10 or newer.
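To make the TLS and Kerberos options above a little more concrete, here is a minimal sketch of the client-side properties a secured Kafka 0.10+ consumer typically needs. The property keys are standard Kafka client configuration names; the helper function, broker address, group name, truststore path, and password are hypothetical placeholders, and whether Data Collector exposes these settings under exactly these names is an assumption.

```python
def secure_kafka_params(bootstrap_servers, group_id,
                        truststore_path, truststore_password,
                        use_kerberos=True, kerberos_service_name="kafka"):
    """Build a dict of Kafka client properties for TLS, optionally with Kerberos.

    Hypothetical helper for illustration only; the keys are standard
    Kafka client configuration names, the values are placeholders.
    """
    params = {
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,
        # The truststore lets the client verify the brokers' TLS certificates.
        "ssl.truststore.location": truststore_path,
        "ssl.truststore.password": truststore_password,
    }
    if use_kerberos:
        # SASL_SSL = Kerberos (GSSAPI) authentication over a TLS-encrypted channel.
        params["security.protocol"] = "SASL_SSL"
        params["sasl.mechanism"] = "GSSAPI"
        params["sasl.kerberos.service.name"] = kerberos_service_name
    else:
        # SSL = TLS encryption only, with no SASL authentication layer.
        params["security.protocol"] = "SSL"
    return params


if __name__ == "__main__":
    # Placeholder values; substitute your own brokers and truststore.
    props = secure_kafka_params(
        "broker1.example.com:9093", "my-consumer-group",
        "/path/to/truststore.jks", "changeit",
    )
    for key in sorted(props):
        print(f"{key}={props[key]}")
```

Kerberos also requires a JAAS login configuration (principal and keytab) on the client side, which this sketch leaves out.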
In Data Collector 220.127.116.11 (est. mid to late May 2018) we’ll be introducing support for Kafka TLS and Kerberos in cluster streaming mode! However, this also means that we’ll be deprecating (in 18.104.22.168) and removing (in 22.214.171.124) support for Spark 1.x. If you want to continue using cluster streaming execution mode, you’ll need to have Spark 2.x available.
Currently, all major Hadoop distribution vendors provide a way for Spark 1.x and Spark 2.x to coexist on the same cluster, in case you haven’t already made the move to Spark 2.x. You can find details on Spark 2 from vendors for each supported distribution below.
We’re always working to provide support for features our users need. The need for Kafka + TLS/Kerberos in cluster execution mode was heard loud and clear. Let us know what you’d like to see in the future by sending your ideas to email@example.com!
Cloudera Distribution of Spark 2.1 release 1 will be the earliest supported release.
Spark 2.x for Cloudera CDH
Hortonworks Data Platform (HDP) has shipped with Spark 2.2.0 since version 2.6.
Hortonworks HDP 2.6 release notes
MapR provides Spark 2.x in the MapR Ecosystem Pack (MEP) 3.0 and newer.
MapR with MapR Ecosystem Pack (MEP) 3.0 and newer