
StreamSets Live: Demos with Dash — Your Questions Answered

Posted in Engineering, April 2, 2020

On March 25, 2020 I hosted my first StreamSets Live: Demos with Dash session, and I want to thank everyone who took the time to join despite what might have been going on in your lives. Times are definitely not “normal”, and more challenging for some than others, but I hope everyone is continuing to stay safe and well.

During the live session I showed three different demos covering a variety of use cases using the StreamSets DataOps Platform.

Demos Summary

  • Streaming Twitter data to Kafka, performing sentiment analysis on tweets using Microsoft’s Cognitive Services API (see the sketch after this list), and storing the scores in MySQL
  • Parsing and transforming web application logs from Amazon S3 and storing the enriched logs in the Snowflake cloud data warehouse for analysis, including location- and HTTP-response-code-based queries
  • Running ML pipelines on Apache Spark on a Databricks cluster and storing the output on Databricks File System (DBFS)
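
The sentiment scoring in the first demo relies on the Text Analytics sentiment endpoint of Microsoft’s Cognitive Services. The sketch below shows what such a call can look like; the resource URL, API key, and v3.0 endpoint path are placeholder assumptions for illustration and are not taken from the demo pipeline itself.

```python
# Illustrative call to the Microsoft Cognitive Services (Text Analytics) sentiment
# endpoint. The resource URL, API key, and v3.0 path below are placeholder assumptions.
import requests

ENDPOINT = 'https://<your-resource>.cognitiveservices.azure.com/text/analytics/v3.0/sentiment'
HEADERS = {'Ocp-Apim-Subscription-Key': '<your-key>', 'Content-Type': 'application/json'}

# One or more tweets to score, in the request format the v3.0 API expects.
tweets = {'documents': [{'id': '1', 'language': 'en', 'text': 'Loving the new release!'}]}

response = requests.post(ENDPOINT, headers=HEADERS, json=tweets)
for doc in response.json()['documents']:
    # Each scored document carries an overall sentiment label plus confidence scores.
    print(doc['id'], doc['sentiment'], doc['confidenceScores'])
```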

Your Questions Answered

Below I have answered some of the burning questions that were asked during the live demo session.

Q: “Can pipelines only be made via the GUI? or can we create pipelines in code?”
Answer: Pipelines can be created using the GUI, the REST APIs, and the StreamSets SDK for Python. For the available REST API endpoints, browse to http(s)://StreamSetsDataCollector_HOST:StreamSetsDataCollector_PORT/collector/restapi on the host where your Data Collector instance is running.
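
As an illustration (not one of the demo pipelines), here is a minimal sketch of building a pipeline in code with the StreamSets SDK for Python. The Data Collector URL, credentials, and stage labels are assumptions for the example; adjust them for your environment.

```python
# Minimal sketch: build and publish a pipeline with the StreamSets SDK for Python.
# The Data Collector URL, credentials, and stage labels are placeholder assumptions.
from streamsets.sdk import DataCollector

sdc = DataCollector('http://localhost:18630', username='admin', password='admin')

builder = sdc.get_pipeline_builder()
origin = builder.add_stage('Dev Raw Data Source')   # test origin bundled with Data Collector
destination = builder.add_stage('Trash')            # destination that discards records

origin >> destination                                # connect the two stages
pipeline = builder.build('SDK sample pipeline')

sdc.add_pipeline(pipeline)                           # publish the pipeline to Data Collector
```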

Q: “The connectors list is based on jdbc available drivers?”
Answer: There are several JDBC-based origins, processors, and destinations available as well as many others that don’t rely on JDBC drivers. For example, Kafka, Amazon S3, Azure Data Lake, Google Pub Sub, HTTP Client, Redis, Elasticsearch, UDP, WebSocket, Azure IoT Hub, etc. Refer to our documentation for a comprehensive list of origins, processors, and destinations.

Q: “How on-prem data can be transferred to for example GCP? publish to pub sub”
Answer: On-prem data can be read with any of the available origins and then written to Google Cloud Storage or Google BigQuery, or published to Google Pub Sub, using the corresponding destinations.

Q: “Is there documentation how to add the PySpark processor you mentioned?”
Answer: You don’t need to add or download it. It comes bundled with StreamSets Transformer.
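
For context, here is a minimal sketch of the kind of script the PySpark processor runs. It assumes the processor exposes the incoming DataFrame(s) as `inputs` and expects the result to be assigned to `output`, and the column names are illustrative only.

```python
# Minimal sketch of a PySpark processor script. Assumption: the processor provides the
# incoming DataFrames as `inputs` and takes the result from `output`; column names are made up.
from pyspark.sql.functions import col, upper

df = inputs[0]                                             # first upstream DataFrame
output = df.withColumn('name_upper', upper(col('name')))   # add a derived column
```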

Sample Pipelines on GitHub

You can download sample pipelines I used for the demos from GitHub.

After importing the sample pipelines, update the AWS, Snowflake, and Twitter credentials, as well as the pipeline parameters and other configuration details (such as the S3 bucket name, Kafka broker, and MySQL connection) where applicable before running the pipelines.

For the PySpark ML demo, refer to my blog post StreamSets Transformer: Natural Language Processing in PySpark. For the Scala ML demo, refer to my blog post StreamSets Transformer Extensibility: Spark and Machine Learning.

Thank you!

I hope to see you and your fellow data engineers, data scientists, and developers in my next StreamSets Live: Demos with Dash session where I will be showing a different set of demos! 🙂

Learn more about the StreamSets DataOps Platform, which is available on Microsoft Azure Marketplace and AWS Marketplace.
