Scaling Data Collectors on Azure Kubernetes Service

In this blog post, I will present a step-by-step guide to scaling Data Collector instances on Azure Kubernetes Service (AKS) using provisioning agents, which help automate upgrading and scaling resources on demand without having to stop the execution of pipeline jobs. AKS removes the complexity of implementing, installing, and maintaining Kubernetes in Azure, and you only […]

Execute Machine Learning Jobs in Microsoft Azure Databricks from StreamSets

In my previous blog post, I demonstrated how to achieve low-latency inference using Databricks ML models in StreamSets. Now let’s say you have a dataflow pipeline that ingests data, enriches it, and performs transformations, and, based on certain condition(s), you’d like to (re)train the Databricks ML model. For instance, using a different value for the hyperparameter n_estimators […]

Ingesting Data from Relational Databases to Cassandra with StreamSets

This post is summarized content from a full tutorial at https://academy.datastax.com/content/ingesting-data-relational-databases-cassandra-streamsets

How do you ingest from an existing relational database (RDBMS) to an Apache Cassandra or DataStax Enterprise cluster? What about a one-time batch loading of historical data vs. streaming changes? I know what some of you are […]

How to Bulk Load Amazon Redshift from Relational Databases with StreamSets

Overview: You have options when bulk loading data into Redshift from relational database (RDBMS) sources. These options include manual processes or using one of the numerous hosted as-a-service options. But if you have broader requirements than simply importing, you need another option. Your company may have requirements such as adhering to enterprise security policies which […]
