
The DataOps Blog

Where Change Is Welcome

StreamSets Transformer: Natural Language Processing in PySpark


In two of my previous blogs I illustrated how easily you can extend StreamSets Transformer using Scala: 1) to train a Spark ML RandomForestRegressor model, and 2) to serialize the trained model and save it to Amazon S3. In this blog, you will learn a way to train a Spark ML Logistic Regression model for Natural Language Processing (NLP) using PySpark…

By December 12, 2019

Announcing StreamSets Data Collector 3.12.0 and StreamSets Data Collector Edge 3.12.0

Engineering, StreamSets News

StreamSets is excited to announce the immediate availability of StreamSets Data Collector 3.12.0 and StreamSets Data Collector Edge 3.12.0. StreamSets Data Collector is a powerful design and execution engine, open source under the Apache License 2.0. It enables moving data between any source and destination while performing transformations and pushdown analytics along the way. To download, click here. StreamSets Data…

By December 9, 2019

StreamSets Transformer: Design Patterns For Slowly Changing Dimensions


In this blog, we will look at a few design patterns for Slowly Changing Dimensions (SCD) Type 2 and see how StreamSets Transformer, the newest addition to the StreamSets DataOps Platform, makes it easy to implement them. While relatively static data like locations and addresses of entities, such as customers, change rarely (if at all) over time, in most cases…

By November 19, 2019