StreamSets Transformer, a powerful tool for creating highly instrumented Apache Spark applications for modern ETL, is the newest addition to the StreamSets DataOps Platform. StreamSets enables next-generation ETL through the StreamSets Transformer tool. The product provides enterprises with the flexibility to create ETL pipelines for both batch and streaming data as well as clear visibility […]
Today we are opening the StreamSets Cloud Beta program, inviting you to experience and give feedback on the latest addition to the StreamSets product family. StreamSets Cloud is a cloud service for designing, deploying and operating smart data pipelines, combining the ease and scalability of the cloud with the flexibility to execute pipelines anywhere – […]
You have options when bulk loading data into Redshift from relational database (RDBMS) sources. These options include manual processes or one of the numerous hosted as-a-service options. But if you have broader requirements than simply importing, you need another option. Your company may have requirements such as adhering to enterprise security policies which […]
Angel Alvarado is a senior software engineer at One Degree, a San Francisco-based non-profit, and also helps run the Molanco data engineering community. In his spare time, Angel enjoys playing Minecraft with his 11-year-old cousin. Recently, Angel found a fun way to combine his gaming with data engineering. This blog entry, reposted from the original with Angel’s […]
A couple of weeks ago, as May the 4th approached, a lively Star Wars debate brewed at StreamSets: “Do new school characters get as much play as old favorites like Darth Vader, Yoda and Han Solo?” “Does the Dark Side of the Force dominate the Light?” “Does Yoda prevail over Darth Vader?” It occurred to us […]
A key differentiator of StreamSets Data Collector (SDC) is that it operates in continuous mode – set a pipeline running and it will continue to read files from a directory or take messages from a queue. A Twitter conversation with Richard Tuttle, a solution architect at CRM Science, prompted me to wonder, would it be possible to ingest […]
Watch StreamSets Field Engineer Jonathan “Natty” Natkins demonstrate how you can use the open source StreamSets Data Collector to flexibly handle painful “data drift” – the inevitable evolution of infrastructure, semantics and schema that leads to corrupted data and broken pipelines. Download Open Source StreamSets Data Collector at www.streamsets.com/opensource.