Mini MapR Academy: How the ACT Government Uses Data Collector w/ MapR (videos)

By Pat Patterson Posted in Data Integration April 23, 2018

Selvaraaju (‘Selva’) Murugesan is Senior Manager for Innovation and Data Analytics in the Australian Capital Territory (ACT) Government. Selva focuses on data management practices and data analytics, using StreamSets Data Collector to extract data from different databases, perform data cleansing on the fly and push data to the ACT Government’s Open Data Portal. Over the past few months, Selva has assembled a short playlist of videos demonstrating various aspects of Data Collector. From the basics of installation to advanced topics such as configuring impersonation for MapR-FS, Selva’s mini MapR Academy via video provides a great introduction to Data Collector. We’re excited to feature them in this blog post!

Get Started with the Mini MapR Academy

Installing StreamSets with MapR

In this first video, Selva installs Data Collector on Red Hat Enterprise Linux 7 via the full RPM package, configures Data Collector to work with MapR, and sets up an admin user.

Documentation

Installing MapR Libraries for StreamSets Data Collector

Selva installs the necessary libraries for Data Collector to integrate with MapR 6.0.0.

Documentation

MapR Prerequisites

Configuring Impersonation for MapR-FS

By default, Data Collector writes to MapR-FS as the currently logged-in Data Collector user. However, you can configure MapR-FS impersonation so that data is written as the user configured in the MapR-FS destination settings.

Documentation

Hadoop Impersonation Mode

Reading and Writing Data to the Local File System

Selva creates a simple pipeline to read CSV data from a local file, remove most of the fields, and write the result to another local file.
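For readers who want to see the shape of that transformation outside of Data Collector, here is a minimal Python sketch of the same idea: read CSV, keep a few columns, write CSV. It is not how Data Collector implements the pipeline; the function name `keep_fields`, the column names, and the sample rows are all hypothetical.

```python
import csv
import io


def keep_fields(source, sink, wanted):
    """Copy CSV rows from `source` to `sink`, keeping only the columns in `wanted`.

    `source` and `sink` are text file objects; `wanted` is a list of header names.
    """
    reader = csv.DictReader(source)
    # extrasaction="ignore" silently drops every column not listed in `wanted`.
    writer = csv.DictWriter(
        sink, fieldnames=wanted, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


# Hypothetical sample data standing in for the local input file.
sample = io.StringIO(
    "id,name,email,city\n"
    "1,Ann,ann@example.com,Canberra\n"
    "2,Bob,bob@example.com,Sydney\n"
)
out = io.StringIO()
keep_fields(sample, out, ["id", "city"])
result = out.getvalue()
print(result)
```

To work with real files, the same function can be passed file objects opened with `open(path, newline="")` instead of the in-memory buffers used here.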

Documentation

Masking Fields in the Pipeline

Data engineers often need to mask sensitive data when moving it between systems. Here, Selva shows how to use Data Collector’s Field Masker processor.
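As a rough illustration of what field masking means in practice, the Python sketch below hides all but the last few characters of selected fields in a record. It is an assumption-laden stand-in, not the Field Masker processor's actual behavior or configuration; `mask_keep_last`, `mask_record`, and the sample record are hypothetical names and data.

```python
def mask_keep_last(value, visible=4, mask_char="x"):
    """Replace every character except the last `visible` ones with `mask_char`."""
    if len(value) <= visible:
        # Too short to reveal anything safely: mask the whole value.
        return mask_char * len(value)
    return mask_char * (len(value) - visible) + value[-visible:]


def mask_record(record, fields, masker=mask_keep_last):
    """Return a copy of `record` with `masker` applied to the named fields only."""
    return {
        key: masker(value) if key in fields else value
        for key, value in record.items()
    }


# Hypothetical record with one sensitive field.
record = {"name": "Ann", "card": "4111111111111111"}
masked = mask_record(record, {"card"})
print(masked)
```

Returning a new dictionary rather than editing the record in place keeps the unmasked original available to earlier stages, which is one reasonable design choice among several.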

Documentation

Field Masker Processor

Ingesting Data from a Web Service

In what is currently the last video in the series, Selva shows how Data Collector can read CSV data from a web service and write it to a local file.
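The same flow can be sketched in plain Python: fetch CSV text over HTTP, parse it, and write the rows to a local file. This is only an illustration under stated assumptions, not a description of the HTTP Client origin. The function names are hypothetical, the response is assumed to be UTF-8 encoded, and the `opener` parameter exists purely so the fetch step can be swapped out when no network is available.

```python
import csv
import io
import urllib.request


def fetch_csv_rows(url, opener=urllib.request.urlopen):
    """Download CSV text from `url` and return it as a list of rows.

    `opener` defaults to urllib's urlopen; it is a hypothetical hook that
    makes it possible to substitute a stub for the network call.
    """
    with opener(url) as response:
        text = response.read().decode("utf-8")  # assumes a UTF-8 response
    return list(csv.reader(io.StringIO(text)))


def save_rows(rows, path):
    """Write `rows` to a local CSV file at `path`."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
```

A typical call would be `save_rows(fetch_csv_rows(service_url), "output.csv")`, where `service_url` is whatever endpoint serves the CSV data.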

Documentation

HTTP Client Origin

Conclusion

Many thanks to Selva for his permission to share these videos in our mini MapR Academy!

Have something to share yourself? Join us in our Community!
