Dataflow Performance Blog

Getting Started with Cloudera’s Cybersecurity Solution (feat. StreamSets, Arcadia Data and Centrify)

(Figure: ODM Windows Pipeline)

This post was originally published on the Cloudera VISION blog by Sam Heywood. StreamSets configurations and images of the Apache Spot Open Data Model ingest pipelines can be found on GitHub.

A quick conversation with most Chief Information Security Officers (CISOs) reveals that they know they need to modernize their security architecture, and that the right answer is to adopt a machine learning and analytics platform as a fundamental and durable part of their data strategy. However, many CISOs fear that deploying an initial use case will be daunting. Cloudera has partnered with Arcadia Data and StreamSets to make it easier than ever for CISOs to take the first step and deploy basic use cases built on data sources common to many environments.

The first use cases focus on the basics: ingesting data, transforming it and storing it in accordance with the Apache Spot Open Data Model (ODM), and then providing basic analytics and visualizations. The initial data sources are:

  • Qualys KnowledgeBase
  • Qualys Vulnerability Scans
  • Windows Security Logs
  • Centrify Identity Platform Logs

Future use cases will add more data sources as well as machine learning analytics.

Assuming a security team has access to an Apache Hadoop cluster running CDH 5.x or later with Impala, the instructions below will let them quickly deploy their first cyber use cases.

Step One – Configuring Apache Spot ODM in HDFS

The first step is to create the core tables that make up the Spot ODM. The main tables are

  • event
  • vulnerability_context
  • user_context
  • endpoint_context
  • threat_intelligence_context
  • network_context

Full instructions and scripts for creating the tables are contained in https://github.com/apache/incubator-spot/blob/SPOT-181_ODM/spot-setup/odm/README.md
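Once the setup scripts have run, it can help to confirm that all six core tables exist before wiring up any pipelines. The Python sketch below is one way to do that, assuming you feed it the table names your cluster reports (for example, the output of SHOW TABLES in impala-shell against whichever database the README has you create). The function name missing_odm_tables and the sample table list are hypothetical and not part of the Spot project.

```python
"""Check that the core Apache Spot ODM tables exist after running the setup scripts.

Sketch only: pass in the table names your cluster reports, e.g. the output of
SHOW TABLES in impala-shell. The database you run that against is an assumption;
use whatever database the Spot ODM README has you create.
"""
from typing import Iterable, List

# Core ODM tables named in Step One of this guide.
CORE_ODM_TABLES = (
    "event",
    "vulnerability_context",
    "user_context",
    "endpoint_context",
    "threat_intelligence_context",
    "network_context",
)


def missing_odm_tables(existing: Iterable[str]) -> List[str]:
    """Return the core ODM tables absent from `existing`, in the order listed above.

    Impala treats table names case-insensitively, so names are trimmed and
    lower-cased before comparison.
    """
    present = {name.strip().lower() for name in existing}
    return [table for table in CORE_ODM_TABLES if table not in present]


if __name__ == "__main__":
    # Hypothetical SHOW TABLES output from a partially configured cluster.
    reported = ["event", "user_context", "endpoint_context", "network_context"]
    missing = missing_odm_tables(reported)
    if missing:
        print("Missing ODM tables:", ", ".join(missing))
    else:
        print("All core ODM tables are present.")
```

An empty result means every core table named above was found; anything returned points back to the README scripts that still need to run.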

Step Two – Installing StreamSets

The second step is to install StreamSets Data Collector (SDC). StreamSets Data Collector is a lightweight, powerful engine that streams data in real time. SDC will be used to route and transform data into the Spot ODM.

Instructions for installing StreamSets Data Collector via Cloudera Manager are available in the StreamSets Data Collector User Guide.

Step Three – Configure StreamSets Data Collector Pipelines

A StreamSets Data Collector pipeline describes the flow of data from the origin system to destination systems and defines how to transform the data along the way. Pipelines for four new data sources have been added to the Spot project:

  • Qualys KnowledgeBase
  • Qualys Vulnerability Scans
  • Windows Security Logs
  • Centrify Identity Platform Logs

The StreamSets Data Collector User Guide contains instructions for importing the pipelines from Spot into SDC. The pipeline configuration files are available on GitHub alongside the images of the Spot ODM ingest pipelines.

(Figure: StreamSets pipeline)
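To make the origin, transform, destination idea concrete, here is a plain-Python sketch of what a pipeline like the Windows Security Logs one conceptually does: keep only logon events and rename their fields into an ODM-style event row. It is not StreamSets code and not the actual Spot pipeline. The input records are assumed to be flattened Windows Security events using Windows field names (EventID, TimeCreated, TargetUserName, IpAddress, WorkstationName); the output column names and all helper functions are hypothetical placeholders, and the real event table schema is in the Spot ODM README.

```python
"""Conceptual sketch of an origin -> processors -> destination pipeline.

Not StreamSets code and not the Spot pipelines: plain Python for illustration.
The ODM-style output column names are hypothetical placeholders.
"""
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, object]
# A processor returns a (possibly new) record, or None to drop the record.
Processor = Callable[[Record], Optional[Record]]

# Windows Security event IDs: 4624 = successful logon, 4625 = failed logon.
# Assumes EventID arrives as an integer in the flattened record.
LOGON_EVENT_IDS = {4624: "logon_success", 4625: "logon_failure"}


def keep_logon_events(rec: Record) -> Optional[Record]:
    """Filter stage: pass logon events through, drop everything else."""
    return rec if rec.get("EventID") in LOGON_EVENT_IDS else None


def to_odm_event(rec: Record) -> Optional[Record]:
    """Mapping stage: rename Windows fields to hypothetical ODM-style columns."""
    ts = datetime.fromisoformat(str(rec["TimeCreated"])).astimezone(timezone.utc)
    return {
        "event_time": int(ts.timestamp() * 1000),  # epoch milliseconds, UTC
        "event_type": LOGON_EVENT_IDS[int(rec["EventID"])],
        "user_name": str(rec.get("TargetUserName", "")).lower(),
        "src_ip": rec.get("IpAddress"),
        "dvc_host": rec.get("WorkstationName"),
    }


def run_pipeline(origin: Iterable[Record], processors: List[Processor],
                 destination: List[Record]) -> int:
    """Push each origin record through every processor in order, then write it.

    Returns the number of records written to the destination.
    """
    written = 0
    for rec in origin:
        out: Optional[Record] = rec
        for stage in processors:
            out = stage(out)
            if out is None:  # a stage dropped the record
                break
        if out is not None:
            destination.append(out)
            written += 1
    return written


if __name__ == "__main__":
    # Hypothetical flattened Windows Security events standing in for the origin.
    raw_events: List[Record] = [
        {"EventID": 4624, "TimeCreated": "2018-03-01T12:00:00+00:00",
         "TargetUserName": "ALICE", "IpAddress": "10.0.0.5", "WorkstationName": "WS-01"},
        # 4688 (process creation) is not a logon event, so it is filtered out.
        {"EventID": 4688, "TimeCreated": "2018-03-01T12:00:05+00:00"},
    ]
    sink: List[Record] = []  # stands in for the destination, e.g. the ODM event table
    count = run_pipeline(raw_events, [keep_logon_events, to_odm_event], sink)
    print(count, sink)
```

Running the sketch writes one ODM-style row for the 4624 logon and drops the 4688 event. In SDC itself, filtering and field renaming would typically be handled by processor stages configured in the pipeline editor rather than Python functions; keeping each stage a small function that either returns a record or drops it loosely mirrors that stage-by-stage design.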

Step Four – Install Arcadia Instant

Arcadia Instant is a free downloadable tool for big data visual analytics.

Arcadia Instant is available for Windows and Mac as a direct download from Arcadia Data. It helps users quickly gain insight into their data by exploring and visualizing it with interactive visuals, dashboards, and apps. Instructions for installing Arcadia Instant are on the Arcadia Data Knowledge Base.

Step Five – Configure Spot Application in Arcadia Data

Arcadia Data allows developers to build custom applications made up of different visualizations, workflows and dashboards. A new Arcadia Data application, spot_app.json, has been added to the Spot repo. Instructions for configuring Arcadia Instant to use the application are in the associated Readme.md file.

Step Six – Launch the Spot Application

The final step is to launch the Spot App from within Arcadia Instant. The Spot App can be added to your App Listing by going to the “Visuals” tab and clicking on “Modify App Menu”. Select the “Spot App” and click “Save”.

(Figure: Launching the Spot Application)

When you see the Spot App on the Visuals page, you can launch it by selecting it and then clicking “Launch App”.

(Figure: Launching the Spot Application 2)

Now that the Spot App is launched, you can view and explore your security events in the User Activity Summary, Endpoint Activity Summary, and Vulnerabilities dashboards it provides.

(Figures: User Activity Summary dashboard; Endpoint Activity Summary dashboard; Vulnerabilities dashboard)

Rick Bilodeau