Data Collector User Guide

Contents

  • Getting Started
    • What is StreamSets Data Collector?
      • How should I use Data Collector?
      • How does this really work?
    • Logging In and Creating a Pipeline
    • Data Collector User Interface
      • Configuring the Display
    • Data Collector UI - Pipelines on the Home Page
  • What's New
    • What's New in 3.1.0.0
    • What's New in 3.0.3.0
    • What's New in 3.0.2.0
    • What's New in 3.0.1.0
    • What's New in 3.0.0.0
    • What's New in 2.7.2.0
    • What's New in 2.7.1.1
    • What's New in 2.7.1.0
    • What's New in 2.7.0.0
    • What's New in 2.6.0.1
    • What's New in 2.6.0.0
    • What's New in 2.5.1.0
    • What's New in 2.5.0.0
    • What's New in 2.4.1.0
    • What's New in 2.4.0.0
    • What's New in 2.3.0.1
    • What's New in 2.3.0.0
    • What's New in 2.2.1.0
    • What's New in 2.2.0.0
  • Installation
    • Installation
      • Installation Requirements
        • JCE for Oracle JVM
        • Configuring the Open File Limit
    • Full Installation and Launch (Manual Start)
    • Full Installation and Launch (Service Start)
      • Installing from the RPM Package
      • Installing from the Tarball for Systems Using SysV Init
      • Installing from the Tarball for Systems Using Systemd Init
    • Core Installation
      • Installing the Core RPM Package
      • Installing the Core Tarball
    • Install Additional Stage Libraries
      • Installing for RPM
      • Installing for Tarball Using the Package Manager
      • Installing for Tarball Using the Command Line
      • Available Stage Libraries
      • Legacy Stage Libraries
    • Run Data Collector from Docker
    • Installation with Cloudera Manager
      • Step 1. Install the StreamSets Custom Service Descriptor
      • Step 2. Manually Install the Parcel and Checksum Files (Optional)
      • Step 3. Distribute and Activate the StreamSets Parcel
      • Step 4. Configure the StreamSets Service
      • Configuring Data Collector with Cloudera Manager
    • MapR Prerequisites
      • Supported Versions
      • Step 1. Install Client Libraries
      • Step 2. Run the Command to Set Up MapR
        • Running the Command in Interactive Mode
        • Running the Command in Non-Interactive Mode
      • Step 3. Connect to a MapR Cluster Secured with Built-in Security
      • Step 4. Run Data Collector as a MapR Ticket User
    • Creating Another Data Collector Instance
    • Uninstallation
      • Uninstalling the Tarball (Manual Start)
      • Uninstalling the Tarball (Service Start)
      • Uninstalling the RPM Package
      • Uninstalling from Cloudera Manager
  • Configuration
    • User Authentication
      • Configuring LDAP Authentication
        • Step 1. Configure LDAP Connection Information
        • Example for OpenLDAP
        • Example for Active Directory
        • Step 2. Configure Secure Connections to LDAP (Optional)
        • Step 3. Map LDAP Groups to Data Collector Roles
        • Step 4. Configure Multiple LDAP Servers (Optional)
        • Step 5. Enable LDAP Authentication for MapR Stages
      • Configuring File-Based Authentication
        • Step 1. Configure Authentication Properties
        • Step 2. Configure Users, Groups, and Roles
    • Roles and Permissions
      • Roles
      • Pipeline Permissions
      • Roles and Permissions for Common Tasks
      • Transfer Pipeline Permissions
        • Transferring Permissions
    • Data Collector Configuration
      • Kerberos Authentication
        • Enabling Kerberos for RPM and Tarball
        • Enabling Kerberos with Cloudera Manager
      • Sending Email
      • Referencing Sensitive Values in Files
      • Referencing Environment Variables
      • Running Multiple Concurrent Pipelines
      • HTTP Protocols
        • Configuring HTTPS for Standalone Pipelines
        • Configuring HTTPS for Cluster Pipelines
      • Hadoop Impersonation Mode
        • Lowercasing User Names
        • Working with HDFS Encryption Zones
      • Blacklist and Whitelist for Stage Libraries
      • Configuring Data Collector
    • Data Collector Environment Configuration
      • Modifying Environment Variables
      • Data Collector Directories
      • User and Group for Service Start
      • Java Configuration Options
        • Java Heap Size
        • Remote Debugging
        • Garbage Collector
      • Java Security Manager
      • Root Classloader
      • Heap Dump Creation
    • Install External Libraries
      • Install Using the Package Manager
        • Step 1. Set Up an External Directory
        • Step 2. Install External Libraries
      • Install Manually
        • Installing Manually for RPM and Tarball
        • Installing Manually for Cloudera Manager
    • Custom Stage Libraries
      • Storing Custom Libraries for RPM and Tarball
      • Storing Custom Libraries for Cloudera Manager
    • Credential Stores
      • Group Access to Credentials
      • CyberArk Credential Store
        • Step 1. Install the Credential Store Stage Library
        • Step 2. Configure the Credential Store Properties
        • Step 3. Call the Credentials from the Pipeline
      • Java Keystore Credential Store
        • Step 1. Install the Credential Store Stage Library
        • Step 2. Configure the Credential Store Properties
        • Step 3. Add Credentials to the Credential Store
        • Step 4. Call the Credentials from the Pipeline
        • jks-cs Command
      • Vault Credential Store
        • Step 1. Install the Credential Store Stage Library
        • Step 2. Configure the Credential Store Properties
        • Step 3. Call the Credentials from the Pipeline
    • Accessing Vault Secrets with Vault Functions (Deprecated)
      • Step 1. Configure Vault Properties
      • Step 2. Call Vault from the Pipeline
    • Publishing Metadata to Cloudera Navigator
      • Prerequisites
      • Viewing Published Metadata
      • Supported Stages
      • Configuring Data Collector to Publish Metadata
    • Enabling External JMX Tools
      • Viewing JMX Metrics in External Tools
      • Custom Metrics
  • Upgrade
    • Upgrade
    • Pre Upgrade Tasks
      • Verify Installation Requirements
      • Migrate to Java 8
      • Upgrade Cluster Streaming Pipelines
    • Upgrade an Installation from the Tarball
      • Step 1. Shut Down the Previous Version
      • Step 2. Back Up the Previous Version
      • Step 3. Install the New Version
        • Installing from the Tarball (Manual Start)
        • Installing from the Tarball for Systems Using SysV (Service Start)
        • Installing from the Tarball for Systems Using Systemd (Service Start)
      • Step 4. Update Environment Variables
      • Step 5. Update the Configuration Files
      • Step 6. Install Additional Libraries for the Core Installation
      • Step 7. Start the New Version of Data Collector
    • Upgrade an Installation from the RPM Package
      • Step 1. Shut Down the Previous Version
      • Step 2. Back Up the Previous Version
      • Step 3. Install the New Version
      • Step 4. Update Environment Variables
      • Step 5. Update the Configuration Files
      • Step 6. Install Additional Libraries for the Core Installation
      • Step 7. Uninstall Previous Libraries
      • Step 8. Start the New Version of Data Collector
    • Upgrade an Installation with Cloudera Manager
      • Step 1. Stop All Pipelines
      • Step 2. Back Up the Previous Version
      • Step 3. Install the StreamSets Custom Service Descriptor
      • Step 4. Manually Install the Parcel and Checksum Files (Optional)
      • Step 5. Distribute and Activate the New StreamSets Parcel
      • Step 6. Verify Modified Safety Valves
      • Step 7. Restart the StreamSets Service
    • Post Upgrade Tasks
      • Update Value Replacer Pipelines
      • Update Einstein Analytics Pipelines
      • Update Control Hub On-premises
      • Update Pipelines using Legacy Stage Libraries
      • Disable Cloudera Navigator Integration
      • JDBC Multitable Consumer Query Interval Change
      • Update JDBC Query Consumer Pipelines used for SQL Server CDC Data
      • Update MongoDB Destination Upsert Pipelines
      • Time Zones in Stages
      • Update Kudu Pipelines
      • Update JDBC Multitable Consumer Pipelines
      • Update Vault Pipelines
      • Configure JDBC Producer Schema Names
      • Evaluate Precondition Error Handling
      • Authentication for Docker Image
      • Configure Pipeline Permissions
      • Update Elasticsearch Pipelines
    • Working with Upgraded External Systems
      • Working with Kafka 0.11 or Later
      • Working with Cloudera CDH 5.11 or Later
      • Working with an Upgraded MapR System
    • Troubleshooting an Upgrade
  • Pipeline Concepts and Design
    • What is a Pipeline?
    • Data in Motion
      • Single and Multithreaded Pipelines
      • Delivery Guarantee
      • Data Collector Data Types
    • Designing the Data Flow
      • Branching Streams
      • Merging Streams
    • Dropping Unwanted Records
      • Required Fields
      • Preconditions
    • Error Record Handling
      • Pipeline Error Record Handling
      • Stage Error Record Handling
      • Example
      • Error Records and Version
    • Record Header Attributes
      • Working with Header Attributes
        • Internal Attributes
      • Header Attribute-Generating Stages
      • Record Header Attributes for Record-Based Writes
        • Generating Attributes for Record-Based Writes
      • Viewing Attributes in Data Preview
    • Field Attributes
      • Working with Field Attributes
      • Field Attribute-Generating Stages
      • Viewing Field Attributes in Data Preview
    • Processing Changed Data
      • CRUD Operation Header Attribute
        • Earlier Implementations
      • CDC-Enabled Origins
      • CRUD-Enabled Stages
      • Processing the Record
      • Use Cases
    • Control Character Removal
    • Development Stages
  • Pipeline Configuration
    • Data Collector UI - Edit Mode
    • Retrying the Pipeline
    • Pipeline Memory
    • Rate Limit
    • Simple and Bulk Edit Mode
    • Runtime Values
      • Using Runtime Parameters
        • Step 1. Define Runtime Parameters
        • Step 2. Call the Runtime Parameter
        • Step 3. Start the Pipeline with Parameters
        • Viewing Runtime Parameters
      • Using Runtime Properties
        • Step 1. Define Runtime Properties
        • Step 2. Call the Runtime Property
      • Using Runtime Resources
        • Step 1. Define Runtime Resources
        • Step 2. Call the Runtime Resource
    • Event Generation
      • Pipeline Event Records
    • Webhooks
      • Request Method
      • Payload and Parameters
      • Examples
    • Notifications
    • SSL/TLS Configuration
      • Keystore and Truststore Configuration
      • Transport Protocols
      • Cipher Suites
    • Implicit and Explicit Validation
    • Expression Configuration
      • Basic Syntax
      • Using Field Names in Expressions
        • Field Names with Special Characters
      • Referencing Field Names and Field Paths
        • Wildcard Use for Arrays and Maps
      • Field Path Expressions
        • Supported Stages
        • Field Path Expression Syntax
      • Expression Completion in Properties
        • Tips for Expression Completion
      • Data Type Coercion
    • Configuring a Pipeline
  • Data Formats
    • Data Formats Overview
    • Delimited Data Root Field Type
    • Log Data Format
    • NetFlow Data Processing
      • Caching NetFlow 9 Templates
      • NetFlow 5 Generated Records
      • NetFlow 9 Generated Records
    • Protobuf Data Format Prerequisites
    • SDC Record Data Format
    • Text Data Format with Custom Delimiters
      • Processing XML Data with Custom Delimiters
    • Whole File Data Format
      • Basic Pipeline
      • Whole File Records
      • Additional Processors
      • Defining the Transfer Rate
      • Writing Whole Files
        • Access Permissions
        • Including Checksums in Events
    • Reading and Processing XML Data
      • Creating Multiple Records with an XML Element
        • Using XML Elements with Namespaces
      • Creating Multiple Records with an XPath Expression
        • Using XPath Expressions with Namespaces
        • Simplified XPath Syntax
        • Sample XPath Expressions
        • Predicates in XPath Expressions
        • Predicate Examples
      • Including Field XPaths and Namespaces
      • XML Attributes and Namespace Declarations
      • Parsed XML
    • Writing XML Data
      • Record Structure Requirement
  • Origins
    • Origins
      • Comparing HTTP Origins
      • Comparing MapR Origins
      • Comparing UDP Source Origins
      • Comparing WebSocket Origins
      • Batch Size and Wait Time
      • Maximum Record Size
      • File Compression Formats
      • Previewing Raw Source Data
    • Amazon S3
      • AWS Credentials
      • Common Prefix, Prefix Pattern, and Wildcards
      • Record Header Attributes
        • Object Metadata in Record Header Attributes
      • Read Order
      • Buffer Limit and Error Handling
      • Server-Side Encryption
      • Event Generation
        • Event Records
      • Data Formats
      • Configuring an Amazon S3 Origin
    • Amazon SQS Consumer
      • AWS Credentials
      • Queue Name Prefix
      • Multithreaded Processing
      • Including SQS Message Attributes
        • Including Sender Attributes
      • Data Formats
      • Configuring an Amazon SQS Consumer
    • Azure IoT/Event Hub Consumer
      • Storage Account and Container Prerequisite
      • Resetting the Origin in Event Hub
      • Multithreaded Processing
      • Data Formats
      • Configuring an Azure IoT/Event Hub Consumer
    • CoAP Server
      • Prerequisites
      • Multithreaded Processing
      • Network Configuration Properties
      • Data Formats
      • Configuring a CoAP Server Origin
    • Directory
      • File Name Pattern and Mode
      • Read Order
      • Multithreaded Processing
      • Reading from Subdirectories
        • Post-Processing Subdirectories
      • First File for Processing
      • Late Directory
      • Record Header Attributes
      • Event Generation
        • Event Records
      • Buffer Limit and Error Handling
      • Data Formats
      • Configuring a Directory Origin
    • Elasticsearch
      • Batch and Incremental Mode
      • Query
        • Incremental Mode Query
      • Scroll Timeout
      • Multithreaded Processing
      • Configuring an Elasticsearch Origin
    • File Tail
      • File Processing and Archive File Names
      • Multiple Paths and File Sets
      • First File for Processing
      • Late Directories
      • Files Matching a Pattern - Pattern Constant
      • Record Header Attributes
        • Defining and Using a Tag
      • Multiple Line Processing
      • File Tail Output
      • Event Generation
        • Event Records
      • Data Formats
      • Configuring a File Tail Origin
    • Google BigQuery
      • Credentials
        • Default Credentials Provider
        • Service Account Credentials File (JSON)
      • BigQuery Data Types
      • Event Generation
        • Event Record
      • Configuring a Google BigQuery Origin
    • Google Cloud Storage
      • Credentials
        • Default Credentials Provider
        • Service Account Credentials (JSON)
      • Common Prefix, Prefix Pattern, and Wildcards
      • Event Generation
        • Event Records
      • Data Formats
      • Configuring a Google Cloud Storage Origin
    • Google Pub/Sub Subscriber
      • Credentials
        • Default Credentials Provider
        • Service Account Credentials File (JSON)
      • Multithreaded Processing
      • Record Header Attributes
      • Data Formats
      • Configuring a Google Pub/Sub Subscriber Origin
    • Hadoop FS
      • Reading from Other File Systems
      • Kerberos Authentication
      • Using a Hadoop User
      • Hadoop Properties and Configuration Files
      • Data Formats
      • Configuring a Hadoop FS Origin
    • HTTP Client
      • Processing Mode
      • Pagination
        • Result Field Path
        • Keep All Fields
      • HTTP Method
      • OAuth 2 Authorization
        • Example for Twitter
        • Example for Microsoft Azure AD
        • Example for Google
      • Data Formats
      • Response Header Fields in Header Attributes
      • Configuring an HTTP Client Origin
    • HTTP Server
      • Prerequisites
        • Send Data to the Listening Port
        • Include the Application ID in Requests
      • Multithreaded Processing
      • Data Formats
      • Record Header Attributes
      • Configuring an HTTP Server Origin
    • HTTP to Kafka
      • Prerequisites
      • Pipeline Configuration
      • Kafka Maximum Message Size
      • Enabling Kafka Security
        • Enabling SSL/TLS
        • Enabling Kerberos (SASL)
        • Enabling SSL/TLS and Kerberos
      • Configuring an HTTP to Kafka Origin
    • JDBC Multitable Consumer
      • Installing the JDBC Driver
        • Working with a MySQL JDBC Driver
      • Table Configuration
        • Table Name Pattern
        • Offset Column and Value
      • Reading from Views
      • Multithreaded Processing Modes
      • Multithreaded Table Processing
      • Multithreaded Partition Processing
        • Partition Processing Requirements
        • Multiple Offset Value Handling
        • Best Effort: Processing Non-Compliant Tables
      • Non-Incremental Processing
      • Batch Strategy
        • Process All Available Rows
        • Switch Tables
      • Initial Table Order Strategy
      • Understanding the Processing Queue
        • Multiple Tables, No Partition Processing
        • Multiple Partitions, No Table Processing
        • Both Partition and Table Processing
      • JDBC Header Attributes
      • Event Generation
        • Event Record
      • Configuring a JDBC Multitable Consumer
    • JDBC Query Consumer
      • Installing the JDBC Driver
      • Offset Column and Offset Value
      • Full and Incremental Mode
      • Recovery
      • SQL Query
        • SQL Query for Incremental Mode
        • SQL Query for Full Mode
        • Stored Procedures
      • JDBC Record Header Attributes
        • Header Attributes with the Drift Synchronization Solution
      • CDC for Microsoft SQL Server
        • CRUD Record Header Attribute
        • Group Rows by Transaction
      • Event Generation
        • Event Record
      • Configuring a JDBC Query Consumer
    • JMS Consumer
      • Installing JMS Drivers
      • Data Formats
      • Configuring a JMS Consumer Origin
    • Kafka Consumer
      • Initial and Subsequent Offsets
      • Processing Available Data
      • Additional Kafka Properties
      • Record Header Attributes
      • Enabling Security
        • Enabling SSL/TLS
        • Enabling Kerberos (SASL)
        • Enabling SSL/TLS and Kerberos
      • Data Formats
      • Configuring a Kafka Consumer
    • Kafka Multitopic Consumer
      • Initial and Subsequent Offsets
      • Processing All Unread Data
      • Multithreaded Processing
      • Additional Kafka Properties
      • Record Header Attributes
      • Enabling Security
        • Enabling SSL/TLS
        • Enabling Kerberos (SASL)
        • Enabling SSL/TLS and Kerberos
      • Data Formats
      • Configuring a Kafka Multitopic Consumer
    • Kinesis Consumer
      • Multithreaded Processing
      • AWS Credentials
      • Read Interval
      • Lease Table Tags
      • Resetting the Origin
      • Data Formats
      • Configuring a Kinesis Consumer Origin
    • MapR DB CDC
      • Multithreaded Processing
      • Handling the _id Field
      • CRUD Operation and CDC Header Attributes
      • Additional Properties
      • Configuring a MapR DB CDC Origin
    • MapR DB JSON
      • Handling the _id Field
      • Configuring a MapR DB JSON Origin
    • MapR FS
      • Kerberos Authentication
      • Using a Hadoop User
      • Hadoop Properties and Configuration Files
      • Data Formats
      • Configuring a MapR FS Origin
    • MapR Multitopic Streams Consumer
      • Initial and Subsequent Offsets
      • Processing All Unread Data
      • Multithreaded Processing
      • Additional Properties
      • Record Header Attributes
      • Data Formats
      • Configuring a MapR Multitopic Streams Consumer
    • MapR Streams Consumer
      • Processing All Unread Data
      • Data Formats
      • Additional Properties
      • Record Header Attributes
      • Configuring a MapR Streams Consumer
    • MongoDB
      • Credentials
      • Offset Field and Initial Offset
      • Read Preference
      • Event Generation
        • Event Records
      • BSON Timestamp
      • Enabling SSL/TLS
      • Configuring a MongoDB Origin
    • MongoDB Oplog
      • Credentials
      • Oplog Timestamp and Ordinal
      • Read Preference
      • Generated Records
        • CRUD Operation and CDC Header Attributes
      • Enabling SSL/TLS
      • Configuring a MongoDB Oplog Origin
    • MQTT Subscriber
      • Topics
      • Record Header Attributes
      • Data Formats
      • Configuring an MQTT Subscriber Origin
    • MySQL Binary Log
      • Prerequisites
        • Configure MySQL Server for Row-based Logging
        • Install the JDBC Driver
      • Initial Offset
      • Generated Records
      • Processing Generated Records
      • Tables to Include or Ignore
      • Configuring a MySQL Binary Log Origin
    • Omniture
      • Configuring an Omniture Origin
    • OPC UA Client
      • Processing Mode
      • Providing NodeIds
      • Security
      • Configuring an OPC UA Client Origin
    • Oracle CDC Client
      • LogMiner Dictionary Source
      • Oracle CDC Client Prerequisites
        • Task 1. Enable LogMiner
        • Task 2. Enable Supplemental Logging
        • Task 3. Create a User Account
        • Task 4. Extract a LogMiner Dictionary (Redo Logs)
        • Task 5. Install the Driver
      • Schema, Table Name and Exclusion Patterns
      • Initial Change
      • Choosing Buffers
        • Local Buffer Resource Requirements
        • Uncommitted Transaction Handling
      • Include Nulls
      • Unsupported Data Types
        • Conditional Data Type Support
      • Generated Records
        • CRUD Operation Header Attributes
        • CDC Header Attributes
      • Event Generation
        • Event Records
      • Working with the Drift Synchronization Solution
      • Data Preview with Oracle CDC Client
      • Configuring an Oracle CDC Client
    • RabbitMQ Consumer
      • Queue
      • Record Header Attributes
      • Data Formats
      • Configuring a RabbitMQ Consumer
    • Redis Consumer
      • Channels and Patterns
      • Data Formats
      • Configuring a Redis Consumer
    • Salesforce
      • Query Existing Data
      • Using the SOAP and Bulk API
        • Example
      • Using the Bulk API with PK Chunking
        • Example
      • Repeat Query
      • Subscribing to Notifications
      • Processing PushTopic Events
        • PushTopic Event Record Format
      • Processing Platform Events
      • Reading Custom Objects or Fields
      • Processing Deleted Records
      • Salesforce Attributes
        • Salesforce Header Attributes
        • CRUD Operation Header Attribute
        • Salesforce Field Attributes
      • Event Generation
        • Event Record
      • Changing the API Version
      • Configuring a Salesforce Origin
    • SDC RPC
      • Configuring an SDC RPC Origin
    • SDC RPC to Kafka
      • Pipeline Configuration
      • Concurrent Requests
      • Batch Request Size, Kafka Message Size, and Kafka Configuration
      • Additional Kafka Properties
      • Enabling Kafka Security
        • Enabling SSL/TLS
        • Enabling Kerberos (SASL)
        • Enabling SSL/TLS and Kerberos
      • Configuring an SDC RPC to Kafka Origin
    • SFTP/FTP Client
      • Read Order
      • First File for Processing
      • Credentials
      • Record Header Attributes
      • Event Generation
        • Event Records
      • Data Formats
      • Configuring an SFTP/FTP Client Origin
    • SQL Server CDC Client
      • Installing the JDBC Driver
      • Supported Operations
      • Multithreaded Processing
      • Batch Strategy
      • Table Configuration
      • Initial Table Order Strategy
      • Allow Late Table Processing
      • Checking for Schema Changes
      • Generated Record
        • Record Header Attributes
        • CRUD Operation Header Attributes
      • Event Generation
        • Event Record
      • Configuring a SQL Server CDC Origin
    • SQL Server Change Tracking
      • Permission Requirements
      • Installing the JDBC Driver
      • Multithreaded Processing
      • Batch Strategy
      • Table Configuration
      • Initial Table Order Strategy
      • Generated Record
        • Record Header Attributes
        • CRUD Operation Header Attributes
      • Event Generation
        • Event Record
      • Configuring a SQL Server Change Tracking Origin
    • TCP Server
      • Multithreaded Processing
      • Closing Connections for Invalid Data
      • Sending Acknowledgements
        • Using Expressions in Messages
      • TCP Modes
      • Data Formats
      • Configuring a TCP Server Origin
    • UDP Multithreaded Source
      • Processing Raw Data
      • Receiver and Worker Threads
      • Packet Queue
      • Multithreaded Pipelines
      • Metrics for Performance Tuning
      • Configuring a UDP Multithreaded Source
    • UDP Source
      • Processing Raw Data
      • Receiver Threads
      • Configuring a UDP Source
    • UDP to Kafka
      • Pipeline Configuration
      • Additional Kafka Properties
      • Enabling Kafka Security
        • Enabling SSL/TLS
        • Enabling Kerberos (SASL)
        • Enabling SSL/TLS and Kerberos
      • Configuring a UDP to Kafka Origin
    • WebSocket Client
      • Read REST Response Data from Data Collector
      • Data Formats
      • Configuring a WebSocket Client Origin
    • WebSocket Server
      • Prerequisites
        • Send Data to the Listening Port
        • Include the Application ID in Requests
      • Multithreaded Processing
      • Data Formats
      • Configuring a WebSocket Server Origin
    • Windows Event Log
      • Configuring a Windows Event Log Origin
  • Processors
    • Processors
    • Aggregator
      • Window Type, Time Windows, and Information Display
        • Rolling Windows
        • Sliding Windows
      • Calculation Components
      • Aggregate Functions
      • Event Generation
        • Event Record Root Field
        • Event Records
        • Sample Event Records
      • Monitoring Aggregations
      • Configuring an Aggregator
    • Base64 Field Decoder
      • Configuring a Base64 Field Decoder
    • Base64 Field Encoder
      • Configuring a Base64 Field Encoder
    • Data Parser
      • Data Formats
      • Configuring a Data Parser
    • Delay
      • Configuring a Delay Processor
    • Expression Evaluator
      • Output Fields and Attributes
      • Record Header Attribute Expressions
      • Field Attribute Expressions
      • Configuring an Expression Evaluator
    • Field Flattener
      • Flatten the Entire Record
      • Flatten Specific Fields
      • Configuring a Field Flattener
    • Field Hasher
      • Hash Methods
      • List, Map, and List-Map Fields
      • Configuring a Field Hasher
    • Field Masker
      • Mask Types
      • Configuring a Field Masker
    • Field Merger
      • Configuring a Field Merger
    • Field Order
      • Missing and Extra Fields
      • Configuring a Field Order Processor
    • Field Pivoter
      • Generated Records
      • Configuring a Field Pivoter
    • Field Remover
      • Configuring a Field Remover
    • Field Renamer
      • Renaming Sets of Fields
      • Configuring a Field Renamer
    • Field Replacer
      • Replacing Values with Nulls
      • Replacing Values with New Values
      • Data Types for Conditional Replacement
      • Configuring a Field Replacer
    • Field Splitter
      • Not Enough Splits
      • Too Many Splits
      • Example
      • Configuring a Field Splitter
    • Field Type Converter
      • Valid Type Conversions
      • Changing the Scale of Decimal Fields
      • Configuring a Field Type Converter
    • Field Zip
      • Merging List Data
      • Merging List-Map Data
      • Pivoting Merged Lists
      • Configuring a Field Zip Processor
    • Geo IP
      • Supported Databases
      • Database File Location
      • GeoIP Field Types
        • Full JSON Field Types
      • Configuring a Geo IP Processor
    • Groovy Evaluator
      • Processing Mode
      • Groovy Scripting Objects
      • Processing List-Map Data
      • Type Handling
      • Event Generation
      • Working with Record Header Attributes
        • Viewing Record Header Attributes
      • Accessing Whole File Format Records
      • Calling External Java Code
      • Granting Permissions on Groovy Scripts
      • Configuring a Groovy Evaluator
    • HBase Lookup
      • Lookup Key
      • Lookup Cache
      • Kerberos Authentication
      • Using an HBase User
      • HDFS Properties and Configuration File
      • Configuring an HBase Lookup
    • Hive Metadata
      • Output Streams
      • Metadata Records and Record Header Attributes
        • Custom Record Header Attributes
      • Database, Table, and Partition Expressions
        • Hive Names and Supported Characters
      • Decimal Field Expressions
      • Time Basis
      • Cache
        • Cache Size and Evictions
      • Kerberos Authentication
      • Hive Properties and Configuration Files
      • Configuring a Hive Metadata Processor
    • HTTP Client
      • HTTP Method
        • Expression Method
      • Parallel Requests
      • Response Header Fields
      • OAuth 2 Authorization
        • Example for Twitter
        • Example for Microsoft Azure AD
        • Example for Google
      • Data Formats
      • Configuring HTTP Client Processor
    • JavaScript Evaluator
      • Processing Mode
      • JavaScript Scripting Objects
      • Processing List-Map Data
      • Type Handling
      • Event Generation
      • Working with Record Header Attributes
        • Viewing Record Header Attributes
      • Accessing Whole File Format Records
      • Calling External Java Code
      • Configuring a JavaScript Evaluator
    • JDBC Lookup
      • Installing the JDBC Driver
      • Lookup Cache
        • Retry Lookups for Missing Values
      • Monitoring a JDBC Lookup
      • Configuring a JDBC Lookup
    • JDBC Tee
      • Installing the JDBC Driver
      • Define the CRUD Operation
      • Single and Multi-row Operations
      • Configuring a JDBC Tee
    • JSON Generator
      • Configuring a JSON Generator
    • JSON Parser
      • Configuring a JSON Parser
    • Jython Evaluator
      • Processing Mode
      • Jython Scripting Objects
      • Processing List-Map Data
      • Type Handling
      • Event Generation
      • Working with Record Header Attributes
        • Viewing Record Header Attributes
      • Accessing Whole File Format Records
      • Calling External Java Code
      • Configuring a Jython Evaluator
    • Kudu Lookup
      • Column Mappings
      • Kudu Data Types
      • Lookup Cache
        • Cache Table Information
        • Cache Lookup Values
      • Configuring a Kudu Lookup
    • Log Parser
      • Log Formats
      • Configuring a Log Parser
    • Postgres Metadata
      • Installing the JDBC Driver
      • Schema and Table Names
      • Decimal Precision and Scale Field Attributes
      • Caching Information
      • Configuring a Postgres Metadata Processor
    • Record Deduplicator
      • Comparison Window
      • Configuring a Record Deduplicator
    • Redis Lookup
      • Data Types
      • Lookup Cache
      • Configuring a Redis Lookup Processor
    • Salesforce Lookup
      • Lookup Mode
      • Lookup Cache
      • Salesforce Attributes
      • Changing the API Version
      • Configuring a Salesforce Lookup
    • Schema Generator
      • Using the avroSchema Header Attribute
      • Generated Avro Schema
      • Caching Schemas
      • Configuring a Schema Generator
    • Spark Evaluator
      • Spark Versions and Stage Libraries
      • Standalone Pipelines
      • Cluster Pipelines
      • Developing the Spark Application
      • Installing the Application
      • Configuring a Spark Evaluator
    • Static Lookup
      • Configuring a Static Lookup Processor
    • Stream Selector
      • Default Stream
      • Sample Conditions for Streams
      • Configuring the Stream Selector
    • Value Replacer (Deprecated)
      • Processing Order
      • Replacing Values with Nulls
      • Replacing Values with Constants
        • Data Types for Conditional Replacement
      • Configuring a Value Replacer
    • XML Flattener
      • Generated Records
      • Configuring an XML Flattener
    • XML Parser
      • Configuring an XML Parser
  • Destinations
    • Destinations
    • Aerospike
      • Configuring an Aerospike Destination
    • Amazon S3
      • AWS Credentials
      • Bucket
      • Partition Prefix
      • Time Basis and Data Time Zone for Time-Based Buckets and Partition Prefixes
      • Object Names
        • Whole File Names
      • Event Generation
        • Event Records
      • Server-Side Encryption
      • Data Formats
      • Configuring an Amazon S3 Destination
    • Azure Data Lake Store
      • Prerequisites
        • Step 1. Create a Data Collector Web Application
        • Step 2. Retrieve Information from Azure
        • Step 3. Grant Execute Permission
      • Directory Templates
      • Time Basis
      • Timeout to Close Idle Files
      • Event Generation
        • Event Records
      • Data Formats
      • Configuring an Azure Data Lake Store Destination
    • Azure Event Hub Producer
      • Data Formats
      • Configuring an Azure Event Hub Producer
    • Azure IoT Hub Producer
      • Register Data Collector as an IoT Hub Device
      • Data Formats
      • Configuring an Azure IoT Hub Producer Destination
    • Cassandra
      • Authentication
        • Kerberos (DSE) Authentication
      • Cassandra Data Types
      • Configuring a Cassandra Destination
    • CoAP Client
      • Data Formats
      • Configuring a CoAP Client Destination
    • Elasticsearch
      • Time Basis and Time-Based Indexes
      • Document IDs
      • Define the CRUD Operation
      • Configuring an Elasticsearch Destination
    • Einstein Analytics
      • Changing the API Version
      • Define the Operation
        • Metadata JSON
      • Dataflow (Deprecated)
      • Configuring an Einstein Analytics Destination
    • Flume
      • Data Formats
      • Configuring a Flume Destination
    • Google BigQuery
      • BigQuery Data Types
      • Credentials
        • Default Credentials Provider
        • Service Account Credentials File (JSON)
      • Configuring a Google BigQuery Destination
    • Google Bigtable
      • Prerequisites
        • Install the BoringSSL Library
        • Configure the Google Application Default Credentials
      • Row Key
      • Cloud Bigtable Data Types
      • Column Family and Field Mappings
      • Time Basis
      • Configuring a Google Bigtable Destination
    • Google Cloud Storage
      • Credentials
        • Default Credentials Provider
        • Service Account Credentials (JSON)
      • Partition Prefix
      • Time Basis, Data Time Zone, and Time-Based Partition Prefixes
      • Object Names
        • Whole File Names
      • Event Generation
        • Event Records
      • Data Formats
      • Configuring a Google Cloud Storage Destination
    • Google Pub/Sub Publisher
      • Credentials
        • Default Credentials Provider
        • Service Account Credentials File (JSON)
      • Data Formats
      • Configuring a Google Pub/Sub Publisher Destination
    • Hadoop FS
      • Directory Templates
      • Time Basis
      • Late Records and Late Record Handling
      • Timeout to Close Idle Files
      • Recovery
      • Data Formats
      • Writing to Azure HDInsight
      • Event Generation
        • Event Records
      • Kerberos Authentication
      • Using an HDFS User
      • HDFS Properties and Configuration Files
      • Configuring a Hadoop FS Destination
    • HBase
      • Field Mappings
      • Kerberos Authentication
      • Using an HBase User
      • Time Basis
      • HDFS Properties and Configuration File
      • Configuring an HBase Destination
    • Hive Metastore
      • Metadata Processing
      • Hive Table Generation
      • Cache
        • Cache Size and Evictions
      • Event Generation
        • Event Records
      • Kerberos Authentication
      • Hive Properties and Configuration Files
      • Configuring a Hive Metastore Destination
    • Hive Streaming
      • Hive Properties and Configuration Files
      • Configuring a Hive Streaming Destination
    • HTTP Client
      • HTTP Method
        • Expression Method
      • Number of Requests
      • OAuth 2 Authorization
        • Example for Twitter
        • Example for Microsoft Azure AD
        • Example for Google
      • Data Formats
      • Configuring an HTTP Client Destination
    • InfluxDB
      • Configuring an InfluxDB Destination
    • JDBC Producer
      • Installing the JDBC Driver
      • Define the CRUD Operation
      • Single and Multi-row Operations
      • Configuring a JDBC Producer
    • JMS Producer
      • Installing JMS Drivers
      • Data Formats
      • Configuring a JMS Producer
    • Kafka Producer
      • Broker List
      • Runtime Topic Resolution
      • Partition Strategy
      • Additional Kafka Properties
      • Enabling Security
        • Enabling SSL/TLS
        • Enabling Kerberos (SASL)
        • Enabling SSL/TLS and Kerberos
      • Data Formats
      • Configuring a Kafka Producer
    • Kinesis Firehose
      • AWS Credentials
      • Delivery Stream
      • Data Formats
      • Configuring a Kinesis Firehose Destination
    • Kinesis Producer
      • AWS Credentials
      • Data Formats
      • Configuring a Kinesis Producer Destination
    • KineticaDB
      • Multihead Ingest
      • Inserts and Updates
      • Configuring a KineticaDB Destination
    • Kudu
      • Define the CRUD Operation
      • Kudu Data Types
      • Configuring a Kudu Destination
    • Local FS
      • Directory Templates
      • Time Basis
      • Late Records and Late Record Handling
      • Timeout to Close Idle Files
      • Recovery
      • Event Generation
        • Event Records
      • Data Formats
      • Configuring a Local FS Destination
    • MapR DB
      • Field Mappings
      • Time Basis
      • Kerberos Authentication
      • Using an HBase User
      • HDFS Properties and Configuration File
      • Configuring a MapR DB Destination
    • MapR DB JSON
      • Row Key
        • Row Key Data Types
      • Writing to MapR DB JSON
        • Define the CRUD Operation
        • Insert and Set API Properties
      • Configuring a MapR DB JSON Destination
    • MapR FS
      • Directory Templates
      • Time Basis
      • Late Records and Late Record Handling
      • Timeout to Close Idle Files
      • Recovery
      • Event Generation
        • Event Records
      • Data Formats
      • Kerberos Authentication
      • Using an HDFS User
      • HDFS Properties and Configuration Files
      • Configuring a MapR FS Destination
    • MapR Streams Producer
      • Data Formats
      • Runtime Topic Resolution
      • Partition Strategy
      • Additional Properties
      • Configuring a MapR Streams Producer
    • MongoDB
      • Credentials
      • Define the CRUD Operation
        • Performing Upserts
      • Enabling SSL/TLS
      • Configuring a MongoDB Destination
    • MQTT Publisher
      • Topic
      • Data Formats
      • Configuring an MQTT Publisher Destination
    • Named Pipe
      • Prerequisites
        • Create the Named Pipe
        • Configure the Named Pipe Reader
      • Working with the Named Pipe Reader
      • Data Formats
      • Configuring a Named Pipe Destination
    • RabbitMQ Producer
      • Data Formats
      • Configuring a RabbitMQ Producer
    • Redis
      • Mode
        • Data Types for Batch Mode
        • Data Formats for Publish Mode
      • Define the CRUD Operation
      • Configuring a Redis Destination
    • Salesforce
      • Changing the API Version
      • Define the CRUD Operation
      • Field Mappings
      • Configuring a Salesforce Destination
    • SDC RPC
      • RPC Connections
      • Disabling Compression
      • Configuring an SDC RPC Destination
    • Solr
      • Index Mode
      • Kerberos Authentication
      • Configuring a Solr Destination
    • To Error
    • Trash
    • WebSocket Client
      • Data Formats
      • Configuring a WebSocket Client Destination
  • Executors
    • Executors
    • Amazon S3 Executor
      • AWS Credentials
      • Create New Objects
      • Tag Existing Objects
      • Configuring an Amazon S3 Executor
    • Email Executor
      • Prerequisite
      • Conditions
      • Using Expressions
      • Configuring an Email Executor
    • HDFS File Metadata Executor
      • Related Event Generating Stages
      • Changing Metadata
        • Specifying the File Path
        • Changing the File Name or Location
        • Defining the Owner, Group, Permissions, and ACLs
      • Creating an Empty File
      • Removing a File or Directory
      • Event Generation
        • Event Records
      • Kerberos Authentication
      • HDFS User
      • HDFS Properties and Configuration Files
      • Configuring an HDFS File Metadata Executor
    • Hive Query Executor
      • Related Event Generating Stages
      • Installing the Impala Driver
      • Hive and Impala Queries
      • Impala Queries for the Drift Synchronization Solution for Hive
      • Event Generation
        • Event Records
      • Configuring a Hive Query Executor
    • JDBC Query Executor
      • Installing the JDBC Driver
      • Configuring a JDBC Query Executor
    • MapR FS File Metadata Executor
      • Related Event Generating Stage
      • Changing Metadata
        • Specifying the File Path
        • Changing the File Name or Location
        • Defining the Owner, Group, Permissions, and ACLs
      • Creating an Empty File
      • Removing a File or Directory
      • Event Generation
        • Event Records
      • Kerberos Authentication
      • HDFS User
      • HDFS Properties and Configuration Files
      • Configuring a MapR FS File Metadata Executor
    • MapReduce Executor
      • Prerequisites
      • Related Event Generating Stages
      • MapReduce Jobs
        • Avro to Parquet Job
      • Event Generation
        • Event Records
      • Kerberos Authentication
      • Using a MapReduce User
      • Configuring a MapReduce Executor
    • Pipeline Finisher Executor
      • Recommended Implementation
      • Related Event Generating Stages
      • Notification Options
      • Configuring a Pipeline Finisher Executor
    • Shell Executor
      • Data Collector Shell Impersonation Mode
      • Script Configuration
      • Configuring a Shell Executor
    • Spark Executor
      • Spark Versions and Stage Libraries
      • Spark on YARN
        • YARN Prerequisite
        • Spark Home Requirement
        • Application Properties
        • Using a Proxy Hadoop User
        • Kerberos Authentication
      • Spark on Databricks
        • Databricks Prerequisites
      • Event Generation
        • Event Records
      • Monitoring
      • Configuring a Spark Executor
  • StreamSets Control Hub
    • Meet StreamSets Control Hub
      • Design the Complete Data Architecture
      • Collaboratively Build Pipelines
      • Execute Jobs at Scale
      • Map Jobs into a Topology
      • Measure Dataflow Quality
      • Monitor Dataflow Operations
    • Working with Control Hub
    • Request a Control Hub Organization and User Account
    • Register Data Collector with Control Hub
      • Registration Prerequisites
      • Registering from Data Collector
      • Registering from the Command Line Interface
      • Registering from Cloudera Manager
      • Disconnected Mode
      • Using an HTTP or HTTPS Proxy Server
      • Using a Publicly Accessible URL
      • Transfer Permissions to Control Hub Users
    • Pipeline Statistics
      • Pipeline Execution Mode
      • Write Statistics Directly to Control Hub
      • Write Statistics to SDC RPC
        • Best Practices for SDC RPC
      • Write Statistics to Kafka
        • Partition Strategy
        • Best Practices for a Kafka Cluster
      • Write Statistics to Kinesis Streams
        • AWS Credentials
        • Best Practices for Kinesis Streams
      • Write Statistics to MapR Streams
        • Partition Strategy
        • Best Practices for MapR Streams
      • Configuring a Pipeline to Write Statistics
    • Pipeline Management with Control Hub
      • Pipeline Types
        • Viewing Pipeline Types in Data Collector
      • Publishing Pipelines to Control Hub
        • Reverting Changes to Published Pipelines
      • Viewing Pipeline Commit History
      • Downloading Published Pipelines
      • Exporting Pipelines for Control Hub
    • Control Hub Configuration
    • Unregister Data Collector from Control Hub
      • Unregistering from Data Collector
      • Unregistering from the Command Line Interface
  • Dataflow Triggers
    • Dataflow Triggers Overview
    • Pipeline Event Generation
      • Using Pipeline Events
        • Pass to an Executor
        • Pass to Another Pipeline
    • Stage Event Generation
      • Using Stage Events
        • Task Execution Streams
        • Event Storage Streams
    • Executors
    • Logical Pairings
    • Event Records
      • Event Record Header Attributes
    • Viewing Events in Data Preview, Snapshot, and Monitor Mode
      • Viewing Stage Events in Data Preview and Snapshot
      • Viewing Stage Events in Monitor Mode
    • Executing Pipeline Events in Data Preview
    • Case Study: Parquet Conversion
    • Case Study: Impala Metadata Updates for DDS for Hive
    • Case Study: Output File Management
    • Case Study: Stop the Pipeline
    • Case Study: Offloading Data from Relational Sources to Hadoop
    • Case Study: Sending Email
    • Case Study: Event Storage
    • Summary
  • Drift Synchronization Solution for Hive
    • Drift Synchronization Solution for Hive
      • General Processing
      • Parquet Processing
      • Impala Support
      • Flatten Records
    • Basic Avro Implementation
    • Basic Parquet Implementation
    • Implementation Steps
    • Avro Case Study
      • The Hive Metadata Processor
      • The Hive Metastore Destination
      • The Data-Processing Destination
      • Processing Avro Data
    • Parquet Case Study
      • JDBC Query Consumer
      • The Hive Metadata Processor
      • The Hive Metastore Destination
      • The Data-Processing Destination
      • The MapReduce Executor
      • Processing Parquet Data
    • Hive Data Types
  • Drift Synchronization Solution for Postgres
    • Drift Synchronization Solution for Postgres
    • Basic Implementation and Processing
      • Flattening Records
    • Requirements
    • Implementation Steps
    • Case Study
      • The JDBC Multitable Consumer Origin
      • The Postgres Metadata Processor
      • The JDBC Producer Destination
      • Running the Pipeline
  • Multithreaded Pipelines
    • Multithreaded Pipeline Overview
    • How It Works
      • Origins for Multithreaded Pipelines
      • Processor Caching
    • Monitoring
    • Tuning Threads and Runners
    • Resource Usage
    • Multithreaded Pipeline Summary
  • Edge Pipelines
    • Edge Pipelines Overview
    • Example: IoT Preventative Maintenance
    • Supported Platforms
    • Edge Pipelines
      • Edge Sending Pipelines
      • Edge Receiving Pipelines
      • Error Record Handling
      • Data Formats
      • Edge Pipeline Limitations
    • Data Collector Receiving Pipelines
    • Getting Started with Sample Edge Pipelines
      • Step 1. Create and Start a Data Collector Receiving Pipeline
      • Step 2. Download and Install SDC Edge
      • Step 3. Start SDC Edge and the Edge Pipeline
    • Install SDC Edge
      • Downloading from Data Collector
      • Downloading from the StreamSets Website
      • Running from Docker
    • Administer SDC Edge
      • Configuring SDC Edge
      • Starting SDC Edge
      • Shutting Down SDC Edge
      • Logs
    • Export Pipelines to SDC Edge
    • Manage Pipelines on SDC Edge
    • Uninstalling SDC Edge
  • SDC RPC Pipelines
    • SDC RPC Pipeline Overview
      • Pipeline Types
    • Deployment Architecture
    • Configuring the Delivery Guarantee
    • Defining the RPC ID
    • Enabling Encryption
    • Configuration Guidelines for SDC RPC Pipelines
  • Cluster Pipelines
    • Cluster Pipeline Overview
      • Cluster Batch and Streaming Execution Modes
      • HTTP Protocols
      • Checkpoint Storage for Streaming Pipelines
        • Configuring the Location for Mesos
      • Error Handling Limitations
      • Monitoring and Snapshot
    • Kafka Cluster Requirements
      • Configuring Cluster YARN Streaming for Kafka
      • Configuring Cluster Mesos Streaming for Kafka
    • MapR Requirements
      • Configuring Cluster Batch Mode for MapR
      • Configuring Cluster Streaming Mode for MapR
    • HDFS Requirements
    • Cluster Pipeline Limitations
  • Data Preview
    • Data Preview Overview
      • Data Preview Availability
      • Source Data for Data Preview
      • Writing to Destinations
      • Notes
    • Data Collector UI - Preview Mode
      • Preview Codes
    • Previewing a Single Stage
    • Previewing Multiple Stages
    • Editing Preview Data
    • Editing Properties
  • Rules and Alerts
    • Rules and Alerts Overview
    • Metric Rules and Alerts
      • Default Metric Rules
      • Metric Types
        • Gauge
        • Counter
        • Histogram
        • Meter
        • Timer
      • Metric Conditions
      • Configuring a Metric Rule and Alert
    • Data Rules and Alerts
      • Configuring a Data Rule and Alert
      • Viewing Data Rule Metrics and Sample Data
    • Data Drift Rules and Alerts
      • Data Drift Alert Triggers
      • Configuring Data Drift Rules and Alerts
    • Alert Webhooks
      • Configuring an Alert Webhook
    • Configuring Email for Alerts
  • Pipeline Monitoring
    • Pipeline Monitoring Overview
    • Data Collector UI - Monitor Mode
    • Viewing Pipeline and Stage Statistics
    • Monitoring Errors
      • Stage-Related Errors
    • Snapshots
      • Failure Snapshots
        • Viewing a Failure Snapshot
      • Capturing and Viewing a Snapshot
      • Downloading a Snapshot
      • Deleting a Snapshot
    • Viewing the Run History
      • Viewing a Run Summary
  • Pipeline Maintenance
    • Understanding Pipeline States
      • State Transition Examples
    • Starting Pipelines
      • Starting Pipelines with Parameters
      • Resetting the Origin
    • Stopping Pipelines
    • Importing Pipelines
      • Importing a Single Pipeline
      • Importing a Set of Pipelines
    • Sharing Pipelines
      • Sharing a Pipeline
      • Changing the Pipeline Owner
    • Adding Labels to Pipelines
    • Exporting Pipelines
    • Duplicating a Pipeline
    • Deleting Pipelines
  • Administration
    • Viewing Data Collector Configuration Properties
    • Viewing Data Collector Directories
    • Viewing Data Collector Metrics
    • Viewing Data Collector Logs
      • Modifying the Log Level
    • Shutting Down Data Collector
    • Restarting Data Collector
    • Viewing Users and Groups
    • Support Bundles
      • Generators
      • Generating a Support Bundle
      • Customizing Generators
    • REST Response
      • Viewing REST Response Data
      • Disabling the REST Response Menu
    • Command Line Interface
      • Java Configuration Options for the Cli Command
      • Using the Cli Command
        • Not Registered with Control Hub
        • Registered with Control Hub
      • Help Command
      • Manager Command
      • Store Command
      • System Command
  • Tutorial
    • Tutorial Overview
    • Before You Begin
    • Basic Tutorial
      • Create a Pipeline and Define Pipeline Properties
      • Configure the Origin
      • Preview Data
      • Route Data with the Stream Selector
      • Use Jython for Card Typing
      • Mask Credit Card Numbers
      • Write to the Destination
      • Add a Corresponding Field with the Expression Evaluator
      • Create a Data Rule and Alert
      • Run the Basic Pipeline
    • Extended Tutorial
      • Convert Types with a Field Type Converter
      • Manipulate Data with the Expression Evaluator
      • Preview and Edit the Pipeline
      • Write to Trash
      • Run the Extended Pipeline
  • Troubleshooting
    • Accessing Error Messages
    • Pipeline Basics
      • Data Preview
      • General Validation Errors
    • Origins
      • Directory
      • Hadoop FS
      • JDBC Origins
      • Kafka Consumer
      • Oracle CDC Client
      • SQL Server CDC Client
    • Destinations
      • Cassandra
      • Hadoop FS
      • HBase
      • Kafka Producer
      • SDC RPC
    • Executors
      • Hive Query
    • JDBC Connections
      • No Suitable Driver
      • Cannot Connect to Database
      • MySQL JDBC Driver and Time Values
    • Performance
    • Cluster Execution Mode
  • Glossary
    • Glossary of Terms
  • Data Formats by Stage
    • Data Format Support
      • Origins
      • Destinations
  • Expression Language
    • Expression Language
      • Expression Examples
    • Functions
      • Record Functions
      • Delimited Data Record Functions
      • Error Record Functions
      • Base64 Functions
      • Credential Functions
      • Data Drift Functions
      • Field Functions
      • File Functions
      • Math Functions
      • Pipeline Functions
      • String Functions
      • Time Functions
      • Miscellaneous Functions
    • Constants
    • Datetime Variables
    • Literals
    • Operators
      • Operator Precedence
    • Reserved Words
  • Regular Expressions
    • Regular Expressions Overview
    • Regular Expressions in the Pipeline
    • Quick Reference
    • Regex Examples
  • Grok Patterns
    • Defining Grok Patterns
    • General Grok Patterns
    • Date and Time Grok Patterns
    • Java Grok Patterns
    • Log Grok Patterns
    • Networking Grok Patterns
    • Path Grok Patterns
© Apache License, Version 2.0.