Solr

Supported pipeline types:
  • Data Collector

The Solr destination writes data to a Solr node or cluster.

When you configure the Solr destination, you configure connection information for the node or cluster. You can configure the destination to index records individually or in batches.

You map incoming fields to Solr fields. You can specify the action to take when mapped fields are missing and configure the destination to ignore missing optional fields.
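The interaction between the Missing Fields action and the Ignore Optional Fields option, both described under Configuring a Solr Destination, can be read as a small decision procedure. The following Python sketch is an illustration only, not the destination's implementation: `build_document`, `MissingFields`, `PipelineStopped`, the `/id` and `/title` field paths, and the idea of passing the required Solr fields in as a set are all assumptions made for the example.

```python
from enum import Enum


class MissingFields(Enum):
    DISCARD = "Discard"
    SEND_TO_ERROR = "Send to Error"
    STOP_PIPELINE = "Stop Pipeline"


class PipelineStopped(RuntimeError):
    """Raised in this sketch to model the Stop Pipeline action."""


def build_document(record, mappings, required, ignore_optional, action):
    """Map a record to a Solr document, or explain why it cannot be written.

    Returns (document, None) when the record should be written, or
    (None, reason) when it should go to stage error handling.
    """
    document = {}
    missing = []
    for record_field, solr_field in mappings.items():
        if record_field in record:
            document[solr_field] = record[record_field]
        else:
            missing.append(solr_field)

    if ignore_optional:
        # Missing optional fields are left out without triggering the action.
        missing = [name for name in missing if name in required]

    if not missing or action is MissingFields.DISCARD:
        return document, None
    if action is MissingFields.SEND_TO_ERROR:
        return None, "missing mapped fields: " + ", ".join(missing)
    raise PipelineStopped("missing mapped fields: " + ", ".join(missing))


mappings = {"/id": "id", "/title": "title"}  # record field -> Solr field
required = {"id"}                            # assumed required Solr fields
record = {"/id": "42"}                       # "/title" is absent

print(build_document(record, mappings, required, True, MissingFields.SEND_TO_ERROR))
print(build_document(record, mappings, required, False, MissingFields.SEND_TO_ERROR))
```

With Ignore Optional Fields on, the record is written without the title field. With it off, the same record falls under the Missing Fields action, which here sends it to error handling.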

You can specify whether the destination validates the connection to Solr. You can also configure write properties that determine whether the destination waits for Solr to finish processing a batch before it writes additional data.

When necessary, you can enable the destination to use Kerberos authentication.

Index Mode

The index mode determines how the Solr destination indexes records when writing to Solr. Index mode also determines how the destination handles errors.

You can use the following index modes:
  • Record - The destination indexes one record at a time, and then passes the record to Solr. If an error occurs, the destination passes the record to the stage for error handling.
  • Batch - The destination indexes a batch of records at one time, and then passes the batch to Solr. If an error occurs, the destination rolls back any records that were indexed and passes the entire batch to the stage for error handling.
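The difference in error handling between the two modes can be sketched with a small simulation. This is a minimal illustration under stated assumptions, not the destination's code: `FakeSolr` is a hypothetical in-memory stand-in for a Solr client, and "a document without an id fails" is an invented failure rule.

```python
class FakeSolr:
    """Hypothetical in-memory stand-in for a Solr client, not a real API."""

    def __init__(self):
        self.docs = []

    def add(self, doc):
        # Invented rule: a document without an "id" fails to index.
        if "id" not in doc:
            raise ValueError("document has no id")
        self.docs.append(doc)


def write_record_mode(records, solr):
    """Index records one at a time; return only the records that failed."""
    errors = []
    for record in records:
        try:
            solr.add(record)
        except ValueError:
            errors.append(record)
    return errors


def write_batch_mode(records, solr):
    """Index records as one unit; on failure, undo and return the whole batch."""
    start = len(solr.docs)
    try:
        for record in records:
            solr.add(record)
    except ValueError:
        del solr.docs[start:]  # roll back records indexed from this batch
        return list(records)
    return []


batch = [{"id": 1}, {"title": "no id"}, {"id": 3}]

by_record = FakeSolr()
print(len(write_record_mode(batch, by_record)), len(by_record.docs))

by_batch = FakeSolr()
print(len(write_batch_mode(batch, by_batch)), len(by_batch.docs))
```

In this simulation, record mode sends one record to error handling and keeps the other two, while batch mode sends all three records to error handling and keeps none.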

Kerberos Authentication

You can use Kerberos authentication to connect to a Solr node or cluster. When you use Kerberos authentication, Data Collector uses the Kerberos principal and keytab to connect to Solr.

The Kerberos principal and keytab are defined in the Data Collector configuration file, $SDC_CONF/sdc.properties. To use Kerberos authentication, configure all Kerberos properties in the Data Collector configuration file, and then enable Kerberos in the Solr destination.

For more information about enabling Kerberos authentication for Data Collector, see Kerberos Authentication.
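Before you enable Kerberos in the destination, it can help to confirm that the Kerberos properties are all set. The following Python sketch parses properties-file text and reports what is missing. The property names `kerberos.client.enabled`, `kerberos.client.principal`, and `kerberos.client.keytab`, and the sample values, are assumptions for illustration. Check them against your own sdc.properties file.

```python
# Property names below are assumptions, not taken from the text above.
REQUIRED = ("kerberos.client.principal", "kerberos.client.keytab")


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def kerberos_problems(props):
    """Return a list of human-readable problems; empty means all set."""
    problems = []
    if props.get("kerberos.client.enabled", "false").lower() != "true":
        problems.append("kerberos.client.enabled is not true")
    for key in REQUIRED:
        if not props.get(key):
            problems.append(key + " is not set")
    return problems


# Hypothetical excerpt of $SDC_CONF/sdc.properties
SAMPLE = """
kerberos.client.enabled=true
kerberos.client.principal=sdc/host.example.com@EXAMPLE.COM
kerberos.client.keytab=sdc.keytab
"""

print(kerberos_problems(parse_properties(SAMPLE)))
```

An empty list means that the sketch found every property it looks for. It does not verify that the principal or keytab is valid.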

Configuring a Solr Destination

Configure a Solr destination to write data to a Solr node or cluster.
  1. In the Properties panel, on the General tab, configure the following properties:
    General Property: Description
    Name: Stage name.
    Description: Optional description.
    Stage Library: Library version that you want to use.
    Required Fields: Fields that must include data for the record to be passed into the stage.
      Tip: You might include fields that the stage uses.

      Records that do not include all required fields are processed based on the error handling configured for the pipeline.

    Preconditions: Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.

      Records that do not meet all preconditions are processed based on the error handling configured for the stage.

    On Record Error: Error record handling for the stage:
      • Discard - Discards the record.
      • Send to Error - Sends the record to the pipeline for error handling.
      • Stop Pipeline - Stops the pipeline.
  2. On the Solr tab, configure the following properties:
    Solr Property: Description
    Instance Type: Solr instance type to write to:
      • Single Node - Writes to a single Solr node.
      • SolrCloud - Writes to a Solr cluster.
    Solr URI: When writing to a single node, URI for the node. Use the following format:
      http://<host>:<port>/solr/<core_name>
    ZooKeeper Connection String: When writing to a Solr cluster, the ZooKeeper connection string. Use the following format:
      <host>:<port>

      If the cluster uses multiple ZooKeeper instances, enter a comma-separated list of the connection strings.

    Default Collection Name: When writing to a Solr cluster, the default collection name for the cluster.
    Record Indexing Mode: Determines how records are indexed.
    Fields: Map fields from the record to Solr fields.

      Mapped fields must have compatible data types. For example, you must map List and Map fields in the record to Solr fields that are multi-valued.

      Using simple or bulk edit mode, click the Add icon to create additional field mappings.

    Ignore Optional Fields: Ignores non-required fields that do not exist in the record. When selected, records with missing optional fields are written without the optional field.

      When not selected, any record with a missing optional field is treated based on the Missing Fields property.

    Missing Fields: Action to take if any of the mapped fields are not included in the record:
      • Discard - Discards any missing mapped fields and writes the record to Solr without the fields.
      • Send to Error - Processes the record based on the error handling configured for the stage.
      • Stop Pipeline - Stops the pipeline.

      When Ignore Optional Fields is selected, this property does not apply to missing optional fields.

    Kerberos Authentication: Uses Kerberos credentials to connect to a Solr node or cluster.

      When selected, uses the Kerberos principal and keytab defined in the Data Collector configuration file, $SDC_CONF/sdc.properties.

    Skip Validation: Determines whether the destination validates the connection to Solr.

      Configure the destination to skip validation when the Solr configuration file, solrconfig.xml, does not define the default search field ("df") parameter.

    Wait Flush: Determines whether the destination waits for Solr to complete writing a batch of data to disk before processing another batch.

      By default, the destination waits. You can disable this property to increase write performance, but data can be lost if the Solr server fails to complete the write to disk.

    Wait Searcher: Determines whether the destination waits for Solr to make a batch of data searchable before processing another batch.

      By default, the destination waits. You can disable this property if you don’t need the data to be searchable in Solr before the data is committed by Data Collector.

    Soft Commit: Determines whether the destination requests a soft or hard commit from Solr. A soft commit makes a batch of data visible to searches without first flushing the batch to disk. A hard commit flushes the batch to disk so that the data survives a server failure.

      By default, the destination requests a hard commit. You can select this property to request soft commits instead, which can increase write performance when the data does not need to be written to disk immediately.
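To relate the commit properties to Solr itself: the Solr update handler accepts `commit`, `waitSearcher`, and `softCommit` request parameters. The following Python sketch builds an update URL that carries them. It is an illustration only. The `commit_url` function is hypothetical, the host and core name are placeholders, and whether the destination issues commits in this form is an assumption. Wait Flush is left out because the text above does not establish an equivalent request parameter.

```python
from urllib.parse import urlencode


def commit_url(solr_uri, wait_searcher=True, soft_commit=False):
    """Build a Solr update URL that requests a commit with the given flags.

    The defaults assume the destination waits for the searcher and
    requests a hard commit unless configured otherwise.
    """
    params = {
        "commit": "true",
        "waitSearcher": str(wait_searcher).lower(),
        "softCommit": str(soft_commit).lower(),
    }
    return solr_uri.rstrip("/") + "/update?" + urlencode(params)


# Placeholder URI in the http://<host>:<port>/solr/<core_name> format.
print(commit_url("http://localhost:8983/solr/mycore"))
print(commit_url("http://localhost:8983/solr/mycore", wait_searcher=False, soft_commit=True))
```

The first call yields a hard commit that waits for the new searcher. The second yields a soft commit that returns without waiting.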