Solr

Supported pipeline types:
  • Data Collector

The Solr destination writes data to a Solr node or cluster.

When you configure the Solr destination, you configure connection information for the node or cluster. You can configure the destination to index records individually or in batches.

You configure how the destination maps fields in the record to fields in Solr. You can have the destination automatically map fields in the record to fields in the Solr schema based on name. Alternatively, you can map specific incoming fields to Solr fields. You also specify the action to take when the record is missing fields from the schema or mapped fields, and you can configure the destination to ignore missing optional fields.
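
As a rough illustration of the missing-field rules described above, the following Python sketch models how a record might be checked against a schema. The names `SchemaField` and `missing_fields` and the sample schema are hypothetical and are not part of Data Collector or Solr. The logic only mirrors the behavior described in this topic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaField:
    """Hypothetical stand-in for a field defined in the Solr schema."""
    name: str
    required: bool


def missing_fields(record, schema, ignore_optional):
    """Return the schema field names that count as missing from the record.

    With ignore_optional=True, only absent required fields count as missing.
    With ignore_optional=False, every absent schema field counts as missing.
    """
    missing = []
    for field in schema:
        if field.name in record:
            continue
        if field.required or not ignore_optional:
            missing.append(field.name)
    return missing


# Assumed sample schema: two required fields and one optional field.
schema = [
    SchemaField("id", required=True),
    SchemaField("title", required=True),
    SchemaField("author", required=False),
]
record = {"id": "42", "title": "Solr basics"}  # optional "author" is absent

print(missing_fields(record, schema, ignore_optional=True))
print(missing_fields(record, schema, ignore_optional=False))
```

In this sketch, a non-empty result corresponds to the case where the destination would apply the configured missing fields action to the record.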

You can specify whether the destination validates the connection to Solr. You can also configure write properties that determine whether the destination waits for Solr to complete all processing before it writes additional data.

When necessary, you can enable the destination to use Kerberos authentication.

Index Mode

The index mode determines how the Solr destination indexes records when writing to Solr. Index mode also determines how the destination handles errors.

You can use the following index modes:
Record
The destination indexes one record at a time, and then passes the record to Solr.
If an error occurs, the destination passes the record to the stage for error handling.
Batch
The destination indexes a batch of records at one time, and then passes the batch to Solr.
If an error occurs, the destination rolls back any records that were indexed and passes the entire batch to the stage for error handling.
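
The difference in error handling between the two index modes can be sketched in Python as follows. This is a simplified model, not the destination's implementation: `write_records`, `index_one`, and the mode strings are hypothetical names, and the rollback is represented simply by returning no written records.

```python
def write_records(records, index_mode, index_one):
    """Return (written, errors) for one batch under the given index mode.

    index_one is a hypothetical callable that raises ValueError when a
    record cannot be indexed.
    """
    if index_mode == "RECORD":
        written, errors = [], []
        for record in records:
            try:
                index_one(record)
                written.append(record)
            except ValueError:
                # Only the failing record goes to error handling.
                errors.append(record)
        return written, errors

    if index_mode == "BATCH":
        try:
            for record in records:
                index_one(record)
        except ValueError:
            # Roll back: nothing is kept, and the entire batch is an error.
            return [], list(records)
        return list(records), []

    raise ValueError(f"unknown index mode: {index_mode}")


def index_one(record):
    """Assumed indexing rule for the example: every record needs an id."""
    if "id" not in record:
        raise ValueError("missing id")


batch = [{"id": 1}, {"name": "no id"}, {"id": 3}]
print(write_records(batch, "RECORD", index_one))
print(write_records(batch, "BATCH", index_one))
```

With the sample batch, record mode writes the two valid records and sends one record to error handling, while batch mode sends all three records to error handling.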

Kerberos Authentication

You can use Kerberos authentication to connect to a Solr node or cluster. When you use Kerberos authentication, Data Collector uses the Kerberos principal and keytab to connect to Solr.

The Kerberos principal and keytab are defined in the Data Collector configuration file, $SDC_CONF/sdc.properties. To use Kerberos authentication, configure all Kerberos properties in the Data Collector configuration file, and then enable Kerberos in the Solr destination.

For more information about enabling Kerberos authentication for Data Collector, see Kerberos Authentication.

Configuring a Solr Destination

Configure a Solr destination to write data to a Solr node or cluster.
  1. In the Properties panel, on the General tab, configure the following properties:
    General Property - Description
    Name - Stage name.
    Description - Optional description.
    Stage Library - Library version that you want to use.
    Required Fields - Fields that must include data for the record to be passed into the stage.
    Tip: You might include fields that the stage uses.

    Records that do not include all required fields are processed based on the error handling configured for the pipeline.

    Preconditions - Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.

    Records that do not meet all preconditions are processed based on the error handling configured for the stage.

    On Record Error - Error record handling for the stage:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the pipeline for error handling.
    • Stop Pipeline - Stops the pipeline.
  2. On the Solr tab, configure the following properties:
    Solr Property - Description
    Instance Type - Solr instance type to write to:
    • Single Node - Writes to a single Solr node.
    • SolrCloud - Writes to a Solr cluster.
    Solr URI - When writing to a single node, URI for the node. Use the following format:
    http://<host>:<port>/solr/<core_name>
    ZooKeeper Connection String - When writing to a Solr cluster, the ZooKeeper connection string. Use the following format:
    <host>:<port>

    If the cluster uses multiple ZooKeeper instances, enter a comma-separated list of the connection strings.

    Default Collection Name - When writing to a Solr cluster, the default collection name for the cluster.
    Record Indexing Mode - Determines how records are indexed.
    Map Fields Automatically - Maps fields in the record automatically to fields in the Solr schema based on matching names.

    If Ignore Optional Fields is selected, the destination processes each record unless the record is missing a field required in the Solr schema. If Ignore Optional Fields is not selected, then each record must contain all fields specified in the schema, required or not.

    Only use this option when fields in the record have the same names and compatible data types as fields in the Solr schema.

    Field Path for Data - Path to the record fields that the destination writes to Solr. Available when the destination maps fields automatically.

    Default value is /, indicating that the fields are at the root level.

    Fields - Mapping of fields in the record to Solr fields. Available when the destination does not map fields automatically.

    Mapped fields must have compatible data types. For example, you must map List and Map fields in the record to Solr fields that are multi-valued.

    Using simple or bulk edit mode, click the Add icon to create additional field mappings.

    Ignore Optional Fields - Ignores non-required fields that do not exist in the record. When selected, records with missing optional fields are written without the optional field.

    When not selected, any record with a missing optional field is treated based on the Missing Fields property.

    Missing Fields - Action to take if the record does not include a field from the schema or a mapped field:
    • Discard - Discards the record and continues to process subsequent records.
    • Send to Error - Processes the record based on the error handling configured for the stage.
    • Stop Pipeline - Stops the pipeline.

    When Ignore Optional Fields is selected, this property does not apply to missing optional fields.

    Kerberos Authentication - Uses Kerberos credentials to connect to a Solr node or cluster.

    When selected, uses the Kerberos principal and keytab defined in the Data Collector configuration file, $SDC_CONF/sdc.properties.

    Skip Validation - Determines whether the destination validates the connection to Solr.

    Configure the destination to skip validation when the Solr configuration file, solrconfig.xml, does not define the default search field ("df") parameter.

    Wait Flush - Determines whether the destination waits for Solr to complete writing a batch of data to disk before processing another batch.

    By default, the destination waits. You can disable this property to increase write performance, but data can be lost if the Solr server fails to complete the write to disk.

    Wait Searcher - Determines whether the destination waits for Solr to make a batch of data searchable before processing another batch.

    By default, the destination waits. You can disable this property if you don’t need the data to be searchable in Solr before the data is committed by Data Collector.

    Soft Commit - Determines whether Solr performs a soft or hard commit. A soft commit refreshes the view of the index before a batch of data is fully available. A hard commit updates the index only after the batch is fully available.

    By default, the destination requests a hard commit. Select this property to request a soft commit instead, which can increase write performance when the batch does not need to be fully available right away.

    Connection Timeout (ms) - Maximum number of milliseconds allowed to initiate a connection to a Solr node or cluster. 0 indicates no limit.
    Socket Timeout (ms) - Maximum number of milliseconds that the data flow can be interrupted. 0 indicates no limit.
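
To make the connection formats for the Solr URI and ZooKeeper Connection String properties concrete, the following Python sketch builds both strings. The helper names, host names, and core name are hypothetical examples. The ports shown are the common defaults for Solr (8983) and ZooKeeper (2181) and might differ in your environment.

```python
def solr_uri(host, port, core_name):
    """Build a single-node URI in the form http://<host>:<port>/solr/<core_name>."""
    return f"http://{host}:{port}/solr/{core_name}"


def zookeeper_connection_string(instances):
    """Build a comma-separated list of <host>:<port> entries.

    instances is a list of (host, port) tuples, one per ZooKeeper instance.
    """
    return ",".join(f"{host}:{port}" for host, port in instances)


# Hypothetical single-node target.
print(solr_uri("solr.example.com", 8983, "orders"))

# Hypothetical ZooKeeper ensemble with three instances.
print(zookeeper_connection_string([
    ("zk1.example.com", 2181),
    ("zk2.example.com", 2181),
    ("zk3.example.com", 2181),
]))
```

You would enter the first value in the Solr URI property for a single node, or the second value in the ZooKeeper Connection String property for a Solr cluster.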