Sources Configuration

Understanding sources (aka connectors)

Sources (connectors) and Collection (streams)

Sources (or connectors) import data from external APIs (Google Analytics, Facebook, etc.) or databases (Redis, Firebase, etc.) into destinations. Each source represents a connection to a particular API.

The synchronization scheduling engine is called sync tasks.

Jitsu supports 3 types of sources: native, Singer-based, and Airbyte-based.

Collections (aka streams)

Each source exports one or more collections (also called "streams" in Airbyte/Singer nomenclature). Example: the Slack source exports Users, Messages, Channels and a few other collections. Each collection is represented by a table in a destination.

Collections may be static or configurable. The configuration usually defines the set of fields to export. Example: Firebase collections (users, firestore) are static, while Google Analytics collections are parametrized (Google Analytics has dimensions and metrics).

Native Connectors Configuration

This section applies only to connectors that are a native part of Jitsu. The full list of native connectors is: facebook, google-ads, google-analytics, redis, google-play, firebase, amplitude.

Other connectors (based on either Singer or Airbyte) have a slightly different configuration syntax. Learn more about Singer-based or Airbyte-based sources.

Example of source configuration:

sources:
  firebase_example_id:
    type: firebase
    destinations:
      - "<DESTINATION_ID>"
    collections:
      - "<FIRESTORE_COLLECTION_ID>"
    config:
      project_id: "<FIREBASE_PROJECT_ID>"
      key: '<GOOGLE_SERVICE_ACCOUNT_KEY_JSON>'
  google_analytics_example_id:
    type: google_analytics
    destinations:
      - "<DESTINATION_ID>"
    collections:
      - name: "report_test"
        type: "report"
        schedule: '45 23 * * 6'
        parameters:
          dimensions:
            - "ga:country"
            - "ga:yearMonth"
          metrics:
            - "ga:sessions"
    config:
      view_id: "<VIEW_ID_VALUE>"
      auth:
        service_account_key: "<GOOGLE_SERVICE_ACCOUNT_KEY_JSON>"
  ...

Common YAML properties for all sources (all of them are required):

| Property | Description |
| --- | --- |
| type | Type of the data source from which data is imported (for example, google_analytics or firebase) |
| destinations | List of destination ids where the result must be stored |
| collections | List of collections to synchronize |
| config | Custom parameters for each source type |
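
The "all properties are required" rule can be checked before a configuration is deployed. The following is a minimal sketch, not part of Jitsu: the helper names `missing_source_keys` and `validate_sources` are hypothetical, and the input is assumed to be the `sources` mapping already parsed into a Python dict.

```python
# Hypothetical pre-flight check, not Jitsu's own validation logic.
# The four property names come from the table above.
REQUIRED_SOURCE_KEYS = ("type", "destinations", "collections", "config")


def missing_source_keys(source: dict) -> list:
    """Return the required top-level properties absent from one source config."""
    return [key for key in REQUIRED_SOURCE_KEYS if key not in source]


def validate_sources(sources: dict) -> dict:
    """Map each source id to its missing properties (only sources with problems)."""
    problems = {}
    for source_id, source in sources.items():
        missing = missing_source_keys(source)
        if missing:
            problems[source_id] = missing
    return problems


if __name__ == "__main__":
    example = {
        "firebase_example_id": {
            "type": "firebase",
            "destinations": ["<DESTINATION_ID>"],
            "collections": ["<FIRESTORE_COLLECTION_ID>"],
            "config": {"project_id": "<FIREBASE_PROJECT_ID>"},
        },
        "incomplete_source": {"type": "firebase"},
    }
    print(validate_sources(example))
```

The check only looks at the presence of keys; the contents of `config` depend on the source type and are not inspected here.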

To see how to configure a specific type of source, please visit the documentation page for that source type.

This feature requires:

  • meta.storage configuration
  • primary_key_fields configuration (in Postgres destination case)

Collection Configuration

Sources should define a list of collections (or streams) explicitly. Each collection defines a synchronization schedule and a destination table name (the table name will be prefixed with source_id to avoid collisions). Here's an example configuration snippet:

sources:
  firebase_example_id:
    collections:
      - name: "some_name"
        type: "collection_type_id"
        table_name: "table_name_for_data"
        start_date: "2020-06-01"
        schedule: '@daily' # cron expression, see below
        parameters:
          field1: "value"
          field2: ["values"]
          field3:
            some_object:
        ...

Full list of parameters

| Parameter | Description |
| --- | --- |
| name (required) | Unique identifier of the collection within the list of collections |
| type | Determines which data subset must be synchronized. If type is absent, it defaults to the value of name |
| table_name | Name of the table that keeps the synchronized data. If not set, it equals the name of the collection |
| start_date | Start date of the data to download, as a string in YYYY-MM-DD format. The default value is 365 days ago |
| schedule | Cron expression that defines the automatic collection synchronization schedule. If not set, only manual collection synchronization (via the HTTP API) will be available |
| parameters | If the collection is parametrized, parameter values are set here. A value may be of any type (string, number, boolean, list, object). For the full list of parameters, see the catalog |

If the collection has no parameters, it may be configured only by its name as a string argument. For example:

collections: ["collection1_id", "collection2_id"]
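
The defaults listed in the parameter table and the string shorthand can be combined into a single normalization step. The following is a minimal sketch under stated assumptions: `normalize_collection` is a hypothetical helper (not Jitsu's implementation), the entries are assumed to be already parsed from YAML, and the `today` argument exists only to make the 365-day default reproducible.

```python
from datetime import date, timedelta


def normalize_collection(entry, today=None):
    """Expand one `collections` entry into a dict with the documented defaults."""
    today = today or date.today()
    if isinstance(entry, str):
        # Shorthand form: a bare string is the collection name.
        entry = {"name": entry}
    if "name" not in entry:
        raise ValueError("collection 'name' is required")
    name = entry["name"]
    default_start = (today - timedelta(days=365)).isoformat()
    return {
        "name": name,
        "type": entry.get("type", name),              # defaults to name
        "table_name": entry.get("table_name", name),  # defaults to name
        "start_date": entry.get("start_date", default_start),
        "schedule": entry.get("schedule"),            # None: manual sync only
        "parameters": entry.get("parameters", {}),
    }


if __name__ == "__main__":
    print(normalize_collection("collection1_id"))
    print(normalize_collection({"name": "report_test", "type": "report",
                                "schedule": "45 23 * * 6"}))
```

The sketch does not apply the source_id prefix to the table name, since the exact prefix format is not specified here.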

Configuring sources via HTTP endpoint

If the sources configuration is generated by an external service, it can be externalized via an HTTP endpoint (or a file) as follows:

sources: 'location'

The location can be an http(s):// URL or a local file path (/path/to/file), and should contain YAML or JSON (with a structure identical to the YAML one). If the location is a URL, the client will respect If-Modified-Since / Last-Modified caching.

Example of URL content (the inner keys of "sources" are unique source ids, and each value is a source config object):

{
  "sources": {
    "facebook_marketing_online_sales": {
      "type": "facebook_marketing",
      ...
    },
    "facebook_marketing_offline_sales": {
      "type": "facebook_marketing",
      ...
    }
  }
}
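
The loading behavior described in this section (URL or local file, with If-Modified-Since / Last-Modified caching) can be sketched from the client's point of view. This is an illustration only, not Jitsu's loader: `load_sources` is a hypothetical function, it handles JSON content only (YAML would need a third-party parser), and it assumes the server answers 304 Not Modified when the content is unchanged.

```python
import json
import urllib.error
import urllib.request


def load_sources(location, last_modified=None):
    """Load a sources document from a URL or a local file path.

    Returns (document, last_modified). `document` is None when the server
    answers 304 Not Modified, meaning the cached copy is still valid.
    """
    if location.startswith(("http://", "https://")):
        request = urllib.request.Request(location)
        if last_modified:
            # Ask the server to skip the body if nothing changed.
            request.add_header("If-Modified-Since", last_modified)
        try:
            with urllib.request.urlopen(request) as response:
                body = response.read().decode("utf-8")
                return json.loads(body), response.headers.get("Last-Modified")
        except urllib.error.HTTPError as err:
            if err.code == 304:
                return None, last_modified
            raise
    # Anything else is treated as a local file path.
    with open(location, encoding="utf-8") as handle:
        return json.load(handle), None
```

A caller would keep the returned Last-Modified value and pass it back on the next poll, reusing the previously loaded document whenever `None` is returned.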