140+ Source Connectors with Jitsu and Airbyte

Sergey Buryking

Software Engineer
October 6th, 2021
Jitsu & Airbyte

Airbyte is an open-source ETL platform that pipes data from services and databases to data warehouses. The company is growing fast, and so is the number of connectors it supports: at the moment, around 100 services. Each connector is a separate open-source package, similar to a Singer tap but wrapped in a Docker image. A connector contains all the logic for downloading data from a source and follows the Airbyte protocol.

All connectors are published on Docker Hub, and most of them are MIT-licensed.

At Jitsu, we added support for Singer-based connectors a while ago, so we decided to support the Airbyte protocol as well. Starting from Jitsu v1.36, more than 80 connectors built by the Airbyte team are available in Jitsu, in both the open-source version and Jitsu.Cloud.

Jitsu Source Connectors

Configuration

For Jitsu.Cloud users, Airbyte connectors work without any additional configuration. If you're deploying Jitsu on-prem, some additional configuration is required:

  • If you're deploying Jitsu with Docker, the host's Docker socket must be exposed to the Jitsu container with -v /var/run/docker.sock:/var/run/docker.sock, since each Airbyte connector is itself a Docker image that Jitsu needs to run. Read the Docker deployment manual for more information.
  • Unfortunately, Airbyte connectors will not work on Heroku deployments: Heroku's Docker environment doesn't allow you to expose the host machine.

How it works

Let's take a look at how Airbyte connectors work inside Jitsu. As an example, we're going to pull data from Hubspot, which we use internally at Jitsu.

Step 1: search for Hubspot source

Sign up for Jitsu.Cloud or deploy open-source Jitsu. Select Sources and search for the Hubspot source.

Step 2: configure Hubspot connection

We're going to use API Key Credentials authentication. This type of authentication requires only a single developer API key, which can be easily obtained from HubSpot.

Put any past date in the Replication start field. We used 2017-01-25T00:00:00Z, which means that data starting from 25-Jan-2017 will be synchronized; anything older is skipped.

Click the Test Connection button to verify that the connection works.

Step 3: configure streams

Go to the Streams tab and select the entities you want to sync. We're going to sync only contacts and companies. A stream represents a set of objects from a source; each stream is mapped to a table in the SQL database.

Step 4: configure Postgres

Next, we need to set up a destination. We're going to use a PostgreSQL instance provided by AWS Lightsail. Don't forget to enable Public Mode for your instance! Note: the default database name for your instance is postgres.
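If you'd rather not give Jitsu the instance's master credentials, you can create a dedicated login for it first. The statements below are a minimal sketch, assuming you run them as the Lightsail master user (with permission to create roles) against the default postgres database, and that the connector tables go into the public schema. The role name jitsu and its password are hypothetical placeholders.

```sql
-- Hypothetical role name and password: replace both before running.
-- Assumes you are connected as the instance's master user.
CREATE ROLE jitsu WITH LOGIN PASSWORD 'change-me';

-- Allow the role to connect to the default database and create schemas in it.
GRANT CONNECT, CREATE ON DATABASE postgres TO jitsu;

-- Allow the role to create the connector tables in the public schema.
GRANT USAGE, CREATE ON SCHEMA public TO jitsu;
```

You would then enter jitsu and its password in the destination's credentials instead of the master user's.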

Once the destination is configured, link it to the source on the Linked Connectors & API Keys tab.

Ignore the Table Name field: it is relevant only for events coming through Push APIs (JavaScript, for example). The Hubspot connector will create its own tables.

Step 5: synchronize

By default, the data is synced on a daily basis. You can initiate the sync manually by going to Sources and clicking Sync Now.

You'll be redirected to the list of tasks immediately. Be patient: the first run can take a while, since all the data is synchronized for the first time. Subsequent runs will be faster.

Once the task is finished, you'll see a message in the logs, and the task status will change from RUNNING to SUCCESS.

Step 6: see the data

Connect to the PostgreSQL database with your favourite tool (we're fans of DataGrip!) and run:

SELECT tablename FROM pg_catalog.pg_tables
WHERE schemaname != 'pg_catalog' AND schemaname != 'information_schema';

This query returns a list of all non-system tables. Since we synchronized only two streams, only two tables will appear: airbyte_source_hubspot_companies and airbyte_source_hubspot_contacts. Let's see what's inside with SELECT * FROM airbyte_source_hubspot_companies.
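A few follow-up queries can help confirm what the sync produced. This is a sketch that assumes the tables landed in the database you're connected to and keep the airbyte_source_hubspot_ prefix shown above. Note that n_live_tup in pg_stat_user_tables is a statistics-based estimate, so use SELECT count(*) on a table when you need an exact number.

```sql
-- Tables created by the Hubspot connector (one per synced stream)
SELECT schemaname, tablename
FROM pg_catalog.pg_tables
WHERE tablename LIKE 'airbyte_source_hubspot_%'
ORDER BY tablename;

-- Columns created for the companies stream, in table order
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'airbyte_source_hubspot_companies'
ORDER BY ordinal_position;

-- Approximate row count per table (statistics-based estimate, not exact)
SELECT relname AS table_name, n_live_tup AS approx_rows
FROM pg_stat_user_tables
WHERE relname LIKE 'airbyte_source_hubspot_%'
ORDER BY relname;
```

Because these queries only read the system catalogs and statistics views, they run even before the first sync completes; they simply return no rows until the tables exist.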

About Jitsu

Jitsu is an open-source data integration platform offering features like pulling data from APIs, streaming event-based data to DBs, multiplexing and many more.
© Jitsu Labs, Inc

2261 Market Street #4109
San Francisco, CA 94114