Airbyte is an open-source ETL platform that moves data from services and databases into data warehouses. The company is growing fast, and so is the number of connectors it supports: at the moment, around 100 services. Each connector is a separate open-source package, similar to a Singer tap but wrapped in a Docker image. A connector contains all the logic for pulling data from a source and follows the Airbyte protocol.
All connectors are published on Docker Hub, and most of them are MIT-licensed.
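Here is a concrete view of the protocol. An Airbyte connector writes its output to stdout as JSON messages, one per line, and each message has a type field such as RECORD, STATE or LOG. The following is a minimal plain-Python sketch, not Jitsu's actual implementation, of how a consumer might pick out the RECORD messages. The function name iter_records is hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, Tuple


def iter_records(lines: Iterable[str]) -> Iterator[Tuple[str, Dict]]:
    """Yield (stream, data) pairs from a connector's stdout.

    Each non-empty line is expected to be one JSON-encoded Airbyte
    message; only messages of type RECORD carry row data.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            # Assumption of this sketch: non-JSON output is noise and is skipped.
            continue
        if isinstance(message, dict) and message.get("type") == "RECORD":
            record = message["record"]
            yield record["stream"], record["data"]
```

For example, suppose the input is a LOG line, a RECORD line for the contacts stream and a STATE line (all made-up samples). Only the record's stream name and data come out. LOG and STATE messages are left for other parts of the pipeline to handle.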
At Jitsu, we added support for Singer-based connectors a while ago, so we decided to support the Airbyte protocol as well. Starting from Jitsu v1.36, more than 80 connectors built by the Airbyte team are available in Jitsu, in both the open-source version and Jitsu.Cloud.
For Jitsu.Cloud users, Airbyte connectors work without any additional configuration. If you're deploying Jitsu on-prem, some extra configuration is required:
- If you're deploying Jitsu with Docker, the host's Docker socket must be exposed to the Jitsu container with -v /var/run/docker.sock:/var/run/docker.sock, so that Jitsu can run connector images on the host. Read the Docker deployment manual for more information.
- Unfortunately, Airbyte connectors will not work on Heroku deployments: Heroku's Docker environment doesn't allow you to expose the host machine's Docker socket.
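Connectors ship as Docker images, so the Jitsu container needs access to the host's Docker daemon. You can confirm that the mount took effect by checking, from inside the container, that /var/run/docker.sock exists and is a Unix socket. The following is a minimal Python sketch of that check. It is not part of Jitsu, and the helper name docker_socket_available is hypothetical.

```python
import os
import stat

DOCKER_SOCKET = "/var/run/docker.sock"


def docker_socket_available(path: str = DOCKER_SOCKET) -> bool:
    """Return True if `path` exists and is a Unix domain socket."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing path or no permission to stat it.
        return False
    return stat.S_ISSOCK(mode)
```

A False result inside the container usually means the -v flag was not passed.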
How it works
Let's take a look at how Airbyte connectors work inside Jitsu. As an example, we're going to pull data from HubSpot, which we use internally at Jitsu.
Step 1: search for HubSpot source
Step 2: configure HubSpot connection
We're going to use API Key Credentials authentication. This type of authentication requires a single developer key, which can be easily obtained from HubSpot.
Put any past date into the Replication start field. We used 2017-01-25T00:00:00Z, which means that only data from 25-Jan-2017 onwards will be synchronized.
Click the Test Connection button to verify that the connection works.
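Behind the form, an Airbyte connector receives its settings as a JSON config. The field names below (start_date, credentials, credentials_title, api_key) are an assumption based on the HubSpot connector's spec and may differ between connector versions. The hypothetical parse_start_date helper checks that the start date has the expected format and lies in the past.

```python
from datetime import datetime, timezone

# Hypothetical config shape: field names are assumptions, check the
# connector's spec for the version you run.
hubspot_config = {
    "start_date": "2017-01-25T00:00:00Z",
    "credentials": {
        "credentials_title": "API Key Credentials",
        "api_key": "<your-hubspot-api-key>",  # placeholder
    },
}


def parse_start_date(value: str) -> datetime:
    """Parse a replication start date such as '2017-01-25T00:00:00Z'.

    Raises ValueError if the format is wrong or the date is in the future.
    """
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    if parsed > datetime.now(timezone.utc):
        raise ValueError(f"start date {value!r} is in the future")
    return parsed
```

Records older than the parsed date are the ones the connector skips.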
Step 3: configure streams
Go to the Streams tab and select the entities you want to sync. We're going to sync only contacts and companies. A stream represents a set of objects from a source, and each stream is mapped to a table in the SQL database.
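In Airbyte terms, selecting streams produces a configured catalog: a list of the chosen streams, each with a sync mode and a destination sync mode. The following is a minimal sketch of such a catalog for our two streams. The empty json_schema and the full_refresh/append modes are simplifying assumptions. The table_name helper is hypothetical, and its naming pattern is only inferred from the two tables that appear in Step 6.

```python
from typing import Dict, List


def configured_catalog(stream_names: List[str]) -> Dict:
    """Build a minimal configured catalog that selects only `stream_names`."""
    return {
        "streams": [
            {
                "stream": {
                    "name": name,
                    # A real catalog carries the schema returned by the connector's
                    # discover step; an empty schema keeps this sketch short.
                    "json_schema": {},
                    "supported_sync_modes": ["full_refresh"],
                },
                "sync_mode": "full_refresh",
                "destination_sync_mode": "append",
            }
            for name in stream_names
        ]
    }


def table_name(stream: str, prefix: str = "airbyte_source_hubspot") -> str:
    """Hypothetical naming rule, inferred from the tables seen in Step 6."""
    return f"{prefix}_{stream}"
```

Calling configured_catalog(["contacts", "companies"]) and mapping each stream through table_name gives airbyte_source_hubspot_contacts and airbyte_source_hubspot_companies.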
Step 4: configure Postgres
Next, we need to set up a destination. We're going to use a PostgreSQL instance provided by AWS Lightsail.
Don't forget to enable Public Mode for your instance!
Note: default database name for your instance is
Once the destination is configured, link it to the source on the Linked Connectors & API Keys tab.
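The same connection details can also be used from a script or SQL client later. One way to pass them is a libpq-style connection URI. The following is a hedged sketch with a hypothetical postgres_uri helper. User, password, host and database are placeholders you take from your Lightsail instance, and sslmode=require is an assumption you may need to adjust.

```python
from urllib.parse import quote


def postgres_uri(
    user: str,
    password: str,
    host: str,
    dbname: str,
    port: int = 5432,
    sslmode: str = "require",
) -> str:
    """Build a libpq connection URI, percent-encoding the credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(dbname, safe='')}?sslmode={sslmode}"
    )
```

Percent-encoding matters because characters like @ or / in a password would otherwise break the URI.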
Step 5: synchronize
By default, the data is synced on a daily basis. You can initiate the sync manually by going to Sources and clicking Sync Now.
You'll be redirected to the task list immediately. Be patient: the first run can take a while, since all the data is being synchronized for the first time. Subsequent runs will be faster.
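A likely reason later runs are faster is incremental sync. Connectors that support it emit STATE messages holding a cursor, typically a last-modified timestamp, and the next run only fetches records newer than that cursor. Whether a given HubSpot stream syncs incrementally depends on the connector, so treat the following only as an illustration of the idea. It uses a hypothetical incremental_batch helper and an assumed updatedAt cursor field on in-memory records.

```python
from typing import Dict, List, Optional, Tuple


def incremental_batch(
    records: List[Dict], cursor_field: str, state: Optional[str]
) -> Tuple[List[Dict], Optional[str]]:
    """Return the records newer than `state` and the updated cursor value.

    Cursor values are ISO-8601 UTC strings in one fixed format, so plain
    string comparison orders them chronologically.
    """
    fresh = [r for r in records if state is None or r[cursor_field] > state]
    new_state = max((r[cursor_field] for r in fresh), default=state)
    return fresh, new_state
```

On the first run the state is None, so every record is returned. On later runs only records changed since the saved cursor come back, which is why they finish sooner.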
Once the task is finished, you'll see a message in the logs, and the task status will change to SUCCESS.
Step 6: see the data
Connect to the PostgreSQL database with your favourite tool (we're fans of DataGrip!):
SELECT tablename FROM pg_catalog.pg_tables WHERE schemaname != 'pg_catalog' AND schemaname != 'information_schema';
This query returns a list of all tables. Since we synchronized only two streams, only two tables will be returned: airbyte_source_hubspot_companies and airbyte_source_hubspot_contacts. Let's see what's inside the companies table:
select * from airbyte_source_hubspot_companies