
Google BigQuery

The Google BigQuery sink connector is an open-source connector imported from Confluent (originally developed by WePay) that enables you to export data from Apache Kafka® topics to Google BigQuery tables.

Overview

The BigQuery sink connector allows you to:

  • Stream data from Kafka topics to BigQuery tables

  • Automatically create tables based on topic data

  • Configure data delivery semantics (at-least-once or exactly-once)

  • Perform schema evolution when topic schemas change

Prerequisites

Before using the BigQuery sink connector, ensure you have:

  1. A Google Cloud Platform (GCP) account

  2. A BigQuery project with appropriate permissions

  3. Service account credentials with access to BigQuery

  4. Kafka topics with data to be exported
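The service account credentials from the prerequisites are supplied to the connector through the keyfile property (and, in version 1.3 and later, keySource). A minimal sketch, where the file path and the inline JSON are placeholders for your own credentials:

    # Option 1: point the connector at the downloaded credentials file (keySource defaults to FILE)
    keySource=FILE
    keyfile=/path/to/service-account.json

    # Option 2 (version 1.3 and later): pass the credentials JSON itself as a string
    # keySource=JSON
    # keyfile={"type": "service_account", "project_id": "...", ...}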

Configuration

Basic Configuration

Here's a basic configuration for the BigQuery sink connector:

    name = kcbq-connect1
    connector.class = com.wepay.kafka.connect.bigquery.BigQuerySinkConnector
    tasks.max = 1
    topics = quickstart
    sanitizeTopics = true
    autoCreateTables = true
    allowNewBigQueryFields = true
    allowBigQueryRequiredFieldRelaxation = true
    schemaRetriever = com.wepay.kafka.connect.bigquery.retrieve.IdentitySchemaRetriever
    project = lenses-123
    defaultDataset = ConfluentDataSet
    keyfile = <path to json file>
    transforms = RegexTransformation
    transforms.RegexTransformation.type = org.apache.kafka.connect.transforms.RegexRouter
    transforms.RegexTransformation.regex = (kcbq_)(.*)
    transforms.RegexTransformation.replacement = $2

Features of Google BigQuery Sink Connector

  • Multiple tasks support: Configure the tasks.max parameter to run multiple tasks in parallel and scale throughput (see the sketch after this list)

  • InsertAll API features: Supports insert operations with built-in duplicate detection capabilities

  • Real-time streaming: Records are inserted one at a time and available immediately for querying

  • Multi-topic support: Can stream from multiple topics to corresponding BigQuery tables

  • Parallel processing: Uses an internal thread pool (default: 10 threads, configurable) for scalable record streaming
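As a rough sketch of the parallelism settings mentioned above (the values are illustrative placeholders, not recommendations):

    # Standard Kafka Connect setting: number of tasks the connector may run in parallel
    tasks.max=3
    # Connector setting: size of the BigQuery write thread pool (default 10)
    threadPoolSize=20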

Important Configuration Properties

Each property below is listed with its type, default value, and importance.

topics (list, default: -, importance: high)
    A list of Kafka topics to read from.

project (string, default: -, importance: high)
    The BigQuery project to write to.

defaultDataset (string, default: -, importance: high)
    The default dataset to be used. Replaced the datasets parameter of older versions of this connector.

autoCreateTables (boolean, default: false, importance: high)
    Create BigQuery tables if they don't already exist. This property should only be enabled for Schema Registry-based inputs: Avro, Protobuf, or JSON Schema (JSON_SR). Table creation is not supported for JSON input.

gcsBucketName (string, default: "", importance: high)
    The name of the bucket where Google Cloud Storage (GCS) blobs are located. These blobs are used to batch-load to BigQuery. This is applicable only if enableBatchLoad is configured.

queueSize (long, default: -1, importance: high)
    The maximum size (or -1 for no maximum size) of the worker queue for BigQuery write requests before all topics are paused. This is a soft limit; the size of the queue can go over this before topics are paused. All topics resume once a flush is triggered or the size of the queue drops under half of the maximum size.

bigQueryMessageTimePartitioning (boolean, default: false, importance: high)
    Whether or not to use the message time when inserting records. Default uses the connector processing time.

bigQueryPartitionDecorator (boolean, default: true, importance: high)
    Whether or not to append the partition decorator to the BigQuery table name when inserting records. Setting this to true appends the partition decorator to the table name (e.g. table$yyyyMMdd depending on the configuration). Setting this to false bypasses the logic to append the partition decorator and uses the raw table name for inserts.

keySource (string, default: FILE, importance: medium)
    Determines whether the keyfile configuration is the path to the credentials JSON file or the JSON itself. Available values are FILE and JSON. This property is available in BigQuery sink connector version 1.3 (and later).

keyfile (string, default: null, importance: medium)
    Keyfile can be either a string representation of the Google credentials file or the path to the Google credentials file itself. The string representation of the Google credentials file is supported in BigQuery sink connector version 1.3 (and later).

bigQueryRetry (int, default: 0, importance: medium)
    The number of retry attempts made for a BigQuery request that fails with a backend error or a quota exceeded error.

bigQueryRetryWait (long, default: 1000, importance: medium)
    The minimum amount of time, in milliseconds, to wait between retry attempts for a BigQuery backend or quota exceeded error.

sanitizeTopics (boolean, default: false, importance: medium)
    Designates whether to automatically sanitize topic names before using them as table names. If not enabled, topic names are used as table names.

schemaRetriever (class, default: com.wepay.kafka.connect.bigquery.retrieve.IdentitySchemaRetriever, importance: medium)
    A class that can be used for automatically creating tables and/or updating schemas. Note that in version 2.0.0, the SchemaRetriever API changed to retrieve the schema from each SinkRecord, which will help support multiple schemas per topic.

threadPoolSize (int, default: 10, importance: medium)
    The size of the BigQuery write thread pool. This establishes the maximum number of concurrent writes to BigQuery.

autoCreateBucket (boolean, default: true, importance: medium)
    Whether to automatically create the given bucket if it does not exist.

allowNewBigQueryFields (boolean, default: false, importance: medium)
    If true, new fields can be added to BigQuery tables during subsequent schema updates.

allowBigQueryRequiredFieldRelaxation (boolean, default: false, importance: medium)
    If true, fields in the BigQuery schema can be changed from REQUIRED to NULLABLE. Note that allowNewBigQueryFields and allowBigQueryRequiredFieldRelaxation replaced the autoUpdateSchemas parameter of older versions of this connector.

allowSchemaUnionization (boolean, default: false, importance: medium)
    If true, the existing table schema (if one is present) will be unionized with new record schemas during schema updates. If false, the record of the last schema in a batch will be used for any necessary table creation and schema update attempts. Note that setting allowSchemaUnionization to false and allowNewBigQueryFields and allowBigQueryRequiredFieldRelaxation to true is equivalent to setting autoUpdateSchemas to true in older versions.

auto.register.schemas (boolean, default: true, importance: medium)
    Specifies if the Serializer should attempt to register the schema with Schema Registry.

use.latest.version (boolean, default: true, importance: medium)
    Only applies when auto.register.schemas is set to false. If use.latest.version is set to true, Schema Registry uses the latest version of the schema in the subject for serialization.

timestampPartitionFieldName (string, default: null, importance: low)
    The name of the field in the value that contains the timestamp to partition by in BigQuery, which enables timestamp partitioning for each table. Leave blank to enable ingestion-time partitioning for each table.

clusteringPartitionFieldNames (list, default: null, importance: low)
    Comma-separated list of fields where data is clustered in BigQuery.

timePartitioningType (string, default: DAY, importance: low)
    The time partitioning type to use when creating tables. Existing tables will not be altered to use this partitioning type.

allBQFieldsNullable (boolean, default: false, importance: low)
    If true, no fields in any produced BigQuery schema are REQUIRED. All non-nullable Avro fields are translated as NULLABLE (or REPEATED, if arrays).

avroDataCacheSize (int, default: 100, importance: low)
    The size of the cache to use when converting schemas from Avro to Kafka Connect.

batchLoadIntervalSec (int, default: 120, importance: low)
    The interval, in seconds, in which to attempt to run GCS to BigQuery load jobs. Only relevant if enableBatchLoad is configured.

convertDoubleSpecialValues (boolean, default: false, importance: low)
    Designates whether +Infinity is converted to Double.MAX_VALUE and whether -Infinity and NaN are converted to Double.MIN_VALUE to ensure successful delivery to BigQuery.

enableBatchLoad (list, default: "", importance: low)
    Beta feature, use with caution. The sublist of topics to be batch loaded through GCS.

includeKafkaData (boolean, default: false, importance: low)
    Whether to include an extra block containing the Kafka source topic, offset, and partition information in the resulting BigQuery rows.

upsertEnabled (boolean, default: false, importance: low)
    Enable upsert functionality on the connector through the use of record keys, intermediate tables, and periodic merge flushes. Row-matching will be performed based on the contents of record keys. This feature won't work with SMTs that change the name of the topic and doesn't support JSON input.

deleteEnabled (boolean, default: false, importance: low)
    Enable delete functionality on the connector through the use of record keys, intermediate tables, and periodic merge flushes. A delete will be performed when a record with a null value (that is, a tombstone record) is read. This feature won't work with SMTs that change the name of the topic and doesn't support JSON input.

intermediateTableSuffix (string, default: "tmp", importance: low)
    A suffix that will be appended to the names of destination tables to create the names for the corresponding intermediate tables. Multiple intermediate tables may be created for a single destination table.

mergeIntervalMs (long, default: 60000, importance: low)
    How often (in milliseconds) to perform a merge flush, if upsert/delete is enabled. Can be set to -1 to disable periodic flushing.

mergeRecordsThreshold (long, default: -1, importance: low)
    How many records to write to an intermediate table before performing a merge flush, if upsert/delete is enabled. Can be set to -1 to disable record count-based flushing.

kafkaDataFieldName (string, default: null, importance: low)
    The Kafka data field name. The default value is null, which means the Kafka data field will not be included.

kafkaKeyFieldName (string, default: null, importance: low)
    The Kafka key field name. The default value is null, which means the Kafka key field will not be included.

topic2TableMap (string, default: "", importance: low)
    Map of topics to tables (optional). Format: comma-separated tuples, e.g. <topic-1>:<table-1>,<topic-2>:<table-2>,... Note that the topic name should not be modified using a regex SMT while using this option. Also note that SANITIZE_TOPICS_CONFIG would be ignored if this config is set.

csfle.enabled (boolean, default: false, importance: low)
    CSFLE is enabled for the connector if set to true.

Data Mapping

Data Type Conversions

The connector maps Kafka Connect schema types to BigQuery data types as follows:

    BigQuery Data Type    Connector Mapping
    STRING                String
    INTEGER               INT8
    INTEGER               INT16
    INTEGER               INT32
    INTEGER               INT64
    FLOAT                 FLOAT32
    FLOAT                 FLOAT64
    BOOLEAN               Boolean
    BYTES                 Bytes
    TIMESTAMP             Logical TIMESTAMP
    TIME                  Logical TIME
    DATE                  Logical DATE
    FLOAT                 Logical Decimal
    DATE                  Debezium Date
    TIME                  Debezium MicroTime
    TIME                  Debezium Time
    TIMESTAMP             Debezium MicroTimestamp
    TIMESTAMP             Debezium TIMESTAMP
    TIMESTAMP             Debezium ZonedTimestamp

Schema Evolution

When schema evolution is enabled (using allowNewBigQueryFields, allowBigQueryRequiredFieldRelaxation, and allowSchemaUnionization), the connector can handle schema changes:

  • New fields added to the Kafka topic can be added to the BigQuery table

  • Field constraints can be relaxed from REQUIRED to NULLABLE

  • Schemas can be unionized when records in the same batch have different schemas
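A minimal sketch of the corresponding settings; whether allowSchemaUnionization should also be set to true depends on whether records in the same batch can carry different schemas:

    # Add new fields from the topic schema to the BigQuery table
    allowNewBigQueryFields=true
    # Allow REQUIRED columns to be relaxed to NULLABLE
    allowBigQueryRequiredFieldRelaxation=true
    # Optionally unionize the existing table schema with new record schemas
    allowSchemaUnionization=false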

Usage Examples

Basic Example

    name=bigquery-sink
    connector.class=com.wepay.kafka.connect.bigquery.BigQuerySinkConnector
    tasks.max=1
    topics=orders,customers
    project=my-gcp-project
    defaultDataset=kafka_data
    keyfile=/path/to/keyfile.json
    autoCreateTables=true

Example with Batch Loading

    name=bigquery-sink
    connector.class=com.wepay.kafka.connect.bigquery.BigQuerySinkConnector
    tasks.max=1
    topics=orders,customers
    project=my-gcp-project
    defaultDataset=kafka_data
    keyfile=/path/to/keyfile.json
    enableBatchLoad=orders,customers
    gcsBucketName=my-gcs-bucket
    autoCreateBucket=true

Example with Upsert Functionality

    name=bigquery-sink
    connector.class=com.wepay.kafka.connect.bigquery.BigQuerySinkConnector
    tasks.max=1
    topics=orders,customers
    project=my-gcp-project
    defaultDataset=kafka_data
    keyfile=/path/to/keyfile.json
    upsertEnabled=true
    mergeIntervalMs=30000
    mergeRecordsThreshold=1000

Troubleshooting

Common Issues

  1. Authentication errors: Ensure your service account key file is correct and has appropriate permissions.

  2. Schema compatibility issues: When schema updates are enabled, existing data might not be compatible with new schemas.

  3. Quota limitations: BigQuery has quotas for API requests; consider adjusting threadPoolSize and queueSize (see the sketch after this list).

  4. Table creation failures: Ensure autoCreateTables is only used with Schema Registry-based inputs (Avro, Protobuf, or JSON Schema).

  5. Performance issues: For high-volume data, consider using batch loading via GCS instead of streaming inserts.
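For the quota-related issues above, one option is to retry quota-exceeded errors and reduce write concurrency. The values below are illustrative placeholders, not tuned recommendations:

    # Retry requests that fail with a backend or quota-exceeded error
    bigQueryRetry=3
    bigQueryRetryWait=2000
    # Lower the number of concurrent writes and cap the in-flight request queue
    threadPoolSize=5
    queueSize=10000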

Logging

To enable detailed logging for troubleshooting:

    log4j.logger.com.wepay.kafka.connect.bigquery=DEBUG

Limitations

The BigQuery Sink connector has the following limitations:

  • The connector does not support schemas with recursion.

  • The connector does not support schemas having float fields with NaN or +Infinity values (see the configuration note after this list).

  • Auto schema update does not support removing columns.

  • Auto schema update does not support recursive schemas.

  • When the connector is configured with upsertEnabled or deleteEnabled, it does not support Single Message Transformations (SMTs) that modify the topic name. Additionally, the following transformations are not allowed:

    • io.debezium.transforms.ByLogicalTableRouter

    • io.debezium.transforms.outbox.EventRouter

    • org.apache.kafka.connect.transforms.RegexRouter

    • org.apache.kafka.connect.transforms.TimestampRouter

    • io.confluent.connect.transforms.MessageTimestampRouter

    • io.confluent.connect.transforms.ExtractTopic$Key
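For the NaN and +Infinity limitation above, the configuration reference documents convertDoubleSpecialValues, which rewrites these values (to Double.MAX_VALUE and Double.MIN_VALUE) so the affected rows can still be delivered; note that the original values are not preserved:

    # Rewrite +Infinity, -Infinity and NaN so rows can be delivered to BigQuery
    convertDoubleSpecialValues=true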

Upgrading to 2.x.x

The following changes aren’t backward compatible in the BigQuery connector:

  • datasets was removed and defaultDataset has been introduced. The connector now infers the dataset from the topic name if the topic is in the form <dataset>:<tableName>. If the topic name is in the form <tableName>, the connector defaults to defaultDataset.

  • topicsToTables was removed. You should use the RegexRouter SMT to route topics to tables (see the sketch after this list).

  • autoUpdateSchemas was replaced by allowNewBigQueryFields and allowBigQueryRequiredFieldRelaxation.

  • value.converter.enhanced.avro.schema.support should be set to false or removed. If this property is not removed or set to false, you may receive the following error:

    Invalid field name
    "com.examples.project-super-important.v1.MyData". Fields must
    contain only letters, numbers, and underscores, start with a letter or
    underscore, and be at most 300 characters long.
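As a sketch of replacing topicsToTables with RegexRouter, the transform below strips a hypothetical kcbq_ prefix so that, for example, topic kcbq_orders is written to table orders; the transform name, regex, and topic names are placeholders:

    transforms=RouteToTable
    transforms.RouteToTable.type=org.apache.kafka.connect.transforms.RegexRouter
    # Strip the "kcbq_" prefix from the topic name used for the destination table
    transforms.RouteToTable.regex=(kcbq_)(.*)
    transforms.RouteToTable.replacement=$2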
