Increase BigQuery Talend timeout within components

In this article I will show you how to change a critical setting in Talend: the timeout. We will see how to change it for the BigQuery components, since it is not exposed in the Studio interface.

I will also explain how these components work, which is important to understand before changing anything.

The BigQuery components

There are basically three actions you can perform with these components:

  • tBigQueryInput: read data from BigQuery
  • tBigQuerySQLRow: execute a query, and fetch the result if needed
  • tBigQueryOutput: write data to BigQuery; this component doesn’t really exist on its own, it is a combination of:
    1. tBigQueryOutputBulk: write a bulk file
    2. tBigQueryBulkExec: upload the file to Cloud Storage and write data to BigQuery

This is very important to understand, because we will not change all of the components, only one!

[Figure: BigQuery components]

Closer look into tBigQueryOutput

First, let’s take a look at our component:

[Figure: tBigQueryOutput component]

Here is what we have:

  • The local path to write the file
  • The authentication part
  • The targeted table information
  • Google Cloud Storage information, namely:
    • Authentication
    • Target file

But the component does not work in this order. It actually does the following:

  1. Authenticate to Google Cloud Platform
  2. Write the file
  3. Authenticate to Google Cloud Storage
  4. Upload the file to GCS
  5. Load the table with the bulk file
  6. Wait for the table load to end: this is the timeout we want to control
  7. Close the connection

To load a table, BigQuery needs an input file whose data replaces or is appended to the table’s data. The component therefore waits for GCP to acknowledge that the load has finished before it ends.

This timeout is set to 30 seconds, and you can’t change it without changing the component itself!

If you’re working with tables of 5M+ rows or 5 GB+, you will hit this limit pretty fast!
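To make the effect of a total timeout concrete, here is a minimal simulation in plain Java. It is not the Google client's implementation: the class `PollingWaitSketch`, its `waitFor` method, the fixed polling delay and the `TimeoutException` are all assumptions chosen for illustration, and the durations are scaled down from seconds to milliseconds.

```java
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Simplified stand-in for a "wait for the load job" call; NOT the Google client code. */
final class PollingWaitSketch {

    /**
     * Polls isDone until it returns true or totalTimeout has elapsed.
     * Returns the number of polls that were needed.
     */
    static int waitFor(BooleanSupplier isDone, Duration pollDelay, Duration totalTimeout)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + totalTimeout.toNanos();
        int polls = 0;
        while (true) {
            polls++;
            if (isDone.getAsBoolean()) {
                return polls;
            }
            // Give up if the next poll would happen after the deadline.
            if (System.nanoTime() + pollDelay.toNanos() > deadline) {
                throw new TimeoutException("job still running after " + totalTimeout);
            }
            Thread.sleep(pollDelay.toMillis());
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical load job that needs 200 ms to finish.
        long finishAt = System.nanoTime() + Duration.ofMillis(200).toNanos();
        BooleanSupplier loadDone = () -> System.nanoTime() >= finishAt;

        try {
            waitFor(loadDone, Duration.ofMillis(10), Duration.ofMillis(50));
        } catch (TimeoutException e) {
            System.out.println("Short timeout: " + e.getMessage());
        }
        int polls = waitFor(loadDone, Duration.ofMillis(10), Duration.ofMillis(500));
        System.out.println("Longer timeout: done after " + polls + " polls");
    }
}
```

The job itself is identical in both calls; only the time the caller is willing to wait changes, which is exactly the situation with a load that needs more than 30 seconds.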

Changing the timeout

As seen before, we only need to change the component containing the connection to BigQuery, which is tBigQueryBulkExec. To do so, we need to change the template that Talend uses to generate your job’s code!

The file you need to change is located in your Talend folder, at this location: "C:\TOS_DI-20210915_1333-V8.0.1M12\plugins\org.talend.designer.components.localprovider_8.0.1.M12\components\tBigQueryBulkExec\tBigQueryBulkExec_begin.javajet". The version-specific parts of the folder names (here "TOS_DI-20210915_1333-V8.0.1M12" and "8.0.1.M12") may vary with your installation.
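If you are not sure where the file lives in your installation, a small search can help. The sketch below walks a folder and lists every file named tBigQueryBulkExec_begin.javajet; the class name `FindJavajet` and its method are made up for this example, and the Studio folder is whatever you pass as the first argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: looks for the template below a Studio folder. */
final class FindJavajet {

    static final String TEMPLATE_NAME = "tBigQueryBulkExec_begin.javajet";

    /** Returns every regular file named TEMPLATE_NAME found below studioRoot. */
    static List<Path> findTemplates(Path studioRoot) throws IOException {
        try (Stream<Path> paths = Files.walk(studioRoot)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().equals(TEMPLATE_NAME))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass your own Studio folder as the first argument (defaults to the current folder).
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        findTemplates(root).forEach(System.out::println);
    }
}
```

If several paths are printed, you probably have more than one version of the components plugin installed, and each copy would need the same change.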

The first thing to do is to back up this file, in case something doesn’t work well with your version of the Studio!
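Copying the file by hand is perfectly fine. If you prefer to script it, here is a minimal sketch; the class `BackupJavajet`, the ".bak" suffix and the idea of keeping the copy in a separate folder are my own choices, not something Talend requires.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: keeps a copy of the template before it is edited. */
final class BackupJavajet {

    /**
     * Copies the template into backupDir as "<name>.bak" and returns the copy.
     * Fails with FileAlreadyExistsException rather than overwriting an older backup.
     */
    static Path backup(Path template, Path backupDir) throws IOException {
        Files.createDirectories(backupDir);
        Path copy = backupDir.resolve(template.getFileName() + ".bak");
        return Files.copy(template, copy);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to tBigQueryBulkExec_begin.javajet, args[1]: backup folder of your choice
        System.out.println("Backup written to " + backup(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```

Refusing to overwrite is deliberate: the first backup is the only guaranteed copy of the unmodified file.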

Then, find the following line:

job_<%=cid%> = job_<%=cid%>.waitFor(com.google.cloud.RetryOption.initialRetryDelay(org.threeten.bp.Duration.ofSeconds(1)), com.google.cloud.RetryOption.totalTimeout(org.threeten.bp.Duration.ofSeconds(30)));

and replace it with:

job_<%=cid%> = job_<%=cid%>.waitFor(com.google.cloud.RetryOption.initialRetryDelay(org.threeten.bp.Duration.ofSeconds(1)), com.google.cloud.RetryOption.totalTimeout(org.threeten.bp.Duration.ofSeconds(300)));

300 is the value in seconds of the desired timeout.
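The same edit can be scripted, which is handy when several Studios need it. The following is a sketch only: `PatchTimeout` and its methods are hypothetical names, and the regular expression assumes the call is written exactly as in the line above. When no such call is found the sketch stops with an error instead of guessing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: rewrites the totalTimeout value inside the template text. */
final class PatchTimeout {

    // Matches totalTimeout(org.threeten.bp.Duration.ofSeconds(<digits>)) as written in the template.
    private static final Pattern TOTAL_TIMEOUT = Pattern.compile(
            "(?<head>totalTimeout\\(org\\.threeten\\.bp\\.Duration\\.ofSeconds\\()\\d+(?<tail>\\)\\))");

    /** Returns the template text with every totalTimeout value replaced by the given seconds. */
    static String withTimeout(String templateText, int seconds) {
        Matcher m = TOTAL_TIMEOUT.matcher(templateText);
        if (!m.find()) {
            throw new IllegalArgumentException("no totalTimeout(...) call found, edit the file by hand");
        }
        return m.replaceAll("${head}" + seconds + "${tail}");
    }

    /** Applies withTimeout to a file in place. Back the file up first. */
    static void patchFile(Path file, int seconds) throws IOException {
        // ISO-8859-1 maps each byte to one char, so untouched parts of the file round-trip unchanged.
        String content = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
        Files.write(file, withTimeout(content, seconds).getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the .javajet file, args[1]: new timeout in seconds
        patchFile(Paths.get(args[0]), Integer.parseInt(args[1]));
    }
}
```

Only the number inside ofSeconds(...) is touched; the JET placeholders such as <%=cid%> and the initialRetryDelay part stay as they are.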

To find the line more easily, search for the string "RetryOption", which is basically unique in the component. If the line you find has no "totalTimeout" part, you can simply replace it with the new line.

For this change to take effect, you need to restart Talend. All your co-workers need to update their component too, so that nobody builds a job with a different timeout.
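To check that everyone ends up with the same value, each person can read the timeout back from their own copy of the file. This is a minimal sketch under the same assumption as before, namely that the call is written as totalTimeout(org.threeten.bp.Duration.ofSeconds(N)); the class `ReadTimeout` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: reports the totalTimeout value found in the template text. */
final class ReadTimeout {

    private static final Pattern TOTAL_TIMEOUT = Pattern.compile(
            "totalTimeout\\(org\\.threeten\\.bp\\.Duration\\.ofSeconds\\((\\d+)\\)\\)");

    /** Returns the first totalTimeout value in seconds, or empty if the call is absent. */
    static OptionalInt timeoutSeconds(String templateText) {
        Matcher m = TOTAL_TIMEOUT.matcher(templateText);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to tBigQueryBulkExec_begin.javajet in the Studio being checked
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.ISO_8859_1);
        OptionalInt seconds = timeoutSeconds(text);
        System.out.println(seconds.isPresent()
                ? "totalTimeout is " + seconds.getAsInt() + " seconds"
                : "no totalTimeout found in this file");
    }
}
```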

[Figure: BigQuery javajet]

Conclusion

You have successfully modified your BigQuery components to allow more time for loading the file from Google Cloud Storage into Google BigQuery.

You can set the parameter to an even larger value if needed, but keep in mind that loading a 10 GB table (the file size is 40 GB) takes between 90 and 120 seconds, so you probably won’t need to increase it further!

Since you changed a ".javajet" file, you’ll need to replicate the change in every Studio you use!