Add support for Dataproc 2.2
jphalip committed Oct 2, 2024
1 parent 0fea263 commit 3236f3b
Showing 10 changed files with 147 additions and 40 deletions.
1 change: 1 addition & 0 deletions CHANGES.md
@@ -2,6 +2,7 @@

## 2.1.0 - 2024-07-??

* Added support for Dataproc 2.2.
* Added support for customer-managed encryption key (CMEK).
* Added support for Pig and HCatalog.
* Added support for Hive 1.x.x and Hive 2.x.x.
18 changes: 9 additions & 9 deletions README-template.md
@@ -11,7 +11,7 @@ See the details in [CHANGES.md](CHANGES.md).

## Version support

This connector supports [Dataproc](https://cloud.google.com/dataproc) 2.0 and 2.1.
This connector supports [Dataproc](https://cloud.google.com/dataproc) 2.0, 2.1, and 2.2.

For Hadoop clusters other than Dataproc, the connector has been tested with the following
software versions:
@@ -871,22 +871,22 @@ You must use Java version 8, as it's the version that Hive itself uses. Make sur

* To run the integration tests:
```sh
./mvnw verify -Pdataproc21,integration
./mvnw verify -Pdataproc22,integration
```

* To run a single integration test class:
```sh
./mvnw verify -Pdataproc21,integration -Dit.test="BigLakeIntegrationTests"
./mvnw verify -Pdataproc22,integration -Dit.test="BigLakeIntegrationTests"
```

* To run a specific integration test method:
```sh
./mvnw verify -Pdataproc21,integration -Dit.test="BigLakeIntegrationTests#testReadBigLakeTable"
./mvnw verify -Pdataproc22,integration -Dit.test="BigLakeIntegrationTests#testReadBigLakeTable"
```

* To debug the tests, add the `-Dmaven.failsafe.debug` property:
```sh
./mvnw verify -Pdataproc21,integration -Dmaven.failsafe.debug
./mvnw verify -Pdataproc22,integration -Dmaven.failsafe.debug
```
... then run a remote debugger in IntelliJ at port `5005`. Read more about debugging with FailSafe
here: https://maven.apache.org/surefire/maven-failsafe-plugin/examples/debugging.html
@@ -906,10 +906,10 @@ The following environment variables must be set and **exported** first.
To run the acceptance tests:

```sh
./mvnw verify -Pdataproc21,acceptance
./mvnw verify -Pdataproc22,acceptance
```

If you want to avoid rebuilding the `shaded-deps-dataproc21` and
If you want to avoid rebuilding the `shaded-deps-dataproc22` and
`shaded-acceptance-tests-dependencies` modules if they have no changes, you can break it down into
the following steps:

@@ -918,13 +918,13 @@ the following steps:
./mvnw install:install-file -Dpackaging=pom -Dfile=hive-bigquery-parent/pom.xml -DpomFile=hive-bigquery-parent/pom.xml
# Build and install the module JARs to the Maven local repo
./mvnw clean install -pl shaded-deps-dataproc21,shaded-acceptance-tests-dependencies -Pdataproc21 -DskipTests
./mvnw clean install -pl shaded-deps-dataproc22,shaded-acceptance-tests-dependencies -Pdataproc22 -DskipTests
```

At that point you can just run the tests without rebuilding the modules:

```sh
./mvnw clean verify -pl hive-bigquery-connector-common,hive-3-bigquery-connector -Pdataproc21,acceptance
./mvnw clean verify -pl hive-bigquery-connector-common,hive-3-bigquery-connector -Pdataproc22,acceptance
```

##### Running the tests for different Hadoop versions
18 changes: 9 additions & 9 deletions README.md
@@ -11,7 +11,7 @@ See the details in [CHANGES.md](CHANGES.md).

## Version support

This connector supports [Dataproc](https://cloud.google.com/dataproc) 2.0 and 2.1.
This connector supports [Dataproc](https://cloud.google.com/dataproc) 2.0, 2.1, and 2.2.

For Hadoop clusters other than Dataproc, the connector has been tested with the following
software versions:
@@ -871,22 +871,22 @@ You must use Java version 8, as it's the version that Hive itself uses. Make sur

* To run the integration tests:
```sh
./mvnw verify -Pdataproc21,integration
./mvnw verify -Pdataproc22,integration
```

* To run a single integration test class:
```sh
./mvnw verify -Pdataproc21,integration -Dit.test="BigLakeIntegrationTests"
./mvnw verify -Pdataproc22,integration -Dit.test="BigLakeIntegrationTests"
```

* To run a specific integration test method:
```sh
./mvnw verify -Pdataproc21,integration -Dit.test="BigLakeIntegrationTests#testReadBigLakeTable"
./mvnw verify -Pdataproc22,integration -Dit.test="BigLakeIntegrationTests#testReadBigLakeTable"
```

* To debug the tests, add the `-Dmaven.failsafe.debug` property:
```sh
./mvnw verify -Pdataproc21,integration -Dmaven.failsafe.debug
./mvnw verify -Pdataproc22,integration -Dmaven.failsafe.debug
```
... then run a remote debugger in IntelliJ at port `5005`. Read more about debugging with FailSafe
here: https://maven.apache.org/surefire/maven-failsafe-plugin/examples/debugging.html
@@ -906,10 +906,10 @@ The following environment variables must be set and **exported** first.
To run the acceptance tests:

```sh
./mvnw verify -Pdataproc21,acceptance
./mvnw verify -Pdataproc22,acceptance
```

If you want to avoid rebuilding the `shaded-deps-dataproc21` and
If you want to avoid rebuilding the `shaded-deps-dataproc22` and
`shaded-acceptance-tests-dependencies` modules if they have no changes, you can break it down into
the following steps:

@@ -918,13 +918,13 @@ the following steps:
./mvnw install:install-file -Dpackaging=pom -Dfile=hive-bigquery-parent/pom.xml -DpomFile=hive-bigquery-parent/pom.xml
# Build and install the module JARs to the Maven local repo
./mvnw clean install -pl shaded-deps-dataproc21,shaded-acceptance-tests-dependencies -Pdataproc21 -DskipTests
./mvnw clean install -pl shaded-deps-dataproc22,shaded-acceptance-tests-dependencies -Pdataproc22 -DskipTests
```

At that point you can just run the tests without rebuilding the modules:

```sh
./mvnw clean verify -pl hive-bigquery-connector-common,hive-3-bigquery-connector -Pdataproc21,acceptance
./mvnw clean verify -pl hive-bigquery-connector-common,hive-3-bigquery-connector -Pdataproc22,acceptance
```

##### Running the tests for different Hadoop versions
GCSConnectorAccessTokenProvider.java
@@ -23,16 +23,57 @@
import com.google.inject.Guice;
import com.google.inject.Injector;
import java.io.IOException;
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Parameter;
import java.time.Instant;
import org.apache.hadoop.conf.Configuration;

/**
* Simple AccessTokenProvider that delegates credentials retrieval to BigQueryCredentialsSupplier.
*/
public class GCSConnectorAccessTokenProvider implements AccessTokenProvider {

// New versions (>=3.0) of the GCS connector used in Dataproc >=2.2 have changed the
// signature of some constructors, so we use reflection to determine which constructor
// to use and remain compatible with all versions of Dataproc.
protected static boolean useNewGcsConnectorAPI;

static {
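// Check at class-load time whether AccessToken exposes a (String, Instant) constructor,
// which indicates the newer GCS connector API.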
Constructor<?>[] constructors = AccessToken.class.getConstructors();
for (Constructor<?> constructor : constructors) {
Parameter[] parameters = constructor.getParameters();
if (parameters.length == 2
&& parameters[0].getType() == String.class
&& parameters[1].getType() == Instant.class) {
useNewGcsConnectorAPI = true;
break;
}
}
}

public static AccessToken instantiateAccessToken(String stringArg, long timestamp) {
try {
if (useNewGcsConnectorAPI) {
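// Newer GCS connector (>= 3.0, Dataproc 2.2): the expiration is passed as an Instant.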
return AccessToken.class
.getConstructor(String.class, Instant.class)
.newInstance(stringArg, Instant.ofEpochMilli(timestamp));
} else {
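// Older GCS connector (Dataproc 2.0 and 2.1): the expiration is passed as epoch milliseconds.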
return AccessToken.class
.getConstructor(String.class, Long.class)
.newInstance(stringArg, timestamp);
}
} catch (NoSuchMethodException
| InvocationTargetException
| IllegalAccessException
| InstantiationException e) {
throw new RuntimeException("Error instantiating AccessToken", e);
}
}

Configuration conf;
BigQueryCredentialsSupplier credentialsSupplier;
private static final AccessToken EXPIRED_TOKEN = new AccessToken("", -1L);
private static final AccessToken EXPIRED_TOKEN = instantiateAccessToken("", -1L);
private AccessToken accessToken = EXPIRED_TOKEN;
public static final String CLOUD_PLATFORM_SCOPE =
"https://www.googleapis.com/auth/cloud-platform";
@@ -47,7 +88,8 @@ public void refresh() throws IOException {
GoogleCredentials credentials = (GoogleCredentials) credentialsSupplier.getCredentials();
com.google.auth.oauth2.AccessToken token =
credentials.createScoped(CLOUD_PLATFORM_SCOPE).refreshAccessToken();
this.accessToken = new AccessToken(token.getTokenValue(), token.getExpirationTime().getTime());
this.accessToken =
instantiateAccessToken(token.getTokenValue(), token.getExpirationTime().getTime());
}

@Override
AcceptanceTestConstants.java
@@ -15,7 +15,7 @@
*/
package com.google.cloud.hive.bigquery.connector.acceptance;

import org.apache.parquet.Strings;
import com.google.common.base.Strings;

public class AcceptanceTestConstants {

@@ -38,23 +38,5 @@ public class AcceptanceTestConstants {
public static final String CONNECTOR_JAR_DIRECTORY = "target";
public static final String CONNECTOR_JAR_PREFIX = "hive-3-bigquery-connector";
public static final String CONNECTOR_INIT_ACTION_PATH = "/acceptance/connectors.sh";

public static final String MIN_BIG_NUMERIC =
"-578960446186580977117854925043439539266.34992332820282019728792003956564819968";
public static final String MAX_BIG_NUMERIC =
"578960446186580977117854925043439539266.34992332820282019728792003956564819967";
public static final String BIGNUMERIC_TABLE_QUERY_TEMPLATE =
"create table %s.%s (\n"
+ " min bignumeric,\n"
+ " max bignumeric\n"
+ " ) \n"
+ " as \n"
+ " select \n"
+ " cast(\""
+ MIN_BIG_NUMERIC
+ "\" as bignumeric) as min,\n"
+ " cast(\""
+ MAX_BIG_NUMERIC
+ "\" as bignumeric) as max";
protected static final long ACCEPTANCE_TEST_TIMEOUT_IN_SECONDS = 600;
}
DataprocAcceptanceTestBase.java
@@ -177,6 +177,7 @@ private static Cluster createClusterSpec(
GceClusterConfig.newBuilder()
.setNetworkUri("default")
.setZoneUri(REGION + "-a")
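// Newer Dataproc images default to internal-IP-only clusters; keep external IPs
// enabled so the acceptance-test cluster can reach external services.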
.setInternalIpOnly(false)
.putMetadata("hive-bigquery-connector-url", connectorJarUri))
.setMasterConfig(
InstanceGroupConfig.newBuilder()
DataprocImage22AcceptanceTest.java
@@ -0,0 +1,38 @@
/*
* Copyright 2023 Google Inc. All Rights Reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.google.cloud.hive.bigquery.connector.acceptance;

import org.junit.AfterClass;
import org.junit.BeforeClass;

public class DataprocImage22AcceptanceTest extends DataprocAcceptanceTestBase {

private static AcceptanceTestContext context;

@BeforeClass
public static void setup() throws Exception {
context = DataprocAcceptanceTestBase.setup("2.2-debian12");
}

public DataprocImage22AcceptanceTest() {
super(context);
}

@AfterClass
public static void tearDown() throws Exception {
DataprocAcceptanceTestBase.tearDown(context);
}
}
7 changes: 7 additions & 0 deletions pom.xml
@@ -139,6 +139,13 @@
<module>hive-3-bigquery-connector</module>
</modules>
</profile>
<profile>
<id>dataproc22</id>
<modules>
<module>shaded-deps-dataproc22</module>
<module>hive-3-bigquery-connector</module>
</modules>
</profile>

</profiles>
</project>
2 changes: 1 addition & 1 deletion shaded-deps-dataproc20/pom.xml
@@ -15,7 +15,7 @@

<properties>
<hive.version>3.1.3</hive.version>
<hadoop.version>3.2.3</hadoop.version>
<hadoop.version>3.2.4</hadoop.version>
<tez.version>0.9.2</tez.version>
<pig.version>0.17.0</pig.version>
</properties>
36 changes: 36 additions & 0 deletions shaded-deps-dataproc22/pom.xml
@@ -0,0 +1,36 @@
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>

<parent>
<groupId>com.google.cloud.hive</groupId>
<artifactId>shaded-deps-parent</artifactId>
<version>${revision}</version>
<relativePath>../shaded-deps-parent</relativePath>
</parent>

<artifactId>shaded-deps-dataproc22</artifactId>
<name>Shaded dependencies for Dataproc 2.2</name>

<properties>
<hive.version>3.1.3</hive.version>
<hadoop.version>3.3.6</hadoop.version>
<tez.version>0.10.2</tez.version>
<pig.version>0.17.0</pig.version>
</properties>

<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
</plugin>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>build-helper-maven-plugin</artifactId>
</plugin>
</plugins>
</build>

</project>
