Merge pull request #1895 from cloudsufi/e2e_distinct
e2e-distinct additional tests
psainics authored Dec 3, 2024
2 parents f3ffe2f + c6f5fe4 commit 7793ef2
Showing 6 changed files with 253 additions and 0 deletions.
@@ -50,3 +50,141 @@ Feature: Distinct analytics - Verify File data transfer scenarios using Distinct
Then Close the pipeline logs
Then Validate OUT record count of distinct is equal to IN record count of sink
Then Validate output file generated by file sink plugin "fileSinkTargetBucket" is equal to expected output file "distinctMacroOutputFile"

@GCS_DISTINCT_TEST1 @FILE_SINK_TEST
Scenario: To verify data is transferred from File source to File sink when the number of partitions is set as a macro argument
Given Open Datafusion Project to configure pipeline
When Select plugin: "File" from the plugins list as: "Source"
When Expand Plugin group in the LHS plugins list: "Analytics"
When Select plugin: "Distinct" from the plugins list as: "Analytics"
Then Connect plugins: "File" and "Distinct" to establish connection
When Expand Plugin group in the LHS plugins list: "Sink"
When Select plugin: "File" from the plugins list as: "Sink"
Then Connect plugins: "Distinct" and "File2" to establish connection
Then Navigate to the properties page of plugin: "File"
Then Enter input plugin property: "referenceName" with value: "FileReferenceName"
Then Enter input plugin property: "path" with value: "gcsDistinctTest1"
Then Select dropdown plugin property: "format" with option value: "csv"
Then Click plugin property: "skipHeader"
Then Click on the Get Schema button
Then Verify the Output Schema matches the Expected Schema: "distinctCsvAllDataTypeFileSchema"
Then Validate "File" plugin properties
Then Close the Plugin Properties page
Then Navigate to the properties page of plugin: "Distinct"
Then Enter the Distinct plugin fields as list "distinctValidSingleFieldName"
Then Click on the Macro button of Property: "numberOfPartitions" and set the value to: "distinctValidPartitions"
Then Click on the Get Schema button
Then Verify the Output Schema matches the Expected Schema: "distinctOutputFileSchema"
Then Validate "Distinct" plugin properties
Then Close the Plugin Properties page
Then Navigate to the properties page of plugin: "File2"
Then Enter input plugin property: "referenceName" with value: "FileReferenceName"
Then Enter input plugin property: "path" with value: "fileSinkTargetBucket"
Then Replace input plugin property: "pathSuffix" with value: "yyyy-MM-dd-HH-mm-ss"
Then Select dropdown plugin property: "format" with option value: "tsv"
Then Validate "File2" plugin properties
Then Close the Plugin Properties page
Then Save the pipeline
Then Preview and run the pipeline
Then Enter runtime argument value "distinctValidPartitions" for key "distinctValidPartitions"
Then Run the preview of pipeline with runtime arguments
Then Wait till pipeline preview is in running state
Then Open and capture pipeline preview logs
Then Verify the preview run status of pipeline in the logs is "succeeded"
Then Close the pipeline logs
Then Close the preview
Then Deploy the pipeline
Then Run the Pipeline in Runtime
Then Enter runtime argument value "distinctValidPartitions" for key "distinctValidPartitions"
Then Run the Pipeline in Runtime with runtime arguments
Then Wait till pipeline is in running state
Then Open and capture logs
Then Verify the pipeline status is "Succeeded"
Then Close the pipeline logs
Then Validate OUT record count of distinct is equal to IN record count of sink
Then Validate output file generated by file sink plugin "fileSinkTargetBucket" is equal to expected output file "distinctDatatypeOutputFile"

@GCS_DISTINCT_TEST2 @FILE_SINK_TEST
Scenario: To verify pipeline fails when the fields property is set as a macro argument with an invalid value
Given Open Datafusion Project to configure pipeline
When Select plugin: "File" from the plugins list as: "Source"
When Expand Plugin group in the LHS plugins list: "Analytics"
When Select plugin: "Distinct" from the plugins list as: "Analytics"
Then Connect plugins: "File" and "Distinct" to establish connection
When Expand Plugin group in the LHS plugins list: "Sink"
When Select plugin: "File" from the plugins list as: "Sink"
Then Connect plugins: "Distinct" and "File2" to establish connection
Then Navigate to the properties page of plugin: "File"
Then Enter input plugin property: "referenceName" with value: "FileReferenceName"
Then Enter input plugin property: "path" with value: "gcsDistinctTest2"
Then Select dropdown plugin property: "format" with option value: "csv"
Then Click plugin property: "skipHeader"
Then Click on the Get Schema button
Then Verify the Output Schema matches the Expected Schema: "distinctCsvFileSchema"
Then Validate "File" plugin properties
Then Close the Plugin Properties page
Then Navigate to the properties page of plugin: "Distinct"
Then Click on the Macro button of Property: "fields" and set the value to: "DistinctFieldName"
Then Validate "Distinct" plugin properties
Then Close the Plugin Properties page
Then Navigate to the properties page of plugin: "File2"
Then Enter input plugin property: "referenceName" with value: "FileReferenceName"
Then Enter input plugin property: "path" with value: "fileSinkTargetBucket"
Then Replace input plugin property: "pathSuffix" with value: "yyyy-MM-dd-HH-mm-ss"
Then Select dropdown plugin property: "format" with option value: "tsv"
Then Validate "File2" plugin properties
Then Close the Plugin Properties page
Then Save the pipeline
Then Deploy the pipeline
Then Run the Pipeline in Runtime
Then Enter runtime argument value "distinctInvalidFields" for key "DistinctFieldName"
Then Run the Pipeline in Runtime with runtime arguments
Then Wait till pipeline is in running state
Then Verify the pipeline status is "Failed"
Then Open Pipeline logs and verify Log entries having below listed Level and Message:
| Level | Message |
| ERROR | errorLogsMessageDistinctInvalidFields |
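
This negative test hinges on schema validation: the fields macro resolves at runtime to the junk value "$^&*", and the plugin rejects any requested field that is not in the input schema. A minimal sketch of such a check, with an assumed input schema (only the junk value and the error wording come from errorLogsMessageDistinctInvalidFields, defined later in this diff):

import java.util.Arrays;
import java.util.List;
import java.util.Set;

public class DistinctFieldCheck {
  public static void main(String[] args) {
    // Assumed input-schema fields; not taken from this PR.
    Set<String> inputSchema = Set.of("fname", "lname", "cost", "zipcode");

    // What the "fields" macro might resolve to in the failing run.
    List<String> requested = Arrays.asList("fname", "$^&*");

    for (String field : requested) {
      if (!inputSchema.contains(field)) {
        // Surfaces as the ERROR log entry the scenario asserts on.
        throw new IllegalArgumentException(
            "Field " + field + " does not exist in input schema.");
      }
    }
  }
}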

@GCS_DISTINCT_TEST1 @FILE_SINK_TEST
Scenario: To verify pipeline fails when the number of partitions is set as a macro argument with an invalid value
Given Open Datafusion Project to configure pipeline
When Select plugin: "File" from the plugins list as: "Source"
When Expand Plugin group in the LHS plugins list: "Analytics"
When Select plugin: "Distinct" from the plugins list as: "Analytics"
Then Connect plugins: "File" and "Distinct" to establish connection
When Expand Plugin group in the LHS plugins list: "Sink"
When Select plugin: "File" from the plugins list as: "Sink"
Then Connect plugins: "Distinct" and "File2" to establish connection
Then Navigate to the properties page of plugin: "File"
Then Enter input plugin property: "referenceName" with value: "FileReferenceName"
Then Enter input plugin property: "path" with value: "gcsDistinctTest1"
Then Select dropdown plugin property: "format" with option value: "csv"
Then Click plugin property: "skipHeader"
Then Click on the Get Schema button
Then Verify the Output Schema matches the Expected Schema: "distinctCsvAllDataTypeFileSchema"
Then Validate "File" plugin properties
Then Close the Plugin Properties page
Then Navigate to the properties page of plugin: "Distinct"
Then Enter the Distinct plugin fields as list "distinctValidSingleFieldName"
Then Click on the Macro button of Property: "numberOfPartitions" and set the value to: "distinctInvalidPartitions"
Then Click on the Get Schema button
Then Verify the Output Schema matches the Expected Schema: "distinctOutputFileSchema"
Then Validate "Distinct" plugin properties
Then Close the Plugin Properties page
Then Navigate to the properties page of plugin: "File2"
Then Enter input plugin property: "referenceName" with value: "FileReferenceName"
Then Enter input plugin property: "path" with value: "fileSinkTargetBucket"
Then Replace input plugin property: "pathSuffix" with value: "yyyy-MM-dd-HH-mm-ss"
Then Select dropdown plugin property: "format" with option value: "tsv"
Then Validate "File2" plugin properties
Then Close the Plugin Properties page
Then Save the pipeline
Then Deploy the pipeline
Then Run the Pipeline in Runtime
Then Enter runtime argument value "distinctInvalidPartitions" for key "distinctInvalidPartitions"
Then Run the Pipeline in Runtime with runtime arguments
Then Wait till pipeline is in running state
Then Verify the pipeline status is "Failed"
Then Open Pipeline logs and verify Log entries having below listed Level and Message:
| Level | Message |
| ERROR | errorLogsMessageDistinctInvalidNumberOfPartitions |
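
This second negative test fails for a different reason: numberOfPartitions is a numeric config field, so a macro that resolves to a non-numeric string cannot be coerced, as errorLogsMessageDistinctInvalidNumberOfPartitions below spells out. A minimal sketch of the failure mode, with an assumed runtime-argument value standing in for whatever "distinctInvalidPartitions" resolves to:

import java.util.Map;

public class NumPartitionsCheck {
  public static void main(String[] args) {
    // Assumed runtime arguments; the non-numeric value is illustrative.
    Map<String, String> runtimeArgs = Map.of("distinctInvalidPartitions", "abc");

    // A numeric config field cannot be populated from a non-numeric string;
    // here the coercion step is modeled with a plain parse, which throws
    // NumberFormatException and would fail the run.
    int numPartitions = Integer.parseInt(runtimeArgs.get("distinctInvalidPartitions"));
    System.out.println(numPartitions);
  }
}
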
@@ -96,3 +96,98 @@ Feature: Distinct Analytics - Verify File source data transfer using Distinct an
Then Close the pipeline logs
Then Validate OUT record count of distinct is equal to IN record count of sink
Then Validate output file generated by file sink plugin "fileSinkTargetBucket" is equal to expected output file "distinctCsvOutputFile"

@GCS_DISTINCT_TEST2 @FILE_SINK_TEST
Scenario: To verify distinct records are transferred from File source to File sink successfully when no field names are given to the Distinct plugin
Given Open Datafusion Project to configure pipeline
When Select plugin: "File" from the plugins list as: "Source"
When Expand Plugin group in the LHS plugins list: "Analytics"
When Select plugin: "Distinct" from the plugins list as: "Analytics"
Then Connect plugins: "File" and "Distinct" to establish connection
When Expand Plugin group in the LHS plugins list: "Sink"
When Select plugin: "File" from the plugins list as: "Sink"
Then Connect plugins: "Distinct" and "File2" to establish connection
Then Navigate to the properties page of plugin: "File"
Then Enter input plugin property: "referenceName" with value: "FileReferenceName"
Then Enter input plugin property: "path" with value: "gcsDistinctTest2"
Then Select dropdown plugin property: "format" with option value: "csv"
Then Click plugin property: "skipHeader"
Then Click on the Get Schema button
Then Verify the Output Schema matches the Expected Schema: "distinctCsvFileSchema"
Then Validate "File" plugin properties
Then Close the Plugin Properties page
Then Navigate to the properties page of plugin: "Distinct"
Then Click on the Get Schema button
Then Verify the Output Schema matches the Expected Schema: "distinctCsvFileSchema"
Then Validate "Distinct" plugin properties
Then Close the Plugin Properties page
Then Navigate to the properties page of plugin: "File2"
Then Enter input plugin property: "referenceName" with value: "FileReferenceName"
Then Enter input plugin property: "path" with value: "fileSinkTargetBucket"
Then Replace input plugin property: "pathSuffix" with value: "yyyy-MM-dd-HH-mm-ss"
Then Select dropdown plugin property: "format" with option value: "csv"
Then Validate "File2" plugin properties
Then Close the Plugin Properties page
Then Save the pipeline
Then Preview and run the pipeline
Then Wait till pipeline preview is in running state
Then Open and capture pipeline preview logs
Then Verify the preview run status of pipeline in the logs is "succeeded"
Then Close the pipeline logs
Then Close the preview
Then Deploy the pipeline
Then Run the Pipeline in Runtime
Then Wait till pipeline is in running state
Then Open and capture logs
Then Verify the pipeline status is "Succeeded"
Then Close the pipeline logs
Then Validate OUT record count of distinct is equal to IN record count of sink
Then Validate output file generated by file sink plugin "fileSinkTargetBucket" is equal to expected output file "distinctOutputFile"

@GCS_DISTINCT_TEST1 @FILE_SINK_TEST @Distinct_Required
Scenario: To verify data is transferred from File source to File sink plugin successfully with field names that have unique records
Given Open Datafusion Project to configure pipeline
When Select plugin: "File" from the plugins list as: "Source"
When Expand Plugin group in the LHS plugins list: "Analytics"
When Select plugin: "Distinct" from the plugins list as: "Analytics"
Then Connect plugins: "File" and "Distinct" to establish connection
When Expand Plugin group in the LHS plugins list: "Sink"
When Select plugin: "File" from the plugins list as: "Sink"
Then Connect plugins: "Distinct" and "File2" to establish connection
Then Navigate to the properties page of plugin: "File"
Then Enter input plugin property: "referenceName" with value: "FileReferenceName"
Then Enter input plugin property: "path" with value: "gcsDistinctTest1"
Then Select dropdown plugin property: "format" with option value: "csv"
Then Click plugin property: "skipHeader"
Then Click on the Get Schema button
Then Verify the Output Schema matches the Expected Schema: "distinctCsvAllDataTypeFileSchema"
Then Validate "File" plugin properties
Then Close the Plugin Properties page
Then Navigate to the properties page of plugin: "Distinct"
Then Enter the Distinct plugin fields as list "distinctFieldsWithUniqueRecords"
Then Click on the Get Schema button
Then Verify the Output Schema matches the Expected Schema: "distinctFieldsWithUniqueRecordsOutputSchema"
Then Validate "Distinct" plugin properties
Then Close the Plugin Properties page
Then Navigate to the properties page of plugin: "File2"
Then Enter input plugin property: "referenceName" with value: "FileReferenceName"
Then Enter input plugin property: "path" with value: "fileSinkTargetBucket"
Then Replace input plugin property: "pathSuffix" with value: "yyyy-MM-dd-HH-mm-ss"
Then Select dropdown plugin property: "format" with option value: "csv"
Then Validate "File2" plugin properties
Then Close the Plugin Properties page
Then Save the pipeline
Then Preview and run the pipeline
Then Wait till pipeline preview is in running state
Then Open and capture pipeline preview logs
Then Verify the preview run status of pipeline in the logs is "succeeded"
Then Close the pipeline logs
Then Close the preview
Then Deploy the pipeline
Then Run the Pipeline in Runtime
Then Wait till pipeline is in running state
Then Open and capture logs
Then Verify the pipeline status is "Succeeded"
Then Close the pipeline logs
Then Validate OUT record count of distinct is equal to IN record count of sink
Then Validate output file generated by file sink plugin "fileSinkTargetBucket" is equal to expected output file "distinctFieldsWithUniqueRecordsOutputFile"
5 changes: 5 additions & 0 deletions core-plugins/src/e2e-test/resources/errorMessage.properties
@@ -24,3 +24,8 @@ errorMessageJoinerAdvancedJoinCondition=A join condition must be specified.
errorMessageJoinerInputLoadMemory=Advanced outer joins must specify an input to load in memory.
errorMessageJoinerAdvancedJoinConditionType=Advanced join conditions can only be used when there are two inputs.
errorMessageDeduplicateInvalidFieldName=Invalid filter MAX(abcd): Field 'abcd' does not exist in input schema
errorLogsMessageDistinctInvalidFields=Spark program 'phase-1' failed with error: Errors were encountered during validation.\
\ Field $^&* does not exist in input schema.. Please check the system logs for more details.
errorLogsMessageDistinctInvalidNumberOfPartitions=Spark program 'phase-1' failed with error: Unable to create config \
for batchaggregator Distinct 'numPartitions' is invalid: Value of field class io.cdap.plugin.\
batch.aggregator.AggregatorConfig.numPartitions is expected to be a number.. Please check the system logs for more details.
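
A note on the trailing backslashes above: in Java .properties files, a value ending in a backslash continues on the next line, and leading whitespace on the continuation line is discarded, so each multi-line entry above loads as a single one-line message. A small self-contained demonstration using java.util.Properties:

import java.io.StringReader;
import java.util.Properties;

public class PropsContinuationDemo {
  public static void main(String[] args) throws Exception {
    // The trailing backslash joins the two physical lines into one value;
    // the continuation line's leading spaces are dropped by the loader.
    String text = "msg=Spark program failed: \\\n    see system logs.\n";
    Properties props = new Properties();
    props.load(new StringReader(text));
    System.out.println(props.getProperty("msg")); // Spark program failed: see system logs.
  }
}
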
@@ -169,6 +169,11 @@ distinctCsvAllDataTypeFileSchema=[{"key":"id","value":"int"},{"key":"name","valu
distinctDatatypeOutputFile=e2e-tests/expected_outputs/CSV_DISTINCT_TEST1_Output.csv
distinctCsvOutputFile=e2e-tests/expected_outputs/CSV_DISTINCT_TEST2_Output.csv
distinctMacroOutputFile=e2e-tests/expected_outputs/CSV_DISTINCT_TEST3_Output.csv
distinctOutputFile=e2e-tests/expected_outputs/CSV_DISTINCT_Output
distinctFieldsWithUniqueRecords=id, name, yearofbirth
distinctFieldsWithUniqueRecordsOutputSchema=[{"key":"id","value":"int"},{"key":"name","value":"string"},\
{"key":"yearofbirth","value":"int"}]
distinctFieldsWithUniqueRecordsOutputFile=e2e-tests/expected_outputs/CSV_FIELDWITHUNIQUERECORDS_OUTPUT.csv
## DISTINCT-PLUGIN-PROPERTIES-END

## Deduplicate-PLUGIN-PROPERTIES-START
@@ -0,0 +1,6 @@
bob,coffee,buy,2019-03-11 04:50:01 UTC
bob,coffee,drink,2019-03-12 04:50:01 UTC
bob,donut,eat,2019-03-08 04:50:01 UTC
bob,donut,buy,2019-03-10 04:50:01 UTC
bob,donut,buy,2019-03-11 04:50:01 UTC
bob,donut,eat,2019-03-09 04:50:01 UTC
@@ -0,0 +1,4 @@
4,galilée,1564
3,marie curie,1867
2,isaac newton,1643
1,albert einstein,1879
