Skip to content

Commit

Permalink
Merging develop (v1.2) to master (#9)
Browse files Browse the repository at this point in the history
* Updated README.md

* Base extractor - Initial cut (#1)

* Initial commit of base for extractors

* Fixed dockerfile

* Added README

* Changed extractor to transformer in readme

* Updated README

* Fleshing out functionality

* Updates for file list support

* Variable name clarification

* Bug fixes

* Optimizing Dockerfile

* Fixed typo

* Merging into Develop (#2)

* Initial commit of base for extractors

* Fixed dockerfile

* Added README

* Changed extractor to transformer in readme

* Updated README

* Fleshing out functionality

* Updates for file list support

* Variable name clarification

* Bug fixes

* Optimizing Dockerfile

* Fixed typo

* Debugging

* Removed TERRA REF left-over commented out code

* Adding command line parameter storage

* Added text abour return codes

* Fix up transformer declared types (#4)

* Changed list to tuple

* type declaration correction and additional commenting

* Pylint changes

* Test development (#7)

* Adding test files

* Was missing the 'test_transformer.py' file and made some changes to 'test_transformer_class.py'

* adding the initial travis yml

* Filled out the yml file

* Changed the tests from base unittest to pytest

* Changed to pytest over basic unittest

* Moved tests to own folder

* Removed duplicate files

* Moved the yml file out of base-image

* modified travis yml to move directories for testing

* Added -v to pytest run for more clarity

* Ignoring a test for testing purposes

* Reverted git ignore changes and temporarily removed a test

* Placed test_entrypoint.py back into folder

* Pushing from new local

* Changed file naming conventions to be more conformative

* removing duplicates

* Fixed a test

* Added pylint installation to travis yml

* Added test doc as a link in main readme.

* fixed a formatting issue

* Initial test folder readme draft

* Clarified testing process

* added links to helpful documentation

* Clarified type of testing being done

* Adding pylint file

* renamed pylintrc file

* changes to pylint file

* fixed pylint call

* trying to include all files from test folder

* moved pylint file

* testing pylint

* Fixed yml file

* fixing yml

* I think I figured out the problem

* I'll seperate them and make to calls

* Made seperate calls

* misspelled directory name

* made some pylint recommended adjustments

* now pulling pylintrc from repo

* adding organization repo

* Hopefully this works

* Updated readme

*  Added some new lines

* Trying again

* hopefully this is it

* please

* done

* Minor readme changes

* Added pylint protocol link in readmes

* Pylint changes

* made some pylint recomended changes

* Helping pylint along

* Directory issues

* Resetting python path

* adding another line

* adding another line

* adding another line

* added some ignore lines

* adding another line

* Moved files around

* Finding pylint rc

* Missed a quotation mark

* fixed pylint call

* pylint diables

* Update base-image/test-files/README.md

Co-Authored-By: Chris Schnaufer <[email protected]>

* Update base-image/test-files/README.md

Co-Authored-By: Chris Schnaufer <[email protected]>

* Update base-image/test-files/README.md

Co-Authored-By: Chris Schnaufer <[email protected]>

* Update base-image/test-files/README.md

Co-Authored-By: Chris Schnaufer <[email protected]>

* Update base-image/test-files/README.md

Co-Authored-By: Chris Schnaufer <[email protected]>

* Incorporated Chris' suggestions

* Fixed a mistake

* Made changes recommended by Chris

* missed a change

* Loading multiple metadata files & pylint changes (#6)

* Update .gitignore, allowing multiple --metadata arguments, spelling & pylint changes

* Logging what metadatafiles are loading

* Added ability to download files after transformer gives the go-ahead (#8)

* Update .gitignore, allowing multiple --metadata arguments, spelling & pylint changes

* Logging what metadatafiles are loading

* Adding call to download files

Co-authored-by: Jorge Barrios <[email protected]>
  • Loading branch information
Chris-Schnaufer and Sithyphus authored Jan 22, 2020
1 parent 7593f1e commit 23045ba
Show file tree
Hide file tree
Showing 11 changed files with 414 additions and 46 deletions.
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -102,3 +102,6 @@ venv.bak/

# mypy
.mypy_cache/

# pycharm
.idea
20 changes: 20 additions & 0 deletions .travis.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
language: python

python:
- "3.7"

cache: pip

install:
- "pip install pytest"
- "pip install pylint"

script:
- "git clone https://github.com/AgPipeline/Organization-info.git"
- mv base-image/test-files/*.py base-image
- cd base-image
- rm -r test-files
- cd ..
- pylint --rcfile=Organization-info/pylint.rc base-image/*.py
- "python -m pytest -v"

3 changes: 3 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,3 +22,6 @@ Be sure to read the [organization documentation](https://github.com/AgPipeline/O
Every folder in this repo must have a README.md clearly explaining the interface for derived images, how to create a derived image, and other information on how to use the images created.
Providing a quick start guide with links to more detailed information a good approach for some situations.
The goal is to provide documentation for users of these base images that makes it easy for them to be used.

## Testing
The testing modules and readme may be found in the [testing folder](https://github.com/AgPipeline/base-docker-support/tree/test-development/base-image/test-files).
15 changes: 11 additions & 4 deletions base-image/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -87,10 +87,17 @@ If the transformer.py file has a function named `check_continue` it will be call
The return from the check_continue() function is used to determine if processing should continue.
If the function is not defined, processing will continue automatically.

5. Processsing:
The `perform_process` function in transformer.py is called getting passed the transformer_class.Transformer instance and any parameters previously defined.
5. Retrieve Files:
If the transformer_class.Transformer instance has a method named `retrieve_files` it will be called getting passed the dictionary returned by `transformer_class.Transformer.get_transformer_params()` (see step 3.) and the loaded metadata.
This allows the downloading of data when the transformer has determined it can proceed (see step 4.).
If this function is not defined, processing will continue automatically.

6. Result Handling:
6. Processsing:
The `perform_process` function in transformer.py is called getting passed the transformer_class.Transformer instance and any parameters previously defined (see step 3.).
This performs the processing of the data.
It's important to note that the dictionary returned in step 3 is used to populate the parameter list of the `perform_process` call.

7. Result Handling:
The result of the above steps may produce warnings, errors, or successful results.
These results can be stored in a file, printed to standard output, and/or returned to the caller of `do_work`.
In the default case that we're exploring here, the return value from do_work is ignored.
Expand All @@ -102,7 +109,7 @@ The following command line parameters are defined for all transformers.
* -h: (optional parameter) display help message (automatically defined by argparse)
* --info, -i: (optional parameter) enable info level logging messages
* --result: (optional parameter) how to handle the result of processing; one or more comma-separated strings of: all, file, print
* --metadata: mandatory path to file containing JSON metadata
* --metadata: mandatory path to file containing JSON metadata; can be specified multiple times
* --working_space: path to folder to use as a working space and file store
* the "file_list" argument contains all additional parameters (which are assumed to be file names but may not be)

Expand Down
169 changes: 132 additions & 37 deletions base-image/entrypoint.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
import os
import json
import logging
from typing import Optional

import transformer_class

Expand All @@ -22,7 +23,7 @@ def __init__(self):

@staticmethod
def handle_error(code: int, message: str) -> dict:
"""Handles logging and return values when an error ocurrs. Implies processing
"""Handles logging and return values when an error occurs. Implies processing
will stop.
Arguments:
code: return code related to the error
Expand All @@ -35,14 +36,12 @@ def handle_error(code: int, message: str) -> dict:
code = -1
if not message:
logging.warning("An error has occurred without a message, setting default message")
message = "An error has ocurred with error code (%s)" % str(code)
message = "An error has occurred with error code (%s)" % str(code)

logging.error(message)
logging.error("Stopping processing")

result = {}
result['error'] = message
result['code'] = code
result = {'error': message, 'code': code}

return result

Expand All @@ -52,13 +51,13 @@ def load_metadata(metadata_path: str) -> dict:
Arguments:
metadata_path: path to the metadata file
Return:
Returns a dict containing the loaded metadata. If an error ocurrs, the dict
Returns a dict containing the loaded metadata. If an error occurs, the dict
won't contain metadata but will contain an error message under an 'error' key.
"""
try:
with open(metadata_path, 'r') as in_file:
md_load = json.load(in_file)
if not md_load is None:
if md_load is not None:
md_return = {'metadata': md_load}
else:
msg = 'Invalid JSON specified in metadata file "%s"' % metadata_path
Expand All @@ -72,6 +71,83 @@ def load_metadata(metadata_path: str) -> dict:

return md_return

@staticmethod
def check_params_result_error(transformer_params: dict) -> Optional[dict]:
"""Checks the transformer parameter results for an error
Arguments:
transformer_params: the dictionary to check for errors
Return:
An error dict if errors were found and None if not
Notes:
See handle_error() function
"""
if 'code' in transformer_params:
if 'error' in transformer_params:
error = transformer_params['error']
else:
error = "Error returned from get_transformer_params with code: %s" % transformer_params['code']
return __internal__.handle_error(-104, error)

return None

@staticmethod
def check_retrieve_results_error(transformer_retrieve: tuple) -> Optional[dict]:
"""Checks the results of the transformer_class retrieving files
Arguments:
transformer_retrieve: the results of the retrieve
Return:
An error dict if errors were found and None if not
Notes:
See handle_error() function
"""
if not transformer_retrieve:
return None

code = 0
message = None
retrieve_len = len(transformer_retrieve)
if retrieve_len > 0:
code = transformer_retrieve[0]
if retrieve_len > 1:
message = transformer_retrieve[1]
else:
message = "Retrieving files returned a code of %s" % str(code)

# Check for an error code
if code < 0:
return __internal__.handle_error(code, message)

# Log the message if we get one returned to us
if retrieve_len > 1:
logging.info(transformer_retrieve[1])

return None

@staticmethod
def load_metadata_files(metadata_files: list) -> dict:
"""Loads the specified metadata files
Arguments:
metadata_files: list of metadata files to load
Returns:
Returns a dict containing the loaded metadata as an array. If an error occurs, the dict
won't contain metadata but will contain an error message under an 'error' key.
"""
metadata = []
result = {'metadata': metadata}
for metadata_file in metadata_files:
if not os.path.exists(metadata_file):
result = __internal__.handle_error(-2, "Unable to access metadata file '%s'" % metadata_file)
break
logging.info("Loading metadata from file: '%s'", metadata_file)
md_loaded = __internal__.load_metadata(metadata_file)
if 'metadata' in md_loaded:
metadata.append(md_loaded['metadata'])
else:
result = __internal__.handle_error(-3, md_loaded['error'])
break

return result

@staticmethod
def parse_continue_result(result) -> tuple:
"""Parses the result of calling transformer.check_continue and returns
Expand Down Expand Up @@ -129,7 +205,31 @@ def handle_check_continue(transformer_instance: transformer_class.Transformer, t
return result

@staticmethod
def perform_processing(transformer_instance, args: argparse.ArgumentParser, metadata: dict) -> dict:
def handle_retrieve_files(transformer_instance: transformer_class.Transformer, args: argparse.Namespace, metadata: dict) ->\
Optional[dict]:
"""Handles calling the transformer class to retrieve files
Arguments:
transformer_instance: the current transformer environment
args: the command line arguments
metadata: the loaded metadata
Return:
A dict containing error information if a problem occurs and None if no problems are found.
Note:
A side effect of this function is a information message logged if the transformer class instance does not have
a 'retrieve_files' function declared.
"""
if hasattr(transformer_instance, 'retrieve_files'):
transformer_retrieve = transformer_instance.retrieve_files(args, metadata)
retrieve_results = __internal__.check_retrieve_results_error(transformer_retrieve)
if retrieve_results:
return retrieve_results
else:
logging.info("Transformer class doesn't have function named 'retrieve_files'")

return None

@staticmethod
def perform_processing(transformer_instance: transformer_class.Transformer, args: argparse.Namespace, metadata: dict) -> dict:
"""Makes the calls to perform the processing
Arguments:
transformer_instance: instance of transformer class
Expand All @@ -144,16 +244,14 @@ def perform_processing(transformer_instance, args: argparse.ArgumentParser, meta
if hasattr(transformer_instance, 'get_transformer_params'):
transformer_params = transformer_instance.get_transformer_params(args, metadata)
if not isinstance(transformer_params, dict):
return __internal__.handle_error(-101, \
"Invalid return from getting transformer parameters from transformer class instance")
if 'code' in transformer_params:
if 'error' in transformer_params:
error = transformer_params['error']
else:
error = "Error returned from get_transformer_params with code: %s" % transformer_params['code']
return __internal__.handle_error(-101, error)
return __internal__.handle_error(-101,
"Invalid return from getting transformer parameters from transformer class instance")

params_result = __internal__.check_params_result_error(transformer_params)
if params_result:
return params_result
else:
logging.debug("Transformer class instance does not have get_transformer_params method")
logging.info("Transformer class instance does not have get_transformer_params method")
transformer_params = {}

# First check if the transformer thinks everything is in place
Expand All @@ -162,31 +260,30 @@ def perform_processing(transformer_instance, args: argparse.ArgumentParser, meta
if 'code' in result and result['code'] < 0 and 'error' not in result:
result['error'] = "Unknown error returned from check_continue call"
else:
logging.debug("transformer module doesn't have a function named 'check_continue'")
logging.info("Transformer module doesn't have a function named 'check_continue'")

# Retrieve additional files if indicated by return code from the check
if not 'error' in result and 'code' in result and result['code'] == 0:
# TODO: Fetch additional data for processing
pass
if 'error' not in result and 'code' in result and result['code'] == 0:
result = __internal__.handle_retrieve_files(transformer_instance, args, metadata)

# Next make the call to perform the processing
if not 'error' in result:
if 'error' not in result:
if hasattr(transformer, 'perform_process'):
result = transformer.perform_process(transformer_instance, **transformer_params)
else:
logging.debug("transformer module is missing function named 'perform_process'")
logging.debug("Transformer module is missing function named 'perform_process'")
return __internal__.handle_error(-102, "Transformer perform_process interface is not available " +
"for processing data")

return result

@staticmethod
def handle_result(result_types: str, result_file_path: str, result: dict) -> None:
def handle_result(result: dict, result_types: str = None, result_file_path: str = None) -> dict:
"""Handles the results of processing as dictated by the arguments passed in.
Arguments:
result_types: comma separated string containing one or more of: all, file, print
result_file_path: location to place result file
result: the dictionary of result information
result_types: optional, comma separated string containing one or more of: all, file, print
result_file_path: optional, location to place result file
Return:
Returns the result parameter
Notes:
Expand Down Expand Up @@ -224,7 +321,7 @@ def add_parameters(parser: argparse.ArgumentParser, transformer_instance) -> Non
parser.add_argument('--result', nargs='?', default='all',
help='Direct the result of a run to one or more of (all is default): "all,file,print"')

parser.add_argument('--metadata', type=str, help='The path to the source metadata')
parser.add_argument('--metadata', type=str, action='append', help='The path to the source metadata')

parser.add_argument('--working_space', type=str, help='the folder to use use as a workspace and ' +
'for storing results')
Expand All @@ -240,7 +337,7 @@ def add_parameters(parser: argparse.ArgumentParser, transformer_instance) -> Non
# Assume the rest of the arguments are the files
parser.add_argument('file_list', nargs=argparse.REMAINDER, help='additional files for transformer')

def do_work(parser: argparse.ArgumentParser, **kwargs) -> None:
def do_work(parser: argparse.ArgumentParser, **kwargs) -> dict:
"""Function to prepare and execute work unit
Arguments:
parser: an instance of argparse.ArgumentParser
Expand All @@ -252,7 +349,7 @@ def do_work(parser: argparse.ArgumentParser, **kwargs) -> None:
transformer_instance = transformer_class.Transformer(**kwargs)
if not transformer_instance:
result = __internal__.handle_error(-100, "Unable to create transformer class instance for processing")
return __internal__.handle_result(None, None, result)
return __internal__.handle_result(result, None, None)

add_parameters(parser, transformer_instance)
args = parser.parse_args()
Expand All @@ -262,22 +359,20 @@ def do_work(parser: argparse.ArgumentParser, **kwargs) -> None:

# Check that we have mandatory metadata
if not args.metadata:
result = __internal__.handle_error(-1, "No metadata path was specified.")
elif not os.path.exists(args.metadata):
result = __internal__.handle_error(-2, "Unable to access metadata file '%s'" % args.metadata)
result = __internal__.handle_error(-1, "No metadata paths were specified.")
else:
md_result = __internal__.load_metadata(args.metadata)
if 'metadata' in md_result:
result = __internal__.perform_processing(transformer_instance, args, md_result['metadata'])
md_results = __internal__.load_metadata_files(args.metadata)
if 'metadata' in md_results:
result = __internal__.perform_processing(transformer_instance, args, md_results['metadata'])
else:
result = __internal__.handle_error(-3, md_result['error'])
result = __internal__.handle_error(-3, md_results['error'])

if args.working_space:
result_path = os.path.join(args.working_space, 'result.json')
else:
result_path = None

__internal__.handle_result(args.result, result_path, result)
__internal__.handle_result(result, args.result, result_path)
return result

if __name__ == "__main__":
Expand Down
38 changes: 38 additions & 0 deletions base-image/test-files/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
# Transformer Unit Tests
This folder holds the unit tests [TravisCI](https://travis-ci.org/) will run in it's build.
These are basic functional unit tests, checks for output formatting and typing. All tests are written in and utilizing [pytest](https://docs.pytest.org/en/latest/).
In addition to this, [pylint](https://www.pylint.org/) will also be deployed.
Look into our organization's repository for our [pylint protocols](https://github.com/AgPipeline/Organization-info)

### Running the tests
Upon submitting a pull request Travis will build and run the testing modules automatically and will return a report with all passing or failing code.

### Running the tests before submitting a pull request
Should you wish to test your code before submitting a pull request follow these steps:
1) Clone, pull, copy or otherwise aquire the pylintrc file located at this [repo](https://github.com/AgPipeline/Organization-info)

2) From the command line run the following commands (from base-docker-support as current directory)
```sh
pylint --rcfile=<path-to-pylint.rc> base-image/*py
pylint --rcfile=<path-to-pylint.rc> base-image/**/*py
```
3) Once the previous commands have executed there will be a list of changes that should be made to bring any code up to standard
4) From the command line run the following command while the current working directory is base-image
```sh
python -m pytest -v
```
or
```sh
python3 -m pytest -v
```


### Requirements

There are no additional requirements or dependancies if not running these tests locally, if however these are to be run before deploying travis the following are required.

python 3 \
pylint \
pytest

All three may be installed using pip, conda, or another preferred method.
Loading

0 comments on commit 23045ba

Please sign in to comment.