Automate testing for GPT with mock implementation #21

ByteYJ · 2024-08-06T19:31:09Z

This updates automating testing for ocr functionality and mock api call, including tests for:

Part 1: Testing llm_ocr_function
test_parse_table_data : ensures correct extraction and conversion of table data into DataFrames.
test_rescale_image : verifies the image resizing functionality for both largest and smallest dimensions.
test_encode_image : checks the base64 encoding and decoding of images.
test_correct_image_orientation : validates the correction of image orientation based on EXIF data.

Part 2: Testing OpenAI API call
test_query_api : verifies successful API responses using mocked responses.
test_query_api_authentication_error : tests handling of authentication errors using mocked error responses.

(These tests do not include extract_text_from_image and related functions, which requires a real API key and focuses more on OpenAI's effectiveness testing.)

ginic

@ByteYJ , I think the OpenAI mocks are off to a good start, but we should actually be testing our own code somehow with this. For example, we should add a test for key functions in src/msfocr/llm/ocr_functions.py, like get_results or parse_table_data. Without checking those functions, it's kind of just an example on how the OpenAI API works, which isn't really the purpose of a unit test. Remember that our goal is to test the code that we're writing and make sure they work correctly in normal situations (OpenAI returns a correct json response) and edge cases (OpenAI returns an error or malformed json). The thing you want to mock is the request to the OpenAI client and force it to return the response you need for the test.

Could you try to add a test for one of those functions that uses your mocked classes in one of those situations?

Some other things you might want to watch out for:

You're installing the pytest_mock dependency, but it's not used in the tests right now. Instead you're using the built-in unittest library, which is an alternative to the pytest library we're using for our tests. I think this is okay and they should work together, but I haven't tried it before, although there is a reference on https://docs.pytest.org/en/7.1.x/how-to/unittest.html.
While having logging is a good idea later on, we don't need logging in the unit test files, that can be removed.

ByteYJ · 2024-08-07T19:02:13Z

@ginic thanks for the suggestions! I added more tests to ocr_functions. now it includes six test cases.

ginic · 2024-08-07T21:39:56Z

src/msfocr/llm/ocr_functions.py

+    with Image.open(image_path) as image:
+        image.load()
+
+    orientation = None
+    try:
+        for orientation in ExifTags.TAGS.keys():
+            if ExifTags.TAGS[orientation] == 'Orientation':
+                break
+        exif = dict(image.getexif().items())
+        if exif.get(orientation) == 3:
+            image = image.rotate(180, expand=True)
+        elif exif.get(orientation) == 6:
+            image = image.rotate(270, expand=True)
+        elif exif.get(orientation) == 8:
+            image = image.rotate(90, expand=True)
+    except (AttributeError, KeyError, IndexError):
+        pass
+    return image


Just a heads up, I was wrong about the correct way of handling the image earlier. I changed it again and I think this is correct now. You can read more at https://pillow.readthedocs.io/en/latest/reference/open_files.html#file-handling

ginic

Thanks, @ByteYJ ! This look great! I'm really glad we have tests for these things now.
If you wanted to take things one step further, you could use your OpenAI mocker to test the extract_text_from_image function, but I think that's going above and beyond, so only do that if you feel like you have time.

ByteYJ added 4 commits August 6, 2024 13:48

Automate testing for llm ocr functions.

074318f

update test script

5d25554

add docstring

4284bb1

bug fix

af6a7ba

ginic requested changes Aug 6, 2024

View reviewed changes

ByteYJ added 4 commits August 7, 2024 11:00

add tests for ocr functions

efe8c55

removed unused packages

60bdc12

removed unused import

c0830db

Fix image open issue and remove unused library.

5f1155b

ByteYJ and others added 2 commits August 7, 2024 15:17

fix the correct_image_orientation function

b17f98b

Fixed image loading

035a0e1

ginic reviewed Aug 7, 2024

View reviewed changes

ginic approved these changes Aug 7, 2024

View reviewed changes

ByteYJ merged commit 34232bb into main Aug 13, 2024
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Automate testing for GPT with mock implementation #21

Automate testing for GPT with mock implementation #21

ByteYJ commented Aug 6, 2024 •

edited

Loading

ginic left a comment

ByteYJ commented Aug 7, 2024

ginic Aug 7, 2024

ginic left a comment

Automate testing for GPT with mock implementation #21

Automate testing for GPT with mock implementation #21

Conversation

ByteYJ commented Aug 6, 2024 • edited Loading

ginic left a comment

Choose a reason for hiding this comment

ByteYJ commented Aug 7, 2024

ginic Aug 7, 2024

Choose a reason for hiding this comment

ginic left a comment

Choose a reason for hiding this comment

ByteYJ commented Aug 6, 2024 •

edited

Loading