-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CEA4 Format Helper - Phase 1 #3753
Conversation
DB to be included in future PR
Warning Rate limit exceeded@ShiZhongming has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 23 minutes and 4 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (3)
WalkthroughThis pull request introduces a comprehensive data migration and verification system for transitioning from CEA-3 to CEA-4 format. The changes include updates to several Python scripts in the Changes
Possibly related PRs
Suggested labels
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 11
🧹 Nitpick comments (4)
cea/datamanagement/format_helper/cea4_verify.py (1)
203-205
: Consider handling the case when verification passesCurrently, when all inputs are verified and compatible with CEA-4, the function
cea4_verify
passes without any confirmation to the user.Consider adding a print statement to inform the user that the verification passed successfully.
if all(not value for value in dict_missing.values()): + print('All inputs are already compatible with CEA-4 format. No migration needed.') pass
🧰 Tools
🪛 Ruff (0.8.2)
204-204: Local variable
list_missing_files_shp_building_geometry
is assigned to but never usedRemove assignment to unused variable
list_missing_files_shp_building_geometry
(F841)
205-205: Local variable
list_missing_files_csv_building_properties
is assigned to but never usedRemove assignment to unused variable
list_missing_files_csv_building_properties
(F841)
cea/datamanagement/format_helper/cea4_migrate.py (1)
375-375
: Fix typo in docstring ofmain
functionThe word "Mirgate" is misspelled in the docstring at the top of the file.
Apply this diff to correct the typo:
- Mirgate the format of the input data to CEA-4 format after verification. + Migrate the format of the input data to CEA-4 format after verification.cea/datamanagement/format_helper/format_helper.py (1)
87-87
: Adjust divider length calculationIn the
main
function, thediv_len
variable might result in negative values if thescenario
name is long, causing formatting issues.Add a condition to ensure
div_len
is not negative.div_len = 47 - len(scenario) + if div_len < 0: + div_len = 0cea/scripts.yml (1)
368-368
: Fix formatting in parameters declaration.Add space after comma in
'general:scenario',format-helper
for consistency with YAML formatting standards.- parameters: ['general:scenario',format-helper] + parameters: ['general:scenario', format-helper]🧰 Tools
🪛 yamllint (1.35.1)
[warning] 368-368: too few spaces after comma
(commas)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
cea/datamanagement/format_helper/cea4_migrate.py
(1 hunks)cea/datamanagement/format_helper/cea4_verify.py
(1 hunks)cea/datamanagement/format_helper/format_helper.py
(1 hunks)cea/default.config
(2 hunks)cea/scripts.yml
(1 hunks)cea/utilities/batch_process_workflow.py
(3 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
cea/datamanagement/format_helper/format_helper.py
8-8: subprocess
imported but unused
Remove unused import: subprocess
(F401)
10-10: Redefinition of unused cea
from line 6
Remove definition: cea
(F811)
cea/datamanagement/format_helper/cea4_verify.py
204-204: Local variable list_missing_files_shp_building_geometry
is assigned to but never used
Remove assignment to unused variable list_missing_files_shp_building_geometry
(F841)
205-205: Local variable list_missing_files_csv_building_properties
is assigned to but never used
Remove assignment to unused variable list_missing_files_csv_building_properties
(F841)
228-228: Local variable scenario_name
is assigned to but never used
Remove assignment to unused variable scenario_name
(F841)
cea/datamanagement/format_helper/cea4_migrate.py
12-12: sys
imported but unused
Remove unused import: sys
(F401)
24-24: cea.datamanagement.format_helper.cea4_verify.CSV_BUILDING_PROPERTIES_4
imported but unused
Remove unused import: cea.datamanagement.format_helper.cea4_verify.CSV_BUILDING_PROPERTIES_4
(F401)
🪛 GitHub Actions: Ruff
cea/datamanagement/format_helper/format_helper.py
[error] 8-8: Unused import: subprocess
is imported but never used
[error] 10-10: Redefinition of unused cea
from line 6
cea/datamanagement/format_helper/cea4_migrate.py
[error] 12-12: Unused import: sys
is imported but never used
[error] 24-24: Unused import: CSV_BUILDING_PROPERTIES_4
is imported but never used
🪛 yamllint (1.35.1)
cea/scripts.yml
[warning] 368-368: too few spaces after comma
(commas)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: windows-latest
- GitHub Check: macos-latest
- GitHub Check: ubuntu-latest
🔇 Additional comments (8)
cea/datamanagement/format_helper/cea4_verify.py (1)
Line range hint
406-408
: Correct time formatting in print statementThe time elapsed is being formatted incorrectly in the print statement. The
%d.2
format specifier is not appropriate for floating-point numbers.Apply this diff to correct the format specifier:
- print('The entire process of CEA-4 format verification is now completed - time elapsed: %d.2 seconds' % time_elapsed) + print('The entire process of CEA-4 format verification is now completed - time elapsed: %.2f seconds' % time_elapsed)Ensure that the time is displayed correctly with two decimal places.
✅ Verification successful
Incorrect format specifier confirmed
The review comment is correct. The code uses
time.perf_counter()
which returns a floating-point number, but attempts to format it with%d.2
which incorrectly tries to format it as an integer. The suggested fix to use%.2f
is the correct solution.🏁 Scripts executed
The following scripts were executed for the analysis:
Error: No shell scripts found
Script:
#!/bin/bash # Check if file exists and show relevant lines if [ -f "cea/datamanagement/format_helper/cea4_verify.py" ]; then echo "File exists. Showing content:" # Show lines around 406-408 with context rg -A 2 -B 2 "time_elapsed.*seconds" "cea/datamanagement/format_helper/cea4_verify.py" # Look for time_elapsed variable usage echo -e "\nSearching for time_elapsed variable:" rg "time_elapsed" "cea/datamanagement/format_helper/cea4_verify.py" else echo "File not found" fiLength of output: 537
🧰 Tools
🪛 Ruff (0.8.2)
204-204: Local variable
list_missing_files_shp_building_geometry
is assigned to but never usedRemove assignment to unused variable
list_missing_files_shp_building_geometry
(F841)
205-205: Local variable
list_missing_files_csv_building_properties
is assigned to but never usedRemove assignment to unused variable
list_missing_files_csv_building_properties
(F841)
228-228: Local variable
scenario_name
is assigned to but never usedRemove assignment to unused variable
scenario_name
(F841)
cea/datamanagement/format_helper/cea4_migrate.py (1)
407-409
:⚠️ Potential issueCorrect time formatting in print statement
The time elapsed is being formatted incorrectly in the print statement. The
%d.2
format specifier is not appropriate for floating-point numbers.Apply this diff to correct the format specifier:
- print('The entire process of data migration from CEA-3 to CEA-4 is now completed - time elapsed: %d.2 seconds' % time_elapsed) + print('The entire process of data migration from CEA-3 to CEA-4 is now completed - time elapsed: %.2f seconds' % time_elapsed)Ensure that the time is displayed correctly with two decimal places.
✅ Verification successful
String formatting fix is correct
The suggested fix to use
%.2f
instead of%d.2
is the correct solution. The current format specifier is syntactically incorrect, while the proposed one will properly format the floating-point time value with two decimal places.🏁 Scripts executed
The following scripts were executed for the analysis:
Error: No shell scripts found
cea/utilities/batch_process_workflow.py (2)
159-159
: LGTM! Improved docstring clarity.The docstring now correctly specifies that the function processes "selected scenarios" rather than "all scenarios", which better reflects its actual behavior.
187-187
: LGTM! Improved error message formatting.Added period at the end of the error message for better consistency in message formatting.
cea/scripts.yml (1)
360-387
: LGTM! Well-structured CEA-4 format helper utilities.The new utilities are well-organized and provide a comprehensive solution for CEA-4 format handling:
cea4-format-helper
: Combined verification and migration capabilitiescea4-migrate
: Dedicated migration functionalitycea4-verify
: Standalone verification toolEach utility has clear descriptions and appropriate interface configurations.
🧰 Tools
🪛 yamllint (1.35.1)
[warning] 368-368: too few spaces after comma
(commas)
cea/default.config (3)
363-365
: LGTM! Updated emissions calculation year.Updated
year-to-calculate
to 2025 and improved the help text to better explain its purpose in emissions calculations.
613-620
: LGTM! Well-designed format helper configuration.The new format-helper section provides essential parameters for CEA-4 format handling:
scenarios-to-verify-and-migrate
: Flexible scenario selectionmigrate-from-cea-3
: Clear control over migration process with appropriate warning about irreversibility
625-625
: LGTM! Improved help text clarity.Updated help text for
scenarios-to-simulate
to clarify the exclusion of reference scenario when importing from Rhino/Grasshopper.
t0 = time.perf_counter() | ||
|
||
assert os.path.exists(config.general.project), 'input file not found: %s' % config.project | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix assertion error message
The assertion message uses config.project
, but it should reference config.general.project
.
Apply this diff to correct the variable reference:
- assert os.path.exists(config.general.project), 'input file not found: %s' % config.project
+ assert os.path.exists(config.general.project), 'input file not found: %s' % config.general.project
Committable suggestion skipped: line range outside the PR's diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (6)
cea/datamanagement/format_helper/format_helper.py (3)
2-2
: Fix typo in docstringThe word "formate" in the docstring should be "format".
-Main script of the formate helper that activates the verification and migration as needed. +Main script of the format helper that activates the verification and migration as needed.
31-32
: Consider usingos.pathsep
for cross-platform compatibilityUsing
:
as a path separator is Unix-specific. For better cross-platform compatibility, useos.pathsep
.-my_env['PATH'] = f"{os.path.dirname(sys.executable)}:{my_env['PATH']}" +my_env['PATH'] = f"{os.path.dirname(sys.executable)}{os.pathsep}{my_env['PATH']}"
39-41
: Remove commented-out subprocess callsThe code contains commented-out subprocess calls that should be removed if they're no longer needed.
if not bool_migrate: - # subprocess.run(['cea', 'cea4_migrate', '--scenario', cea_scenario], env=my_env, check=True,capture_output=True) dict_missing = cea4_verify(cea_scenario, print_results=True) print_verification_results_4(scenario_name, dict_missing) else: - # subprocess.run(['cea', 'cea4_verify', '--scenario', cea_scenario], env=my_env, check=True, capture_output=True) migrate_cea3_to_cea4(cea_scenario)Also applies to: 44-46
cea/datamanagement/format_helper/cea4_verify.py (1)
47-48
: Document the limitation of hardcoded pathsThe comment about hardcoded paths should be expanded to explain the limitation and future plans.
Add a TODO comment to track the technical debt:
# The paths are relatively hardcoded for now without using the inputlocator script. # This is because we want to iterate over all scenarios, which is currently not possible with the inputlocator script. +# TODO: Refactor to use inputlocator once it supports scenario iteration
cea/datamanagement/format_helper/cea4_migrate.py (2)
2-2
: Fix typo in docstringThe word "Mirgate" in the docstring should be "Migrate".
-Mirgate the format of the input data to CEA-4 format after verification. +Migrate the format of the input data to CEA-4 format after verification.
279-279
: Fix typo in commentThe comment contains typos: "folde" and "mirgrated".
- #2. about the .dbf files in the building-properties folde to be mirgrated to .csv files + #2. about the .dbf files in the building-properties folder to be migrated to .csv files
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
cea/datamanagement/format_helper/cea4_migrate.py
(1 hunks)cea/datamanagement/format_helper/cea4_verify.py
(1 hunks)cea/datamanagement/format_helper/format_helper.py
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
cea/datamanagement/format_helper/cea4_verify.py
203-203: Local variable list_missing_files_shp_building_geometry
is assigned to but never used
Remove assignment to unused variable list_missing_files_shp_building_geometry
(F841)
204-204: Local variable list_missing_files_csv_building_properties
is assigned to but never used
Remove assignment to unused variable list_missing_files_csv_building_properties
(F841)
227-227: Local variable scenario_name
is assigned to but never used
Remove assignment to unused variable scenario_name
(F841)
cea/datamanagement/format_helper/format_helper.py
9-9: Redefinition of unused cea
from line 6
Remove definition: cea
(F811)
🪛 GitHub Actions: Ruff
cea/datamanagement/format_helper/format_helper.py
[error] 9-9: Redefinition of unused cea
from line 6. Remove definition: cea
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: windows-latest
- GitHub Check: macos-latest
- GitHub Check: ubuntu-latest
🔇 Additional comments (7)
cea/datamanagement/format_helper/format_helper.py (3)
6-10
:⚠️ Potential issueRemove duplicate import
The module
cea.config
is imported twice.import cea.config import os import sys -import cea.config import time
Likely invalid or redundant comment.
🧰 Tools
🪛 Ruff (0.8.2)
9-9: Redefinition of unused
cea
from line 6Remove definition:
cea
(F811)
🪛 GitHub Actions: Ruff
[error] 9-9: Redefinition of unused
cea
from line 6. Remove definition:cea
94-94
:⚠️ Potential issueFix time formatting pattern
The time formatting pattern
%d.2
is incorrect for floating-point numbers. Use%.2f
instead.- print('The entire batch processing of data format verification (and migration) for CEA-4 is now completed - time elapsed: %d.2 seconds' % time_elapsed) + print('The entire batch processing of data format verification (and migration) for CEA-4 is now completed - time elapsed: %.2f seconds' % time_elapsed)Likely invalid or redundant comment.
68-68
:⚠️ Potential issueFix incorrect variable reference in assertion message
The assertion message references
config.project
, but it should beconfig.general.project
.- assert os.path.exists(config.general.project), 'input file not found: %s' % config.project + assert os.path.exists(config.general.project), 'input file not found: %s' % config.general.projectLikely invalid or redundant comment.
cea/datamanagement/format_helper/cea4_verify.py (3)
203-204
:⚠️ Potential issueRemove unused variables
The variables
list_missing_files_shp_building_geometry
andlist_missing_files_csv_building_properties
are assigned but never used.- list_missing_files_shp_building_geometry = dict_missing.get('building-geometry') - list_missing_files_csv_building_properties = dict_missing.get('building-properties')Likely invalid or redundant comment.
🧰 Tools
🪛 Ruff (0.8.2)
203-203: Local variable
list_missing_files_shp_building_geometry
is assigned to but never usedRemove assignment to unused variable
list_missing_files_shp_building_geometry
(F841)
204-204: Local variable
list_missing_files_csv_building_properties
is assigned to but never usedRemove assignment to unused variable
list_missing_files_csv_building_properties
(F841)
195-197
:⚠️ Potential issueAdd error handling for the 'name' column access
The function assumes the 'name' column exists without checking, which could raise a KeyError.
- list_names_duplicated = df['name'][df['name'].duplicated()].tolist() + if 'name' in df.columns: + name_column = 'name' + elif 'Name' in df.columns: + name_column = 'Name' + else: + raise ValueError("The file does not contain a 'name' column.") + list_names_duplicated = df[name_column][df[name_column].duplicated()].tolist()Likely invalid or redundant comment.
227-227
:⚠️ Potential issueRemove unused variable
scenario_name
The variable
scenario_name
is assigned but never used.- scenario_name = os.path.basename(scenario)
Likely invalid or redundant comment.
🧰 Tools
🪛 Ruff (0.8.2)
227-227: Local variable
scenario_name
is assigned to but never usedRemove assignment to unused variable
scenario_name
(F841)
cea/datamanagement/format_helper/cea4_migrate.py (1)
154-154
:⚠️ Potential issueFix incorrect error message
The error message mentions CSV file when it's actually a DBF file.
- raise ValueError(f"Error reading CSV file: {e}") + raise ValueError(f"Error reading DBF file: {e}")Likely invalid or redundant comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (5)
cea/datamanagement/format_helper/format_helper.py (2)
2-2
: Fix typo in docstringThe word "formate" in the docstring is misspelled.
-Main script of the formate helper that activates the verification and migration as needed. +Main script of the format helper that activates the verification and migration as needed.
40-41
: Remove commented-out codeThe commented-out subprocess commands should be removed as they are no longer needed and can cause confusion.
- # subprocess.run(['cea', 'cea4_migrate', '--scenario', cea_scenario], env=my_env, check=True,capture_output=True) dict_missing = cea4_verify(cea_scenario, print_results=True) print_verification_results_4(scenario_name, dict_missing) else: - # subprocess.run(['cea', 'cea4_verify', '--scenario', cea_scenario], env=my_env, check=True, capture_output=True) migrate_cea3_to_cea4(cea_scenario)Also applies to: 45-46
cea/datamanagement/format_helper/cea4_migrate.py (2)
2-2
: Fix typo in docstringThe word "Mirgate" in the docstring is misspelled.
-Mirgate the format of the input data to CEA-4 format after verification. +Migrate the format of the input data to CEA-4 format after verification.
62-64
: Consider using inputlocator scriptThe comment indicates that paths are hardcoded due to iteration requirements. Consider implementing a solution that allows iteration while using the inputlocator script for better maintainability.
Consider creating an iterator class that works with the inputlocator script to handle multiple scenarios while maintaining the benefits of centralized path management.
cea/datamanagement/format_helper/cea4_verify.py (1)
334-336
: TODO comment about database verificationThe comment about verifying the database under "inputs/technology/" folder is not implemented.
Would you like me to help implement the database verification logic or create a GitHub issue to track this task?
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
cea/datamanagement/format_helper/cea4_migrate.py
(1 hunks)cea/datamanagement/format_helper/cea4_verify.py
(1 hunks)cea/datamanagement/format_helper/format_helper.py
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
cea/datamanagement/format_helper/format_helper.py
9-9: Redefinition of unused cea
from line 6
Remove definition: cea
(F811)
🪛 GitHub Actions: Ruff
cea/datamanagement/format_helper/format_helper.py
[error] 9-9: Redefinition of unused cea
from line 6. Remove definition: cea
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: windows-latest
- GitHub Check: macos-latest
- GitHub Check: ubuntu-latest
🔇 Additional comments (3)
cea/datamanagement/format_helper/format_helper.py (2)
6-12
:⚠️ Potential issueClean up imports
There are duplicate imports and unused imports that should be removed.
import cea.config import os import sys -import cea.config import time from cea.datamanagement.format_helper.cea4_verify import cea4_verify, print_verification_results_4 from cea.datamanagement.format_helper.cea4_migrate import migrate_cea3_to_cea4
Likely invalid or redundant comment.
🧰 Tools
🪛 Ruff (0.8.2)
9-9: Redefinition of unused
cea
from line 6Remove definition:
cea
(F811)
🪛 GitHub Actions: Ruff
[error] 9-9: Redefinition of unused
cea
from line 6. Remove definition:cea
68-69
:⚠️ Potential issueFix incorrect config reference in assertion message
The error message references
config.project
but should useconfig.general.project
.- assert os.path.exists(config.general.project), 'input file not found: %s' % config.project + assert os.path.exists(config.general.project), 'input file not found: %s' % config.general.projectLikely invalid or redundant comment.
cea/datamanagement/format_helper/cea4_verify.py (1)
168-199
:⚠️ Potential issueFix 'name' column assumption in
verify_name_duplicates_4
The function assumes the existence of a 'name' column without checking, which could lead to KeyError.
- # Find duplicate names - list_names_duplicated = df['name'][df['name'].duplicated()].tolist() + # Find duplicate names (case-insensitive check for both 'name' and 'Name') + name_col = None + for col in df.columns: + if col.lower() == 'name': + name_col = col + break + + if name_col is None: + raise ValueError("The file does not contain a 'name' column") + + list_names_duplicated = df[name_col][df[name_col].duplicated()].tolist()Likely invalid or redundant comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
cea/datamanagement/format_helper/cea4_verify.py (2)
196-196
:⚠️ Potential issueFix the 'name' column assumption
The function directly accesses the 'name' column without verifying its existence, which could raise a KeyError.
Apply this fix:
- list_names_duplicated = df['name'][df['name'].duplicated()].tolist() + if 'name' in df.columns: + name_column = 'name' + elif 'Name' in df.columns: + name_column = 'Name' + else: + raise ValueError("The file does not contain a 'name' column.") + list_names_duplicated = df[name_column][df[name_column].duplicated()].tolist()
374-374
:⚠️ Potential issueFix time formatting pattern
The time formatting pattern
%.2f
should be used for floating-point numbers.- print('The entire process of CEA-4 format verification is now completed - time elapsed: %.2f seconds' % time_elapsed) + print('The entire process of CEA-4 format verification is now completed - time elapsed: %.2f seconds' % time_elapsed)
🧹 Nitpick comments (9)
cea/datamanagement/format_helper/cea4_verify.py (2)
14-14
: Update the copyright yearThe copyright year is set to 2025, which is beyond the current date. Update it to reflect the current year.
-__copyright__ = "Copyright 2025, Architecture and Building Systems - ETH Zurich" +__copyright__ = "Copyright 2024, Architecture and Building Systems - ETH Zurich"
1-377
: Overall solid implementation with room for improvementThe verification system is well-structured with comprehensive checks. Consider:
- Adding logging instead of print statements for better error tracking
- Adding type hints to improve code maintainability
- Adding more detailed error messages to help users fix issues
cea/datamanagement/format_helper/cea4_migrate.py (7)
2-2
: Fix typo in docstring"Mirgate" should be "Migrate".
-Mirgate the format of the input data to CEA-4 format after verification. +Migrate the format of the input data to CEA-4 format after verification.
16-16
: Update copyright yearThe copyright year 2025 is set in the future. Consider updating it to the current year.
-__copyright__ = "Copyright 2025, Architecture and Building Systems - ETH Zurich" +__copyright__ = "Copyright 2024, Architecture and Building Systems - ETH Zurich"
62-92
: Consider using a configuration file for pathsThe paths are currently hardcoded. Consider moving them to a configuration file for better maintainability and flexibility.
170-171
: Update incorrect parameter descriptions in docstringThe docstring parameters don't match the actual function parameters.
- :param shapefile_path: Path to the shapefile (without file extension). - :param new_dataframe: pandas DataFrame with the new data to replace the DBF file. + :param scenario: Path or identifier for the scenario. + :param item: Identifier for the shapefile. + :param new_dataframe: pandas DataFrame with the new data to replace the DBF file. + :param list_attributes_3: List of attributes in CEA-3 format.
237-242
: Remove commented-out codeClean up the code by removing commented-out print statements or convert them to proper debug logging.
Consider using Python's logging module for debug messages instead of commented-out print statements.
341-341
: Improve assertion error messageThe error message could be more descriptive by including the actual path.
- assert os.path.exists(config.general.project), 'input file not found: %s' % config.project + assert os.path.exists(config.general.project), f'Project directory not found at: {config.general.project}'
1-369
: Overall: Well-structured migration script with minor improvements neededThe code is well-organized with clear separation of concerns, comprehensive migration logic, and good error handling. Consider the following improvements:
- Add error handling for file operations
- Clean up commented-out code
- Move hardcoded paths to configuration
- Fix documentation and error messages
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
cea/datamanagement/format_helper/cea4_migrate.py
(1 hunks)cea/datamanagement/format_helper/cea4_verify.py
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: windows-latest
- GitHub Check: macos-latest
- GitHub Check: ubuntu-latest
🔇 Additional comments (4)
cea/datamanagement/format_helper/cea4_verify.py (1)
49-74
: LGTM! Well-structured path constructionThe function is well-implemented with:
- Clear error handling for unknown items
- Comprehensive coverage of all file types
- Documented reasoning for hardcoded paths
cea/datamanagement/format_helper/cea4_migrate.py (3)
28-56
: LGTM! Well-organized constants and mappingsThe constants and column mappings are well-defined, following consistent naming conventions.
151-154
:⚠️ Potential issueFix incorrect error message
The error message refers to a CSV file when reading a DBF file.
try: df = dbf_to_dataframe(dbf_path) except Exception as e: - raise ValueError(f"Error reading CSV file: {e}") + raise ValueError(f"Error reading DBF file: {e}")Likely invalid or redundant comment.
322-324
: 🛠️ Refactor suggestionAdd error handling for file operations
The file removal operation lacks proper error handling.
if os.path.exists(typology_path): - os.remove(typology_path) - print('typology.dbf has been removed as it is no longer needed by CEA-4.') + try: + os.remove(typology_path) + print('typology.dbf has been removed as it is no longer needed by CEA-4.') + except OSError as e: + print(f'Error removing typology.dbf: {e}')Likely invalid or redundant comment.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (7)
cea/datamanagement/format_helper/cea4_verify.py (5)
2-2
: Fix typo in docstringThere's a typo in the docstring: "Verify" is misspelled.
-Verify the format of the input data for CEA-4 model. +Verify the format of the input data for the CEA-4 model.
14-14
: Update copyright yearThe copyright year is set to 2025, which is beyond the current date (January 2025).
-__copyright__ = "Copyright 2025, Architecture and Building Systems - ETH Zurich" +__copyright__ = "Copyright 2024-2025, Architecture and Building Systems - ETH Zurich"
47-49
: Move comment to function docstring and add type hintsThe comment about hardcoding should be part of the function's docstring, and the function would benefit from type hints.
def path_to_input_file_without_db_4(scenario, item): + """ + Construct the path to an input file without using the inputlocator script. + + Note: The paths are relatively hardcoded for now because we want to iterate over + all scenarios, which is currently not possible with the inputlocator script. + + Args: + scenario (str): The scenario path + item (str): The item type (e.g., 'zone', 'surroundings', etc.) + + Returns: + str: The constructed file path + + Raises: + ValueError: If the item type is unknown + """
Line range hint
407-407
: Remove commented-out codeRemove the commented-out print statement as it's no longer needed.
- # print("-" * 1 + ' Scenario: {scenario} - end '.format(scenario=scenario_name) + "-" * 50)
Line range hint
384-388
: Add error handling for configurationThe configuration validation could be more robust.
def main(config): # Start the timer t0 = time.perf_counter() - assert os.path.exists(config.general.project), 'input file not found: %s' % config.project + if not hasattr(config, 'general') or not hasattr(config.general, 'project'): + raise ValueError("Invalid configuration: 'general.project' not found") + if not os.path.exists(config.general.project): + raise FileNotFoundError(f'Project file not found: {config.project}')cea/datamanagement/format_helper/cea4_migrate.py (2)
2-2
: Fix typo in docstringThere's a typo in the docstring: "Mirgate" should be "Migrate".
-Mirgate the format of the input data to CEA-4 format after verification. +Migrate the format of the input data to CEA-4 format after verification.
384-415
: Improve progress reporting and clean up commented codeThe main function could benefit from better progress reporting and cleanup.
- Add a progress reporting mechanism:
def report_progress(stage, total_stages): """Report progress of the migration process.""" print(f"[{stage}/{total_stages}] {stage_descriptions[stage]}") stage_descriptions = { }
- Remove commented-out code and use the progress reporting:
- # Print: End - # print("-" * 1 + ' Scenario: {scenario} - end '.format(scenario=scenario_name) + "-" * 50) print('+' * 60)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
cea/datamanagement/format_helper/cea4_migrate.py
(1 hunks)cea/datamanagement/format_helper/cea4_verify.py
(1 hunks)
🔇 Additional comments (1)
cea/datamanagement/format_helper/cea4_verify.py (1)
195-197
:⚠️ Potential issueFix name column case sensitivity
The function assumes the 'name' column exists and is lowercase, but it should handle both 'name' and 'Name' cases.
- list_names_duplicated = df['name'][df['name'].duplicated()].tolist() + name_col = 'name' if 'name' in df.columns else 'Name' + if name_col not in df.columns: + raise ValueError("Neither 'name' nor 'Name' column found in the file") + list_names_duplicated = df[name_col][df[name_col].duplicated()].tolist()Likely invalid or redundant comment.
if 'name' not in list_missing_columns: | ||
list_names_duplicated = verify_name_duplicates_4(scenario, item) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix incorrect logic in name column check
The condition is checking the wrong list for the 'name' column presence.
- if 'name' not in list_missing_columns:
+ if 'name' in required_columns: # Only check for duplicates if name is a required column
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if 'name' not in list_missing_columns: | |
list_names_duplicated = verify_name_duplicates_4(scenario, item) | |
if 'name' in required_columns: # Only check for duplicates if name is a required column | |
list_names_duplicated = verify_name_duplicates_4(scenario, item) |
if not os.path.isfile(dbf_path): | ||
raise FileNotFoundError(f"CSV file not found: {dbf_path}") | ||
|
||
# Load the CSV file | ||
try: | ||
df = dbf_to_dataframe(dbf_path) | ||
except Exception as e: | ||
raise ValueError(f"Error reading CSV file: {e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix incorrect error messages in verify_dbf_3
The error messages refer to CSV files when handling DBF files.
if not os.path.isfile(dbf_path):
- raise FileNotFoundError(f"CSV file not found: {dbf_path}")
+ raise FileNotFoundError(f"DBF file not found: {dbf_path}")
try:
df = dbf_to_dataframe(dbf_path)
except Exception as e:
- raise ValueError(f"Error reading CSV file: {e}")
+ raise ValueError(f"Error reading DBF file: {e}")
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if not os.path.isfile(dbf_path): | |
raise FileNotFoundError(f"CSV file not found: {dbf_path}") | |
# Load the CSV file | |
try: | |
df = dbf_to_dataframe(dbf_path) | |
except Exception as e: | |
raise ValueError(f"Error reading CSV file: {e}") | |
if not os.path.isfile(dbf_path): | |
raise FileNotFoundError(f"DBF file not found: {dbf_path}") | |
# Load the CSV file | |
try: | |
df = dbf_to_dataframe(dbf_path) | |
except Exception as e: | |
raise ValueError(f"Error reading DBF file: {e}") |
zone_df_4 = pd.merge(zone_df_3, typology_df, left_on=['name'], right_on=["Name"], how='left') | ||
zone_df_4.drop(columns=['Name'], inplace=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for DataFrame merge operation
The merge operation could fail silently if the key columns don't match.
- zone_df_4 = pd.merge(zone_df_3, typology_df, left_on=['name'], right_on=["Name"], how='left')
- zone_df_4.drop(columns=['Name'], inplace=True)
+ try:
+ zone_df_4 = pd.merge(zone_df_3, typology_df, left_on=['name'], right_on=["Name"], how='left')
+ if zone_df_4['name'].isna().any():
+ raise ValueError("Some buildings in zone.shp don't have matching records in typology.dbf")
+ zone_df_4.drop(columns=['Name'], inplace=True)
+ except Exception as e:
+ raise ValueError(f"Error merging zone and typology data: {e}")
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
zone_df_4 = pd.merge(zone_df_3, typology_df, left_on=['name'], right_on=["Name"], how='left') | |
zone_df_4.drop(columns=['Name'], inplace=True) | |
try: | |
zone_df_4 = pd.merge(zone_df_3, typology_df, left_on=['name'], right_on=["Name"], how='left') | |
if zone_df_4['name'].isna().any(): | |
raise ValueError("Some buildings in zone.shp don't have matching records in typology.dbf") | |
zone_df_4.drop(columns=['Name'], inplace=True) | |
except Exception as e: | |
raise ValueError(f"Error merging zone and typology data: {e}") |
df.to_csv(path_to_input_file_without_db_4(scenario, item), index=False) | ||
os.remove(path_to_input_file_without_db_3(scenario, item)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add error handling for file operations
The file operations lack proper error handling.
- df.to_csv(path_to_input_file_without_db_4(scenario, item), index=False)
- os.remove(path_to_input_file_without_db_3(scenario, item))
+ output_path = path_to_input_file_without_db_4(scenario, item)
+ input_path = path_to_input_file_without_db_3(scenario, item)
+ try:
+ df.to_csv(output_path, index=False)
+ if os.path.exists(input_path):
+ os.remove(input_path)
+ except (IOError, OSError) as e:
+ raise RuntimeError(f"Error during file operations: {e}")
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
df.to_csv(path_to_input_file_without_db_4(scenario, item), index=False) | |
os.remove(path_to_input_file_without_db_3(scenario, item)) | |
output_path = path_to_input_file_without_db_4(scenario, item) | |
input_path = path_to_input_file_without_db_3(scenario, item) | |
try: | |
df.to_csv(output_path, index=False) | |
if os.path.exists(input_path): | |
os.remove(input_path) | |
except (IOError, OSError) as e: | |
raise RuntimeError(f"Error during file operations: {e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
cea/datamanagement/format_helper/format_helper.py (3)
1-4
: Fix typo in module docstringThere's a typo in the word "formate" which should be "format".
-Main script of the formate helper that activates the verification and migration as needed. +Main script of the format helper that activates the verification and migration as needed.
29-29
: Remove commented-out codeRemove the commented-out subprocess calls if they are no longer needed. If they are meant for future use, add a TODO comment explaining why they are kept.
- # subprocess.run(['cea', 'cea4_migrate', '--scenario', cea_scenario], env=my_env, check=True,capture_output=True) - # subprocess.run(['cea', 'cea4_verify', '--scenario', cea_scenario], env=my_env, check=True, capture_output=True)Also applies to: 34-34
44-51
: Add type hints to improve code maintainabilityConsider adding type hints to improve code maintainability and IDE support.
-def main(config): +def main(config: cea.config.Configuration) -> None:
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
cea/datamanagement/format_helper/format_helper.py
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
cea/datamanagement/format_helper/format_helper.py
7-7: sys
imported but unused
Remove unused import: sys
(F401)
🪛 GitHub Actions: Ruff
cea/datamanagement/format_helper/format_helper.py
[error] 7-7: Unused import: sys
is imported but never used in the code
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: windows-latest
- GitHub Check: macos-latest
- GitHub Check: ubuntu-latest
🔇 Additional comments (3)
cea/datamanagement/format_helper/format_helper.py (3)
86-87
: LGTM!The entry point implementation follows Python best practices.
14-14
:⚠️ Potential issueUpdate copyright year
The copyright year is set to 2025, which is a future date. Please update it to the current year.
-__copyright__ = "Copyright 2025, Architecture and Building Systems - ETH Zurich" +__copyright__ = "Copyright 2024, Architecture and Building Systems - ETH Zurich"Likely invalid or redundant comment.
23-38
: 🛠️ Refactor suggestionImprove error handling and logic flow
- The function lacks error handling for potential exceptions
- The logic flow seems inverted - migration should happen before verification
Consider this improved implementation:
def exec_cea_format_helper(config, cea_scenario): # auto-migrate from CEA-3 to CEA-4 bool_migrate = config.format_helper.migrate_from_cea_3 scenario_name = os.path.basename(cea_scenario) try: if bool_migrate: migrate_cea3_to_cea4(cea_scenario) print(f"Successfully migrated {scenario_name} from CEA-3 to CEA-4") dict_missing = cea4_verify(cea_scenario, print_results=not bool_migrate) print_verification_results_4(scenario_name, dict_missing) except Exception as e: print(f"Error processing scenario {scenario_name}: {str(e)}") raiseLikely invalid or redundant comment.
This PR helps to verify the input data format's compatibility for CEA-4.
It also helps to migrate input data from CEA-3 to CEA-4.
This is Phase One of the CEA4 Format Helper.
In Phase Two, we will include the migrator for the new DB format after refactoring the DB.
Summary by CodeRabbit
Release Notes
New Features
Configuration Updates
Improvements
Utilities
cea4-format-helper
,cea4-migrate
, andcea4-verify
scripts for improved data management.