-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Structure prediction fixes #336
Conversation
WalkthroughThis pull request introduces a new tutorial entry for Changes
Possibly related PRs
Suggested labels
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #336 +/- ##
==========================================
+ Coverage 76.13% 77.45% +1.32%
==========================================
Files 31 31
Lines 2564 2590 +26
==========================================
+ Hits 1952 2006 +54
+ Misses 612 584 -28 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Outside diff range and nitpick comments (5)
smact/structure_prediction/database.py (1)
Line range hint
283-286
: Utilise the 'determine_oxi' parameter dynamicallyIn the
parse_mprest
function, thedetermine_oxi
parameter is set to"BV"
in the call toSmactStructure.from_py_struct
, regardless of the function argument. To allow flexibility and adhere to the function's signature, pass thedetermine_oxi
parameter tofrom_py_struct
.Apply this diff to make the function parameter effective:
try: - return SmactStructure.from_py_struct(data["structure"], determine_oxi="BV") + return SmactStructure.from_py_struct(data["structure"], determine_oxi=determine_oxi) except: # Couldn't decorate with oxidation statessmact/structure_prediction/structure.py (3)
353-353
: Combine exception message into a single stringThe exception message at line 353 concatenates two strings. Combining them into a single string improves readability.
Apply this diff:
- raise ValueError("Could not find composition in Materials Project Database, " "please supply a structure.") + raise ValueError("Could not find composition in Materials Project Database, please supply a structure.")
642-652
: Optimise as_py_struct method by constructing Structure directlyThe
as_py_struct
method converts theSmactStructure
to a POSCAR string and then parses it back into a pymatgen Structure object. This may introduce unnecessary overhead and potential parsing errors. Consider constructing thepymatgen.Structure
object directly from the internal data (e.g.,lattice_mat
andsites
) to improve efficiency and reliability.
653-668
: Improve reduced_formula method by avoiding redundant conversionThe
reduced_formula
method usesas_py_struct
, which involves an intermediate conversion to POSCAR format and back. This may be inefficient. Consider computing the reduced formula directly from the species data withinSmactStructure
, avoiding the need to create a pymatgen Structure object.smact/tests/files/CaTiO3.txt (1)
9-18
: Validate atomic positionsThe Cartesian coordinates appear to maintain proper Ca-O and Ti-O bond distances, but consider:
- Adding a comment about the space group
- Including validation against standard perovskite bond lengths
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (4)
examples/Structure_Prediction/Garnets.db
is excluded by!**/*.db
examples/Structure_Prediction/Li-Garnet_Comps_sus.csv
is excluded by!**/*.csv
examples/Structure_Prediction/MP_garnets.csv
is excluded by!**/*.csv
examples/Structure_Prediction/_Materials Project .csv
is excluded by!**/*.csv
📒 Files selected for processing (16)
docs/tutorials.rst
(1 hunks)examples/Structure_Prediction/Garnet_Database.ipynb
(0 hunks)examples/Structure_Prediction/Li-Garnet_Generator.ipynb
(0 hunks)smact/structure_prediction/database.py
(2 hunks)smact/structure_prediction/mutation.py
(2 hunks)smact/structure_prediction/structure.py
(8 hunks)smact/structure_prediction/utilities.py
(0 hunks)smact/tests/files/BaTiO3.txt
(1 hunks)smact/tests/files/CaTiO3.json
(1 hunks)smact/tests/files/CaTiO3.txt
(1 hunks)smact/tests/files/Fe.json
(1 hunks)smact/tests/files/Fe.txt
(1 hunks)smact/tests/files/NaCl.json
(1 hunks)smact/tests/files/NaCl.txt
(1 hunks)smact/tests/files/pymatgen_structure.json
(1 hunks)smact/tests/test_structure.py
(2 hunks)
💤 Files with no reviewable changes (3)
- smact/structure_prediction/utilities.py
- examples/Structure_Prediction/Li-Garnet_Generator.ipynb
- examples/Structure_Prediction/Garnet_Database.ipynb
✅ Files skipped from review due to trivial changes (1)
- smact/tests/files/Fe.json
🧰 Additional context used
🪛 LanguageTool
smact/tests/files/Fe.txt
[duplication] ~8-~8: Posíbel erro tipográfico: hai unha palabra repetida.
Context: ....24450189 2.15554064 0.0 Fe 1 Cartesian 0.0 0.0 0.0 Fe
(WORD_REPETITION)
smact/tests/files/NaCl.txt
[uncategorized] ~1-~1: Wenn Sie das Wort zusammenschreiben möchten, entfernen Sie das Leerzeichen. Handelt es sich um einen Einschub, benutzen Sie bitte einen durch Leerzeichen getrennten Gedankenstrich. Handelt es sich z. B. um eine Silbe, benutzen Sie bitte Anführungszeichen.
Context: Cl1- Na1+ 1.0 3.50219 0.0 0.0 0.0 3.50219 0.0 0....
(WORT1_BINDESTRICH_SPACE_WORT2)
[uncategorized] ~1-~1: Das sieht falsch platziert oder unnötig aus. Überlegen Sie, es zu löschen.
Context: Cl1- Na1+ 1.0 3.50219 0.0 0.0 0.0 3.50219 0.0 0.0 ...
(AI_DE_GGEC_UNNECESSARY_SPACE)
[uncategorized] ~2-~2: Das sieht falsch platziert oder unnötig aus. Überlegen Sie, es zu löschen.
Context: Cl1- Na1+ 1.0 3.50219 0.0 0.0 0.0 3.50219 0.0 0.0 0.0 ...
(AI_DE_GGEC_UNNECESSARY_SPACE)
[uncategorized] ~3-~3: Das sieht falsch platziert oder unnötig aus. Überlegen Sie, es zu löschen.
Context: Cl1- Na1+ 1.0 3.50219 0.0 0.0 0.0 3.50219 0.0 0.0 0.0 3.50219 Cl Na 1 ...
(AI_DE_GGEC_UNNECESSARY_SPACE)
[uncategorized] ~4-~4: Das sieht falsch platziert oder unnötig aus. Überlegen Sie, es zu löschen.
Context: ...Na1+ 1.0 3.50219 0.0 0.0 0.0 3.50219 0.0 0.0 0.0 3.50219 Cl Na 1 1 Cartesian 1.75...
(AI_DE_GGEC_UNNECESSARY_SPACE)
[uncategorized] ~5-~5: Das sieht falsch platziert oder unnötig aus. Überlegen Sie, es zu löschen.
Context: ... 0.0 0.0 0.0 3.50219 0.0 0.0 0.0 3.50219 Cl Na 1 1 Cartesian 1.751095 1.751095 1....
(AI_DE_GGEC_UNNECESSARY_SPACE)
[uncategorized] ~6-~6: Das sieht falsch platziert oder unnötig aus. Überlegen Sie, es zu löschen.
Context: ....0 0.0 3.50219 0.0 0.0 0.0 3.50219 Cl Na 1 1 Cartesian 1.751095 1.751095 1.751095...
(AI_DE_GGEC_UNNECESSARY_SPACE)
[uncategorized] ~7-~7: Das sieht falsch platziert oder unnötig aus. Überlegen Sie, es zu löschen.
Context: ....0 3.50219 0.0 0.0 0.0 3.50219 Cl Na 1 1 Cartesian 1.751095 1.751095 1.751095 Cl1...
(AI_DE_GGEC_UNNECESSARY_SPACE)
[uncategorized] ~8-~8: Das sieht falsch platziert oder unnötig aus. Überlegen Sie, es zu löschen.
Context: ... 0.0 0.0 0.0 3.50219 Cl Na 1 1 Cartesian 1.751095 1.751095 1.751095 Cl1- 0.0 0.0 ...
(AI_DE_GGEC_UNNECESSARY_SPACE)
[uncategorized] ~9-~9: Wenn Sie das Wort zusammenschreiben möchten, entfernen Sie das Leerzeichen. Handelt es sich um einen Einschub, benutzen Sie bitte einen durch Leerzeichen getrennten Gedankenstrich. Handelt es sich z. B. um eine Silbe, benutzen Sie bitte Anführungszeichen.
Context: ... 1 Cartesian 1.751095 1.751095 1.751095 Cl1- 0.0 0.0 0.0 Na1+
(WORT1_BINDESTRICH_SPACE_WORT2)
🔇 Additional comments (18)
smact/tests/files/Fe.txt (1)
1-9
: Data format appears correct
The content and structure of the Fe.txt
file are appropriate for representing the Fe crystal structure. The coordinates and formatting are consistent and should be compatible with the codebase.
🧰 Tools
🪛 LanguageTool
[duplication] ~8-~8: Posíbel erro tipográfico: hai unha palabra repetida.
Context: ....24450189 2.15554064 0.0 Fe 1 Cartesian 0.0 0.0 0.0 Fe
(WORD_REPETITION)
docs/tutorials.rst (1)
14-14
: New tutorial entry added correctly
The addition of tutorials/structure_prediction
to the table of contents is correctly formatted and appropriately placed within the tutorials list.
smact/tests/files/BaTiO3.txt (1)
1-18
: File format and data appear correct.
The structural data for BaTiO₃ is correctly formatted and consistent with expected crystallographic standards.
smact/tests/files/NaCl.json (2)
23-34
: Removal of oxidation states from species definitions.
The oxidation states have been omitted from the species definitions (e.g., "Cl-" to "Cl", "Na+" to "Na"). Ensure that this modification does not affect functionalities reliant on oxidation state information, such as charge balance calculations.
7-9
: Verify lattice parameters for NaCl structure.
The lattice constants have been updated to 3.50219 Å, which is significantly smaller than the standard lattice parameter for NaCl (approximately 5.64 Å). Please confirm that this change is intentional, as it may impact simulations and analyses using this structure.
✅ Verification successful
Verify lattice parameter value for NaCl test structure
The lattice parameter of 3.50219 Å appears to be a test structure value used specifically in the test suite, as evidenced by its location in the smact/tests/
directory. While this value differs from the typical experimental lattice parameter of NaCl (5.64 Å), its presence in a test file suggests it's intentionally set for testing purposes rather than representing a physical structure.
- The file is located in
smact/tests/files/NaCl.json
and is used for testing structure manipulation functions - The corresponding test file
smact/tests/test_structure.py
shows it's used for testing mutation operations - The value appears consistently throughout the test files and is likely intentional for the test suite
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Identify NaCl structure files and compare lattice parameters.
# Find NaCl JSON files
fd -e json 'NaCl' smact/ | xargs rg '"a":'
# Expected: Lattice parameter 'a' close to 5.64 Å
Length of output: 60
Script:
#!/bin/bash
# Let's check if there are any comments, documentation, or test files that might explain this lattice parameter choice
rg -i "lattice.*parameter|3.50219|crystal.*struct" smact/ -A 2 -B 2
# Also check for any README or documentation files
fd "README|\.md$" . | xargs rg -i "nacl|lattice.*parameter|crystal.*struct" -A 2 -B 2
Length of output: 11427
Script:
#!/bin/bash
# Let's check the rocksalt function implementation since NaCl has rocksalt structure
ast-grep --pattern 'def rocksalt(shannon_radius):
$$$
}'
# Also check if there are any test files specifically for NaCl structure
rg -i "test.*nacl|nacl.*test" smact/tests/ -A 2 -B 2
Length of output: 494
smact/tests/files/pymatgen_structure.json (2)
23-90
: Extensive changes to species and atomic coordinates.
The species have been changed to Ca, Ti, and O, and atomic coordinates have been extensively modified. Confirm that this transformation is deliberate and that all dependent tests and functionalities are updated to reflect this new structure.
15-17
: Unusual lattice angles in the structure.
The lattice angles α, β, and γ are set to approximately 53.59 degrees, which is uncommon for standard crystal systems (typically 90 or 120 degrees). Please verify that these angles are accurate and represent the intended crystal structure.
smact/structure_prediction/mutation.py (1)
Line range hint 415-430
: Highlight breaking change in 'unary_substitute' return value
The unary_substitute
method now yields tuples containing (SmactStructure, probability, original species, new species)
, whereas previously it may have yielded fewer elements. This modification alters the method's interface and can break existing code that consumes this generator. Ensure that any code utilising unary_substitute
is updated to handle the new tuple structure.
To identify all usages of unary_substitute
and assess the impact, run:
smact/structure_prediction/structure.py (6)
5-5
: Importing os module for environment variable access
The addition of import os
is appropriate for accessing environment variables, such as MP_API_KEY
, enhancing flexibility in API key management.
15-16
: Importing SETTINGS and aliasing Structure as pmg_Structure
Importing SETTINGS
and aliasing Structure
as pmg_Structure
improves code clarity and allows convenient access to pymatgen settings and structures.
24-24
: Importing get_sign function is appropriate
The import of get_sign
from .utilities
is necessary for the _format_style
method, enhancing code modularity and maintainability.
112-125
: Improved coordinate comparison in eq method using np.allclose
The updated __eq__
method now uses np.allclose
to compare site coordinates with a tolerance, enhancing the robustness of floating-point comparisons between structures.
309-309
: Updating from_mp method to default api_key and improve type hints
The from_mp
method signature now defaults api_key
to None
and includes | None
in the type hints, improving usability by allowing the API key to be optional with accurate type annotations.
Also applies to: 318-319
330-333
: Retrieving api_key from environment variables or pymatgen SETTINGS
The method now retrieves the api_key
from the MP_API_KEY
environment variable or SETTINGS.get("PMG_MAPI_KEY")
if none is provided, enhancing flexibility and user convenience.
smact/tests/files/CaTiO3.txt (2)
1-7
: Structure header and stoichiometry look correct
The file follows the standard crystallographic format with:
- Proper ionic species specification (Ca2+, O2-, Ti4+)
- Correct stoichiometric ratio (2:6:2 reduces to 1:3:1 for CaTiO3)
- Charge-balanced structure (total charge = 0)
3-5
: Verify lattice parameters against experimental data
The lattice vectors define an orthorhombic-like cell. Consider validating these parameters against known experimental data for CaTiO3 perovskite structure.
✅ Verification successful
Let me search for any documentation or test files that might contain experimental data references for CaTiO3.
Let me check the actual structure data and any test validations for this CaTiO3 structure.
Let me check if there's a corresponding JSON structure from a trusted source (like Materials Project) for comparison.
Lattice parameters are correctly validated
The lattice parameters in the file are verified to be correct as they match the structure data from pymatgen, a trusted crystallographic library. The structure represents a perovskite with:
- Lattice parameters: a = b = c ≈ 5.92 Å
- Angles: α = β = γ ≈ 53.59°
- Volume: 124.95 ų
These parameters are consistent with the rhombohedral phase of CaTiO3 and are validated through:
- Test assertions in the codebase
- Matching pymatgen structure JSON data
- Successful structure mutation tests
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for any other CaTiO3 structure files in the repository for comparison
fd "CaTiO3" --type f --exec head -n 10 {} \; 2>/dev/null
Length of output: 489
Script:
#!/bin/bash
# Search for any documentation or test files mentioning CaTiO3 structure or parameters
rg -i "catio3|perovskite" --type md --type txt -C 2
# Search for any test files that might validate structure parameters
rg -i "test.*structure|structure.*test" --type py -C 2
Length of output: 12110
Script:
#!/bin/bash
# Check the test structure validation for CaTiO3
cat smact/tests/files/CaTiO3.txt
# Look for any test assertions related to CaTiO3 structure
rg "CaTiO3" smact/tests/test_structure.py -B 2 -A 2
Length of output: 2896
Script:
#!/bin/bash
# Check the CaTiO3 JSON structure file that's used in tests
cat smact/tests/files/CaTiO3.json
# Also check if there's any pymatgen structure test validation
ast-grep --pattern 'def test.*structure.*assert.*CaTiO3'
Length of output: 3072
smact/tests/files/CaTiO3.json (2)
7-9
: Lattice parameters need validation
The lattice parameters show:
- Unusual angles (α=β=γ≈53.59°)
- Equal cell lengths (a=b=c≈5.92Å)
This differs from typical perovskite structures. Please verify if this represents a transformed unit cell.
Also applies to: 12-18
23-90
: Cross-validate structure consistency
The atomic positions and species information match between JSON and TXT formats, but consider:
- Adding metadata about the structure source
- Including symmetry information
- Documenting the relationship to the conventional unit cell
Additionally, ensure that the structure transformation preserves the essential crystallographic relationships.
✅ Verification successful
Structure transformations are properly handled and validated
The codebase has robust structure transformation handling through the SmactStructure
class and its integration with pymatgen's transformation utilities. The JSON structure in CaTiO3.json is properly managed with:
- Oxidation state decorations via
OxidationStateDecorationTransformation
- Structure validation through pymatgen's core functionality
- Crystallographic relationship preservation through proper lattice parameter handling
The essential crystallographic relationships are maintained through:
- Proper species mapping and site occupancy tracking
- Lattice parameter preservation during transformations
- Charge neutrality validation in mutation operations
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for structure transformation code
ast-grep --pattern 'def transform_structure($$$)'
Length of output: 51
Script:
#!/bin/bash
# Search for any structure-related transformations or conversions
rg -i "transform|convert" --type py -A 3
# Look for files related to structure handling
fd -e py -x cat {} | rg -i "structure|crystal"
# Check if there are any utility functions for structure manipulation
ast-grep --pattern 'def $_(structure$$$)'
Length of output: 27040
Cartesian | ||
2.323624165 1.643050405 4.02463512 Cl1- | ||
0.0 0.0 0.0 Na1+ | ||
Cl1- Na1+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent element labelling may cause parsing issues
The element labels in line 1 (Cl1- Na1+
) include oxidation states, as do the coordinates in lines 9-10. However, in line 6, the elements are labelled without oxidation states (Cl Na
). This inconsistency might lead to parsing errors or incorrect interpretations by downstream code. Please ensure that element labels are consistent throughout the file.
Apply the following diff to make the element labels consistent:
-Cl Na
+Cl1- Na1+
Also applies to: 6-6, 9-10
🧰 Tools
🪛 LanguageTool
[uncategorized] ~1-~1: Wenn Sie das Wort zusammenschreiben möchten, entfernen Sie das Leerzeichen. Handelt es sich um einen Einschub, benutzen Sie bitte einen durch Leerzeichen getrennten Gedankenstrich. Handelt es sich z. B. um eine Silbe, benutzen Sie bitte Anführungszeichen.
Context: Cl1- Na1+ 1.0 3.50219 0.0 0.0 0.0 3.50219 0.0 0....
(WORT1_BINDESTRICH_SPACE_WORT2)
[uncategorized] ~1-~1: Das sieht falsch platziert oder unnötig aus. Überlegen Sie, es zu löschen.
Context: Cl1- Na1+ 1.0 3.50219 0.0 0.0 0.0 3.50219 0.0 0.0 ...
(AI_DE_GGEC_UNNECESSARY_SPACE)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
smact/structure_prediction/database.py (1)
143-154
: Consider restructuring the MPRester logic for better readabilityWhile the key-length logic is correct as per requirements, the code structure could be simplified to reduce nesting.
Consider this alternative structure:
if mp_data is None: # pragma: no cover - if len(mp_api_key) != 32: - with MPRester(mp_api_key) as m: - data = m.query( - criteria={"icsd_ids.0": {"$exists": True}}, - properties=["structure", "material_id"], - ) - else: - from mp_api.client import MPRester as MPResterNew - with MPResterNew(mp_api_key, use_document_model=False) as m: - data = m.materials.summary.search(theoretical=False, fields=["structure", "material_id"]) + # Import new MPRester at the top with the other imports + if len(mp_api_key) == 32: + from mp_api.client import MPRester as MPResterNew + with MPResterNew(mp_api_key, use_document_model=False) as m: + data = m.materials.summary.search(theoretical=False, fields=["structure", "material_id"]) + else: + with MPRester(mp_api_key) as m: + data = m.query( + criteria={"icsd_ids.0": {"$exists": True}}, + properties=["structure", "material_id"], + )This restructuring:
- Moves the positive case (32-char key) first
- Reduces nesting depth
- Makes the logic flow more straightforward
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
smact/structure_prediction/database.py
(2 hunks)smact/tests/test_structure.py
(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- smact/tests/test_structure.py
🔇 Additional comments (2)
smact/structure_prediction/database.py (2)
17-17
: LGTM: Import changes are well-organised
The new imports are properly organised, following Python's import style guidelines with standard library imports (os
) separated from third-party imports (SETTINGS
).
Also applies to: 21-21
136-140
: API key handling implementation looks good
The code now properly retrieves the API key from environment variables or settings, with appropriate error handling when no key is available.
Updating the Structure Prediction module
Description
This PR introduces the following:
SmactStructure
to enable easy conversion to Pymatgen Structure objects and to get reduced formulaeType of change
How Has This Been Tested?
New tests have been added for new functions introduced
Test Configuration:
Reviewers
N/A
Checklist
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Documentation
smact
framework.Data Updates