Add domain frequencies to validation report #47

Merged
merged 67 commits into from
Mar 25, 2024
67 commits
7abdfca
Hover state of "logo" el
akuny Mar 8, 2024
7341628
Add about and card component/macro
akuny Mar 11, 2024
d31514b
Update 4xx templates
akuny Mar 11, 2024
95011ee
Add data checklist view
akuny Mar 11, 2024
550af85
Mapping index
akuny Mar 12, 2024
2efa8d0
Isolate alpine component for testing
akuny Mar 12, 2024
76856c5
Test MappingForm component and validation logic
akuny Mar 13, 2024
e3aeddc
Add linting, update ci action
akuny Mar 13, 2024
8d0cde3
Update actions
akuny Mar 13, 2024
d7b624f
Adopt convention of using calling parenthesis when using alpine.js co…
akuny Mar 13, 2024
e1c0c3f
Work on create mapping form
akuny Mar 13, 2024
e45df5b
Merge branch 'main' into mapping-ui
akuny Mar 13, 2024
816c6fb
Update sidebar
akuny Mar 13, 2024
3b0aa80
Stub out routes for data submissions CRUD
akuny Mar 13, 2024
c996a37
Work on create mapping use case
akuny Mar 13, 2024
5c062e2
Add show get_by_id method to map repo
akuny Mar 14, 2024
dcf5f74
Add use case to get a column map entity
akuny Mar 14, 2024
5712f63
Trying to tighten up consistency of terminology re ColumnMap entity
akuny Mar 14, 2024
fc4d444
Start working on display of existing column_maps
akuny Mar 14, 2024
493202c
Fix language inconsistencies
akuny Mar 14, 2024
793a486
Tweak formatting of component
akuny Mar 14, 2024
dabd742
First pass at mapping validation
akuny Mar 14, 2024
1598c4f
Test mapping validation logic
akuny Mar 14, 2024
1147f8a
Format
akuny Mar 14, 2024
d9a7db6
Fixes after manual testing
akuny Mar 15, 2024
a3d029e
Create mapping happy path
akuny Mar 15, 2024
a7ec3b0
Work on show mapping
akuny Mar 15, 2024
03b6df6
Stub out form that can update a required field, edit an optional fiel…
akuny Mar 18, 2024
372a500
Work on forms
akuny Mar 18, 2024
4191e9a
Successfully use form
akuny Mar 18, 2024
cafc16a
Testing and tweaking formatting
akuny Mar 18, 2024
cbca71e
Adjusting formatting
akuny Mar 19, 2024
82706db
Adjust button container
akuny Mar 19, 2024
be9d816
Update font of data type in checklist\
akuny Mar 19, 2024
4e9d14f
Hack to allow for space at bottom of mapping form
akuny Mar 19, 2024
e2675ec
Use card component in data submission show template
akuny Mar 19, 2024
9eae1ed
Pull tweaks into component files
akuny Mar 19, 2024
6637a4d
Update seed script, fix storage download_temp method
akuny Mar 19, 2024
a83ea32
Format login page
akuny Mar 19, 2024
7da9649
Test, lint and format
akuny Mar 19, 2024
b350336
Show created date in mapping index
akuny Mar 19, 2024
35d1c39
Move required fields to static prop of ColumnMap entity
akuny Mar 21, 2024
0b9e7ad
Shift add form to show.html
akuny Mar 22, 2024
4cb870c
Rework edit form
akuny Mar 22, 2024
933c366
Testing workflow
akuny Mar 22, 2024
ac04b33
Repositioning buttons
akuny Mar 22, 2024
03d63c2
Update type of component
akuny Mar 22, 2024
83ec6d6
Lint and format
akuny Mar 22, 2024
30db818
Test column map use cases
akuny Mar 22, 2024
a6f2747
Rename domain dir core to avoid confusion with gis terminology
akuny Mar 22, 2024
5e7a991
Tweak pending items
akuny Mar 22, 2024
2e53ebe
Add domain frequencies to validation report
Mar 22, 2024
82118fc
Update column_map route handlers
akuny Mar 25, 2024
4d14d4c
Remove hidden PUT fields and hook
akuny Mar 25, 2024
30aedad
Display user feedback for invalid mappings
akuny Mar 25, 2024
ded964c
Display updated date for mappings
akuny Mar 25, 2024
1e7c297
Add rudimentary required field indicator to edit form
akuny Mar 25, 2024
1e6d48c
Initialize report to empty list
akuny Mar 25, 2024
ad82729
Format
akuny Mar 25, 2024
2926c6c
Ad updated_at to fake column map repo
akuny Mar 25, 2024
ce94313
Basic test for column_map index method
akuny Mar 25, 2024
2fa2be4
Extract file reading to application layer from controller
akuny Mar 25, 2024
09c90ef
Merge in main
akuny Mar 25, 2024
cd957bb
Update import statement
akuny Mar 25, 2024
d8376b2
Test and lint frontend code
akuny Mar 25, 2024
45ce15c
Merge pull request #48 from GSA-TTS/mapping-ui
akuny Mar 25, 2024
d201e87
Merge in main
akuny Mar 25, 2024
30 changes: 25 additions & 5 deletions .github/workflows/ci.yaml
@@ -13,27 +13,47 @@ jobs:

     steps:
       - name: Check out repository
-        uses: actions/checkout@v2
+        uses: actions/checkout@v4

+      - name: Read .python-version
+        run: echo "##[set-output name=PYTHON_VERSION;]$(cat .python-version)"
+        id: python-version
+
       - name: Set up Python
         uses: actions/setup-python@v4
         with:
-          python-version: 3.11
+          python-version: "${{ steps.python-version.outputs.PYTHON_VERSION }}"

       - name: Install poetry
         shell: bash
         run: |
           curl -sSL https://install.python-poetry.org | python3 -
           echo "/root/.local/bin" >> $GITHUB_PATH

-      - name: Install dependencies
+      - name: Install Python dependencies
         shell: bash
         run: poetry install

-      - name: Lint
+      - name: Lint backend code
         shell: bash
         run: poetry run flake8

-      - name: Test
+      - name: Test backend code
         shell: bash
         run: poetry run pytest
+
+      - name: Read .nvmrc
+        run: echo "##[set-output name=NVMRC;]$(cat .nvmrc)"
+        id: nvm
+
+      - name: Set up Node.js
+        uses: actions/setup-node@v3
+        with:
+          node-version: "${{ steps.nvm.outputs.NVMRC }}"
+
+      - name: Install npm dependencies, lint, and test frontend code
+        run: |
+          cd nad_ch/controllers/web
+          npm install
+          npm run lint
+          npm test
2 changes: 1 addition & 1 deletion .github/workflows/deploy-dev.yml
@@ -9,7 +9,7 @@ jobs:

     steps:
       - name: Check out repository
-        uses: actions/checkout@v2
+        uses: actions/checkout@v4

       - name: Set up Python
         uses: actions/setup-python@v4
1 change: 1 addition & 0 deletions .nvmrc
@@ -0,0 +1 @@
+18.17.1
9 changes: 3 additions & 6 deletions nad_ch/application/dtos.py
@@ -29,12 +29,9 @@ class DataSubmissionReportFeature:
     invalid_domain_count: int = 0
     valid_domain_count: int = 0
     invalid_domains: List[str] = field(default_factory=list)
-    # TODO: Add frequency charts for each field and only take the top 10 if
-    # more than 10 values exist
-    # invalid_domain_frequencies: Dict[str, int]
-    # Set to True if invalid_domains & invalid_domain_frequencies doesn't contain
-    # a full list of unique domains found in source data
-    # invalid_domain_list_truncated: bool = False
+    domain_frequency: Dict[str, Dict[str, int]] = field(default_factory=dict)
+    # Set to true when there is too many unexpected domain values found for a field
+    high_domain_cardinality: bool = False


 @dataclass
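A minimal sketch of why the new report fields use `field(default_factory=...)`: each instance gets its own container, so counts gathered for one NAD field do not leak into another. `ReportFeatureSketch` is a hypothetical stand-in that copies only the fields visible in this hunk. It annotates `domain_frequency` as a flat value-to-count mapping, which is an assumption based on how `validation.py` fills the field in this PR (the hunk itself declares `Dict[str, Dict[str, int]]`).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReportFeatureSketch:
    # Hypothetical stand-in for DataSubmissionReportFeature; only the
    # fields visible in the dtos.py hunk are reproduced here.
    invalid_domain_count: int = 0
    valid_domain_count: int = 0
    invalid_domains: List[str] = field(default_factory=list)
    # Assumption: a flat value -> count mapping, not the nested type
    # declared in the diff.
    domain_frequency: Dict[str, int] = field(default_factory=dict)
    high_domain_cardinality: bool = False


a = ReportFeatureSketch()
b = ReportFeatureSketch()

# Mutating one instance leaves the other untouched because each one
# received a fresh list and dict from its default_factory.
a.invalid_domains.append("Rd.")
a.domain_frequency["Rd."] = 3
```

If the flat shape is what the validator is meant to store, the declared annotation in the diff may be worth a second look during review.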
4 changes: 3 additions & 1 deletion nad_ch/application/interfaces.py
@@ -1,6 +1,6 @@
 from typing import Optional, Protocol, Dict
 from nad_ch.application.dtos import DownloadResult
-from nad_ch.domain.repositories import (
+from nad_ch.core.repositories import (
     DataProducerRepository,
     DataSubmissionRepository,
     UserRepository,
@@ -106,6 +106,8 @@ def __getitem__(self, key: str):
             return self.submissions
         elif key == "users":
             return self.users
+        elif key == "column_maps":
+            return self.column_maps
         elif key == "logger":
             return self.logger
         elif key == "storage":
2 changes: 1 addition & 1 deletion nad_ch/application/use_cases/auth.py
@@ -5,7 +5,7 @@
     OAuth2TokenError,
 )
 from nad_ch.application.interfaces import ApplicationContext
-from nad_ch.domain.entities import User
+from nad_ch.core.entities import User


 def get_or_create_user(ctx: ApplicationContext, provider_name: str, email: str) -> User:
120 changes: 120 additions & 0 deletions nad_ch/application/use_cases/column_maps.py
@@ -0,0 +1,120 @@
+import csv
+from io import StringIO
+from typing import Dict, List, IO
+from nad_ch.application.interfaces import ApplicationContext
+from nad_ch.application.view_models import (
+    get_view_model,
+    ColumnMapViewModel,
+)
+from nad_ch.core.entities import ColumnMap
+
+
+def add_column_map(
+    ctx: ApplicationContext, user_id: int, name: str, mapping: Dict[str, str]
+):
+    user = ctx.users.get_by_id(user_id)
+    if user is None:
+        raise ValueError("User not found")
+
+    # TODO get the producer name from the user's producer property
+    producer = ctx.producers.get_by_name("New Jersey")
+    if producer is None:
+        raise ValueError("Producer not found")
+
+    # Note: will need to account for admin permissions to update any DataProducer's
+    # column mapping, and for users associated with the DataProducer to update ONLY
+    # their own column mapping
+
+    column_map = ColumnMap(name, producer, mapping, 1)
+
+    if not column_map.is_valid():
+        raise ValueError("Invalid mapping")
+
+    saved_column_map = ctx.column_maps.add(column_map)
+    ctx.logger.info("Column Map added")
+
+    return get_view_model(saved_column_map)
+
+
+def get_column_map(ctx: ApplicationContext, id: int) -> ColumnMapViewModel:
+    column_map = ctx.column_maps.get_by_id(id)
+
+    if column_map is None:
+        raise ValueError("Column map not found")
+
+    return get_view_model(column_map)
+
+
+def get_column_maps_by_producer(
+    ctx: ApplicationContext, producer_name: str
+) -> List[ColumnMapViewModel]:
+    producer = ctx.producers.get_by_name(producer_name)
+    if not producer:
+        raise ValueError("Producer not found")
+    column_maps = ctx.column_maps.get_by_producer(producer)
+
+    return [get_view_model(column_map) for column_map in column_maps]
+
+
+def update_column_mapping(
+    ctx: ApplicationContext, id: int, new_mapping: Dict[str, str]
+):
+    column_map = ctx.column_maps.get_by_id(id)
+
+    if column_map is None:
+        raise ValueError("Column map not found")
+
+    column_map.mapping = {
+        key: new_mapping[key] for key in ColumnMap.all_fields if key in new_mapping
+    }
+
+    if not column_map.is_valid():
+        raise ValueError("Invalid mapping")
+
+    ctx.column_maps.update(column_map)
+
+    return get_view_model(column_map)
+
+
+def update_column_mapping_field(
+    ctx: ApplicationContext, id: int, user_field: str, nad_field: str
+):
+    column_map = ctx.column_maps.get_by_id(id)
+
+    if column_map is None:
+        raise ValueError("Column map not found")
+
+    column_map.mapping[nad_field] = user_field
+
+    column_map.mapping = {
+        key: column_map.mapping[key]
+        for key in ColumnMap.all_fields
+        if key in column_map.mapping
+    }
+
+    if not column_map.is_valid():
+        raise ValueError("Invalid mapping")
+
+    ctx.column_maps.update(column_map)
+
+    return get_view_model(column_map)
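Both update functions above rebuild the mapping with a comprehension over `ColumnMap.all_fields`. The effect appears to be twofold: keys that are not known NAD fields are dropped, and the surviving keys follow the canonical field order rather than the order the caller sent. A small sketch of that step in isolation, where `ALL_FIELDS` is a hypothetical four-item stand-in for the real list (which this diff does not show) and `normalize_mapping` is an illustrative name:

```python
from typing import Dict, List

# Hypothetical subset of NAD field names; the real list lives on
# ColumnMap.all_fields and is not part of this diff.
ALL_FIELDS: List[str] = ["Add_Number", "St_Name", "St_PosTyp", "Unit"]


def normalize_mapping(new_mapping: Dict[str, str]) -> Dict[str, str]:
    """Keep only known fields, ordered as they appear in ALL_FIELDS."""
    return {key: new_mapping[key] for key in ALL_FIELDS if key in new_mapping}


# The caller sends keys out of order and includes an unknown one.
submitted = {"Unit": "unit_no", "Bogus": "x", "Add_Number": "house_num"}
normalized = normalize_mapping(submitted)
```

Because plain dicts preserve insertion order, iterating over the canonical list is enough to fix the order; no explicit sort is needed.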


+def get_column_map_from_csv_file(file: IO[bytes]) -> Dict[str, str]:
+    file_content = file.read().decode("utf-8-sig")
+    stream = StringIO(file_content)
+    csv_reader = csv.reader(stream, dialect="excel")
+
+    headers = next(csv_reader)
+    if not headers:
+        raise Exception("CSV file is empty or invalid")
+
+    csv_dict = {}
+
+    for row in csv_reader:
+        if len(row) < 2:
+            continue
+        key, value = row[:2]
+        csv_dict[key] = value
+
+    return csv_dict
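A self-contained sketch of the same parsing approach, useful for checking how a byte order mark and short rows are handled. `read_two_column_csv` and the sample header names are hypothetical; the diff does not say which column holds the NAD field and which holds the user's field, so the sample data is only illustrative.

```python
import csv
from io import BytesIO, StringIO
from typing import IO, Dict


def read_two_column_csv(file: IO[bytes]) -> Dict[str, str]:
    # "utf-8-sig" strips a leading byte order mark if one is present.
    text = file.read().decode("utf-8-sig")
    reader = csv.reader(StringIO(text), dialect="excel")
    # Skip the header row; the None default signals a file with no rows.
    if next(reader, None) is None:
        raise ValueError("CSV file is empty")
    # Rows with fewer than two cells are ignored.
    return {row[0]: row[1] for row in reader if len(row) >= 2}


# Sample upload: BOM, header row, one short row that gets skipped.
raw = (
    b"\xef\xbb\xbfnad_field,user_field\r\n"
    b"Add_Number,house_num\r\n"
    b"incomplete\r\n"
    b"St_Name,street\r\n"
)
result = read_two_column_csv(BytesIO(raw))
```

One difference from the function in the diff: this sketch passes a default to `next`, so a zero-byte upload raises `ValueError`. With a bare `next(csv_reader)`, a completely empty file would raise `StopIteration` before the `if not headers` check runs.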
2 changes: 1 addition & 1 deletion nad_ch/application/use_cases/data_producers.py
@@ -4,7 +4,7 @@
     get_view_model,
     DataProducerViewModel,
 )
-from nad_ch.domain.entities import DataProducer
+from nad_ch.core.entities import DataProducer


 def add_data_producer(
2 changes: 1 addition & 1 deletion nad_ch/application/use_cases/data_submissions.py
@@ -6,7 +6,7 @@
     get_view_model,
     DataSubmissionViewModel,
 )
-from nad_ch.domain.entities import DataSubmission, ColumnMap
+from nad_ch.core.entities import DataSubmission, ColumnMap


 def ingest_data_submission(
33 changes: 25 additions & 8 deletions nad_ch/application/validation.py
@@ -7,7 +7,8 @@
 )
 import glob
 from pathlib import Path
-from nad_ch.domain.entities import ColumnMap
+from nad_ch.core.entities import ColumnMap
+from collections import Counter


 class DataValidator:
@@ -90,7 +91,7 @@ def update_feature_details(self, gdf: GeoDataFrame):
             feature_submission.populated_count += populated_count
             feature_submission.null_count += null_count

-            # Update domain specific metrics
+            # Update invalid domain metrics
             column_domain_dict = self.domains["domain"].get(column)
             column_mapper_dict = self.domains["mapper"].get(column)
             if column_domain_dict and column_mapper_dict:
@@ -124,12 +125,28 @@
                 )
                 feature_submission.invalid_domain_count += invalid_domain_count
                 feature_submission.valid_domain_count += valid_domain_count
-                # Can only store up to 10 invalid domains per nad field
-                invalid_domain_unique_count = len(invalid_domains)
-                remaining_slots = 10 - len(feature_submission.invalid_domains)
-                if invalid_domain_unique_count and remaining_slots > 0:
-                    invalid_domains = invalid_domains[:remaining_slots]
-                    feature_submission.invalid_domains.extend(invalid_domains)
+                # Can only store up to 100 invalid domains per nad field
+                remaining_slots = 100 - len(feature_submission.invalid_domains)
+                if invalid_domains and remaining_slots > 0:
+                    feature_submission.invalid_domains.extend(
+                        invalid_domains[:remaining_slots]
+                    )
+
+            # Generate frequency table of fields that are domain specific only
+            if column_domain_dict:
+                domain_freq = gdf[column].value_counts().to_dict()
+                if feature_submission.domain_frequency:
+                    domain_freq = dict(
+                        Counter(feature_submission.domain_frequency)
+                        + Counter(domain_freq)
+                    )
+                # Check if the number of unique domains in frequency dictionary
+                # is 2x greater than maximum expected unique domains
+                if len(domain_freq.keys()) > 2 * len(column_domain_dict.keys()):
+                    feature_submission.high_domain_cardinality = True
+                    # Reset domain frequency
+                    domain_freq = {}
+                feature_submission.domain_frequency = domain_freq

     def update_overview_details(self, gdf: GeoDataFrame):
         self.report_overview.records_count += self.get_record_count(gdf)
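The frequency logic in the last hunk can be tried out without GeoPandas. The sketch below is an approximation under stated assumptions: a plain list of strings stands in for `gdf[column]`, `Counter(batch_values)` stands in for `value_counts().to_dict()`, and `merge_frequencies` is a hypothetical helper name, not something defined in this PR.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def merge_frequencies(
    running: Dict[str, int],
    batch_values: Iterable[str],
    expected_domain_size: int,
) -> Tuple[Dict[str, int], bool]:
    """Fold one batch into the running counts.

    Returns the merged counts and a flag. When the number of distinct
    values exceeds twice the expected domain size, the counts are reset
    to an empty dict and the flag is True.
    """
    merged = dict(Counter(running) + Counter(batch_values))
    if len(merged) > 2 * expected_domain_size:
        return {}, True
    return merged, False


# Two batches that stay within the limit (at most 4 distinct values).
freq, too_many = merge_frequencies({}, ["N", "S", "N"], expected_domain_size=2)
freq, too_many = merge_frequencies(freq, ["S", "E", "W"], expected_domain_size=2)

# A fifth distinct value pushes the table over the limit.
overflow, flagged = merge_frequencies(freq, ["NE"], expected_domain_size=2)
```

As far as the hunk shows, nothing consults `high_domain_cardinality` before the next batch is counted, so after a reset the table would start filling again from the following batch. Whether that is intended is a question for the author; a caller could skip counting once the flag is set.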
48 changes: 44 additions & 4 deletions nad_ch/application/view_models.py
@@ -2,8 +2,8 @@
 from datetime import datetime
 import json
 import numpy as np
-from typing import Union, List, Tuple, TypeVar, Protocol
-from nad_ch.domain.entities import Entity, DataProducer, DataSubmission
+from typing import Union, Dict, List, Tuple, Protocol
+from nad_ch.core.entities import Entity, ColumnMap, DataProducer, DataSubmission


 class ViewModel(Protocol):
@@ -18,6 +18,7 @@ def get_view_model(
     get a static view model object that it can return to its caller.
     """
     entity_to_vm_function_map = {
+        ColumnMap: create_column_map_view_model,
         DataProducer: create_data_producer_vm,
         DataSubmission: create_data_submission_vm,
     }
@@ -36,6 +37,46 @@ def get_view_model(
         raise ValueError(f"No mapping function defined for entity type: {entity_type}")
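The lookup table in `get_view_model` is a type-keyed dispatch: the entity's class selects the function that builds its view model. A small self-contained sketch of that pattern follows. `Producer`, `ProducerViewModel`, `create_producer_vm`, `ENTITY_TO_VM`, and `to_view_model` are hypothetical names, and the collapsed part of the real function may do more than this (for example, handle collections of entities).

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Producer:  # hypothetical entity
    name: str


@dataclass
class ProducerViewModel:  # hypothetical view model
    name: str


def create_producer_vm(producer: Producer) -> ProducerViewModel:
    return ProducerViewModel(name=producer.name)


# Maps an entity class to the function that builds its view model.
ENTITY_TO_VM: Dict[type, Callable[[Any], Any]] = {Producer: create_producer_vm}


def to_view_model(entity: Any) -> Any:
    factory = ENTITY_TO_VM.get(type(entity))
    if factory is None:
        raise ValueError(
            f"No mapping function defined for entity type: {type(entity)}"
        )
    return factory(entity)


vm = to_view_model(Producer(name="New Jersey"))
```

Supporting a new entity then takes one new factory function and one new dictionary entry, which matches the single-line addition for `ColumnMap` in this hunk.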


+@dataclass
+class ColumnMapViewModel(ViewModel):
+    id: int
+    date_created: str
+    date_updated: str
+    name: str
+    mapping: Dict[str, str]
+    version: int
+    producer_name: str
+    available_nad_fields: List[str]
+    required_nad_fields: List[str]
+
+
+def create_column_map_view_model(column_map: ColumnMap) -> ColumnMapViewModel:
+    available_nad_fields = [
+        key
+        for key in ColumnMap.all_fields
+        if key not in column_map.mapping or column_map.mapping.get(key) in ["", None]
+    ]
+
+    date_updated = (
+        "-"
+        if column_map.updated_at == column_map.created_at
+        and column_map.updated_at is not None
+        else present_date(column_map.updated_at)
+    )
+
+    return ColumnMapViewModel(
+        id=column_map.id,
+        date_created=present_date(column_map.created_at),
+        date_updated=date_updated,
+        name=column_map.name,
+        mapping=column_map.mapping,
+        version=column_map.version_id,
+        producer_name=column_map.producer.name,
+        available_nad_fields=available_nad_fields,
+        required_nad_fields=ColumnMap.required_fields,
+    )
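The `available_nad_fields` comprehension treats a NAD field as still available when it is absent from the mapping or mapped to an empty value. A standalone sketch of that rule, assuming a hypothetical four-item `ALL_FIELDS` list in place of `ColumnMap.all_fields` and an illustrative helper name `available_fields`:

```python
from typing import Dict, List, Optional

# Hypothetical subset of NAD field names; the real list lives on
# ColumnMap.all_fields and is not shown in this diff.
ALL_FIELDS: List[str] = ["Add_Number", "St_Name", "St_PosTyp", "Unit"]


def available_fields(mapping: Dict[str, Optional[str]]) -> List[str]:
    """Return fields that are missing from the mapping or mapped to "" or None."""
    # dict.get returns None for a missing key, so one membership test
    # covers both the "absent" and the "empty" case.
    return [key for key in ALL_FIELDS if mapping.get(key) in ("", None)]


mapping = {"Add_Number": "house_num", "St_Name": "", "Unit": None}
unmapped = available_fields(mapping)
```

Here `St_PosTyp` is absent, `St_Name` is an empty string, and `Unit` is `None`, so all three count as available while `Add_Number` does not.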


 @dataclass
 class DataProducerViewModel(ViewModel):
     id: int
@@ -61,8 +102,7 @@ class DataSubmissionViewModel(ViewModel):


 def create_data_submission_vm(submission: DataSubmission) -> DataSubmissionViewModel:
-    # TODO make this be an empty array so the frontend doesn't have to check for None
-    report_json = None
+    report_json = []
     if submission.report is not None:
         enriched_report = enrich_report(submission.report)
         report_json = json.dumps(enriched_report)
5 changes: 3 additions & 2 deletions nad_ch/controllers/cli.py
@@ -51,6 +51,7 @@ def list_submissions_by_producer(ctx, producer):
 @cli.command()
 @click.pass_context
 @click.argument("filename")
-def validate_submission(ctx, filename):
+@click.argument("mapping_name")
+def validate_submission(ctx, filename, mapping_name):
     context = ctx.obj
-    validate_data_submission(context, filename)
+    validate_data_submission(context, filename, mapping_name)