Fail to harvest IIS WAF #4827

Closed
FuhuXia opened this issue Jul 24, 2024 · 5 comments · Fixed by GSA/catalog.data.gov#1491

FuhuXia commented Jul 24, 2024

Harvesting fails for the source https://hazards.fema.gov/filedownload/metadata/. This appears to be related to ckan/ckanext-spatial#319. We need to examine the IIS parser and make it more inclusive.

How to reproduce

Harvest an IIS WAF that has subfolders, like this:

Tuesday, June 30, 2015  3:41 PM        <dir> mydir
Tuesday, June 30, 2015  3:30 PM        13867 one.xml

Expected behavior

The harvester traverses into mydir and harvests the files under the subfolder.

Actual behavior

The harvester ignores the files under mydir.
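
To make the reproduction case concrete, here is a minimal sketch of parsing the long-date listing lines shown under "How to reproduce" and telling `<dir>` entries apart from files. The pattern and the `parse_iis_line` helper are hypothetical illustrations, not the regex or code used by ckanext-spatial, and they assume the plain-text form of the listing shown above rather than the raw HTML that IIS serves.

```python
import re
from datetime import datetime

# Hypothetical pattern for the long-date IIS listing format shown in this issue.
# This is an assumption for illustration, not the ckanext-spatial regex.
IIS_LINE = re.compile(
    r"^\s*(?P<date>\w+, \w+ \d{1,2}, \d{4})\s+"   # e.g. "Tuesday, June 30, 2015"
    r"(?P<time>\d{1,2}:\d{2} [AP]M)\s+"           # e.g. "3:41 PM"
    r"(?P<size><dir>|\d+)\s+"                     # "<dir>" for folders, byte count for files
    r"(?P<name>.+?)\s*$"                          # entry name
)


def parse_iis_line(line):
    """Parse one listing line into a dict, or return None if it does not match."""
    match = IIS_LINE.match(line)
    if match is None:
        return None
    modified = datetime.strptime(
        f"{match['date']} {match['time']}", "%A, %B %d, %Y %I:%M %p"
    )
    is_dir = match["size"] == "<dir>"
    return {
        "name": match["name"],
        "is_dir": is_dir,
        "size": None if is_dir else int(match["size"]),
        "modified": modified,
    }


listing = [
    "Tuesday, June 30, 2015  3:41 PM        <dir> mydir",
    "Tuesday, June 30, 2015  3:30 PM        13867 one.xml",
]
entries = [parse_iis_line(line) for line in listing]
# Folders are the entries a harvester would need to traverse into.
folders = [entry["name"] for entry in entries if entry["is_dir"]]
```

With a parser along these lines, `mydir` is reported as a folder to traverse and `one.xml` as a file to harvest.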


FuhuXia added the bug label (Software defect or bug) on Jul 24, 2024
FuhuXia self-assigned this on Jul 25, 2024

FuhuXia commented Jul 25, 2024

A PR has been submitted upstream: ckan/ckanext-spatial#337

However, we want to stay on the current release version and cherry-pick this fix, so we should use our fork in the catalog repo's requirement.in.


FuhuXia commented Aug 12, 2024

We still have an issue with the harvest source https://hazards.fema.gov/filedownload/metadata/: the harvester does not traverse into folders. It turns out the harvester expects a relative path for each folder. For example, for a WAF like this:

Sunday, April 14, 2024  7:55 PM        <dir> R01
Sunday, April 14, 2024  7:55 PM        <dir> R02

The harvester expects R01/ and R02/, but this IIS WAF uses full paths: /filedownload/metadata/R01/ and /filedownload/metadata/R02/. The harvester is designed to ignore any path starting with /, as in this code.

Nginx and Apache servers are fine. We need to find out whether IIS uses full paths by default or whether this is a custom setting on this particular IIS server, and then come up with a fix accordingly.
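
One possible way to accept both link styles, sketched here only as an illustration and not as the fix that was later submitted upstream, is to resolve every href against the WAF URL and keep it only if the result lies strictly below that URL. The `resolve_child` helper is a hypothetical name; the approach assumes that parent links, sort links, and off-site links should all be skipped.

```python
from urllib.parse import urljoin, urlparse


def resolve_child(base_url, href):
    """Return the absolute URL for href if it lies strictly under base_url, else None.

    Hypothetical helper for illustration; not part of ckanext-spatial.
    """
    if not base_url.endswith("/"):
        base_url += "/"
    # urljoin handles relative ("R01/") and root-relative ("/a/b/R01/") hrefs alike.
    absolute = urljoin(base_url, href)
    base, child = urlparse(base_url), urlparse(absolute)
    # Skip links that point to another scheme or host.
    if (child.scheme, child.netloc) != (base.scheme, base.netloc):
        return None
    # Skip the folder itself (e.g. sort links) and anything outside it (e.g. parent links).
    if child.path == base.path or not child.path.startswith(base.path):
        return None
    return absolute


waf = "https://hazards.fema.gov/filedownload/metadata/"
relative = resolve_child(waf, "R01/")                          # relative folder link
full_path = resolve_child(waf, "/filedownload/metadata/R01/")  # full-path folder link
parent = resolve_child(waf, "/filedownload/")                  # parent directory link
```

Under these assumptions, `relative` and `full_path` resolve to the same folder URL, while `parent` is `None`, so the crawl neither skips the IIS folders nor climbs out of the WAF.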


FuhuXia commented Aug 15, 2024

A further fix has been made to handle IIS folder URLs.

FuhuXia closed this as completed on Aug 15, 2024
github-project-automation bot moved this from 📔 Product Backlog to ✔ Done in data.gov team board on Aug 15, 2024
btylerburton moved this from ✔ Done to 🗄 Closed in data.gov team board on Sep 3, 2024

FuhuXia commented Nov 7, 2024

The fix (ckan/ckanext-spatial#337) was merged into upstream main, but it breaks harvesting for Apache servers.
We need a further fix, plus tests covering all server types.

FuhuXia reopened this on Nov 7, 2024
FuhuXia moved this from 🗄 Closed to 🏗 In Progress [8] in data.gov team board on Nov 7, 2024

FuhuXia commented Nov 12, 2024

A new PR that fixes both IIS and Apache, with tests added, has been pushed upstream:
ckan/ckanext-spatial#342
