feat: load iNaturalist observation photos for taxa #21

Merged on Dec 30, 2024 (5 commits)

Contributor

@kueda kueda commented Dec 20, 2024

Curious how you feel about this approach, @johnkenny54. I basically just duplicated the taxon photos approach, which adds another fairly slow script that should only need to be run once, and another big data file. Alternatively, I could build off your approach in cpl-photos.js and only fetch observation photos for taxa that don't have 5 taxon photos, but that will slow that script down quite a bit.

Also, what does cpl stand for, anyway?

@johnkenny54
Contributor

I took the "cpl-photos" name from an email you sent me last month. I assumed "cpl" was an acronym for ca-plant-list, and since it seemed to make sense as a prefix, I went with it. cpl-photos is a refactored version of inattaxonphotos.js which can operate on a subset of taxa and update inattaxonphotos.csv non-destructively.

My intention with this was that cpl-photos would have tools for simplifying photo management and making it predictable. Currently there are 2 commands:

  • addmissing will scan the data for any taxa with fewer than 5 photos and try to add new photos. So in ca-plant-list, you can run npx cpl-photos addmissing to update the inattaxonphotos.csv file (currently it writes a new file to the output directory so you have to manually copy it to the master data directory, but it should probably be updated in place). The inattaxonphotos.csv file should be unchanged except for any new photos. This only runs in the ca-plant-list repository.
  • checkmissing will scan the data and report on any taxa with fewer than a specified number of photos. You can run this from any repository.

So if you wanted to update the rareplants photos to add photos for taxa that currently have none, the workflow would be something like:

  • go to your local copy of rareplants and run npx cpl-photos checkmissing --minphotos 1 - this will generate a list of taxa with no photos, to console and output/log.tsv
  • go to iNat and update the taxa photos for the ones you want to add
  • go to ca-plant-list and run npx cpl-photos addmissing
  • when a new version of ca-plant-list with the updated photo CSV is published, these should be picked up in the next rareplants build
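The check that checkmissing performs can be sketched roughly as below. This is only an illustration of the idea (report every taxon with fewer than a given number of photos), not the actual cpl-photos code: the function name findTaxaMissingPhotos, the Map-based data shape, and the sample taxon names are all assumptions made for the example.

```javascript
// Hypothetical sketch, NOT the real cpl-photos implementation.
// photosByTaxon maps a taxon name to the array of photos known for it.
function findTaxaMissingPhotos(taxonNames, photosByTaxon, minPhotos) {
    const missing = [];
    for (const name of taxonNames) {
        // A taxon with no entry at all counts as having zero photos.
        const photos = photosByTaxon.get(name) ?? [];
        if (photos.length < minPhotos) {
            missing.push({ name, count: photos.length });
        }
    }
    return missing;
}

// Made-up sample data, roughly equivalent to running with --minphotos 1.
const photosByTaxon = new Map([
    ["Taxon A", [{ id: "1" }, { id: "2" }]],
    ["Taxon B", []],
]);
const report = findTaxaMissingPhotos(
    ["Taxon A", "Taxon B", "Taxon C"],
    photosByTaxon,
    1
);
console.log(report);
```

With a minimum of 1, only taxa with no photos at all are reported, which matches the first step of the workflow above.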

From what I can see in this PR, it looks like the observation photos are a fallback in case there are fewer than 5 photos in the taxon CSV. Options I see are:

  • Go ahead with the approach in this PR. This is a quick way to get photos for almost everything. I don't really like this option because it adds more code and data, which will become less and less useful as the taxon photos are filled in.
  • Update cpl-photos with an option to pull from observation photos if there aren't enough taxa photos. This is also a quick way to get as many photos as possible. It also means there are no new data files or display code.
  • Leave things as-is, and add photos by updating the taxa photos in iNat. The downside to this is that it takes a lot longer to get photos for everything, and it's a lot of work. The upside is that it presumably benefits the iNat community, and whoever selects the taxon photos can curate them.

I had been assuming we would use the last approach, but the second option is definitely faster.

If you made it this far, what are your thoughts on the above?

@kueda
Contributor Author

kueda commented Dec 22, 2024

> If you made it this far, what are your thoughts on the above?

I guess I was assuming consumers of this library would have more control over loading their own photos. If that were the case, I think tooling that allows the consumer to load iNat taxon photos or iNat obs photos would be useful, e.g. for https://rareplants.ebcnps.org we could skip the iNat taxon photos and use iNat obs photos from the East Bay instead.

However, if the consumer is not supposed to be saving their own photo data, this PR as it stands isn't worth it. I think option 2 is probably best, since option 3 relies on us doing slow manual work, which would be beneficial to iNat, but is probably not something I'm going to do for every plant in CA. There also might be benefits to consumers in having a consistent number of photos, but on iNat there's no such benefit.

That said, your call. Happy to implement option 2 if you want to go that way, but also happy to just go with option 3.

@johnkenny54
Contributor

OK, I think I'm getting what you're trying to do, and if I'm understanding correctly, it should be pretty straightforward. Thinking out loud a bit here, and I may not be getting everything, so let me know if something doesn't make sense.

I think the main problem is with the #loadInatPhotos function in taxa.js. The dataDir passed in to that function is relative to ca-plant-list, so as the function stands now, it will never use data from the consumer. I think what you want is something like:

#loadInatPhotos(dataDir) {
    this.#loadPhotosFromFile( "./data", "inatobsphotos.csv" );
    this.#loadPhotosFromFile( dataDir, "inattaxonphotos.csv" );
}

That should

  • first load photos from the local directory (e.g. rareplants), if they have been set up
  • then add the default photos from ca-plant-list

So the local file takes precedence, but if it's not there the default photos would be used. If you made this change, I think the merge request in rareplants would work as you intended - but without this change the rareplants photo file was not being used.
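The precedence described above can be sketched as a merge over an ordered list of photo sources, where earlier sources win. This is a hedged illustration only: mergePhotoSources, the { name, id } row shape, and the cap of 5 photos per taxon are assumptions for the example, and the real #loadPhotosFromFile would also have to read and parse the CSV files (treating a missing local file as an empty source).

```javascript
// Hypothetical sketch; the row shape ({ name, id }) and the cap are assumptions.
const MAX_PHOTOS = 5;

// sources is an ordered array of row arrays; earlier sources take precedence.
function mergePhotoSources(sources, maxPhotos = MAX_PHOTOS) {
    const photosByTaxon = new Map();
    for (const rows of sources) {
        for (const row of rows) {
            const photos = photosByTaxon.get(row.name) ?? [];
            const isDuplicate = photos.some((p) => p.id === row.id);
            // Later sources only fill the slots left over by earlier ones.
            if (!isDuplicate && photos.length < maxPhotos) {
                photos.push({ id: row.id });
            }
            photosByTaxon.set(row.name, photos);
        }
    }
    return photosByTaxon;
}

// Made-up rows: the local (consumer) source first, then the defaults.
const localRows = [{ name: "Taxon A", id: "local-1" }];
const defaultRows = [
    { name: "Taxon A", id: "default-1" },
    { name: "Taxon B", id: "default-2" },
];
const merged = mergePhotoSources([localRows, defaultRows]);
console.log(merged.get("Taxon A"));
```

Because the local rows are processed first, a consumer's photos appear ahead of the defaults for the same taxon, and a taxon with no local rows falls back entirely to the defaults.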

Not sure if you want to keep the inatobsphotos.csv currently in this PR or not - if you do, that would add a third layer of priority. Also I don't know what location filter you applied to generate the file, but if you keep it, it seems like the filter should be California. Local files I assume would generally use a smaller location (e.g. rareplants would use East Bay Counties).

I think it's fine for now to leave inatobsphotos.js as a standalone script. Maybe we want to consolidate these scripts eventually, but this is all still pretty experimental so probably best to wait until things are more stable.

@kueda
Contributor Author

kueda commented Dec 26, 2024

That all makes sense, but the larger question remains unanswered: do you want the consumer to have this level of control? If you do, then I can refine this PR. But if you don't, or don't think it's worth the extra hassle, then I'll close this PR.

@johnkenny54
Contributor

Yes, this seems fine to me. I was confused about what the goal was based on the initial code. But it seems like with this change one is not required to customize photos for a local site, but if you choose to do so it should be pretty easy, which all makes sense to me.

Uses values from data/inatphotos.csv before using baked-in photos from
inattaxonphotos.csv and inatobsphotos.csv, allowing the consumer to customize
what photos show up on their site.

Also updates baked-in inatobsphotos.csv w/ photos from all of CA.
@johnkenny54 johnkenny54 merged commit 3f538bd into ca-plants:main Dec 30, 2024
1 check passed