Export community/collection structure as xml #3647

Open
2 tasks done
pgwillia opened this issue Nov 25, 2024 · 34 comments

Comments

@pgwillia
Member

pgwillia commented Nov 25, 2024

https://wiki.lyrasis.org/display/DSDOC7x/Exporting+and+Importing+Community+and+Collection+Hierarchy

Before we can import a SAF collection package we need to have established the community/collection hierarchy in DSpace and have the collection id for the DSpace destination of the SAF.

We can import this structure if we can create an XML file in the following form:

<import_structure>
        <community>
                <name>Community Name</name>
                <description>Descriptive text</description>
                <intro>Introductory text</intro>
                <copyright>Special copyright notice</copyright>
                <sidebar>Sidebar text</sidebar>
                <community>
                        <name>Sub Community Name</name>
                        <community> ...[ad infinitum]...
                        </community>
                </community>
                <collection>
                        <name>Collection Name</name>
                        <description>Descriptive text</description>
                        <intro>Introductory text</intro>
                        <copyright>Special copyright notice</copyright>
                        <sidebar>Sidebar text</sidebar>
                        <license>Special licence</license>
                        <provenance>Provenance information</provenance>
                </collection>
        </community>
</import_structure>
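
As a rough illustration of how such a file could be produced, here is a minimal Ruby sketch that renders a nested hash in the form above. The hash layout (`:communities`, `:collections`) and all helper names are assumptions made for this example; they are not taken from Jupiter or DSpace, and the real export may be structured quite differently.

```ruby
# Hypothetical sketch: render a nested hash as <import_structure> XML.
# The hash layout and helper names are illustrative assumptions only.

# Child elements that appear in the structure above, in the same order.
COMMUNITY_FIELDS  = %i[name description intro copyright sidebar].freeze
COLLECTION_FIELDS = %i[name description intro copyright sidebar license provenance].freeze

# Escape the characters that are special in XML text content.
def xml_escape(text)
  text.to_s.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
end

# Emit one <tag>value</tag> line per field that has a non-blank value.
def field_lines(record, fields, indent)
  fields.filter_map do |field|
    value = record[field].to_s
    next if value.strip.empty?

    "#{indent}<#{field}>#{xml_escape(value)}</#{field}>"
  end
end

def collection_xml(collection, depth)
  indent = '  ' * depth
  [
    "#{indent}<collection>",
    *field_lines(collection, COLLECTION_FIELDS, indent + '  '),
    "#{indent}</collection>"
  ]
end

# Communities nest recursively: sub-communities first, then collections.
def community_xml(community, depth)
  indent = '  ' * depth
  [
    "#{indent}<community>",
    *field_lines(community, COMMUNITY_FIELDS, indent + '  '),
    *Array(community[:communities]).flat_map { |c| community_xml(c, depth + 1) },
    *Array(community[:collections]).flat_map { |c| collection_xml(c, depth + 1) },
    "#{indent}</community>"
  ]
end

def import_structure_xml(communities)
  [
    '<import_structure>',
    *communities.flat_map { |c| community_xml(c, 1) },
    '</import_structure>'
  ].join("\n")
end

sample = [
  {
    name: 'Community Name',
    description: 'Descriptive text',
    communities: [{ name: 'Sub Community Name' }],
    collections: [{ name: 'Collection Name', license: 'Special licence' }]
  }
]

puts import_structure_xml(sample)
```

The sketch builds strings by hand only to stay dependency-free; an XML library would be the more robust choice in real code.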

Desired outcomes:

  • sample file based on development environment and seed data
  • dev DSpace ready file from production Jupiter
@pgwillia
Member Author

From @anayram

An XML skeleton schema (filed under the C2 Alberta folder) is ready as a model for community and collection metadata + hierarchies for import into DSpace.
community-collection_mappings_sample.xml

This first version is based on:

  1. Reports generated by Omar from the Rails console containing all available community and collection metadata. These reports were reviewed to evaluate each field's DSpace equivalent and its current usage in Jupiter. As a result, some fields can be ignored (more details are documented in the XML file). Other fields can be captured as provenance metadata.
  2. Jupiter schema definition file - schema.rb
  3. DSpace UI specs for communities and collections

@anayram
Member

anayram commented Nov 28, 2024

From Scholaris

We tested importing a Community/Collection structure using the Structure Builder tool and by manually editing a Community via CSV import, and while it is possible to record provenance metadata using the dc.description.provenance field for Collections and Items, it appears that adding this information at the Community level is not supported in DSpace without requiring customizations to the core code. Additionally, the Structure Builder tool limits the metadata fields that can be added for a Community. The good news is that, aside from the provenance information, the import using the Structure Builder tool worked well (screenshot attached) which gives us confidence for taking this approach moving forward 🙂

We have posted to the DSpace Slack to ask if there are other workarounds we can try for retaining provenance information for Communities and will keep you posted if we hear back.

@lagoan
Contributor

lagoan commented Nov 29, 2024

@anayram I am implementing the community/collection mapping to an XML file and found that the creators variable for a collection can hold multiple values and is sometimes nil.

After some investigation in staging, I found that these are the only values present:

[
nil,
["ERA Administrator"],
["Institute of Health Economics"],
["Canadian Centre for Ecumenism"],
["Institute of Health Economics "],
["Alberta Heritage Foundation for Medical Research"],
["[email protected]"]
]

How should I handle the nil value? Would it be best to add an empty string or add that portion to the json structure?

@lagoan
Contributor

lagoan commented Nov 29, 2024

Here is an example with synthetic data of the current state of the community-collection mapping implementation: community-collection_mapping_output_test_20241129.xml

@anayram
Member

anayram commented Nov 29, 2024

Thank you @lagoan. If the value is empty or doesn't exist, don't add it. This applies to all properties.
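
A minimal Ruby sketch of that rule, assuming properties are gathered into a plain hash before being written out. The helper names and the decision to treat whitespace-only strings and empty lists as "empty" are assumptions for illustration, not part of the actual implementation.

```ruby
# Hypothetical helpers illustrating "if the value is empty or doesn't
# exist, don't add it". Names and blank-value rules are assumptions.

# A value counts as blank when it is nil, a whitespace-only string,
# or a list whose entries are all blank (including an empty list).
def blank_value?(value)
  case value
  when nil then true
  when String then value.strip.empty?
  when Array then value.all? { |entry| blank_value?(entry) }
  else false
  end
end

# Return a copy of the hash without the blank properties.
def compact_properties(properties)
  properties.reject { |_key, value| blank_value?(value) }
end

with_creator    = { 'creators' => ['ERA Administrator'], 'description' => '' }
without_creator = { 'creators' => nil, 'title' => 'Sample collection' }

p compact_properties(with_creator)    # keeps only 'creators'
p compact_properties(without_creator) # keeps only 'title'
```

Non-blank values are passed through unchanged, so a value such as "Institute of Health Economics " keeps its trailing space unless it is cleaned up separately.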

@lagoan
Contributor

lagoan commented Dec 4, 2024

@anayram Attached is an XML file showing the current state of the communities and collections export.

I am finalising the development tests and will create a PR with this feature after that.

community_collection_2024-12-04-11-39-27.txt

Note: I added the .txt extension because GitHub would not allow XML files.

@anayram
Member

anayram commented Dec 4, 2024

@lagoan all looks good.

Would it be possible to create a full xml from the database for us to check?

Also, since the ability to correctly make provenance information for communities is not yet available, we will need to record this information at the community level as well. A revised xml skeleton model is available from community-collection_mappings_sample.xml

The main changes are to provenance sections. Please let me know if you have any questions!

@lagoan
Contributor

lagoan commented Dec 5, 2024

I will get started on the revised XML skeleton you provided.

I can get an xml preview through the console from staging if that is OK. Staging database is close to production if I remember correctly.

I am almost done with the development tests and then I can create the PR. After the code review is done we can cut a release and get a fresh copy from production.

@lagoan
Contributor

lagoan commented Dec 5, 2024

@anayram I completed an xml export for communities and collections with the latest mappings from staging. This file can be found here: community_collection_2024-12-05-12-03-21.xml

Please let me know if I can clarify anything.

@anayram
Member

anayram commented Dec 5, 2024

All looks good @lagoan @pgwillia @sfarnel @leahvanderjagt

Once this is generated from ERA production, and since all community and collection records are public (confirmed via reports), we can send it to the Scholaris team for ingest.

I took a look at owner users and all are part of the ERA help team, so we can assume that editing collections and communities can be reserved to an Admin user group. When this is submitted to Scholaris it would be great to request that write permissions for these communities and collections be reserved to admin users.

Please let me know if you need anything else from me.

@lagoan
Contributor

lagoan commented Dec 6, 2024

@anayram here is the community collection export xml file generated from production this morning community_collection_2024-12-06-10-05-46-prod.xml

@anayram
Member

anayram commented Dec 6, 2024

All checks passed for me. This looks great and is ready to share with Scholaris for ingest into the ualberta-dev DSpace instance.

@sfarnel @pgwillia @lagoan after it's been ingested, we will need:

  • Check (or set) all communities and collections as writable by Admin users only.
  • Get the output xml with all collections' and communities' new identifiers
  • Maybe some testing to see if all looks ok?

@pgwillia
Member Author

pgwillia commented Dec 6, 2024

Do we want to try it in our DSpace sandbox? http://198.168.187.81:4000/home

@sfarnel
Member

sfarnel commented Dec 6, 2024 via email

@anayram
Member

anayram commented Dec 6, 2024

@pgwillia @sfarnel Good idea. I am working on this with Omar, as it is done through a backend tool (StructBuilder).

@sfarnel
Member

sfarnel commented Dec 6, 2024

Thanks @anayram @lagoan

@anayram
Member

anayram commented Dec 6, 2024

@pgwillia @sfarnel @lagoan
Done, ERA Communities and Collections added to DSpace test instance (use arrows to display collections). Tool documentation here.

@anayram
Member

anayram commented Dec 9, 2024

All tests look good to me re: metadata included in the XML structure.

I noticed some collections have logos. Do we want to migrate those? I suspect this would be manual work, as the structure hierarchy tool used to migrate communities and collections has no way to reference images.

@lagoan can we check how many communities have logos?

Examples: Canadian Circumpolar Institute

@lagoan
Contributor

lagoan commented Dec 9, 2024

@anayram Let me take a look at that.

@lagoan
Contributor

lagoan commented Dec 9, 2024

Right now we have 148 Communities with logos. Note that Communities without specific logos attached to them show the ERA logo (screenshot attached).

@pgwillia
Member Author

pgwillia commented Dec 9, 2024

My instinct says that the community/collection XML is ready to share with Scholaris. Is there a reason why we should wait?

@sfarnel
Member

sfarnel commented Dec 9, 2024 via email

@anayram
Member

anayram commented Dec 10, 2024

Yes, this can be shared anytime, unless any provenance metadata mappings still need to be updated. @lagoan?

@lagoan
Contributor

lagoan commented Dec 10, 2024

Hi @anayram , I believe the file I shared earlier from production community_collection_2024-12-06-10-05-46-prod.xml does have the updated provenance metadata for collections.

Please let me know if I missed anything.

@lagoan
Contributor

lagoan commented Dec 10, 2024

Here is a redacted snippet of a collection's provenance information:

{
  "dc.creator": "An Administrator",
  "ual.owner.community": "[email protected]",
  "ual.owner.collection": "[email protected]",
  "ual.jupiterId.community": "UUID",
  "ual.jupiterId.collection": "UUID",
  "ual.hydraNoid.community": "ID",
  "ual.fedora3UUID.community": "uuid:UUID",
  "ual.hydraNoid.collection": "ID",
  "ual.fedora3UUID.collection": "uuid:UUID"
}
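
For illustration, a provenance structure like this could be assembled along the following lines. Only the JSON key names come from the snippet above; the record attributes (`:owner`, `:hydra_noid`, and so on), the method name, and the sample values are hypothetical, and the sketch assumes empty values are simply left out, as agreed earlier in the thread.

```ruby
require 'json'

# Hypothetical sketch: build the provenance JSON for one collection.
# Keys match the snippet above; the attribute names are assumptions.
def provenance_json(community, collection)
  fields = {
    'dc.creator' => collection[:creator],
    'ual.owner.community' => community[:owner],
    'ual.owner.collection' => collection[:owner],
    'ual.jupiterId.community' => community[:id],
    'ual.jupiterId.collection' => collection[:id],
    'ual.hydraNoid.community' => community[:hydra_noid],
    'ual.fedora3UUID.community' => community[:fedora3_uuid],
    'ual.hydraNoid.collection' => collection[:hydra_noid],
    'ual.fedora3UUID.collection' => collection[:fedora3_uuid]
  }

  # Leave out anything that is nil or blank instead of writing empty values.
  present = fields.reject { |_key, value| value.nil? || value.to_s.strip.empty? }
  JSON.pretty_generate(present)
end

# Made-up sample data; a blank hydra_noid is dropped from the output.
community  = { id: 'UUID', owner: 'owner@example.org' }
collection = { id: 'UUID', owner: 'owner@example.org',
               creator: 'An Administrator', hydra_noid: '' }

puts provenance_json(community, collection)
```

Key order in the output follows the order of the hash literal, since Ruby hashes preserve insertion order.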

@pgwillia
Member Author

I don't know who the best person to share it is. I can volunteer to send it, and I will ask about logos to see if Scholaris has suggestions.

@anayram
Member

anayram commented Dec 10, 2024

@lagoan sorry, I didn't update the mappings properly. Please find additional mappings for dates and other updates. Thank you,

@lagoan
Contributor

lagoan commented Dec 11, 2024

Will share an updated file soon.

@lagoan
Contributor

lagoan commented Dec 11, 2024

@anayram here is the new xml export with the dates:

community_collection_2024-12-11-11-43-37-prod.xml

Please let me know if you need anything else.

@lagoan
Contributor

lagoan commented Dec 12, 2024

Sorry @anayram, there is an error in that last export. I will upload a copy with the corrected dates tomorrow.

@lagoan
Contributor

lagoan commented Dec 12, 2024

@anayram Here is the updated file community_collection_2024-12-12-12-05-00-prod.xml

@lagoan
Contributor

lagoan commented Dec 12, 2024

@anayram This file has the changes to the 4 dates for created and updated values:

community_collection_2024-12-12-13-00-53-prod.xml

@anayram
Member

anayram commented Dec 12, 2024

@lagoan @pgwillia @sfarnel

All looks good. Sending this over to Scholaris. @pgwillia what is your preference, should we keep this ticket open until we confirm transfer?

@pgwillia
Member Author

Thanks @anayram! No preference. However you want to call it done.
