FAIR and CARE #192
Replies: 17 comments 7 replies
-
Open data provide baseline information that lets researchers and curious members of the public identify trends without having to collect the data themselves. A single dataset can support many kinds of research questions, and open, easily findable data can provide a level of standardization across studies that wouldn't be possible if different private datasets were used for the same analyses. Following the FAIRification framework outlined in the FAIR Guiding Principles is a good starting point for data managers who want to make datasets more open and reusable, but several challenges limit how far this can go. If data are highly standardized from the start, their metadata fields are more likely to be distinct and to carry unique identifiers. If datasets are old and have passed through multiple management standards, this information may not be as “findable.” Making datasets free and publicly available is an important step toward open, reusable data, but trust and credibility matter just as much. Not all users will look for the QA/QC practices behind data collection, which could lead people to publish work based on faulty data or on data collected from unreliable sources. The ability to identify a credible source is a critical skill for every data user.
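The QA/QC point can be made concrete with a couple of basic checks a data user might run before trusting a dataset: are the record identifiers actually unique, and are required values present? This is a minimal sketch, assuming records arrive as Python dictionaries; the list of required fields is hypothetical and would differ for any real dataset.

```python
from collections import Counter

# Hypothetical required fields; a real dataset's documentation would define these.
REQUIRED_FIELDS = ("record_id", "site", "date", "value")

def qa_report(records):
    """Return simple QA/QC findings: duplicate IDs and rows missing required fields."""
    id_counts = Counter(r.get("record_id") for r in records)
    duplicates = sorted(i for i, n in id_counts.items() if i is not None and n > 1)
    incomplete = [
        idx for idx, r in enumerate(records)
        if any(r.get(field) in (None, "") for field in REQUIRED_FIELDS)
    ]
    return {"duplicate_ids": duplicates, "incomplete_rows": incomplete}

records = [
    {"record_id": "A1", "site": "North", "date": "2021-06-01", "value": 4.2},
    {"record_id": "A2", "site": "North", "date": "2021-06-02", "value": None},
    {"record_id": "A1", "site": "South", "date": "2021-06-01", "value": 3.9},
]
print(qa_report(records))  # {'duplicate_ids': ['A1'], 'incomplete_rows': [1]}
```

Checks like these can't establish that a source is credible, but they catch some kinds of faulty data before anyone builds a publication on them.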
-
Researchers often call for open data to make scientific research more reproducible. By increasing the transparency of datasets and their provenance, researchers make it easier to evaluate the credibility of publications. However, the FAIR data principles do not fully consider two key elements of the data lifecycle: data ownership and data application. The Reusable principle does state that (meta)data should be released with a clear data usage license, but it is unclear how these licenses are granted and who may grant them. As for applications, some licenses may state the contexts in which data use is protected, but FAIR does not specify this either. The CARE data principles complement FAIR and assert Indigenous governance of Indigenous data and knowledge. This framework goes beyond the phrase commonly used by inclusion advocates, “nothing about us without us,” and upholds Indigenous authority to control Indigenous data. The Indigenous Data Sovereignty (IDSov) movement is reinforced by the practices of Indigenous Data Governance (IDGov), in which tribal nations and communities have both the right and the responsibility to steward data ecosystems, aligning research data and collective benefit with cultural governance protocols. Governance is a crucial component, as shown by instances of public agencies incorporating Indigenous knowledge into their research without centering tribal leadership in how that research is generated. In sum, CARE supports the use and reuse of data while maintaining a code of ethics that prioritizes Indigenous Peoples’ rights and wellbeing throughout the data lifecycle.
-
I think one of the main reasons open data is important is accessibility. Data that hold important information shouldn’t be restricted to people with degrees or a certain status in society; if access were reserved for them, that would create a huge power imbalance. Data can be interpreted and synthesized in misleading and confusing ways, and if people only have access to those interpretations and not the data itself, they can be misinformed or miss the full picture. To make my own data more open and reusable, organization would be a big thing for me. I remember doing school science projects and not doing a good job of documenting the conditions the data were collected in. I also struggled to format data clearly. When I went back to look at the data later, I didn’t always remember the specific details, or I was confused by my own formatting and labels. I can’t imagine how confusing that data would be to someone else with even less information than I had. In terms of drawbacks, I think about the ethics of human data. Do people know that their data could be “open” to some extent, and did they agree to it? I wonder how HIPAA intersects with the FAIR and CARE principles. As the CARE Principles for Indigenous Data Governance point out, there are also drawbacks when power differentials and historical contexts are ignored. The knowledge and data of certain groups of people have been misused and exploited in the past, and those groups deserve to protect and control their data and knowledge so they aren’t exploited further.
-
Having open data ensures that everyone has access to it, can interoperate and interact with it, and can manipulate or reuse it for their own purposes. All of these help improve the flow of information and knowledge. I can see several ways to keep my own data more aligned with open data principles. First, document my data, how it works, and its sources. Second, keep it regularly updated until doing so is no longer feasible or necessary, and note when and why updates stopped. Third, post my data in a public place, like GitHub, with clear instructions on how to access and use it. I feel like I generally do an OK job of documenting my data, but sharing it for public use will be something new to me. One drawback I see is the misuse of data from marginalized communities, which the CARE principles were designed to protect against for Indigenous data. I can also see corporations or individuals taking open data and reusing it in a non-transparent way that has no benefit, or at worst a negative impact, for the community that provided the data in the first place.
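The three steps above (document, keep updated, publish with instructions) can be sketched as a small metadata file written next to the data before it goes on GitHub. This is a minimal sketch, assuming Python and entirely hypothetical field names; a real repository may expect a specific metadata standard instead.

```python
import json
from datetime import date

def build_metadata(title, description, sources, last_updated,
                   maintained=True, status_note=""):
    """Build a small metadata record to publish alongside a dataset (e.g. metadata.json)."""
    record = {
        "title": title,
        "description": description,
        "sources": list(sources),
        "last_updated": last_updated.isoformat(),
        "maintained": maintained,
    }
    if not maintained:
        # Record when and why updates stopped so later users aren't left guessing.
        record["status_note"] = status_note or "Updates discontinued; reason not recorded."
    return record

meta = build_metadata(
    title="Example stream temperature series",  # hypothetical dataset
    description="Daily mean water temperature (degrees C) at one hypothetical site.",
    sources=["Field logger at hypothetical site A"],
    last_updated=date(2023, 3, 1),
    maintained=False,
    status_note="Logger decommissioned; no further updates planned.",
)
print(json.dumps(meta, indent=2))
```

Keeping the "when and why" inside the file itself means the explanation travels with the data wherever it is copied.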
-
Open data is super important for advancing science, policy, and management, and for getting the most out of expensive, labor-intensive sampling and data collection. Not only can other people use open data to expand on research and find new, exciting ways to apply it, but open data also allows science to be reproducible and transparent. As a scientist, I can make my data more open and reusable by following the FAIR principles, publishing my data online, and making sure the dataset and metadata are clear and understandable to other potential users. Personally, I have benefited from open data in prior projects (mostly continuous monitoring stations for environmental parameters), and this was a huge help to my research, since I didn’t have to spend my own limited resources and time collecting the same data. Some drawbacks I see are publishing information about communities or areas without the permission of the people directly involved, and that data then being used against them (for example, by insurance companies or other big corporations). Additionally, scientists may hesitate to publish their data in case someone else beats them to an analysis they wanted to do in the future.
-
Open data is valuable in several ways. Foremost, it supports:
Making data 'open' involves proactive effort on the part of the data creator. According to the FAIR Data Principles, there are certain steps that make data more 'open'.
Potential drawbacks to open data include:
-
I have a friend who is a member of the Mohawk tribe. She has one name: not a first name and a last name, just her name. She adds nfm (no first name) to her name when filling out online forms that require something in every name field. So FAIR principles, which have good intentions, can be at odds with cultural differences. I think having a DOI and descriptive metadata for your data is very helpful, but I thought the RadioLab episode really highlighted the difference between sharing information directly between people and having it filtered through a computer. I’m not anti-technology (obviously, since I’m taking this class), but I think we need to broaden our ideas about open data. Unfortunately, I’m pretty new to all this, so I don’t have any practical suggestions.
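The single-name example points at a concrete schema-design issue: many forms and databases require separate first- and last-name fields, which forces workarounds like "nfm". One possible alternative, sketched minimally in Python with a hypothetical record type, is to store the name exactly as given and treat any split into parts as optional.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Person:
    name: str                         # the name exactly as the person gives it
    given_name: Optional[str] = None  # optional parts, filled only when they apply
    family_name: Optional[str] = None

def from_two_field_form(first: str, last: str) -> Person:
    """Convert input from a hypothetical two-field web form without inventing placeholders."""
    parts = [p.strip() for p in (first, last) if p.strip()]
    if not parts:
        raise ValueError("no name supplied")
    if len(parts) == 1:
        # One name only: keep it whole and don't guess whether it is "given" or "family".
        return Person(name=parts[0])
    # Assumes the form's (Western) given-then-family order; other orders exist.
    return Person(name=" ".join(parts), given_name=parts[0], family_name=parts[1])

print(from_two_field_form("", "Example"))
# Person(name='Example', given_name=None, family_name=None)
print(from_two_field_form("Ada", "Lovelace"))
```

This doesn't address the bigger point about sharing between people versus through computers, but it shows how a data model can avoid forcing someone to misstate their own name.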
-
One area of open data I have experience with is the Federal Geographic Data Committee (FGDC), and I have a friend who is working on FGDC-compliant metadata for the USDA Economic Research Service.
-
Open data is important because it makes data available for everyone to use. The data are easily accessible, which reduces the restrictions people may run into when running their analyses. As a scientist, I rely heavily on open data for many of my projects and applications, and it gives me the confidence to build a workflow because I know the data are likely to be available. However, as with any data, open data must be reliable and come from credible sources. Therefore, more work may be needed to make sure the data you are using are accurate and that the metadata point to trusted sources.
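One small, concrete piece of that extra work is confirming that the file you downloaded is byte-for-byte the file the provider published. Many data providers list a checksum next to each file; assuming the provider publishes a SHA-256 value (some use MD5 or other algorithms instead), a minimal check in Python looks like this.

```python
import hashlib
import os
import tempfile

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_checksum(path, expected_hex):
    """True if the local file's digest equals the checksum the provider published."""
    return sha256_of(path) == expected_hex.strip().lower()

# Usage: a demo file containing b"abc", whose SHA-256 is a well-known test vector.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
demo_path = tmp.name
print(matches_published_checksum(
    demo_path, "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"))  # True
os.remove(demo_path)
```

A matching checksum only shows the file wasn't corrupted or altered after publication; whether the data were collected well is still a question for the metadata and QA/QC documentation.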
-
Reading about the FAIR principles, I can begin to see how open data and open science have the potential to usher in a new age of innovation. If data scientists follow principles like these to make their data findable, accessible, interoperable, and reusable, the whole system becomes more streamlined. Furthermore, without some kind of system like this, the internet would become an impenetrable ocean of disorganized information. FAIR systematizes how data can be acquired and manipulated to produce meaningful results, which goes a long way toward breaking that data ocean into meaningful components. CARE addresses several issues that come up when working with Indigenous knowledge. In some ways these resemble ownership issues (for example, the reason you should cite an academic paper when you write about it). Indigenous knowledge may have been painstakingly built over generations, and as such it is owned by those groups of people. There is also the question of the impact of releasing Indigenous information and the uses, intended or unintended, to which it may be put. For example, releasing knowledge about a particular wild medicinal plant might cause it to be overharvested. In general, it’s important to approach any work involving Indigenous data with cultural sensitivity and a willingness to understand and respect data sovereignty.
-
Open data is important because it promotes the reuse of data in ways that allow both the metadata and the data to be replicated or combined in different settings. Some ways I can make my data more open and reusable include: assigning a globally unique and persistent identifier; providing generous, extensive metadata; registering or indexing the (meta)data in a searchable resource; using a standardized communications protocol; explaining the conditions under which the data are accessible; using a broadly applicable language for knowledge representation; including qualified references to other (meta)data; and taking care to choose a repository that supports the FAIR Principles. While doing this reading, I reflected on my own work so far. One example I will call myself out on is ‘(meta)data include qualified references to other (meta)data,’ because I question whether my First Map post meets it. I included a section from a paper I wrote as part of my description (citing myself), but that paper was never published, so I would be referencing something that others cannot find and that doesn’t have a globally unique persistent identifier, which wouldn’t meet the FAIR data principles. (Now I’m wondering whether I should change this or not quote myself.) The CARE Principles are also important to take into account when trying to make my data more open and reusable. They also highlight drawbacks of open data that are in many ways similar to problems in art and archaeology, fields that have failed, and in many ways continue to fail, to respect the authority of Indigenous Peoples to control their own data (art, artifacts, etc.) and, where use is permitted under certain conditions, to make sure that use supports Indigenous Peoples’ self-determination and collective benefit. Although the CARE Principles are specifically about data, I think their main points cut across industries.
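The self-check above (does my post include qualified references that others can actually find?) can be turned into a tiny script. Below is a minimal sketch, assuming metadata kept as a Python dictionary with hypothetical field names; each check loosely mirrors one FAIR sub-principle (F1, F2, R1.1, I3), and the 50-character threshold for "rich" metadata is an arbitrary placeholder, not part of the principles.

```python
# Hypothetical checks loosely keyed to FAIR sub-principles; field names and the
# 50-character threshold are illustrative assumptions, not part of FAIR itself.
CHECKS = {
    "F1 persistent identifier": lambda m: bool(m.get("identifier")),
    "F2 rich metadata": lambda m: len(m.get("description") or "") >= 50,
    "R1.1 usage license": lambda m: bool(m.get("license")),
    "I3 qualified references": lambda m: all(
        ref.get("identifier") for ref in m.get("references", [])
    ),
}

def fair_gaps(metadata):
    """Return the names of the checks a metadata record fails."""
    return [name for name, check in CHECKS.items() if not check(metadata)]

post = {
    "identifier": None,  # not yet deposited anywhere that issues a persistent ID
    "description": "Short description of the map, its data sources, and how it was produced.",
    "license": "CC-BY-4.0",
    # A citation to an unpublished paper: there is no identifier others can resolve.
    "references": [{"title": "Unpublished course paper", "identifier": None}],
}
print(fair_gaps(post))  # ['F1 persistent identifier', 'I3 qualified references']
```

The unpublished self-citation fails the I3 check for the same reason described in the post: nothing about it can be found or resolved by someone else.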
Many of you may already be aware of the Denver Art Museum’s recent news coverage from earlier this year (Denver Art Museum Reparation Requests). There’s also an article in the Denver Post, but you have to be a member to access it (Denver Post Article). In contrast, History Colorado actively works with over 51 sovereign nations in its preservation work and has a Land Acknowledgement and Anti-Racism Grounding Virtues. I interact with History Colorado quite frequently, so I may be biased, but they do great work respecting the sovereignty of Indigenous Peoples’ rights to recorded stories, artifacts, art, etc., and making sure these things are used ethically and for collective benefit.
-
Why is open data important? Data are only useful and powerful once they have been interpreted or used. Making data open gives that power of interpretation and use to everyone, not just a single person or entity. What are some ways that you as a data manager or scientist can make your data more open and reusable? By following the FAIR principles, most likely through an already standardized data-sharing method. One way my team does this is by using the ICARTT file format. What are some potential drawbacks of open data? Poor, or even malicious, use of data is possible. For example, geolocation data (even when anonymized) could be used to identify people. Specialized datasets may also be misinterpreted or misunderstood, leading to conclusions that are not robust.
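Part of what makes a standardized format like ICARTT easy to share is that it is plain, comma-delimited text whose first line states how many header lines there are, so it can be read without special software. A minimal sketch in Python, assuming the FFI 1001 layout in which line 1 is "NLHEAD, FFI" and the last header line holds the variable names; it ignores the scale factors and missing-value flags a real reader must apply, and the toy header below is truncated and would not pass a real ICARTT validator.

```python
import csv

def read_icartt_1001(text):
    """Minimal reader for an ICARTT FFI 1001 file given as a string.

    Assumes line 1 is "NLHEAD, FFI" and the last header line lists the variable
    names. Scale factors and missing-value flags are NOT applied in this sketch.
    """
    lines = text.splitlines()
    nlhead, ffi = (int(x) for x in lines[0].split(",")[:2])
    if ffi != 1001:
        raise ValueError(f"expected FFI 1001, got {ffi}")
    names = [n.strip() for n in lines[nlhead - 1].split(",")]
    rows = []
    for record in csv.reader(lines[nlhead:]):
        if record:  # skip blank lines
            rows.append({n: float(v) for n, v in zip(names, record)})
    return names, rows

# Toy file: a real ICARTT header is much longer than these 4 lines.
demo = "\n".join([
    "4, 1001",
    "PI_LAST, PI_FIRST",
    "Placeholder header line",
    "Time_Start, O3",
    "0, 41.2",
    "60, 42.0",
])
names, rows = read_icartt_1001(demo)
print(names)  # ['Time_Start', 'O3']
print(rows)   # [{'Time_Start': 0.0, 'O3': 41.2}, {'Time_Start': 60.0, 'O3': 42.0}]
```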
-
Open data are very useful for scientific reproducibility and for research opportunities for young coders, students, and even early-career PhDs trying to publish papers in order to prepare stronger grant applications. However, numerous types of data cannot be shared, including most health data and some social science data, because of privacy restrictions and confidentiality guarantees. Researchers should think about open data when they are collecting the initial data, to make sure the right permissions, approvals, and documentation are in place. GitHub and personal websites are good places to post data files in common (interoperable) formats. I am less sure about the persistent global identifier and registration piece. There are probably many options for these that might be field-specific, which would limit their potential reach. I’ve been a data analyst for a long time and am unfamiliar with any "global" repositories for datasets, or with what incentives there are (beyond altruism) for going through all of the steps to make data "FAIR".
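On the identifier piece: a DOI is one widely used kind of persistent identifier, and it resolves through https://doi.org/ no matter which field or repository issued it. The sketch below normalizes a few common ways DOIs are written and builds the resolver link; the regular expression is a heuristic modeled on one Crossref has suggested for matching modern DOIs, so it will miss some older or unusual ones, and the example DOI is hypothetical.

```python
import re

# Heuristic pattern for modern DOIs ("10." + registrant code + "/" + suffix).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)
PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:")

def normalize_doi(value):
    """Strip a common prefix and return the bare DOI, or None if it doesn't look like one."""
    v = value.strip()
    for prefix in PREFIXES:
        if v.lower().startswith(prefix):
            v = v[len(prefix):]
            break
    return v if DOI_PATTERN.match(v) else None

def resolver_url(value):
    """Return the https://doi.org/ link for a DOI-like string, or None."""
    doi = normalize_doi(value)
    return f"https://doi.org/{doi}" if doi else None

print(resolver_url("doi:10.1234/example.5678"))  # https://doi.org/10.1234/example.5678
print(resolver_url("not a doi"))                 # None
```

Checking syntax says nothing about whether the DOI is actually registered; only resolving it does that.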
-
Open data allows anyone to access it. This improves the transparency of the science involved by giving the opportunity to anyone to reproduce results or to ask related questions without having to collect data themselves. It may also give more credibility to the dataset itself, because more people are able to look for flaws in the data or in the collection process.
-
FAIR is about making sure others can navigate to and use data. Navigation involves being able to find something useful and to access it on another device (interoperable across all computers or web devices). Beyond the data themselves, it involves making sense of data by, at the very least, reproducing what was done earlier. CARE is about respect and trust. It was developed through an Indigenous lens but is more broadly applicable, in particular to making information (data, metadata, code, etc.) available to all regardless of expertise. This includes explaining what is there in plain language, with reference to authorship, ownership, and the like, and with particular attention to those who may not have a voice, including non-human relations and place: plants, animals, water, land, and air. Both FAIR and CARE are valuable mindsets for planning and designing digital data resources and code/tool collections. They both imply careful attention to the … FAIR and CARE are limited in showing one how to actually implement them. A variety of engineering …
-
What are some ways that you as a data manager or scientist can make your data more open and reusable?
-
Overall, the FAIR and CARE principles offer complementary approaches to data management and sharing: FAIR focuses on the technical aspects, and CARE addresses the ethical and social considerations. Implementing both can help ensure that data are managed and used in a technically sound and socially responsible way. The CARE principles lack political teeth in many ways, but they are a start, giving tribes and the universities they partner with a platform to build from.
-
For the discussion board this week on open data, go ahead and check out:
Some questions to get you started:
As an extra, I recommend listening to this RadioLab episode called NULL. Note that you DO NOT NEED A DISCUSSION POST ON THIS. One of the things we're learning about is the difference between how computers and humans interpret things, and this episode is a funny and accessible entry point to these concepts. Think about these examples as you run into your own errors, perhaps from not quoting things that need to be quoted, quoting things that shouldn't be quoted, or putting dashes into variable names (Python interprets a dash as a minus sign!).
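Both kinds of errors mentioned above can be seen directly in Python. This minimal sketch uses the standard-library `ast` module to show how Python reads a dashed "variable name," and a hypothetical cleaning rule to show how treating the text "NULL" as a missing-value marker can silently erase a real value.

```python
import ast

def reads_as_subtraction(expr):
    """True if Python parses `expr` as a subtraction rather than a single name."""
    body = ast.parse(expr, mode="eval").body
    return isinstance(body, ast.BinOp) and isinstance(body.op, ast.Sub)

def is_valid_statement(source):
    """True if `source` is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True

print(reads_as_subtraction("site-id"))    # True: parsed as `site` minus `id`
print(is_valid_statement("site-id = 5"))  # False: can't assign to an expression
print(is_valid_statement("site_id = 5"))  # True

# Hypothetical cleaning rule: treat these strings as "missing" (case-insensitive).
MISSING_TOKENS = {"", "NA", "N/A", "NULL"}

def clean(value):
    return None if value.strip().upper() in MISSING_TOKENS else value

print(clean("Null"))   # None: a surname "Null" disappears
print(clean("Smith"))  # Smith
```

Real tools make similar choices; pandas' `read_csv`, for instance, treats "NULL" and "null" as missing by default unless `keep_default_na=False` is passed.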