We have never done much to remove bad records. Since they hurt first impressions, a bit of cleanup might be worth doing.
In an attempt to do so, I took all institutions created before 2018, when we started managing the data.
I removed everything that has a master source (I only remembered that midway through), as well as everything whose name includes words like university, national park, and some other stop words.
The remaining institution names were given to GPT4, which was asked to detect outliers and to err on the side of including too many rather than too few.
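For anyone wanting to repeat the pre-filtering, here is a minimal sketch of the three steps described above. It is not the script that was actually used. The record fields (`name`, `created`, `master_source`) and the function name are hypothetical, and the stop-word list contains only the two words mentioned here, since the full list is not given.

```python
from datetime import date

# Only these two stop words are named in the issue; the real list was longer.
STOP_WORDS = ("university", "national park")
# Assumption: "created before 2018" means strictly before 1 January 2018.
CUTOFF = date(2018, 1, 1)


def candidate_names(institutions, stop_words=STOP_WORDS, cutoff=CUTOFF):
    """Return the names that survive the three filters, in input order.

    Each institution is a dict with hypothetical keys:
    'name' (str), 'created' (date) and 'master_source' (str or None).
    """
    names = []
    for inst in institutions:
        # Keep only records created before we started managing the data.
        if inst["created"] >= cutoff:
            continue
        # Skip records that are synced from a master source.
        if inst.get("master_source"):
            continue
        # Skip names containing a stop word (case-insensitive substring match).
        lowered = inst["name"].lower()
        if any(word in lowered for word in stop_words):
            continue
        names.append(inst["name"])
    return names
```

The resulting list of names is what would then be passed to the model in a prompt asking for outliers.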
GPT4 response
I see a mix of different types of organizations in your list, including universities, museums, botanical gardens, research institutions, and other organizations related to science and biodiversity. From your list, these are some of the entities that might be considered as outliers, although this does not mean they are irrelevant. It entirely depends on the context and purpose of your list.
Yamanouchi Pharmaceutical Co., Ltd.: while a legitimate organization, this is a pharmaceutical company which may not fit with the theme of research institutes, museums, or nature parks.
Timpanogos Cave National Monument: These are specific national monuments or natural sites, rather than organizations or institutions, but they could still be relevant depending on the context of your list.
Joshua Tree National Monument: This is a national monument rather than a nature park or educational/scientific institution.
Marwa mokhtar Abd Rabo: This appears to be an individual's name, not an organization.
N.P.P: Without further context, this abbreviation could refer to numerous things, but it does not clearly refer to an organization fitting your categories.
Tree Nursery: A tree nursery might not fit into the categories provided as it is more related to agriculture and horticulture.
Fort Hays: This could be referring to a historic site or a university. If it's the former, it may not fit in the categories.
AGRITEC, Ltd.: This seems to be a private company, and might not fit in the categories.
FCT: This abbreviation could stand for a number of things, but without further information, it's hard to tell if it fits in your categories.
Division of Standardisation: Without more context, this could refer to various entities, but it's not clear that it belongs to your categories.
CAIRCC: This abbreviation could stand for a number of things, but without further context, it's unclear how it fits your categories.
ALGOBANK: This seems like it might be a bank or a database of some sort, but it's unclear how it fits your categories.
The North Carolina Memorial Hostital: This appears to be a hospital, which might not fit into your categories, also the spelling might be incorrect (should be Hospital).
DNA-TAX: Without more context, it's not clear what this refers to. It could be a database or tool related to genetic research.
Instituto Zimotecnico-Z: This doesn't provide enough information to clearly categorize it. Also, the word Zimotecnico does not seem to correspond to a known institution or term.
Sanofi ELF Biorecherches: This seems like it could be a pharmaceutical company or research division, which might not fit your categories.
James Cook Townsville: This could be referring to a city or a campus of James Cook University. If it's the former, it doesn't fit into your categories.
Aix-Marseille Université: This is a comprehensive university and not specifically focused on natural sciences or biology.
All Saint's Church: This appears to be a religious institution, not an educational or research institution.
YMCA Hostel: This is a type of accommodation and doesn't seem to fit with the other institutions listed.
Auroville Foundation: This organization is related to a planned international city in India and may not be directly related to natural sciences or biology.
Beth Gordon Institute: This might be more focused on personal improvement and health, depending on the specific Beth Gordon Institute referred to.
Crispus Attucks High School: This is a high school and might not belong in a list with research institutes and universities unless you're also including secondary education.
County Record Office: This seems to be more related to administration and public records than scientific research or education.
Barton-on-Humber Grammar School: This is another school that might not belong in the list if you're focusing on research institutes and universities.
Willesden Borough Council: This is a governmental entity and might not belong on a list with research institutes and universities unless you're including governmental departments in general.
Shrewsbury School: This is another school that might not belong in the list if you're focusing on research institutes and universities.
San Joaquin County: This is a county and doesn't seem to fit with the other institutions listed.
Desert: Without further context, this entry is quite ambiguous. If it's supposed to be a nature park or a research
East High School: This seems to be a regular high school, not a university or research institute.
Turvey Abbey: While an abbey could potentially house a historical collection, if it doesn't, it might not fit the theme.
St. Paul's School: Similar to East High School, this appears to be a regular school.
St. Bartholomew's Hospital: A hospital doesn't usually fall under any of your categories unless it's associated with a university or has a significant museum.
Boys' High School: Unless this high school has a special collection or is associated with a university, it might not fit the theme.
School of Medicine for Women: This could be relevant if it's part of a university, but if it's a standalone school, it might not fit your categories.
Virginia Military Institute: It might be relevant if it's conducting scientific research or has a significant museum, but otherwise, it could be considered an outlier.
Stowlangtoft Hall: Without more context, it's unclear how this fits into your categories. If it's simply a residential hall, it might be an outlier.
Stoke-on-Trent Athenaeum: An Athenaeum could house a collection or be an institution of learning, but if it's primarily a social club or event venue, it might not fit your categories.
Tristan da Cunha: This appears to be a remote island, not an institution or organization.
Pitcairn: Another remote island that might not fit your categories unless there's a significant nature reserve or research station there.
Polytechnic of North London: It's unclear if this is a university or a secondary school. If it's the latter, it might not fit the theme.
Peper Harow: This appears to be a residential estate or small village, which might not fit your categories.
Plymouth Institution and Athenaeum: Like Stoke-on-Trent Athenaeum, it could be relevant if it houses a significant collection or is an institution of learning.
Oundle School: Appears to be a secondary school, which could be an outlier.
Ellesmere Chambers: Without more context, it's unclear how this fits into your categories.
Puslinch House: Seems to be a residential location, so it could be an outlier unless it houses a significant collection.
Paisley Philosophical Institute: Philosophical societies often have libraries or collections, but if this one doesn't, it might not fit your categories.
Murray Royal Institution: This seems to be a psychiatric hospital, which could be an outlier unless it's associated with a university or has a significant museum.
Hampshire Cultural Trust: This organization is broader than just natural sciences, biodiversity, or research as it promotes all kinds of cultural activities and heritage within Hampshire, UK.
Government of South Georgia: This is a governmental organization that has a broader purpose beyond just scientific research, collection, or conservation.
Forestry Commission: This is a government department in the UK responsible for the protection and expansion of forests and woodlands. Its scope may be broader or different from the other organizations on your list.
National Oceanic and Atmospheric Administration (NOAA): This is a scientific agency within the United States Department of Commerce. It focuses on the conditions of the oceans, major waterways, and the atmosphere, which might be broader than the other organizations on your list.
The Children's Hospital at Westmead: This seems to be a clear outlier. It is a hospital dedicated to pediatric healthcare and does not fit with the theme of universities, research institutes, and collections tied to biodiversity and natural sciences.
ILSbio, LLC Biobank: A private company providing biobanking services, which is different in nature from the mainly public or nonprofit entities on your list.
DOI/FWS, Kenai National Wildlife Refuge: This is a wildlife refuge managed by the U.S. Fish and Wildlife Service, which might have a more specific or narrower focus compared to some other organizations on your list.
Taxon Expeditions: This seems to be more of a tour or travel company, with a focus on scientific expeditions, rather than an institution, museum or collection.
Afribugs: From the name, it's difficult to discern whether this is an institution, scientific collection, museum, or other. It seems to imply a collection of some sort, but without further context, it may be worth checking.
HHS/NIH, National Institutes of Health: While this is a governmental department, its focus is health and medical research, so it may not align with the natural or scientific focus of the other entities.
Microbial Carbohydrate Resource Bank: This sounds like a biotechnological resource, which might not fit the 'nature parks, museums, and universities' context perfectly.
Lamont-Doherty Core Repository, Gulf Coast Repository, Louisiana Geological Survey Core Repository: These repositories likely consist of scientific collections, but their alignment with the rest might depend on the specific focus of your study or project.
U.S. Environmental Protection Agency: This is a governmental department with a focus on the environment, but it may or may not be directly relevant depending on the context.
The Bacteriophage Bank of Korea: This is a biological resource center, and its inclusion depends on whether biological collections of this sort are relevant to your criteria.
Remember that these are just potential outliers and it's always good to check their relevance depending on your specific context or project.