Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ssl_taxoname vs. most recent entry in petaxhistory #8

Open
dylanbeaudette opened this issue Sep 14, 2017 · 7 comments
Open

ssl_taxoname vs. most recent entry in petaxhistory #8

dylanbeaudette opened this issue Sep 14, 2017 · 7 comments

Comments

@dylanbeaudette
Copy link
Member

What is the logic used to generate ssl_name in the current snapshot, and when should the most recent record in the petaxhistory table be used instead?

@dylanbeaudette
Copy link
Member Author

dylanbeaudette commented Sep 14, 2017

select count(distinct pedlabsampnum) from site;
 count
-------
 64153

select count(distinct pedlabsampnum) from nasis_site;
 count
-------
 80462

Reasons, as suggested by Henry F.

  • RaCA (most)
  • IA, MN, TX: preparing pedon objects done, but data not in repository
  • junk in ID

@dylanbeaudette
Copy link
Member Author

dylanbeaudette commented Sep 14, 2017

How many pedlabsampnum values are present in one but not the other table?

3135 rows

SELECT DISTINCT site.pedlabsampnum as kssl_plsn, nasis_site.pedlabsampnum as nasis_plsn
FROM kssl.site 
LEFT JOIN kssl.nasis_site USING (pedlabsampnum)
WHERE nasis_site.pedlabsampnum IS NULL;

19444 rows

SELECT count(DISTINCT nasis_site .pedlabsampnum)
FROM kssl.site 
RIGHT JOIN kssl.nasis_site USING (pedlabsampnum)
WHERE site.pedlabsampnum IS NULL;

@dylanbeaudette
Copy link
Member Author

How many records in the KSSL snapshot have a SSL_name that does not = latest record in the petaxhistory table?

about 1500

SELECT pedon_key, nasis_site.pedlabsampnum, nasis_site.peiid, site.taxonname, nasis_taxhistory.taxonname as tn, 
kssl_to_nasis_sanity.labdatadescflag, kssl_to_nasis_sanity.objwlupdated
FROM kssl.site
JOIN kssl.kssl_to_nasis_sanity ON site.pedlabsampnum = kssl_to_nasis_sanity.pedlabsampnum
JOIN kssl.nasis_site ON kssl_to_nasis_sanity.peiid = nasis_site.peiid
JOIN kssl.nasis_taxhistory ON kssl_to_nasis_sanity.peiid = nasis_taxhistory.peiid
WHERE nasis_taxhistory.taxonname != site.taxonname;

A single example:

SELECT pedon_key, nasis_site.pedlabsampnum, nasis_site.peiid, site.taxonname, nasis_taxhistory.taxonname as tn, 
kssl_to_nasis_sanity.labdatadescflag, kssl_to_nasis_sanity.objwlupdated
FROM kssl.site
JOIN kssl.kssl_to_nasis_sanity ON site.pedlabsampnum = kssl_to_nasis_sanity.pedlabsampnum
JOIN kssl.nasis_site ON kssl_to_nasis_sanity.peiid = nasis_site.peiid
JOIN kssl.nasis_taxhistory ON kssl_to_nasis_sanity.peiid = nasis_taxhistory.peiid
WHERE nasis_taxhistory.taxonname != site.taxonname
AND site.taxonname = 'rescue';
 pedon_key | pedlabsampnum | peiid  | taxonname |   tn   | labdatadescflag |    objwlupdated
-----------+---------------+--------+-----------+--------+-----------------+---------------------
     52979 | UCD7305066    | 533246 | Rescue    | Sierra | No              | 2016-06-24 20:01:56
(1 row)

@dylanbeaudette
Copy link
Member Author

Records in kssl.site that have no correspdonding record in kssl.nasis_site OR kssl.nasis_taxhistory.

3115 rows

SELECT pedon_key, site.pedlabsampnum, site.taxonname, 
nasis_site.pedlabsampnum, nasis_site.peiid, nasis_taxhistory.taxonname as nasis_tn, 
kssl_to_nasis_sanity.labdatadescflag, kssl_to_nasis_sanity.objwlupdated
FROM kssl.site
LEFT JOIN kssl.kssl_to_nasis_sanity ON site.pedlabsampnum = kssl_to_nasis_sanity.pedlabsampnum
LEFT JOIN kssl.nasis_site ON kssl_to_nasis_sanity.peiid = nasis_site.peiid
LEFT JOIN kssl.nasis_taxhistory ON kssl_to_nasis_sanity.peiid = nasis_taxhistory.peiid
WHERE nasis_site.peiid IS NULL
AND nasis_taxhistory.peiid IS NULL;

@dylanbeaudette
Copy link
Member Author

Some ideas on how to publish a simpler representation in this related issue.

@dylanbeaudette
Copy link
Member Author

dylanbeaudette commented Apr 19, 2018

Update:

There is a new table that makes a reasonable linkage between KSSL -- > NASIS site/pedon/taxa:

--
-- enforce sanity when connecting KSSL "site" to NASIS data
-- create a new table that ensures 1:1 between labsampnum:peiid AND excludes bogus pedlabsampnum (e.g. ".", "none", "None")
--
DROP TABLE kssl_to_nasis_sanity;
CREATE TABLE kssl_to_nasis_sanity AS
SELECT DISTINCT ON (pedlabsampnum) pedlabsampnum, peiid, labdatadescflag, objwlupdated
FROM nasis_site
WHERE pedlabsampnum IS NOT NULL
AND pedlabsampnum NOT IN ('.', '-', 'none', 'None', '1', 'Unknown', '5', '6', '3')
-- pedon precidence: lab pedon = YES, latest modification date, highest peiid
ORDER BY pedlabsampnum, labdatadescflag DESC, objwlupdated DESC, peiid DESC;

However, this will implicitly exclude records in the KSSL data that:

  • have no corresponding NASIS data
  • are not linked to NASIS/site/pedon objects via pedlabsampnum and labdatadescflag flag checked

The orphaned KSSL records still have limited site / spatial / taxa information that is worth retaining.

A good example, pedon_id = 'S2011MI041003'.

# fine this record in NASIS site data: OK
select peiid, site_id, pedon_id, pedlabsampnum, labdatadescflag, taxonname from nasis_site join nasis_taxhistory using (peiid) where site_id = 'S2011MI041003';
 peiid  |    site_id    |   pedon_id    | pedlabsampnum | labdatadescflag | taxonname
--------+---------------+---------------+---------------+-----------------+-----------
 586039 | S2011MI041003 | S2011MI041003 |               | No              | Rubicon

# find this record in KSSL site: OK
select pedon_id, taxonname, pedlabsampnum from site where pedon_id = 'S2011MI041003';
   pedon_id    | taxonname | pedlabsampnum
---------------+-----------+---------------
 S2011MI041003 | Rubicon   | 12N7863
(1 row)

# this record is not in our sanity-checking table
# because: pedlabsampnum is not populated in NASIS!! (checked 2018-04-19)
select * from kssl_to_nasis_sanity where pedlabsampnum = '12N7863';
 pedlabsampnum | peiid | labdatadescflag | objwlupdated
---------------+-------+-----------------+--------------
(0 rows)

Next steps:

  • add these missing records to the kssl_to_nasis_sanity table
  • ensure that all queries via taxonname are utilizing this table:
    • fetchKSSL and downstream
    • lab data table in SDE
    • KSSL points in SEE
    • use x,y coordinates from NASIS site vs. KSSL site

@dylanbeaudette
Copy link
Member Author

dylanbeaudette commented May 4, 2018

Another example, user pedon / site ID 79CA000003, pedlabsampnum 80P0159, pedon_key 7746.

Basic taxonomic data in repository (kssl.site), skeleton site in NASIS with no taxonomic data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant