requery_hier not returning the same number of rows anymore in 2.2.0 #247

Open
FredrikKarlssonSpeech opened this issue Apr 12, 2021 · 3 comments

Comments

@FredrikKarlssonSpeech

I just noticed an issue with the 2.2.0 update and requery_hier.

This is what used to happen:

> library(dplyr)
> library(emuR) #Loads 2.1.1
> library(tidyr)
> 
> 
> load_emuDB(file.path("..","Data","GU_emuDB")) -> gu
> query(gu,"[CV = C|V ^ Task=pa|ta|ka]") -> patakaC_V
> requery_hier(gu,patakaC_V,level="Syllable",collapse=FALSE) -> patakaCVSylls
> dim(patakaC_V)
[1] 34293    16
> dim(patakaCVSylls)
[1] 34293    16

Now, if I update to version 2.2.0 of the package, I no longer get the expected behavior:

> remove.packages("emuR")
Removing package from ‘/Library/Frameworks/R.framework/Versions/4.0/Resources/library’
(as ‘lib’ is unspecified)
> install.packages("emuR")
trying URL 'https://cran.rstudio.com/bin/macosx/contrib/4.0/emuR_2.2.0.tgz'
Content type 'application/x-gzip' length 2824883 bytes (2.7 MB)
==================================================
downloaded 2.7 MB


The downloaded binary packages are in
	/var/folders/vc/lhvg_40x50l3nb3rndb4kwbm0000gp/T//RtmpLqlYhm/downloaded_packages

> library(emuR)

Attaching package: ‘emuR’

The following object is masked from ‘package:base’:

    norm

> requery_hier(gu,patakaC_V,level="Syllable",collapse=FALSE) -> patakaCVSylls
Warning message:
In requery_hier(gu, patakaC_V, level = "Syllable", collapse = FALSE) :
  Length of requery segment list (17181) differs from input list (34293)!
> dim(patakaCVSylls) # Just to check...
[1] 17181    16
@raphywink

OK, this would not be good... I spent about 100 hours rewriting almost the entire query engine precisely to fix issues like this. Could you send me a reprex with the output you'd expect from the query, and also confirm that the old result was correct?

@raphywink

I fixed something in the requery that accidentally got rid of duplicate segments in certain queries. Could you check whether the current dev version (2.2.0.9000) fixes the issue? If so, I'll try to release a new version of emuR ASAP...

@FredrikKarlssonSpeech
Author

Sorry, I did not see your previous message, but I have installed a

> requery_hier(gu,patakaC_V,level="Syllable",collapse=FALSE) -> patakaCVSylls
Warning message:
In requery_hier(gu, patakaC_V, level = "Syllable", collapse = FALSE) :
  Length of requery segment list (17181) differs from input list (34296)!
> patakaCVSylls %>%
+     distinct() -> patakaSylls
> #requery_hier(gu,patakaSylls,"Task") -> patakaSyllTasks #Not needed?
> requery_hier(gu,patakaC_V,"Task") -> patakaCVSyllTasks
Warning message:
In requery_hier(gu, patakaC_V, "Task") :
  Found missing items in resulting segment list! Replaced missing rows with NA values.
> requery_hier(gu,patakaC_V,level="Syllable",collapse=FALSE) -> patakaCVSylls
Warning message:
In requery_hier(gu, patakaC_V, level = "Syllable", collapse = FALSE) :
  Length of requery segment list (17181) differs from input list (34296)!
> nrow(patakaCVSylls)
[1] 17181
> patakaCVSylls %>%
+     distinct() -> patakaSylls
> nrow(patakaSylls)
[1] 17181
> requery_hier(gu,patakaC_V,"Task") -> patakaCVSyllTasks
Warning message:
In requery_hier(gu, patakaC_V, "Task") :
  Found missing items in resulting segment list! Replaced missing rows with NA values.
> nrow(patakaCVSyllTasks)
[1] 34296

So it seems that the issue remains. I would expect the introduced NAs, since the linking is off in 4 instances, but requery_hier seems to return a list of unique segments.

patakaC_V contains only C and V segments, which belong to syllables in pairs. So patakaCVSylls should predominantly contain two identical rows for each syllable (except in the cases where this pairing does not hold).

So, this code:

> nrow(patakaCVSylls)
[1] 17181
> patakaCVSylls %>%
+     distinct() -> patakaSylls
> nrow(patakaSylls)
[1] 17181

should actually not return the same result: the first nrow() should be 34296.
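To make the expected row counts concrete, here is a minimal base-R sketch on hypothetical toy data. It does not call emuR, and the data frames and column names (`cv`, `syllables`, `syllable_id`) are made up for illustration. It models what a length-preserving requery with `collapse = FALSE` is expected to return: one parent (syllable) row per input (C or V) row. Duplicates therefore survive until `unique()` (or dplyr's `distinct()`) removes them.

```r
# Hypothetical toy input: two syllables ("pa", "ta"), each linked to one C and one V.
# Column names are illustrative only, not the real emuR segment-list columns.
cv <- data.frame(
  labels      = c("p", "a", "t", "a"),
  syllable_id = c(1, 1, 2, 2)
)

# Hypothetical parent level: one row per syllable.
syllables <- data.frame(
  syllable_id = c(1, 2),
  labels      = c("pa", "ta")
)

# Length-preserving "requery": look up the parent of every input row,
# keeping one output row per input row (so each syllable appears twice).
expected <- syllables[match(cv$syllable_id, syllables$syllable_id), ]
rownames(expected) <- NULL

nrow(expected)          # 4, same as nrow(cv)
nrow(unique(expected))  # 2, one row per distinct syllable
```

Under this model, deduplication only shrinks the result when the requery keeps the duplicate parent rows. If the requery already returns unique rows, `distinct()` has nothing left to remove, which matches the identical 17181 counts reported above.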
