Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for different join types #84

Open
jpacik opened this issue Oct 2, 2018 · 30 comments
Open

Add support for different join types #84

jpacik opened this issue Oct 2, 2018 · 30 comments

Comments

@jpacik
Copy link
Contributor

jpacik commented Oct 2, 2018

Currently join implements the inner join operation. Join needs to support the following types of joins as well:

  • cross product join
  • left outer join
  • right outer join
  • full outer join
@jpacik jpacik added the enhancement New feature or request label Oct 2, 2018
@mark-rushakoff
Copy link
Contributor

Tasks could really use some of the other joins for retrieving task run logs. I think we can work around it in Go, but it would be a lot simpler if we could do it in straight Flux.

@lovasoa
Copy link

lovasoa commented Dec 20, 2018

There is a discrepancy between the documentation and the actual product.

The documentation says:

method

The method used to join. Defaults to "inner".
Data type: String
Possible Values:

  • inner
  • cross
  • left
  • right
  • full

But when using anything other than inner, we get an error, such as

error calling function "join": left is not a valid join type

I think the documentation should be updated to reflect the fact that only inner is currently supported.

@legomind
Copy link

Any chance this issue will get some attention in the near future? A left join would be very handy for my use case.

@jpacik
Copy link
Contributor Author

jpacik commented Mar 19, 2019

@legomind I'll try and get this prioritized for you. We currently have some other things in progress at the moment, but I'll let you know when we start work on this.

@nathanielc
Copy link
Contributor

@legomind Could you share your use case? Seeing an example Flux script of what you are trying to do along with the data schema you are using would be very helpful. Even if the Flux script doesn't work, just seeing how you would expect it to would is great.

@legomind
Copy link

@jlapacik Thanks that would be awesome.
@nathanielc My use case is pretty simple, but highly specific. I have two energy meters one at the circuit breakers and one attached to a dehumidifier. The circuit breaker energy monitor reports a measurement for most of the breakers as well as the total house use. It also reports the difference between all the monitored individual breakers (all combined) and the total usage, leaving me with the combined energy use of the remaining, unmonitored breakers (called 'other' in this example).

My second energy monitor is attached directly to the dehumidifier and the measurements are recorded to another db.

My goal is to take subtract the dehumidifiers energy use from the 'other' measurement.
The following script works fine for me until the dehumidifier in turned off, then of course, there is nothing to match to, and the rest of the graph data is omitted.

I am very new influxdb and any suggestions or improvements will be welcome.

other_total = () => {
	other = from(bucket: "iotawatt/autogen")
  		|> range($range)
  		|> filter(fn: (r) => r._measurement == "other_total" and r._field == "value")
  		|> aggregateWindow(every: $__interval, fn: mean)
    	|> keep(columns: ["_value", "_time"])
  
	d = from(bucket: "home_assistant")
 		|> range($range)
 	 	|> filter(fn: (r) => r._measurement == "switch.living_room_dehumidifier" and r._field == "power_consumption")
 	 	|> aggregateWindow(every: $__interval, fn: mean)
  	 	|> keep(columns: ["_value", "_time"])
  
	return join(tables: {o:other, d:d}, on: ["_time"])
		|> map(fn: (r) => ({
        	_time: r._time,
        	_value: r._value_o - r._value_d,
                _measurement: "Other",        }))
}

other_total() |> yield(name: "Other")

@jpacik
Copy link
Contributor Author

jpacik commented Mar 20, 2019

@legomind so if the dehumidifier is turned off for a period of time, currently join would not produce any rows for that time range. But you would like it to still produce rows for the the other measurement, even if the dehumidifier is turned off? Am I understanding that correctly?

@legomind
Copy link

Correct.

@metareason
Copy link

I also got tripped up by this one trying to use left join with a very similar use case to @legomind - in my case I am trying to drop one series to 0 when another 'cutter' series is > 100, and am getting infrequent data from the 'cutter' series and would like to try with a left join.

// join(tables: {tt2:tt2, pt:pt}, method: "left") // error in evaluating AST while starting program: error calling function ""map"": error calling function ""join"": left is not a valid join type
join(tables: {tt2:tt2, cut:cut}, on: ["_time", "_stop", "_start"])
  |> map(fn: (r) => ({
    _time: r._time,
    _value: if r._value_cut >= 100.0 then 0.0 else r._value_tt2
  }))

NOTE: I am also fighting with combinations of
|> aggregateWindow(every: 1m, fn: mean, createEmpty: false)
vs
|> aggregateWindow(every: 1m, fn: mean, createEmpty: true)

and using |> fill(column: "_value", usePrevious: true) to try and fill in the cutter series so I can use the inner join but seems like that is not working as expected yet.

@legomind
Copy link

@metareason Here is to hoping this is addressed soon with the addition of left outer joins.

@legomind
Copy link

legomind commented Oct 8, 2019

Just wondering if there was any movement on this. I am more than willing to test if needed.

@jpacik
Copy link
Contributor Author

jpacik commented Oct 8, 2019

We will be addressing join's deficiencies at some point in the future. In the meantime we have added a new join function to the internal/promql package with a type signature that more closely reflects what we want for the current join.

Note, promql.join is a specialized inner join function that can also perform left and right semi joins as well. Here's the doc https://github.com/influxdata/flux/blob/master/stdlib/internal/promql/join.md.

@jimirocks
Copy link

Any chance for left joins in the near future? This lack of functionality is basically showstopper for some of my usecases.Since when there is really no data in requested range, even fill function can't help

@jimirocks
Copy link

I am also not able to use promql.join the API says unknown import path: ""internal/promql""" "

@darinfisher
Copy link

I recently had a use case where there are two similar streams, each containing at least a URL as a key. Some of the URLs are in both streams and other are in only one.
I tried to join() these and the resulting stream contained only the URLs that appeared in both streams, an inner join.
I needed the list to contain all of the URLs from both streams, an outer join.

@lavensj
Copy link

lavensj commented Aug 24, 2021

Hi there, I've recently elaborated further on this topic concerning Joining in the Flux query language. In the article below you can find a tutorial on how to perform a (full) outer join in flux applied on a realistic use case accompanied with some side notes about the previous lack of a decent join implementation: https://dzone.com/articles/how-to-pivot-and-join-time-series-data-in-flux

@DLCrosby
Copy link

DLCrosby commented Sep 23, 2021

This is for Flux (InfluxDB V2)

In the UI, documentation on join() function includes:

method:String
The method used to join. Possible values are: inner, cross, left, right, or full. Defaults to "inner".

However when I set method to left, I receive an error that left is not a valid join type

"error calling function "join" @20:1-23:2: left is not a valid join type"

Seems this has been an issue for a couple years now. Will this ever be resolved? I really need to be able to perform a left join - part of my POC of using InfluxDB.

@lavensj
Copy link

lavensj commented Sep 27, 2021

Hi @DLCrosby,
Checkout my comment above referring to the article on Dzone. At the end of the article, you can find a solution on how to perform an outer join (which will give you a similar result as you are looking for when using a left join).

@DLCrosby
Copy link

@lavensj I will look at that again, however the issue remains - Documentation shows other joins besides "inner", however only "inner" works. When will the product be fixed to support additional join types? Given this was first brought up years ago, it is supremely disappointing that the only answer is a complex workaround.

@a-vogel-tappert
Copy link

Hi, any plans to implement outer joins short termed?
Using unions, sorts, limits, aso are very ressource consuming operations to get a result.
Thanks in advance.

@ivalkenburg
Copy link

This would solve so many problems im having

@MihaelBercic
Copy link

4 years after and still not implemented?

@Loeni01
Copy link

Loeni01 commented Apr 6, 2022

Are there any plans to implement other types of joins? When will it be done? I'm looking very much forward to this feature.

@sanderson
Copy link
Contributor

@Loeni01 Work is happening on adding new join types right now. I don’t have an exact timeframe for when they’ll be available, but it will be soon. They will be a part of a new join package. Here’s the Initial spec for the package: #4629

@nathanielc
Copy link
Contributor

Thanks @sanderson for the update, you can follow our progress on this issue here where we have broken out the various steps #3459

@Elleanora
Copy link

Elleanora commented Apr 27, 2022

Until the full join functionality is implemented, a good workaround is to use the combination of the union() and pivot() functions.

Add the following code to final output stream for any table to be used in the downstream union operation:

table_1 = from(bucket: "some_bucket_1")
|> range(start: -1d)
|> map(fn: (r) => ({
join_column_name: r.join_column_name,
_field: "choose_new_column_name_1",
_value: column_value_1
}))
|> group()

table_2 = from(bucket: "some_bucket_2")
|> range(start: -1d)
|> map(fn: (r) => ({
join_column_name: r.join_column_name,
_field: "choose_new_column_name_2",
_value: column_value_2
}))
|> group()

union(tables: [table_1, table_2])
|> group()
|> pivot(rowKey:["join_column_name"], columnKey: ["_field"], valueColumn: "_value")
|> group()

Copy link

This issue has had no recent activity and will be closed soon.

@legomind
Copy link

Keep open

@sanderson
Copy link
Contributor

@legomind Flux supports Inner, right, left, and full outer joins with the join package. The only join type listed in this issue that isn't supported is a cross join. Is there a reason you'd like to keep this issue open?

@legomind
Copy link

Just my ignorance. Lol. Had no idea about that package. Close away...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests