Create format for Shapelets #13

Open
wants to merge 6 commits into
base: master
from

Conversation

JoshVStaden

No description provided.

@JoshVStaden
Author

@sjperkins Hi Simon. Here is the pull request as requested. So far, Tigger LSM can support shapelets with a single coefficient describing them. At this stage, we just need to change the coefficients column to support a list of coefficients. I think the idea was to think of the shapelet coefficients as a 2 x 2 matrix, and simply write out the matrix row by row in each column of the file.

An example of how I imagine this would look is something along the lines of
#format name ....... coeffs_0 coeffs_1 coeffs_2 coeffs_3 ...
J0 .... 1,2,3,4,5 5,4,3,2,1 6,7,8,9,10 10,9,8,7,6
And this would describe a shapelet coefficient matrix that looked like this:
| 1 2 3 4 5 |
| 5 4 3 2 1 |
| 6 7 8 9 10 |
| 10 9 8 7 6 |
Where it is l rows by m columns. Am I on the right track here?

@sjperkins
Member

> @sjperkins Hi Simon. Here is the pull request as requested. So far, Tigger LSM can support shapelets with a single coefficient describing them. At this stage, we just need to change the coefficients column to support a list of coefficients. I think the idea was to think of the shapelet coefficients as a 2 x 2 matrix, and simply write out the matrix row by row in each column of the file.

When you say a 2 x 2 matrix, do you perhaps mean a 2D matrix?

> An example of how I imagine this would look is something along the lines of
> #format name ....... coeffs_0 coeffs_1 coeffs_2 coeffs_3 ...
> J0 .... 1,2,3,4,5 5,4,3,2,1 6,7,8,9,10 10,9,8,7,6
> And this would describe a shapelet coefficient matrix that looked like this:
> | 1 2 3 4 5 |
> | 5 4 3 2 1 |
> | 6 7 8 9 10 |
> | 10 9 8 7 6 |
> Where it is l rows by m columns. Am I on the right track here?

That could work. There's also the option of having a single coefficient column, expressing the coefficients as a list, and parsing it with ast.literal_eval:

>>> from __future__ import print_function
>>> import ast
>>> print(ast.literal_eval("[[1,2,3],[4,5,6]]"))
[[1, 2, 3], [4, 5, 6]]

What do you think?

@o-smirnov
Contributor

Sigh, we need to write things down at meetings -- seems like everyone is on the same page when we talk, and then we go our own ways with a different mental image...

Mathematically, the coefficients are indeed a 2D matrix, of arbitrarily large size. There should be just one coefficients column ("shapelet_coeff") in the LSM, containing an arbitrary-length list. This list is mapped to the 2D matrix as follows:

If shapelet_coeff is 1,2,3,4,5,6,7,8,9,10,11,12 then the matrix is

 1  3  6 10  0
 2  5  9  0
 4  8  0
 7 12
11

and zero everywhere else. This way, you don't need a variable number of columns, and you always, unambiguously, know how many coefficients you have.
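
The mapping described above can be sketched in plain Python. This is only an illustration, not Tigger's implementation: the function name `coeffs_to_matrix` is hypothetical, and it assumes the list fills successive anti-diagonals, each one starting in the first column and moving up towards the first row.

```python
def coeffs_to_matrix(coeffs):
    """Place a flat coefficient list onto the anti-diagonals of a square matrix.

    Hypothetical helper: assumes each anti-diagonal is filled starting from
    the first column and moving up towards the first row.
    """
    # Smallest number of anti-diagonals that can hold every coefficient.
    size = 0
    while size * (size + 1) // 2 < len(coeffs):
        size += 1

    matrix = [[0] * size for _ in range(size)]
    values = iter(coeffs)
    for diag in range(size):
        for col in range(diag + 1):
            value = next(values, None)
            if value is None:
                # List exhausted: the remaining entries stay zero.
                return matrix
            matrix[diag - col][col] = value
    return matrix


for row in coeffs_to_matrix(list(range(1, 13))):
    print(row)
```

For the twelve values 1 to 12 this gives a 5 x 5 matrix with the same non-zero layout as the one written out above; the length of the list alone determines how many anti-diagonals are needed.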

@sjperkins
Member

> Sigh, we need to write things down at meetings -- seems like everyone is on the same page when we talk, and then we go our own ways with a different mental image...

Yeah, sorry. I think I had a solution in search of a problem.

@JoshVStaden
Author

@o-smirnov Hi Oleg. I have made the changes you requested to how tigger reads in the coefficients field. I have run into an issue with how the coefficients are structured and wanted to run it past you and @landmanbester before going forward.

> Mathematically, the coefficients are indeed a 2D matrix, of arbitrarily large size. There should be just one coefficients column ("shapelet_coeff") in the LSM, containing an arbitrary-length list. This list is mapped to the 2D matrix as follows:

According to my shapelet script, the coefficients are not read as a 2D matrix of arbitrary size, but, for each source, they are treated as vectors (i.e. a list of coefficients for l, and a list of coefficients for m).

Just to run it past you, if the coefficients are a 2D Matrix, would this matrix simply be the product between each element in each specific vector?

So, for example, if vec_coeffs_l is the vector holding the coefficients for l dimension, and likewise for vec_coeffs_m, and mat_coeffs is the matrix of coefficients that you described, then would it be the case that mat_coeffs[3,4] = vec_coeffs_l[3] * vec_coeffs_m[4]?

n += 1
x += n
return n

Member

You can use np.tril_indices here
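
One possible reading of this suggestion, sketched under assumptions: the pairs returned by `np.tril_indices` can be treated as (anti-diagonal, position) pairs, which gives the placement of each coefficient without an explicit counting loop. The function name `coeffs_to_matrix_np` is hypothetical, and the fill direction (first column upwards on each anti-diagonal) is assumed from the layout discussed earlier in the thread.

```python
import numpy as np


def coeffs_to_matrix_np(coeffs):
    """Vectorised anti-diagonal placement using np.tril_indices (a sketch)."""
    coeffs = np.asarray(coeffs)

    # Smallest size whose triangle holds at least coeffs.size entries.
    size = 0
    while size * (size + 1) // 2 < coeffs.size:
        size += 1

    # tril_indices walks the lower triangle row by row:
    # (0, 0), (1, 0), (1, 1), (2, 0), ...
    # Read each pair as (anti-diagonal index, position along it).
    diag, pos = np.tril_indices(size)
    diag, pos = diag[:coeffs.size], pos[:coeffs.size]

    matrix = np.zeros((size, size), dtype=coeffs.dtype)
    # Position `pos` on anti-diagonal `diag` sits at row diag - pos, column pos.
    matrix[diag - pos, pos] = coeffs
    return matrix


print(coeffs_to_matrix_np(list(range(1, 13))))
```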

@landmanbester

@JoshVStaden Yes, the basis function is assumed separable, so to get the basis function for ij you simply take the product of the basis for coeff_l[i] and coeff_m[j], but there is still only a single coefficient per 2D basis function. The ordering that Oleg is suggesting can be illustrated by the following figure:

[Figure: shapelets]

We enumerate the basis functions by zig-zagging along diagonals. For the above case, starting at the bottom left corner, we enumerate the basis functions as

[(0,0), (0,1), (1,0), (2,0), (1,1), (0,2), (0,3), (1,2), (2,1), (3,0),....]

and so on. This ordering should be implicit in the lsm format. Thus, given a 1D list with n coefficients say, you can always use the ordering to associate each coefficient with its basis function. For example, if we receive the following list of coefficients

[a0, a1, a2, a3]

then we know to reconstruct the function as

f(l, m) = a0 B0(l) B0(m) + a1 B0(l) B1(m) + a2 B1(l) B0(m) + a3 B2(l) B0(m)
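
A minimal sketch of this ordering and reconstruction, under stated assumptions: the names `zigzag_indices` and `reconstruct` are hypothetical, and `basis(n, x)` is a caller-supplied callable standing in for the 1D basis function B_n(x). The monomial used in the usage line is only a placeholder, not a shapelet basis.

```python
def zigzag_indices(count):
    """Return the first `count` (i, j) pairs, zig-zagging along anti-diagonals."""
    pairs = []
    diag = 0
    while len(pairs) < count:
        # Odd diagonals run with i increasing, even ones with i decreasing,
        # which reproduces (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), ...
        i_values = range(diag + 1) if diag % 2 else range(diag, -1, -1)
        for i in i_values:
            pairs.append((i, diag - i))
        diag += 1
    return pairs[:count]


def reconstruct(coeffs, basis, l, m):
    """Evaluate f(l, m) as the sum of a_k * B_i(l) * B_j(m) over the ordering."""
    return sum(
        a * basis(i, l) * basis(j, m)
        for a, (i, j) in zip(coeffs, zigzag_indices(len(coeffs)))
    )


print(zigzag_indices(10))
# Placeholder basis x**n, used only to exercise the bookkeeping.
print(reconstruct([1, 2, 3, 4], lambda n, x: x ** n, 2, 3))
```

Because the ordering is fixed, the length of the list is the only thing a reader needs in order to pair each coefficient with its basis function.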

Is that a bit clearer now?

@landmanbester

Oh, and I am just realising that to complete the top right corner of the square in that figure you are going to need to specify the maximum order for the 1D basis functions. In the figure, the maximum order for both l and m is 4. If the maximum order for l and m are not the same you end up with a rectangle instead of a square but I am not sure if that will ever happen in practice. Maybe we can just add an order parameter to the lsm and assume it is always square. @o-smirnov what do you think?
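
To make the role of the order parameter concrete, here is a hedged sketch of the same zig-zag enumeration clipped to a square. The name `zigzag_square` is hypothetical, and it assumes the maximum order is inclusive, so orders run from 0 to `max_order` in both l and m.

```python
def zigzag_square(max_order):
    """All (i, j) with 0 <= i, j <= max_order, in zig-zag anti-diagonal order."""
    pairs = []
    for diag in range(2 * max_order + 1):
        # Clip each anti-diagonal so both indices stay inside the square.
        lo = max(0, diag - max_order)
        hi = min(diag, max_order)
        i_values = range(lo, hi + 1) if diag % 2 else range(hi, lo - 1, -1)
        pairs.extend((i, diag - i) for i in i_values)
    return pairs


pairs = zigzag_square(4)
print(len(pairs), pairs[:6], pairs[-1])
```

Without the order, a reader of the list cannot tell where the anti-diagonals start being clipped by the edge of the square, which is why the parameter is needed once the list extends past the main anti-diagonal.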

@JoshVStaden
Author

Just so that I am following, we are assuming the coefficient matrix is just the element-wise product between the two coefficient vectors for l and m, correct? And the issue here is that, without specifying the maximum order, the format we specified would not be able to write the coefficients for the top right hand corner in Landman's example image there? So what we would do is specify the maximum order, so that the script can start writing down the diagonals towards the top right hand corner of that example (in the case of the code, it would be towards the bottom right hand side of the matrix).

If what I am saying is correct, would it not make more sense to simply specify the two l and m vectors, and have the code automatically generate the matrix from them? So one would specify shapelet_coeffs as a single field, with an even number of elements in that field, and the code would simply split it up into the l and m parts, and multiply them?

So, for example, for an input of 1,2,3,4,5,6, the code would output the following matrix:

4 5 6
8 10 12
12 15 18

Because the l vector is 1,2,3 and the m vector is 4,5,6. And then, if the user inputs an odd number of coefficients, the code would simply throw an exception.
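
The proposal can be sketched as follows. The function name `outer_from_flat` is hypothetical, and the sketch assumes the first half of the field holds the l vector and the second half the m vector.

```python
def outer_from_flat(coeffs):
    """Split a flat list into l and m halves and return their outer product."""
    if len(coeffs) % 2:
        raise ValueError("expected an even number of coefficients")
    half = len(coeffs) // 2
    vec_l, vec_m = coeffs[:half], coeffs[half:]
    # Entry [i][j] is vec_l[i] * vec_m[j].
    return [[a * b for b in vec_m] for a in vec_l]


for row in outer_from_flat([1, 2, 3, 4, 5, 6]):
    print(row)
```

One property worth keeping in mind: a matrix built this way always has rank one, so it cannot hold an independent coefficient for every 2D basis function.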

Does this make sense, or does this go against the general goals of this format?

@ratt-priv-ci

Can one of the admins verify this patch?

5 participants