Create format for Shapelets #13
base: master
@sjperkins Hi Simon. Here is the pull request as requested. So far, Tigger LSM can support shapelets with a single coefficient describing them. At this stage, we just need to change the coefficients column to support a list of coefficients. I think the idea was to treat the shapelet coefficients as a 2 x 2 matrix, and simply write out the matrix row by row, one element per column of the file. I imagine it would look something along the lines of
When you say a 2 x 2 matrix, do you perhaps mean a 2D matrix?
That could work. There's also the option of having a single coefficient column, expressing the coefficients as a list and parsing it with `ast.literal_eval`:

```python
>>> from __future__ import print_function
>>> import ast
>>> print(ast.literal_eval("[[1,2,3],[4,5,6]]"))
[[1, 2, 3], [4, 5, 6]]
```

What do you think?
Sigh, we need to write things down at meetings. It seems like everyone is on the same page when we talk, and then we go our own ways with a different mental image... Mathematically, the coefficients are indeed a 2D matrix, of arbitrarily large size. There should be just one coefficients column (`shapelet_coeff`) in the LSM, containing an arbitrary-length list. This list is mapped to the 2D matrix as follows: if `shapelet_coeff` is 1,2,3,4,5,6,7,8,9,10,11, then the matrix is
and zero everywhere else. This way, you don't need a variable number of columns, and you always know, unambiguously, how many coefficients you have.
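A minimal sketch of reading such a single column, assuming the field is stored as a comma-separated string (optionally wrapped in brackets, as in the `ast.literal_eval` suggestion above). `parse_shapelet_coeff` is a hypothetical helper, not an existing Tigger function.

```python
def parse_shapelet_coeff(field):
    """Hypothetical helper: turn a shapelet_coeff field such as
    "1,2,3,4" or "[1, 2, 3, 4]" into a list of floats."""
    body = field.strip().strip("[]")
    # Skip empty tokens so "" or a trailing comma adds no spurious entries
    return [float(tok) for tok in body.split(",") if tok.strip()]


coeffs = parse_shapelet_coeff("1,2,3,4,5,6,7,8,9,10,11")
print(len(coeffs))  # 11: the list length is the coefficient count
```

The number of coefficients falls out of the list length, so the file layout never has to change with it.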
Yeah, sorry. I think I had a solution in search of a problem.
@o-smirnov Hi Oleg. I have made the changes you requested to how Tigger reads in the coefficients field. I have run into an issue with how the coefficients are structured and wanted to run it past you and @landmanbester before going forward.
According to my shapelet script, the coefficients are not read as a 2D matrix of arbitrary size; instead, for each source, they are treated as two vectors (i.e. a list of coefficients for l and a list of coefficients for m). Just to check: if the coefficients are a 2D matrix, would this matrix simply be the product of each pair of elements from the two vectors? For example, if vec_coeffs_l is the vector holding the coefficients for the l dimension, vec_coeffs_m likewise for m, and mat_coeffs is the matrix of coefficients you described, would it be the case that mat_coeffs[3,4] = vec_coeffs_l[3] * vec_coeffs_m[4]?
```python
n += 1
x += n
return n
```
You can use np.tril_indices here
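The diff context above is only a fragment (`n += 1`, `x += n`, `return n`). Assuming it accumulates triangular numbers to find how many diagonals a list of coefficients spans, here is a sketch of the reviewer's `np.tril_indices` suggestion. `orders_for` and `triangle_pairs` are hypothetical names, and the loop body is a guess at the surrounding code.

```python
import numpy as np


def orders_for(ncoeffs):
    """Hypothetical helper mirroring the loop in the diff: the smallest n
    such that n*(n+1)/2 >= ncoeffs."""
    n, total = 0, 0
    while total < ncoeffs:
        n += 1
        total += n
    return n


def triangle_pairs(ncoeffs):
    """First ncoeffs (row, col) pairs of the lower triangle of an n x n
    matrix, in the row-major order that np.tril_indices returns."""
    rows, cols = np.tril_indices(orders_for(ncoeffs))
    return list(zip(rows[:ncoeffs].tolist(), cols[:ncoeffs].tolist()))


print(orders_for(11))     # 5
print(triangle_pairs(4))  # [(0, 0), (1, 0), (1, 1), (2, 0)]
```

`np.tril_indices(n)` returns exactly n(n+1)/2 index pairs, so it replaces the manual counting. Note that it walks the lower triangle row by row, not along the zig-zag diagonals discussed below, so a remapping of the pairs would still be needed.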
@JoshVStaden Yes, the basis function is assumed separable, so to get the basis function for ij you simply take the product of the 1D basis functions for coeff_l[i] and coeff_m[j], but there is still only a single coefficient per 2D basis function. The ordering that Oleg is suggesting can be illustrated by the following figure. We enumerate the basis functions by zig-zagging along diagonals. For the above case, starting at the bottom left corner, we enumerate the basis functions as [(0,0), (0,1), (1,0), (2,0), (1,1), (0,2), (0,3), (1,2), (2,1), (3,0), ...] and so on. This ordering should be implicit in the LSM format. Thus, given a 1D list of, say, n coefficients, you can always use the ordering to associate each coefficient with its basis function. For example, if we receive the coefficient list [a0, a1, a2, a3], then we know to reconstruct the function as

f(l, m) = a0 B0(l) B0(m) + a1 B0(l) B1(m) + a2 B1(l) B0(m) + a3 B2(l) B0(m)

Is that a bit clearer now?
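The zig-zag enumeration and the reconstruction of f(l, m) can be sketched in Python as follows. This assumes (i, j) indexes the (l, m) basis as in the formula above, stores the coefficient matrix as `mat[i, j]`, and takes a caller-supplied `basis(n, x)` callable as a stand-in for the 1D shapelet basis B_n. `zigzag_indices`, `coeffs_to_matrix` and `evaluate` are hypothetical names, not Tigger functions.

```python
from itertools import islice

import numpy as np


def zigzag_indices():
    """Yield (i, j) index pairs in the zig-zag order listed above:
    (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), (0,3), ..."""
    d = 0  # current diagonal, where i + j == d
    while True:
        if d % 2 == 0:
            steps = range(d, -1, -1)  # even diagonals: (d, 0) -> (0, d)
        else:
            steps = range(d + 1)      # odd diagonals: (0, d) -> (d, 0)
        for i in steps:
            yield (i, d - i)
        d += 1


def coeffs_to_matrix(coeffs):
    """Place a flat coefficient list into a square array in zig-zag order;
    every entry not covered by the list stays zero."""
    pairs = list(islice(zigzag_indices(), len(coeffs)))
    size = max((max(i, j) for i, j in pairs), default=-1) + 1
    mat = np.zeros((size, size))
    for a, (i, j) in zip(coeffs, pairs):
        mat[i, j] = a
    return mat


def evaluate(coeffs, basis, l, m):
    """f(l, m) = sum_k a_k * B_i(l) * B_j(m), with (i, j) the k-th pair."""
    return sum(a * basis(i, l) * basis(j, m)
               for a, (i, j) in zip(coeffs, zigzag_indices()))


print(list(islice(zigzag_indices(), 10)))
# Monomials x**n stand in for the real shapelet basis here
print(evaluate([1, 2, 3, 4], lambda n, x: x ** n, 2.0, 3.0))  # 29.0
```

With the eleven coefficients 1..11 from the example, `coeffs_to_matrix` fills a 5 x 5 array and leaves the remaining entries at zero.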
Oh, and I am just realising that to complete the top right corner of the square in that figure, you are going to need to specify the maximum order for the 1D basis functions. In the figure, the maximum order for both l and m is 4. If the maximum orders for l and m are not the same, you end up with a rectangle instead of a square, but I am not sure that will ever happen in practice. Maybe we can just add an order parameter to the LSM and assume it is always square. @o-smirnov what do you think?
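With an explicit order parameter, the full grid, including the top right corner of the figure, can be enumerated in the same zig-zag order. A sketch, assuming `size_l` and `size_m` count the 1D basis functions per axis (indices 0 to size - 1); whether an order of 4 means 4 or 5 basis functions is left open, and `zigzag_grid` is a hypothetical name.

```python
from itertools import product


def zigzag_grid(size_l, size_m=None):
    """All (i, j) pairs with 0 <= i < size_l and 0 <= j < size_m, ordered
    by diagonal d = i + j; odd diagonals ascend in i and even diagonals
    descend in i, matching the zig-zag order above. Square by default."""
    if size_m is None:
        size_m = size_l

    def key(pair):
        i, j = pair
        d = i + j
        return (d, i if d % 2 else -i)

    return sorted(product(range(size_l), range(size_m)), key=key)


print(zigzag_grid(3))
# [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (1, 2), (2, 1), (2, 2)]
```

A square of size n holds n * n coefficients, so an order parameter also tells the reader how long a complete `shapelet_coeff` list should be.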
Just so that I am following: we are assuming the coefficient matrix is just the outer product of the two coefficient vectors for l and m (mat_coeffs[i, j] = vec_coeffs_l[i] * vec_coeffs_m[j]), correct? And the issue here is that, without specifying the maximum order, the format we specified would not be able to write the coefficients for the top right-hand corner of Landman's example image? So what we would do is specify the maximum order, so that the script can continue writing the diagonals towards the top right-hand corner of that example (in the code, this would be towards the bottom right-hand side of the matrix). If what I am saying is correct, would it not make more sense to simply specify the l and m vectors and have the code automatically generate the matrix from them? So one would specify
So, for example, for an input of
Because the l vector is
Does this make sense, or does this go against the general goals of this format?
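A minimal NumPy sketch of the outer-product interpretation being proposed here (and in the earlier mat_coeffs[3,4] question). The vector values are made up for illustration.

```python
import numpy as np

# Made-up coefficient vectors for the l and m directions
vec_coeffs_l = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
vec_coeffs_m = np.array([0.5, -1.0, 2.0, 0.0, 3.0])

# Outer product: mat_coeffs[i, j] = vec_coeffs_l[i] * vec_coeffs_m[j]
mat_coeffs = np.outer(vec_coeffs_l, vec_coeffs_m)

print(mat_coeffs[3, 4] == vec_coeffs_l[3] * vec_coeffs_m[4])  # True
print(np.linalg.matrix_rank(mat_coeffs))  # 1
```

One property worth weighing: the outer product of two nonzero vectors always has rank one, so specifying only the l and m vectors can describe only separable coefficient matrices, whereas a flat list like Oleg's holds an independent coefficient for every 2D basis function, as Landman notes.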
Can one of the admins verify this patch?