Why are the Chebyshev polynomials needed? #35
Comments
+1, I was doing a GCN paper review and read this paper today, and came up with the same question.
I think in the cited wavelets-on-graphs paper (Hammond et al.), they used Chebyshev polynomials to approximate a known wavelet function because Chebyshev interpolation has certain desirable properties. In the case where the filters are unknown, the Chebyshev polynomials are not necessary: any linear combination of Chebyshev polynomials up to the K-th degree can be represented by a K-th degree polynomial.
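A quick NumPy check of this basis-equivalence claim (my own sketch, not code from this repository; the coefficients below are arbitrary): converting a degree-K Chebyshev series to monomial coefficients gives exactly the same polynomial.

```python
import numpy as np
from numpy.polynomial import chebyshev as C, polynomial as P

K = 4
theta_cheb = np.random.default_rng(0).standard_normal(K + 1)  # arbitrary Chebyshev coefficients
theta_mono = C.cheb2poly(theta_cheb)                          # equivalent monomial coefficients

# The two expansions agree everywhere on [-1, 1] (up to floating-point error).
x = np.linspace(-1, 1, 1001)
assert np.allclose(C.chebval(x, theta_cheb), P.polyval(x, theta_mono))
```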
Has anyone tried node classification on a general graph without Chebyshev polynomials (i.e. using only the Laplacian and its powers)? What was the accuracy compared to the Chebyshev method?
See here:
I have used the Laplacian, and the normalized Laplacian, to train a convolutional NN. The accuracy on the training, testing, and validation sets was above 85% in each case. The data I used was proxy-related data and the model was for a security-related problem. However, doing that is how I found this work, and Xavier's work as a whole... so I've seen it work, but Xavier Bresson and his team build on those concepts so well... So, not the best modeling approach, but it is tractable for certain problems.
Yes, it is stable under coefficient perturbation but otherwise the complexities are the same.
Any polynomial (monomials, Chebyshev, Legendre, etc.) of order K has the same representation power and can be used. We used Chebyshev because we had experience with it and thought it would be easier to optimize as they form an orthogonal basis. There's however not much difference in practice. Chebyshev polynomials are more orthogonal than monomials under the identity measure, and truly orthogonal under the measure 1/√(1−x²). In the vertex domain (for a ring/circle graph), Chebyshev and Legendre polynomials are also more orthogonal than monomials.
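For intuition on the orthogonality remark, here is a small sketch (my own illustration, not from the repository): it compares the largest pairwise cosine similarity among the monomial basis functions and among the Chebyshev basis functions, sampled on [−1, 1] with the uniform measure. The monomials overlap far more strongly.

```python
import numpy as np
from numpy.polynomial import chebyshev as C, polynomial as P

K = 4
x = np.linspace(-1, 1, 1001)

def max_offdiag_cosine(V):
    """Largest |cosine similarity| between two distinct basis functions (columns of V)."""
    V = V / np.linalg.norm(V, axis=0)
    G = np.abs(V.T @ V)
    np.fill_diagonal(G, 0)
    return G.max()

print(max_offdiag_cosine(P.polyvander(x, K)))  # monomials 1, x, ..., x^K: close to 1
print(max_offdiag_cosine(C.chebvander(x, K)))  # T_0, ..., T_K: much smaller
```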
I have the same question as @zeal-github, and opened a question on Stack Exchange. After reading this issue, I think the question is clear now. Chebyshev polynomials may have better properties with respect to parameter orthogonality, but the computational complexity is the same as directly using the Laplacian matrix.
Why did you choose the Chebyshev polynomials of the first kind instead of the Legendre polynomials?
Mostly for historical reasons. At first we were designing filters (e.g., to solve the diffusion of heat, the propagation of waves, and many others). As Chebyshev polynomials are excellent function approximators, we chose them to approximate those ideal filters we wanted to design. When learning filters, the choice of a polynomial basis doesn't matter for the expressivity of the filters, so we kept Chebyshev as we were familiar with them. We thought it could be easier to optimize, but that doesn't seem to make a difference in practice.
In your opinion, it doesn't matter which orthogonal polynomials are chosen, because in practice the performance does not depend on the kind of orthogonal polynomial. Am I understanding correctly?
Yes. It seems that SGD works well enough to find the best polynomial, whatever the basis. That might also be because we use low-order polynomials when learning.
What is SGD?
I have another big problem. Do the orthogonal polynomials cause a low-pass filter? Or is Spectral CNN (Spectral Networks and Deep Locally Connected Networks on Graphs) itself a low-pass-filter GNN?
Neither spectral nor polynomial convolutions enforce low-pass filters. Polynomial convolutions enforce local filters (the locality is controlled by the order of the polynomials). Local filters can be high-pass (simply think about a multiplication by the Laplacian itself, a second-order derivative). The low-pass limitation is specific to GCN [Kipf et al.] as they merged the 0- and 1-neighborhood in the quest of learning a single parameter per filter.
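To make the high-pass vs. low-pass point concrete, here is a small sketch (my own illustration on an arbitrary random graph, not code from this repository). For a filter that is a polynomial p of the normalized Laplacian L, the response at graph frequency λ is simply p(λ): multiplying by L amplifies high frequencies, while the merged single-parameter filter 2I − L = I + D^(−1/2) A D^(−1/2) (GCN's first-order filter, ignoring the renormalization trick) attenuates them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
A = np.triu((rng.random((n, n)) < 0.2).astype(float), 1)
A = A + A.T                                   # random symmetric adjacency, no self-loops
d = np.maximum(A.sum(1), 1)
L = np.eye(n) - A / np.sqrt(np.outer(d, d))   # normalized Laplacian, spectrum in [0, 2]

lam = np.linalg.eigvalsh(L)                   # graph frequencies, sorted ascending
# y = L x        has response lam     : grows with frequency  -> high-pass (still a local filter)
# y = (2I - L) x has response 2 - lam : decays with frequency -> low-pass (GCN-style single parameter)
print(np.round(lam, 2))
print(np.round(2 - lam, 2))
```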
When approximating the initial filter kernel gθ(Λ) = diag(θ) with a polynomial gθ(Λ) = Σ_{k=0}^{K−1} θ_k Λ^k, why can't we simply combine the eigenvector and eigenvalue matrices together, like

y = U gθ(Λ) Uᵀ x = Σ_{k=0}^{K−1} θ_k L^k x,

because U Λ^k Uᵀ = (U Λ Uᵀ)^k = L^k?

The paper instead uses Chebyshev polynomials, and also says that they can be computed recursively from L. I'm quite confused here: I don't know why Chebyshev is needed.
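For reference, a minimal sketch of the computational point discussed in this thread (my own illustration, not the repository's implementation): both the monomial filter Σ_k θ_k L^k x and the Chebyshev filter Σ_k θ_k T_k(L̃) x are evaluated with K sparse matrix-vector products, so neither needs the eigendecomposition (U, Λ).

```python
import numpy as np
import scipy.sparse as sp

def monomial_filter(L, x, theta):
    """y = sum_k theta[k] * L^k x, computed with repeated sparse mat-vecs."""
    y, xk = theta[0] * x, x
    for t in theta[1:]:
        xk = L @ xk                   # L^k x from L^(k-1) x
        y = y + t * xk
    return y

def chebyshev_filter(L, x, theta, lmax=2.0):
    """y = sum_k theta[k] * T_k(L~) x with L~ = 2L/lmax - I, using the recurrence
    T_k(L~) x = 2 L~ T_(k-1)(L~) x - T_(k-2)(L~) x."""
    Lt = (2.0 / lmax) * L - sp.identity(L.shape[0], format='csr')
    t_prev, t_curr = x, Lt @ x
    y = theta[0] * t_prev
    if len(theta) > 1:
        y = y + theta[1] * t_curr
    for k in range(2, len(theta)):
        t_prev, t_curr = t_curr, 2 * (Lt @ t_curr) - t_prev
        y = y + theta[k] * t_curr
    return y

# Tiny usage on an arbitrary 4-node cycle graph (values are illustrative only).
A = sp.csr_matrix(np.array([[0, 1, 0, 1],
                            [1, 0, 1, 0],
                            [0, 1, 0, 1],
                            [1, 0, 1, 0]], dtype=float))
d = np.asarray(A.sum(axis=1)).ravel()
L = sp.identity(4, format='csr') - sp.diags(1 / np.sqrt(d)) @ A @ sp.diags(1 / np.sqrt(d))
x = np.arange(4, dtype=float)
print(monomial_filter(L, x, [0.5, -1.0, 0.25]))
print(chebyshev_filter(L, x, [0.5, -1.0, 0.25]))
```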