
Add support for CNN and attention-lstm? #28

Closed
xinsuinizhuan opened this issue Aug 20, 2019 · 26 comments

@xinsuinizhuan

An attention-LSTM performs better than a plain LSTM, and CNN support is needed so the forecasting network can combine a CNN with an attention-LSTM.

@xinsuinizhuan (Author)

[image: network structure diagram]

An attention-LSTM combined with the network from your old project.

@josephjaspers (Owner)

I am not sure what an attention-LSTM is; could you send a link?
I will start working on the CNN.

I need to implement (or find) a good implementation of convolution to use. (I have written one myself before, but it was very slow.)
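
For context on why hand-rolled convolution tends to be slow: the direct form is four nested loops over output pixels and kernel taps, with little data reuse. A minimal single-channel sketch (an illustration only, not this library's code):

	#include <vector>

	// Direct "valid" convolution (stride 1, no padding) of a single-channel
	// ih x iw image with a kh x kw kernel; output is (ih-kh+1) x (iw-kw+1).
	std::vector<float> conv2d_naive(const std::vector<float>& img, int ih, int iw,
	                                const std::vector<float>& k, int kh, int kw) {
		int oh = ih - kh + 1, ow = iw - kw + 1;
		std::vector<float> out(oh * ow, 0.0f);
		for (int y = 0; y < oh; ++y)
			for (int x = 0; x < ow; ++x)
				for (int ky = 0; ky < kh; ++ky)
					for (int kx = 0; kx < kw; ++kx)
						out[y * ow + x] += img[(y + ky) * iw + (x + kx)] * k[ky * kw + kx];
		return out;
	}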

@josephjaspers (Owner)

josephjaspers commented Sep 8, 2019

Added CNN as of:

bbc1a2a

It is very slow, and the user must calculate the output shape themselves.
It is "experimental" currently, but I will complete it within the next week
(improving performance, auto-calculating the shape, etc.).

You can modify the example with this to test:
(I will add an example/tests etc soon).
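
(With a 7x7 kernel at stride 1 and no padding, each 28x28 input plane shrinks to 28 - 7 + 1 = 22 per side, which gives the 22, 22, 3 shape below; the 3 corresponds to the last Convolution argument, presumably the filter count.)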

	auto network = neuralnetwork(
		BC::nn::Convolution<System, double>(28, 28, 1, 7, 7, 3),  // 28x28x1 input, 7x7 kernels, 3 filters
		BC::nn::flatten(system_tag, BC::shape(22, 22, 3)),        // flatten the 22x22x3 conv output
		BC::nn::logistic(system_tag, 22*22*3),
		BC::nn::feedforward(system_tag, 22*22*3, 256),
		BC::nn::logistic(system_tag, 256),
		BC::nn::feedforward(system_tag, 256, 10),
		BC::nn::softmax(system_tag, 10),
		BC::nn::logging_output_layer(system_tag, 10, BC::nn::RMSE).skip_every(100/5)
	);

	std::cout << " training..." << std::endl;
	auto start = std::chrono::system_clock::now();
	for (int i = 0; i < epochs; ++i){
		std::cout << " current epoch: " << i << std::endl;
		for (int j = 0; j < samples/batch_size; ++j) {
			// reshape the flat input batch to 28x28x1 images for the convolution layer
			network.forward_propagation(BC::reshape(inputs[j], BC::shape(28, 28, 1, batch_size)));
			network.back_propagation(outputs[j]);
			network.update_weights();
		}
	}

@josephjaspers (Owner)

TODO:

- Improve the implementation of CNN (most likely switch to an im2col implementation; a sketch follows this list).
- Add max-pooling.
- Add GPU support.
- Add auto-deduction of the output shape.
- Add attention-LSTM.
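
For reference, im2col copies every kernel-sized input patch into the column of a matrix, so the whole convolution collapses into a single matrix multiply that a BLAS GEMM can run efficiently. A minimal single-channel sketch, assuming stride 1 and no padding (an illustration, not this library's API):

	#include <vector>

	// Unroll all kh x kw patches of an ih x iw image into a
	// (kh*kw) x (oh*ow) matrix stored row-major; convolution then
	// becomes a GEMM of the (filters x kh*kw) weight matrix with it.
	std::vector<float> im2col(const std::vector<float>& img, int ih, int iw,
	                          int kh, int kw) {
		int oh = ih - kh + 1, ow = iw - kw + 1;
		std::vector<float> cols(kh * kw * oh * ow);
		for (int y = 0; y < oh; ++y)
			for (int x = 0; x < ow; ++x)
				for (int ky = 0; ky < kh; ++ky)
					for (int kx = 0; kx < kw; ++kx)
						cols[(ky * kw + kx) * (oh * ow) + (y * ow + x)] =
							img[(y + ky) * iw + (x + kx)];
		return cols;
	}

The trade-off is memory: every input pixel is duplicated up to kh*kw times, which is consistent with the higher memory use reported later in this thread.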

@xinsuinizhuan (Author)

You are so great. I am really looking forward to it.

@xinsuinizhuan (Author)

xinsuinizhuan commented Sep 30, 2019

What about max-pooling? When using the CNN, should max-pooling be added? And when I want to try this network structure, what should I do?

[image: network structure diagram]

@josephjaspers (Owner)

I have not yet implemented max-pooling; I want to optimize the CNN first. Max-pooling isn't particularly difficult to implement, though, so I will see if I can do it quickly.

@xinsuinizhuan (Author)

> What about max-pooling? When using the CNN, should max-pooling be added? And when I want to try this network structure, what should I do?
>
> [image: network structure diagram]

What about this network structure? Can it be implemented now?

@josephjaspers (Owner)

Yes, I can work on that soon.
I am also working on optimizing the LSTM and the convolution layer.

For convolution and max-pooling I may borrow Caffe's implementation.

@josephjaspers (Owner)

You can see that I have started working on max-pooling here: https://github.com/josephjaspers/blackcat_tensors/blob/master/include/neural_networks/functions/Max_Pooling.h
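
The forward pass of max-pooling just takes the maximum of each window, remembering the winning index so backpropagation can route the gradient to it. A minimal 2x2, stride-2, single-channel sketch (an illustration, not the contents of Max_Pooling.h):

	#include <vector>

	// 2x2 max-pooling with stride 2 over a single-channel ih x iw image
	// (ih and iw assumed even). argmax records each window's winner so
	// the backward pass can route the gradient to that input element.
	void max_pool_2x2(const std::vector<float>& in, int ih, int iw,
	                  std::vector<float>& out, std::vector<int>& argmax) {
		int oh = ih / 2, ow = iw / 2;
		out.assign(oh * ow, 0.0f);
		argmax.assign(oh * ow, 0);
		for (int y = 0; y < oh; ++y)
			for (int x = 0; x < ow; ++x) {
				int best = (2 * y) * iw + (2 * x);  // start at the window's top-left
				for (int dy = 0; dy < 2; ++dy)
					for (int dx = 0; dx < 2; ++dx) {
						int idx = (2 * y + dy) * iw + (2 * x + dx);
						if (in[idx] > in[best]) best = idx;
					}
				out[y * ow + x] = in[best];
				argmax[y * ow + x] = best;
			}
	}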

@xinsuinizhuan (Author)

Yes. You are so great. I had seen Max_Pooling.h, which is why I asked. We should first focus on LSTM single_predict (same input and output), and then implement convolution and max-pooling. I am looking forward to it.

@xinsuinizhuan (Author)

I ask because I saw a paper in which this network structure was effective for the kind of forecast I am making. With only LSTM layers the forecast result is not so good, so we should try another network structure or use the attention-LSTM. But that is the next thing to implement and test.

@josephjaspers (Owner)

josephjaspers commented Nov 1, 2019

The order of things I will work on is:

  1. Optimizing convolution (it is too slow)
  2. Max-pooling
  3. Optimizing LSTM (it can be much faster than the current implementation)
  4. Attention-LSTM

I have found Caffe's implementation of convolution/max-pooling, so I will most likely import their implementation into this project.

@xinsuinizhuan (Author)

xinsuinizhuan commented Nov 3, 2019 via email

@josephjaspers (Owner)

I am working on improving convolution; hopefully I will have it finished today or tomorrow.
I will try to fix single_predict soon after.

@josephjaspers (Owner)

josephjaspers commented Nov 3, 2019

Hi, I just added a new version of convolution. (It still needs testing and does not currently support single-predict.)

https://github.com/josephjaspers/blackcat_tensors/blob/master/include/neural_networks/Layers/Convolution_Experimental.h

However, it should be much faster than the current version.
I will look into the single_predict function now.

Then:

- Testing/adding single_predict to the faster convolution method
- Max-pooling
- Optimizing LSTM
- Adding attention-LSTM

@josephjaspers (Owner)

> I ask because I saw a paper in which this network structure was effective for the kind of forecast I am making. With only LSTM layers the forecast result is not so good, so we should try another network structure or use the attention-LSTM. But that is the next thing to implement and test.

I also just fixed a bug where set_learning_rate wouldn't actually set the learning rate of the layer, so perhaps re-running will improve performance.
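
After the fix, a call like the following should actually reach each layer's learning rate (0.03 is just an illustrative value, and I am assuming the setter is invoked on the network object):

	network.set_learning_rate(0.03);  // previously this silently failed to update the layers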

@xinsuinizhuan (Author)

You are so great. I am glad to see the update; I will test the new code right away. Let us begin our work as you planned.

@xinsuinizhuan (Author)

> I also just fixed a bug where set_learning_rate wouldn't actually set the learning rate of the layer, so perhaps re-running will improve performance.

Yes, it performs better, but it needs more epochs to train:

- Previous version: the BC::nn::RMSE loss starts at 0.156; with epoch == 1028 it drops to 0.12, with many fluctuations in the data.
- New version: the BC::nn::RMSE loss starts at 0.256; with epoch == 5000 it drops to 0.07, and the data is relatively stable. With epoch == 1024 it only drops to 0.166.

@josephjaspers (Owner)

Eventually I would like to add optimizers (like momentum and Adam), though I haven't begun to work on them yet.
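
For context, plain SGD updates each weight as w -= lr * grad, while momentum keeps a running velocity that smooths the updates. A minimal sketch of the momentum step (an illustration of the idea, not an API in this library):

	#include <vector>

	// SGD with momentum: v = mu * v - lr * grad; w += v.
	// mu is the momentum coefficient (commonly around 0.9).
	void momentum_step(std::vector<float>& w, std::vector<float>& v,
	                   const std::vector<float>& grad, float lr, float mu) {
		for (std::size_t i = 0; i < w.size(); ++i) {
			v[i] = mu * v[i] - lr * grad[i];
			w[i] += v[i];
		}
	}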

@xinsuinizhuan (Author)

xinsuinizhuan commented Nov 8, 2019

> Eventually I would like to add optimizers (like momentum and Adam), though I haven't begun to work on them yet.

I am glad to see your reply, and I am really looking forward to it. I want to use this network in practice, so please speed things up when you are free, including single_predict and the features above.

@josephjaspers (Owner)

Convolution (the experimental version) is now the standard version.
I have tested it (on Linux, not Windows).
It is considerably faster than the previous version; however, it consumes a lot of memory.

ff33dff

@josephjaspers (Owner)

Max-pooling branch (not complete):

5e6602f

@josephjaspers (Owner)

Convolution and max-pooling have been added!
The attention-LSTM ticket has been moved to:
#48
