test speed? Has it improved with dataset? #3
Comments
Hello! I would say that things are faster, but there has never been an opportunity to see a large project go from not using dataset - and having some performance metrics - to using dataset and seeing those metrics improve. Also, I have not used dataset on a new project in a good while, only because I have not had a new project that could make use of it, having been on one project for so long and now working a lot with iOS and heavy JavaScript applications.

All this to say: I seem to recall an issue where dataset might have a bug in which dumps are not used when they should be in certain situations - those where nested example/test contexts add data to a set created in an outer test scope (implemented as a class hierarchy in RSpec 1 and Test::Unit). So I would suggest a small experiment: make sure things operate with your versions of the gems, paying attention to the dump files that are created in the tmp directory.
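To make that nesting scenario concrete, here is a sketch of the kind of layout meant above, based on the dataset usage shown later in this thread; the model and dataset names are illustrative, not from the original comment. The thing to watch in the experiment is whether tmp gains (and reuses) a dump for each level:

```ruby
describe "orders" do
  # Outer scope: the name resolves to a dataset subclass, so its data is
  # built once, dumped to SQL, and reloaded from the dump on later runs.
  dataset :standard_catalog

  describe "with a discount" do
    # Nested scope: adds data on top of the outer set. This is the case
    # where the dump may silently not be used when it should be.
    dataset :discounts do
      Discount.create!(:code => "TENOFF", :percent => 10)
    end

    it "applies the discount" do
      # assertions here
    end
  end
end
```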
Hi Adam, thanks a lot for the feedback. Happy to hear that you feel it's faster. I'm currently working on some test cases where I need lots of data for each scenario, so at the moment I'm creating all the ActiveRecord objects before each test and saving them to the database - incredibly slow, obviously. The alternatives are playing with transactions, fixture data, and dataset. I believe the dataset approach would be the fastest, if I understand the idea correctly: you load a database dump if one exists, otherwise you create the dataset the normal way and then save a database dump for follow-up runs. The dump only needs to be updated when the dataset changes. So the win is that you don't have to create many Ruby objects and you don't have to do many inserts for these scenarios. I'll play with it and see if I can contribute back to get it up to date with the latest gems.
If you load a dataset like this, where the name is resolved to a dataset subclass:

```ruby
describe "my stuff" do
  dataset :very_expensive_inserts_with_validations_and_everything
end
```

Then your understanding is correct - that dataset will run, all the Ruby code will insert stuff, and then an SQL dump will be made. Any other code that wants the :very_expensive_inserts_with_validations_and_everything data will get the dump loaded instead of having all the Ruby code executed.

Another thing dataset supports is this:

```ruby
describe "my stuff" do
  dataset :expensive do
    MyActiveRecord.new.save
  end
end
```

Here is where my memory begins to fail me :) I believe this will load the dump of :expensive, and then run the block for each test. It MAY also create a dump for this block, but if it does, I now think that is not ideal, as you may create instance variables in the blocks - I wanted them to act like before(:each) blocks. You've rekindled my interest. I think this thing could be awesome if I had time to work on it :)
I just did a quick test for myself:

Output:

So that is really promising! However, note that I did something different from dataset (which dumps the whole database structure as well):

And since it doesn't delete the whole database, it is more flexible as well, because you could do data inserts before loading the dataset if you felt like it. Do you see downsides to the above approach? Jeroen
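For concreteness, here is a minimal sketch of that data-only dump/restore idea, assuming PostgreSQL and shelled-out pg_dump/psql; the helper name, database name, and model are all illustrative, not dataset's actual API:

```ruby
require "fileutils"

TEST_DB = "myapp_test" # illustrative test database name

# Run the expensive inserts once, dump only the data (not the schema),
# and replay the dump on subsequent runs. Delete the dump file whenever
# the dataset definition changes.
def load_cached_dataset(name)
  dump = "tmp/datasets/#{name}.sql"
  if File.exist?(dump)
    # Data-only restore: the schema is assumed to already be in place,
    # so rows inserted before this point are left alone.
    system("psql -q -f #{dump} #{TEST_DB}") or raise "restore failed"
  else
    yield # the slow ActiveRecord setup, executed only on the first run
    FileUtils.mkdir_p(File.dirname(dump))
    # --data-only skips the schema; --inserts emits replayable INSERTs
    system("pg_dump --data-only --inserts -f #{dump} #{TEST_DB}") or raise "dump failed"
  end
end

# Usage (Order is a hypothetical model):
load_cached_dataset(:heavy_scenario) do
  1000.times { Order.create!(:state => "paid") }
end
```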
Sorry, the above wasn't a reply to your reply :) - I didn't refresh. I do think that your initiative has a lot of value, and I think it is a waste that it doesn't seem to be used much (I remember seeing it a while ago, which is how I found it again). When I first saw it I didn't see the huge value, because I didn't grasp that the main advantage was speed; I thought it was just about reusing datasets, which you can do reasonably well with tools like machinist too. Now I have a really slow test suite, and I figured I needed something like what you had already built :) Happy that I've rekindled your interest :) I hope to contribute soon :) Jeroen
Wow, that is quite a significant difference in performance. I suddenly realize how silly I was not to start and end with performance metrics - who would build a performance-focused tool without them! Live and learn :)

Re: database schema dump/load - I don't think it would be a problem to assume that the schema remains constant during a test run. In fact, I can't think at the moment why anyone would want to reload the schema, and I may even consider it a bug that it occurs in dataset.

Re: the value of dataset - I do believe that the only value is in the performance space. Dataset should probably be changed to support existing factory APIs, thereby allowing the marketing and implementation to be focused on its core value-add. Thanks for the interaction and encouragement!
Cool! I'm happy to encourage you :) I was also thinking that it might be smart to move the Test::Unit, RSpec, and Cucumber adapters to separate libraries, e.g. dataset-testunit, dataset-rspec, and dataset-cucumber. That way a lot of complexity in the code and tests can be removed, and the core has less chance of getting out of date. I have some more ideas, but I'll keep them for later :) This is already a lot of work, I guess. If you are OK with the above, I'll start working on it soon to get the core up to date.
That sounds like a great idea to me. I really hope this proves valuable for you, and that you have fun :)
I have started to speed up my own test suite locally by introducing some hacks. Here is my latest hack, which could work really well for an ActiveRecord adapter (it works perfectly for my slow scenario):
The beauty of the above code is that you don't have to dump the complete database, and it is easy to find out what to insert because you just put a logger around the code. I guess it could be optimized even further by translating the captured inserts into a COPY statement (for PostgreSQL), but I'm not sure how big the improvement would be. This solution is as fast as the fastest pg_dump alternative (around 0.03s)! Cool! I'm going to implement this hack in my application, see how far I can push it, and then port it back to a more dataset-like library.
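Since the snippet itself did not survive in this thread, here is a rough sketch of the kind of record-and-replay hack being described - capturing the INSERT statements during the expensive setup and replaying them on later runs. This version uses ActiveSupport::Notifications rather than a logger, assumes the adapter inlines values into the SQL (no prepared-statement placeholders), and the method and file names are my own invention:

```ruby
# Record the INSERT statements ActiveRecord issues while the expensive
# setup runs, then replay them verbatim on later runs, skipping object
# instantiation and validations entirely.
def cached_inserts(file)
  if File.exist?(file)
    # Assumes one INSERT per line in the recorded file.
    File.readlines(file).each do |sql|
      ActiveRecord::Base.connection.execute(sql)
    end
  else
    statements = []
    subscription = ActiveSupport::Notifications.subscribe("sql.active_record") do |_name, _start, _finish, _id, payload|
      statements << payload[:sql] if payload[:sql] =~ /\AINSERT/i
    end
    yield # the slow ActiveRecord setup, run once
    ActiveSupport::Notifications.unsubscribe(subscription)
    File.open(file, "w") { |f| statements.each { |s| f.puts(s) } }
  end
end

# Usage (Order is a hypothetical model):
cached_inserts("tmp/heavy_scenario_inserts.sql") do
  1000.times { Order.create!(:state => "paid") }
end
```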
I love it! Can hardly wait to hear how far it will carry you.

My progress so far: https://gist.github.com/1097294. It already works quite nicely. I have added a small TODO list to it for things that I think need to be done. Do you think I could reuse your test suite?

Excellent, and succinct. I'd have to review the tests, but I would guess that at least they could provide some indication of things that should be considered. I am taking a vacation next week. Perhaps I can spend some time digging into this project again.
In the README I find the following text:

Did you find an answer to that assumption? I believe that the dataset approach can greatly increase my test suite's speed, but I would love to hear about your experience.
Thanks a lot,
Jeroen
NB. I still have to integrate dataset into my application's test suite, so it would be nice to get an answer to the above before I go to all the effort.