Getting datasets from datavault slows down computer #417
For big datasets, you should not try to get the whole thing all at once, but rather load the data in chunks. When you call get, pass the number of rows and then loop until no more data comes back, something like:

```python
rows = []
while True:
    r = cxn.data_vault.get(1000)  # fetch at most 1000 rows per call
    if not r:                     # empty reply means no more data
        break
    rows.extend(r)
```

Fetching the data in small chunks like this ensures that the server can do other things in the meantime, like accept writes from your other measurement script. Note that with the csv backend, the server still loads the entire dataset from disk into memory, so if you have a very large dataset there may still be a brief server pause while the data is loaded, even if you then fetch it over the network in smaller chunks.

As for your other questions: the new version of the data vault does not support choosing between hdf5 and csv format for data storage; it will continue to read csv data sets, but all new data sets will be stored as hdf5. Of course, we could add the ability to select a file format if that is something you need. Also, it is certainly possible to open partially completed datasets that are stored as hdf5; other clients can, for example, open the dataset and then get notifications when new data is added to it.
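To illustrate that last point, here is a rough sketch of a client that listens for new data. It assumes a data_vault version that exposes a "data available" signal (mangled to `signal__data_available` in pylabrad) and a server wrapper with an `addListener` method; the exact names, and the directory and dataset used here, are hypothetical, so check your server's settings list.

```python
import labrad

SIGNAL_ID = 11111  # arbitrary message ID chosen by this client

def on_data_available(msg_ctx, data):
    # Called when the writer adds rows to the dataset opened below;
    # a real client would fetch the new rows here, e.g. with dv.get(1000).
    print('new data available')

cxn = labrad.connect()
dv = cxn.data_vault
dv.cd('my_experiment')                # hypothetical directory
dv.open('00001 - sweep')              # hypothetical dataset name
dv.signal__data_available(SIGNAL_ID)  # request "data available" messages
dv.addListener(on_data_available, ID=SIGNAL_ID)
```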
Thank you! I have switched to an asynchronous connection to data_vault for grabbing data. It seems to be working with smaller datasets so far. I was wondering if this is the right way to do it.

```python
from labrad.wrappers import connectAsync
from twisted.internet import reactor
from twisted.internet.defer import inlineCallbacks, returnValue
import scipy.io as sio
import sys

@inlineCallbacks
def get_file(host, fdir, fname):
    try:
        cxn = yield connectAsync(host=host)
        print("connected")
        dv = cxn.data_vault
        print((yield dv.cd(fdir)))
        print((yield dv.open(fname)))
        print("opened file")
        M = yield dv.get()
        print(M)
    except Exception:
        reactor.stop()  # shut the reactor down on failure, once, then re-raise
        raise
    returnValue(M)

@inlineCallbacks
def save_file():
    M = yield get_file('rashba', sys.argv[1], int(sys.argv[2]))
    path = "C:/Users/carli/Dropbox/NHMFL_AUG2017/matlab/"
    filename = path + sys.argv[1] + "/" + sys.argv[2] + ".mat"
    sio.savemat(filename, {'d' + str(sys.argv[1]): M})
    reactor.stop()  # stop the reactor only after the file has been saved

if __name__ == '__main__':
    print(sys.argv)
    save_file()
    reactor.run()
```
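To avoid the long pause on large datasets, the chunked fetch from the comment above can be combined with this asynchronous approach. A minimal sketch, assuming the same `connectAsync` setup and a data_vault `get` that accepts a row limit, as shown earlier in the thread (`get_file_chunked` is a hypothetical name):

```python
@inlineCallbacks
def get_file_chunked(host, fdir, fname):
    # Like get_file above, but fetches at most 1000 rows per request so
    # the data vault stays responsive to the measurement script.
    cxn = yield connectAsync(host=host)
    dv = cxn.data_vault
    yield dv.cd(fdir)
    yield dv.open(fname)
    rows = []
    while True:
        r = yield dv.get(1000)  # fetch the next chunk of up to 1000 rows
        if not r:               # empty reply: end of dataset
            break
        rows.extend(r)
    returnValue(rows)
```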
A couple more things:
labrad doesn't support passing keyword args when you call remote settings, at least not yet, so you have to pass positional args instead. You could, for example, pass the row limit positionally, as in the sketch below.

As for reading an hdf5 file while writing to it from a separate program, I have no idea whether that will work; it's certainly not something I would recommend if you can avoid it, because we haven't tested that sort of scenario to ensure that the data can't get corrupted. What was the exact java error you were seeing?
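A minimal sketch of the positional-call pattern, reusing the `dv` handle from earlier in the thread (the keyword name `limit` is hypothetical, shown only to illustrate what is not supported):

```python
# Not supported: labrad settings don't accept keyword arguments.
# r = dv.get(limit=1000)

# Supported: pass the same value positionally.
r = dv.get(1000)
```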
Hi,
I'm trying to get large datasets (1 GB+) from datavault 2.3.4 with csv files, using get() from a remote computer. It takes a long time, and the computer running datavault slows down to the point of almost freezing, which interferes with the measurement it is running.
I was wondering:
edit:
I have a synchronous script that connects and writes data to datavault line by line (version 3.0.1 this time). When I synchronously get() data from a remote computer, the script pauses and doesn't write any data until get() returns the dataset. Is there an easy way to get data from datavault without my script pausing?
thanks