We should probably exchange ideas on data sampling / splitting.
The approach I am currently using is very memory-friendly for huge datasets: it shuffles the data matrix in place and then creates contiguous array views.
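A minimal sketch of that idea, assuming observations are stored as columns. The names `shuffle_columns!` and `split_views` are hypothetical, and Base's `view` is used here, which may differ from the view function used at the time of this thread:

```julia
using Random

# Shuffle the columns (observations) of X in place with Fisher-Yates swaps,
# so no second copy of the matrix is allocated.
function shuffle_columns!(rng::AbstractRNG, X::AbstractMatrix)
    n = size(X, 2)
    for i in n:-1:2
        j = rand(rng, 1:i)
        if i != j
            for k in axes(X, 1)
                X[k, i], X[k, j] = X[k, j], X[k, i]
            end
        end
    end
    return X
end

# Split the (already shuffled) columns into two contiguous views.
# `at` is the fraction of columns that goes into the first view.
function split_views(X::AbstractMatrix, at::Real)
    n = size(X, 2)
    ntrain = round(Int, at * n)
    train = view(X, :, 1:ntrain)
    test = view(X, :, ntrain+1:n)
    return train, test
end

X = rand(MersenneTwister(42), 10, 1000)
shuffle_columns!(MersenneTwister(1), X)
train, test = split_views(X, 0.7)
```

Both `train` and `test` share memory with `X`, so writing to either one modifies the original matrix.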
See the nnet/data.jl source file... I'm using the idea of fixed arrays of DataPoint objects, and wrappers which access the arrays in different ways. I haven't benchmarked completely, though.
On Sep 4, 2015, at 3:56 PM, Christof Stocker wrote:
julia> @time test = view(X, :, 7000001:10000000)
elapsed time: 1.3097e-5 seconds (192 bytes allocated)
I couldn't find a better way so far. It does have its limitations when the sampling should be sensitive to the class distribution.
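One possible way around that limitation is to sample indices per class instead of shuffling the whole matrix. This is only a sketch under assumptions: the label vector `y` and the helper `stratified_indices` are hypothetical and not taken from either package discussed here.

```julia
using Random

# Hypothetical helper: return train/test column indices such that each class
# contributes roughly the fraction `at` of its observations to the training set.
function stratified_indices(rng::AbstractRNG, y::AbstractVector, at::Real)
    train = Int[]
    test = Int[]
    for c in unique(y)
        idx = shuffle(rng, findall(==(c), y))   # shuffled positions of class c
        ntrain = round(Int, at * length(idx))
        append!(train, idx[1:ntrain])
        append!(test, idx[ntrain+1:end])
    end
    return train, test
end

y = vcat(fill(:a, 90), fill(:b, 10))          # imbalanced example labels
X = rand(MersenneTwister(0), 10, length(y))
train_idx, test_idx = stratified_indices(MersenneTwister(1), y, 0.7)
Xtrain = view(X, :, train_idx)                # view over non-contiguous columns
Xtest = view(X, :, test_idx)
```

The trade-off is that these views are indexed by an index vector rather than a range, so they still avoid copying the data but are no longer contiguous in memory.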
In the timing above, X is a 10x10000000 Array{Float64,2}.