Due to the massive amount of computation required to work on large datasets, it would be nice if this library could train the model in parallel. I tried to implement it in a pull request myself, but ran into a wall: this library stores the ratings data in an object, Ratings, and shared arrays unfortunately do not support holding objects like that. Having each process keep a separate copy of the Ratings would just not be feasible, as my dataset barely fits into memory as it is. I was able to sketch out how one would go about implementing this, though. If anyone can think of a way to do this safely, perhaps by somehow sharing the RatingSet object between all processes, we could add this nice functionality.
Here is a sketch of how train would work with some parallelism optimizations. Hopefully someone will be able to build off my work and make it fully parallel; as written it should at least run in parallel for small datasets. Maybe someone more experienced in Julia than I am can figure out how to distribute the RatingSet object in a memory-efficient manner. Any ideas?
function train_parr(rating_set::RatingSet,
                    max_rank;
                    min_epochs=0,
                    max_epochs=100,
                    learning_rate=0.001,
                    regularizer=0.02)
    # Progress/next! come from the ProgressMeter package
    p = Progress(max_rank, 1, "Computing truncated rank $(max_rank) SVD ")
    addprocs(7)  # worker count hard-coded for now
    user_features = SharedArray(Float32, (length(rating_set.user_to_index), max_rank),
                                init = S -> S[Base.localindexes(S)] = 0.1)
    item_features = SharedArray(Float32, (length(rating_set.item_to_index), max_rank),
                                init = S -> S[Base.localindexes(S)] = 0.1)
    # NOTE: this only works if Residual is an immutable bits type; a SharedArray
    # cannot hold mutable objects, which is the same problem described above.
    residuals = SharedArray(Residual, size(rating_set.training_set),
                            init = S -> S[Base.localindexes(S)] =
                                [Residual(rating.value, 0.0, 0.0)
                                 for rating in rating_set.training_set[Base.localindexes(S)]])

    num_ratings = length(rating_set.training_set)
    for rank = 1:max_rank
        errors = SharedArray(Float32, (3,), init = S -> S[:] = [0.0, Inf, Inf])
        for i = 1:max_epochs
            @sync @parallel for j = 1:num_ratings
                rating, residual = rating_set.training_set[j], residuals[j]
                item_feature = item_features[rating.item, rank]
                user_feature = user_features[rating.user, rank]
                # NOTE: the field assignments below assume a mutable Residual; with an
                # immutable element type the whole element would have to be replaced.
                residual.curr_error = residual.value - user_feature * item_feature
                error_diff = residual.prev_error - residual.curr_error
                errors[1] += error_diff * error_diff  # not synchronized across workers; rough convergence signal only
                residual.prev_error = residual.curr_error
                item_features[rating.item, rank] += learning_rate * (residual.curr_error * user_feature - regularizer * item_feature)
                user_features[rating.user, rank] += learning_rate * (residual.curr_error * item_feature - regularizer * user_feature)
            end
            # the error distance decreases, then increases, then decreases; we want to catch it on the second decrease
            if i > min_epochs && errors[1] < errors[2] && errors[2] > errors[3]
                break
            end
            errors[1], errors[2], errors[3] = 0.0, errors[1], errors[2]
        end
        # fold the learned rank into the residuals before starting the next rank
        for residual in residuals
            residual.value = residual.curr_error
            residual.prev_error = 0.0
        end
        next!(p)
    end
    rmprocs(workers())
    user_features, item_features  # return the learned feature matrices
end
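For completeness, here is roughly how I imagine the sketch being called. This is just a hypothetical invocation: the rank and hyperparameter values are placeholders, it assumes rating_set is a RatingSet that has already been built by the library's existing loading code, and as written above the sketch simply hands back the two feature matrices.

    # hypothetical usage of the sketch above; argument values are placeholders
    user_features, item_features = train_parr(rating_set, 20,
                                               min_epochs=5,
                                               max_epochs=100,
                                               learning_rate=0.001,
                                               regularizer=0.02)

    # a predicted rating is the dot product of the matching user and item feature rows
    r = rating_set.training_set[1]
    predicted = dot(vec(user_features[r.user, :]), vec(item_features[r.item, :]))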
Apparently, if you change your types to immutable, we can actually implement SharedArray support rather easily. Here is how you can do it. I will probably take a shot at implementing it fully later, but it should be trivial as long as the types are changed.
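For the record, here is a minimal standalone sketch of what I mean, written in the same pre-1.0 syntax as the code above. It assumes Residual keeps the three fields my sketch uses; since the fields of an immutable cannot be assigned, the update loop writes back a whole new element instead.

    addprocs(2)

    # declaring the type immutable makes it a bits type, which SharedArray can hold;
    # @everywhere so the worker processes know about it too
    @everywhere immutable Residual
        value::Float32
        curr_error::Float32
        prev_error::Float32
    end

    # every process maps the same memory, so there are no per-process copies
    residuals = SharedArray(Residual, (4,),
                            init = S -> S[Base.localindexes(S)] =
                                [Residual(1.0f0, 0.0f0, 0.0f0) for i in Base.localindexes(S)])

    # fields cannot be mutated in place, so rebuild and write back the element
    @sync @parallel for j = 1:length(residuals)
        r = residuals[j]
        curr_error = r.value - 0.5f0  # stand-in for the real error computation
        residuals[j] = Residual(r.value, curr_error, r.curr_error)
    end

    rmprocs(workers())

The same pattern should drop straight into the @parallel loop in my sketch above: read residuals[j] into a local, compute the new errors, and assign a fresh Residual back to residuals[j].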