
Feature Request: Shared Arrays support #7

Open
Skylion007 opened this issue May 7, 2016 · 2 comments

Due to the massive amount of computation required to work on large datasets, it would be nice if this library could train the model in parallel. I tried to implement this myself but ran into a wall: the library uses an object, Ratings, to store the ratings data, and shared arrays do not support storing objects like that. Giving each process a separate copy of the Ratings would not be feasible, as my dataset barely fits into memory as it is. I was able to sketch out how one would go about implementing this, though. If anyone can think of a way to do this safely, perhaps by somehow sharing the RatingSet object between all processes, we can add this nice functionality.
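For context, the wall here is that a SharedArray can only hold "plain data" (isbits) element types, so an array of mutable objects is rejected outright. A minimal illustration (Julia 0.4-era API, using hypothetical stand-in types, not the library's actual definitions):

```julia
# Hypothetical stand-ins for the library's rating types, for illustration only.
type MutableRating         # a mutable object with heap identity
    user::Int
    item::Int
    value::Float32
end

immutable FlatRating       # all fields are plain bits, so the type is too
    user::Int
    item::Int
    value::Float32
end

println(isbits(MutableRating))  # false: SharedArray(MutableRating, (4,)) throws
println(isbits(FlatRating))     # true:  SharedArray(FlatRating, (4,)) works

ratings = SharedArray(FlatRating, (4,))  # backed by shared memory, visible to all workers
```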

Here is a sketch of how train would work with some parallelism optimizations. Hopefully someone will be able to build off my work and make it fully parallel. This should at least make it parallel for small datasets, though. Maybe someone more experienced in Julia than I am can figure out how to distribute the RatingSet object in a memory-efficient manner. Any ideas?

function train_parr(rating_set::RatingSet,
                    max_rank;
                    min_epochs=0,
                    max_epochs=100,
                    learning_rate=0.001,
                    regularizer=0.02)
    p = Progress(max_rank, 1, "Computing truncated rank $(max_rank) SVD ")
    addprocs(7)  # spin up workers; adjust to the number of available cores

    # The feature matrices live in shared memory so every worker can read
    # and update them without copying the dataset.
    user_features = SharedArray(Float32, (length(rating_set.user_to_index), max_rank),
                                init = S -> S[Base.localindexes(S)] = 0.1)
    item_features = SharedArray(Float32, (length(rating_set.item_to_index), max_rank),
                                init = S -> S[Base.localindexes(S)] = 0.1)
    # OPEN PROBLEM: this only works if Residual is an immutable bits type;
    # SharedArray cannot hold mutable objects.
    residuals = SharedArray(Residual, size(rating_set.training_set),
                            init = S -> S[Base.localindexes(S)] =
                                [Residual(rating.value, 0.0, 0.0)
                                 for rating in rating_set.training_set[Base.localindexes(S)]])

    num_ratings = length(rating_set.training_set)
    for rank = 1:max_rank
        errors = SharedArray(Float32, (3,), init = S -> S[:] = [0.0, Inf, Inf])
        for i = 1:max_epochs
            # NOTE: the unsynchronized updates below (errors[1] and the feature
            # matrices) can race between workers; Hogwild!-style SGD tolerates
            # this in practice, but the error estimate is only approximate.
            @sync @parallel for j = 1:num_ratings
                rating, residual = rating_set.training_set[j], residuals[j]
                item_feature = item_features[rating.item, rank]
                user_feature = user_features[rating.user, rank]
                residual.curr_error = residual.value - user_feature * item_feature
                error_diff = residual.prev_error - residual.curr_error
                errors[1] += error_diff * error_diff
                residual.prev_error = residual.curr_error
                item_features[rating.item, rank] += learning_rate * (residual.curr_error * user_feature - regularizer * item_feature)
                user_features[rating.user, rank] += learning_rate * (residual.curr_error * item_feature - regularizer * user_feature)
            end
            # The error distance decreases, then increases, then decreases;
            # we want to catch it on the second decrease.
            if i > min_epochs && errors[1] < errors[2] && errors[2] > errors[3]
                break
            end
            errors[1], errors[2], errors[3] = 0.0, errors[1], errors[2]
        end
        # Fold this rank's fit into the residuals before starting the next rank.
        for residual in residuals
            residual.value = residual.curr_error
            residual.prev_error = 0.0
        end
        next!(p)
    end
    rmprocs(workers())
    return user_features, item_features
end

Skylion007 commented May 7, 2016

Apparently, if you change your types to immutable, we can actually implement SharedArray support rather easily. Here is how you could do it. I will probably take a shot at implementing it fully later, but it should be trivial as long as the types are changed.
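A sketch of what that could look like, assuming Residual's fields are value, prev_error, and curr_error as used in the train sketch above (the real library may differ). Since immutables cannot be updated field-by-field, each update rebuilds the value and writes it back into the array:

```julia
# Assumed field layout, for illustration only.
immutable Residual
    value::Float32
    prev_error::Float32
    curr_error::Float32
end

num_ratings = 4  # toy size for illustration
residuals = SharedArray(Residual, (num_ratings,),
                        init = S -> S[Base.localindexes(S)] =
                            [Residual(0.0f0, 0.0f0, 0.0f0) for _ in Base.localindexes(S)])

# An immutable element cannot be mutated in place...
# residuals[1].curr_error = 3.5f0   # ERROR: type Residual is immutable
# ...so construct a fresh value and store it back instead:
r = residuals[1]
residuals[1] = Residual(r.value, r.curr_error, 3.5f0)
```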

@Skylion007
See #8; I added the feature myself.
