Note: we are currently using protobuf 2.4.0a
- Install the latest Xcode with the command line tools, or just the command line tools
- Install homebrew
- Install protobuf: brew install protobuf
- Get parallel, as some scripts depend on it.
brew install parallel
- Run the DataFetchPipeline.rb in scripts/
- Look at build.xml for the available pipeline stages.
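To check that the installed compiler matches the protobuf version noted at the top:
protoc --version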
- Install Google's protocol buffers binaries: https://protobuf.googlecode.com
- Unpack the archive and put protoc.exe somewhere in PATH
- Run run-protoc-win.sh to generate the protobuf classes. Steps 4-6 (parallel, the DataFetchPipeline, and build.xml) are the same as for OSX.
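For reference, a minimal protoc invocation of the kind run-protoc-win.sh wraps looks like this (the .proto location and output directory are placeholders; the script contains the actual paths):
protoc --java_out=<output dir> <path to file>.proto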
To check what's inside the generated HAR file:
hadoop fs -ls -R har:///projects/dataset.har
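To copy a single file out of the archive for local inspection (the path inside the HAR is a placeholder):
hadoop fs -copyToLocal har:///projects/dataset.har/<path inside archive> /tmp/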
- Make sure that you have a public ssh key. If you don't, follow this guide: https://help.github.com/articles/generating-ssh-keys
- Append your public ssh key to the deploy user's ~/.ssh/authorized_keys on haddock.unibe.ch (see the sketch after this list).
- Make sure that you can
ssh haddock.unibe.ch -l deploy
without being asked for a password.
- Run your test with
./ant.sh uploadJar -DmainClass=ch.unibe.scg.cells.hadoop.JUnitRunner -DclassArgument=ch.unibe.scg.cells.hadoop.CellsTestSuite
In case of an unsuccessful run, you will see the errors in the console.
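The key-appending step mentioned above can be done with ssh-copy-id, assuming the target account is the deploy user on haddock.unibe.ch (the same account used by the ssh command above):
ssh-copy-id deploy@haddock.unibe.ch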
To kill a running job:
hadoop job -kill job_<your_job_id>
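If you don't know the job id, list the running jobs first:
hadoop job -list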
Copy the local scripts across the cluster:
./scripts/deploy_scripts.sh
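To spot-check that the scripts arrived on a node (leela is used here as an example node, as in the next step; the remote path is assumed to mirror the local scripts/ layout):
ssh leela ls scripts/ohloh/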
Run the DataFetchPipeline:
ssh leela ./scripts/ohloh/DataFetchPipeline.rb
Or run it locally, for testing:
./scripts/ohloh/DataFetchPipeline.rb --max_repos 3
Open the HBase shell:
hbase shell
List tables:
list
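A few more shell commands are handy for a quick look at a table (the table name is a placeholder):
describe '<table_name>'
count '<table_name>'
scan '<table_name>', {LIMIT => 5}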
Check HBase table size:
hadoop fs -du -h -s /hbase/
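Dropping -s lists each table directory separately:
hadoop fs -du -h /hbase/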
Check size of HAR file:
hadoop fs -du -h /projects/dataset.har