Note: we are currently using protobuf 2.4.0a
- Install the latest Xcode with the command line tools, or just the command line tools
- Install homebrew
- Install protobuf: brew install protobuf
- Get parallel, as some scripts depend on it.
brew install parallel
- Run the DataFetchPipeline.rb in scripts/
- Look at build.xml for the available pipeline stages.
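To check that the installed compiler matches the protobuf version noted at the top:
protoc --version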
- Install Google's protocol buffers binaries: https://protobuf.googlecode.com
- Unpack the archive and put protoc.exe somewhere in PATH
- Run run-protoc-win.sh to generate the protobuf classes. Steps 4-6 (parallel, the DataFetchPipeline, and build.xml) are the same as for OSX.
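For reference, a minimal protoc invocation of the kind run-protoc-win.sh wraps looks like this (the .proto location and output directory are placeholders; the script contains the actual paths):
protoc --java_out=<output dir> <path to file>.proto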
To check what's inside the generated HAR file:
hadoop fs -ls -R har:///projects/dataset.har
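To copy a single file out of the archive for local inspection (the path inside the HAR is a placeholder):
hadoop fs -copyToLocal har:///projects/dataset.har/<path inside archive> /tmp/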
- Make sure that you have a public ssh key. If you don't, follow this guide: https://help.github.com/articles/generating-ssh-keys
- Append your public ssh key to the deploy user's ~/.ssh/authorized_keys on haddock.unibe.ch (see the sketch after this list).
- Make sure that you can
ssh haddock.unibe.ch -l deploy
without being asked for a password.
- Run your test with
./ant.sh uploadJar -DmainClass=ch.unibe.scg.cells.hadoop.JUnitRunner -DclassArgument=ch.unibe.scg.cells.hadoop.CellsTestSuite
In case of an unsuccessful run, you will see the errors in the console.
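The key-appending step mentioned above can be done with ssh-copy-id, assuming the target account is the deploy user on haddock.unibe.ch (the same account used by the ssh command above):
ssh-copy-id deploy@haddock.unibe.ch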
To kill a running job:
hadoop job -kill job_<your_job_id>
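If you don't know the job id, list the running jobs first:
hadoop job -list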
Copy the local scripts across the cluster:
./scripts/deploy_scripts.sh
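To spot-check that the scripts arrived on a node (leela is used here as an example node, as in the next step; the remote path is assumed to mirror the local scripts/ layout):
ssh leela ls scripts/ohloh/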
Run the DataFetchPipeline:
ssh leela ./scripts/ohloh/DataFetchPipeline.rb
Or run it locally, for testing:
./scripts/ohloh/DataFetchPipeline.rb --max_repos 3
Open the HBase shell:
hbase shell
List tables:
list
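A few more shell commands are handy for a quick look at a table (the table name is a placeholder):
describe '<table_name>'
count '<table_name>'
scan '<table_name>', {LIMIT => 5}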
Check HBase table size:
hadoop fs -du -h -s /hbase/
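Dropping -s lists each table directory separately:
hadoop fs -du -h /hbase/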
Check size of HAR file:
hadoop fs -du -h /projects/dataset.har