Skip to content
This repository has been archived by the owner on Jan 13, 2022. It is now read-only.

Develop #38

Open
wants to merge 526 commits into
base: master
Choose a base branch
from
Open

Develop #38

wants to merge 526 commits into from

Conversation

deeps123123
Copy link

No description provided.

tomasz added 30 commits November 7, 2012 19:17
Summary:

merging new layout fix

Test Plan:

Reviewers:

CC:

Task ID: #

Blame Rev:
Summary:
TestTransferBlock doesn't fail the test when needed. Fix it. Test still passes after the fix.
Also, use the more commonly used DFSTestUtil.createFile() to create test files.

Test Plan: run the test
Summary:
1. listStatus: handle non-existing directory correctly.
2. fix start-fsshellservice.sh and stop-fsshellservice.sh
3. Change all Fsshellxxx names to FsShellxxx.

Test Plan: manual tests

Reviewers: pkhemani, weiyan, hkuang

Reviewed By: weiyan
Summary:
We need to rethrow the exception to preserve the old behaviour.
Right now when the error happens we do not actually handle it sometimes.

Test Plan: unittests and the test cluster

Reviewers: aching

Reviewed By: aching

CC: mattwkelly
Summary:
We want to fallback to the default block placement policy for
re-replication since BlockPlacementPolicyConfigurable causes too much
traffic to a small set of nodes during re-replication.

Test Plan:
1) All unit tests.
2) Enhanced TestReplication

Revert Plan:

Tags:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: kannan, hkuang, sdong, weiyan, tomasz

Reviewed By: hkuang

Task ID: 1374530
…ild.xml

Summary:
Since Tom checked in the new layout changes, namespace-notifier codes are not compatible.
And it seems there is no easy fix for that. Temporarily exclude it to unblock build_all.sh

Test Plan: run build_all

Reviewers: sdong

Reviewed By: sdong
Summary:
It looks like the update method of VolumeMap
has to be synchronized to be consistent.
Is this the case?

Test Plan: sanity-check

Reviewers: tomasz, hkuang, sdong

Reviewed By: sdong
Summary:
This patch plugs in the JournalManager interface into
FSEditLog/FSImage.
It only changes the core hdfs classes.

Compile:

ant compile-core
ant compile-core-test

This patch does not touch the avatar side.
I did not run all unit test, so it's a working copy.

Test Plan: ported a whole bunch of unit tests

Reviewers: hkuang, avf

Reviewed By: avf

Differential Revision:
Summary:
With this patch, we are going to have substantial savings on writing
to
edit log, as all longs, shorts, will be written in binary form
without
expensive transformations into utf8 strings (coding,decoding). Also
for
each operations, we will not be writing the length of writables.
This
change is compatible with Apache image layout -39 -> in our case it
will
be -40, since we have a shift for hardlinks.

Test Plan: will run ant-test-core

Reviewers: hkuang, avf

Reviewed By: avf

Differential Revision:

Task ID: 883594

Summary:

Test Plan:

Reviewers:

CC:

Task ID: #

Blame Rev:
Summary: This patch changes the standby mechanism of ingesting edits,
and plugs in the journal manager interface.With this patch we have a
full flavour implementation of journal manager, the standby is doing
checkpoints, and also is able to failover.

****I added mechanism for recovering from ingest failure, after several
failures, the state of stanby is reset, and we attempt to recover from
problems

****With this patch, it is possible to save namespace when standby is
up, and standby should proceed in this situation, so there is no need
for handling stale checkpoints etc.

----------------------------------------------

Problems:

1) What should happen when we are quiescing standby to -1 transaction
id. Since I try to obtain the streams gradually with retries, the last
stream that does not exist will cause 30 retries before the standby
quiesces

2) The biggest problem that I see now is with the catchUp concept. For
now it's implemented in a way where ingest looks if it's consuming the
inprogress segement. If so, it checks the size of the stream and the
current position and compares them. If it's consuming a finalized
segment, we assume that it's caching up. This means that BK
EditLogInputStream must implement lenght() method for in progress
segments at least, which would give byte size of the stream.

----------------------

All tests, including new ones work, but I'm still working on this patch.
Added tests for
-ingest recover
-reprocessing ingest
-retries when obtaining input stream

--------------------

I removed snapshot from compiling, because the new layout is not there
at all!! I also made some changes in OfferService.java and Client.java,
which are in the trunk, which fixed the timing out test cases.

Test Plan: ant test, manually new test cases

Reviewers: avf, hkuang

Reviewed By: avf

    Task ID: 883594
Summary:
My patch yesterday was wrong. We still need to handle it as any
IOException, with the only exception. This fixes it, we are only
changing the cleanup behaviour, not the rest of the error processing.

Test Plan: deploy and run

Reviewers: aching, rvadali

Reviewed By: rvadali
Summary:

-This patch optimizes the serialization of edit records, by removing
intermediate sptep of setting a thread local variable for longs and
shorts.

-Fixed some todo's in the edit log, exception handling, runtime.exits,
etc

-Optimized logEdit, by removing the checkJournalStep if not needed

-Added micro-benchamrk for edit log writing

Test Plan: ant test

Reviewers: avf, hkuang

Reviewed By: avf

Task ID: 883594
Summary: Ingest was calling syncIfNeeded on a closed log

Test Plan: ant test TestAvatarAPI, TestCAvatarCleanShutdown

Reviewers: avf, hkuang

Reviewed By: avf
Summary:
For some reason chekpointing was disabled in TestAvatarSyncLastTxid, and
it was causing assertion failure for the last txid.

Test Plan: ant test TestAvatarSyncLastTxid

Reviewers: hkuang, avf

Reviewed By: hkuang
Summary: Add the two RPC calls. Also add some comments

Test Plan: manual test it with FsShellServiceClient

Reviewers: weiyan, pkhemani

Reviewed By: pkhemani

Task ID: 1407780
Summary:
The Balancer when it starts up creates a file on HDFS to denote
that a balancer is running. At times during failover since
fs.ha.retrywrites is not set this operation may fail.

Test Plan: TestAvatarBalancer.

Reviewers: hkuang, tomasz

Reviewed By: tomasz
Summary:

This patch fixes the case when the primary is shutdown by interruption,
and the log does not contain the END_LOG_SEGMENT transaction. Ingest
needs to handle this case and relplay the log in exactly the same way.

Test Plan: manually

-primary restart with no transactions except start_segment

-primary restart with some extra transactions except start_segment

added testcase

Reviewers: avf, hkuang

Reviewed By: avf
Summary:

This patch:

1) Adds lastTxId to avatarMxBean, so we can track the lag of standby
behind the primary

2) getLagBytes - it will show the lag bytes ONLY during the regular
ingest. The transaction id is much easier to track, and will be a good
measure of the lag anyways.

3) Some other minor issues
Test Plan: ant test

Reviewers: hkuang, avf

Reviewed By: hkuang

Task ID: 883594
…ks for file level of information or further

Summary:
Now Fsck always prints out total number of data nodes and racks. It is a very expensive operation.
Change the codes not to print it out if user asks for file or block level information. It's very unlikely that user are interested in total number of nodes when having "-files" specified.

Test Plan: manual test

Reviewers: hkuang, tomasz, weiyan

Reviewed By: weiyan
Summary:
This patch adds a dfsadmin command to roll edit log. Plus it add safety
check so the edit log cannot be rolled at the standby.

Test Plan: manually, ant test

Reviewers: hkuang, avf

Reviewed By: avf

Task ID: 883594
Summary:
This fixes:
1) double write to edit log in standby
2) wrong state assertion when finalizing checkpoint

Test Plan: manually

Reviewers: avf, hkuang

Reviewed By: avf
Test Plan:
Unit test added.

Revert Plan:

Tags:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: hkuang, tomasz, weiyan, sdong

Reviewed By: sdong

Task ID: 1435265
… Changes only)

Summary:
Added the new metrics in the JobStatsHook borrowing the counter
information from the Counter enum in Task.java.Preliminary version to
make sure I am on right path.

Test Plan: Not sure how I can test it locally though.

Reviewers: sambavim, kevinwilfong, mukundn, rvadali

Reviewed By: rvadali

CC: satadru

Task ID: 1423664
Summary:
Instead of having the clients be created in synchronized
sections - have the cache be stored in a concurrent hashmap and create
the clients outside of the lock

Test Plan: unittests

Reviewers: rvadali, aching

Reviewed By: rvadali

Task ID: 1381776
Summary:
This will help us specifically for the case when we are trying
to find the large jobs that ran in the cluster. But going forward it
should be useful overall to have the actual command line that was used
to launch a task

Test Plan: my local setup

Reviewers: rvadali, aching

Reviewed By: aching

Task ID: 1441024
…verification

Summary: Provide more choice for the tests.

Test Plan: Run it.

Reviewers: dhruba

Reviewed By: dhruba
Summary: We've made then change in RPC client so that if DNS changes, it will resolve it and use the new address. However, when the proxy is not yet successfully created, this will not be triggered so data node will stuck there. This change adds a resolve logic too.

Test Plan: Run "ant test" to make sure it doesn't break anything.

Reviewers: tomasz, pritam, hkuang

Reviewed By: pritam
Summary:
For errors that are not Connection Refused, we should retry the failed
heartbeat instead of failing the job.
The retry is done after errorCount * heartbeatInterval.
The max retry count is 10.

Test Plan: test cluster

Reviewers: dms, aching, stupaq

Reviewed By: dms
Summary:
This patch changes the behaviour of shutting down rpc server.

Previously
1) running = false causes the connections to be cleaned and sockets to
be closed
2) handlers can still make changes to the namespace
3) responses cannot be sent

Now

1) listener is shutdown but it does not cleanup connections
2) reponder is put into extension mode
3) running = false
4) handlers can finish
5) responses are sent to clients

------------------

This will help with tests, but also should save some retry logic on the
clients during failover in which the clients don't know if a call
succeeded.

------------------

Also, increased heap space for test, as with large output the tests
crash.

Test Plan:
added TestAvatarShutdown in which a number fo files is created, but the
handlers hold untill a shutdown command comes. Then they proceed, and
the repsonses are sent to clients.

After, we restrt the cluster and all the files must be present in the
namespace

Reviewers: hkuang, pritam

Reviewed By: pritam
tomasz added 29 commits November 8, 2012 09:45
Summary:
Now we have two levels of handling problems with journals.
-NNStorage level - what we previously had
-JournalAndStream - enabled/disabled functionality

With this diff, when we set a journal to be disabled, I also want to
report it to failed storage directories, if it's a file journal manager.

The orther direction is already handled, when we start new log segment,
we call attemtRestoreRemovedStorage and the directory is restored, then
the JournalAndStream is reinitialized with enabled option.

I added listing f journals and their state on web ui as well.

also unified some loging, so it's easier to grep

Test Plan: ant test, manually

Reviewers: hkuang, pritam, sdong

Reviewed By: hkuang
Summary:
I discovered two problems when palying with dfstest2 today.

1) When a required journal fails on rollEditLog

JournalSet.mapJournalsAndReportErrors reports one failed journal to
disableAndReportErrorOnJournals, which in turn calls checkJournals(),
which throws IOException. Normally this exception is handled in
FSEditLog.sync, where we call runtime.exit. But here, since we roll the
edit log, the exception is just thrown to the caller, and the failed
journal is disables. Any subsequent operation (logEdit, logSync, etc)
will continue to write to the remaining journal. The nn will not exit.

With this diff, I set a flag that the exception is thorwn, and the next
time when we call disableAndReport... checkJournals() is called and
exception is rethrown, and will be handled in Sync(). I could call
checkJournals always, but I want to avoid doing this, and hence the use
of the flag.

2) When both journals fail on roll, both are eliminated, and then
editLogStream becomes null, and edit log operations fail on NPE, which
is also not good, because this will kill all rpc threads, and the nn
will be still running. I added check for NPE in logEdit, and an
exception in sync is thorwn to trigger shutdown.

Test Plan: added two test cases for the described situations, manually as well

Reviewers: hkuang, pritam, avf

Reviewed By: hkuang
….com/D602268

Summary: Added a unit test file for CoronaReleaseManager. A bug in comparing file path is fixed too.

Test Plan: ant test -Dtestcase=TestReleaseManager -Dtest.output=yes

Reviewers: rvadali, aching

Reviewed By: rvadali

Task ID: 1838671
Summary: There are three changes: (1) Trace some fields in FileStatus objects, (2) Trace config opt, such as job ID at initialization, and (3) delete unused variables.

Test Plan: With tracing enabled (i.e., fs.hdfs.impl = org.apache.hadoop.hdfs.APITraceFileSystem), run simple MapReduce job on local host.  Observe recording of conf options.  Run "fs -ls" on both a directory and file.  Observe results.

Reviewers: sdong

Reviewed By: sdong
…uery in the silver cluster

Summary:
TaskCompletionEvents are not populated properly for the jobs when run
in Corona. Added the unit test to reproduce this error so far.

Test Plan:
This unit test will verify when the bug is identified. Identifying
the bug is in progress.

Reviewers: rvadali

Reviewed By: rvadali

CC: mukundn

Task ID: 1802925
Summary:
After talking to Justin, it seems that this tool is used extensively,
and is very slow.
I looked into it, and I found a few inefficiencies

-Formatter.format- very expensive
-readin permissions
-parsing unnecessary ints and longs
-buffered writer

Altogether, I tested it on some old DFS1 image of size 7.3G

-before- ~35m 30s on my dev box
-after-  ~17m 00s

So we should get x2 speedup for this.

Test Plan: manually

Reviewers: hkuang, sdong

Reviewed By: hkuang
Summary:
With this patch I make the handling of failures the same in logSync and
SyncThread.
In both cases we want to abort if either the journal set is empty for
some reason, or the stream is null.

Test Plan: ant test

Reviewers: hkuang, avf, pritam

Reviewed By: hkuang
…file

Summary:
As part of task 1741642, CoronaReleaseManager will change the tag file timestamp for a working copy of a
release so that the working release will not be cleaned when it is used by some job. But somehow, setTime doesn't work.

Test Plan: ant test -Dtestcase=TestReleaseManager

Reviewers: rvadali

Reviewed By: rvadali

Task ID: 1841095
Summary:
varifyStartup function instantiates nodes (at least one, which is not
shutdown properly (tearDown shuts down the cluster, but the node is not
there), and it interferes with the subsequent tests.

Test Plan: ant test

Reviewers: pritam

Reviewed By: pritam
Summary:
I ported this to alligator but I moved the check for policy to the
beginning of initialize, and then the "dir" object is not initialized.
If the policy check throws IOException, then FSNamesystem.close() is
called and dir.close() throws NPE.

In trunk that doesn't happen since dir is already initialized. But the
additional check won't do any harm, hece I'm submitting this on trunk.

Test Plan: ant test

Reviewers: pritam, junfang, dikang

Reviewed By: dikang
This reverts commit 04f8e95a8e32031e863a2b2d53735a95b4cfef8b.
Summary: DFSAdmin#getBlockInfo is a great tool that I used to debug the block replication problem. I found it more useful if it prints the block's generation stamp as well. This is a simple diff to enhance the tool.

Test Plan: existing unit tests

Reviewers: dikang

Reviewed By: dikang
… blockMap

Summary: Previous diff only makes sure that initial block reports do not add mismatched replicas into NN but non-initial block reports went a different path that still adds mismatched GS replicas into NameNode. This diff fixed this bug.

Test Plan: existing tests

Reviewers: tomasz

Reviewed By: tomasz

Task ID: 1783644
Summary:
My previous fix fixed one thing but broke another - I reverted it. It seems that there
is some dependency between veriFyStartup, so now when I instantiate an
extra noder successfully, I just add it to a list which is cleared at
tesrDown(). This one is safer.

Test Plan: run the test several times

Reviewers: pritam, dikang

Reviewed By: pritam
Summary:
With this this I get rid of a bunch of unnecessary string coversions,
date conversions, and other calls for elements that are not printed in
the output.
This gives additional savings - I estimate 15+% - but more importantly
for standby - less heap pressure.

The shortest time that I got from this is 12 minutes for DFS1 image
(previously taking 35 minutes), although this might be a bit biased
because of os cache.

Test Plan: compare output

Reviewers: hkuang, pritam

Reviewed By: hkuang
Summary: This will be handy for testing the cleanup code.

Test Plan: test cluster

Reviewers: jeanxu

Reviewed By: jeanxu
Summary:
This diff adds cleaning of replication queues to standby. It turns out
that we did not ever see the sizes of most of these queues because the
sizes are updated in computeDataNodeWork(), which always exits because
of the safemode.

This is most likely the reason of the memory leak. At this moment on
dfs1. Just before restart on Saturday there were 6M+ missing blocks.
This is the count I can see, all other counts are hidden. They show to
be 0, but they arent...

Test Plan: ant test

Reviewers: hkuang, pritam, weiyan

Reviewed By: hkuang

Task ID: 1845279
Summary:
This fixes my unintentional change to cleaner interval. We can clean the
queues every minute, there is no need to to dthis every second.

Test Plan: ant

Reviewers: dikang, hkuang

Reviewed By: hkuang
Summary: Heartbeat match should be a debug statement

Test Plan: n/a

Reviewers: jeanxu

Reviewed By: jeanxu

CC: pamelavagata
Summary:
This patch adds saving the list of excess blocks when doing metasave. We
can save meatadata on next shutdown so I could see if there is anything
wrong with excess replicas.

Test Plan: amnually

Reviewers: hkuang, weiyan

Reviewed By: hkuang
Summary:
In our production system, I saw this check take up to 5 minutes, making NN non-responsive.
This diff breaks the check into batches, each holding pendingReplications lock for at most 5000 blocks. It also move the logging out of the lock.

Test Plan: TestPendingReplication

Reviewers: tomasz, pritam

Reviewed By: tomasz

Task ID: 1852353
Summary: Simple fix

Test Plan: ant test

Reviewers: hkuang, pritam

Reviewed By: pritam
Summary:
packageNativeHadoop.sh was not placing libsnappy.so in the correct
location, causing production usage of SnappyCodec to fail

Test Plan:
ant -Dversion=0.20 -Dcompile.native=true -Doffline=true -Dcompile.c++=true -Dlibhdfs=true jar bin-package

and verify that all snappy *.so files are under build/hadoop-0.20/lib/native/Linux-amd64-64

Reviewers: sdong

Reviewed By: sdong
Summary: There is no need to disable this by default.

Test Plan: tested on test cluster

Reviewers: jeanxu

CC: pamelavagata
Summary:
1. Proxy Job Tracker needs to explicitly aggregate the task error counters.
2. Made the error xml part of the corona jar so that it does not have to
   be deployed as part of configuration

Test Plan:
Tested on a single node setup

Revert Plan:

Tags:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: aching, jeanxu, pamelavagata

Reviewed By: jeanxu
Summary:
Test for checking the completion events API

Task ID: #1802925

Blame Rev:

Reviewers: jeanxu

CC:

Test Plan:
ran the test

Revert Plan:

Tags:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Summary:
- add getRedirect to cluster manager thrift interface definition
- add getRedirect implementation to the ClusterManager

Test Plan:
  - Tested changes locally such that running a "select avg(value) from
    my_table" with the poolgroup=group_b and the
    pool=pool_redirectsource results in the following being logged in
    the job stats table: "pool":"group_b.pool_sla"

Reviewers: rvadali, jeanxu, kevinwilfong

Reviewed By: kevinwilfong

Task ID: 1841904
Summary:
Forgot to add these to the last set of changes to ClusterManager.thrift

Test Plan: Tested with the set of changes in diff above

Reviewers: jeanxu, rvadali, kevinwilfong

Reviewed By: kevinwilfong

Task ID: 1841904
Summary:
Currently, OutputStreams are not guaranteed to be removed from DFSClient.filechecker, if OutStream.close() fails. So if clients give it up after the close() failure (which is the recommandation), the OutputStream object is leaked to FileChecker.pendingCreates forever.

This patch tries to fix the issue by always removing files from filechecker when OutputStream.close() is called. Also, the file is removed from the map when recoverLease() is called

Test Plan: ant test

Reviewers: hkuang, tomasz, weiyan

Reviewed By: hkuang

Task ID: 1862779
@facebook-github-bot
Copy link

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla - and if you have received this in error or have any questions, please drop us a line at [email protected]. Thanks!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants