question about getting chunk data #234

Open
yangyangjuanjuan opened this issue Oct 19, 2018 · 5 comments

Comments

@yangyangjuanjuan

I read the code but could not find a function that returns the data of a single chunk/block. Is there a good way to read a block given that block's information? The block information looks like this:
blocks: 2
BlockInfo:
offset: 67108864
kfsChunkId_t: 262186
int64_t: 1
ServerLocation: 127.0.0.1 21001
chunkOff_t: 16777216

@mikeov
Contributor

mikeov commented Oct 21, 2018

A QFS client read at a “logical” chunk position will of course fetch the chunk data, though with a replicated file there is no control over which replica is used.

KfsClient::CompareChunkReplicas() fetches the data of all chunk replicas and compares it. KfsClient::VerifyDataChecksums() fetches the chunk checksum vectors and compares them instead of the actual data.

These QFS client methods are used by the qfsdataverify tool.

@yangyangjuanjuan
Author

Thanks for the clarification.
I asked because I am hoping to find a way to load a chunk locally. This would be a useful feature for downstream development built on QFS. For example, I would like to assign a task to a node that holds the data chunk locally, which avoids transferring the data over the network.
I found the chunk directories on the chunk server where the chunks are stored, but they are not stored as plain text. Is there a function that can read them?
In addition, may I assume that each block ends with a complete row (if I upload a well-formatted .csv file to QFS)? In other words, could a row be split across two chunks?
Thanks again.

@mikeov
Contributor

mikeov commented Oct 28, 2018

KfsClient::GetDataLocation() can be used to retrieve chunk / stripe locations.

Chunk / stripe boundaries are always at fixed positions / locations / offsets, i.e. they are independent of file content. In other words, the assumption that the row boundaries of a csv file will coincide with chunk boundaries does not hold. Small files are the obvious special case, where the data fits in one chunk, or in one stripe for striped files.
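To illustrate what fixed boundaries mean for row-oriented data, here is a minimal sketch, not QFS code. It assumes a 64 MiB chunk size (which matches the 67108864 block offset quoted earlier in this thread) and a non-striped file. `ChunkIndex` and `RowsOwnedByChunk` are hypothetical helper names; the second one uses the common "a row belongs to the chunk in which its first byte lies" convention so that a row straddling a boundary is processed exactly once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumed chunk size: 64 MiB (consistent with the block offset 67108864 above).
constexpr std::int64_t kChunkSize = 64LL * 1024 * 1024;

// Index of the chunk that contains the given byte offset of the file.
constexpr std::int64_t ChunkIndex(std::int64_t fileOffset,
                                  std::int64_t chunkSize = kChunkSize) {
    return fileOffset / chunkSize;
}

// Rows whose first byte lies inside chunk `chunkIdx`. A row that starts in
// this chunk but ends in the next one is read to its end; a row that started
// in the previous chunk is skipped, because the previous chunk owns it.
// `data` stands in for the whole file content (hypothetical, for the demo).
std::vector<std::string> RowsOwnedByChunk(const std::string& data,
                                          std::int64_t chunkIdx,
                                          std::int64_t chunkSize) {
    assert(chunkIdx >= 0 && chunkSize > 0);
    std::vector<std::string> rows;
    const std::size_t size = data.size();
    const std::size_t begin = static_cast<std::size_t>(chunkIdx * chunkSize);
    if (begin >= size) return rows;
    const std::size_t end =
        std::min(size, begin + static_cast<std::size_t>(chunkSize));

    std::size_t pos = begin;
    if (begin > 0) {
        // Skip the tail of a row that began before this chunk. Searching from
        // begin - 1 keeps a row that starts exactly at the boundary.
        const std::size_t nl = data.find('\n', begin - 1);
        if (nl == std::string::npos) return rows;
        pos = nl + 1;
    }
    while (pos < end) {
        const std::size_t nl = data.find('\n', pos);
        if (nl == std::string::npos) {  // last row without trailing newline
            rows.push_back(data.substr(pos));
            break;
        }
        rows.push_back(data.substr(pos, nl - pos));  // may cross `end`
        pos = nl + 1;
    }
    return rows;
}
```

Calling `RowsOwnedByChunk(csv, i, chunkSize)` for every chunk index `i` yields each row exactly once, at the cost of a task occasionally reading a few bytes that live in the following chunk.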

@yangyangjuanjuan
Author

Thanks for your reply, @mikeov.
Could you give me some suggestions for the following case?
I have QFS deployed on a cluster, and I want to run a map-reduce job on a data set that consists of two chunks (blocks) in QFS, each with three replicas. I want each map task to be assigned to a chunk server that holds a replica of the chunk locally. KfsClient::GetDataLocation() can be used to find the right chunk server, but how can I load a local chunk on a chunk server? If I use KfsClient::Read(), will it always try the local replica first (if there is one)? Thanks again!

@mikeov
Contributor

mikeov commented Nov 23, 2018

The QFS client attempts to read from the "local" chunk server (the one on the same node as the client) when possible.
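Given that behaviour, one way to get local reads is to schedule each map task on a host that holds a replica of its chunk and then read through the ordinary client. The sketch below only shows the scheduling step and is not part of QFS. `ChunkReplicas` and `AssignTasks` are hypothetical names, and the host lists are assumed to have been collected beforehand (for example from KfsClient::GetDataLocation(), whose exact signature is not reproduced here).

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical record: the file offset of one chunk and the hosts that are
// assumed to hold a replica of it.
struct ChunkReplicas {
    long long offset;
    std::vector<std::string> hosts;
};

// Assign every chunk to one of its replica hosts, picking the host with the
// fewest tasks so far (ties go to the host listed first). Chunks with no
// known replica host are left unassigned.
std::map<long long, std::string> AssignTasks(
        const std::vector<ChunkReplicas>& chunks) {
    std::map<std::string, int> load;              // tasks per host so far
    std::map<long long, std::string> assignment;  // chunk offset -> host
    for (const auto& c : chunks) {
        if (c.hosts.empty()) continue;
        const std::string* best = &c.hosts.front();
        for (const auto& h : c.hosts) {
            if (load[h] < load[*best]) best = &h;
        }
        ++load[*best];
        assignment[c.offset] = *best;
    }
    return assignment;
}
```

A task started on the assigned host would then open the file and read its chunk's byte range as usual, relying on the client's preference for the local chunk server rather than reading the chunk files on disk directly.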


2 participants