
filetail: log file moved to history dir, how to configure file generation and archive strategies? #52

Open · dongbin86 opened this issue May 4, 2017 · 6 comments


@dongbin86

I have an nginx log file to collect at "/data/logs/nginx/xyz.log".
Every day, the log is moved and compressed to /data/logs/nginx/2017/05/xyz.log.tar.gz, and a new log file is recreated at /data/logs/nginx/xyz.log.
How should I configure the file generation and archive strategies?
I tried the default "Active File with Reverse Counter" naming type, but I found that StreamSets tries to collect /data/logs/nginx/2017, which is a directory.
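For reference, the daily rotation amounts to something like this (a minimal Java sketch of the external rotation step, not anything StreamSets runs; the paths are the ones from above, and plain gzip stands in for the real tar.gz step to keep the sketch dependency-free):

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.zip.GZIPOutputStream;

// Daily rotation: move the active log into a dated subdirectory,
// compress it there, and recreate an empty active file.
public class RotateLog {
  public static void main(String[] args) throws IOException {
    Path active = Paths.get("/data/logs/nginx/xyz.log");
    Path archiveDir = active.getParent()
        .resolve(LocalDate.now().format(DateTimeFormatter.ofPattern("yyyy/MM")));
    Files.createDirectories(archiveDir);

    Path moved = Files.move(active, archiveDir.resolve("xyz.log"));
    Path gz = archiveDir.resolve("xyz.log.gz"); // real setup produces xyz.log.tar.gz
    try (InputStream in = Files.newInputStream(moved);
         OutputStream out = new GZIPOutputStream(Files.newOutputStream(gz))) {
      in.transferTo(out);
    }
    Files.delete(moved);

    Files.createFile(active); // fresh active log, new inode
  }
}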

@metadaddy (Contributor)

I'm not clear on what you want SDC to do. Do you want it to read the xyz.log.tar.gz file each day?

@dongbin86 (Author)

No, I only need to collect the current xyz.log.
But if I move the log into a subdirectory of the same directory, then when the MultiFileReader refreshes its offset and recomputes the header hash, it throws java.io.FileNotFoundException, because that subdirectory is not excluded.
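Here is a minimal reproduction of the failure mode (a freestanding sketch, not the SDC code; the log dir path is the one from above): scanning the parent directory and opening every entry as a regular file blows up as soon as it hits a subdirectory.

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// LiveFile.refresh() scans the parent directory and hashes the head of
// every entry. Opening a subdirectory (e.g. /data/logs/nginx/2017) as a
// file throws java.io.FileNotFoundException ("Is a directory" on Linux).
public class ReproduceRefreshFailure {
  public static void main(String[] args) throws IOException {
    Path logDir = Paths.get("/data/logs/nginx");
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(logDir)) {
      for (Path entry : entries) {
        try (InputStream in = new FileInputStream(entry.toFile())) { // fails on a dir
          byte[] head = in.readNBytes(256); // stand-in for the header-hash read
          System.out.println(entry + ": read " + head.length + " bytes");
        }
      }
    }
  }
}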

@dongbin86 (Author)

I think LiveFile.java -> refresh() should be changed to this:

if (changed) {
  try (DirectoryStream<Path> directoryStream = Files.newDirectoryStream(path.getParent())) {
    for (Path path : directoryStream) {
      // Skip subdirectories (e.g. the dated archive dirs); hashing them
      // is what throws the FileNotFoundException.
      if (!path.toFile().isDirectory()) {
        BasicFileAttributes attrs = Files.readAttributes(path, BasicFileAttributes.class);
        String iNode = attrs.fileKey().toString();
        int headLen = (int) Math.min(this.headLen, attrs.size());
        String headHash = computeHash(path, headLen);
        if (iNode.equals(this.iNode) && headHash.equals(this.headHash)) {
          if (headLen == 0) {
            // Read the file header and compute its MD5 as the hash value.
            headLen = (int) Math.min(HEAD_LEN, attrs.size());
            headHash = computeHash(path, headLen);
          }
          refresh = new LiveFile(path, iNode, headHash, headLen);
          break;
        }
      }
    }
  }
}
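The only functional change here is the !path.toFile().isDirectory() guard: refresh() scans the parent directory looking for the entry whose inode and head hash still match, and without the guard it also tries to read and hash subdirectory entries such as /data/logs/nginx/2017, which is exactly what throws the FileNotFoundException above.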

@dongbin86 (Author)

If filetail is used to collect logs, then no subdirectory may exist in the log directory, otherwise LiveFile refresh will throw an exception. But in real production environments, logs are always compressed and moved into subdirectories.
So I think: if the file has been renamed within the same directory, refresh should follow it; but if the file has been deleted or moved away, refresh should return null. A rough sketch of that contract follows.
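(A freestanding sketch of the proposed contract, not the actual LiveFile code; refresh() here returns a Path instead of a LiveFile to keep it self-contained:)

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Proposed refresh() contract: if the tracked inode still exists in the
// same directory (the file was merely renamed), keep following it; if it
// is gone (deleted, or moved into a subdirectory or elsewhere), return
// null so the caller stops tracking instead of throwing.
public class RefreshSketch {
  static Path refresh(Path trackedPath, String trackedINode) throws IOException {
    if (Files.exists(trackedPath)) {
      return trackedPath; // still present under the original name
    }
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(trackedPath.getParent())) {
      for (Path entry : entries) {
        if (entry.toFile().isDirectory()) {
          continue; // never hash a subdirectory
        }
        BasicFileAttributes attrs = Files.readAttributes(entry, BasicFileAttributes.class);
        if (trackedINode.equals(attrs.fileKey().toString())) {
          return entry; // renamed in place: follow the new name
        }
      }
    }
    return null; // deleted or moved away
  }
}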

@metadaddy (Contributor)

@Sumpan Have you tried the above fix? Is it working for you?

@dongbin86 (Author)

Yes, I changed some code and it works fine. Here's the pull request: https://github.com/streamsets/datacollector/pull/27
