Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parse: add feature to parse and upload msg files #233

Merged
merged 7 commits into from
Oct 26, 2023
Merged

Conversation

joe-prosser
Copy link
Contributor

@joe-prosser joe-prosser commented Oct 25, 2023

Adds a feature to parse unicode msg files and upload them directly to a source. Extracts the minimum information we require from the msg using cfb crate

@joe-prosser joe-prosser self-assigned this Oct 25, 2023
Copy link
Collaborator

@tommilligan tommilligan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work getting this up and running, it looks excellent to me. Honestly I don't think there's anything stopping us from just backporting this code into the api, so we can support msg files on the server side there (if we actually want to do that).

Nothing from me - just style/nits which I've pushed review commits for

Comment on lines 186 to 194
clean_headers_string = CONTENT_TYPE_MIME_HEADER_RX
.clone()
.replace(&headers_string, "")
.to_string();

clean_headers_string = CONTENT_TRANSFER_ENCODING_MIME_HEADER_RX
.clone()
.replace(&clean_headers_string, "")
.to_string();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need .clone() here - .replace(...) returns a Cow which when you call to_string(), will only call clone() internally if necessary.

docs

Suggested change
clean_headers_string = CONTENT_TYPE_MIME_HEADER_RX
.clone()
.replace(&headers_string, "")
.to_string();
clean_headers_string = CONTENT_TRANSFER_ENCODING_MIME_HEADER_RX
.clone()
.replace(&clean_headers_string, "")
.to_string();
clean_headers_string = CONTENT_TYPE_MIME_HEADER_RX
.replace(&headers_string, "")
.to_string();
clean_headers_string = CONTENT_TRANSFER_ENCODING_MIME_HEADER_RX
.replace(&clean_headers_string, "")
.to_string();

Comment on lines 273 to 283
.filter_map(|path| {
if let Ok(path) = path {
if !path.path().extension().is_some_and(|msg| msg == "msg") {
None
} else {
Some(path)
}
} else {
None
}
})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Written slightly strangely - nested if normally indicates you can use a combinator, in this case .ok(). Also you negate the if statement for some reason, which makes it harder to read?

Suggested change
.filter_map(|path| {
if let Ok(path) = path {
if !path.path().extension().is_some_and(|msg| msg == "msg") {
None
} else {
Some(path)
}
} else {
None
}
})
.filter_map(|path| {
let path = path.ok()?;
if path.path().extension().is_some_and(|msg| msg == "msg") {
Some(path)
} else {
None
}
})

Comment on lines 313 to 321
upload_batch_of_documents(
client,
&source,
&documents,
transform_tag,
*no_charge,
&statistics,
)?;
documents.clear();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you call this more than once (again after the loop), you can easily pull this out into a closure:

    let send = |documents: &mut Vec<Document>| -> Result<()> {
        upload_batch_of_documents(
            client,
            &source,
            documents,
            transform_tag,
            *no_charge,
            &statistics,
        )?;
        documents.clear();
        Ok(())
    };

compound_file: &mut CompoundFile<File>,
) -> Result<AttachmentMetadata> {
let mut attachment_name_path = attachment_path.clone();
attachment_name_path.push(STREAM_PATH_ATTACHMENT_FILENAME.clone());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: don't have to clone here

Suggested change
attachment_name_path.push(STREAM_PATH_ATTACHMENT_FILENAME.clone());
attachment_name_path.push(&*STREAM_PATH_ATTACHMENT_FILENAME);

* dereferences your Lazy instance to the inner value, then & borrows the PathBuf into a Path (which is all you need for push here)

}

fn read_unicode_stream_to_string(
stream_path: PathBuf,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and in a few other places you don't actually need a PathBuf, just a Path, which allows you to remove a bunch of clone()s elsewhere.

@joe-prosser joe-prosser merged commit 39601c2 into master Oct 26, 2023
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants