-
I noticed it has a hard time resolving characters properly: once it runs into an issue, it starts swapping between "unknown", "narrator", and the regular characters, and then I have to go through and reassign every bit of dialogue to its proper character. I wonder if there is a way to have NLP correctly identify each character, since not every author ends dialogue with "said Jenny". It would also be good to somehow get rid of footers such as the book title and page number, as I am currently having to modify the PDF manually.
Replies: 2 comments 5 replies
-
Actually, it does use NLP: BookNLP is used for finding characters and a bunch of other stuff. I made a BookNLP demo Space if you want to see its output files.
For the task of figuring out which character said what (character quotation attribution), it uses one of its three BERT models. So for better performance I'd have to find some way to improve the training of the models. I've worked on creating a custom pipeline before, with mixed results lol; I'm still working on machine learning. I have a planned model in the works with an improved (and multilingual) pipeline, but school takes up my time.
My first attempt, which I wrote a paper about, is here: Quotation_identification_BERT.v1. It still performs worse than BookNLP at the moment, sadly.
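Until the attribution models improve, a post-processing pass might reduce some of the manual relabeling when narration gets tagged as "unknown". This is only a rough sketch: the `(speaker, text)` tuple layout and the `relabel_narration` name are hypothetical and not how the program actually stores its output. It also assumes dialogue uses double quotation marks, so books that quote with single quotes would need a different check.

```python
# Hypothetical post-processing pass. The (speaker, text) layout is an
# assumption for illustration, not the program's real data format.
QUOTE_CHARS = ('"', "\u201c", "\u201d")  # straight and curly double quotes


def relabel_narration(segments):
    """Force unquoted segments to 'narrator' and flag unresolved quotes.

    Returns (fixed_segments, review_indices), where review_indices lists
    quoted segments that still have no real speaker attached.
    """
    fixed = []
    needs_review = []
    for i, (speaker, text) in enumerate(segments):
        has_quote = any(q in text for q in QUOTE_CHARS)
        if not has_quote:
            # No quotation marks at all: treat it as narration,
            # whatever label it originally carried.
            fixed.append(("narrator", text))
        else:
            fixed.append((speaker, text))
            if speaker.lower() in ("unknown", "narrator"):
                # Quoted speech without a real speaker still needs a human.
                needs_review.append(i)
    return fixed, needs_review
```

The idea is that only the indices in `needs_review` would still need fixing by hand, rather than every line in the chapter.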
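About the PDF: this program uses Calibre to convert the PDF to .txt before passing it through the program. Honestly, you might want to find the book in a native non-PDF format; EPUB is the easiest for it to use.
PDF is hard to deal with and has many inconsistencies in its formatting, idk :/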
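For the footers specifically, a rough cleanup pass over the text extracted from the PDF might look something like the sketch below. It is a heuristic and not part of the program: `strip_footers`, `min_repeats`, and `max_len` are made-up names. It drops lines that are bare page numbers, plus short lines that repeat many times (like a book title in a running footer).

```python
import re
from collections import Counter

# Matches lines that are only a page number, e.g. "12" or "Page 12".
PAGE_NUMBER = re.compile(r"^\s*(page\s+)?\d{1,4}\s*$", re.IGNORECASE)


def strip_footers(text, min_repeats=5, max_len=80):
    """Remove bare page numbers and short lines repeated min_repeats+ times."""
    lines = text.splitlines()
    # Count how often each non-empty line occurs in the whole text.
    counts = Counter(line.strip() for line in lines if line.strip())
    repeated = {
        s for s, n in counts.items()
        if n >= min_repeats and len(s) <= max_len
    }
    kept = [
        line for line in lines
        if not PAGE_NUMBER.match(line) and line.strip() not in repeated
    ]
    return "\n".join(kept)
```

Be careful with it: a short line of dialogue that repeats often, or a chapter heading that is just a number, would be removed as well, so it is worth checking the output and raising `min_repeats` if it is too aggressive.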
-
Ah ok, I will have to look into this and see if there is anything I can do to make it resolve characters. There's no way I can go through thousands of lines of dialogue manually; it took me an hour just to do one chapter. I wonder if there is a way to add a button that will update the character dialogue after a certain amount has been adjusted manually. Part of the problem is that it starts to mess up when it tries to assign the narrator as an unknown character.
On Oct 2, 2024, at 18:44, Drew Thomasson ***@***.***> wrote:
it does use NLP
Actually it does use NLP within BOOKNLP for finding characters and such and a bunch of other stuff.
Booknlp DEMO SPACE I made if you wanna see its output files
For the task of figuring out which character said what (THE TASK OF character Quotation attribution)
it uses 1/3 of the BERT models
So for better performance I'd have to find some way to improve the training of the models,
I've worked on creating a custom pipeline before, with mixed results lol, still working on machine learning,
I have a planned model in the works with an improved (and multilingual) pipeline but-lol school takes up my time.
my first attempt I wrote a paper about is here
Quotation_identification_BERT.v1
lol it still performs worse than BookNLP at the moment tho, sadly.
About the pdf
this program uses Calibre to convert PDF into txt to pass through the program,
Honestly, you might want to find some way of getting the books in a native non-PDF form; EPUB is the easiest for it to use.
…-PDF is hard to deal with and has many inconsistencies with its formatting idk :/
I think you're right, I used a .txt file and it seems to be doing much better