Resolving Characters Properly #37

dmarsh400 · 2024-10-02T23:02:07Z

dmarsh400
Oct 2, 2024

I noticed it has a hard time resolving characters properly. Once it hits a line it can't resolve, it starts swapping between "unknown", "narrator", and regular characters, and then I have to go through and reassign every bit of dialogue to its proper character.

I wonder if there is a way to have NLP correctly identify each character. Not every author ends dialogue with "said Jenny".

It would also be good to somehow get rid of footers such as the book title and page number, as I am currently having to modify the PDF manually.
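One way to approach the footer cleanup without editing the PDF by hand is to work on the extracted text instead. The sketch below is not part of this project. It assumes the text has already been split into one string per page, and the function name and the `min_share` threshold are made up for illustration. It drops lines that are only a page number and lines that repeat on most pages (such as a running book title).

```python
import re
from collections import Counter

# A line that is only a page number, optionally prefixed with "Page".
PAGE_NUMBER = re.compile(r"^\s*(page\s+)?\d{1,4}\s*$", re.IGNORECASE)

def strip_running_footers(pages, min_share=0.6):
    """Remove page-number lines and lines repeated on most pages.

    pages: list of page texts. min_share is a hypothetical threshold: a line
    seen on at least this fraction of pages is treated as a header/footer.
    """
    # Count on how many pages each distinct non-empty line appears.
    seen_on = Counter()
    for page in pages:
        seen_on.update({line.strip() for line in page.splitlines() if line.strip()})

    cutoff = max(2, int(len(pages) * min_share))
    repeated = {line for line, count in seen_on.items() if count >= cutoff}

    cleaned = []
    for page in pages:
        kept = [
            line for line in page.splitlines()
            if line.strip() not in repeated and not PAGE_NUMBER.match(line)
        ]
        cleaned.append("\n".join(kept))
    return cleaned
```

Note that a line consisting only of a number (a year on its own line, say) would also be removed, so it is worth spot-checking the result.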


DrewThomasson · 2024-10-03T01:44:12Z

DrewThomasson
Oct 3, 2024
Maintainer

it does use NLP

Actually, it does use NLP: BookNLP handles finding characters and a bunch of other stuff.

BookNLP demo Space I made, if you wanna see its output files

For the task of figuring out which character said what (quotation attribution), it uses one of the three BERT models.

So for better performance, I'd have to find some way to improve the training of the models.

I've worked on creating a custom pipeline before, with mixed results lol. Still working on machine learning.

I have a planned model in the works with an improved (and multilingual) pipeline, but lol, school takes up my time.

my first attempt I wrote a paper about is here

Quotation_identification_BERT.v1

lol it still performs worse than BookNLP at the moment tho, sadly.

About the PDF

This program uses Calibre to convert the PDF into txt before passing it through the rest of the program.
Honestly, you might want to find the book in a native non-PDF format. EPUB is the easiest for it to use.

PDF is hard to deal with and has many inconsistencies in its formatting, idk :/
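If you do have an EPUB, the conversion to txt can also be run by hand to inspect what the program will see. A minimal sketch, assuming Calibre is installed and its `ebook-convert` command-line tool is on your PATH. The helper names and file paths are placeholders, and this is not necessarily how the program itself invokes Calibre.

```python
import shutil
import subprocess
from pathlib import Path

def build_convert_command(source, target):
    """Return the ebook-convert argument list for source -> target."""
    # ebook-convert picks the output format from the target file's extension.
    return ["ebook-convert", str(source), str(target)]

def convert_to_txt(source):
    """Convert an ebook (e.g. an EPUB) to a .txt file next to it."""
    source = Path(source)
    target = source.with_suffix(".txt")
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("Calibre's ebook-convert was not found on PATH")
    subprocess.run(build_convert_command(source, target), check=True)
    return target

# Example (placeholder path): convert_to_txt("my_book.epub")
```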


dmarsh400 · 2024-10-03T13:39:05Z

dmarsh400
Oct 3, 2024
Author

Ah ok, I will have to look into this and see if there is anything I can do to make it resolve characters.

There's no way I can go through thousands of lines of dialogue manually; it took me an hour just to do one chapter.

I wonder if there is a way to add a button that will update the character dialogue after manually adjusting a certain amount.

Part of the problem is that it starts to mess up when it tries to assign the narrator as an unknown character.


DrewThomasson Oct 3, 2024
Maintainer

Honestly, it might just be the PDF format messing things up.

Mind if we do an experiment lol?

Give me the name of the book you're trying this on.

I'll try using an EPUB version,
you try using the PDF version,
and then we can compare our output book.csv files.
I swear, for the books I usually give it, as long as the quotes are properly formatted, they come out with like 90% accuracy. Idk what's causing your accuracy to drop so much on your books lol.
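The comparison itself could be scripted. The sketch below assumes each book.csv has a `Text` column holding the line and a `Speaker` column holding the assigned character. Those column names are a guess for illustration, so adjust them to whatever the files actually contain. It matches rows by their text and reports how often the two runs agree on the speaker.

```python
import csv

def load_speakers(csv_lines, text_col="Text", speaker_col="Speaker"):
    """Map each line of text to its assigned speaker.

    csv_lines: an open file or any iterable of CSV lines.
    The column names are assumptions, not confirmed by the project.
    """
    return {
        row[text_col].strip(): row[speaker_col].strip()
        for row in csv.DictReader(csv_lines)
    }

def compare_speakers(run_a, run_b):
    """Return (agreement ratio, mismatches) over lines present in both runs."""
    shared = [text for text in run_a if text in run_b]
    mismatches = [
        (text, run_a[text], run_b[text])
        for text in shared
        if run_a[text] != run_b[text]
    ]
    agreement = 1 - len(mismatches) / len(shared) if shared else 0.0
    return agreement, mismatches

# Example (placeholder paths):
# with open("epub_run/book.csv", newline="", encoding="utf-8") as a, \
#      open("pdf_run/book.csv", newline="", encoding="utf-8") as b:
#     agreement, mismatches = compare_speakers(load_speakers(a), load_speakers(b))
```

Because rows are keyed by their text, identical lines that occur more than once (a repeated "Yes.", for example) collapse into a single entry, so treat the ratio as a rough indicator only.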

dmarsh400 Oct 4, 2024
Author

I think you're right, I used a .txt file and it seems to be doing much better

Answer selected by DrewThomasson

dmarsh400 Oct 4, 2024
Author

Also, there was an issue where the quote would end with a period or a comma for some reason, and I think that was messing things up too. I used find and replace to swap every (.") and (,") at the end of a sentence for (").
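For anyone who wants to repeat that find-and-replace on a whole .txt file, here is a small sketch of the same idea as a regular expression. The function name is made up, and handling the curly closing quote and leaving ellipses alone are assumptions added here rather than something described above.

```python
import re

# A single period or comma directly before a closing double quote
# (straight or curly). The lookbehind skips the last dot of an ellipsis.
TRAILING_PUNCT = re.compile(r'(?<!\.)[.,](["\u201d])')

def drop_punct_before_closing_quote(text):
    """Replace ." and ," with a bare closing quote."""
    return TRAILING_PUNCT.sub(r"\1", text)

# Example (placeholder path):
# from pathlib import Path
# path = Path("my_book.txt")
# path.write_text(drop_punct_before_closing_quote(path.read_text(encoding="utf-8")),
#                 encoding="utf-8")
```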

DrewThomasson Oct 4, 2024
Maintainer

weird lol

DrewThomasson Oct 4, 2024
Maintainer

Hopefully I'll find time to finalize this pipeline for people to use if they ever run into books whose quotes are formatted weirdly or whatnot, tho.

https://github.com/DrewThomasson/ext-Reformatter-LLM
