I want parallel processing results by file #93

Ribo-Py · 2021-10-11T21:06:01Z

I have the code below to process multiple files in parallel. I want the results ordered by file (the first loop), but they are currently ordered by clause (the third loop), as shown in the image. How can I fix this?

```python
import multiprocessing as mp
import os
import re

import pdfplumber

# myPath, keywords, clause_cap and attention() are not shown in this snippet.


def extractFile(file):
    pdf = pdfplumber.open(os.path.join(myPath, file))
    n_clause = 0

    for i in range(len(pdf.pages)):
        page = pdf.pages[i]
        text = page.extract_text()
        # tables = tabula.read_pdf(os.path.join(myPath, file), pages=i+1, multiple_tables=True)

        if re.search(keywords.casefold(), text.casefold()):
            highlights = text.split('.')
            for sentence in highlights:
                if re.search(keywords.casefold(), sentence.casefold()):
                    n_clause += 1
                    if n_clause <= clause_cap:
                        # Printed from inside the worker process
                        print(f'[Contract Name: {file}] \n Page {i+1} - Clause {n_clause}: {attention(sentence, keywords)}')
                    else:
                        break


files = [x for x in os.listdir(myPath) if x.endswith(".pdf")]

with mp.Pool(6) as pool:
    print(pool.map(extractFile, files))
```
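
One possible direction, sketched under assumptions rather than offered as a confirmed fix: since all worker processes share the same stdout, lines printed inside the worker can interleave across files. `pool.map` returns its results in the same order as its input, so having each worker return its lines and printing them in the parent process should group the output by file. The names `find_clauses`, `print_grouped`, and `jobs` are hypothetical, and plain strings stand in for the pdfplumber page text so the example runs on its own.

```python
import multiprocessing as mp
import re


def find_clauses(job):
    """Worker: return the report lines for one file instead of printing them."""
    name, pages, pattern, cap = job
    lines = []
    for page_no, text in enumerate(pages, start=1):
        for sentence in text.split("."):
            if re.search(pattern, sentence, flags=re.IGNORECASE):
                if len(lines) == cap:
                    return name, lines  # cap reached: stop scanning this file
                lines.append(
                    f"Page {page_no} - Clause {len(lines) + 1}: {sentence.strip()}"
                )
    return name, lines


def print_grouped(results):
    """Print in the parent process, one file at a time."""
    for name, lines in results:
        print(f"[Contract Name: {name}]")
        for line in lines:
            print(f"  {line}")


if __name__ == "__main__":
    # Hypothetical stand-in data: (file name, page texts, keyword pattern, clause cap)
    jobs = [
        ("a.pdf", ["Payment is due. The fee is fixed.", "No fee applies here."], "fee", 5),
        ("b.pdf", ["Nothing relevant.", "A late fee may apply. Fee waived."], "fee", 5),
    ]
    with mp.Pool(2) as pool:
        # pool.map preserves input order, so results[0] belongs to jobs[0]
        results = pool.map(find_clauses, jobs)
    print_grouped(results)
```

In the original script, the equivalent change would be to append each formatted string to a list inside `extractFile`, return that list, and loop over the value returned by `pool.map` in the parent.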
