I have the code below to process multiple files in parallel. I want the results ordered by file (first loop); however, the output is currently ordered by clause (third loop), as shown in the image. How can I fix it?
```python
import multiprocessing as mp
import os
import re

import pdfplumber

# myPath, keywords, clause_cap and attention() are defined elsewhere in my script.


def extractFile(file):
    pdf = pdfplumber.open(os.path.join(myPath, file))
    n_clause = 0
    for i in range(len(pdf.pages)):
        page = pdf.pages[i]
        text = page.extract_text()
        # tables = tabula.read_pdf(os.path.join(myPath, file), pages=i+1, multiple_tables=True)
        if re.search(keywords.casefold(), text.casefold()):
            highlights = text.split('.')
            for sentence in highlights:
                if re.search(keywords.casefold(), sentence.casefold()):
                    n_clause += 1
                    if n_clause <= clause_cap:
                        print(f'[Contract Name: {file}] \n Page {i+1} - Clause {n_clause}: {attention(sentence, keywords)}')
                    else:
                        break


files = [x for x in os.listdir(myPath) if x.endswith(".pdf")]
with mp.Pool(6) as pool:
    print(pool.map(extractFile, files))
```
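One possible direction, offered as a sketch rather than a confirmed fix: because each worker process calls `print` on its own, lines from different files can interleave on the console. `Pool.map` does return its results in the same order as the input list, so a worker that returns its lines instead of printing them lets the parent process print everything file by file. In the sketch below, `extract_file`, `collect_in_file_order`, the placeholder clause text, and the file names are all hypothetical stand-ins; the real worker would build its lines from the pdfplumber loop above.

```python
import multiprocessing as mp


def extract_file(file):
    """Hypothetical stand-in for extractFile: collect lines, do not print."""
    lines = [f"[Contract Name: {file}]"]
    # In the real worker, append one line per matching clause here
    # instead of calling print() inside the nested loops.
    for n_clause in range(1, 3):
        lines.append(f" Clause {n_clause}: placeholder text from {file}")
    return lines


def collect_in_file_order(files, processes=2):
    # Pool.map returns results in the order of the input iterable,
    # regardless of which worker process finishes first.
    with mp.Pool(processes) as pool:
        return pool.map(extract_file, files)


if __name__ == "__main__":
    files = ["a.pdf", "b.pdf", "c.pdf"]  # illustrative names only
    for lines in collect_in_file_order(files):
        # Printing happens only in the parent, one file at a time.
        print("\n".join(lines))
```

As a side note, the original `extractFile` has no `return` statement, so `print(pool.map(extractFile, files))` prints a list of `None` values after the workers finish.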