When I call the chunk_batch function in WordChunker, it shows the error message below:
batch_chunks: List[List[Chunk]] = chunker.chunk_batch(text=texts)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/chonkie/chunker/base.py", line 214, in chunk_batch
return pool.map(self.chunk, text)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/multiprocessing/pool.py", line 367, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/multiprocessing/pool.py", line 774, in get
raise self._value
File "/opt/conda/lib/python3.11/multiprocessing/pool.py", line 540, in _handle_tasks
put(task)
File "/opt/conda/lib/python3.11/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'BaseChunker._get_tokenizer_counter.<locals>.<lambda>'
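For context, this failure comes from Python's pickling rules rather than from the texts themselves: multiprocessing.Pool.map has to pickle the callable and its bound object to ship them to worker processes, and a lambda created inside a method (here the token counter built in BaseChunker._get_tokenizer_counter) is a local object that pickle cannot serialize. A minimal sketch reproducing the same failure, where make_counter is a hypothetical stand-in for the library's factory method:

import pickle

def make_counter():
    # A lambda created inside a function is a "local object";
    # pickle can only serialize functions importable by name.
    return lambda text: len(text.split())

counter = make_counter()
try:
    pickle.dumps(counter)  # this is what multiprocessing does under the hood
except AttributeError as err:
    print(err)  # Can't pickle local object 'make_counter.<locals>.<lambda>'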
Here is the code that I run:
from chonkie import WordChunker
from autotiktokenizer import AutoTikTokenizer

tokenizer = AutoTikTokenizer.from_pretrained("nvidia/Llama-3.1-Nemotron-70B-Instruct-HF")

chunker = WordChunker(
    tokenizer=tokenizer,
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap,
)

# `documents` is a list of document objects; it contains several articles.
texts: List[str] = []
for document in documents:
    texts.append(document.content)

list_my_chunks: List[List[ChunkModel]] = []
batch_chunks: List[List[Chunk]] = chunker.chunk_batch(text=texts)
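Until the counter lambda is replaced with a picklable callable (e.g. a module-level function), a sequential fallback should sidestep the pickling step entirely. This is only a sketch, assuming chunker.chunk accepts a single string, which the pool.map(self.chunk, text) line in the traceback suggests:

# Sequential fallback: chunker.chunk handles one string at a time
# in-process, so nothing needs to be pickled across processes.
batch_chunks: List[List[Chunk]] = [chunker.chunk(t) for t in texts]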