The problem:
I'm trying to run extract_features on my data but keep getting a BrokenPipeError. I tried it on two different computers (both with the same environment) and got the same error. The dataset is quite large (merged DataFrame shape: (880169, 522)), so it is expected to run for around 20 hours. It runs for a few hours and then crashes.
Error (repeated many times):

Process ForkPoolWorker-1:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 131, in worker
    put((job, i, result))
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 377, in put
    self._writer.send_bytes(obj)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

(The same traceback is printed by each of the other ForkPoolWorker processes; their output is interleaved in the log.)
Anything else we need to know?:
I also tried running it with a smaller chunksize and fewer jobs, but the error persisted.
Hi @johan-sightic!
Thanks for filing the issue and sorry for the long delay!
The "Broken pipe" error typically means that the worker processes have been killed by the OS for some reason. Most likely, this is due to memory issues.
If your data consists of multiple IDs, I would recommend producing the features in chunks of identifiers, as sketched below.
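A minimal sketch of that chunking idea, assuming a long-format DataFrame df with "id" and "time" columns; the column names, the chunk count of 20, and the use of numpy.array_split are illustrative assumptions, not taken from your setup:

```python
# Hypothetical sketch: split the unique ids into groups and run
# extract_features on one group at a time to bound peak memory.
import numpy as np
import pandas as pd
from tsfresh import extract_features

id_chunks = np.array_split(df["id"].unique(), 20)  # 20 chunks is an arbitrary choice
parts = []
for ids in id_chunks:
    if len(ids) == 0:  # array_split may yield empty chunks for few ids
        continue
    sub = df[df["id"].isin(ids)]
    parts.append(extract_features(sub, column_id="id", column_sort="time"))
features = pd.concat(parts)
```

Each call then only holds the intermediate results for one group of identifiers in memory, and you can also persist each part to disk before moving on to the next chunk.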
If your data consists of just a single ID, you might either want to use a bigger machine or produce features for windows of the data (these differ from the features computed over the full series, but maybe your use case allows for this). A sketch of the windowing idea follows below.
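A minimal sketch of the windowing idea for a single-id series; the window length of 1000 rows and the "time" column are illustrative assumptions:

```python
# Hypothetical sketch: give every block of 1000 consecutive rows its own
# id, so tsfresh computes features per window instead of for the full series.
from tsfresh import extract_features

df = df.sort_values("time").reset_index(drop=True)
df["window_id"] = df.index // 1000  # fixed-size, non-overlapping windows
features = extract_features(df, column_id="window_id", column_sort="time")
```

The result is one feature row per window rather than one row for the whole series, which keeps each worker's payload small.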