I am using Google Colab and trying to use the Stanford SUTime library inside a function that is called from PySpark.
This function takes a row of a given RDD and uses the SUTime library to return a (sentence, frequency) pair.
def convert1(row):
    s = str(row.dosefrequency).lower()
    try:
        # Parses the input string; sutime outputs the frequency, e.g. 'P1D' (per day).
        i = sutime.parse(s)
        if len(i) > 0 and 'timex-value' in i[0]:
            return [s, i[0]['timex-value']]
        return []
    except Exception:
        return []
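Run locally with a stub in place of sutime (a hypothetical stand-in invented here for illustration; the real library calls into a JVM via JPype), the function produces pairs like the following:

```python
# Hypothetical stub standing in for the real sutime library, which wraps a
# Java SUTime instance; it returns a list of annotation dicts.
class SutimeStub:
    def parse(self, text):
        # A real call would return SUTime annotations; 'P1D' means "per day".
        if 'every day' in text:
            return [{'timex-value': 'P1D'}]
        return []

sutime = SutimeStub()

def convert1_local(row_text):
    # Same logic as convert1, but taking the dosefrequency string directly.
    s = str(row_text).lower()
    try:
        i = sutime.parse(s)
        if len(i) > 0 and 'timex-value' in i[0]:
            return [s, i[0]['timex-value']]
        return []
    except Exception:
        return []

print(convert1_local('take 2 tablet by oral route every day'))
# → ['take 2 tablet by oral route every day', 'P1D']
```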
My input RDD looks like:
rdd.take(3)
'''
[Row(practiceid=701, dosequantity='200', dosefrequency='take 2 tablet by oral route every day', count_dosequantity=716, count_dosefrequency=1, count_patientuid=306, DM Current -hychqudose='200mg', DM Expected Value='400mg'),
Row(practiceid=595, dosequantity='200', dosefrequency='take 1 tablet by oral route 2 times every day', count_dosequantity=327, count_dosefrequency=1, count_patientuid=230, DM Current -hychqudose='200mg', DM Expected Value='400mg'),
Row(practiceid=623, dosequantity='200', dosefrequency='take 1 (200MG) by oral route 2 times every day', count_dosequantity=339, count_dosefrequency=1, count_patientuid=180, DM Current -hychqudose='200mg', DM Expected Value='400mg')]
'''
This is how I am calling the function using flatMap, but it gives me the following error:
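The actual call is not included in the issue; presumably it was something along the lines of `rdd.flatMap(convert1).collect()` (a guess, not shown in the original). `flatMap` applies the function to every row and flattens the resulting lists, which can be mimicked locally without Spark:

```python
from itertools import chain

# Hypothetical stand-in rows and function; the real code maps convert1 over the RDD.
rows = ['take 2 tablet by oral route every day',
        'take 1 tablet by oral route 2 times every day']

def fake_convert1(s):
    # Returns a [sentence, frequency] pair, or [] when nothing parses.
    return [s, 'P1D'] if 'every day' in s else []

# Local equivalent of rdd.flatMap(fake_convert1).collect():
result = list(chain.from_iterable(fake_convert1(r) for r in rows))
print(result)
```

With Spark, it is the real `flatMap` call against the RDD that triggers the serialization error in the traceback below.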
Traceback (most recent call last):
File "/usr/lib/python3.6/pickle.py", line 916, in save_global
__import__(module_name, level=0)
ModuleNotFoundError: No module named 'edu'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 841, in save_global
return Pickler.save_global(self, obj, name=name)
File "/usr/lib/python3.6/pickle.py", line 922, in save_global
(obj, module_name, name))
_pickle.PicklingError: Can't pickle <java class 'edu.stanford.nlp.python.SUTimeWrapper'>: it's not found as edu.stanford.nlp.python.edu.stanford.nlp.python.SUTimeWrapper
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.6/pickle.py", line 916, in save_global
__import__(module_name, level=0)
ModuleNotFoundError: No module named 'java'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 841, in save_global
return Pickler.save_global(self, obj, name=name)
File "/usr/lib/python3.6/pickle.py", line 922, in save_global
(obj, module_name, name))
_pickle.PicklingError: Can't pickle <java class 'java.lang.Object'>: it's not found as java.lang.java.lang.Object
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/serializers.py", line 468, in dumps
return cloudpickle.dumps(obj, pickle_protocol)
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 1097, in dumps
cp.dump(obj)
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 357, in dump
return Pickler.dump(self, obj)
File "/usr/lib/python3.6/pickle.py", line 409, in dump
self.save(obj)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 751, in save_tuple
save(element)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 501, in save_function
self.save_function_tuple(obj)
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 730, in save_function_tuple
save(state)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 781, in save_list
self._batch_appends(obj)
File "/usr/lib/python3.6/pickle.py", line 808, in _batch_appends
save(tmp[0])
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 496, in save_function
self.save_function_tuple(obj)
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 730, in save_function_tuple
save(state)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/usr/lib/python3.6/pickle.py", line 852, in _batch_setitems
save(v)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 496, in save_function
self.save_function_tuple(obj)
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 730, in save_function_tuple
save(state)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/usr/lib/python3.6/pickle.py", line 852, in _batch_setitems
save(v)
File "/usr/lib/python3.6/pickle.py", line 521, in save
self.save_reduce(obj=obj, *rv)
File "/usr/lib/python3.6/pickle.py", line 634, in save_reduce
save(state)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/usr/lib/python3.6/pickle.py", line 521, in save
self.save_reduce(obj=obj, *rv)
File "/usr/lib/python3.6/pickle.py", line 605, in save_reduce
save(cls)
File "/usr/lib/python3.6/pickle.py", line 490, in save
self.save_global(obj)
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 850, in save_global
return self.save_dynamic_class(obj)
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 662, in save_dynamic_class
obj=obj)
File "/usr/lib/python3.6/pickle.py", line 610, in save_reduce
save(args)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 751, in save_tuple
save(element)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 736, in save_tuple
save(element)
File "/usr/lib/python3.6/pickle.py", line 490, in save
self.save_global(obj)
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 850, in save_global
return self.save_dynamic_class(obj)
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 666, in save_dynamic_class
save(clsdict)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/usr/lib/python3.6/pickle.py", line 496, in save
rv = reduce(self.proto)
TypeError: can't pickle _jpype._JMethod objects
PicklingError: Could not serialize object: TypeError: can't pickle _jpype._JMethod objects
However, when I call the convert1 function explicitly with the following code:
rr = rdd.take(10)
for i in range(10):
    x = convert1(rr[i])
    print(x)
the code works perfectly fine for me; it only fails with flatMap.
Kindly ask for any further details if required.
There is a special pickler for Java objects, as the general Python one can't handle them. I would try it and see if it helps. The Java side needs a context set up for each serialization stream so that the same object does not get serialized more than once; Python didn't provide hooks for that kind of behavior.
@Thrameos We have not used pickle anywhere in our code; "pickle" is being used internally for some purpose. Maybe it has something to do with the flatMap function, but I'm not sure. Can you elaborate on why pickle is being used?
Pickle appears to be used internally by Spark. I haven't used this myself, but from the exception it appears to be a pickler. I have seen a similar issue when people attempt to use certain multiprocessing commands.
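To illustrate what is going on (a sketch, with `threading.Lock` standing in for the unpicklable JPype proxy; the names here are hypothetical): Spark pickles the flatMap closure to ship it to executors, so anything the closure captures, such as a sutime instance backed by Java objects, must be picklable. Calling convert1 on the driver never serializes anything, which is why the explicit loop works. A common workaround is to create the heavy object lazily inside the function, once per worker, so that only the plain function gets shipped:

```python
import pickle
import threading

class FakeJavaWrapper:
    """Stand-in for a JPype-backed object such as SUTimeWrapper (hypothetical)."""
    def __init__(self):
        # Locks, like _jpype._JMethod objects, cannot be pickled.
        self._handle = threading.Lock()

# Capturing such an object in the shipped state fails to serialize:
wrapper = FakeJavaWrapper()
try:
    pickle.dumps(wrapper)
except TypeError as e:
    print('serialization failed:', e)

# Workaround: build the object lazily inside the function, once per worker.
_sutime = None

def parse_row(text):
    global _sutime
    if _sutime is None:
        _sutime = FakeJavaWrapper()  # on a real executor: construct SUTime here
    return [text.lower()]

# The bare function pickles fine, because it references nothing Java-backed yet.
pickle.dumps(parse_row)
print('function serialized OK')
```

This is only a sketch of the lazy-initialization pattern, not the asker's actual fix; on a real cluster the sutime setup inside the function would also need the JVM and jars available on every executor.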