I am using Google Colab and trying to use the Stanford SUTime library inside a function that is called from PySpark.
This function takes a row of a given RDD and uses the SUTime library to return a (sentence, frequency) pair.
def convert1(row):
    s = str(row.dosefrequency).lower()
    try:
        # Parses the input string; sutime outputs the frequency, e.g. 'P1D' (per day).
        i = sutime.parse(s)
        if len(i) > 0 and 'timex-value' in i[0]:
            return [s, i[0]['timex-value']]
        return []
    except Exception:
        return []
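Run locally with a stub in place of sutime (a hypothetical stand-in invented here for illustration; the real library calls into a JVM via JPype), the function produces pairs like the following:

```python
# Hypothetical stub standing in for the real sutime library, which wraps a
# Java SUTime instance; it returns a list of annotation dicts.
class SutimeStub:
    def parse(self, text):
        # A real call would return SUTime annotations; 'P1D' means "per day".
        if 'every day' in text:
            return [{'timex-value': 'P1D'}]
        return []

sutime = SutimeStub()

def convert1_local(row_text):
    # Same logic as convert1, but taking the dosefrequency string directly.
    s = str(row_text).lower()
    try:
        i = sutime.parse(s)
        if len(i) > 0 and 'timex-value' in i[0]:
            return [s, i[0]['timex-value']]
        return []
    except Exception:
        return []

print(convert1_local('take 2 tablet by oral route every day'))
# → ['take 2 tablet by oral route every day', 'P1D']
```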
My input RDD looks like:
rdd.take(3)
'''
[Row(practiceid=701, dosequantity='200', dosefrequency='take 2 tablet by oral route every day', count_dosequantity=716, count_dosefrequency=1, count_patientuid=306, DM Current -hychqudose='200mg', DM Expected Value='400mg'),
Row(practiceid=595, dosequantity='200', dosefrequency='take 1 tablet by oral route 2 times every day', count_dosequantity=327, count_dosefrequency=1, count_patientuid=230, DM Current -hychqudose='200mg', DM Expected Value='400mg'),
Row(practiceid=623, dosequantity='200', dosefrequency='take 1 (200MG) by oral route 2 times every day', count_dosequantity=339, count_dosefrequency=1, count_patientuid=180, DM Current -hychqudose='200mg', DM Expected Value='400mg')]
'''
This is how I am calling the function using flatMap, but it gives me the following error:
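The actual call is not included in the issue; presumably it was something along the lines of `rdd.flatMap(convert1).collect()` (a guess, not shown in the original). `flatMap` applies the function to every row and flattens the resulting lists, which can be mimicked locally without Spark:

```python
from itertools import chain

# Hypothetical stand-in rows and function; the real code maps convert1 over the RDD.
rows = ['take 2 tablet by oral route every day',
        'take 1 tablet by oral route 2 times every day']

def fake_convert1(s):
    # Returns a [sentence, frequency] pair, or [] when nothing parses.
    return [s, 'P1D'] if 'every day' in s else []

# Local equivalent of rdd.flatMap(fake_convert1).collect():
result = list(chain.from_iterable(fake_convert1(r) for r in rows))
print(result)
```

With Spark, it is the real `flatMap` call against the RDD that triggers the serialization error in the traceback below.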
Traceback (most recent call last):
File "/usr/lib/python3.6/pickle.py", line 916, in save_global
__import__(module_name, level=0)
ModuleNotFoundError: No module named 'edu'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 841, in save_global
return Pickler.save_global(self, obj, name=name)
File "/usr/lib/python3.6/pickle.py", line 922, in save_global
(obj, module_name, name))
_pickle.PicklingError: Can't pickle <java class 'edu.stanford.nlp.python.SUTimeWrapper'>: it's not found as edu.stanford.nlp.python.edu.stanford.nlp.python.SUTimeWrapper
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.6/pickle.py", line 916, in save_global
__import__(module_name, level=0)
ModuleNotFoundError: No module named 'java'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 841, in save_global
return Pickler.save_global(self, obj, name=name)
File "/usr/lib/python3.6/pickle.py", line 922, in save_global
(obj, module_name, name))
_pickle.PicklingError: Can't pickle <java class 'java.lang.Object'>: it's not found as java.lang.java.lang.Object
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/serializers.py", line 468, in dumps
return cloudpickle.dumps(obj, pickle_protocol)
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 1097, in dumps
cp.dump(obj)
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 357, in dump
return Pickler.dump(self, obj)
File "/usr/lib/python3.6/pickle.py", line 409, in dump
self.save(obj)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 751, in save_tuple
save(element)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 501, in save_function
self.save_function_tuple(obj)
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 730, in save_function_tuple
save(state)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 781, in save_list
self._batch_appends(obj)
File "/usr/lib/python3.6/pickle.py", line 808, in _batch_appends
save(tmp[0])
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 496, in save_function
self.save_function_tuple(obj)
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 730, in save_function_tuple
save(state)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/usr/lib/python3.6/pickle.py", line 852, in _batch_setitems
save(v)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 496, in save_function
self.save_function_tuple(obj)
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 730, in save_function_tuple
save(state)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/usr/lib/python3.6/pickle.py", line 852, in _batch_setitems
save(v)
File "/usr/lib/python3.6/pickle.py", line 521, in save
self.save_reduce(obj=obj, *rv)
File "/usr/lib/python3.6/pickle.py", line 634, in save_reduce
save(state)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/usr/lib/python3.6/pickle.py", line 521, in save
self.save_reduce(obj=obj, *rv)
File "/usr/lib/python3.6/pickle.py", line 605, in save_reduce
save(cls)
File "/usr/lib/python3.6/pickle.py", line 490, in save
self.save_global(obj)
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 850, in save_global
return self.save_dynamic_class(obj)
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 662, in save_dynamic_class
obj=obj)
File "/usr/lib/python3.6/pickle.py", line 610, in save_reduce
save(args)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 751, in save_tuple
save(element)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 736, in save_tuple
save(element)
File "/usr/lib/python3.6/pickle.py", line 490, in save
self.save_global(obj)
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 850, in save_global
return self.save_dynamic_class(obj)
File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 666, in save_dynamic_class
save(clsdict)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/usr/lib/python3.6/pickle.py", line 496, in save
rv = reduce(self.proto)
TypeError: can't pickle _jpype._JMethod objects
PicklingError: Could not serialize object: TypeError: can't pickle _jpype._JMethod objects
However, when I call the convert1 function explicitly with the following code:
rr = rdd.take(10)
for i in range(10):
    x = convert1(rr[i])
    print(x)
the code works perfectly fine for me; it only fails with flatMap.
Kindly ask for any further details if required.
There is a special pickler for Java objects, as the general Python one can't handle them. I would try it and see if it helps. The Java side needs a context set up for each serialization stream so that the same object does not get serialized more than once; Python didn't provide hooks for that kind of behavior.
@Thrameos We have not used pickle anywhere in our code; "pickle" is being used internally for some purpose. Maybe it has something to do with the flatMap function, but I'm not sure. Can you elaborate on why pickle is being used?
Pickle appears to be used internally by Spark. I haven't used this myself, but from the exception it appears to be a pickler. I have seen a similar issue when people attempt to use certain multiprocessing commands.
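To illustrate what is going on (a sketch, with `threading.Lock` standing in for the unpicklable JPype proxy; the names here are hypothetical): Spark pickles the flatMap closure to ship it to executors, so anything the closure captures, such as a sutime instance backed by Java objects, must be picklable. Calling convert1 on the driver never serializes anything, which is why the explicit loop works. A common workaround is to create the heavy object lazily inside the function, once per worker, so that only the plain function gets shipped:

```python
import pickle
import threading

class FakeJavaWrapper:
    """Stand-in for a JPype-backed object such as SUTimeWrapper (hypothetical)."""
    def __init__(self):
        # Locks, like _jpype._JMethod objects, cannot be pickled.
        self._handle = threading.Lock()

# Capturing such an object in the shipped state fails to serialize:
wrapper = FakeJavaWrapper()
try:
    pickle.dumps(wrapper)
except TypeError as e:
    print('serialization failed:', e)

# Workaround: build the object lazily inside the function, once per worker.
_sutime = None

def parse_row(text):
    global _sutime
    if _sutime is None:
        _sutime = FakeJavaWrapper()  # on a real executor: construct SUTime here
    return [text.lower()]

# The bare function pickles fine, because it references nothing Java-backed yet.
pickle.dumps(parse_row)
print('function serialized OK')
```

This is only a sketch of the lazy-initialization pattern, not the asker's actual fix; on a real cluster the sutime setup inside the function would also need the JVM and jars available on every executor.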