
PicklingError: Could not serialize object: TypeError: can't pickle _jpype._JMethod objects #46

Open
codeavenger18 opened this issue Oct 28, 2020 · 4 comments


@codeavenger18

I am using Google Colab and trying to use the Stanford SUTime library inside a function that is called by PySpark.

This function takes a row of a given RDD and uses the sutime library to return a (sentence, frequency) pair.

def convert1(row):
    s = str(row.dosefrequency).lower()
    try:
        # Parse the input string; sutime returns the frequency, e.g. P1D (per day).
        i = sutime.parse(s)
        if len(i) > 0 and 'timex-value' in i[0]:
            return [s, i[0]['timex-value']]
        return []
    except Exception:
        return []

My input RDD looks like this:

rdd.take(3)
'''
[Row(practiceid=701, dosequantity='200', dosefrequency='take 2 tablet by oral route  every day', count_dosequantity=716, count_dosefrequency=1, count_patientuid=306, DM Current -hychqudose='200mg', DM Expected Value='400mg'),
 Row(practiceid=595, dosequantity='200', dosefrequency='take 1 tablet by oral route 2 times every day', count_dosequantity=327, count_dosefrequency=1, count_patientuid=230, DM Current -hychqudose='200mg', DM Expected Value='400mg'),
 Row(practiceid=623, dosequantity='200', dosefrequency='take 1 (200MG)  by oral route 2 times every day', count_dosequantity=339, count_dosefrequency=1, count_patientuid=180, DM Current -hychqudose='200mg', DM Expected Value='400mg')]
'''

This is how I am calling the function using flatMap:

details = rdd.flatMap(lambda row: convert1(row)).collect()

But it gives me the following error:

Traceback (most recent call last):
  File "/usr/lib/python3.6/pickle.py", line 916, in save_global
    __import__(module_name, level=0)
ModuleNotFoundError: No module named 'edu'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 841, in save_global
    return Pickler.save_global(self, obj, name=name)
  File "/usr/lib/python3.6/pickle.py", line 922, in save_global
    (obj, module_name, name))
_pickle.PicklingError: Can't pickle <java class 'edu.stanford.nlp.python.SUTimeWrapper'>: it's not found as edu.stanford.nlp.python.edu.stanford.nlp.python.SUTimeWrapper

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/pickle.py", line 916, in save_global
    __import__(module_name, level=0)
ModuleNotFoundError: No module named 'java'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 841, in save_global
    return Pickler.save_global(self, obj, name=name)
  File "/usr/lib/python3.6/pickle.py", line 922, in save_global
    (obj, module_name, name))
_pickle.PicklingError: Can't pickle <java class 'java.lang.Object'>: it's not found as java.lang.java.lang.Object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/serializers.py", line 468, in dumps
    return cloudpickle.dumps(obj, pickle_protocol)
  File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 1097, in dumps
    cp.dump(obj)
  File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 357, in dump
    return Pickler.dump(self, obj)
  File "/usr/lib/python3.6/pickle.py", line 409, in dump
    self.save(obj)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.6/pickle.py", line 751, in save_tuple
    save(element)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 501, in save_function
    self.save_function_tuple(obj)
  File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 730, in save_function_tuple
    save(state)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.6/pickle.py", line 781, in save_list
    self._batch_appends(obj)
  File "/usr/lib/python3.6/pickle.py", line 808, in _batch_appends
    save(tmp[0])
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 496, in save_function
    self.save_function_tuple(obj)
  File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 730, in save_function_tuple
    save(state)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.6/pickle.py", line 852, in _batch_setitems
    save(v)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 496, in save_function
    self.save_function_tuple(obj)
  File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 730, in save_function_tuple
    save(state)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.6/pickle.py", line 852, in _batch_setitems
    save(v)
  File "/usr/lib/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python3.6/pickle.py", line 634, in save_reduce
    save(state)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/usr/lib/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python3.6/pickle.py", line 605, in save_reduce
    save(cls)
  File "/usr/lib/python3.6/pickle.py", line 490, in save
    self.save_global(obj)
  File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 850, in save_global
    return self.save_dynamic_class(obj)
  File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 662, in save_dynamic_class
    obj=obj)
  File "/usr/lib/python3.6/pickle.py", line 610, in save_reduce
    save(args)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.6/pickle.py", line 751, in save_tuple
    save(element)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.6/pickle.py", line 736, in save_tuple
    save(element)
  File "/usr/lib/python3.6/pickle.py", line 490, in save
    self.save_global(obj)
  File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 850, in save_global
    return self.save_dynamic_class(obj)
  File "/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/cloudpickle.py", line 666, in save_dynamic_class
    save(clsdict)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/usr/lib/python3.6/pickle.py", line 496, in save
    rv = reduce(self.proto)
TypeError: can't pickle _jpype._JMethod objects
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
/usr/lib/python3.6/pickle.py in save_global(self, obj, name)
    915         try:
--> 916             __import__(module_name, level=0)
    917             module = sys.modules[module_name]

ModuleNotFoundError: No module named 'edu'

During handling of the above exception, another exception occurred:

PicklingError                             Traceback (most recent call last)
65 frames
PicklingError: Can't pickle <java class 'edu.stanford.nlp.python.SUTimeWrapper'>: it's not found as edu.stanford.nlp.python.edu.stanford.nlp.python.SUTimeWrapper

During handling of the above exception, another exception occurred:

ModuleNotFoundError                       Traceback (most recent call last)
ModuleNotFoundError: No module named 'java'

During handling of the above exception, another exception occurred:

PicklingError                             Traceback (most recent call last)
PicklingError: Can't pickle <java class 'java.lang.Object'>: it's not found as java.lang.java.lang.Object

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
TypeError: can't pickle _jpype._JMethod objects

During handling of the above exception, another exception occurred:

PicklingError                             Traceback (most recent call last)
/content/spark-3.0.1-bin-hadoop3.2/python/pyspark/serializers.py in dumps(self, obj)
    476                 msg = "Could not serialize object: %s: %s" % (e.__class__.__name__, emsg)
    477             print_exec(sys.stderr)
--> 478             raise pickle.PicklingError(msg)
    479 
    480 

PicklingError: Could not serialize object: TypeError: can't pickle _jpype._JMethod objects

However, I tried calling the convert1 function directly using this code:

rr = rdd.take(10)
for i in range(10):
    x = convert1(rr[i])
    print(x)

The above code works perfectly fine for me; the failure only occurs with flatMap.

Please let me know if any further details are needed.

@Thrameos

There is a special pickler for Java objects, as the general Python one can't handle them. I would try it and see if it helps. Java needs a context set up for each serialization stream so that the same object does not get serialized more than once; Python doesn't provide hooks for that kind of behavior.
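
For reference, the dedicated pickler mentioned here lives in JPype's jpype.pickle module. Below is a minimal sketch of how it is used on its own (the file name obj.pic is illustrative; note this does not plug into Spark automatically, since Spark drives cloudpickle internally):

import jpype
from jpype.pickle import JPickler, JUnpickler

jpype.startJVM()

# The Java object being dumped must implement java.io.Serializable.
obj = jpype.JClass("java.util.ArrayList")()
obj.add("every day")

# Dump with JPype's pickler instead of the stock pickle module;
# each JPickler instance carries its own serialization context.
with open("obj.pic", "wb") as fd:
    JPickler(fd).dump(obj)

# Load it back; the JVM must already be running.
with open("obj.pic", "rb") as fd:
    restored = JUnpickler(fd).load()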

@codeavenger18
Author

@Thrameos We have not used pickle anywhere in our code; it is just that pickle is being used internally for some purpose. Maybe it has something to do with the flatMap function, not sure. Can you elaborate on why pickle is being used?

@Thrameos

Pickle appears to be used internally by Spark. I haven't used Spark myself, but from the exception it appears to be the pickler at work. I have seen a similar issue when people attempt to use certain multiprocessing commands.
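
That reading matches the traceback above: flatMap makes Spark serialize the convert1 closure with cloudpickle, and the closure drags in the module-level sutime object, whose JPype handles cannot be pickled. The direct loop works because nothing ever leaves the driver. A common workaround in this situation, sketched below, is to construct SUTime lazily inside each partition so the Java object is created on the worker and never pickled. This is only a sketch, not code from this thread: it assumes the python-sutime package with its jars available on every executor, parse_partition is an illustrative name, and constructor arguments may differ in your setup.

from sutime import SUTime

def parse_partition(rows):
    # Build SUTime on the executor itself; only this plain Python
    # function is pickled, never the JPype-backed object.
    su = SUTime(mark_time_ranges=True)
    for row in rows:
        s = str(row.dosefrequency).lower()
        try:
            parsed = su.parse(s)
            if parsed and 'timex-value' in parsed[0]:
                yield [s, parsed[0]['timex-value']]
        except Exception:
            pass

details = rdd.mapPartitions(parse_partition).collect()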

@codeavenger18
Author

Is there any workaround for this issue? What else can I try?
