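When you use multiprocessing.Pool in Python with a function that returns a custom object, the object must be picklable, because Pool pickles return values to send them from the worker processes back to the parent. Instances of ordinary classes defined at the top level of a module are picklable by default, as long as all of their attributes are. Here is an example of how to do it: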

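First, create the custom object. The __getstate__ and __setstate__ methods are optional for a simple class like this one; they are shown to illustrate how to control what gets pickled. The two helper functions are only a convenience for testing serialization by hand: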

import pickle

class CustomObject:
    def __init__(self, data):
        self.data = data

    def __getstate__(self):
        return {"data": self.data}

    def __setstate__(self, state):
        self.data = state["data"]

def custom_object_pickle(obj):
    # Serialize the object to bytes (Pool does this automatically).
    return pickle.dumps(obj)

def custom_object_unpickle(data):
    # pickle.loads rebuilds the CustomObject and calls __setstate__ itself.
    return pickle.loads(data)
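
A quick way to confirm that the class survives pickling, before involving any worker processes, is a round trip through pickle.dumps and pickle.loads. The sketch below repeats the class so that it runs on its own; the sample payload is made up for illustration.

```python
import pickle


class CustomObject:
    def __init__(self, data):
        self.data = data

    def __getstate__(self):
        # Return only the state that should be serialized.
        return {"data": self.data}

    def __setstate__(self, state):
        # Restore the attributes from the pickled state.
        self.data = state["data"]


original = CustomObject({"values": [1, 2, 3]})  # sample payload (illustrative)

# Serialize to bytes and back, which is what Pool does for return values.
restored = pickle.loads(pickle.dumps(original, protocol=pickle.HIGHEST_PROTOCOL))

print(type(restored).__name__, restored.data)
```

If this round trip raises an error, pool.map will fail in the same way, so it is a cheap check to run first.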

Next, define the function that will be executed in parallel using the multiprocessing.Pool:

def parallel_function(arg):
    # do some calculations (placeholder: the argument is used as the data)
    data = arg
    custom_obj = CustomObject(data)
    return custom_obj

Then, create the multiprocessing.Pool object and call map. No pickling-related arguments are needed, because Pool pickles arguments and return values on its own:

import multiprocessing

if __name__ == "__main__":
    args_list = [1, 2, 3]  # example inputs; replace with your own

    # Pool pickles arguments and return values automatically.
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())

    results = pool.map(parallel_function, args_list)

    pool.close()
    pool.join()

Note that Pool has no pickle_protocol or pickle_custom_objects parameters; passing them raises a TypeError. Its constructor accepts only processes, initializer, initargs, maxtasksperchild, and context. The initializer is a callable that runs once in each worker at startup and is not needed here. The only requirements are that CustomObject and parallel_function are defined at the top level of a module and that every attribute of the returned object is itself picklable.
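
Putting the pieces together, a complete script might look like the sketch below. It makes a few assumptions for illustration: the squaring calculation is a placeholder, run_in_pool is a hypothetical helper name, and two worker processes are used.

```python
import multiprocessing


class CustomObject:
    """Plain top-level class; picklable by default."""

    def __init__(self, data):
        self.data = data


def parallel_function(arg):
    # Placeholder calculation (an assumption for illustration): square the input.
    return CustomObject(arg * arg)


def run_in_pool(args_list, processes=2):
    # The context manager terminates the pool on exit; map() has already
    # collected every result by then.
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(parallel_function, args_list)


if __name__ == "__main__":
    results = run_in_pool([1, 2, 3, 4])
    print([obj.data for obj in results])  # [1, 4, 9, 16]
```

The if __name__ == "__main__" guard matters on platforms that start workers by importing the main module (Windows, and macOS by default), because without it each worker would try to create its own pool.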

By following these steps, you should be able to use the multiprocessing.Pool in Python to parallelize a function that returns a custom object.
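
If pool.map still raises a pickling error, a common cause is an attribute that cannot be pickled, such as a lock or an open file. One option is to leave that attribute out in __getstate__ and recreate it in __setstate__. The following is a minimal sketch of that idea; the class name LockedResult and its fields are hypothetical and not part of the example above.

```python
import pickle
import threading


class LockedResult:
    """Hypothetical result type that carries an unpicklable lock."""

    def __init__(self, data):
        self.data = data
        self._lock = threading.Lock()  # locks cannot be pickled

    def __getstate__(self):
        # Copy the instance dict and drop the lock before serialization.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the picklable attributes, then create a fresh lock.
        self.__dict__.update(state)
        self._lock = threading.Lock()


restored = pickle.loads(pickle.dumps(LockedResult([1, 2, 3])))
print(restored.data)
```

The restored object gets a new, unlocked lock, so this approach only fits state that is safe to rebuild on the receiving side.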