What is the process used by Hadoop to perform input splits?

asked 2022-10-30 11:00:00 +0000

djk


1 Answer


answered 2022-08-02 01:00:00 +0000

huitzilopochtli

Hadoop divides input data into logical chunks called input splits; each split is processed by one map task. Input splitting is carried out in the following steps:

  1. The Hadoop framework first determines the size and layout of the input data by consulting the underlying file system (typically HDFS).

  2. It then calculates the number of input splits from the total data size and the split size, which by default equals the HDFS block size (128 MB in recent Hadoop versions); see the sketch after this list.

  3. Hadoop then assigns each split to a separate mapper task so that the splits are processed in parallel.

  4. The framework also considers where each split's data is stored in the cluster and tries to schedule the task on, or close to, the node holding that data, minimizing data movement (data locality).

  5. Once the input splits have been created, each split is read as key-value pairs and fed to the mapper function for further processing.
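
To make the sizing in step 2 concrete, here is a minimal, self-contained Java sketch. It mirrors the formula Hadoop's FileInputFormat uses per file, splitSize = max(minSize, min(maxSize, blockSize)); the file size and the min/max values below are hypothetical examples, not values read from a real cluster:

    // Sketch of FileInputFormat-style split sizing (hypothetical inputs).
    public class SplitSizeDemo {

        // Same formula Hadoop's FileInputFormat applies:
        // splitSize = max(minSize, min(maxSize, blockSize))
        static long computeSplitSize(long blockSize, long minSize, long maxSize) {
            return Math.max(minSize, Math.min(maxSize, blockSize));
        }

        public static void main(String[] args) {
            long blockSize = 128L * 1024 * 1024; // default HDFS block size (128 MB)
            long minSize   = 1L;                 // mapreduce.input.fileinputformat.split.minsize
            long maxSize   = Long.MAX_VALUE;     // mapreduce.input.fileinputformat.split.maxsize
            long fileSize  = 300L * 1024 * 1024; // hypothetical 300 MB input file

            long splitSize = computeSplitSize(blockSize, minSize, maxSize);
            long numSplits = (fileSize + splitSize - 1) / splitSize; // ceiling division

            System.out.println("Split size: " + splitSize + " bytes");
            System.out.println("Splits:     " + numSplits); // 3 splits for 300 MB
        }
    }

With a 128 MB block size, a 300 MB file yields three splits (128 MB + 128 MB + 44 MB), and each is handed to its own mapper as described in steps 3-5.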


