Lazy Evaluation in Apache Spark

Author: admin

What is Lazy Evaluation in Spark

As the name suggests, lazy evaluation in Spark means that execution does not start until an action is triggered. Lazy evaluation applies to Spark transformations: when we call a transformation on an RDD, it is not executed immediately; instead, Spark keeps a record of which operations have been called. We can think of an RDD as a set of instructions, built up through transformations, that are to be performed on the data. Although transformations are lazy, we can force execution at any time by calling an action on the data. Under lazy evaluation, data is not loaded until it is needed.
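The record-then-run behaviour described above can be sketched in plain Python. This is a toy model, not Spark's API: the `LazyDataset` class and its `map`, `filter`, `pending`, and `collect` methods are hypothetical names chosen only to mirror the idea of RDD transformations and actions.

```python
class LazyDataset:
    """Toy stand-in for an RDD: holds data plus a list of pending operations."""

    def __init__(self, data, ops=None):
        self._data = data
        self._ops = ops or []  # recorded transformations, in call order

    def map(self, fn):
        # Transformation: record the step and return a new dataset; compute nothing.
        return LazyDataset(self._data, self._ops + [("map", fn)])

    def filter(self, pred):
        # Transformation: also only recorded.
        return LazyDataset(self._data, self._ops + [("filter", pred)])

    def pending(self):
        # Names of the recorded steps, in order.
        return [name for name, _ in self._ops]

    def collect(self):
        # Action: only now are the recorded steps applied to the data.
        result = []
        for item in self._data:
            keep = True
            for name, fn in self._ops:
                if name == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                result.append(item)
        return result


calls = []  # records every time the map function actually runs


def square(x):
    calls.append(x)
    return x * x


ds = LazyDataset([1, 2, 3, 4]).map(square).filter(lambda x: x > 4)
calls_before_action = list(calls)  # still empty: nothing has executed yet
result = ds.collect()              # the action triggers the work
```

Here `calls_before_action` stays empty because building the chain only records the steps; `square` runs for the first time inside `collect()`.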

Advantages of Lazy Evaluation in Spark transformation

  1. Increases Manageability : With lazy evaluation of Spark RDDs, users can freely organize their Spark program into smaller operations. Spark reduces the number of passes over the data by grouping these operations together.
  2. Saves Computation and increases speed : Lazy evaluation plays a key role in saving computation overhead. A value that is never used does not need to be calculated, so only the necessary values are computed. It also saves round trips between the driver and the cluster, which speeds up the process.
  3. Reduces complexities : The two main complexities of any operation are time complexity and space complexity, and lazy evaluation helps with both. Since not every operation is executed, time is saved. Since values are produced only on demand, it is possible in principle to work with an infinite data structure. The action is triggered only when the data is required, which reduces overhead.
  4. Optimization : Because Spark sees the whole chain of operations before running it, it can optimize the plan, for example by reducing the number of queries.
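Two of the points above, fewer passes over the data and skipping values that are never needed, can be illustrated with Python generators. This is an analogy in plain Python rather than Spark code; the helper names `add_one` and `lazy_pipeline` are made up for the illustration.

```python
from itertools import islice

computed = []  # tracks which input elements were actually processed


def add_one(x):
    computed.append(x)
    return x + 1


def lazy_pipeline(source):
    # Both steps are chained lazily, so each element flows through the
    # map step and the filter step in one pass, with no intermediate list.
    mapped = (add_one(x) for x in source)
    return (y for y in mapped if y % 2 == 0)


pipeline = lazy_pipeline(range(1, 1_000_001))

# Analogous to an action that needs only the first three results.
first_three = list(islice(pipeline, 3))
```

Although the source has one million elements, only the first five are ever passed to `add_one`, because the consumer stops after three results.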

Lazy Evaluation example
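
One visible consequence of laziness is that a faulty operation does not fail when it is declared, only when a result is requested. The sketch below uses Python's built-in lazy `map` as a stand-in for a Spark transformation and `list` as a stand-in for an action; it is an analogy under those assumptions, not Spark code, and `reciprocal` is a hypothetical helper.

```python
def reciprocal(x):
    return 1 / x  # raises ZeroDivisionError for x == 0


# "Transformation": nothing runs yet, so no error is raised here.
lazy_result = map(reciprocal, [4, 2, 0])

# "Action": consuming the result forces evaluation and surfaces the error.
try:
    values = list(lazy_result)
    error = None
except ZeroDivisionError as exc:
    values = None
    error = type(exc).__name__
```

The line that builds `lazy_result` succeeds even though one input is invalid; the failure appears only once the values are actually demanded.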

 
