Decoding Spark’s Power Duo: Magic of RDD and DAG in Distributed Data Processing
Let’s discuss the two main abstractions in Apache Spark: DAG (Directed Acyclic Graph) and RDD (Resilient Distributed Dataset).
Jan 12, 2024
Unveiling the Symphony of Driver, Executor, and Cluster Manager in Distributed Data Processing
Let’s delve deeper into the concept of Cluster Manager, Driver and Worker nodes using the same library analogy:
Jan 9, 2024
Demystifying Spark Architecture
Imagine you have a massive amount of data, like a huge collection of books. Now, you want to analyse and gain insights from all these…
Jan 8, 2024