Which of the following statements about RDDs is true?

Disable ads (and more) with a membership for a one time $4.99 payment

Get certified in Apache Spark. Prepare with our comprehensive exam questions, flashcards, and explanations. Ace your exam!

The statement that RDDs are immutable is true. Resilient Distributed Datasets (RDDs) are a fundamental data structure in Apache Spark that are designed to be unmodifiable once created. This immutability has several advantages: it simplifies parallel processing and allows for safe distribution across a cluster, as the original data structure is preserved without the risk of it being changed by multiple operations.

The immutability of RDDs aligns with functional programming principles, where data structures do not change state. Instead, any transformation applied to an RDD results in the creation of a new RDD. This characteristic not only helps with tracking lineage for fault tolerance — every RDD remembers how it was derived from other RDDs — but also enhances scalability.

While RDDs can contain any data type, including both structured and unstructured data, and can be cached in memory or persisted to disk, the core feature that defines RDDs in Spark is their immutable nature. This immutability is a key aspect that distinguishes RDDs from other data structures commonly used in programming that may allow changes after creation.