Apache Spark Certification Practice Test

Which of the following is NOT a component that sits on top of core Spark?

Spark SQL

GraphX

Hadoop Streaming

MLlib

Correct answer: Hadoop Streaming

Hadoop Streaming is a utility that lets users create and run Hadoop MapReduce jobs with any executable or script as the mapper and/or the reducer. It belongs to the Hadoop ecosystem and gives external scripts and executables access to Hadoop's distributed processing.

In contrast, Spark SQL, GraphX, and MLlib are libraries built specifically on top of the core Spark engine. They rely on Spark's in-memory computing to speed up different kinds of data processing: Spark SQL handles structured data through SQL queries and DataFrames, GraphX supports graph processing and analysis, and MLlib provides machine learning algorithms that run on Spark's distributed computation.

Because Hadoop Streaming is not part of the Spark ecosystem and does not sit on top of core Spark, it is the correct answer.
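To make the Hadoop Streaming idea concrete, here is a minimal word-count sketch in Python. It assumes the streaming defaults: each mapper and reducer reads lines on stdin and writes tab-separated key/value lines to stdout, and Hadoop sorts mapper output by key before the reducer reads it. The file name `wordcount.py`, the `map`/`reduce` stage argument, and the functions `mapper` and `reducer` are hypothetical names chosen for illustration, not part of any Hadoop API.

```python
"""Hypothetical word-count mapper and reducer in the Hadoop Streaming style.

Assumes the default tab separator between key and value, and that the
framework sorts mapper output by key before the reducer sees it.
"""
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    # Emit "word<TAB>1" for every whitespace-separated token (lowercased).
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    # Input is sorted by key, so identical words arrive on adjacent lines.
    # Blank lines are skipped; each remaining line is split into (word, count).
    pairs = (
        line.rstrip("\n").split("\t", 1)
        for line in sorted_lines
        if line.strip()
    )
    # groupby collects each run of equal words; sum their counts.
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Invoked as "wordcount.py map" or "wordcount.py reduce":
    # read records from stdin and write results to stdout.
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

The pipeline can be simulated locally with `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`, where `sort` stands in for Hadoop's shuffle-and-sort phase. On a cluster, the same script would be passed to the streaming jar through its `-mapper` and `-reducer` options (shipped with `-files`). The jar's location depends on the Hadoop installation.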