Tuesday 5 January 2016

CCA Spark and Hadoop Developer Exam (CCA175) Details


Exam Question Format

Number of Questions: 10–12 performance-based (hands-on) tasks on a CDH5 cluster. See below for the full cluster configuration.
Time Limit: 120 minutes
Passing Score: 70%
Language: English, Japanese (forthcoming)
Price: USD $295


Audience and Prerequisites

There are no prerequisites for any Cloudera certification exam. The CCA Spark and Hadoop Developer exam (CCA175) follows the same objectives as Cloudera Developer Training for Spark and Hadoop, and the training course is excellent preparation for the exam.


Required Skills

Data Ingest
The skills to transfer data between external systems and your cluster. This includes the following:
  • Import data from a MySQL database into HDFS using Sqoop
  • Export data to a MySQL database from HDFS using Sqoop
  • Change the delimiter and file format of data during import using Sqoop
  • Ingest real-time and near-real-time (NRT) streaming data into HDFS using Flume
  • Load data into and out of HDFS using the Hadoop File System (FS) commands
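
As a rough illustration of the Sqoop and FS items above, the following Python sketch builds the command lines involved without running them. The JDBC URL, user name, table names, and HDFS paths are hypothetical placeholders and do not come from the exam; the flags are standard Sqoop 1.4.x and hadoop fs options. Flume is driven by an agent configuration file and is not covered here.

```python
import subprocess

# Hypothetical connection details; replace with the ones given in the task.
JDBC_URL = "jdbc:mysql://dbhost.example.com/retail_db"
DB_USER = "exam_user"


def sqoop_import(table, target_dir, delimiter="|", as_avro=False):
    """Build a `sqoop import` command that copies a MySQL table into HDFS."""
    cmd = [
        "sqoop", "import",
        "--connect", JDBC_URL,
        "--username", DB_USER,
        "-P",  # prompt for the password instead of hard-coding it
        "--table", table,
        "--target-dir", target_dir,
    ]
    if as_avro:
        # Change the file format: Avro is binary, so no field delimiter applies.
        cmd.append("--as-avrodatafile")
    else:
        # Change the delimiter of the text output.
        cmd += ["--as-textfile", "--fields-terminated-by", delimiter]
    return cmd


def sqoop_export(table, export_dir, delimiter="|"):
    """Build a `sqoop export` command that loads HDFS files into a MySQL table."""
    return [
        "sqoop", "export",
        "--connect", JDBC_URL,
        "--username", DB_USER,
        "-P",
        "--table", table,
        "--export-dir", export_dir,
        "--input-fields-terminated-by", delimiter,
    ]


def hdfs_put(local_path, hdfs_path):
    """Build the FS command that copies a local file into HDFS."""
    return ["hadoop", "fs", "-put", local_path, hdfs_path]


def hdfs_get(hdfs_path, local_path):
    """Build the FS command that copies a file out of HDFS."""
    return ["hadoop", "fs", "-get", hdfs_path, local_path]


def run(cmd):
    """Execute a command list; only works where the Hadoop clients are installed."""
    return subprocess.check_call(cmd)


if __name__ == "__main__":
    # Print the commands rather than running them.
    print(" ".join(sqoop_import("orders", "/user/exam/orders")))
    print(" ".join(sqoop_export("orders_out", "/user/exam/orders")))
    print(" ".join(hdfs_put("orders.csv", "/user/exam/staging/")))
```

On a machine with the clients installed, run(sqoop_import("orders", "/user/exam/orders")) would start the import. The same arguments can of course be typed directly at a shell prompt.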

Transform, Stage, Store
Convert a set of data values in a given format stored in HDFS into new data values and/or a new data format and write them into HDFS. This includes writing Spark applications in both Scala and Python:
  • Load data from HDFS and store results back to HDFS using Spark
  • Join disparate datasets together using Spark
  • Calculate aggregate statistics (e.g., average or sum) using Spark
  • Filter data into a smaller dataset using Spark
  • Write a query that produces ranked or sorted data using Spark

Data Analysis
Use Data Definition Language (DDL) to create tables in the Hive metastore for use by Hive and Impala. This includes the following:
  • Read and/or create a table in the Hive metastore in a given schema
  • Extract an Avro schema from a set of datafiles using avro-tools
  • Create a table in the Hive metastore using the Avro file format and an external schema file
  • Improve query performance by creating partitioned tables in the Hive metastore
  • Evolve an Avro schema by changing JSON files
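
To make the Avro items above more concrete, here is a small Python sketch under stated assumptions. The first function evolves a schema by editing its JSON: it adds a nullable field with a default of null, which is one common way to keep data written with the old schema readable. The second builds Hive DDL for an external, optionally partitioned Avro table that reads its columns from an external schema file. The table name, field names, and paths are hypothetical; the starting schema could be obtained with avro-tools getschema on an existing data file.

```python
import copy
import json


def add_optional_field(schema, name, avro_type="string"):
    """Return a copy of an Avro record schema with a new nullable field.

    The field is a union with "null" first and a default of null, so records
    written before the field existed can still be read with the new schema.
    """
    evolved = copy.deepcopy(schema)
    if any(f["name"] == name for f in evolved["fields"]):
        raise ValueError("field already exists: " + name)
    evolved["fields"].append(
        {"name": name, "type": ["null", avro_type], "default": None})
    return evolved


def avro_table_ddl(table, location, schema_url, partition_cols=None):
    """Build Hive DDL for an external Avro table backed by a schema file."""
    parts = ["CREATE EXTERNAL TABLE %s" % table]
    if partition_cols:
        cols = ", ".join("%s %s" % (c, t) for c, t in partition_cols)
        parts.append("PARTITIONED BY (%s)" % cols)
    parts += [
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'",
        "STORED AS INPUTFORMAT "
        "'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'",
        "OUTPUTFORMAT "
        "'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'",
        "LOCATION '%s'" % location,
        "TBLPROPERTIES ('avro.schema.url'='%s')" % schema_url,
    ]
    return "\n".join(parts) + ";"


if __name__ == "__main__":
    # Hypothetical starting schema, e.g. the output of `avro-tools getschema`.
    old_schema = {
        "type": "record",
        "name": "Order",
        "fields": [{"name": "order_id", "type": "int"}],
    }
    print(json.dumps(add_optional_field(old_schema, "coupon_code"), indent=2))
    print(avro_table_ddl("orders", "/user/exam/orders",
                         "hdfs:///user/exam/schemas/orders.avsc",
                         partition_cols=[("order_month", "string")]))
```

The evolved JSON would be saved back to the .avsc file referenced by avro.schema.url. For a partitioned external table, each partition directory still has to be registered afterwards, for example with ALTER TABLE ... ADD PARTITION.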

Exam delivery and cluster information

CCA175 is a remote-proctored exam available anywhere, anytime. See the FAQ for more information and system requirements.

CCA175 is a hands-on, practical exam using Cloudera technologies. Each user is given their own CDH5 (currently 5.3.2) cluster pre-loaded with Spark, Impala, Crunch, Hive, Pig, Sqoop, Kafka, Flume, Kite, Hue, Oozie, DataFu, and many others. In addition, the cluster comes with Python (2.6 and 3.4), Perl 5.10, Elephant Bird, Cascading 2.6, Brickhouse, Hive Swarm, Scala 2.11, Scalding, IDEA, Sublime, Eclipse, and NetBeans.

Documentation available online during the exam

Cloudera Product Documentation
Hadoop - Apache Hadoop 2.5.0-cdh5.3.2
Cloudera Impala Guide
Apache Hive
Sqoop Documentation (v1.4.5-cdh5.3.2)
Spark Overview - Spark 1.2.1 Documentation
Apache Crunch - Apache Crunch
Apache Pig
Kite: A Data API for Hadoop
Apache Avro 1.7.7 Documentation
Apache Parquet
Cloudera HUE
Apache Oozie
Apache Sqoop documentation
Apache Flume 1.5.0 documentation
DataFu 1.1.0
JDK 7 API Docs

Only the documentation, links, and resources listed above are accessible during the exam. All other websites, including Google and other search functionality, are disabled. You may not use notes or other exam aids.
