Tuesday 5 January 2016

How to prepare for the "CCA Spark and Hadoop Developer Exam (CCA175)"

Hi. I cleared the CCA Spark and Hadoop Developer Exam (CCA175) on 4th Jan 2016.
I cleared the certification with 9 out of 10 questions correct; for 1 question my output did not match what they expected.
I wrote the exam on 2nd Jan 2016; they take 3-5 days to validate the exam.
I will share my experience and the exam pattern with you, so you can follow it.
1. Every question is hands-on; there are no objective questions.
2. For every problem we must write the code and execute it on their CDH5 cluster.
3. The result must be exactly in their expected form; if not, the score is 0.
Examples:
  • Expected output x,y    =>   you produce x, y   or   x  ,   y
    • They will not accept this as a correct answer.
    • You have to remember even such minor points.
  • The code also needs to be written very carefully, otherwise it will be marked as a wrong answer.
    • A simple example: if the code needs to be written as a single line in Spark (Scala or Python) but you write it across multiple lines for readability, the answer will be treated as wrong.
    • val wordcount = file.flatMap(line => line.split(" ")).map(word => (word, 1))
    • We can also write the above statement like this:
      • val words = file.flatMap(line => line.split(" "))
      • val wordcount = words.map(word => (word, 1))
4. Please read the questions carefully, because:
  • For some questions they give a partial solution file in a specific location on the CDH5 VM; we need to put the correct solution in that file only.
  • For some questions we need to read data from HDFS.
  • For some questions we need to read data from the local file system.
  • For some questions we need to read data from an RDBMS.
  • For some questions we need to read data from HDFS in different file formats, using Spark (with Scala or Python), Hive, or Sqoop.
    • For example Avro, Parquet, JSON.
  • A few questions give only partial information.
    • I got 1 question like that.
    • I wasted almost 10 minutes on it, and finally solved it by looking more closely at the data.
5. I faced some issues while writing the exam:
  • The font is very small, so it is difficult to read the questions and type the answers.
  • The cluster is very slow; don't run multiple tasks at once or it hangs.
    • I faced this issue 2-3 times, and each time it took almost 2-3 minutes to come back.
    • It may cost you around 5-10 minutes.
  • ***Remember: the locations of the software on the exam CDH5 cluster differ from those on a normal CDH5 installation, as I observed.
    • Because of this, 1 question was very difficult for me to answer and took more time to solve.
    • I used my previous Big Data experience to solve that problem; otherwise I might have lost that question.
6. Hope you got an idea of what to watch out for in the exam.
7. I am happy to share my experience with this exam.
8. If anyone is looking for Spark & Scala training in Hyderabad, you can reach me.
9. Follow me on the links below.
10. Check my YouTube videos as well; my channel is "kalyan hadoop".
11. Thanks for your time.
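
To make point 3 above concrete, here is a minimal Python sketch of producing delimiter-exact output. The helper name format_record and the sample values are my own illustration, not anything from the exam. The idea is to build each output line with an explicit join instead of relying on the default string form of a tuple, which adds brackets, quotes, and a space after the comma.

```python
def format_record(fields, delimiter=","):
    """Join fields with exactly one delimiter and no surrounding spaces."""
    return delimiter.join(str(field).strip() for field in fields)


# Hypothetical record; real exam data will look different.
record = ("x", "y")

# Default string form of a tuple: brackets, quotes, and a space after the comma.
naive = str(record)

# Explicit join: fields separated by a bare comma.
exact = format_record(record)

print(naive)
print(exact)
```

In a PySpark job the same function could be applied with rdd.map(format_record) before saveAsTextFile, assuming each element of the RDD is a tuple of fields.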

CCA Spark and Hadoop Developer Exam (CCA175) Details


Exam Question Format

Number of Questions: 10–12 performance-based (hands-on) tasks on a CDH5 cluster. See below for the full cluster configuration.
Time Limit: 120 minutes
Passing Score: 70%
Language: English, Japanese (forthcoming)
Price: USD $295


Audience and Prerequisites

There are no prerequisites required to take any Cloudera certification exam. The CCA Spark and Hadoop Developer exam (CCA175) follows the same objectives as Cloudera Developer Training for Spark and Hadoop and the training course is an excellent preparation for the exam.


Required Skills

Data Ingest
The skills to transfer data between external systems and your cluster. This includes the following:
  • Import data from a MySQL database into HDFS using Sqoop
  • Export data to a MySQL database from HDFS using Sqoop
  • Change the delimiter and file format of data during import using Sqoop
  • Ingest real-time and near-real-time (NRT) streaming data into HDFS using Flume
  • Load data into and out of HDFS using the Hadoop File System (FS) commands

Transform, Stage, Store
Convert a set of data values in a given format stored in HDFS into new data values and/or a new data format and write them into HDFS. This includes writing Spark applications in both Scala and Python:
  • Load data from HDFS and store results back to HDFS using Spark
  • Join disparate datasets together using Spark
  • Calculate aggregate statistics (e.g., average or sum) using Spark
  • Filter data into a smaller dataset using Spark
  • Write a query that produces ranked or sorted data using Spark
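
As a rough illustration of the aggregate, filter, and sort steps listed above, the following plain-Python sketch computes a per-key average, filters the result, and ranks it. It deliberately avoids Spark so that it runs anywhere; the sample data and the function name average_by_key are made up for illustration. In PySpark the same steps would typically map onto RDD operations such as reduceByKey, filter, and sortBy.

```python
from collections import defaultdict

# Hypothetical sample data: (category, amount) pairs.
orders = [
    ("books", 10.0),
    ("toys", 4.0),
    ("books", 20.0),
    ("toys", 8.0),
    ("games", 3.0),
]


def average_by_key(pairs):
    """Per-key average, accumulated as (sum, count) for each key."""
    totals = defaultdict(lambda: [0.0, 0])
    for key, value in pairs:
        totals[key][0] += value
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


averages = average_by_key(orders)

# Filter: keep only categories whose average is at least 5.
filtered = {key: avg for key, avg in averages.items() if avg >= 5.0}

# Rank: sort by average, highest first.
ranked = sorted(filtered.items(), key=lambda kv: kv[1], reverse=True)

# Write each result line with an explicit delimiter and no extra spaces.
for category, avg in ranked:
    print("%s,%s" % (category, avg))
```

Accumulating a (sum, count) pair per key, rather than averaging partial averages, is what keeps the result correct when the data is combined in pieces, which is the situation on a cluster.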

Data Analysis
Use DDL (Data Definition Language) in order to create tables in the Hive metastore for use by Hive and Impala.
  • Read and/or create a table in the Hive metastore in a given schema
  • Extract an Avro schema from a set of datafiles using avro-tools
  • Create a table in the Hive metastore using the Avro file format and an external schema file
  • Improve query performance by creating partitioned tables in the Hive metastore
  • Evolve an Avro schema by changing JSON files
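
Evolving an Avro schema by changing JSON files usually comes down to editing the schema definition itself. The sketch below, using only Python's json module, appends a nullable field with a default value, which is the kind of change that lets data written with the old schema still be read with the new one. The schema, the field names, and the helper add_optional_field are hypothetical and only for illustration.

```python
import json

# Hypothetical original schema, as it might appear in an .avsc file.
original_avsc = """
{
  "type": "record",
  "name": "Order",
  "namespace": "example",
  "fields": [
    {"name": "order_id", "type": "int"},
    {"name": "status", "type": "string"}
  ]
}
"""


def add_optional_field(schema, name, avro_type):
    """Return a copy of the schema with a nullable field appended.

    The union lists "null" first so that a default of null is valid.
    """
    evolved = json.loads(json.dumps(schema))  # deep copy via a JSON round trip
    evolved["fields"].append(
        {"name": name, "type": ["null", avro_type], "default": None}
    )
    return evolved


schema_v1 = json.loads(original_avsc)
schema_v2 = add_optional_field(schema_v1, "coupon_code", "string")

# The evolved schema can be written back to a new .avsc file.
print(json.dumps(schema_v2, indent=2))
```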

Exam delivery and cluster information

CCA175 is a remote-proctored exam available anywhere, anytime. See the FAQ for more information and system requirements.

CCA175 is a hands-on, practical exam using Cloudera technologies. Each user is given their own CDH5 (currently 5.3.2) cluster pre-loaded with Spark, Impala, Crunch, Hive, Pig, Sqoop, Kafka, Flume, Kite, Hue, Oozie, DataFu, and many others (See a full list). In addition the cluster also comes with Python (2.6 and 3.4), Perl 5.10, Elephant Bird, Cascading 2.6, Brickhouse, Hive Swarm, Scala 2.11, Scalding, IDEA, Sublime, Eclipse, and NetBeans.

Documentation Available online during the exam

Cloudera Product Documentation
Hadoop - Apache Hadoop 2.5.0-cdh5.3.2
Cloudera Impala Guide
Apache Hive
Sqoop Documentation (v1.4.5-cdh5.3.2)
Spark Overview - Spark 1.2.1 Documentation
Apache Crunch - Apache Crunch
Apache Pig
Kite: A Data API for Hadoop
Apache Avro 1.7.7 Documentation
Apache Parquet
Cloudera HUE
Apache Oozie
Apache Sqoop documentation
Apache Flume 1.5.0 documentation
DataFu 1.1.0
JDK 7 API Docs

Only the documentation, links, and resources listed above are accessible during the exam. All other websites, including Google and other search functionality, are disabled. You may not use notes or other exam aids.