- final keys
- final values
- intermediate keys
- intermediate values
- UNION DISTINCT, RANK
- OVER, RANK
- OVER, EXCEPT
- UNION DISTINCT, RANK
Q3. Rather than adding a Secondary Sort to a slow Reduce job, it is Hadoop best practice to perform which optimization?
- Add a partitioned shuffle to the Map job.
- Add a partitioned shuffle to the Reduce job.
- Break the Reduce job into multiple chained Reduce jobs.
- Break the Reduce job into multiple, chained Map jobs.
Q4. Hadoop Auth enforces authentication on protected resources. Once authentication has been established, it sets what type of authenticating cookie?
- encrypted HTTP
- unsigned HTTP
- compressed HTTP
- signed HTTP
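
To illustrate what "signed" means for a cookie in Q4, here is a minimal sketch in plain Java. It assumes a shared secret and an HMAC-SHA256 signature appended to the cookie value. The class name `SignedCookieSketch`, the helper names, and the `&s=` layout are hypothetical and chosen for illustration only; this is not presented as Hadoop Auth's actual implementation or wire format.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class SignedCookieSketch {
    // Hypothetical helper: appends an HMAC-SHA256 signature to the cookie value.
    static String sign(String value, byte[] secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] sig = mac.doFinal(value.getBytes(StandardCharsets.UTF_8));
            // "&s=" is an illustrative separator between value and signature.
            return value + "&s=" + Base64.getEncoder().encodeToString(sig);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable", e);
        }
    }

    // Hypothetical helper: recomputes the signature and compares in constant time.
    static boolean verify(String cookie, byte[] secret) {
        int idx = cookie.lastIndexOf("&s=");
        if (idx < 0) {
            return false; // unsigned cookie
        }
        String value = cookie.substring(0, idx);
        return MessageDigest.isEqual(
            sign(value, secret).getBytes(StandardCharsets.UTF_8),
            cookie.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] secret = "demo-secret".getBytes(StandardCharsets.UTF_8);
        String cookie = sign("u=alice", secret);
        System.out.println(verify(cookie, secret));                             // true
        System.out.println(verify(cookie.replace("alice", "mallory"), secret)); // false
    }
}
```

A signed cookie is readable by the client but cannot be altered without the server noticing, which is the property that distinguishes it from an encrypted or unsigned one.
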
- Java or Python
- SQL only
- SQL or Java
- Python or SQL
Q6. To perform local aggregation of the intermediate outputs, MapReduce users can optionally specify which object?
- Reducer
- Combiner
- Mapper
- Counter
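
The local aggregation that Q6 refers to can be sketched in plain Java without a Hadoop dependency. The class `CombinerSketch` and its `combine` helper are hypothetical stand-ins that only simulate the idea on one mapper's output; in a real job the combiner is usually a `Reducer` subclass registered with `Job.setCombinerClass(...)`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CombinerSketch {
    // Hypothetical helper: sums the counts emitted by a single mapper, per key,
    // so fewer <key, value> pairs need to be shuffled to the reducers.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> combined = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            combined.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Intermediate output of one word-count mapper (assumed sample data).
        List<Map.Entry<String, Integer>> mapOutput = List.of(
            Map.entry("hadoop", 1),
            Map.entry("hive", 1),
            Map.entry("hadoop", 1));
        System.out.println(combine(mapOutput)); // prints {hadoop=2, hive=1}
    }
}
```

Three intermediate pairs shrink to two before the shuffle. This only works safely when the aggregation is commutative and associative, as a sum is.
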
- SUCCEEDED; syslog
- SUCCEEDED; stdout
- DONE; syslog
- DONE; stdout
- public void reduce(Text key, Iterator values, Context context){…}
- public static void reduce(Text key, IntWritable[] values, Context context){…}
- public static void reduce(Text key, Iterator values, Context context){…}
- public void reduce(Text key, IntWritable[] values, Context context){…}
Q9. To get the total number of mapped input records in a map job task, you should review the value of which counter?
- FileInputFormatCounter
- FileSystemCounter
- JobCounter
- TaskCounter (NOT SURE)
- A, P
- C, A
- C, P
- C, A, P
- combine, map, and reduce
- shuffle, sort, and reduce
- reduce, sort, and combine
- map, sort, and combine
Q12. To set up a Hadoop workflow that synchronizes data between jobs that process tasks both on disk and in memory, use the _ service, which is _.
- Oozie; open source
- Oozie; commercial software
- Zookeeper; commercial software
- Zookeeper; open source
- data
- name
- memory
- worker
- hot swappable
- cold swappable
- warm swappable
- non-swappable
- on disk of all workers
- on disk of the master node
- in memory of the master node
- in memory of all workers
- on the reducer nodes of the cluster
- on the data nodes of the cluster (NOT SURE)
- on the master node of the cluster
- on every node of the cluster
- distributed cache
- local cache
- partitioned cache
- cluster cache
Q18. The skip bad records feature lets a certain set of bad input records be skipped when processing what type of data?
- cache inputs
- reducer inputs
- intermediate values
- map inputs
- spark import --connect jdbc:mysql://mysql.example.com/spark --username spark --warehouse-dir user/hue/oozie/deployments/spark
- sqoop import --connect jdbc:mysql://mysql.example.com/sqoop --username sqoop --warehouse-dir user/hue/oozie/deployments/sqoop
- sqoop import --connect jdbc:mysql://mysql.example.com/sqoop --username sqoop --password sqoop --warehouse-dir user/hue/oozie/deployments/sqoop
- spark import --connect jdbc:mysql://mysql.example.com/spark --username spark --password spark --warehouse-dir user/hue/oozie/deployments/spark
- compressed (NOT SURE)
- sorted
- not sorted
- encrypted
- JUnit
- XUnit
- MRUnit
- HadoopUnit
- hadoop-user
- super-user
- node-user
- admin-user
- can be configured to be shared
- is partially shared
- is shared
- is not shared (https://www.lynda.com/Hadoop-tutorials/Understanding-Java-virtual-machines-JVMs/191942/369545-4.html)
- a static job() method
- a Job class and instance (NOT SURE)
- a job() method
- a static Job class
- S3A
- S3N
- S3
- the EMR S3
- schema on write
- no schema
- external schema
- schema on read
- read-write
- read-only
- write-only
- append-only
- hdfs or top
- http
- hdfs or http
- hdfs
- Hive
- Pig
- Impala
- Mahout
- a relational table
- an update to the input file
- a single, combined list
- a set of <key, value> pairs
- Override the default Partitioner.
- Skip bad records.
- Break up Mappers that do more than one task into multiple Mappers.
- Combine Mappers that do one task into large Mappers.