Big Data Hadoop Beginner Quiz

Big Data Quiz : This beginner-level Big Data Hadoop quiz contains a set of 60 questions that will help you prepare for any exam aimed at beginners.



1) Big Data refers to datasets that grow so large that it is difficult to capture, store, manage, share, analyze and visualize with the typical database software tools.

  1. TRUE
  2. FALSE

Answer : A

 
 
2) The default block size in HDFS is ____________

  1. 128 KB
  2. 64 MB
  3. 64 KB
  4. 128 MB

Answer : B
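The block-size arithmetic behind questions like this can be sketched in a few lines of Python (assuming the Hadoop 1.x default of 64 MB; Hadoop 2.x raised the default to 128 MB):

```python
import math

BLOCK_SIZE_MB = 64  # Hadoop 1.x default; Hadoop 2.x raised this to 128 MB

def num_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of HDFS blocks a file occupies (the last block may be partial)."""
    return math.ceil(file_size_mb / block_size_mb)

print(num_blocks(640))  # -> 10
print(num_blocks(100))  # -> 2 (one full 64 MB block plus one 36 MB block)
```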

 
 
3) Which of the following statements is/are TRUE regarding Hadoop?
i) Performs best with a ‘modest’ number of large files
ii) Performs best with a large number of small files

  1. i)
  2. ii)
  3. Both i) & ii)
  4. none of the above

Answer : A

 
 
4) By default, each block is replicated _______ times

  1. 1
  2. 2
  3. 3
  4. 4

Answer : C

 
 
5) A large block size makes data transfer more efficient (seek time becomes a small fraction of transfer time).

  1. TRUE
  2. FALSE

Answer : A

 
 
6) Which of the following is NOT a daemon process?

  1. secondarynamenode
  2. jobtracker
  3. tasktracker
  4. mapreducer

Answer : D

 
 
7) A SPOF (single point of failure) can be handled by using _________

  1. secondarynamenode
  2. backupserver
  3. jobtracker
  4. passive nodes

Answer : D

 
 
8) Clients access the blocks directly from ________ for reads and writes

  1. data nodes
  2. name node
  3. secondarynamenode
  4. none of the above

Answer : A

 
 
9) Information about the locations of the blocks of a file is stored at the __________

  1. data nodes
  2. name node
  3. secondarynamenode
  4. none of the above

Answer : B

 
 
10) What makes data into Big Data?

  1. volume
  2. velocity
  3. variety
  4. all of the above

Answer : D




 
11) Which of the following statement(s) is/are TRUE?
i) Hadoop is comprised of five separate daemons.
ii) Each daemon runs in its own Java Virtual Machine (JVM).

  1. Only ii)
  2. Only i)
  3. Both i) & ii)
  4. Neither i) nor ii)

Answer : C

 
 
12) Hadoop is ‘rack-aware’, and HDFS replicates data blocks on nodes on different racks

  1. TRUE
  2. FALSE

Answer : A

 
 
13) Which node stores the checksum?

  1. datanode
  2. secondarynamenode
  3. namenode
  4. all of the above

Answer : A

 
 
14) The MapReduce programming model is _____________

  1. Platform Dependent but not language-specific
  2. Neither platform- nor language-specific
  3. Platform independent but language-specific
  4. Platform Dependent and language-specific

Answer : B

 
 
15) Which is optional in a MapReduce program?

  1. Mapper
  2. Reducer
  3. both are optional
  4. both are mandatory

Answer : B

 
 
16) TaskTrackers reside on _________ and run ________ tasks.

  1. datanode, map/reduce
  2. datanode, reducer
  3. datanode, mapper
  4. namenode, map/reduce

Answer : A

 
 
17) The Hadoop API uses basic Java types such as LongWritable, Text, and IntWritable. They have almost the same features as the default Java classes. What are these Writable data types optimized for?

  1. file system storage
  2. network transmissions
  3. data retrieval
  4. all of the above

Answer : B

 
 
18) What is the default input format?

  1. SequenceFileFormat
  2. BinaryFileFormat
  3. TextInputFormat
  4. none of the above

Answer : C

 
 
19) Which is TRUE about HIVE?

  1. No support for update and delete
  2. No support for singleton inserts
  3. Correlated subqueries are not supported
  4. all of the above

Answer : D

 
 
20) Sqoop is a tool which can be used to

  1. Import tables from an RDBMS into HDFS
  2. Export files from HDFS into RDBMS tables
  3. Use a JDBC interface
  4. all of the above

Answer : D




 
21) Which tool can be used to transfer data from Microsoft SQL Server databases to Hadoop or HIVE?

  1. HBASE
  2. PIG
  3. SQOOP
  4. Flume

Answer : C

 
 
22) _________ is a distributed, reliable, available service for efficiently moving large amounts of data as it is produced

  1. FLUME
  2. SQOOP
  3. PIG
  4. HIVE

Answer : A

 
 
23) ________ is a workflow engine that runs on a server, typically outside the cluster

  1. Oozie
  2. Zookeeper
  3. Chukwa
  4. Mahout

Answer : A

 
 
24) Custom OutputFormats must provide a __________ implementation

  1. InputWriter
  2. RecordWriter
  3. OutWriter
  4. WritableComparable

Answer : B

 
 
25) Combiner is

  1. Like a ‘mini-Reducer’
  2. Runs locally on a single Mapper’s output
  3. Output from the Combiner is sent to the Reducers
  4. all of the above

Answer : D
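A minimal sketch of what a Combiner does, using Python to simulate local aggregation of a single mapper's (word, 1) output before the shuffle (the sample pairs are illustrative):

```python
from collections import defaultdict

def combine(mapper_output):
    """A 'mini-reducer': locally sums counts from one mapper's output
    before they are sent across the network to the reducers."""
    combined = defaultdict(int)
    for key, value in mapper_output:
        combined[key] += value
    return sorted(combined.items())

# One mapper emitted these (word, 1) pairs:
pairs = [("big", 1), ("data", 1), ("big", 1)]
print(combine(pairs))  # [('big', 2), ('data', 1)] -- fewer records to shuffle
```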

 
 
26) For better load balancing and to avoid potential performance issues, use a ____________

  1. custom Partitioner
  2. custom Combiner
  3. custom Reducer
  4. larger number of reducers

Answer : A
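A hedged Python sketch of why a custom Partitioner helps load balancing: it can route a known heavy key to a dedicated reducer instead of wherever the default hash sends it (the `hot_key` name is a made-up illustration, not a real Hadoop API):

```python
def default_partition(key, num_reducers):
    """Mimics Hadoop's HashPartitioner: reducer = hash(key) mod #reducers."""
    return hash(key) % num_reducers

def custom_partition(key, num_reducers):
    """Custom partitioner for skewed data: send one known 'hot' key to its
    own reducer so it cannot overload a hash-chosen one."""
    if key == "hot_key":
        return 0                          # dedicated reducer for the heavy key
    return 1 + hash(key) % (num_reducers - 1)  # others spread over 1..n-1

print(custom_partition("hot_key", 4))  # always reducer 0
print(custom_partition("x", 4))        # some reducer in 1..3
```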

 
 
27) Anything written using the OutputCollector.collect method will be written to __________

  1. Local file system
  2. HDFS
  3. Windows file systems only
  4. none of the above

Answer : B

 
 
28) Which component of the HIVE architecture submits the individual map-reduce jobs from the DAG to the Execution Engine?

  1. compiler
  2. optimizer
  3. driver
  4. none of the above

Answer : C

 
 
29) Which HIVE command will load data from an HDFS file/directory to the table?

  1. LOAD DATA INPATH '/user/myname/AB.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');
  2. LOAD DATA LOCAL INPATH '/user/myname/AB.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');
  3. Both statements are correct
  4. none of the above

Answer : A

 
 
30) Which HIVE command will display the tables created by the user?

  1. show table;
  2. select * from tab;
  3. show tables;
  4. none of the above

Answer : C




 
31) Which HIVE file format is not splittable after compression?

  1. RCFILE
  2. SEQUENCEFILE
  3. TEXTFILE
  4. all of the above

Answer : C

 
 
32) HIVE command: LOAD DATA INPATH '/user/myname/log.txt' INTO TABLE mylog;

  1. Loads the data from the local file '/user/myname/log.txt' into table mylog
  2. Loads the data from the HDFS file '/user/myname/log.txt' into table mylog
  3. Overwrites table mylog with the data from the local file '/user/myname/log.txt'
  4. none of the above

Answer : B

 
 
33) HIVE command: LOAD DATA LOCAL INPATH '/examples/files/ab1.txt' OVERWRITE INTO TABLE sample

  1. Loads the data from the local file '/examples/files/ab1.txt' into table sample
  2. Loads the data from the HDFS file '/examples/files/ab1.txt' into table sample
  3. Overwrites table sample with the data from the local file '/examples/files/ab1.txt'
  4. all of the above

Answer : C

 
 
34) The join operation is performed at the ___________

  1. mapper
  2. reducer
  3. shuffle and sort
  4. none of the above

Answer : B

 
 
35) When is the earliest that the reduce() method of any reduce task in a given job is called?

  1. immediately after all map tasks have completed
  2. As soon as a map task emits at least one record
  3. As soon as at least one map task has finished processing its complete input split
  4. none of the above

Answer : A

 
 
36) You have built a MapReduce job that denormalizes a very large table, resulting in an extremely large amount of output data. Which two cluster resources will your job stress the most?

  1. RAM, Network
  2. Network, Disk I/O
  3. CPU, RAM
  4. all of the above

Answer : B

 
 
37) You have 10 files in the directory /user/amit/example. Each file is 640 MB. You submit a MapReduce job with /user/amit/example as the input path. What will each map task process?

  1. All files in the directory
  2. A single file
  3. A single input split
  4. none of the above

Answer : C
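The split arithmetic behind this scenario can be sanity-checked in Python, assuming the common default of one input split per 64 MB HDFS block:

```python
import math

def input_splits(file_sizes_mb, block_size_mb=64):
    """Total input splits for a job: one split per HDFS block (the common
    default, where split size equals block size); one map task per split."""
    return sum(math.ceil(size / block_size_mb) for size in file_sizes_mb)

# Ten 640 MB files with a 64 MB block size:
print(input_splits([640] * 10))  # -> 100 splits, i.e. 100 map tasks,
                                 #    each processing a single input split
```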

 
 
38) ___________ ensures that no (key, value) pair is processed more than once

  1. InputSplit
  2. RecordReader
  3. mapper
  4. reducer

Answer : B

 
 
39) ___________ reads the record and passes it to the mapper

  1. RecordReader
  2. reducer
  3. InputSplit
  4. none of the above

Answer : A

 
 
40) Which of the following is the correct sequence of operations for an MR job?

  1. RecordReader, shuffle and sort, mapper, reducer, InputSplit
  2. InputSplit, RecordReader, mapper, shuffle and sort, reducer
  3. InputSplit, RecordReader, reducer, shuffle and sort, mapper
  4. none of the above

Answer : B
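The sequence InputSplit → RecordReader → mapper → shuffle and sort → reducer can be simulated end-to-end in a few lines of Python (a word-count sketch, not real Hadoop):

```python
from itertools import groupby

def mapper(_, line):
    for word in line.split():
        yield word, 1

def reducer(word, counts):
    yield word, sum(counts)

def run_job(lines):
    """Simulates the MR pipeline on one input split."""
    # RecordReader: turn each line of the split into a (offset, line) record
    records = list(enumerate(lines))
    # Map phase
    mapped = [kv for key, val in records for kv in mapper(key, val)]
    # Shuffle & sort: bring all values for a key together, in key order
    mapped.sort(key=lambda kv: kv[0])
    # Reduce phase
    out = []
    for word, group in groupby(mapped, key=lambda kv: kv[0]):
        out.extend(reducer(word, (v for _, v in group)))
    return out

print(run_job(["big data", "big hadoop"]))
# [('big', 2), ('data', 1), ('hadoop', 1)]
```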




 
41) Which daemon distributes the individual tasks to data nodes?

  1. tasktracker
  2. jobtracker
  3. namenode
  4. datanode

Answer : B

 
 
42) The __________ allows the mapper to interact with the rest of the Hadoop system

  1. Context object
  2. InputSplit
  3. Recordreader
  4. Shuffle and Sort

Answer : A

 
 
43) How many instances of JobTracker can run on a Hadoop Cluster?

  1. only one
  2. maximum two
  3. any number but should not be more than number of datanodes
  4. none of the above

Answer : A

 
 
44) How many instances of Tasktracker run on a Hadoop cluster?

  1. unlimited TaskTrackers on each datanode
  2. one TaskTracker for each datanode
  3. maximum two TaskTrackers for each datanode
  4. none of the above

Answer : B

 
 
45) Which PIG LATIN statement is used for per-record transformation of data (projection)?

  1. JOIN
  2. FOREACH – GENERATE
  3. FLATTEN
  4. FILTER

Answer : B

 
 
46) Which PIG statement is used to remove nesting?

  1. JOIN
  2. FOREACH – GENERATE
  3. FILTER
  4. none of the above

Answer : D

 
 
47) Consider a relation that has a tuple of the form (a,(b,c)). What is the output if we apply the statement GENERATE $0, FLATTEN($1)?

  1. (a,b,c)
  2. (a,b) and (a,c)
  3. invalid operation
  4. none of the above

Answer : A
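The FLATTEN behaviour can be mimicked in Python: the nested tuple's fields are promoted into the outer tuple, giving (a,b,c):

```python
def generate_flatten(tup):
    """Simulates Pig's GENERATE $0, FLATTEN($1) on a tuple like (a, (b, c)):
    the nested tuple's fields become fields of the outer tuple."""
    first, nested = tup[0], tup[1]
    return (first, *nested)

print(generate_flatten(("a", ("b", "c"))))  # ('a', 'b', 'c')
```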

 
 
48) Which command invokes the grunt shell using the local file system?

  1. pig
  2. pig -x local
  3. pig local
  4. all of the above

Answer : B

 
 
49) ______ is currently a better choice for low-latency access.

  1. HBase
  2. HIVE
  3. PIG
  4. all of the above

Answer : A

 
 
50) The port number to view namenode and dfshealth information in the browser is ________

  1. 50070
  2. 50060
  3. 50030
  4. none of the above

Answer : A




 
51) To look for the jobtracker in the browser, use ________

  1. http://localhost:50070/
  2. http://localhost:50060/
  3. http://localhost:50030/
  4. none of the above

Answer : C

 
 
52) To look for the tasktracker in the browser, use ________

  1. http://localhost:50070/
  2. http://localhost:50060/
  3. http://localhost:50030/
  4. none of the above

Answer : B

 
 
53) Which daemon processes must run on the namenode?

  1. tasktracker and jobtracker
  2. namenode and jobtracker
  3. namenode and secondarynamenode
  4. none of the above

Answer : B

 
 
54) Which daemon processes must run on a datanode?

  1. tasktracker and datanode
  2. namenode and jobtracker
  3. datanode and secondarynamenode
  4. tasktracker and jobtracker

Answer : A

 
 
55) Which daemon process must run on the secondarynamenode?

  1. tasktracker
  2. namenode
  3. secondarynamenode
  4. datanode

Answer : C

 
 
56) Hadoop was named after the toy elephant of Doug Cutting’s son.
 

  1. TRUE
  2. FALSE

Answer : A

 
 
57) Which of the following accurately describe Hadoop?

  1. distributed computing approach
  2. open source
  3. Java-based
  4. all of the above

Answer : D

 
 
58) We can update rows and delete rows of a table in HIVE.

  1. TRUE
  2. FALSE

Answer : B

 
 
59) HIVE is NOT designed for

  1. OLTP
  2. low latency applications
  3. user facing/interactive applications
  4. all of the above

Answer : D

 
 
60) In HIVE, which of these will create a directory in HDFS?

  1. table
  2. partition
  3. bucket
  4. all of the above

Answer : D