Sunday, March 29, 2015

Cloudera 5.3.2 Problems starting Task Tracker

Following the instructions provided at: http://edpflager.com/?p=1945, tried to set up a single-node Hadoop machine.  The task tracker would not start while running the command:
for x in `cd /etc/init.d ; ls hadoop-0.20-mapreduce-*` ; do sudo service $x start ; done
A search on the internet did not provide a working solution.  While looking for the log files, stumbled across their location: http://localhost:50030/logs/, and the task tracker log file in particular: http://localhost:50030/logs/hadoop-hadoop-tasktracker-localhost.localdomain.log.  The log file gave a clear reason why the task tracker did not start.  The detailed messages were:
2015-03-29 07:40:30,293 WARN org.apache.hadoop.mapred.TaskTracker: 
TaskTracker local dir /var/lib/hadoop-hdfs/cache/mapred/mapred/local 
error File /var/lib/hadoop-hdfs/cache/mapred/mapred/local does not exist, 
removing from local dirs
2015-03-29 07:40:30,294 ERROR org.apache.hadoop.mapred.TaskTracker: 
Can not start task tracker because 
org.apache.hadoop.util.DiskChecker$DiskErrorException: No mapred local directories are writable
 at org.apache.hadoop.mapred.TaskTracker$LocalStorage.checkDirs(TaskTracker.java:284)
 at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1770)
 at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:4124)

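The same log can also be read straight from disk, without the web UI; a sketch, assuming the default CDH log directory for MRv1 daemons (the path and file name may differ on other setups):

```shell
# Assumed CDH default log location for the MRv1 TaskTracker;
# the exact file name includes the user and host names.
LOG=/var/log/hadoop-0.20-mapreduce/hadoop-hadoop-tasktracker-localhost.localdomain.log
if [ -f "$LOG" ]; then
    tail -n 50 "$LOG"               # show the most recent messages
else
    echo "log not found at $LOG"    # path differs on this machine
fi
```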
From the log statements it is clear that the startup scripts for the map reduce jobs did not create the folder /var/lib/hadoop-hdfs/cache/mapred/mapred/local.  Manually created the directory and changed its ownership and group to hdfs.  Tried to restart the task tracker without success; the log statements follow:
2015-03-29 12:55:52,241 INFO org.apache.hadoop.mapred.TaskTracker: 
registered UNIX signal handlers for [TERM, HUP, INT]
2015-03-29 12:55:52,783 INFO org.mortbay.log: Logging to org.slf4j.impl.
Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2015-03-29 12:55:52,836 INFO org.apache.hadoop.http.HttpServer: 
Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2015-03-29 12:55:52,839 INFO org.apache.hadoop.http.HttpServer: 
Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context task
2015-03-29 12:55:52,839 INFO org.apache.hadoop.http.HttpServer: 
Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2015-03-29 12:55:52,839 INFO org.apache.hadoop.http.HttpServer: 
Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2015-03-29 12:55:53,031 WARN org.apache.hadoop.mapred.TaskTracker: 
TaskTracker local dir /var/lib/hadoop-hdfs/cache/mapred/mapred/local 
error Operation not permitted, removing from local dirs
2015-03-29 12:55:53,032 ERROR org.apache.hadoop.mapred.TaskTracker: 
Can not start task tracker because org.apache.hadoop.util.DiskChecker$DiskErrorException: 
No mapred local directories are writable
 at org.apache.hadoop.mapred.TaskTracker$LocalStorage.checkDirs(TaskTracker.java:284)
 at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1770)
 at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:4124)
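The "Operation not permitted" error above points at an ownership problem: on CDH the TaskTracker daemon runs as the mapred user, so a local dir owned by hdfs fails the writability check. A quick diagnostic sketch (path taken from the log above, expected ownership is an assumption based on CDH defaults):

```shell
# Compare the owner of the mapred local dir with the user the
# TaskTracker runs as; on CDH both should be mapred.
DIR=/var/lib/hadoop-hdfs/cache/mapred/mapred/local
if [ -d "$DIR" ]; then
    stat -c '%U:%G %n' "$DIR"   # hdfs:hdfs here would explain the error
else
    echo "missing: $DIR"
fi
```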
Then removed the directory /var/lib/hadoop-hdfs/cache/mapred/mapred so the startup scripts could recreate it, presumably with the correct ownership for the mapred user.  Restarting the task tracker then worked, and the log statements show:
2015-03-29 12:58:34,898 INFO org.apache.hadoop.mapred.TaskTracker: registered UNIX signal handlers for [TERM, HUP, INT]
2015-03-29 12:58:35,501 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2015-03-29 12:58:35,551 INFO org.apache.hadoop.http.HttpServer: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2015-03-29 12:58:35,554 INFO org.apache.hadoop.http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context task
2015-03-29 12:58:35,554 INFO org.apache.hadoop.http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2015-03-29 12:58:35,554 INFO org.apache.hadoop.http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2015-03-29 12:58:35,798 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2015-03-29 12:58:35,803 INFO org.apache.hadoop.mapred.TaskTracker: Starting tasktracker with owner as mapred
2015-03-29 12:58:35,803 INFO org.apache.hadoop.conf.Configuration.deprecation: slave.host.name is deprecated. Instead, use dfs.datanode.hostname
2015-03-29 12:58:35,805 INFO org.apache.hadoop.mapred.TaskTracker: Good mapred local directories are: /var/lib/hadoop-hdfs/cache/mapred/mapred/local
2015-03-29 12:58:35,820 INFO org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2015-03-29 12:58:35,821 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=TaskTracker, sessionId=
2015-03-29 12:58:35,848 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2015-03-29 12:58:35,886 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 49740
2015-03-29 12:58:35,931 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2015-03-29 12:58:35,932 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 49740: starting
2015-03-29 12:58:35,933 INFO org.apache.hadoop.mapred.TaskTracker: TaskTracker up at: localhost/127.0.0.1:49740
2015-03-29 12:58:35,933 INFO org.apache.hadoop.mapred.TaskTracker: Starting tracker tracker_localhost:localhost/127.0.0.1:49740
2015-03-29 12:58:35,961 INFO org.apache.hadoop.mapred.TaskTracker: Starting thread: Map-events fetcher for all reduce tasks on tracker_localhost:localhost/127.0.0.1:49740
2015-03-29 12:58:35,967 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2015-03-29 12:58:35,977 INFO org.apache.hadoop.mapred.TaskTracker:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4d8088da
2015-03-29 12:58:35,979 WARN org.apache.hadoop.mapred.TaskTracker: TaskTracker's totalMemoryAllottedForTasks is -1 and reserved physical memory is not configured. TaskMemoryManager is disabled.
2015-03-29 12:58:35,980 INFO org.apache.hadoop.mapred.IndexCache: IndexCache created with max memory = 10485760
2015-03-29 12:58:35,987 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50060

Once the task tracker was up, the first word count map reduce job ran successfully.
 

Sunday, March 1, 2015

REST Web Services, Jersey 2.x, and differences from 1.x

While helping one of my nephews develop REST services using Jersey 2.x on Tomcat 7.0.59, ran into issues caused by the changed package names.  These package name changes, coupled with a bug in MOXyJsonProvider as detailed at: http://stackoverflow.com/questions/19114043/jax-ws-rs-using-jersey-returning-collection-map-etc, cost a few hours of searching for a solution, and raised concerns about supporting and maintaining an existing application whose REST services are implemented with Jersey 1.x.  While many solutions were suggested on the web, the two changes that actually worked were:

1.  Replacing the package names in the deployment descriptor from:

<servlet-name>ServletAdaptor</servlet-name>
<servlet-class>com.sun.jersey.spi.container.servlet.ServletContainer</servlet-class>
<init-param>
    <param-name>com.sun.jersey.config.property.packages</param-name>
    <param-value>...</param-value>
</init-param>

with:

<servlet-name>ServletAdaptor</servlet-name>
<servlet-class>org.glassfish.jersey.servlet.ServletContainer</servlet-class>
<init-param>
    <param-name>jersey.config.server.provider.packages</param-name>
    <param-value>...</param-value>
</init-param>

2. Disabling the MOXy JSON provider and using the Jackson JSON provider instead:
<init-param>
    <param-name>jersey.config.disableMoxyJson.server</param-name>
    <param-value>true</param-value>
</init-param>

<init-param>
    <param-name>jersey.config.server.provider.packages</param-name>
    <param-value>org.codehaus.jackson.jaxrs;...</param-value>
</init-param>
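For reference, the two changes combined into a single servlet declaration might look like the sketch below; com.example.rest and the /webresources/* mapping are placeholders for illustration, not values from the original application:

```xml
<servlet>
    <servlet-name>ServletAdaptor</servlet-name>
    <servlet-class>org.glassfish.jersey.servlet.ServletContainer</servlet-class>
    <init-param>
        <param-name>jersey.config.disableMoxyJson.server</param-name>
        <param-value>true</param-value>
    </init-param>
    <init-param>
        <param-name>jersey.config.server.provider.packages</param-name>
        <!-- com.example.rest is a placeholder for your resource package -->
        <param-value>org.codehaus.jackson.jaxrs;com.example.rest</param-value>
    </init-param>
    <load-on-startup>1</load-on-startup>
</servlet>
<servlet-mapping>
    <servlet-name>ServletAdaptor</servlet-name>
    <!-- placeholder URL pattern; adjust to your application -->
    <url-pattern>/webresources/*</url-pattern>
</servlet-mapping>
```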


After making the above changes, deploying the REST services in Tomcat 7.0.59 worked as expected.