Wednesday, March 8, 2017

Jasper Reports Excel Issues - Fixed

Have been using Jasper Reports to generate reports in Excel format.  Had to merge several Excel files using the POI libraries, with the requirement to retain cell styles.  Opening the resulting Excel spreadsheet always threw the following error:

Excel completed file level validation and repair. Some parts of this workbook may have been repaired or discarded.
Repaired Records: Format from /xl/styles.xml part (Styles)


The error stopped after upgrading the POI libraries to version 3.15.
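
For reference, a minimal sketch of the merge approach, assuming plain POI 3.15 and XLSX inputs; the file handling, single-sheet-per-file assumption, and covered cell types are illustrative rather than the exact production code.  The key detail is cloning each source cell style into the destination workbook, and caching the clones so a new style is not created per cell, which can itself push a workbook past Excel's unique-format limit:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.HashMap;
import java.util.Map;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellStyle;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class ExcelMerger {

    public static void main(String[] args) throws Exception {
        Workbook merged = new XSSFWorkbook();
        for (String path : args) {
            // Cache cloned styles by their index in the source workbook so the
            // destination does not accumulate one new style per cell.
            Map<Short, CellStyle> styleCache = new HashMap<>();
            try (FileInputStream in = new FileInputStream(path)) {
                Workbook src = WorkbookFactory.create(in);
                Sheet srcSheet = src.getSheetAt(0);
                // Assumes sheet names are unique across the input files.
                Sheet destSheet = merged.createSheet(srcSheet.getSheetName());
                for (Row srcRow : srcSheet) {
                    Row destRow = destSheet.createRow(srcRow.getRowNum());
                    for (Cell srcCell : srcRow) {
                        Cell destCell = destRow.createCell(srcCell.getColumnIndex());
                        copyValue(srcCell, destCell);
                        CellStyle srcStyle = srcCell.getCellStyle();
                        CellStyle destStyle = styleCache.get(srcStyle.getIndex());
                        if (destStyle == null) {
                            destStyle = merged.createCellStyle();
                            destStyle.cloneStyleFrom(srcStyle); // copies across workbooks
                            styleCache.put(srcStyle.getIndex(), destStyle);
                        }
                        destCell.setCellStyle(destStyle);
                    }
                }
            }
        }
        try (FileOutputStream out = new FileOutputStream("merged.xlsx")) {
            merged.write(out);
        }
    }

    private static void copyValue(Cell src, Cell dest) {
        switch (src.getCellTypeEnum()) {      // getCellTypeEnum() is the 3.15 API
            case STRING:  dest.setCellValue(src.getStringCellValue());  break;
            case NUMERIC: dest.setCellValue(src.getNumericCellValue()); break;
            case BOOLEAN: dest.setCellValue(src.getBooleanCellValue()); break;
            case FORMULA: dest.setCellFormula(src.getCellFormula());    break;
            default:      break;  // leave blank/error cells empty
        }
    }
}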




Thursday, February 23, 2017

Sqoop 1.99.7 Installation


Expecting that Sqoop would work out of the box, downloaded sqoop-1.99.7-bin-hadoop200.tar.gz and followed the installation instructions for Sqoop 1.99.1 and other versions, as they were the first search results and instructions for the various versions were available by just changing the version number in the URL.  Documentation was available up to version 1.99.6 at https://sqoop.apache.org/docs/1.99.6/Installation.html, but documentation for 1.99.7 could not be found at the corresponding URL.  Following the available documentation, starting the Sqoop server failed with Hadoop configuration ClassNotFoundExceptions.

The documentation up to 1.99.6 refers to a catalina.properties file that does not exist in this release.  A web search led to several non-working solutions; some of the working ones suggested copying the required Hadoop libraries into the Sqoop lib directory.  Finally stumbled across the 1.99.7 installation instructions at https://sqoop.apache.org/docs/1.99.7/admin/Installation.html.  That documentation is very clear that the Hadoop-related environment variables have to be set.  Based on those instructions, added HADOOP_HOME to the user profile, and encountered the following error:

Caused by: org.apache.sqoop.common.SqoopException: MAPREDUCE_0002:Failure on submission engine initialization - Invalid Hadoop configuration directory (not a directory or permission issues): /etc/hadoop/conf/

Based on the solution presented at http://brianoneill.blogspot.com/2014/10/sqoop-1993-w-hadoop-2-installation.html, modified the property org.apache.sqoop.submission.engine.mapreduce.configuration.directory in the <SQOOP_HOME>/conf/sqoop.properties file to point to the correct Hadoop configuration directory:

org.apache.sqoop.submission.engine.mapreduce.configuration.directory=<location to hadoop configuration, e.g /etc/hadoop/conf>

Finally, <SQOOP_HOME>/bin/sqoop.sh server start resulted in a successful start of the Sqoop server with the following output:

Setting conf dir: bin/../conf
Sqoop home directory: <sqoop home>/sqoop
Starting the Sqoop2 server...
0    [main] INFO  org.apache.sqoop.core.SqoopServer  - Initializing Sqoop server.
7    [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread

Sqoop2 server started.
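
As a quick sanity check beyond the startup banner, the Sqoop2 Java client can confirm the server is answering REST calls.  A minimal sketch based on the Sqoop2 Java client API documentation, assuming the sqoop-client 1.99.7 jar is on the classpath and the server listens on the stock default URL:

import java.util.Collection;

import org.apache.sqoop.client.SqoopClient;
import org.apache.sqoop.model.MConnector;

public class SqoopServerCheck {
    public static void main(String[] args) {
        // Default Sqoop2 REST endpoint; adjust host and port if configured differently.
        SqoopClient client = new SqoopClient("http://localhost:12000/sqoop/");

        // Listing the registered connectors proves the server is up and responding.
        Collection<MConnector> connectors = client.getConnectors();
        System.out.println("Server is up; connectors registered: " + connectors.size());
        for (MConnector connector : connectors) {
            System.out.println("  " + connector);
        }
    }
}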

Monday, January 16, 2017

HBase 1.2.4 and Phoenix 4.9.2 Spark Integration

While the link https://blogs.apache.org/phoenix/entry/spark_integration_in_apache_phoenix provides a good example of Spark integration with Apache Phoenix, library dependencies and incompatible library versions made it a challenge.  Tried to integrate Apache HBase 1.2.4 with Apache Phoenix 4.8.2 and then 4.9.2.  Setting up HBase in distributed mode was a breeze, and adding the Phoenix libraries to HBase and testing the JDBC driver with the SQuirreL SQL client was easy.  Running the example by Josh Mahonin in the link above, however, required some work.  IntelliJ returned the following error messages while compiling:

Error:
scalac: missing or invalid dependency detected while loading class file
'ProductRDDFunctions.class'. 
Could not access type Logging in package org.apache.spark, 
because it (or its dependencies) are missing. 
Check your build definition for missing or conflicting dependencies. 
(Re-run with `-Ylog-classpath` to see the problematic classpath. 
A full rebuild may help if 'ProductRDDFunctions.class' was compiled against 
an incompatible version of org.apache.spark.

A little research on Google turned up the following fix by kalyanhadooptraining:
https://github.com/kalyanhadooptraining/phoenix/commit/98cf1b408358c0f9687b1aadf91ede64fdc0a05d

Applied the fixes as above, changed the Spark version from 1.6.1 to 2.1.0 in pom.xml, and rebuilt the Phoenix libraries.  After copying the newly built Phoenix libraries to HBase and restarting, Spark integration with Apache Phoenix as described by Josh Mahonin worked as expected.  Was able to verify the results using the SQuirreL SQL client.
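
For completeness, the integration can also be exercised from plain Java through Spark's DataFrame reader, without the Scala implicits used in the blog example.  A minimal sketch, assuming Spark 2.1.0 with the rebuilt phoenix-spark jar on the classpath; the table name TABLE1 and the ZooKeeper quorum localhost:2181 are illustrative values, not the actual setup:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PhoenixSparkRead {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("phoenix-spark-check")
                .master("local[*]")   // local run for a quick smoke test
                .getOrCreate();

        // The phoenix-spark data source reads a Phoenix table into a DataFrame.
        // "table" and "zkUrl" are illustrative; use your own table and quorum.
        Dataset<Row> df = spark.read()
                .format("org.apache.phoenix.spark")
                .option("table", "TABLE1")
                .option("zkUrl", "localhost:2181")
                .load();

        df.printSchema();
        df.show();

        spark.stop();
    }
}

Reading a few rows this way is a quick cross-check against what the SQuirreL SQL client shows for the same table.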