Monday, January 16, 2017

HBase 1.2.4 and Phoenix 4.9.2 Spark integration

While the post at https://blogs.apache.org/phoenix/entry/spark_integration_in_apache_phoenix provides a good example of Spark integration with Apache Phoenix, library dependencies and incompatible library versions made it a challenge to get working.  I tried to integrate Apache HBase 1.2.4 with Apache Phoenix 4.8.2 and 4.9.2.  Setting up HBase in distributed mode was a breeze.  Adding the Phoenix libraries to HBase and testing the JDBC driver with SQuirrelSQL was easy.  Running the example by Josh Mahonin from the above link required some work.  IntelliJ returned the following error while compiling:

Error:
scalac: missing or invalid dependency detected while loading class file
'ProductRDDFunctions.class'. 
Could not access type Logging in package org.apache.spark, 
because it (or its dependencies) are missing. 
Check your build definition for missing or conflicting dependencies. 
(Re-run with `-Ylog-classpath` to see the problematic classpath.)
A full rebuild may help if 'ProductRDDFunctions.class' was compiled against 
an incompatible version of org.apache.spark.

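The error indicates that the Phoenix Spark classes were compiled against a Spark version that still had the public org.apache.spark.Logging type, which Spark 2.x no longer provides.  After a little research on Google, I found the following fix by Kalyanhadooptraining:
https://github.com/kalyanhadooptraining/phoenix/commit/98cf1b408358c0f9687b1aadf91ede64fdc0a05d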

I applied the fixes above, changed the Spark version from 1.6.1 to 2.1.0 in pom.xml, and rebuilt the Phoenix libraries.  After copying the newly built Phoenix libraries to HBase and restarting it, Spark integration with Apache Phoenix as described by Josh Mahonin worked as expected.  I was able to verify the results using the SQuirrelSQL client.
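
For reference, the version change in pom.xml might look roughly like the fragment below.  This is only a sketch: it assumes the Spark version is controlled by a property named spark.version in the Phoenix root pom.xml, which should be verified against the actual checkout.  The Scala version properties that usually sit next to it may also need to match the Spark build being targeted.

```xml
<!-- Sketch of the relevant part of the Phoenix root pom.xml.
     The property name spark.version is an assumption; check your checkout. -->
<properties>
  <!-- was: <spark.version>1.6.1</spark.version> -->
  <spark.version>2.1.0</spark.version>
</properties>
```

After a change like this, the libraries can be rebuilt with a standard Maven invocation such as mvn clean package -DskipTests (skipping the tests is an optional shortcut, not something the fix requires).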