Sparkling Water combines two open source technologies: Apache Spark and H2O, a machine learning engine. It makes H2O’s library of advanced algorithms, including Deep Learning, GLM, GBM, KMeans, PCA, and Random Forest, accessible from Spark workflows. Spark users can choose the best features of either platform to meet their machine learning needs. They can combine Spark’s RDD API and Spark MLlib with H2O’s machine learning algorithms, or use H2O independently of Spark to build models and then post-process the results in Spark.
Good work guys.
The build system is really frustrating. I have a habit of installing temporary third-party libraries in non-standard paths, something like ~//temp_software/. But the CMake build system does not recognize the path even after I have provided it explicitly on the command line. Further, I had to specify the paths for the static libraries, dynamic libraries, and include files of libraries in standard paths as well, like glog, gflags, etc… Had it been typical autoconf & automake, it would have worked better. Can I contribute on this front?
Finally downloaded the VM to try out the system. Appreciate your work on Kudu.
Disclosure: I work on similar database development for a CDN company.