What Is H2O and Why Use It?
H2O is the world’s leading open source deep learning platform. It is used by over 80k data scientists and more than 9k organizations around the world, and a number of well-known companies use H2O for their big data processing. H2O is an in-memory, distributed, fast, and scalable machine learning and predictive analytics platform. It also makes it easy to put models into production in an enterprise environment.
The primary things H2O brings are ease of use and efficient scalability to data sets too large to fit in the memory of your largest machine. H2O offers fewer algorithms, but they are apparently significantly quicker. As a bonus, the intelligent defaults mean your code is very compact and clear to read: you can get a well-tuned, state-of-the-art deep learning model as a one-liner. Let’s see how we can tune our model.
I’m assuming you know Python, but don’t worry: no advanced features will be used in this tutorial, so competence in any programming language should be enough to follow along. Python users will benefit from being familiar with pandas, not least because it’ll make all your data science easier.
Installation and Quick Start
H2O is very easy to install and use. We are using H2O with Python, so you first have to install Python on your system; then I’ll show you how to install H2O using pip.
H2O works equally well with Python 2.7 and Python 3.5. On Windows, see Using Python on Windows, and remember to choose the 64-bit install (unless you are stuck with a 32-bit version of Windows, of course).
You need Java installed, which you can get at the Java download page. Choose the JDK. If you think you have the Java JDK but are not sure, you can just go ahead and install H2O, and come back and (re-)install Java if you are told there is a problem.
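If you would rather check for Java before installing H2O, a small script along these lines can help. It is only a sketch: the function name java_version_banner is made up for this example, it assumes Python 3, and it only looks for a java executable on your PATH, which does not tell you whether you have the JDK or just a runtime.

```python
import shutil
import subprocess


def java_version_banner():
    """Return the first line printed by `java -version`, or None if Java is not usable.

    Hypothetical helper for this tutorial; it is not part of H2O.
    """
    java = shutil.which("java")  # full path to the executable, or None
    if java is None:
        return None
    try:
        # `java -version` usually writes to stderr, so merge it into stdout.
        result = subprocess.run(
            [java, "-version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    output = result.stdout.strip()
    return output.splitlines()[0] if output else None


banner = java_version_banner()
print(banner if banner else "Java not found on PATH; install a JDK first.")
```

If this prints a version line, Java is at least reachable from the command line, which is what H2O needs in order to start.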
Install H2O with Python (pip)
From the command line, type pip install -U h2o. That’s it. Easy-peasy, lemon-squeezy. You should see something like Figure 2-1 after it completes.
The -U just says to also upgrade any dependencies. You may need admin rights.
To test it, start Python, type import h2o, and if that does not complain, follow it with h2o.init(). Some information will scroll past, ending with a nice table showing, among other things, the number of nodes, total memory, and total cores available, something like Figure 2-2. (If you ever need to report a bug, make sure to include all the information from that table.)
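The same test can be written as a short script. This is a sketch, not an official recipe: h2o_is_installed and start_h2o are hypothetical helper names, and start_h2o is deliberately not called at import time because h2o.init() starts a Java process.

```python
import importlib.util


def h2o_is_installed():
    """Return True if this Python interpreter can find the h2o package."""
    return importlib.util.find_spec("h2o") is not None


def start_h2o():
    """Start (or connect to) a local H2O instance and return the h2o module."""
    import h2o  # imported lazily so this file loads even without h2o installed

    h2o.init()  # prints the status table described above
    return h2o


if not h2o_is_installed():
    print("h2o is not installed; run: pip install -U h2o")
# Otherwise, call start_h2o() from an interactive session to see the table.
```

Calling start_h2o() is equivalent to typing import h2o followed by h2o.init() at the Python prompt.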
By default, the H2O instance will be allowed to use all of your cores and 25% of the system memory. This is usually fine, but for the sake of argument, what if you want to give it an exact amount of memory and number of cores instead, for example 2 GB or 4 GB of your system memory and 2 or 4 of your 8 cores?
To do this, first stop H2O with h2o.shutdown() (or, as best practice, use h2o.cluster().shutdown()), then type h2o.init(nthreads=2, max_mem_size=4). The following excerpt from the information table confirms that it worked:
Connecting to H2O server at //127.0.0.1:54321… successful.
H2O cluster uptime: 02 secs
H2O cluster version: 18.104.22.168
H2O cluster version age: 10 days
H2O cluster name: H2O_from_python_Mancave_cd839j
H2O cluster total nodes: 1
H2O cluster free memory: 3.556 Gb
H2O cluster total cores: 8
H2O cluster allowed cores: 2
H2O cluster status: accepting new members, healthy
H2O connection url: //127.0.0.1:54321
H2O connection proxy:
H2O internal security: False
Python version: 3.5.2 final
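The shutdown-and-restart steps can be wrapped up in a script. Treat this as a minimal sketch: init_kwargs and restart_h2o are hypothetical helpers, clamping the core count to os.cpu_count() is my own addition, and the two-second pause is a guess at how long the old instance needs to exit. It assumes, as in the call above, that an integer max_mem_size is read as gigabytes.

```python
import os
import time


def init_kwargs(cores, mem_gb):
    """Build keyword arguments for h2o.init(), never asking for more cores than exist."""
    if cores < 1 or mem_gb < 1:
        raise ValueError("cores and mem_gb must both be at least 1")
    available = os.cpu_count() or 1  # cpu_count() can return None
    return {"nthreads": min(cores, available), "max_mem_size": mem_gb}


def restart_h2o(cores=2, mem_gb=4):
    """Shut down the current H2O instance (if any) and start one with fixed limits."""
    import h2o  # imported lazily so this file loads even without h2o installed

    cluster = h2o.cluster()  # None when we are not connected to an instance
    if cluster is not None:
        cluster.shutdown()
        time.sleep(2)  # assumption: give the old instance a moment to exit
    h2o.init(**init_kwargs(cores, mem_gb))


print(init_kwargs(2, 4))
# Call restart_h2o(cores=2, mem_gb=4) from a session where h2o is installed.
```

After restart_h2o() returns, the "allowed cores" and "free memory" rows of the status table should reflect the new limits, as in the excerpt above.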