After having the data, next step is to have a Hadoop cluster to "upload" the data and run the processes.
This cluster is for the poor and lazy because it is group of virtualized machines running Hadoop.
- There is no need to buy and build several machines
- All inside your own machine
- Running in minutes
- Slower than a real cluster
- Just for testing purposes
Building virtual machines with Vagrant
As its site says: "Vagrant provides easy to configure, reproducible, and portable work environments built on top of industry-standard technology and controlled by a single consistent workflow to help maximize the productivity and flexibility of you and your team."
tl;dr: Vagrant creates virtual machines using configuration files. It is possible to start, stop, create, destroy virtual machines easily.
And here is the magic recipe:
- Install VirtualBox for the virtual machines.
- Install Vagrant for create, start and stop.
Prepare the configuration files:
vagrant box add precise64 http://files.vagrantup.com/precise64.box
Start the machines!
vagrant up --provider=virtualbox