Posts

Showing posts from November, 2013

Simple log analysis with Apache Spark

In this post we will learn to run some simple commands with Spark, including a few simple transformations and actions, and use them to do some simple log analysis. I will be using the local Spark cluster that I set up on my laptop. If you haven't read my first post on how to set up a Spark cluster on your local machine, I recommend you read How to set up a Apache Spark cluster in your local machine first. To begin, we will connect to the cluster with the following command:

MASTER=spark://pulasthi-laptop:7077 ./spark-shell

Here "spark://pulasthi-laptop:7077" is the URL of the master, which can be found in the Spark web UI. After connecting successfully you should see a Scala console where you can execute commands. You should also see your application listed in the web UI under running applications. To run these scenarios I am using a set of log files generated from various WSO2 products as sample data. Any set of log files or even just a set of

How to set up a Apache Spark cluster in your local machine

Over the past few days I grew interested in Apache Spark and thought of playing around with it a little. If you haven't heard about it, go and take a look. It's a pretty cool project that claims to be around 40x faster than Hadoop in some situations. The increase in performance is gained by leveraging in-memory computing technologies. I won't go into details about Apache Spark here; if you want a better look at Spark, just check out their web site - Apache Spark. In this post we will go through the steps to set up an Apache Spark cluster on your local machine. We will set up one master node and two worker nodes. If you are completely new to Spark, I recommend you go through First Steps with Spark - Screencast #1; it will get you started with Spark and tell you how to install Scala and the other things you need. We will be using the launch scripts that are provided by Spark to make our lives easier. First of all there are a couple of configurations we need t