In this post, we will cover relational operators in Pig Latin. Relational operators are central to Pig Latin’s nature as a data processing language. UNION and SPLIT: UNION combines multiple relations, and SPLIT partitions a relation into multiple ones (Ref. Hadoop in Action – Chuck Lam). Say we have two CSV files. The […]
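To make the two operators concrete, here is a minimal plain-Java sketch of their semantics on in-memory lists. It is not Pig code and not how Pig executes them. The class and method names (`UnionSplitSketch`, `union`, `split`) and the integer sample data are hypothetical. It mirrors two documented behaviours: UNION keeps duplicates (and Pig makes no ordering guarantee), and SPLIT can send one record to several output relations, or to none.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, in-memory illustration of Pig's UNION and SPLIT semantics.
class UnionSplitSketch {

    // UNION: concatenate every record of every input relation.
    // Duplicates are kept, as with Pig's UNION.
    static <T> List<T> union(List<List<T>> relations) {
        List<T> out = new ArrayList<>();
        for (List<T> relation : relations) {
            out.addAll(relation);
        }
        return out;
    }

    // SPLIT: test each record against every condition independently.
    // A record matching several conditions goes to several outputs;
    // a record matching none is dropped.
    static <T> List<List<T>> split(List<T> relation, List<Predicate<T>> conditions) {
        List<List<T>> outputs = new ArrayList<>();
        for (int i = 0; i < conditions.size(); i++) {
            outputs.add(new ArrayList<>());
        }
        for (T record : relation) {
            for (int i = 0; i < conditions.size(); i++) {
                if (conditions.get(i).test(record)) {
                    outputs.get(i).add(record);
                }
            }
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<Integer> a = List.of(1, 2, 3);
        List<Integer> b = List.of(3, 4);
        List<Integer> all = union(List.of(a, b));
        System.out.println(all); // [1, 2, 3, 3, 4]

        List<Predicate<Integer>> conditions = List.of(x -> x < 3, x -> x % 2 == 0);
        System.out.println(split(all, conditions)); // [[1, 2], [2, 4]]
    }
}
```

Because each condition is evaluated independently, the conditions need not be mutually exclusive: here 1 lands only in the first output, 2 in both, and 3 in neither.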
For this part, we will use another CSV file from BSE: one that lists the daily price details of Hindustan Unilever (HUL) for the year 2012. For simplicity, I have removed some columns and kept only those most useful for our exercise. The file is uploaded at hul_2012. As previously noted, […]
In this post, we look at how to run Pig, along with some related topics. Pig Data Types: Pig can work with structured, semi-structured, and unstructured data. It does not need a specific schema or metadata for the data, but it can use them if present. Pig supports simple data types. Scalars: int (32-bit), float (32-bit), long (64-bit), double (64-bit) […]
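Pig is written in Java, and its simple scalar types are backed by Java wrapper types (int by `java.lang.Integer`, long by `java.lang.Long`, and so on), so the bit widths listed can be checked directly against the JDK. A small sketch; the class and method names are made up for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: bit widths of the Java types behind Pig's scalars.
class PigScalarSizes {

    static Map<String, Integer> widths() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("int", Integer.SIZE);    // java.lang.Integer
        m.put("float", Float.SIZE);    // java.lang.Float
        m.put("long", Long.SIZE);      // java.lang.Long
        m.put("double", Double.SIZE);  // java.lang.Double
        return m;
    }

    public static void main(String[] args) {
        widths().forEach((pigType, bits) -> System.out.println(pigType + ": " + bits + "-bit"));
        // The range of a 32-bit signed int, as exposed by java.lang.Integer.
        System.out.println("int range: " + Integer.MIN_VALUE + " .. " + Integer.MAX_VALUE);
    }
}
```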
For the next few posts, I want to focus on Pig. What is Pig? Pig is basically three things combined: a platform to analyse large data sets; a language for writing programs that process these large data sets; and infrastructure to evaluate these programs. The Pig compiler parallelises these programs by turning them into MapReduce jobs. […]
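To give a feel for what compiling to MapReduce means, here is a conceptual, in-memory Java sketch of the map, shuffle and reduce phases that a group-and-count style script boils down to. It uses no Hadoop APIs and is not the code Pig generates; the class, the method names and the sample words are hypothetical. On a real cluster each phase runs in parallel across many machines, which is where the parallelisation comes from.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process simulation of map -> shuffle -> reduce.
class MapReduceSketch {

    // Map phase: emit a (key, 1) pair for every input record.
    static List<Map.Entry<String, Integer>> map(List<String> records) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String record : records) {
            pairs.add(Map.entry(record, 1));
        }
        return pairs;
    }

    // Shuffle phase: group values by key. A TreeMap keeps keys sorted,
    // mirroring how reducers receive their keys in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce phase: sum the values collected for each key.
    static Map<String, Integer> reduce(Map<String, List<Integer>> groups) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : groups.entrySet()) {
            int sum = 0;
            for (int value : group.getValue()) {
                sum += value;
            }
            counts.put(group.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = List.of("pig", "hadoop", "pig", "hive", "pig");
        System.out.println(reduce(shuffle(map(words)))); // {hadoop=1, hive=1, pig=3}
    }
}
```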
I wanted to get back to my Hadoop setup and refresh my acquaintance with Pig. But I started running into issues with the setup. The namenode would not start. I was always asked for a password when I tried to start the Hadoop services. Even formatting the namenode would not work. So clearly something was wrong with […]
In this post, I will walk you through installing HBase and getting started with it. This is NOT a step-by-step installation guide, but rather an incremental, exploratory approach. My configuration: Host – Mac OS X 10.8.1 Mountain Lion, 4 GB RAM, 2.53 GHz Intel Core 2 Duo; VM – Ubuntu 12.04, 2 […]
My setup: Host machine – OS X Mountain Lion, 4 GB RAM, 2.53 GHz Intel Core 2 Duo; VM – VirtualBox image, Ubuntu 12.04, 64-bit, 1 GB RAM; Java – OpenJDK 1.6 (later updated to 1.7, as explained in the post). Reference – Cassandra Home Page and Getting Started Page. Cassandra is […]
In the last post, we saw how to download and install TestNG, and how to quickly write and run a TestNG test case and view its results. This is all good stuff, but it is suitable only for a small set of tests. In your application, you will soon have multiple tests, organised into different files, possible […]
In this post, I will walk you through a simple API test automation example. Let’s assume that we are writing a Java class to simulate the operation of a car. Here is the code for the class. package example; public class Car { private int speed; private boolean stopped; public Car() { this.setSpeed(0); this.setStopped(true); } […]
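Since the class listing is cut off in this excerpt, here is a self-contained sketch of the same idea: a `Car` with the fields and constructor shown, plus plain-Java checks against its public API. The getters `getSpeed()` and `isStopped()`, the setter bodies and the `CarApiCheck` class are assumptions, the `example` package is dropped so the snippet runs as one file, and a small `check` helper stands in for a test framework such as TestNG.

```java
// Hypothetical checker class; it comes first so single-file launchers run it.
class CarApiCheck {

    // Minimal stand-in for a test assertion (Java's assert is off by default).
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        Car car = new Car();
        // A freshly constructed car should be stationary.
        check(car.getSpeed() == 0, "new car should have speed 0");
        check(car.isStopped(), "new car should be stopped");

        // Drive the API through its setters and observe the state change.
        car.setStopped(false);
        car.setSpeed(40);
        check(car.getSpeed() == 40, "speed should be 40 after setSpeed(40)");
        check(!car.isStopped(), "car should not be stopped while moving");
        System.out.println("All Car API checks passed");
    }
}

class Car {
    private int speed;
    private boolean stopped;

    // Constructor as shown in the post.
    public Car() {
        this.setSpeed(0);
        this.setStopped(true);
    }

    public int getSpeed() { return speed; }                    // assumed accessor
    public void setSpeed(int speed) { this.speed = speed; }    // assumed body
    public boolean isStopped() { return stopped; }             // assumed accessor
    public void setStopped(boolean stopped) { this.stopped = stopped; } // assumed body
}
```

Keeping the checks on public methods only is what makes this an API test: the private fields could be renamed without breaking it.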
In this and the coming posts, I will explain how to use TestNG for your Java API Test Automation. First, notice that there are four key phrases in the first sentence: TestNG, Java, API, Test Automation. Let’s go through them one by one. 1. TestNG – TestNG is an automation testing framework. It is created […]