Thursday, December 26, 2013

Big Data: What is HDFS?

HDFS stands for Hadoop Distributed File System. A file system is a method of organizing files on physical media such as hard disks, CDs, DVDs and flash drives.

HDFS is an important component of the Hadoop architecture.

Just as Windows uses the NTFS, FAT16 and FAT32 file systems and Linux uses Ext2, Ext3, Isofs, Sysfs and Procfs to store and organize files, Hadoop uses HDFS to organize files.

So how does HDFS store data?

Unlike a traditional file system, HDFS does not store data on a single hard disk. HDFS splits data into blocks and spreads those blocks across many computers.

Unlike in a traditional file system, a block in Hadoop is very large. In a traditional file system (NTFS or a Linux file system) the block size is usually 4KB. In HDFS the default block size is 64MB (128MB in Hadoop 2.x), and it can be configured to 128MB, 256MB, 512MB or even 1GB.

By default HDFS stores three copies of every block of a file. Each copy is stored on a different node. A node is a single computer in the Hadoop cluster.

For example, a 20MB file fits in a single block, and that block is stored on three nodes.

A file that is larger than the configured block size (64MB, 128MB or whatever is configured in HDFS) is broken down into blocks of that size. For example, if a file is 300MB and the block size configured in HDFS is 64MB, then the file will be stored in 5 blocks.

64MB 64MB 64MB 64MB 44MB

HDFS will distribute the data in these five blocks across different racks and nodes, and it will store three copies of each block.
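The arithmetic of the 300MB example can be checked with a few lines of JavaScript. This is only a sketch: splitIntoBlocks is a hypothetical helper written for this post, not part of Hadoop, and all sizes are in MB.

```javascript
// Hypothetical helper: splits a file size into HDFS-style block sizes (values in MB).
function splitIntoBlocks(fileSizeMB, blockSizeMB) {
  var blocks = [];
  var remaining = fileSizeMB;
  while (remaining > 0) {
    // Every block is full except possibly the last one
    var size = Math.min(blockSizeMB, remaining);
    blocks.push(size);
    remaining -= size;
  }
  return blocks;
}

var exampleBlocks = splitIntoBlocks(300, 64);
console.log(exampleBlocks.join("MB ") + "MB"); // 64MB 64MB 64MB 64MB 44MB
console.log(exampleBlocks.length * 3);         // 15 block copies when each block is stored three times
```

With three copies of each of the five blocks, the cluster holds 15 block copies, or 900MB in total, for this one 300MB file.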

To understand how HDFS stores and organizes files we must understand the fundamental architecture of Hadoop. Hadoop has a Master-Slave architecture: there is one Master and multiple Slaves.

The Master contains the NameNode and the JobTracker. Each Slave contains a DataNode and a TaskTracker.

The NameNode manages the cluster metadata and DataNodes.

  • The NameNode keeps track of the blocks assigned to each file.
  • The NameNode keeps track of all the blocks residing on each DataNode.
  • It also holds the list of active nodes in each rack.
  • It actively monitors the number of replicas of each block.
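The bookkeeping in the list above can be pictured as two lookup tables. The sketch below is a toy model, not the real NameNode data structures; the file name, block IDs and node names are made up for illustration.

```javascript
// Toy model of NameNode metadata (all names below are hypothetical).
var fileToBlocks = {
  "/logs/app.log": ["blk_1", "blk_2"]   // which blocks make up each file
};
var blockToNodes = {
  "blk_1": ["node1", "node2", "node4"], // which DataNodes hold each block
  "blk_2": ["node2", "node3"]
};
var TARGET_REPLICAS = 3;

// Returns the IDs of the blocks that have fewer replicas than the target.
function findUnderReplicated(blockMap, target) {
  return Object.keys(blockMap).filter(function (blockId) {
    return blockMap[blockId].length < target;
  });
}

// blk_2 is reported because it has only two replicas
console.log(findUnderReplicated(blockToNodes, TARGET_REPLICAS));
```

In this model, monitoring the replica count is simply a scan for blocks whose node list is shorter than the target.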

The JobTracker is responsible for receiving requests from client applications and assigning them to TaskTrackers. It tries to run each task as close to its data as possible: the JobTracker looks at which task is needed and where the data for that task resides, and assigns the task to the nearest node.

On each Slave, every hard disk should have an ext3 or ext4 file system on it. The TaskTracker accepts tasks from the JobTracker. Each TaskTracker has a fixed number of slots, which indicates how many tasks it can accept. Once a task has been processed successfully, or if it fails, the TaskTracker notifies the JobTracker.
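The idea of assigning a task to the nearest node with a free slot can be sketched as follows. This is an illustration only, not the actual JobTracker scheduling code; pickNode, the node objects and their fields (name, rack, freeSlots) are assumptions made for the example.

```javascript
// Hypothetical sketch of locality-aware task assignment.
// nodes: array of { name, rack, freeSlots }
// blockLocations: names of the nodes that hold the data block the task needs
function pickNode(nodes, blockLocations) {
  var available = nodes.filter(function (n) { return n.freeSlots > 0; });
  var holdsData = function (n) { return blockLocations.indexOf(n.name) !== -1; };

  // 1. Prefer a node that already stores the data
  var local = available.filter(holdsData);
  if (local.length > 0) return local[0].name;

  // 2. Otherwise prefer a node in the same rack as the data
  var dataRacks = nodes.filter(holdsData).map(function (n) { return n.rack; });
  var rackLocal = available.filter(function (n) { return dataRacks.indexOf(n.rack) !== -1; });
  if (rackLocal.length > 0) return rackLocal[0].name;

  // 3. Fall back to any node with a free slot, or null if the cluster is full
  return available.length > 0 ? available[0].name : null;
}

var cluster = [
  { name: "node1", rack: "rackA", freeSlots: 0 },
  { name: "node2", rack: "rackA", freeSlots: 2 },
  { name: "node3", rack: "rackB", freeSlots: 1 }
];
console.log(pickNode(cluster, ["node1"])); // node2: node1 is full, node2 shares its rack
```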

A DataNode on a Slave stores the actual data and interacts with the NameNode. A client application can interact with a DataNode directly once the NameNode has provided its location.

Here is a YouTube video in which Sameer Farooqui explains HDFS in detail.


Please visit the Big Data page to read all articles on Big Data

Monday, December 23, 2013

Big Data: What is Hadoop?

To understand Hadoop let us start with a simple question. What is responsible for computing and processing data? The answer is the computer.

How efficiently can a computer process data? The answer is that it depends on how much data needs to be processed and on the configuration of the computer (speed, storage space, number of processors, RAM etc.).

If the data grows we can add hardware (hard disks, processors, RAM) to increase the capacity of the computer to process and compute data.

What if the data grows to such an extent that it can no longer be processed by a single computer? Today data is growing at a very high speed. Millions of tweets are generated every day, Wal-Mart records an enormous number of transactions every hour, and Facebook posts, comments, likes, news, videos, audio and images are recorded every minute across the world. Do you think a single computer, no matter how powerful its configuration, can handle this massive amount of data? The answer is No.

So what is the solution?

To process this massive data a very old and simple idea works: “Divide and Conquer”. In the computer world we can say “Divide and Compute”. Processing this ever-growing data is not possible with one computer or one super powerful server. To process and compute this data, it needs to be divided into small blocks, and these blocks need to be processed by multiple computers concurrently. The processed data then needs to be consolidated and returned as one output. All of this needs to be done as quickly as possible.

This is where Hadoop comes into the picture. Hadoop is not a single piece of software or hardware. Hadoop is a platform that consists of a set of tools and technologies. The technologies at the core of Hadoop are MapReduce and HDFS (Hadoop Distributed File System).

MapReduce is a programming model developed by Google, and Hadoop provides its own open source implementation of it. It divides a job into small sub-tasks and distributes the sub-tasks to multiple computers called nodes. When all the nodes are done with their sub-tasks, MapReduce consolidates the results of all the sub-tasks and combines them into one output. This output is returned to the calling application.

MapReduce consists of two phases: Map and Reduce. In the Map phase the job is divided into small pieces (sub-tasks) that are processed on multiple computers (nodes). The Reduce phase is responsible for collecting the output produced by the individual nodes and consolidating it into one.
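The Map and Reduce steps can be illustrated with a word count that runs on a single machine. This is a toy sketch and does not use the Hadoop API; mapChunk and reduceCounts are hypothetical names, and in a real cluster each chunk would be processed on a different node.

```javascript
// Map step: turns one chunk of text into [word, 1] pairs.
function mapChunk(chunk) {
  return chunk.toLowerCase().split(/\s+/)
    .filter(function (word) { return word.length > 0; })
    .map(function (word) { return [word, 1]; });
}

// Reduce step: adds up the counts emitted for each word.
function reduceCounts(pairs) {
  var totals = {};
  pairs.forEach(function (pair) {
    var word = pair[0];
    if (!Object.prototype.hasOwnProperty.call(totals, word)) totals[word] = 0;
    totals[word] += pair[1];
  });
  return totals;
}

// Two chunks stand in for data blocks held by two different nodes
var chunks = ["big data is big", "data is everywhere"];
var mapped = [];
chunks.forEach(function (chunk) { mapped = mapped.concat(mapChunk(chunk)); });
console.log(reduceCounts(mapped)); // big: 2, data: 2, is: 2, everywhere: 1
```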

HDFS, i.e. the Hadoop Distributed File System, is responsible for managing the storage of this massive data. It does so by dividing the data into blocks, for example blocks of 128MB, 256MB, 512MB or 1GB. The data distributed to each computer or node is the complete data that the node needs to process. A node does not need to make additional round trips to request more data: when data is given to a node for processing, it is all the data required by that node.

Hadoop is an open source framework for processing large data sets, and it is released under the Apache License. In addition to MapReduce and HDFS there are other tools which come under the Hadoop umbrella. Each of these tools provides a distinct feature. For example, Chukwa is a data collection system for managing large distributed systems, and Pig is a data flow language for parallel computation.

Download Hadoop

To learn Hadoop you can download it from the Apache website. We can start with a Single Node setup and then move to a Cluster setup.


Please visit the Big Data page to read all articles on Big Data

Wednesday, December 18, 2013

Bank of PMP Questions

Recently I completed my PMP Certification with PMI. Preparing for the PMP exam is itself a project. Practicing PMP questions is a part of every PMP aspirant's preparation. I practiced many questions. Here I am putting up a list of 59 sources where you can go and practice PMP exam questions.

Please keep in mind that no matter how many questions you practice, you are not going to get a single question from your practice tests in the PMP exam.

PMP Questions according to Knowledge Area

Integration Management
Scope Management
Time Management
Cost Management
Quality Management
Communication Management
Risk Management
Human Resource Management
Procurement Management
Stakeholder Management
Professional and social responsibility

50 free numerical PMP questions

Also, if you know a source which is not listed here, please leave a comment and I will add it to the list; by doing so we will definitely help other PMP aspirants.

Sr. | PMP Questions | URL
1 | PMI Sample Questions for PMP | http://www.pmi.org/Certification/Project-Management-Professional-PMP/~/media/PDF/Certifications/PMP%20Sample%20Questions.ashx#!
2 | 75 PMP Exam Questions | http://www.oliverlehmann.com/pmp-self-test/75-free-questions.htm
3 | 175 PMP Exam Questions | http://www.oliverlehmann.com/contents/free-downloads/175_PMP_Sample_Questions.pdf
4 | 200 Head First exam questions | http://www.headfirstlabs.com/books/hfpmp/hfpmp_ch15.pdf
5 | 175 WizIQ PMP Questions | http://www.wiziq.com/tutorial/119015-PMP-175-Sample-Questions
6 | 1000s of PMP Questions | http://www.oliverlehmann.com/pmp-self-test/75-free-questions.htm#providers_
7 | 1 PMP Question per day | http://www.testprepreview.com/pmp_practice.htm
8 | 3 PMP Exam simulators | http://free.pm-exam-simulator.com/
9 | 2 Full length PMP Exam Test | http://www.examcentral.net/pmp/practice-exam
10 | 200 Question at Tutorials Point | http://www.tutorialspoint.com/pmp-exams/pmp_sample_questions.htm
11 | 75 PMP Exam Questions at PreparePM | http://www.preparepm.com/mock1.html
12 | 200 PMP Mock Questions at Tutorials Point | http://www.tutorialspoint.com/pmp-exams/pmp_mock_exams.htm
13 | 1440 PMP Question | http://www.scribd.com/doc/12878997/PMP-Exam-Question-Bank
14 | 15 Questions on Integration Management | http://jyotidahiya.wordpress.com/2013/04/26/practise-questions-for-pmp-project-integration-management/
15 | 10 Question on Cost Management | http://jyotidahiya.wordpress.com/2013/02/02/practise-questions-for-pmp-project-cost-management/
16 | 12 Question on Scope Management | http://jyotidahiya.wordpress.com/2012/11/07/practice-questions-for-pmp-project-scope-management/
17 | 200 PMP Question from Pankaj Sharma | http://pankajsharmapmp.blogspot.com/search/label/Sample%20PMP%20Exam
18 | 200 PMP Questions from certchamp.com | http://www.certchamp.com/pmp-sample-questions.jsp
19 | 25 PMP Questions | http://www.tests.com/practice/Project-Management-Professional-Test-sample
20 | 15 PMP Question with explanation - P I | http://getpmpcertified.blogspot.com/2011/07/some-nice-questions-part-1.html
21 | 15 PMP Question with explanation - P II | http://getpmpcertified.blogspot.com/2011/07/some-nice-questions-part-3.html
22 | 15 PMP Question with explanation - P III | http://getpmpcertified.blogspot.com/2011/07/some-nice-questions-part-2.html
23 | 25 Question on Risk Management | http://pmp-tutorial-free-sample-questions.blogspot.com/2008/09/chapter-3-project-risk-management.html
24 | 25 Question on HR Management | http://pmp-tutorial-free-sample-questions.blogspot.com/2008/09/chapter-9project-hr-management.html
25 | 25 Question on Cost Management | http://pmp-tutorial-free-sample-questions.blogspot.com/2008/09/chapter-8project-cost-management.html
26 | 24 Question on Communication Management | http://pmp-tutorial-free-sample-questions.blogspot.com/2008/09/chapter-6project-communication.html
27 | 55 Question on Professional Responsibility | http://pmp-tutorial-free-sample-questions.blogspot.com/2008/09/chapter-6project-communication.html
28 | 73 Question on Project Integration Management | http://pmp-tutorial-free-sample-questions.blogspot.com/2008/09/chapter-5project-integration-management.html
29 | 67 Question on Scope Management | http://pmp-tutorial-free-sample-questions.blogspot.com/2008/09/chapter-4project-scope-managemnt.html
30 | 25 Questions on Procurement Management | http://pmp-tutorial-free-sample-questions.blogspot.com/2008/09/chapter-4project-scope-managemnt.html
31 | 10 PMP Questions | http://pmp-tutorial-free-sample-questions.blogspot.com/2008/09/sample-exam-questions-answers-8.html
32 | 260 PMP Questions | https://sites.google.com/site/pmpbank/pmpquestionbank
33 | 20 Question on NPV | http://www.maxigrade.com/CorpFin1/corpfin1samplequestions2.php
34 | 20 Question on NPV | http://highered.mcgraw-hill.com/sites/dl/free/0072439749/36504/ros69749_ch09.pdf
35 | 20 Question on NPV | https://www.google.com/url?q=http://finance.wharton.upenn.edu/~acmack/Chapter%25206%2520Questions%2520V4.doc&usd=2&usg=AFQjCNE9zknrhS_P2TLYcV3c0GJvo7DuYQ
36 | 12 Question on PMP | http://targetpmp.wordpress.com/category/practice-questions/
37 | 110 PMP Questions | http://www.free-pm-exam-questions.com/
38 | 10 PMP Questions | http://www.pmconnection.com/modules.php?name=News&file=article&sid=31
39 | 206 PMP Question | http://206-free-pmp-exam-questions.blogspot.com/p/questions-1-20.html
40 | 431 PMP Questions | http://passtheprojectexam.com/wp-content/Execution.swf
41 | 30 PMP Questions | http://www.mosaicprojects.com.au/ftp/Free_PMP_Questions.xls
42 | 280 PMP Questions | http://excellentemmett.wikispaces.com/file/view/PMP_Exam_Question_Bank_new.pdf/204929174/PMP_Exam_Question_Bank_new.pdf
43 | 200 Questions | http://www.techfaq360.com/reguser.do?m=reg
44 | 20 PMP Questions | http://www.techfaq360.com/reguser.do?m=reg
45 | 200 PMP Questions | http://www.pmstudy.com/enroll.asp#PMP
46 | 50 PMP Questions | http://mad.ly/signups/88508/join
47 | 160 PMP Questions | http://www.bestsamplequestions.com/pmp-sample-questions/pmp-sample-questions.html
48 | 1 PMP Question per day | http://www.pmexam.com/18201.html
49 | 400 PMP Questions | http://pmhub.net/pmsuccess/Menu.htm
50 | 165 PMP Questions | http://www.passionatepm.com/free-pmp-exam-practice-test-questions
51 | 20 PMP Questions from TestYourCandidate | http://www.passionatepm.com/free-pmp-exam-practice-test-questions
52 | 200 PMP Questions from TestYourCandidate | http://www.passionatepm.com/free-pmp-exam-practice-test-questions
53 | 25 PMP Questions from pinnacle3learning | http://www.pinnacle3learning.com/pmp-assessment-quiz.html
54 | 100 PMP Questions from pmpforsure | http://pmpforsure.com/
55 | 10 PMP Questions from pmtraining.com | http://www.pmtraining.com/Public/PMPPracticeExams.aspx#!
56 | 20 PMP Questions from corethoughts.co.in | http://corethoughts.co.in/prepare-for-the-pmp-exam/#!
57 | 553 PMP Questions from careeraddons | http://www.careeraddons.com/
58 | 204 PMP Questions from practicequiz.com | http://www.practicequiz.com/PMI-PMP-Project-Management
59 | 20 PMP Questions from CertMagic | http://www.certmagic.com/demo.php?exid=946

Please feel free to visit My Journey to PMP and Project Management section on this blog

Wednesday, December 11, 2013

Google App Script: How to read data from Google Spreadsheet?

Today, I am going to share a Google App Script which you can use to read data from a Google Spreadsheet. For demonstration purposes I have created a Google spreadsheet with employee records. Each row contains an employee's name, DOB and title.

The goal is to write a script which will read the value of each row and cell. To start, click on Tools->Script editor.

If this is the first time you are creating a Google App Script, the Project window will appear on the screen; click on Blank Project.

This will open the Script editor window. The script editor window contains the default function myFunction.

We will delete all the default code in the Script editor and write the following code snippet.
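A minimal sketch of such a script, assuming the employee records sit on the active sheet. The function names readSpreadsheetData and describeCells are assumptions made for this sketch and are not necessarily the ones used when the post was written.

```javascript
// Builds one log line per cell from a 2D array of values (rows x columns).
function describeCells(rows) {
  var lines = [];
  for (var i = 0; i < rows.length; i++) {
    for (var j = 0; j < rows[i].length; j++) {
      lines.push("Row " + (i + 1) + ", column " + (j + 1) + ": " + rows[i][j]);
    }
  }
  return lines;
}

// Reads every cell that contains data on the active sheet and writes it to the log.
function readSpreadsheetData() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var rows = sheet.getDataRange().getValues();
  describeCells(rows).forEach(function (line) {
    Logger.log(line);
  });
}
```

Here getDataRange() returns the rectangle of cells that contain data, and getValues() returns it as an array of rows, so the nested loop visits every cell once.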

Once we are done with the code snippet we will click on Save and save the project with a name.

After that we can click on the Run button. GAS (Google App Script) will ask for authorization. Click on Continue to start the authorization.

On the next screen it will request permission, as the script interacts with Google Drive and Google Spreadsheet. Click on Accept.

The script will run successfully and print the value of each cell in the log. To see the log go to View->Logs.

Monday, December 9, 2013

Google App Script: How to pull all Google contacts in Google Spreadsheet?

Today I will share a simple Google App Script that we can use to pull all our Google contacts into a Google Spreadsheet. You can store this spreadsheet in your Google Drive as a backup of all your personal or professional contacts.

To do so let us create a Google Spreadsheet with the following four columns – Contact Name, Email Address, Phone Number and Address.

Next we need to go to Tools -> Script Editor. The Script editor window will open. We can delete all the default code in the Script editor window. Let us add the following code in the Script editor.

function readAllContacts()
{
    var row = 2; // row 1 holds the column headings
    var mySpreadSheetFile = SpreadsheetApp.getActiveSpreadsheet();

    var allContacts = ContactsApp.getContacts();

    for (var i = 0; i < allContacts.length; i++)
    {
      var contact = allContacts[i];

      // A contact can have several emails, phone numbers and addresses;
      // collect all of them so that no value overwrites another
      var emails = contact.getEmails().map(function (e) { return e.getAddress(); });
      var phones = contact.getPhones().map(function (p) { return p.getPhoneNumber(); });
      var addresses = contact.getAddresses().map(function (a) { return a.getAddress(); });

      // Columns: A = Contact Name, B = Email Address, C = Phone Number, D = Address
      mySpreadSheetFile.getRange('A' + row).setValue(contact.getFullName());
      mySpreadSheetFile.getRange('B' + row).setValue(emails.join(', '));
      mySpreadSheetFile.getRange('C' + row).setValue(phones.join(', '));
      mySpreadSheetFile.getRange('D' + row).setValue(addresses.join(', '));

      row = row + 1;
    }
}

From the Script editor window we need to click on Run. The script will run successfully and the spreadsheet will be populated with all the data from our Google Contacts.

For more Google App Script visit the Google App Script section.

Gmail Tips: How to Unsend your email once you hit the Send button?

Gmail, the popular email service from Google, has added a new feature which we can use to unsend any email message. Unsend means we can tell Gmail to stop sending the message to the recipient's address after we hit the Send button. This small feature gives us a 10-second window in which to stop the mail from being sent.

To use this feature we first need to enable the Undo Send tool. To do this we need to go to Settings->Labs.

We need to go to the Undo Send section and click on the Enable radio button. Once done, click on Save Changes.

The next time you compose and send an email, the "Undo" link will appear on the screen along with the message "Your email has been sent". It will appear for 10 seconds; if you click on Undo, the email message will reappear in the compose window. You will not lose any email content.

For more Gmail Tips visit the Tips section.

Saturday, December 7, 2013

Google App Script: How to set your personal assistant to send B'day wishes automatically?

Today we will explore a simple Google App Script which will send out birthday wishes to our friends. The beauty of the solution is that it sends the Happy B'day messages whether or not we are online, whether or not we remember the date, and whether or not we have an internet connection at that moment. Once you set up the Google App Script, it acts as a personal assistant that sends B'day wishes.

The idea is that we keep the name, DOB and email address of our contacts in a Google Spreadsheet. We will develop a Google App Script which reads the DOB of each contact and, if the DOB is the same as today's date, sends out a birthday message to the contact’s email address. The script will be set up as a job which runs automatically once every midnight.

For demonstration purposes I have used some dummy names, DOBs and email addresses. I have kept my own email address in the spreadsheet so that all b’day messages are sent to my email address; in reality we would keep our friends' names, DOBs and email addresses in the spreadsheet.

Once our spreadsheet file is ready we need to go to Tools -> Script Editor. It will open the Script editor window. Delete everything written by default in the code editor and write the following App Script.

function SendBdayWishes() {

  /* --- Set Bday Message --- */

  var Bdymsg = "Wish You a very Happy Bday, God bless you always!!!";

  /* --- Reading Today's Month and Day --- */

  var timeZone = "GMT-2"; // change this to your own time zone
  var todayDate = new Date();
  var monthOfToday = Utilities.formatDate(todayDate, timeZone, "MM");
  var dayOfToday = Utilities.formatDate(todayDate, timeZone, "dd");

  var mySheet = SpreadsheetApp.getActiveSheet();
  var frndData = mySheet.getDataRange().getValues();

  for (var i = 0; i < frndData.length; i++)
  {

    /* --- Columns: 0 = Name, 1 = Date of Birth, 2 = Email --- */

    var frndDob = new Date(frndData[i][1]);
    if (isNaN(frndDob.getTime())) continue; // skip a heading row or an empty DOB cell

    /* --- Reading Month and Day from Date of Birth Column --- */

    var frndMonth = Utilities.formatDate(frndDob, timeZone, "MM");
    var frndDay = Utilities.formatDate(frndDob, timeZone, "dd");
    var frndEmail = frndData[i][2];

    if ((frndDay == dayOfToday) && (frndMonth == monthOfToday))
    {
      GmailApp.sendEmail(frndEmail, "Happy Bday", "Dear " + frndData[i][0] + " " + Bdymsg);
    }

  }
}

Once you are done writing the script you can click on the Run button to test it. If the script runs successfully and one of the DOBs falls on today's date, an email wishing Happy B'day will be sent to that email address.

Now we will set this script up as a job which runs every midnight, checks whether there is a b’day today and sends the message accordingly.

To do so, click on the Current Project Trigger button (the watch-shaped button).

It will show the list of project triggers set up for this project or spreadsheet. Click on the link “Click to add one now”.

This will take us to the project triggers window; select the function name, the event and the time when you want to run this Google App Script. Click on Save to return.
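The same daily trigger can also be created from code instead of through the project triggers window. A minimal sketch, assuming the function to run is named SendBdayWishes; createMidnightTrigger is a hypothetical helper name, and atHour(0) asks Apps Script to run the function at some point within the hour after midnight rather than at exactly 00:00.

```javascript
// Creates a time-driven trigger that runs SendBdayWishes once a day around midnight.
// Run this function once by hand; running it again adds a second, duplicate trigger.
function createMidnightTrigger() {
  ScriptApp.newTrigger("SendBdayWishes")
    .timeBased()   // time-driven trigger
    .everyDays(1)  // repeat every day
    .atHour(0)     // in the hour that starts at midnight
    .create();
}
```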

That is all you need to do. The Google App Script will take care of running the function every midnight and sending the b’day wishes. The next time you have a new friend, just add his or her name to the spreadsheet. You can of course still call your friend or send wishes on Facebook or Twitter, but this script will send the wishes even if you forget or are too busy to do so.

For testing purposes I set the DOB of one record to today's date, and I received the email wishing Happy B'day.

For more Google App Script visit the Google App Script section.

Friday, December 6, 2013

Google App Script: How to read your Google Drive statistics?

Google App Script is a cloud-based scripting language. It can be used to design solutions which access and leverage Google services such as Gmail, Spreadsheet, Google Drive, Google BigQuery, Google Calendar etc.

In this post we will learn how to write a simple script that gets statistics about our Google Drive: how many files, folders etc. we have stored in it.

To start, we need to go to Google Apps Script and select a Blank Project.

This will open the Google App Script editor. We will delete all the code and paste the following code.

function ReadGoogleDrive() {

  /* The purpose of this App Script is to read Google Drive statistics */

  var myTotalFiles = DocsList.getAllFiles().length;
  var myTotalFolders = DocsList.getAllFolders().length;

  var myTotalSpreadsheetFiles = DocsList.getFilesByType('Spreadsheet').length;
  var myTotalPresentationFiles = DocsList.getFilesByType('Presentation').length;
  var myTotalDrawingFiles = DocsList.getFilesByType('Drawing').length;
  var myTotalFormFiles = DocsList.getFilesByType('Form').length;
  var myTotalDocumentFiles = DocsList.getFilesByType('Document').length;

  Logger.log("Google Drive Statistics for " + Session.getActiveUser().getEmail());
  Logger.log("All Files in Google Drive: " + myTotalFiles);
  Logger.log("All Folders in Google Drive: " + myTotalFolders);
  Logger.log("All Spreadsheet Files in Google Drive: " + myTotalSpreadsheetFiles);
  Logger.log("All Presentation Files in Google Drive: " + myTotalPresentationFiles);
  Logger.log("All Drawing Files in Google Drive: " + myTotalDrawingFiles);
  Logger.log("All Forms Files in Google Drive: " + myTotalFormFiles);
  Logger.log("All Documents Files in Google Drive: " + myTotalDocumentFiles);

}
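Google has since retired the DocsList service in favour of DriveApp, so the same statistics can be sketched with DriveApp instead. This is a hedged variant rather than the script this post was written with; countItems and readGoogleDriveWithDriveApp are hypothetical names. DriveApp returns iterators instead of arrays, so the items are counted one by one, which can be slow on a large Drive.

```javascript
// Counts the items remaining in a Drive iterator (a FileIterator or FolderIterator).
function countItems(iterator) {
  var count = 0;
  while (iterator.hasNext()) {
    iterator.next();
    count++;
  }
  return count;
}

// Logs Drive statistics using DriveApp.
function readGoogleDriveWithDriveApp() {
  Logger.log("Google Drive Statistics for " + Session.getActiveUser().getEmail());
  Logger.log("All Files in Google Drive: " + countItems(DriveApp.getFiles()));
  Logger.log("All Folders in Google Drive: " + countItems(DriveApp.getFolders()));
  Logger.log("All Spreadsheet Files in Google Drive: " +
             countItems(DriveApp.getFilesByType(MimeType.GOOGLE_SHEETS)));
  Logger.log("All Documents Files in Google Drive: " +
             countItems(DriveApp.getFilesByType(MimeType.GOOGLE_DOCS)));
}
```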

Next we need to click on Run.

The first time, it will ask for authorization to run the script. We need to click on Continue.

Google will show the Permission window and request permission so that the script can pull data from Google services such as Google Drive and Google Account.

Click on Accept, then click on the Run button again. This time the script will run, interact with Google Drive and Google Account, and pull out the Google Drive statistics. To see the result of your Google App Script go to View->Logs.

For more Google App Script visit the Google App Script section.

Thursday, December 5, 2013

Project Management through Pictures
What is Conduct Procurement?

Conduct Procurement is the process in which the Buyer formally initiates the process of obtaining a good or service. It includes advertising the Request for Proposal (RFP), Request for Quotation (RFQ) or tender notice in newspapers or on government websites so that Sellers can see what is needed.

An interested Seller prepares a proposal or bid and submits it to the Buyer. The Buyer evaluates each bid and finally either decides on a Seller or selects a group of Sellers for further negotiation.

Conduct Procurement is part of the Executing Process Group.

Conduct Procurement in pictures:


What is a Project Charter?
What is a Project Management Plan?
What are Organizational Structure?
What is Brainstorming?
PM: Do you know your Project Cost?
PM: How accurate is your Project Cost Estimates?
How to calculate Team utilization index?
How to motivate your Project Team?
PM: Do you know your Communication Channel?
What is XMR Chart?
What is Plan Procurement Management?

Wednesday, December 4, 2013

Project Management through Pictures
What is Plan Procurement Management?

Procurement Management is one of the Knowledge Areas in Project Management. Every organization needs to acquire services or products at some point in the course of its work. The one who buys the service or product is called the "Buyer" and the one who provides the service or product is called the "Seller". This relationship is based on mutual understanding.

Procurement Management has four processes under it.

1. Plan Procurement
2. Conduct Procurement
3. Manage Procurement
4. Close Procurement

Today we will look at Plan Procurement through pictures.

At a high level, the following things happen during the Plan Procurement process.

  • The performing organization and the project team decide whether to "Make" or "Buy" a service, piece of equipment or product. The decision is based on a well thought out process.
  • If the decision is to Buy, the Procurement SOW is prepared by the Project Manager and Project Team in consultation with the Procurement Dept/Procurement Manager.
  • The Procurement SOW is passed to suppliers, who in turn respond to it.

Next: What is Conduct Procurement


