Discussion Forums



Thread: Hadoop+EC2

This question is answered. Helpful answers available: 1. Correct answers available: 1.



Replies: 5 - Pages: 1 - Last Post: Jan 17, 2008 11:39 AM by: Akshay Java
Akshay Java

Posts: 5
Registered: 1/16/08
Hadoop+EC2
Posted: Jan 16, 2008 2:55 PM PST
 

Hi Folks,

I am thinking of using Hadoop+EC2 for running some web crawlers. I am trying to follow the instructions in [1] and set up the basic Hadoop+EC2 example shown in this wiki. However, I get the following errors:

I first run
./bin/hadoop-ec2 launch-cluster
and get the IP address of the master, which I then set in the
bin/hadoop-ec2-env.sh script.

Since I noticed that the script does not update the slaves file, I manually edit it to ensure that the right slave nodes are present.

Now I start hadoop using
./bin/hadoop-ec2 start-hadoop

At this point, when starting the cluster, rsync uses the old IP address present in the env file. So I log in to the master node and edit /usr/local/hadoop-version/conf/hadoop-env.sh to use the correct IP address of the master. The slaves files here contain the correct identifiers.

I tried restarting hadoop once again and the script fails at
"starting namenode, logging to ....."

Restarting the script leads to an error message:
"XXXX.compute-1.amazonaws.com: tasktracker running as process 1692. Stop it first."

I have a feeling that I am missing something important here and would appreciate any help / suggestion in this regard.

Thanks
Akshay

[1] http://wiki.apache.org/lucene-hadoop/AmazonEC2


enomaly

Posts: 444
Registered: 9/3/06
Re: Hadoop+EC2
Posted: Jan 16, 2008 3:25 PM PST   in response to: Akshay Java
 

Use the public AMIs; you'll be up and running in about 10 minutes.

reuven
http://www.enomalylabs.com



Akshay Java

Posts: 5
Registered: 1/16/08
Re: Hadoop+EC2
Posted: Jan 16, 2008 3:48 PM PST   in response to: enomaly
 

Hi reuven
Thanks for your response to my question. As far as I understand, I am using the public AMIs. The following line in launch-hadoop-cluster launches the public AMI:

# Start a cluster
echo "Starting cluster with AMI $AMI_IMAGE"
RUN_INSTANCES_OUTPUT=`ec2-run-instances $AMI_IMAGE -n $NO_INSTANCES -g $GROUP -k $KEY_NAME -d "$NO_INSTANCES,$MASTER_HOST" | grep INSTANCE | awk '{print $2}'`

Thanks
Akshay
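
For readers following along, here is a minimal sketch of what the grep/awk filter at the end of the quoted launch-hadoop-cluster line does: it keeps the INSTANCE lines and prints their second field, the instance ID. The sample text is a made-up, simplified stand-in for ec2-run-instances output (the IDs, group, and key name are hypothetical), and sample_output and extract_instance_ids are illustrative names that do not appear in the Hadoop scripts.

```shell
# Hypothetical, simplified sample of ec2-run-instances output.
# All IDs and names below are invented for illustration.
sample_output() {
  cat <<'EOF'
RESERVATION r-00000000 111122223333 hadoop-group
INSTANCE i-aaaa1111 ami-00000000 pending my-key 0
INSTANCE i-bbbb2222 ami-00000000 pending my-key 1
EOF
}

# Same filter as in the quoted line: keep INSTANCE rows, print field 2.
extract_instance_ids() {
  grep INSTANCE | awk '{print $2}'
}

# Prints one instance ID per line.
sample_output | extract_instance_ids
```

Note that this filter only collects instance IDs; it does not touch the master's host name or IP address.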


Akshay Java

Posts: 5
Registered: 1/16/08
Re: Hadoop+EC2
Posted: Jan 16, 2008 10:51 PM PST   in response to: Akshay Java
 

After going through the scripts again, it seems there is a problem when setting MASTER_HOST in the conf/hadoop-env.sh file on the master. After that, the slaves fail to rsync with the master.

I have also tried to set these values manually, but I still can't get it to rsync.

Any suggestions or tips would be welcome,
Thanks
Akshay


Tom White

Posts: 4
Registered: 7/31/06
Re: Hadoop+EC2
Posted: Jan 17, 2008 7:16 AM PST   in response to: Akshay Java
Helpful

You need to get a DNS name for your master (from DynDNS, for example) and set it in hadoop-ec2-env.sh before running the launch script. On launch you will be prompted to point that DNS name at the IP address that the master has been allocated on EC2.

Tom
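
One way to sanity-check the DNS step before starting Hadoop is to confirm that the name set as MASTER_HOST already resolves to the master's address. The following is only a sketch under stated assumptions: lookup_ipv4 and master_dns_matches are hypothetical helper names (they are not part of the Hadoop EC2 scripts), and the lookup assumes a glibc-style getent is available.

```shell
# Resolve a host name to its first IPv4 address.
# Assumption: a glibc-style getent exists; substitute another
# resolver tool if your system lacks it.
lookup_ipv4() {
  getent ahostsv4 "$1" | awk 'NR == 1 { print $1 }'
}

# Succeed only when the name currently resolves to the expected address.
# Both helper names are hypothetical, not from the Hadoop scripts.
master_dns_matches() {
  name=$1
  expected_ip=$2
  resolved_ip=$(lookup_ipv4 "$name")
  [ -n "$resolved_ip" ] && [ "$resolved_ip" = "$expected_ip" ]
}
```

For example, running master_dns_matches "$MASTER_HOST" 10.1.2.3 || echo "DNS not updated yet" would flag a stale record; 10.1.2.3 is a placeholder for the address EC2 allocated to the master.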


Akshay Java

Posts: 5
Registered: 1/16/08
Re: Hadoop+EC2
Posted: Jan 17, 2008 11:39 AM PST   in response to: Tom White
 

Hi Tom,
Thanks for the help. It works like a charm using DynDNS. Now the next step is to figure out how to get Nutch+EC2 working. This is really cool! Thanks.
Akshay


