Thread: Hadoop+EC2
Replies: 5, last post: Jan 17, 2008 11:39 AM by Akshay Java
Akshay Java (Posts: 5, Registered: 1/16/08)
Hadoop+EC2
Posted: Jan 16, 2008 2:55 PM PST
Hi Folks,
I am thinking of using Hadoop+EC2 to run some web crawlers. I am trying to follow the instructions in [1] and set up the basic Hadoop+EC2 example shown in that wiki page. However, I get the following errors.
I first run
./bin/hadoop-ec2 launch-cluster
and get the IP address of the master, which I then set in the
bin/hadoop-ec2-env.sh script.
Since I noticed that the script does not update the slaves file, I edit it manually to make sure the right slave nodes are listed.
Now I start Hadoop using
./bin/hadoop-ec2 start-hadoop
At this point, when the cluster starts, rsync uses the old IP address still present in the env file. So I log in to the master node and edit /usr/local/hadoop-version/conf/hadoop-env.sh to use the correct IP address of the master. The slaves files there contain the correct identifiers.
I tried restarting Hadoop once again, and the script fails at
"starting namenode, logging to ....."
Running the script again leads to this error message:
"XXXX.compute-1.amazonaws.com: tasktracker running as process 1692. Stop it first."
I have a feeling that I am missing something important here and would appreciate any help or suggestions.
Thanks
Akshay
[1] http://wiki.apache.org/lucene-hadoop/AmazonEC2
enomaly (Posts: 444, Registered: 9/3/06)
Akshay Java (Posts: 5, Registered: 1/16/08)
Re: Hadoop+EC2
Posted: Jan 16, 2008 3:48 PM PST, in response to: enomaly
Hi Reuven,
Thanks for your response to my question. As far as I understand, I am using the public AMIs. The following lines in launch-hadoop-cluster start the cluster from the public AMI:
# Start a cluster
echo "Starting cluster with AMI $AMI_IMAGE"
RUN_INSTANCES_OUTPUT=`ec2-run-instances $AMI_IMAGE -n $NO_INSTANCES -g $GROUP -k $KEY_NAME -d "$NO_INSTANCES,$MASTER_HOST" | grep INSTANCE | awk '{print $2}'`
Thanks
Akshay
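The `-d` option of ec2-run-instances passes a user-data string to every instance, so the quoted line fixes the master's hostname at launch time. The sketch below shows how a boot script on an instance could split that "count,host" string back into two variables. It assumes a simple comma split; the function name parse_user_data is hypothetical, and the actual init script on the Hadoop AMI may work differently.

```shell
#!/bin/sh
# Hypothetical sketch: recover the two values that launch-hadoop-cluster
# packs into the EC2 user data with
#   -d "$NO_INSTANCES,$MASTER_HOST"
# parse_user_data is an illustrative name, not part of Hadoop.
parse_user_data() {
  # Split a "count,host" string on the comma into two shell variables.
  IFS=, read -r NO_INSTANCES MASTER_HOST <<EOF
$1
EOF
}

# On a running instance, the raw string would come from the EC2 instance
# metadata service, for example:
#   parse_user_data "$(curl -s http://169.254.169.254/latest/user-data)"
```

If the instances do read MASTER_HOST this way, editing hadoop-env.sh on the master after launch would not change the value the slaves received. That would be consistent with the rsync failures described in the next post.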
Akshay Java (Posts: 5, Registered: 1/16/08)
Re: Hadoop+EC2
Posted: Jan 16, 2008 10:51 PM PST, in response to: Akshay Java
After going through the scripts again, it seems there is a problem with how MASTER_HOST is set in the conf/hadoop-env.sh file on the master. As a result, the slaves fail to rsync with the master.
I have also tried setting these values manually, but I still can't get rsync to work.
Any suggestions or tips would be welcome,
Thanks
Akshay
Tom White (Posts: 4, Registered: 7/31/06)
Re: Hadoop+EC2
Posted: Jan 17, 2008 7:16 AM PST, in response to: Akshay Java
You need to get a DNS name for your master (from DynDNS, for example) and set it in hadoop-ec2-env.sh before running the launch script. On launch, you will be prompted to point the DNS name at the IP address that the master has been allocated on EC2.
Tom
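Tom's advice amounts to putting a stable hostname into hadoop-ec2-env.sh before launch-cluster runs, so nothing has to be edited on the master afterwards. The sketch below shows one way to script that step. It assumes the env file contains a plain `MASTER_HOST=...` line (the variable the launch script passes to ec2-run-instances). The helper set_master_host and the hostname myhadoopmaster.dyndns.org are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the MASTER_HOST assignment in an env file so
# that it names a stable DNS hostname rather than an EC2 IP address that
# changes every time the cluster is launched.
# Usage: set_master_host <env-file> <dns-name>
set_master_host() {
  env_file=$1
  dns_name=$2
  # Replace any line starting with "MASTER_HOST=" in place; "-i.bak" is
  # accepted by both GNU and BSD sed and leaves a backup copy of the file.
  sed -i.bak "s/^MASTER_HOST=.*/MASTER_HOST=$dns_name/" "$env_file"
}

# Example (run before launching; the hostname is hypothetical):
#   set_master_host bin/hadoop-ec2-env.sh myhadoopmaster.dyndns.org
#   ./bin/hadoop-ec2 launch-cluster
```

The DNS record itself still has to be updated to the master's EC2 address when the launch script prompts for it.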
Akshay Java (Posts: 5, Registered: 1/16/08)
Re: Hadoop+EC2
Posted: Jan 17, 2008 11:39 AM PST, in response to: Tom White
Hi Tom,
Thanks for the help. It works like a charm using DynDNS. The next step is to figure out how to get Nutch running on EC2. This is really cool! Thanks!
Akshay