
Commit e6b4f4b
Improved Documentation for Hadoop
1 parent 2e59951 commit e6b4f4b

2 files changed: +44 -17 lines

hadoop/README.md

Lines changed: 37 additions & 7 deletions
@@ -215,6 +215,36 @@ In order to run Hadoop in a pseudo-distributed fashion, we need to enable passwo
 <ol>
 <li>In the terminal, execute <code>ssh localhost</code> to test if you can open a <a href="https://en.wikipedia.org/wiki/Secure&#95;Shell">secure shell</a> connection to your current, local computer <a href="http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#Setup&#95;passphraseless&#95;ssh">without needing a password</a>.
 </li>
+<li>It may say something like:
+<pre>ssh: connect to host localhost port 22: Connection refused</pre>
+If it does say this, then do
+<pre>sudo apt-get install ssh</pre>
+and it may say something like
+<pre>
+Reading package lists... Done
+Building dependency tree
+Reading state information... Done
+The following extra packages will be installed:
+  libck-connector0 ncurses-term openssh-server openssh-sftp-server
+  ssh-import-id
+Suggested packages:
+  rssh molly-guard monkeysphere
+The following NEW packages will be installed:
+  libck-connector0 ncurses-term openssh-server openssh-sftp-server ssh
+  ssh-import-id
+0 upgraded, 6 newly installed, 0 to remove and 0 not upgraded.
+Need to get 661 kB of archives.
+After this operation, 3,528 kB of additional disk space will be used.
+Do you want to continue? [Y/n] y
+...
+Setting up ssh-import-id (4.5-0ubuntu1) ...
+Processing triggers for ufw (0.34-2) ...
+Setting up ssh (1:6.9p1-2ubuntu0.2) ...
+Processing triggers for libc-bin (2.21-0ubuntu4.1) ...
+Processing triggers for systemd (225-1ubuntu9.1) ...
+Processing triggers for ureadahead (0.100.0-19) ...
+</pre>
+OK, now you've got SSH installed. Do <code>ssh localhost</code> again.</li>
 <li>It may ask you something like
 <pre>
 The authenticity of host 'localhost (127.0.0.1)' can't be established.
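
A side note on this hunk: if <code>ssh localhost</code> still prompts for a password after installing SSH, the usual remedy from the Hadoop single-node setup guide is to create a passphraseless key and authorize it. A minimal sketch, assuming the default <code>~/.ssh</code> location:
<pre>
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
</pre>
After this, <code>ssh localhost</code> should connect without asking for a password.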
@@ -245,10 +275,10 @@ Are you sure you want to continue connecting (yes/no)?
 </pre>
 which you would answer with <code>yes</code> followed by hitting the enter key. If, after that, you get a message like <code>0.0.0.0: packet&#95;write&#95;wait: Connection to 127.0.0.1: Broken pipe</code>, enter <code>sbin/stop-dfs.sh</code>, hit return, and do <code>sbin/start-dfs.sh</code> again.</li>
 <li>In your web browser, open <code>http://localhost:50070/</code>. It should display a web page giving an overview of the Hadoop system now running on your local computer.</li>
-<li>Now we can setup the required stuff for the example jobs (making HDFS directories and copying the input files). Make sure to replace <code><userName></code> with your user/login name on your current machine.
+<li>Now we can set up the required stuff for the example jobs (making HDFS directories and copying the input files). Make sure to replace <code>&lt;userName&gt;</code> with your user/login name on your current machine.
 <pre>
 bin/hdfs dfs -mkdir /user
-bin/hdfs dfs -mkdir /user/<userName>
+bin/hdfs dfs -mkdir /user/&lt;userName&gt;
 bin/hdfs dfs -put etc/hadoop input
 </pre></li>
 <li>We can now run the job via
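
A quick check worth adding here: before launching a job, it can save a confusing failure later to verify that the input really arrived in HDFS, for example:
<pre>
bin/hdfs dfs -ls /user/&lt;userName&gt;
bin/hdfs dfs -ls input
</pre>
The second listing should show the files copied from <code>etc/hadoop</code>.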
@@ -269,18 +299,18 @@ cat output/*
 We now want to run one of the provided examples. Let us assume we want to run the <code>wordCount</code> example. For other examples, just replace <code>wordCount</code> with their names in the following text. I assume that the <code>distributedComputingExamples</code> repository is located in a folder <code>Y</code> on your machine.
 <ol>
 <li>Open a terminal and enter your Hadoop installation folder. I assume you installed Hadoop version <code>2.7.2</code> into a folder named <code>X</code>, so you would <code>cd</code> into <code>X/hadoop-2.7.2/</code>.</li>
-<li>We want to start with a "clean" file system, so let us repeat some of the setup steps. Don't forget to replace <code><userName></code> with your local login/user name.
+<li>We want to start with a "clean" file system, so let us repeat some of the setup steps. Don't forget to replace <code>&lt;userName&gt;</code> with your local login/user name.
 <pre>
 bin/hdfs namenode -format
 </pre>
 (answer with <code>Y</code> when asked whether to re-format the file system)
 <pre>
 sbin/start-dfs.sh
 bin/hdfs dfs -mkdir /user
-bin/hdfs dfs -mkdir /user/<userName>
+bin/hdfs dfs -mkdir /user/&lt;userName&gt;
 </pre>
 If you actually properly cleaned up the file system after running your last examples (see the second-to-last step here), you just need to do <code>sbin/start-dfs.sh</code> and do not need to format the HDFS.</li>
-<li>Copy the input data of the example into HDFS. You find this data in the example folder <code>Y/distributedComputingExamples/wordCount/input</code>. So you will perform <code>bin/hdfs dfs -put Y/distributedComputingExamples/hadoop/wordCount/input input</code>. Make sure to replace <code>Y</code> with the proper path. If copying fails, go to "2.6. Troubleshooting".</li>
+<li>Copy the input data of the example into HDFS. You will find this data in the example folder <code>Y/distributedComputingExamples/hadoop/wordCount/input</code>. So you will perform <code>bin/hdfs dfs -put Y/distributedComputingExamples/hadoop/wordCount/input input</code>. Make sure to replace <code>Y</code> with the proper path. If copying fails, go to "2.6. Troubleshooting".</li>
 <li>Do <code>bin/hdfs dfs -ls input</code> to check if the files have properly been copied.</li>
 <li>You can now do <code>bin/hadoop jar Y/distributedComputingExamples/hadoop/wordCount/target/wordCount-full.jar input output</code>. This command will start the main class of the example, which resides in the fat jar <code>wordCount-full.jar</code>, with the parameters <code>input</code> and <code>output</code>. <code>input</code> here is the input folder, which we have previously copied to the Hadoop file system. <code>output</code> is the output folder to be created. If you execute this command, you will see lots of logging information.</li>
 <li>Do <code>bin/hdfs dfs -ls output</code>. You will see output like
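
One aside on inspecting results: besides copying them out of HDFS, they can also be viewed in place. Assuming the job wrote its results to <code>output</code>, something like
<pre>
bin/hdfs dfs -cat output/*
</pre>
should print the word counts directly to the terminal.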
@@ -332,13 +362,13 @@ Sometimes, you may try to copy some file or folder to HDFS and get an error that
 
 <ol>
 <li>Execute <code>sbin/stop-dfs.sh</code></li>
-<li>Delete the folder <code>/tmp/hadoop-<userName></code>, where <code><userName></code> is to replaced with your local login/user name.</li>
+<li>Delete the folder <code>/tmp/hadoop-&lt;userName&gt;</code>, where <code>&lt;userName&gt;</code> is to be replaced with your local login/user name.</li>
 <li>Now perform
 <pre>
 bin/hdfs namenode -format
 sbin/start-dfs.sh
 bin/hdfs dfs -mkdir /user
-bin/hdfs dfs -mkdir /user/<userName>
+bin/hdfs dfs -mkdir /user/&lt;userName&gt;
 </pre>
 </li><li>
 If you now repeat the operation that failed before, it should succeed.
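
The "delete the folder" step in this list is given without a command; on a Linux machine the whole reset would typically look like the sketch below. Double-check the path before running <code>rm -rf</code>, and replace <code>&lt;userName&gt;</code> as usual:
<pre>
sbin/stop-dfs.sh
rm -rf /tmp/hadoop-&lt;userName&gt;
bin/hdfs namenode -format
sbin/start-dfs.sh
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/&lt;userName&gt;
</pre>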

hadoop/webFinder/src/main/java/webFinder/WebFinderDriver.java

Lines changed: 7 additions & 10 deletions
@@ -29,18 +29,18 @@ public static void main(final String[] args) throws Exception {
 
   @Override
   public int run(final String[] args) throws Exception {
+    final Configuration conf;
+    final Job job;
 
-    final Configuration conf = new Configuration();
-    final Job job = Job.getInstance(conf, "Your job name");
+    conf = new Configuration();
+    job = Job.getInstance(conf, "Your job name");
 
     job.setJarByClass(WebFinderDriver.class);
 
     if (args.length < 2) {
       return 1;
     }
-
-    if (args.length > 2) {// set max depth
-      // pass parameter to mapper
+    if (args.length > 2) {// set max depth and pass parameter to mapper
       conf.setInt("maxDepth", Integer.parseInt(args[2]));
     }
 
@@ -56,11 +56,8 @@ public int run(final String[] args) throws Exception {
     job.setInputFormatClass(TextInputFormat.class);
     job.setOutputFormatClass(TextOutputFormat.class);
 
-    final Path filePath = new Path(args[0]);
-    FileInputFormat.setInputPaths(job, filePath);
-
-    final Path outputPath = new Path(args[1]);
-    FileOutputFormat.setOutputPath(job, outputPath);
+    FileInputFormat.setInputPaths(job, new Path(args[0]));
+    FileOutputFormat.setOutputPath(job, new Path(args[1]));
 
     job.waitForCompletion(true);
     return 0;
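
One caveat worth noting about the refactored code above, which this commit keeps from the original: <code>Job.getInstance(conf, ...)</code> creates the job with a <em>copy</em> of the passed-in <code>Configuration</code>, so the later <code>conf.setInt("maxDepth", ...)</code> may never reach the mappers. A sketch of a safer variant, reusing the imports already present in <code>WebFinderDriver</code> and setting the parameter on the configuration the job actually uses:
<pre>
// write "maxDepth" into the job's own configuration, because
// Job.getInstance(conf, ...) internally works on a copy of conf
final Configuration conf = new Configuration();
final Job job = Job.getInstance(conf, "Your job name");
if (args.length > 2) { // set max depth and pass parameter to mapper
  job.getConfiguration().setInt("maxDepth", Integer.parseInt(args[2]));
}
</pre>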
