Discovering the jobs
How do you use the jobs already bundled with Hadoop?
Once Hadoop is installed and running, you can simply launch the bundled examples, packaged as hadoop-mapreduce-examples-{version}.jar in the /usr/lib/hadoop-mapreduce directory, with the following command:
sudo -u USER hadoop jar /usr/lib/hadoop-mapreduce/JAR ARGUMENT1 ARGUMENT2 ARGUMENT3
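For instance, running the jar without a program name should print the list of available example programs (pi, wordcount, wordmean, sudoku, and so on); USER and the exact jar name depend on your installation:
sudo -u USER hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-{version}.jar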
First job: computing pi
This first MapReduce job estimates the value of pi and does not require any input files.
The program uses a Monte Carlo statistical method to compute the estimate.[1]
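To illustrate the principle, here is a minimal single-process sketch of Monte Carlo pi estimation: random points are drawn in the unit square and the fraction falling inside the quarter circle approximates pi/4. The class name PiEstimateSketch is made up for this illustration; the actual Hadoop example spreads the sampling over the requested number of map tasks and aggregates the counts in a reducer, so this is not its code.

import java.util.Random;

// Minimal, single-process sketch of Monte Carlo pi estimation.
// The real Hadoop example distributes the sampling across map tasks.
public class PiEstimateSketch {
    public static void main(String[] args) {
        final long samples = 1_000_000L;
        long inside = 0;
        Random random = new Random();
        for (long i = 0; i < samples; i++) {
            double x = random.nextDouble();   // random point in the unit square
            double y = random.nextDouble();
            if (x * x + y * y <= 1.0) {       // falls inside the quarter circle?
                inside++;
            }
        }
        // Ratio of areas (quarter circle / unit square) is pi / 4.
        System.out.println("Estimated value of Pi is " + 4.0 * inside / samples);
    }
}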
Question
Run hadoop-mapreduce-examples-{version} with the argument pi, the desired number of maps (2) and the desired number of samples per map (5).
Syntax: Command
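Following the template above (USER and the exact jar name depend on your installation; 2 is the number of maps and 5 the number of samples per map), the command would be:
sudo -u USER hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-{version}.jar pi 2 5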
Simulation: Result
Number of Maps = 2
Samples per Map = 5
Wrote input for Map #0
Wrote input for Map #1
Starting Job
...
14/01/29 23:35:46 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
14/01/29 23:35:46 INFO mapreduce.Job: Running job: job_local1171327146_0001
14/01/29 23:35:47 INFO mapreduce.Job: Job job_local1171327146_0001 running in uber mode : false
14/01/29 23:35:48 INFO mapreduce.Job: map 0% reduce 0%
14/01/29 23:35:49 INFO mapreduce.Job: map 100% reduce 0%
14/01/29 23:35:51 INFO mapreduce.Job: map 100% reduce 100%
14/01/29 23:35:51 INFO mapreduce.Job: Job job_local1171327146_0001 completed successfully
14/01/29 23:35:51 INFO mapreduce.Job: Counters: 32
File System Counters
FILE: Number of bytes read=812626
FILE: Number of bytes written=1408734
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=590
HDFS: Number of bytes written=923
HDFS: Number of read operations=33
HDFS: Number of large read operations=0
HDFS: Number of write operations=15
Map-Reduce Framework
Map input records=2
Map output records=4
Map output bytes=36
Map output materialized bytes=56
Input split bytes=294
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=0
Reduce input records=4
Reduce output records=0
Spilled Records=8
Shuffled Maps =0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=128
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=435634176
File Input Format Counters
Bytes Read=236
File Output Format Counters
Bytes Written=97
Job Finished in 10.98 seconds
Estimated value of Pi is 3.60000000000000000000
Question
Add the file NbMots.txt to the file system (an example upload command is shown after the sentence below).
Run hadoop-mapreduce-examples-{version} with the argument wordcount, the name of the input file (NbMots.txt) and the output directory (out).
This file contains the following sentence:
Tout ce qui est rare est cher , un cheval bon marché est rare , donc un cheval bon marché est cher .
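To copy the file into HDFS, the standard hdfs dfs -put command can be used, assuming the same installation path as the commands below (the destination here is the connected user's HDFS home directory):
/usr/local/hadoop/bin/hdfs dfs -put NbMots.txt NbMots.txt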
Syntax: Command
/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount NbMots.txt out
To view the output file:
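A possible command, assuming the result was written to the default part file inside out:
/usr/local/hadoop/bin/hdfs dfs -cat out/part-r-00000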
Simulation: Results
In the console you get:
[...]
17/01/15 16:23:16 INFO mapreduce.Job: Counters: 35
File System Counters
FILE: Number of bytes read=592280
FILE: Number of bytes written=1166453
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=206
HDFS: Number of bytes written=83
HDFS: Number of read operations=13
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Map-Reduce Framework
Map input records=1
Map output records=23
Map output bytes=195
Map output materialized bytes=141
Input split bytes=109
Combine input records=23
Combine output records=13
Reduce input groups=13
Reduce shuffle bytes=141
Reduce input records=13
Reduce output records=13
Spilled Records=26
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=58
Total committed heap usage (bytes)=335683584
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=103
File Output Format Counters
Bytes Written=83
In the output you get:
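Reconstructed from the input sentence (and consistent with the 13 reduce output records and 83 output bytes reported above), the contents of out should look like this, one tab-separated token/count pair per line:
,	2
.	1
Tout	1
bon	2
ce	1
cher	2
cheval	2
donc	1
est	4
marché	2
qui	1
rare	2
un	2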
To delete the output directory, run the following command:
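For example, assuming the output directory is named out as above:
/usr/local/hadoop/bin/hdfs dfs -rm -r out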
Question
Run hadoop-mapreduce-examples-{version} with the argument wordmean, the name of the input file (NbMots.txt) and the output directory (out).
Syntax: Command
/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordmean NbMots.txt out
To view the output file:
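As before, assuming the default part file name:
/usr/local/hadoop/bin/hdfs dfs -cat out/part-r-00000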
Simulation: Results
In the console you get:
[...]
17/01/15 16:27:34 INFO mapreduce.Job: Counters: 35
File System Counters
FILE: Number of bytes read=592076
FILE: Number of bytes written=1166147
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=206
HDFS: Number of bytes written=19
HDFS: Number of read operations=13
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Map-Reduce Framework
Map input records=1
Map output records=46
Map output bytes=667
Map output materialized bytes=39
Input split bytes=109
Combine input records=46
Combine output records=2
Reduce input groups=2
Reduce shuffle bytes=39
Reduce input records=2
Reduce output records=2
Spilled Records=4
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=51
Total committed heap usage (bytes)=335683584
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=103
File Output Format Counters
Bytes Written=19
The mean is: 3.391304347826087
In the output you get:
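Reconstructed from the input sentence (23 tokens totalling 78 characters, which matches the reported mean 3.391304347826087 = 78/23 and the 19 output bytes), the out directory should contain two tab-separated lines:
count	23
length	78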
Question

Add the file sudoku.dta to the file system.
Run hadoop-mapreduce-examples-{version} with the argument sudoku and the name of the input file (sudoku.dta).
Syntax: Command
/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar sudoku sudoku.dta
Simulation: Result
In the console, you get the solved sudoku:
Solving sudoku.dta
8 5 1 3 9 2 6 4 7
4 3 2 6 7 8 1 9 5
7 9 6 5 1 4 3 8 2
6 1 4 8 2 3 7 5 9
5 7 8 9 6 1 4 2 3
3 2 9 4 5 7 8 1 6
9 4 7 2 8 6 5 3 1
1 8 5 7 3 9 2 6 4
2 6 3 1 4 5 9 7 8
Found 1 solutions