Discovering the example jobs
How do we use the jobs that ship with Hadoop?
Once Hadoop is installed and running, simply run the bundled examples, packaged as hadoop-mapreduce-examples-{version}.jar in the /usr/lib/hadoop-mapreduce directory, using the command:
sudo -u USER hadoop jar /usr/lib/hadoop-mapreduce/JAR ARGUMENT1 ARGUMENT2 ARGUMENT3
First job: computing pi
This first MapReduce job estimates the value of pi and requires no input files.
The program uses a Monte Carlo statistical method to compute the estimate.[1]
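The statistical idea can be sketched outside Hadoop. The Python snippet below is a simplified illustration under one stated assumption: it draws pseudo-random points, whereas the Hadoop example actually uses a quasi-Monte Carlo scheme (Halton sequences) to place its samples. The principle is the same: points are drawn in the unit square, and the fraction landing inside the quarter circle approximates pi/4.

```python
import random

def estimate_pi(samples, seed=0):
    """Estimate pi by drawing random points in the unit square
    and counting those falling inside the quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle covers pi/4 of the unit square's area.
    return 4.0 * inside / samples

print(estimate_pi(100_000))
```

This also explains the coarse result below: with only 2 maps of 5 samples each, the job averages 10 points in total, so an estimate like 3.6 is expected; precision improves as the sample count grows.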
Question
Run hadoop-mapreduce-examples-{version} with the argument pi, the desired number of maps (2), and the desired number of samples per map (5).
Syntax: Command
sudo -u hdfs hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 2 5
Simulation: Result
Number of Maps = 2
Samples per Map = 5
Wrote input for Map #0
Wrote input for Map #1
Starting Job
...
14/01/29 23:35:46 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
14/01/29 23:35:46 INFO mapreduce.Job: Running job: job_local1171327146_0001
14/01/29 23:35:47 INFO mapreduce.Job: Job job_local1171327146_0001 running in uber mode : false
14/01/29 23:35:48 INFO mapreduce.Job: map 0% reduce 0%
14/01/29 23:35:49 INFO mapreduce.Job: map 100% reduce 0%
14/01/29 23:35:51 INFO mapreduce.Job: map 100% reduce 100%
14/01/29 23:35:51 INFO mapreduce.Job: Job job_local1171327146_0001 completed successfully
14/01/29 23:35:51 INFO mapreduce.Job: Counters: 32
File System Counters
FILE: Number of bytes read=812626
FILE: Number of bytes written=1408734
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=590
HDFS: Number of bytes written=923
HDFS: Number of read operations=33
HDFS: Number of large read operations=0
HDFS: Number of write operations=15
Map-Reduce Framework
Map input records=2
Map output records=4
Map output bytes=36
Map output materialized bytes=56
Input split bytes=294
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=0
Reduce input records=4
Reduce output records=0
Spilled Records=8
Shuffled Maps =0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=128
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=435634176
File Input Format Counters
Bytes Read=236
File Output Format Counters
Bytes Written=97
Job Finished in 10.98 seconds
Estimated value of Pi is 3.60000000000000000000
Question
Add the file NbMots.txt to the file system.
Run hadoop-mapreduce-examples-{version} with the argument wordcount, the input file name (NbMots.txt), and the output directory (out).
This file contains the following sentence:
Tout ce qui est rare est cher , un cheval bon marché est rare , donc un cheval bon marché est cher .
To add it:
/usr/local/hadoop/bin/hadoop fs -copyFromLocal NbMots.txt
/usr/local/hadoop/bin/hadoop fs -ls
Syntax: Command
/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount NbMots.txt out
To view the output file:
/usr/local/hadoop/bin/hadoop fs -cat /user/hduser/out/part-r-00000
Simulation: Results
In the console, we get:
[...]
17/01/15 16:23:16 INFO mapreduce.Job: Counters: 35
File System Counters
FILE: Number of bytes read=592280
FILE: Number of bytes written=1166453
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=206
HDFS: Number of bytes written=83
HDFS: Number of read operations=13
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Map-Reduce Framework
Map input records=1
Map output records=23
Map output bytes=195
Map output materialized bytes=141
Input split bytes=109
Combine input records=23
Combine output records=13
Reduce input groups=13
Reduce shuffle bytes=141
Reduce input records=13
Reduce output records=13
Spilled Records=26
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=58
Total committed heap usage (bytes)=335683584
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=103
File Output Format Counters
Bytes Written=83
In the output, we get:
, 2
. 1
Tout 1
bon 2
ce 1
cher 2
cheval 2
donc 1
est 4
marché 2
qui 1
rare 2
un 2
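The counts above can be reproduced locally. Conceptually, the wordcount example splits each input line on whitespace in the map phase and sums the occurrences of each token in the combine and reduce phases, which is why the comma and the period show up as "words". A minimal Python equivalent of that logic:

```python
from collections import Counter

sentence = ("Tout ce qui est rare est cher , un cheval bon marché est rare , "
            "donc un cheval bon marché est cher .")

# Map: emit (token, 1) for each whitespace-separated token;
# Combine/Reduce: sum the counts per token.
counts = Counter(sentence.split())

for word in sorted(counts):
    print(word, counts[word])
```

This matches the job's counters as well: 23 map output records (tokens) reduced to 13 distinct words.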
To delete the output directory, run the command:
/usr/local/hadoop/bin/hadoop fs -rm -r /user/hduser/out
Question
Run hadoop-mapreduce-examples-{version} with the argument wordmean, the input file name (NbMots.txt), and the output directory (out).
Syntax: Command
/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordmean NbMots.txt out
To view the output file:
/usr/local/hadoop/bin/hadoop fs -cat /user/hduser/out/part-r-00000
Simulation: Results
In the console, we get:
[...]
17/01/15 16:27:34 INFO mapreduce.Job: Counters: 35
File System Counters
FILE: Number of bytes read=592076
FILE: Number of bytes written=1166147
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=206
HDFS: Number of bytes written=19
HDFS: Number of read operations=13
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Map-Reduce Framework
Map input records=1
Map output records=46
Map output bytes=667
Map output materialized bytes=39
Input split bytes=109
Combine input records=46
Combine output records=2
Reduce input groups=2
Reduce shuffle bytes=39
Reduce input records=2
Reduce output records=2
Spilled Records=4
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=51
Total committed heap usage (bytes)=335683584
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=103
File Output Format Counters
Bytes Written=19
The mean is: 3.391304347826087
In the output, we get:
count 23
length 78
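These two counters are all wordmean needs: the mean word length is the total length divided by the token count. A quick check in Python, using the same whitespace tokenization as the example:

```python
tokens = ("Tout ce qui est rare est cher , un cheval bon marché est rare , "
          "donc un cheval bon marché est cher .").split()

count = len(tokens)                   # 23 tokens
length = sum(len(t) for t in tokens)  # 78 characters in total
print(count, length, length / count)  # 78 / 23 reproduces the job's mean
```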
Question
Add the file sudoku.dta to the file system.
Run hadoop-mapreduce-examples-{version} with the argument sudoku and the input file name (sudoku.dta).
To add it:
/usr/local/hadoop/bin/hadoop fs -copyFromLocal sudoku.dta
/usr/local/hadoop/bin/hadoop fs -ls
Syntax: Command
/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar sudoku sudoku.dta
Simulation: Result
In the console, we get the solved sudoku:
Solving sudoku.dta
8 5 1 3 9 2 6 4 7
4 3 2 6 7 8 1 9 5
7 9 6 5 1 4 3 8 2
6 1 4 8 2 3 7 5 9
5 7 8 9 6 1 4 2 3
3 2 9 4 5 7 8 1 6
9 4 7 2 8 6 5 3 1
1 8 5 7 3 9 2 6 4
2 6 3 1 4 5 9 7 8
Found 1 solutions
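The printed grid can be checked mechanically: a valid solution contains each digit 1–9 exactly once in every row, every column, and every 3×3 box. The short verification below is a hypothetical helper for this tutorial, not part of the Hadoop example:

```python
# The solved grid printed by the sudoku example job.
grid = [
    [8, 5, 1, 3, 9, 2, 6, 4, 7],
    [4, 3, 2, 6, 7, 8, 1, 9, 5],
    [7, 9, 6, 5, 1, 4, 3, 8, 2],
    [6, 1, 4, 8, 2, 3, 7, 5, 9],
    [5, 7, 8, 9, 6, 1, 4, 2, 3],
    [3, 2, 9, 4, 5, 7, 8, 1, 6],
    [9, 4, 7, 2, 8, 6, 5, 3, 1],
    [1, 8, 5, 7, 3, 9, 2, 6, 4],
    [2, 6, 3, 1, 4, 5, 9, 7, 8],
]

def is_valid(g):
    """Check that every row, column and 3x3 box contains the digits 1..9."""
    digits = set(range(1, 10))
    rows = all(set(row) == digits for row in g)
    cols = all({g[r][c] for r in range(9)} == digits for c in range(9))
    boxes = all(
        {g[r + i][c + j] for i in range(3) for j in range(3)} == digits
        for r in (0, 3, 6) for c in (0, 3, 6)
    )
    return rows and cols and boxes

print(is_valid(grid))
```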