Spark RDD map() applies a transformation function to each element of the source RDD and returns a new RDD with the transformed elements. The function passed to map() is the transformation applied to each of the elements of the source RDD.

Examples

Java Example 1 – Spark RDD Map Example

In this example, we will create an RDD with some integers. We shall then call the map() function on this RDD to map the integer items to their logarithmic values. The items in the input RDD are of type Integer, and the output for each item would be of type Double.

RDDmapExample.java

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RDDmapExample {
    public static void main(String[] args) {
        // configure Spark and create a Java Spark context
        SparkConf sparkConf = new SparkConf().setAppName("Map Numbers to their Log Values");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        // create an RDD with some integers
        JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 10, 100, 1000));

        // map each integer item to its logarithmic value
        JavaRDD<Double> log_values = numbers.map(x -> Math.log(x));

        System.out.println(log_values.collect());
        sc.stop();
    }
}

Run the above Java example, and you would get the following output in the console.

17/11/28 16:25:22 INFO DAGScheduler: ResultStage 0 (collect at RDDmapExample.java:24) finished in 0.568 s
17/11/28 16:25:22 INFO DAGScheduler: Job 0 finished: collect at RDDmapExample.java:24, took 0.852748 s
17/11/28 16:25:22 INFO SparkContext: Invoking stop() from shutdown hook

Java Example 2 – Spark RDD.map()

In this example, we will map an RDD of Strings to an RDD of Integers, with each element in the mapped RDD representing the number of words in the corresponding element of the input RDD.

Following is the input text file we used.

data/rdd/input/sample.txt

Welcome to TutorialKart

RDDmapExample2.java

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RDDmapExample2 {
    public static void main(String[] args) {
        // configure Spark and create a Java Spark context
        SparkConf sparkConf = new SparkConf().setAppName("Read Text to RDD");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        // read the input text file to an RDD of lines
        String path = "data/rdd/input/sample.txt";
        JavaRDD<String> lines = sc.textFile(path);

        // map each line to the number of words in the line
        JavaRDD<Integer> n_words = lines.map(x -> x.split(" ").length);

        System.out.println(n_words.collect());
        sc.stop();
    }
}

Run this Spark Application and you would get the following output in the console.

17/11/28 16:31:11 INFO DAGScheduler: ResultStage 0 (collect at RDDmapExample2.java:23) finished in 0.373 s
17/11/28 16:31:11 INFO DAGScheduler: Job 0 finished: collect at RDDmapExample2.java:23, took 1.067919 s
17/11/28 16:31:11 INFO SparkContext: Invoking stop() from shutdown hook

We have successfully created a new RDD with the strings transformed to the number of words in them.

Python Example 1 – Spark RDD.map()

In this example, we will map integers in an RDD to their logarithmic values using Python.

spark-rdd-map-example.py

import sys, math
from pyspark import SparkContext, SparkConf

if __name__ == "__main__":

    # create Spark context with Spark configuration
    conf = SparkConf().setAppName("Map Numbers to their Log Values - Python")
    sc = SparkContext(conf=conf)

    # create an RDD with some integers
    numbers = sc.parallelize([1, 10, 100, 1000])

    # map each number to its base-10 logarithmic value
    log_values = numbers.map(lambda n : math.log10(n))

    print(log_values.collect())

Run the following command to submit this Python program to run as a Spark Application.

$ spark-submit spark-rdd-map-example.py

Python Example 2 – Spark RDD.map()

In this example, we will map sentences to the number of words in the sentence.

spark-rdd-map-example-2.py

import sys
from pyspark import SparkContext, SparkConf

if __name__ == "__main__":

    # create Spark context with Spark configuration
    conf = SparkConf().setAppName("Read Text to RDD - Python")
    sc = SparkContext(conf=conf)

    # read input text file to RDD
    lines = sc.textFile("/home/arjun/workspace/spark/sample.txt")

    # map each line to the number of words in the line
    n_words = lines.map(lambda line : len(line.split()))

    print(n_words.collect())

Run the above python program using the following spark-submit command.

$ spark-submit spark-rdd-map-example-2.py

Following is the output of this Python Application in the console.

17/11/28 19:40:42 INFO DAGScheduler: ResultStage 0 (collect at /home/arjun/workspace/spark/spark-rdd-map-example-2.py:18) finished in 1.253 s
17/11/28 19:40:42 INFO DAGScheduler: Job 0 finished: collect at /home/arjun/workspace/spark/spark-rdd-map-example-2.py:18, took 1.945158 s
17/11/28 19:40:42 INFO SparkContext: Invoking stop() from shutdown hook
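Because map() is a purely element-wise transformation, you can preview what the word-count and logarithm mappings compute using plain Python, without a Spark cluster. The sketch below is only an illustration of the per-element logic; the sample sentences are assumed for the example and are not the tutorial's actual input file.

import math

# illustrative input lines (assumed, not the tutorial's actual sample.txt)
lines = ["Welcome to TutorialKart", "Learn Apache Spark", "Spark RDD map transformation"]

# what the word-count example's map() computes for each line
n_words = [len(line.split()) for line in lines]
print(n_words)  # [3, 3, 4]

# what the logarithm example's map() computes for each number
numbers = [1, 10, 100, 1000]
log_values = [math.log10(n) for n in numbers]
print(log_values)

The difference in Spark is only that the lambda is shipped to the executors and applied to each partition of the distributed RDD, rather than being evaluated in the driver as it is here.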