Spark Syntax: Initializing SparkContext and SparkSession, and RDD Operations
Reference: https://link.springer.com/chapter/10.1007/978-1-4842-9380-5_1

Install the PySpark and findspark packages:

```bash
pip install pyspark
pip install findspark
```

Import the packages and initialize the SparkContext and SparkSession:

```python
import findspark
findspark.init()

# PySpark is the Spark API for Python. In this lab, we use PySpark to
# initialize the Spark context.
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

# Creating a Spark context
# (an explicit local master makes this runnable outside spark-submit)
sc = SparkContext.getOrCreate(SparkConf().setMaster("local[*]"))

# Creating a Spark session (reuses the existing SparkContext)
spark = SparkSession \
    .builder \
    .appName("Python Spark DataFrames basic example") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()

if 'spark' in locals() and isinstance(spark, SparkSession):
    print("SparkSession is active and ready to use.")
else:
    print("SparkSession is not active. Please create a SparkSession.")
```

Create an RDD by calling `sc.parallelize()`. Here the RDD holds the integers from 1 to 29 (note that `range(1, 30)` stops before 30); the operations on it are sketched after the snippet below.

```python
data = range(1, 30)
# print...
```
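The original snippet is cut off after the `# print...` comment, so the following is a minimal sketch of how the example typically continues, assuming the local `sc` created above. The variable names `rdd` and `subRDD` and the partition count of 4 are illustrative choices, not from the source; the calls themselves (`parallelize`, `count`, `take`, `map`, `collect`) are standard PySpark RDD API.

```python
# Distribute the data into an RDD with 4 partitions (count chosen arbitrarily).
rdd = sc.parallelize(data, 4)

# Actions run the computation and return results to the driver.
print(rdd.count())    # 29 elements; range(1, 30) excludes 30
print(rdd.take(5))    # [1, 2, 3, 4, 5]

# Transformations are lazy: map() builds a new RDD but computes nothing
# until an action such as collect() is called.
subRDD = rdd.map(lambda x: x - 1)
print(subRDD.collect()[:5])  # [0, 1, 2, 3, 4]

sc.stop()  # release local Spark resources when finished
```

Because transformations are lazy, chaining several `map()` or `filter()` calls costs nothing by itself; Spark only runs a job when the first action is triggered.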