# Spark SQL: RDD, DataFrame, DataSet

太过爱你忘了你带给我的痛 · 2022-02-27 11:28

# 1. Basic concepts #

* Spark 1.5 and earlier

<table>
<thead>
<tr>
<th>Scope</th>
<th align="left">Class</th>
<th>Created from</th>
</tr>
</thead>
<tbody>
<tr>
<td>spark core</td>
<td align="left">RDD: wraps basic values (Int, tuple, …)</td>
<td>sparkContext.parallelize(1 to 3), sc.</td>
</tr>
<tr>
<td>spark sql</td>
<td align="left">DataFrame: RDD[Row]</td>
<td>sqlContext.read.json(…)</td>
</tr>
<tr>
<td>spark hive</td>
<td align="left">DataFrame: RDD[Row]</td>
<td>HiveContext</td>
</tr>
<tr>
<td>spark streaming</td>
<td align="left">DStream: [rdd, rdd, …]</td>
<td>streamingContext.socketTextStream(…)</td>
</tr>
</tbody>
</table>

* Spark 1.6 – Spark 2.x: DataSet introduced

<table>
<thead>
<tr>
<th>Scope</th>
<th align="left">Class</th>
<th>Created from</th>
</tr>
</thead>
<tbody>
<tr>
<td>spark sql</td>
<td align="left">DataSet: RDD of any type, generalizing DataFrame: RDD[Row]</td>
<td>rdd.toDS()</td>
</tr>
</tbody>
</table>

* Spark 2.x and later: Project Tungsten; SparkSession introduced as the unified entry point

```scala
val spark = SparkSession.builder().master("local").appName("test").getOrCreate()
spark.sparkContext.textFile("a.txt")    // RDD
spark.sqlContext.sql("show databases")  // DataFrame
```

* Conversions between RDD, DataFrame, and DataSet

<table>
<thead>
<tr>
<th align="left">Conversion</th>
<th align="left">Method</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">rdd -&gt; df</td>
<td align="left">tupRDD.toDF("id", "name")</td>
</tr>
<tr>
<td align="left">rdd -&gt; ds</td>
<td align="left">rdd.toDS()</td>
</tr>
<tr>
<td align="left">ds &lt;-&gt; df</td>
<td align="left">val df = ds.toDF(); val ds = df.as[Person]</td>
</tr>
<tr>
<td align="left">df, ds -&gt; rdd</td>
<td align="left">val rowRDD = df.rdd; val personRDD = ds.rdd</td>
</tr>
</tbody>
</table>
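The conversions in the table above can be sketched end to end in one small program. This is a minimal sketch, not a definitive recipe: the `Person` case class and the sample tuple data are assumptions introduced for illustration, and `toDF`/`toDS`/`as[Person]` all require `import spark.implicits._` to be in scope.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical case class giving the Dataset a typed schema.
case class Person(id: Int, name: String)

object ConversionDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("test").getOrCreate()
    import spark.implicits._ // brings toDF / toDS / as[Person] into scope

    // An RDD of tuples (sample data assumed for illustration).
    val tupRDD = spark.sparkContext.parallelize(Seq((1, "a"), (2, "b")))

    // rdd -> df: column names supplied explicitly
    val df: DataFrame = tupRDD.toDF("id", "name")

    // rdd -> ds: map to a typed element first, then toDS()
    val ds: Dataset[Person] = tupRDD.map { case (i, n) => Person(i, n) }.toDS()

    // ds <-> df: a DataFrame is just Dataset[Row] in Spark 2.x
    val df2: DataFrame = ds.toDF()
    val ds2: Dataset[Person] = df2.as[Person]

    // df, ds -> rdd
    val rowRDD = df.rdd       // RDD[Row]
    val personRDD = ds.rdd    // RDD[Person]

    spark.stop()
  }
}
```

Note that the case class must be defined outside the method body; otherwise the implicit encoder that `toDS()` and `as[Person]` depend on cannot be derived.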