Apache Spark: get the top 10 words from a column in Spark (tags: Apache Spark, PySpark, Apache Spark SQL)


How can I find the top 10 words in the headline_text column for the headline category india?

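You can split each headline into words, explode the resulting word array, group by word, and then count the occurrences of each word.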

date_range = mydata[mydata.headline_category=='india'].sort('publish_date') 
date_range.show()

+-------------------+-----------------+--------------------+ 
|       publish_date|headline_category|       headline_text|
+-------------------+-----------------+--------------------+ 
|2001-01-04 00:00:00|            india|Dudhwa tiger died...| 
|2001-01-05 00:00:00|            india|MP best in forest...| 
|2001-05-28 00:00:00|            india|India-Bangladesh ...| 
|2001-05-28 00:00:00|            india|Govt to modernise...| 
|2001-05-28 00:00:00|            india|Priyanka is the C...| 
|2001-05-28 00:00:00|            india|MPs riling Relian...| 
|2001-05-28 00:00:00|            india|CBI probing A-I's...| 
|2001-05-28 00:00:00|            india|Gujarat braces as...| 
|2001-05-28 00:00:00|            india|Ayodhya may force...| 
|2001-05-28 00:00:00|            india|3 new frigates to...| 
|2001-05-28 00:00:00|            india|Plea in SC challe...| 
|2001-05-28 00:00:00|            india|Kashmiri Sikhs pr...| 
|2001-05-28 00:00:00|            india|Bengal to revamp ...| 
|2001-05-29 00:00:00|            india|Rs 280 cr sanctio...| 
|2001-05-29 00:00:00|            india|DD Metro is up fo...| 
|2001-05-29 00:00:00|            india|Govt employees' n...| 
|2001-05-29 00:00:00|            india|BMS; Left to oppo...| 
|2001-05-29 00:00:00|            india|CBI vetting paper...| 
|2001-05-29 00:00:00|            india|Indo-Pak ties: Fr...| 
|2001-05-29 00:00:00|            india|BJP; Samata to st...|
+-------------------+-----------------+--------------------+


Any answer that runs in SQL or with a Spark DataFrame would work.