Apache Spark: get the top 10 words from a column in Spark
How do I find the top 10 words in the headline_text column for the headline category india?
Tags: apache-spark, pyspark, apache-spark-sql
date_range = mydata[mydata.headline_category=='india'].sort('publish_date')
date_range.show()
+-------------------+-----------------+--------------------+
| publish_date|headline_category| headline_text|
+-------------------+-----------------+--------------------+
|2001-01-04 00:00:00| india|Dudhwa tiger died...|
|2001-01-05 00:00:00| india|MP best in forest...|
|2001-05-28 00:00:00| india|India-Bangladesh ...|
|2001-05-28 00:00:00| india|Govt to modernise...|
|2001-05-28 00:00:00| india|Priyanka is the C...|
|2001-05-28 00:00:00| india|MPs riling Relian...|
|2001-05-28 00:00:00| india|CBI probing A-I's...|
|2001-05-28 00:00:00| india|Gujarat braces as...|
|2001-05-28 00:00:00| india|Ayodhya may force...|
|2001-05-28 00:00:00| india|3 new frigates to...|
|2001-05-28 00:00:00| india|Plea in SC challe...|
|2001-05-28 00:00:00| india|Kashmiri Sikhs pr...|
|2001-05-28 00:00:00| india|Bengal to revamp ...|
|2001-05-29 00:00:00| india|Rs 280 cr sanctio...|
|2001-05-29 00:00:00| india|DD Metro is up fo...|
|2001-05-29 00:00:00| india|Govt employees' n...|
|2001-05-29 00:00:00| india|BMS; Left to oppo...|
|2001-05-29 00:00:00| india|CBI vetting paper...|
|2001-05-29 00:00:00| india|Indo-Pak ties: Fr...|
|2001-05-29 00:00:00| india|BJP; Samata to st...|
+-------------------+-----------------+--------------------+
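You can split each headline into words, explode the resulting array of words into one row per word, group by word, and count how often each word occurs.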
An answer in either SQL or the Spark DataFrame API is fine.