Python: how to save words to a CSV file, tokenized from articles, together with sentence ID numbers?

python pandas csv

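I am trying to extract all the words from articles stored in a CSV file, and to write each sentence ID number and the words that sentence contains to a new CSV file.

What I have done so far: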

import pandas as pd
from nltk.tokenize import sent_tokenize, word_tokenize
df = pd.read_csv(r"D:\data.csv", nrows=10)

row = 0
sentNo = 0
while row < 1:  # only the first article for now
    # sent_tokenize is the function imported above
    sentences = sent_tokenize(df['articles'][row])
    for sents in sentences:
        sentNo += 1
        words = word_tokenize(sents)
        print(f'{sentNo}: {words}')
    row += 1

I only took df['articles'][0], which gives the following output:

1:['The', 'ultimate', 'productivity', 'hack', 'is', 'saying', 'no', '.']
2:['Not', 'doing', 'something', 'will', 'always', 'be', 'faster', 'than', 'doing', 'it', '.']
3:['This', 'statement', 'reminds', 'me', 'of', 'the', 'old', 'computer', 'programming', 'saying', ',', '“', 'Remember', 'that', 'there', 'is', 'no', 'code', 'faster', 'than', 'no', 'code', '.', '”']

How can I write a new output.csv file in the format below, containing all the sentences of all the articles in the data.csv file?

Sentence No | Word
1             The
              ultimate
              productivity
              hack
              is
              saying
              no
              .
2             Not
              doing 
              something 
              will
              always
              be
              faster
              than
              doing
              it
              .
3             This 
              statement 
              reminds 
              me 
              of 
              the 
              old 
              computer 
              programming 
              saying
              , 
              “
              Remember
              that 
              there
              is
              no
              code
              faster
              than
              no
              code
              .
              ”

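I am new to Python and I am using it in a Jupyter notebook.

This is my first post on Stack Overflow. If anything is wrong with it, please correct me so I can learn. Thanks.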

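Just iterate over the words and write a new line for each word.

The result will be a bit unpredictable, because you also have commas as "words"; you may want to consider a different delimiter, or remove the commas from the word list. Edit: this looks like a cleaner approach: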

import pandas as pd
from nltk.tokenize import sent_tokenize, word_tokenize

df = pd.read_csv(r"D:\data.csv", nrows=10)
stcNum = 1

# newline='' keeps Python from translating the explicit '\r\n' on Windows
with open('output.csv', 'w', newline='', encoding='utf-8') as f:
    for article in df['articles']:
        for stc in sent_tokenize(article):
            # The sentence number goes only on the first word of each sentence
            for i, word in enumerate(word_tokenize(stc)):
                prefix = str(stcNum) if i == 0 else ''
                f.write(prefix + ',' + word + '\r\n')
            stcNum += 1

output.csv:

1,The
,ultimate
,productivity
,hack
,is
,saying
,no
,.
2,Not
,doing
,something
,will
,always
,be
,faster
,than
,doing
,it
,.
3,This
,statement
,reminds
,me
,of
,the
,old
,computer
,programming
,saying
,,     # <<< Most CSV parsers will see this as 3 empty columns
,“
,Remember
,that
,there
,is
,no
,code
,faster
,than
,no
,code
,.
,”
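
To see why the bare comma token is a problem, here is a minimal sketch that parses three rows from the output above with Python's standard csv module. The sample rows are copied from that output (without the inline comment); nothing else about the file is assumed.

```python
import csv
import io

# Three rows as written by the loop above; the middle one is the ',' token.
raw = ",saying\r\n,,\r\n,Remember\r\n"

rows = list(csv.reader(io.StringIO(raw, newline="")))
for row in rows:
    # Print the number of columns the parser found, then the columns.
    print(len(row), row)
```

The middle row comes back as three empty fields instead of an empty sentence number plus a comma, so the word itself is lost.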

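Thanks @mikah barnett. I understand your edit, but I don't want to print the words; I want to write them all to a CSV file. That is the part I'm stuck on.

I edited my answer to add a cleaner version that writes to a CSV file of your choice.

Now it does exactly what I need. I just tried it on my dataset. You were right about the commas: reading the newly created CSV file causes problems. By the way, thanks a lot @mikah barnett :D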
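
One possible way around the comma problem is to let the standard csv module do the quoting. The following is only a sketch, not the answerer's code: `write_tokens` and `read_rows` are hypothetical helpers, the header row is taken from the format requested in the question, and the input is a list of already tokenized sentences so that the example runs without NLTK.

```python
import csv
import os
import tempfile

def write_tokens(sentences, path):
    """Write one row per word; the sentence number appears only on the first word.

    `sentences` is a list of word lists. Hypothetical helper for illustration.
    """
    # newline='' is what the csv module expects; it writes line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Sentence No", "Word"])
        for number, words in enumerate(sentences, start=1):
            for i, word in enumerate(words):
                # csv.writer quotes the ',' token, so it stays a single field.
                writer.writerow([number if i == 0 else "", word])

def read_rows(path):
    """Read the file back as a list of rows (hypothetical helper)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))

# Hand-made sample standing in for word_tokenize output.
sample = [["no", "code", "."], ["saying", ",", "“", "Remember"]]
path = os.path.join(tempfile.gettempdir(), "output_sketch.csv")
write_tokens(sample, path)
rows = read_rows(path)
```

With NLTK installed, the same helper could presumably be fed `[word_tokenize(s) for article in df['articles'] for s in sent_tokenize(article)]`. Every row then reads back with exactly two columns, including the row whose word is a comma.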