Python: count each punctuation mark

python pandas csv

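I have a CSV file with a lot of data, and I want to count how many times each punctuation mark occurs.

So far I only know how to count all the punctuation in the text together, not how to count each mark separately.

I want to save the count of each punctuation mark for every row in the CSV file.

Below is my attempt to get the count for each punctuation mark, but I get an error like re.error: nothing to repeat at position 0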

import pandas as pd

news = pd.read_csv('cluesnew.csv')
news['?'] = news.string_column.str.count('?')
news['['] = news.string_column.str.count('[')
news[']'] = news.string_column.str.count(']')
news['!'] = news.string_column.str.count('!')
news[';'] = news.string_column.str.count(';')
news['{'] = news.string_column.str.count('{')
news['}'] = news.string_column.str.count('}')
news['/'] = news.string_column.str.count('/')
news['-'] = news.string_column.str.count('-')
news["'"] = news.string_column.str.count("'")
news[','] = news.string_column.str.count(',')
news['"'] = news.string_column.str.count('"')
news[':'] = news.string_column.str.count(':')
news['`'] = news.string_column.str.count('`')
news['.'] = news.string_column.str.count('.')
news.to_csv('cluesnew.csv')

Some sample rows from cluesnew.csv:

ID string_column
1  In 2017 alone, death due to diabetes was recorded at 10.1 per cent.
2  12.4 per cent of the country's citizens have diabetes.

An example of the resulting dataframe:

string_column                                                         . , [ ] ! ` { ....
In 2017 alone, death due to diabetes was recorded at 10.1 per cent.   1 1 0 0 0 0 0 ....
12.4 per cent of the country's citizens have diabetes.                1 0 0 0 0 1 0 ....

Any help is much appreciated, thank you.
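The re.error in the question is consistent with how regular expressions treat some of these characters: Series.str.count interprets its argument as a regex pattern, and characters such as ? or [ are metacharacters there. Below is a minimal sketch using only the standard re module and the two sample rows above. It reproduces the failure and shows one way around it with re.escape. The helper name count_literal is made up for illustration.

```python
import re

rows = [
    "In 2017 alone, death due to diabetes was recorded at 10.1 per cent.",
    "12.4 per cent of the country's citizens have diabetes.",
]

# A bare "?" is a quantifier with nothing in front of it, so compiling fails.
try:
    re.compile("?")
    compile_error = None
except re.error as exc:
    compile_error = str(exc)


def count_literal(text, char):
    """Count occurrences of char in text, treating char as a literal."""
    return len(re.findall(re.escape(char), text))


dot_counts = [count_literal(row, ".") for row in rows]
comma_counts = [count_literal(row, ",") for row in rows]
question_counts = [count_literal(row, "?") for row in rows]
```

With pandas, the same idea would presumably be news.string_column.str.count(re.escape('?')), since str.count takes a regex pattern.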

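You don't need pandas for this. The standard library csv module can handle the task.

Here is a possible solution in Python 3 using the csv module:

from string import punctuation
from collections import OrderedDict
from pathlib import Path
import csv

source = Path('/path/to/source.csv')
target = Path('/path/to/target.csv')

with source.open() as src, target.open('w', newline='') as tgt:
    csvreader = csv.reader(src, quoting=csv.QUOTE_ALL)
    csvwriter = csv.writer(tgt, quoting=csv.QUOTE_ALL)
    # Skip the header line from source
    next(csvreader)
    # Write the header to target
    csvwriter.writerow(['string_column'] + [_ for _ in punctuation])
    counter = OrderedDict()
    # This takes the first field of each row as the text
    for string_column, *_ in csvreader:
        # Reset the counter for each line
        counter.clear()
        string_column = string_column.rstrip()
        # Count the punctuation
        for c in punctuation:
            counter[c] = string_column.count(c)
        # Write a line
        csvwriter.writerow([string_column] + [counter[_] for _ in punctuation])

Afterwards, if you wish, you can easily read the resulting target.csv into a pandas dataframe like this: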

>>> import pandas as pd
>>> df = pd.read_csv(target)
>>> df.columns
Index(['string_column', '!', '"', '#', '$', '%', '&', ''', '(', ')', '*', '+',
       ',', '-', '.', '/', ':', ';', '<', '=', '>', '?', '@', '[', '\', ']',
       '^', '_', '`', '{', '|', '}', '~'],
      dtype='object')
>>>

I hope it helps.
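The sample rows in the question show an ID column before string_column, so picking the text by column name may be safer than relying on its position. Here is a minimal sketch of that variation, assuming a comma-separated layout (the sample is shown whitespace-aligned, so the real delimiter is a guess). It uses io.StringIO objects as hypothetical stand-ins for the real files.

```python
import csv
import io
from string import punctuation

# Hypothetical in-memory stand-in for cluesnew.csv, including the ID column.
source = io.StringIO(
    "ID,string_column\n"
    '1,"In 2017 alone, death due to diabetes was recorded at 10.1 per cent."\n'
    "2,12.4 per cent of the country's citizens have diabetes.\n"
)
target = io.StringIO()

reader = csv.DictReader(source)
writer = csv.writer(target, quoting=csv.QUOTE_ALL)
writer.writerow(["string_column"] + list(punctuation))

for row in reader:
    text = row["string_column"].rstrip()
    # str.count matches literally, so no regex escaping is needed here.
    writer.writerow([text] + [text.count(c) for c in punctuation])

# Parse the result back to inspect it; values come back as strings.
result = list(csv.reader(io.StringIO(target.getvalue())))
header, first, second = result
```

Reading by name keeps the sketch working if more columns are added to the input later.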

You can get all the punctuation characters from a string like this:

import string
# s is the text to examine, e.g. one value from string_column
p = [ch for ch in s if ch in string.punctuation]

Then you can count how many times each punctuation character occurs, like this:

from itertools import groupby
# groupby only merges adjacent equal items, so sort p first
counts = {key: len(list(group)) for key, group in groupby(sorted(p))}

Here is one approach using regex.

Example:

Output:
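As a rough illustration of what a regex-based approach could look like (a sketch under assumptions, not the answerer's original code), the snippet below builds one character class from string.punctuation and tallies the matches. The names PUNCT_RE and punctuation_counts are hypothetical.

```python
import re
from collections import Counter
from string import punctuation

# Character class matching any single ASCII punctuation character;
# re.escape keeps characters like ] and \ from being read as regex syntax.
PUNCT_RE = re.compile("[" + re.escape(punctuation) + "]")


def punctuation_counts(text):
    """Return a dict mapping every punctuation character to its count."""
    found = Counter(PUNCT_RE.findall(text))
    # Iterate over punctuation so every mark appears, including zero counts.
    return {c: found[c] for c in punctuation}


counts = punctuation_counts(
    "12.4 per cent of the country's citizens have diabetes."
)
```

A single pass with one compiled pattern avoids running a separate search per punctuation mark.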

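Can you show a sample of your dataframe? I assume the punctuation marks are the ones in the list above? Regex extraction of the matched groups plus the string length should be the best approach, but please provide a sample dataframe.

string.punctuation and collections.Counter can help you. Show us what you have tried so far and we can help you develop it.

@strawberrylatte, I just updated my answer, and now you have complete code that generates the CSV the way you want. Please check it.

If possible, post a sample of the contents of cluesnew.csv to use as input.

It does, but I prefer to use defaultdict to make sure the order of the keys is always the same. I know that is not the case for dict in every version of Python. But defaultdict is already implemented on top of OrderedDict, and I prefer to use it.

@accdias will this work with pandas? Because I use pandas to read the CSV file.

@aws_apprentice, it makes sense because the OP wants to write it to a CSV file, and to do that he needs all the columns in the same order, don't you agree?

@strawberrylatte, I think it will. You need to show us more of what you have tried so we can help you develop from it. The one I gave you will put all the lines and their respective counters in an array for later use.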
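The comments mention collections.Counter and the concern about column order. One way to combine the two, sketched here, is to let a Counter do the tallying and take the column order from string.punctuation itself, so the ordering of the dict never matters. The function name punctuation_row is hypothetical.

```python
from collections import Counter
from string import punctuation


def punctuation_row(text):
    """Return counts for every punctuation mark, in string.punctuation order."""
    # Count only punctuation characters; a Counter returns 0 for missing keys.
    seen = Counter(ch for ch in text if ch in punctuation)
    # Column order comes from string.punctuation, not from the dict.
    return [seen[c] for c in punctuation]


header = list(punctuation)
row = punctuation_row(
    "In 2017 alone, death due to diabetes was recorded at 10.1 per cent."
)
```

Each row produced this way lines up with the same header, which is what writing to CSV requires.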