Python: selecting text using multiple splits

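I've started learning Python and I'm stuck on an assignment about manipulating text data. Here is an example of the lines of text I need to work with: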

From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008

I need to extract the hour from each line (09 in this example) and then find the most common hour at which emails were sent.

Basically, what I need to do is build a for loop that splits each line of text on the colon:

split(':')

and then splits on whitespace:

split()

I've been at it for hours but can't seem to figure it out. Here is what my code looks like so far:

name = raw_input("Enter file:")
if len(name) < 1 : name = "mbox-short.txt"
handle = open(name)
counts = dict()
lst = list()
temp = list()
for line in handle:
    if not "From " in line: continue
    words = line.split(':')  
    for word in words:
        counts[word] = counts.get(word,0) + 1

for key, val in counts.items():
    lst.append( (val, key) )
lst.sort(reverse = True)

for val, key in lst:
    print key, val

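The code above only does one split, but I keep trying different ways to split the text a second time. I keep getting a list attribute error saying "'list' object has no attribute 'split'". Any help with this would be greatly appreciated. Thanks again.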
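The error comes from splitting the result of a split: str.split returns a list of strings, and a list has no split method. Picking one element out of the list first gives back a string, which can be split again. A short Python 3 illustration using the sample line from the question (the variable names are only illustrative):

```python
line = "From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008"

# A list of strings:
# ['From stephen.marquard@uct.ac.za Sat Jan  5 09', '14', '16 2008']
pieces = line.split(':')

# Calling .split() on the list itself raises the error from the question.
error_message = None
try:
    pieces.split()
except AttributeError as exc:
    error_message = str(exc)  # "'list' object has no attribute 'split'"

# Indexing picks one string out of the list, and a string can be split again.
hour = pieces[0].split()[-1]  # '09'

print(error_message)
print(hour)
```

Which element to take depends on the order of the splits: splitting on ':' first leaves the hour at the end of element 0, so the whitespace split keeps the last word.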

First:

import re

Then replace:

words = line.split(':')  
for word in words:
    counts[word] = counts.get(word,0) + 1

with:

Input:

From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008
From stephen.marquard@uct.ac.za Sat Jan  5 12:14:16 2008
From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008
From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008
From stephen.marquard@uct.ac.za Sat Jan  5 15:14:16 2008
From stephen.marquard@uct.ac.za Sat Jan  5 12:14:16 2008
From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008
From stephen.marquard@uct.ac.za Sat Jan  5 13:14:16 2008
From stephen.marquard@uct.ac.za Sat Jan  5 12:14:16 2008

Output:

Using the same test file as Marcel Jacques Machado:

>>> from collections import Counter
>>> Counter(line.split(' ')[-2].split(':')[0] for line in open('input')).items()
[('12', 3), ('09', 4), ('15', 1), ('13', 1)]

This shows that 09 appears 4 times, while 13 and 15 each appear only once.

If we want prettier output, we can do some formatting. This prints the hours and their counts, from most common to least common:

>>> print('\n'.join('{} {}'.format(hh, n) for hh,n in Counter(line.split(' ')[-2].split(':')[0] for line in open('input')).most_common()))
09 4
12 3
15 1
13 1
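The one-liners above split every line of the file, which works for the test file because every line in it is a From line. A full mailbox file also contains other lines, which is why the question's loop skips anything without "From ". A minimal filtered sketch, assuming envelope lines start with "From " (with a space) followed by address, weekday, month, day, HH:MM:SS and year; the "From:" line in the sample is a hypothetical header line added only to show the filter:

```python
from collections import Counter


def count_hours(lines):
    """Count the two-digit send hour of every mbox 'From ' envelope line."""
    hours = Counter()
    for line in lines:
        # Assumption: envelope lines start with 'From ' (with a space), so
        # header lines such as 'From: ...' and other text are skipped.
        if not line.startswith('From '):
            continue
        # Fields: From, address, weekday, month, day, HH:MM:SS, year
        time_field = line.split()[5]
        hours[time_field.split(':')[0]] += 1
    return hours


sample = [
    "From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008\n",
    "From: stephen.marquard@uct.ac.za\n",  # hypothetical header line, skipped
    "From stephen.marquard@uct.ac.za Sat Jan  5 12:14:16 2008\n",
    "From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008\n",
]

for hour, n in count_hours(sample).most_common():
    print(hour, n)
```

With a real file, pass open(name) instead of sample. Calling split() with no argument collapses runs of whitespace, so the padded day in "Jan  5" does not shift the field positions; with split(' ') the extra space produces an empty string, which is why the one-liners index from the end with [-2].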

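What about line.split(':')[0].split(' ')[-1]?

Generally, for development, and especially for code you share, put sample data in the program itself. Then others can run and modify your code. In this case, handle = ..., with only a few lines. FWIW, I believe @L3viathan's snippet will solve your particular problem.

Thanks for the help! However, for some reason the code only outputs single digits, which makes digits like 1 and 0 show up a lot in the counts (since they are the first digits of the hours). How can I make it count both digits? I tried changing it to line.split(':')[0].split(' ')(0:2), but got an error.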