Python: extracting data from a text field using pandas

I am trying to create a pandas DataFrame by extracting information from notes. I want the following columns:

phonenumber    | status  | result     | notation
(999) 555-9898 | Partial | Generic VM | VOICE MAIL LEFT
Notes:

Event   Notation
Call    Call to (Home) (999) 555-9898 ended. Partial – Generic VM --> - VOICE MAIL LEFT 
Call    Call to (Work) (999) 555-9898 ended. Partial - Voice Mail, No Message left -->
Call    Call to (Work) (999) 555-9898 ended. Positive –  Spoke to Receptionist --> 
Call    Call to (Mobile) (999) 555-9898 ended. Partial – Generic VM --> - Unable to reach customer, voice message left and text sent
Procedure   Procedure 'Verify' is checked
Procedure   Procedure 'Duplicate Check' is checked
Procedure   Procedure 'Check Something' is checked
Procedure   Procedure 'Scenario' is checked
Procedure   Procedure 'Attempt' is checked
I would then create a second DataFrame and try to extract the text inside the single quotes of the Procedure events:

procedure
Verify
Duplicate Check
Check Something

To give you an idea, here is something to get started with (note, however, that this is my first time using regular expressions):

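['Call    Call to (Home) (999) 555-9898 ended. Partial – Generic VM --> - VOICE MAIL LEFT ',
 'Call    Call to (Work) (999) 555-9898 ended. Partial - Voice Mail, No Message left -->',
 'Call    Call to (Work) (999) 555-9898 ended. Positive –  Spoke to Receptionist --> ',
 'Call    Call to (Mobile) (999) 555-9898 ended. Partial – Generic VM --> - Unable to reach customer, voice message left and text sent',
 "Procedure   Procedure 'Verify' is checked",
 "Procedure   Procedure 'Duplicate Check' is checked",
 "Procedure   Procedure 'Check Something' is checked",
 "Procedure   Procedure 'Scenario' is checked",
 "Procedure   Procedure 'Attempt' is checked"]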

[['(999) 555-9898'], ['(999) 555-9898'], ['(999) 555-9898'], ['(999) 555-9898']]
[['Partial'], ['Partial'], ['Positive'], ['Partial']]

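This question is a bit off-focus. The most challenging part here is extracting data fields from text that follows some atypical patterns, and pandas cannot help much with that. By comparison, storing the results in a DataFrame, or even just a list of dicts, is very simple. I suggest you read …

Yep, Valgur is right. Using regular expressions should do the trick.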
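A minimal sketch of what the comments describe, assuming every Call line follows the pattern seen in the sample notes: one regular expression with a named group per desired column, collected into a list of dicts. The names `notes`, `CALL_RE`, and `parse_calls` are illustrative and not from the original post, and the pattern is only a guess fitted to the sample lines shown in the question.

```python
import re

# Sample lines copied from the question's notes.
notes = [
    "Call    Call to (Home) (999) 555-9898 ended. Partial – Generic VM --> - VOICE MAIL LEFT ",
    "Call    Call to (Work) (999) 555-9898 ended. Partial - Voice Mail, No Message left -->",
    "Call    Call to (Work) (999) 555-9898 ended. Positive –  Spoke to Receptionist --> ",
    "Call    Call to (Mobile) (999) 555-9898 ended. Partial – Generic VM --> - Unable to reach customer, voice message left and text sent",
    "Procedure   Procedure 'Verify' is checked",
]

# One named group per column.
# Assumptions: status and result are separated by a hyphen or an en dash,
# and the optional notation follows "-->" (sometimes after a leading "- ").
CALL_RE = re.compile(
    r"(?P<phonenumber>\(\d{3}\) \d{3}-\d{4}) ended\.\s*"
    r"(?P<status>\w+)\s*[-–]\s*"
    r"(?P<result>.*?)\s*-->\s*(?:-\s*)?"
    r"(?P<notation>.*)"
)


def parse_calls(lines):
    """Return one dict per matching Call line; other lines are skipped."""
    rows = []
    for line in lines:
        match = CALL_RE.search(line)
        if match:
            # Strip stray whitespace left at the ends of the captured fields.
            rows.append({key: value.strip() for key, value in match.groupdict().items()})
    return rows


rows = parse_calls(notes)
for row in rows:
    print(row)
```

If pandas is installed, passing `rows` to `pd.DataFrame` should give a table with the four requested columns, since the DataFrame constructor accepts a list of dicts.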
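import re

# Read the notes file, skipping the header line ("Event   Notation").
data = []
with open('notes.txt', 'r') as f:
    next(f)
    for line in f:
        data.append(line.strip('\n'))
data  # in an interactive session this displays the list

# Compile the patterns once, outside the loop; raw strings keep \d intact.
p_phone = re.compile(r'[(]\d{3}[)] \d{3}-\d{4}')
p_status = re.compile(r'Partial|Positive')

phone = []
status = []
for line in data:
    tmp = line.split(' ')
    if tmp[0] == 'Call':
        phone.append(p_phone.findall(line))
        status.append(p_status.findall(line))
    elif tmp[0] == 'Procedure':
        pass  # Procedure lines are not handled yet
print(phone)
print(status)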
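The Procedure branch above is left as `pass`. One possible way to fill it in, sketched here under the assumption that each Procedure line contains exactly one single-quoted name, is to capture the text between the quotes. The names `lines`, `p_procedure`, and `procedure` are illustrative and not part of the original answer.

```python
import re

# A few sample lines copied from the question's notes.
lines = [
    "Call    Call to (Work) (999) 555-9898 ended. Positive –  Spoke to Receptionist --> ",
    "Procedure   Procedure 'Verify' is checked",
    "Procedure   Procedure 'Duplicate Check' is checked",
    "Procedure   Procedure 'Check Something' is checked",
]

# Capture whatever sits between the first pair of single quotes.
p_procedure = re.compile(r"'([^']+)'")

procedure = []
for line in lines:
    # Only Procedure events are of interest here; Call lines are skipped.
    if line.startswith("Procedure"):
        match = p_procedure.search(line)
        if match:
            procedure.append(match.group(1))

print(procedure)
```

With pandas available, `pd.DataFrame({'procedure': procedure})` would then give the second DataFrame described in the question.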