Python: getting a DataFrame from a list of long strings


I have a price list in PDF format that I have imported into a list. Because of the PDF layout, my list looks like the sample below:

sample = ['model description price model description price 39A Bolt 25.00 21B valve 322.40 AB3003 Engine  5000\n20B Nut 1.50 25B LockNut 3.50',
          'model description price  model description price 44C Spanner 100.00 01BC Pipe 3.10 ZZ010  Blade  345.44\n33J Tube 8.89 377GH CAM 44.20']
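In every element of the list, the header "model description price" always appears repeated, followed by the actual data. Each element can hold up to about 20 model/description/price triples of actual data.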

What I want to end up with is a simple pandas DataFrame like this:

data={'model':['39A','21B','AB3003','20B','25B','44C','01BC','ZZ010','33J','377GH'], 
      'description': ['Bolt', 'valve', 'engine','nut','locknut', 'spanner','pipe','blade','tube','cam'],
     'price': [25.00,322.40,5000,1.50,3.50,100.00,3.10,345.44,8.89,44.20]}
pd.DataFrame(data)

I tried the following, which at least splits each element of the original list into a nested list:

sample2 = [i.split(' ') for i in sample]
pd.DataFrame(sample2)
But the result is nothing like the DataFrame I am looking for.

Can anyone help me with this?

Thanks

Here is what I wrote; it works for the sample you provided:

import re
import pandas as pd
sample = ['model description price model description price 39A Bolt 25.00 21B valve 322.40 AB3003 Engine  5000\n20B Nut 1.50 25B LockNut 3.50',
          'model description price  model description price 44C Spanner 100.00 01BC Pipe 3.10 ZZ010  Blade  345.44\n33J Tube 8.89 377GH CAM 44.20']

DF_list = []

# Remove every occurrence of the repeated header 'model description price'
header = 'model description price'
sample = [re.sub(r'\b' + re.escape(header) + r'\b', '', x).strip() for x in sample]

# Split each string on newlines/tabs, then flatten the result into one list of lines
sample = [re.split(r'[\n\t]', line) for line in sample]
sample = [item for sublist in sample for item in sublist]


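# Parse each line into [model, description, price] rows
for data_string in sample:
  data_list = data_string.split()  # splits on any whitespace and drops empty strings
  small_lists = [data_list[x:x+3] for x in range(0, len(data_list), 3)]
  DF_list.extend(small_lists)

# Name the columns and convert the price strings to numbers
DF = pd.DataFrame(DF_list, columns=['model', 'description', 'price'])
DF['price'] = DF['price'].astype(float)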
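
For reference, the same idea can be packed into one function. This is only a sketch, not a tested solution for the real PDF: `parse_price_list` and `HEADER` are names made up here, and it assumes every model, description and price is a single token with no spaces inside it. If a description such as "Lock Nut" contained a space, grouping by three would misalign, so the function raises an error instead of returning shifted rows.

```python
import pandas as pd

# Hypothetical constant: the header text that repeats inside every page string.
HEADER = "model description price"

def parse_price_list(pages):
    """Turn a list of page strings into a model/description/price DataFrame."""
    rows = []
    for page in pages:
        # Drop the repeated header, then split on any whitespace (spaces, \n, \t).
        tokens = page.replace(HEADER, " ").split()
        if len(tokens) % 3 != 0:
            # Assumption violated: some field probably contains a space.
            raise ValueError("token count is not a multiple of 3")
        # Group consecutive tokens into [model, description, price] triples.
        rows.extend(tokens[i:i + 3] for i in range(0, len(tokens), 3))
    df = pd.DataFrame(rows, columns=["model", "description", "price"])
    df["price"] = df["price"].astype(float)  # price strings -> numbers
    return df

sample = ['model description price model description price 39A Bolt 25.00 21B valve 322.40 AB3003 Engine  5000\n20B Nut 1.50 25B LockNut 3.50',
          'model description price  model description price 44C Spanner 100.00 01BC Pipe 3.10 ZZ010  Blade  345.44\n33J Tube 8.89 377GH CAM 44.20']

df = parse_price_list(sample)
print(df)
```

Unlike the code above, this groups tokens across the whole page string rather than line by line; for the sample the rows come out the same, because every line holds complete triples.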