
Python: How do I convert a string into a DataFrame with a specified number of columns?


I have a string that looks like this:

string = "entity precision recall f1-score support B-EXPERIENCE 0.578 0.488 0.529 244 I-EXPERIENCE 0.648 0.799 0.716 399 L-EXPERIENCE 0.850 0.697 0.766 244 U-EXPERIENCE 0.000 0.000 0.000 9 B-LANGUAGE 0.000 0.000 0.000 1 I-LANGUAGE 0.000 0.000 0.000 1 L-LANGUAGE 0.000 0.000 0.000 1 U-LANGUAGE 0.788 0.904 0.842 292 B-PROGRAMMING 0.480 0.433 0.455 141 I-PROGRAMMING 0.524 0.328 0.404 67 L-PROGRAMMING 0.261 0.255 0.258 141 U-PROGRAMMING 0.904 0.825 0.862 2010 micro_avg 0.785 0.746 0.765 3550 macro_avg 0.419 0.394 0.403 3550 weighted_avg 0.787 0.746 0.763 3550"
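What is the simplest way to convert it into a pandas DataFrame? I want to create a DataFrame with 5 columns, where the header of the first column is "entity" and the first column contains the entity names.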

You can try the following:

import pandas as pd
s1 = "entity precision recall f1-score support B-EXPERIENCE 0.578 0.488 0.529 244 I-EXPERIENCE 0.648 0.799 0.716 399 L-EXPERIENCE 0.850 0.697 0.766 244 U-EXPERIENCE 0.000 0.000 0.000 9 B-LANGUAGE 0.000 0.000 0.000 1 I-LANGUAGE 0.000 0.000 0.000 1 L-LANGUAGE 0.000 0.000 0.000 1 U-LANGUAGE 0.788 0.904 0.842 292 B-PROGRAMMING 0.480 0.433 0.455 141 I-PROGRAMMING 0.524 0.328 0.404 67 L-PROGRAMMING 0.261 0.255 0.258 141 U-PROGRAMMING 0.904 0.825 0.862 2010 micro_avg 0.785 0.746 0.765 3550 macro_avg 0.419 0.394 0.403 3550 weighted_avg 0.787 0.746 0.763 3550"

s = pd.Series(s1.split(' '))
df = pd.DataFrame(s[5:].to_numpy().reshape(-1,5), columns=s[:5])
Output:

           entity precision recall f1-score support
0    B-EXPERIENCE     0.578  0.488    0.529     244
1    I-EXPERIENCE     0.648  0.799    0.716     399
2    L-EXPERIENCE     0.850  0.697    0.766     244
3    U-EXPERIENCE     0.000  0.000    0.000       9
4      B-LANGUAGE     0.000  0.000    0.000       1
5      I-LANGUAGE     0.000  0.000    0.000       1
6      L-LANGUAGE     0.000  0.000    0.000       1
7      U-LANGUAGE     0.788  0.904    0.842     292
8   B-PROGRAMMING     0.480  0.433    0.455     141
9   I-PROGRAMMING     0.524  0.328    0.404      67
10  L-PROGRAMMING     0.261  0.255    0.258     141
11  U-PROGRAMMING     0.904  0.825    0.862    2010
12      micro_avg     0.785  0.746    0.765    3550
13      macro_avg     0.419  0.394    0.403    3550
14   weighted_avg     0.787  0.746    0.763    3550
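Details:

Use split with a single space as the separator to split the string. This is why I asked for the row labels to be renamed so that they contain no spaces.

Create a pd.Series with its constructor, then build the pd.DataFrame with its constructor and index slicing: s[:5] holds the column names and s[5:] holds the values. to_numpy turns the values into a numpy array, which is then reshaped with -1 for the number of rows (inferred automatically) and 5 for the number of columns.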
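One thing the steps above do not cover is that every cell is still a string after the reshape. The following is a minimal sketch of the same split-and-reshape approach followed by a dtype conversion; the shortened sample string and the choice of target dtypes are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Shortened, hypothetical sample in the same layout as the question's string
s1 = ("entity precision recall f1-score support "
      "B-EXPERIENCE 0.578 0.488 0.529 244 "
      "I-EXPERIENCE 0.648 0.799 0.716 399")

tokens = pd.Series(s1.split(' '))

# -1 lets NumPy infer the row count from the fixed 5 columns
values = tokens[5:].to_numpy().reshape(-1, 5)
df = pd.DataFrame(values, columns=tokens[:5].tolist())

# All columns hold strings at this point; convert the numeric ones
df = df.astype({"precision": float, "recall": float,
                "f1-score": float, "support": int})

print(df.dtypes)
```

After the conversion, numeric operations such as df["support"].sum() behave as expected instead of concatenating strings.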

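If you adjust the strings in the last three entries so that they contain no spaces (for example, by replacing the spaces with dashes), the following code works and also extends to more rows: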

import pandas as pd

my_list = string.split(' ')  # split the string on single spaces

my_dict = {}
num_cols = 5
# Map each column name to every num_cols-th token after the header
for i in range(num_cols):
    my_dict[my_list[i]] = my_list[num_cols + i::num_cols]

# Convert the dict to a pandas DataFrame
df = pd.DataFrame(my_dict)
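The key step here is the strided slice my_list[num_cols+i::num_cols]. As a small illustration, here is the same slicing applied to a hypothetical two-column token list (the names and values are made up for the example):

```python
# Hypothetical shortened input: 2 column names followed by the row values
tokens = "name score a 1 b 2 c 3".split()
num_cols = 2

# For column i, start just past the header at offset i,
# then take every num_cols-th token
columns = {tokens[i]: tokens[num_cols + i::num_cols] for i in range(num_cols)}

print(columns)
```

Each column name ends up paired with the tokens that sit in its position in every row.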

I would use numpy's reshape:

import numpy as np
import pandas as pd

data = np.array(string.split())
data = data.reshape(len(data)//5, 5)
df = pd.DataFrame(data[1:], columns=data[0]).set_index('entity').rename_axis('')
print(df)
This gives:

              precision recall f1-score support

B-EXPERIENCE      0.578  0.488    0.529     244
I-EXPERIENCE      0.648  0.799    0.716     399
L-EXPERIENCE      0.850  0.697    0.766     244
U-EXPERIENCE      0.000  0.000    0.000       9
B-LANGUAGE        0.000  0.000    0.000       1
I-LANGUAGE        0.000  0.000    0.000       1
L-LANGUAGE        0.000  0.000    0.000       1
U-LANGUAGE        0.788  0.904    0.842     292
B-PROGRAMMING     0.480  0.433    0.455     141
I-PROGRAMMING     0.524  0.328    0.404      67
L-PROGRAMMING     0.261  0.255    0.258     141
U-PROGRAMMING     0.904  0.825    0.862    2010
micro_avg         0.785  0.746    0.765    3550
macro_avg         0.419  0.394    0.403    3550
weighted_avg      0.787  0.746    0.763    3550
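The reshape only works when the token count is an exact multiple of 5, so a row label containing a space would break it. Below is a hedged sketch of a guard around the reshape; the helper name to_rows and the error message are hypothetical and not part of the answer above.

```python
import numpy as np

def to_rows(text, n_cols=5):
    """Split text on whitespace and reshape the tokens into rows of n_cols."""
    tokens = np.array(text.split())
    if len(tokens) % n_cols != 0:
        # A label such as "macro avg" adds an extra token and lands here
        raise ValueError(
            f"{len(tokens)} tokens cannot be split into rows of {n_cols}"
        )
    return tokens.reshape(-1, n_cols)

good = to_rows("entity precision recall f1-score support "
               "micro_avg 0.785 0.746 0.765 3550")
print(good.shape)
```

Failing early with a clear message is easier to debug than a reshape error or a silently misaligned table.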

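Another approach is to split the values into chunks of 5 with a generator. yield hands back one chunk at a time and resumes from where the previous iteration left off: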

cols = string.split()[:5]
vals = string.split()[5:]

# Define a generator that yields evenly sized chunks of a list
def divide_chunks(l, n):
    for i in range(0, len(l), n):
        yield l[i:i + n]
Now we can define our DataFrame:

df = pd.DataFrame(list(divide_chunks(vals, 5)), columns=cols)
Output:

           entity precision recall f1-score support
0    B-EXPERIENCE     0.578  0.488    0.529     244
1    I-EXPERIENCE     0.648  0.799    0.716     399
2    L-EXPERIENCE     0.850  0.697    0.766     244
3    U-EXPERIENCE     0.000  0.000    0.000       9
4      B-LANGUAGE     0.000  0.000    0.000       1
5      I-LANGUAGE     0.000  0.000    0.000       1
6      L-LANGUAGE     0.000  0.000    0.000       1
7      U-LANGUAGE     0.788  0.904    0.842     292
8   B-PROGRAMMING     0.480  0.433    0.455     141
9   I-PROGRAMMING     0.524  0.328    0.404      67
10  L-PROGRAMMING     0.261  0.255    0.258     141
11  U-PROGRAMMING     0.904  0.825    0.862    2010
12      micro_avg     0.785  0.746    0.765    3550
13      macro_avg     0.419  0.394    0.403    3550
14   weighted_avg     0.787  0.746    0.763    3550
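To make the pause-and-resume behaviour of yield concrete, here is a small sketch on a toy list (the letters are made-up sample data; the function mirrors the chunking generator above with a renamed parameter):

```python
def divide_chunks(items, n):
    """Yield successive n-sized slices of items."""
    for i in range(0, len(items), n):
        yield items[i:i + n]

chunks = divide_chunks(list("abcdefg"), 3)

first = next(chunks)  # runs the loop body once, then pauses at yield
rest = list(chunks)   # resumes from i = 3 and drains the generator

print(first)
print(rest)
```

Note that the last chunk is shorter when the length is not a multiple of n, so for the DataFrame case the number of values should be a multiple of 5.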

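5 columns or 4? I see 4 in your 5-column example.

I edited the description a bit.

Could you change the input string so that "macro avg" becomes "macro_avg", replacing the spaces in the row labels with underscores? Or is the delimiter between values a tab rather than a space?

I changed the input as you asked! Thanks.

This was the most useful answer, because I actually prefer the blank index header.

Great. Happy coding!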