Python: generate a dataframe from output without knowing the column keys
I want to create a dataframe from CLI output, but I don't know its column keys in advance. I only know the row where the keys end (counting from row 0), and I know the separator between the keys (\s+).
Is there a quick, clean way to find (generate) the dataframe's column keys from the output in this case?
For example:
                                                                                MODIFIED
CORE SERVER        ACTIVE                         PASSIVE                       PACKAGES
------------------ ------------------------------ ----------------------------- --------
cs010              1.9.2.0-2+auto166              1.9.2.0-2+auto146             no
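It should generate the following list of keys:
CORE SERVER, ACTIVE, PASSIVE, MODIFIED PACKAGES
Assuming I understood correctly, you can split the input string that contains all the column names separated by whitespace, build a dictionary with a dict comprehension, and then create an empty dataframe from it.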
import re
import pandas as pd

string = """
                                                                                MODIFIED
CORE SERVER        ACTIVE                         PASSIVE                       PACKAGES
------------------ ------------------------------ ----------------------------- --------
cs010              1.9.2.0-2+auto166              1.9.2.0-2+auto146             no
"""
# Keep only the header text in front of the first hyphen of the separator row.
string = string.split("-")[0]
# Split on line breaks or on runs of two or more whitespace characters,
# so that "CORE SERVER" (single space) stays in one piece.
col_names = {name: [] for name in re.split(r"\n|\s{2,}", string)
             if name != ""}
df = pd.DataFrame(col_names)
print(col_names)
print(df)
# with output below:
{'MODIFIED': [], 'CORE SERVER': [], 'ACTIVE': [], 'PASSIVE': [], 'PACKAGES': []}
Empty DataFrame
Columns: [MODIFIED, CORE SERVER, ACTIVE, PASSIVE, PACKAGES]
Index: []
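See the documentation for re.split if you want to use regular expressions.
Since your header can wrap onto a second row, but the hyphens appear to indicate the column widths, you could use something like: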
import re
import pandas as pd

string = """
                                                                                MODIFIED
CORE SERVER        ACTIVE                         PASSIVE                       PACKAGES
------------------ ------------------------------ ----------------------------- --------
cs010              1.9.2.0-2+auto166              1.9.2.0-2+auto146             no
"""
rows = [row for row in re.split(r"\n|\r", string)]
for row in rows:
    if "---" in row:
        # get all of the splits below columns
        indices = [i for i, j in enumerate(row) if j.isspace()]
        # After you find the column width stop checking rows.
        break
indices.insert(0, 0)
matrix = []
for row in rows:
    # from your output, hyphens show where headers stop
    if "---" in row:
        break
    # cut every header row at the column boundaries
    matrix.append([row[i:j] for i, j in zip(indices, indices[1:] + [None])])
n = len(indices)
col_names = [""] * n
for i in range(n):
    # join the pieces of column i from all header rows
    for row in matrix:
        col_names[i] += row[i]
    col_names[i] = col_names[i].strip()
df = pd.DataFrame(columns=[c for c in col_names if c != ''])
print(df)
# with output:
Empty DataFrame
Columns: [CORE SERVER, ACTIVE, PASSIVE, MODIFIED PACKAGES]
Index: []
This code is not the most efficient ever written, but it gets the job done and doesn't require many extra functions.
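The same idea, using the dash row as a ruler, can also be used to load the data rows and not only the header. The following is a minimal sketch, not the answerer's code: it assumes every column is underlined by one unbroken run of hyphens, and the names `WIDTHS`, `make_row` and `parse_cli_table` are made up for this illustration. `pandas.read_fwf` with explicit `colspecs` does the fixed-width parsing.

```python
import io
import re

import pandas as pd

# Column widths copied from the dash row in the question.
WIDTHS = (18, 30, 29, 8)


def make_row(*cells):
    """Hypothetical helper: pad each cell to its column width, join with one space."""
    return " ".join(cell.ljust(width) for cell, width in zip(cells, WIDTHS)).rstrip()


# Rebuild the sample output from the question.
text = "\n".join([
    make_row("", "", "", "MODIFIED"),
    make_row("CORE SERVER", "ACTIVE", "PASSIVE", "PACKAGES"),
    make_row(*("-" * width for width in WIDTHS)),
    make_row("cs010", "1.9.2.0-2+auto166", "1.9.2.0-2+auto146", "no"),
])


def parse_cli_table(text):
    """Derive column spans from the dash row, then parse the rows below it."""
    lines = text.splitlines()
    # Index of the separator row (assumed to start with hyphens).
    sep_idx = next(i for i, line in enumerate(lines) if line.startswith("---"))
    # Each run of hyphens gives the (start, end) span of one column.
    spans = [m.span() for m in re.finditer(r"-+", lines[sep_idx])]
    # Join the header fragments found inside each span across all header rows.
    names = [
        " ".join(part for part in (line[a:b].strip() for line in lines[:sep_idx]) if part)
        for a, b in spans
    ]
    body = "\n".join(lines[sep_idx + 1:])
    return pd.read_fwf(io.StringIO(body), colspecs=spans, header=None,
                       names=names, dtype=str)


df = parse_cli_table(text)
print(list(df.columns))
# ['CORE SERVER', 'ACTIVE', 'PASSIVE', 'MODIFIED PACKAGES']
```

A value wider than its run of hyphens would be cut off by this approach, so it only fits output where the dash row really bounds each column.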
OK, I wrote my own function to generate the column keys, and it works:
import re


def auto_generate_dataframe_columns(output, raw_separator="---", col_separator=r'\s{2,}'):
    """Automatically generate dataframe columns.
    :param output: output to generate the columns from
    :param raw_separator: marker of the row that separates the keys from the data
    :param col_separator: regex separating the columns (default: two or more whitespace characters)
    :return: list with the generated keys on success, otherwise None
    """
    if output is None:
        return None
    keys_lines_list = []
    pattern = re.compile(col_separator)
    for line in output.splitlines():
        # Stop at the separator row: everything above it is header.
        if raw_separator in line:
            break
        curr_line = line.lstrip()
        curr_line = pattern.split(curr_line)
        if not is_line_empty(curr_line):
            curr_line = [x for x in curr_line if x]
            keys_lines_list = merge_two_lists(keys_lines_list, curr_line)
    return keys_lines_list


def merge_two_lists(list1, list2):
    """Merge two lists into one, pairing their items from the right.
    :param list1: first list
    :param list2: second list
    :return: merged list
    """
    # Length of the longer list.
    max_len = max(len(list1), len(list2))
    rev_new_list = []
    for index in range(max_len):
        index1 = len(list1) - index - 1
        index2 = len(list2) - index - 1
        if index1 < 0:
            rev_new_list.append(list2[index2])
        elif index2 < 0:
            rev_new_list.append(list1[index1])
        else:
            rev_new_list.append(list1[index1] + " " + list2[index2])
    return rev_new_list[::-1]


def is_line_empty(line):
    """Check whether a split line is empty.
    :param line: list of pieces produced by splitting a line
    :return: True if the line is empty, otherwise False
    """
    return not line or (len(line) == 1 and line[0] == "")
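The right-aligned merge used above can be written more compactly. This is a sketch under the same assumption the function makes, namely that a wrapped header word belongs to the rightmost columns; that holds for the sample output but is not guaranteed for every CLI tool. The names `merge_right_aligned` and `header_keys` are made up for this illustration, and the sample text is a shortened stand-in for the question's output.

```python
import re
from itertools import zip_longest


def merge_right_aligned(upper, lower):
    """Pair items starting from the right; join each pair with a space."""
    pairs = zip_longest(reversed(upper), reversed(lower), fillvalue="")
    merged = [" ".join(part for part in pair if part) for pair in pairs]
    return merged[::-1]


def header_keys(output, row_marker="---", col_sep=r"\s{2,}"):
    """Collect column keys from every line above the separator row."""
    keys = []
    for line in output.splitlines():
        if row_marker in line:
            break
        # Split on the column separator and drop empty pieces.
        cells = [cell for cell in re.split(col_sep, line.strip()) if cell]
        if cells:
            keys = merge_right_aligned(keys, cells)
    return keys


sample = "\n".join([
    "                                    MODIFIED",
    "CORE SERVER   ACTIVE   PASSIVE   PACKAGES",
    "-----------   ------   -------   --------",
    "cs010   1.9.2.0-2+auto166   1.9.2.0-2+auto146   no",
])
print(header_keys(sample))
# ['CORE SERVER', 'ACTIVE', 'PASSIVE', 'MODIFIED PACKAGES']
```

Because this version only looks at the order of the header words, it does not need the columns to be aligned, but it cannot tell which column a wrapped word sits above.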
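Have you tried the pandas read_csv method?
You should give sample code showing what the input might look like. Also, if you have already tried something, you should post that too.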
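For reference, the read_csv suggestion could look roughly like the sketch below once the column names are known. It assumes the fields are separated by at least two whitespace characters and that no value contains such a gap; the `names` list is the expected result from the question, typed in by hand here.

```python
import io

import pandas as pd

# Sample output from the question (exact alignment does not matter here).
raw = "\n".join([
    " " * 80 + "MODIFIED",
    "CORE SERVER        ACTIVE               PASSIVE              PACKAGES",
    "------------------ -------------------- -------------------- --------",
    "cs010              1.9.2.0-2+auto166    1.9.2.0-2+auto146    no",
])

# Column names, assumed to be known or generated beforehand.
names = ["CORE SERVER", "ACTIVE", "PASSIVE", "MODIFIED PACKAGES"]

# Everything up to and including the dash row is header and gets skipped.
lines = raw.splitlines()
sep_idx = next(i for i, line in enumerate(lines) if line.startswith("---"))

df = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s{2,}",      # regex separators need the python engine
    engine="python",
    skiprows=sep_idx + 1,
    header=None,
    names=names,
    dtype=str,
)
print(df.loc[0, "MODIFIED PACKAGES"])
# no
```

read_csv alone does not solve the two-row header, which is why the names are passed in explicitly.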