Python 3.x: Converting the string representation of a pandas MultiIndex back into a MultiIndex

Below I have the string representation of a MultiIndex:

import pandas as pd

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
df = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = str(df)  # df now holds the string form, not the MultiIndex itself

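I want to convert the string stored in df back into a pandas MultiIndex object. Is there a built-in pandas function that does this?
Expected output: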

print(df)
MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
           labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
           names=['first', 'second'])

Thanks in advance.
The string representation of a MultiIndex is almost executable code, so you can evaluate it with eval, like this:

eval(df, {}, {'MultiIndex': pd.MultiIndex})
# MultiIndex(levels=[[u'bar', u'baz', u'foo', u'qux'], [u'one', u'two']],
#            labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
#            names=[u'first', u'second'])
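Note that you should only do this if you control the string passed to eval, since a hostile string could crash your computer and/or run arbitrary code.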
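To make the warning concrete, here is a small sketch that does not need pandas. It shows that passing empty dictionaries to eval does not remove access to the builtins; the string `len('abc')` is only a harmless stand-in for untrusted input, and overriding `__builtins__` as shown should not be read as a real sandbox.

```python
# Stand-in for a string that came from somewhere you do not control.
untrusted = "len('abc')"

# Even with empty globals and locals, eval() can still reach the builtins,
# because Python adds __builtins__ to the globals dict when it is missing.
result_with_builtins = eval(untrusted, {}, {})

# Supplying an empty __builtins__ makes the name lookup fail instead.
# This narrows the attack surface but is NOT a reliable sandbox.
try:
    eval(untrusted, {"__builtins__": {}}, {})
    blocked = False
except NameError:
    blocked = True
```

The same mechanism that lets `len` through would let `__import__` through, which is why the answer restricts eval to strings you produced yourself.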
Alternatively, here is a safe, simple, but somewhat brittle way to do it:

import ast

# convert df into a string that defines a dictionary literal
# (df[11:-1] drops the leading "MultiIndex(" and the trailing ")")
dfd = (
    ("{" + df[11:-1] + "}")  # parenthesised so .replace() applies to the whole string
    .replace("levels=", "'levels':")
    .replace("labels=", "'labels':")
    .replace("names=", "'names':")
)
# convert it safely into an actual dictionary
args = ast.literal_eval(dfd)
# use the dictionary as keyword arguments to pd.MultiIndex
pd.MultiIndex(**args)
With this code, an arbitrary string cannot run code on your computer, because ast.literal_eval() only accepts literal expressions and never calls functions.
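The following sketch illustrates that behaviour with the standard library only. The helper name `is_safe_literal` is made up for this example; it simply reports whether ast.literal_eval accepts a string.

```python
import ast

def is_safe_literal(text):
    """Return True if ast.literal_eval accepts text, False if it rejects it."""
    try:
        ast.literal_eval(text)
        return True
    except (ValueError, SyntaxError):
        return False

# A plain dict literal is accepted.
accepted = is_safe_literal("{'names': ['first', 'second']}")

# Anything that needs a function call is rejected, harmful or not.
rejected_call = is_safe_literal("__import__('os').getcwd()")
rejected_kwargs = is_safe_literal("dict(names=['first', 'second'])")
```

The last case shows why the keyword arguments have to be rewritten from `name=` to `'name':` first: a call such as `dict(...)` or `MultiIndex(...)` is not a literal, so literal_eval refuses it.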
Here is a safe version that does not require the argument names to be known in advance, but it is more complex:

import ast, tokenize
from io import StringIO  # cStringIO on Python 2

tokens = [  # make a list of mutable tokens
    list(t)
    for t in tokenize.generate_tokens(StringIO('{' + df[11:-1] + '}').readline)
]
for t, next_t in zip(tokens[:-1], tokens[1:]):
    # convert `identifier=` to `'identifier':`
    if t[0] == tokenize.NAME and next_t[0] == tokenize.OP and next_t[1] == '=':
        t[0] = tokenize.STRING   # switch type to quoted string
        t[1] = "'" + t[1] + "'"  # put quotes around identifier
        next_t[1] = ':'          # convert '=' to ':'
args = ast.literal_eval(tokenize.untokenize(tokens))
pd.MultiIndex(**args)

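Note that this code will raise an exception if df is malformed or contains `identifier=...` as code at a lower level (that is, not inside a string). I don't think that will happen with str(MultiIndex), though. If it is a concern, you could build an ast tree for the original df string, extract the arguments, programmatically convert them into a literal definition of a dict ({x: y} rather than dict(x=y)), and then evaluate that with ast.literal_eval.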
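The ast-based idea from the last paragraph could look roughly like the sketch below. It is not code from the answer: the function name `parse_call_kwargs` is invented here, the sample string is hard-coded in the `levels=/labels=/names=` format shown in the question, and instead of rebuilding a dict literal it evaluates each keyword value separately with ast.literal_eval, which accepts AST nodes as well as strings.

```python
import ast

def parse_call_kwargs(text, expected_name="MultiIndex"):
    """Turn 'Name(key=literal, ...)' into a dict without executing any code."""
    call = ast.parse(text.strip(), mode="eval").body
    if not isinstance(call, ast.Call):
        raise ValueError("expected a single call expression")
    if not (isinstance(call.func, ast.Name) and call.func.id == expected_name):
        raise ValueError("unexpected callable name")
    if call.args:
        raise ValueError("positional arguments are not supported")
    kwargs = {}
    for kw in call.keywords:
        if kw.arg is None:  # a '**mapping' argument has no name
            raise ValueError("'**' unpacking is not supported")
        # literal_eval raises ValueError for anything that is not a literal
        kwargs[kw.arg] = ast.literal_eval(kw.value)
    return kwargs

# Sample input in the format printed in the question.
text = (
    "MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], ['one', 'two']],\n"
    "           labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],\n"
    "           names=['first', 'second'])"
)
kwargs = parse_call_kwargs(text)
# With pandas installed, pd.MultiIndex(**kwargs) would then rebuild the index.
```

As far as I know, pandas 0.24 and later call the `labels` argument `codes`, so that key may need renaming before the result is passed to pd.MultiIndex, and newer releases print a MultiIndex as a list of tuples, which this sketch does not handle.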
Can you show your expected output?
MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], ['one', 'two']], labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]], names=['first', 'second'])