Python: how to access collections.Counter elements stored in a DataFrame column for use in CountVectorizer

One column in my DataFrame has the following format:

Row 1 :
Counter({'First': 3, 'record': 2})
Row 2 :
Counter({'Second': 2, 'record': 1})

I want to create a new column with the following values:

Row 1 :
First First First record record
Row 2 : 
Second Second record
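If the column really holds collections.Counter objects (not their string form), the standard library can do the repetition directly. The following is a minimal sketch that assumes Python 3.7+, so that Counter.elements() yields keys in first-insertion order. The helper name counter_to_text is hypothetical:

```python
from collections import Counter


def counter_to_text(counts: Counter) -> str:
    """Repeat each key by its count and join the words with single spaces."""
    # Counter.elements() yields each key as many times as its count,
    # in first-insertion order, and skips keys whose count is < 1.
    return " ".join(counts.elements())


rows = [
    Counter({"First": 3, "record": 2}),
    Counter({"Second": 2, "record": 1}),
]
for row in rows:
    print(counter_to_text(row))
# First First First record record
# Second Second record
```

With pandas this would be applied as df['new'] = df['col'].apply(counter_to_text), assuming df['col'] holds Counter objects rather than strings.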

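Use apply and iterate over the items of each Counter. Repeat each key by its count and join those repeats with spaces, then join the results together with spaces: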
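import ast

# Only needed if the column holds strings such as "Counter({'First': 3, 'record': 2})":
# extract the dict literal between the parentheses and parse it into a real dict
df['col'] = df['col'].str.extract(r'\((.+)\)', expand=False).apply(ast.literal_eval)

# repeat each key by its count, then join everything with spaces
df['new'] = df['col'].apply(lambda x: ' '.join(' '.join([k] * v) for k, v in x.items()))
print(df)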
                          col                              new
0   {'First': 3, 'record': 2}  First First First record record
1  {'Second': 2, 'record': 1}             Second Second record

Or with a list comprehension:
df['new'] = [' '.join(' '.join([k] * v) for k, v in x.items()) for x in df['col']]
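The AttributeError reported in the comments below occurs when the column holds the text "Counter({...})" rather than a Counter object. That is why the answer parses the values with ast.literal_eval first. The following is a stdlib-only sketch that accepts either form. It assumes the strings look exactly like Counter's repr, and the names to_counter, to_text and COUNTER_REPR are hypothetical:

```python
import ast
import re
from collections import Counter
from collections.abc import Mapping

# Hypothetical pattern for the text form "Counter({...})"; also accepts "Counter()".
COUNTER_REPR = re.compile(r"\s*Counter\((\{.*\})?\)\s*", re.DOTALL)


def to_counter(value) -> Counter:
    """Return a Counter for either a mapping or its "Counter({...})" string."""
    if isinstance(value, Mapping):  # already a Counter/dict: nothing to parse
        return Counter(value)
    match = COUNTER_REPR.fullmatch(value)
    if match is None:
        raise ValueError(f"unexpected value: {value!r}")
    if match.group(1) is None:  # "Counter()" has no dict literal
        return Counter()
    # Unlike eval(), literal_eval only evaluates Python literals such as dicts.
    return Counter(ast.literal_eval(match.group(1)))


def to_text(value) -> str:
    """Repeat each key by its count and join with spaces."""
    return " ".join(to_counter(value).elements())


print(to_text("Counter({'First': 3, 'record': 2})"))  # First First First record record
print(to_text(Counter({"Second": 2, "record": 1})))    # Second Second record
```

Applied to a DataFrame, this would be df['new'] = df['col'].apply(to_text), and it works whether the column holds strings or Counters.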

I was able to solve this myself with the code below. It relies heavily on regular expressions:
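import re

def transform_word_count(text):
    # words are the quoted keys; counts are the integers after each colon
    words = re.findall(r"'(.+?)'", text)
    counts = re.findall(r":\s*(\d+)", text)  # \d+ keeps counts >= 10 whole
    result = []
    for word, count in zip(words, counts):
        result.extend([word] * int(count))
    return result  # a list; use ' '.join(result) if a string is needed

df['new'] = df['old'].apply(transform_word_count)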
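Since the end goal is CountVectorizer, note that each Counter already holds the per-row term counts that CountVectorizer would compute. The following sketch builds a comparable document-term matrix straight from the Counters. It uses a hypothetical document_term_matrix helper and only the standard library. It mirrors CountVectorizer's default lowercasing and its alphabetical column order. It does not reproduce its tokenization (the default token_pattern ignores one-character tokens) or its sparse output:

```python
from collections import Counter


def document_term_matrix(rows, lowercase=True):
    """Build (vocabulary, matrix) from per-row term counts.

    Columns are sorted alphabetically, like CountVectorizer's learned
    vocabulary; `lowercase` mimics its default lowercase=True.
    """
    normalised = []
    for counts in rows:
        merged = Counter()
        for term, n in counts.items():
            # merge keys that differ only by case when lowercasing
            merged[term.lower() if lowercase else term] += n
        normalised.append(merged)
    vocabulary = sorted(set().union(*normalised))
    # a Counter returns 0 for missing terms, which fills the gaps
    matrix = [[row[term] for term in vocabulary] for row in normalised]
    return vocabulary, matrix


vocab, matrix = document_term_matrix([
    Counter({"First": 3, "record": 2}),
    Counter({"Second": 2, "record": 1}),
])
print(vocab)   # ['first', 'record', 'second']
print(matrix)  # [[3, 2, 0], [0, 1, 2]]
```

If scikit-learn is available, sklearn.feature_extraction.DictVectorizer accepts a list of mappings such as these Counters directly and returns a sparse count matrix, so there is no need to rebuild strings. Unlike CountVectorizer, it does not lowercase the keys.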

I edited your question and tags; IMO it has nothing to do with regex. Where does this data come from? Maybe the better approach is to solve the problem upstream.

Thanks for your help, but I get the following error: AttributeError: 'str' object has no attribute 'items'. I think this is because, although my column looks like {...}, it is stored as a string rather than as a dictionary. Please let me know if you have any update.

@Vincent - use df['col'] = df['col'].str.extract(r'\((.+)\)', expand=False).apply(ast.literal_eval) - answer edited.