Python: adding new DataFrame columns to the same DataFrame in pandas

python pandas dataframe

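Problem: I am getting a SettingWithCopyWarning:

A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

Goal: split the data in one column into separate columns, all within the same DataFrame.

Input: a DataFrame with two columns. The first column holds email addresses; the second holds a semicolon-separated list of dates.

Code: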

for dt in lunch_dates:
    roulette_data[dt] = roulette_data['date'].str.contains(dt).map(bool_conversion)

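What I expect this code to do (and it does): add a new column for each date (dt) found in the original 'date' column.

Question: how can I use iloc in this case to make sure I am not working on a possible copy of the DataFrame in memory?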
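As background to the question, this warning typically appears when the frame being assigned to was itself produced by filtering or slicing another frame. The following is a minimal sketch under that assumption, not a diagnosis of the asker's actual code: the frame raw, its contents and the filter are hypothetical, and bool_conversion is omitted because it is not shown in the question. Taking an explicit .copy() makes roulette_data an independent frame, so plain column assignment is unambiguous.

```python
import pandas as pd

# Hypothetical data standing in for the asker's frame (all values are assumptions).
raw = pd.DataFrame({
    "email": ["a@example.com", "b@example.com", "c@example.com"],
    "date": ["2017-01-05;2017-01-12", "2017-01-12", ""],
})

# Filtering produces a new object that pandas may treat as a slice of `raw`;
# adding columns to it is what typically triggers the warning.
# An explicit copy makes roulette_data independent of `raw`.
roulette_data = raw[raw["date"] != ""].copy()

lunch_dates = ["2017-01-05", "2017-01-12"]
for dt in lunch_dates:
    # regex=False treats the date as a literal substring, not a pattern.
    roulette_data[dt] = roulette_data["date"].str.contains(dt, regex=False)

print(roulette_data)
```

The original frame raw is left untouched; only the copy gains the new columns.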

I can't test your example without data, but the following should work (replace 'email_column_name' with the name of your email column):

Here is a toy example. We first set the index to col1, then select col2 so that we can use its .str.split method to split each row into individual words:

df.set_index('col1')['col2'].str.split(expand=True)
#            0     1     2     3       4
#col1                                   
#record1  this    is  good  text    None
#record2   but  this    is  even  better
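The toy frame itself is not shown above. A minimal sketch that reconstructs it from the printed output (the exact construction is an assumption) makes the step reproducible:

```python
import pandas as pd

# Toy frame reconstructed from the output shown above (an assumption).
df = pd.DataFrame({
    "col1": ["record1", "record2"],
    "col2": ["this is good text", "but this is even better"],
})

# One column per word; shorter rows are padded with missing values.
words = df.set_index("col1")["col2"].str.split(expand=True)
print(words)
```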

We then use stack to reshape the result and reset_index to drop the unneeded index level:

df.set_index('col1')['col2'].str.split(expand=True)\
            .stack().reset_index(level=1, drop=True) 
#col1
#record1      this
#record1        is
#record1      good
#record1      text
#record2       but
#record2      this
#record2        is
#record2      even
#record2    better
#dtype: object
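As an alternative to this step, not the approach used in this answer, Series.explode (available since pandas 0.25) should give the same long format without the intermediate wide frame. A sketch, using the same assumed toy frame:

```python
import pandas as pd

# Toy frame reconstructed from the output shown above (an assumption).
df = pd.DataFrame({
    "col1": ["record1", "record2"],
    "col2": ["this is good text", "but this is even better"],
})

# str.split() without expand gives one list of words per row; explode()
# then emits one row per list element and repeats the col1 label for each.
long_words = df.set_index("col1")["col2"].str.split().explode()
print(long_words)
```

Because no padding values are created, there is nothing for stack to drop, which keeps the result independent of how a given pandas version handles missing values when stacking.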

We wrap the whole expression in pd.get_dummies():

Final result: finally, we call reset_index (which turns col1, i.e. the email column, back into a regular column), groupby col1, and sum:

pd.get_dummies(
               df.set_index('col1')['col2']\
               .str.split(expand=True)\
               .stack().reset_index(level=1, drop=True)
              )\
              .reset_index().groupby('col1').sum()
#         better  but  even  good  is  text  this
#col1                                            
#record1       0    0     0     1   1     1     1
#record2       1    1     1     0   1     0     1
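For the input described in the question (a semicolon-separated list of dates per email), the same indicator columns can probably be obtained more directly with Series.str.get_dummies, which splits on a separator and builds the 0/1 columns in one call. This is a sketch, not the answer's own method; the frame contents below are hypothetical, and only the column name 'date' comes from the question.

```python
import pandas as pd

# Hypothetical input shaped like the one described in the question.
roulette_data = pd.DataFrame({
    "email": ["a@example.com", "b@example.com"],
    "date": ["2017-01-05;2017-01-12", "2017-01-12;2017-01-19"],
})

# One 0/1 indicator column per distinct date found in the 'date' column.
indicators = roulette_data["date"].str.get_dummies(sep=";")

# join() returns a new frame aligned on the index, instead of assigning
# columns one by one to a frame that might be a slice of another.
result = roulette_data.join(indicators)
print(result)
```

Since join returns a new DataFrame rather than writing into roulette_data, this route also sidesteps the assignment that raised the warning.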

Can you share some of your data and the expected result?

Intermediate result of the pd.get_dummies() step, with one row per word before grouping:

pd.get_dummies(df.set_index('col1')['col2'].str.split(expand=True).stack().reset_index(level=1, drop=True))
#         better  but  even  good  is  text  this
#col1                                            
#record1       0    0     0     0   0     0     1
#record1       0    0     0     0   1     0     0
#record1       0    0     0     1   0     0     0
#record1       0    0     0     0   0     1     0
#record2       0    1     0     0   0     0     0
#record2       0    0     0     0   0     0     1
#record2       0    0     0     0   1     0     0
#record2       0    0     1     0   0     0     0
#record2       1    0     0     0   0     0     0
