Python: Rearranging row values

python pandas csv


I have a CSV file:

1 , name , 1012B-Amazon , 2044C-Flipcart , Bosh27-Walmart
2 , name , Kelvi20-Flipcart, LG-Walmart   
3,  name , Kenstar-Walmart, Sony-Amazon , Kenstar-Flipcart
4, name ,  LG18-Walmart, Bravia-Amazon

I need each row rearranged by website, i.e. by the part after the "-":

1, name , 1012B-Amazon , 2044C-Flipcart , Bosh27-Walmart
2, name ,              , Kelvi20-Flipcart, LG-Walmart
3, name , Sony-Amazon,  Kenstar-Flipcart ,Kenstar-Walmart
4, name , Bravia-Amazon,                 ,LG18-Walmart

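Is this possible with pandas? That is: find where a string occurs and rearrange it, iterate over all rows, and repeat this for the next string? I went through the documentation for Series.str.contains and str.extract but could not find a solution.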

Use sorted with a key:

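# Sort each row alphabetically; the key (y == '', y) sorts empty strings last.
# result_type='broadcast' keeps the original shape and column labels.
df.iloc[:, 1:].apply(lambda x: sorted(x, key=lambda y: (y == '', y)), axis=1, result_type='broadcast')
     2    3    4    5
1  ABC  DEF  GHI  JKL
2  ABC  DEF  GHI     
3  ABC  DEF  GHI  JKL
# To write the result back into df:
#df.iloc[:, 1:] = df.iloc[:, 1:].apply(lambda x: sorted(x, key=lambda y: (y == '', y)), axis=1, result_type='broadcast')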
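To see why this key works, here is a minimal pandas-free sketch. The helper name sort_empties_last and the sample rows are illustrative assumptions, not part of the answer. Python compares the key tuples element by element, and False sorts before True, so non-empty values come first in alphabetical order and empty strings go last.

```python
def sort_empties_last(values):
    """Sort strings alphabetically, pushing empty strings to the end."""
    # (v == '', v): non-empty values get (False, v), empty ones get (True, '')
    return sorted(values, key=lambda v: (v == '', v))


# Illustrative rows in the shape of the abc data
rows = [
    ['ABC', 'GHI', 'DEF', 'JKL'],
    ['GHI', 'DEF', 'ABC', ''],
]

sorted_rows = [sort_empties_last(row) for row in rows]
for row in sorted_rows:
    print(row)
```

This only sorts within a row. It does not keep a given value in the same column across rows when some other value is missing from a row.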

Since you mentioned reindex, I think get_dummies will do the job:

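# One indicator column per distinct value, named after the value itself
s = pd.get_dummies(df.iloc[:, 1:], prefix='', prefix_sep='')
# Drop the indicator column for the empty string
s = s.drop(columns='')
# Multiply each indicator by its column name and write the values back
df.iloc[:, 1:] = s.mul(s.columns).values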
df
      1    2    3    4    5
1  name  ABC  DEF  GHI  JKL
2  name  ABC  DEF  GHI     
3  name  ABC  DEF  GHI  JKL

Assuming the empty values are np.nan:

# Fill in the empty values with some string to allow sorting
df.fillna('NaN', inplace=True)

# Flatten the dataframe, do the sorting and reshape back to a dataframe
pd.DataFrame(list(map(sorted, df.values)))

Update

Given the update to the question and the following sample data:

df = pd.DataFrame({'name': ['name1', 'name2', 'name3', 'name4'],
                   'b': ['1012B-Amazon', 'Kelvi20-Flipcart', 'Kenstar-Walmart', 'LG18-Walmart'],
                   'c': ['2044C-Flipcart', 'LG-Walmart', 'Sony-Amazon', 'Bravia-Amazon'],
                   'd': ['Bosh27-Walmart', np.nan, 'Kenstar-Flipcart', np.nan]})

one possible solution is:

def foo(df, retailer):

    # Keep only the cells that contain the name of the retailer and blank out
    # the rest. na=False treats missing values as "no match".
    mask = df.where(df.apply(lambda x: x.str.contains(retailer, na=False)), '')

    # Squash the resulting mask into a series
    col = mask.max(skipna=True, axis=1)

    # Optional: trim the name of the retailer (literal match, not a regex)
    col = col.str.replace(f'-{retailer}', '', regex=False)
    return col

which results in:

    name  Amazon  Walmart Flipcart
0  name1   1012B   Bosh27    2044C
1  name2               LG  Kelvi20
2  name3    Sony  Kenstar  Kenstar
3  name4  Bravia     LG18
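The answer does not show how the per-retailer columns are assembled into this table. One way to do it is sketched below, under stated assumptions: the helper items_for is a hypothetical stand-in that mirrors the idea of foo with plain string checks, and the retailer list is hard-coded in the column order of the table above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['name1', 'name2', 'name3', 'name4'],
                   'b': ['1012B-Amazon', 'Kelvi20-Flipcart', 'Kenstar-Walmart', 'LG18-Walmart'],
                   'c': ['2044C-Flipcart', 'LG-Walmart', 'Sony-Amazon', 'Bravia-Amazon'],
                   'd': ['Bosh27-Walmart', np.nan, 'Kenstar-Flipcart', np.nan]})


def items_for(df, retailer):
    """Hypothetical helper: per row, the item sold by `retailer`, or ''."""
    suffix = f'-{retailer}'
    out = []
    for _, row in df.drop(columns='name').iterrows():
        # Ignore NaN cells; keep only strings that end with "-<retailer>"
        hits = [v for v in row if isinstance(v, str) and v.endswith(suffix)]
        # Strip the "-<retailer>" suffix from the first hit, if there is one
        out.append(hits[0][:-len(suffix)] if hits else '')
    return pd.Series(out, index=df.index, name=retailer)


# Assumed column order, taken from the table above
retailers = ['Amazon', 'Walmart', 'Flipcart']
result = pd.concat([df['name']] + [items_for(df, r) for r in retailers], axis=1)
print(result)
```

The sketch assumes each row holds at most one item per retailer; if a row had two, only the first would be kept.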

Edit after the question was updated:

This is the abc CSV:

1,name,ABC,GHI,DEF,JKL
2,name,GHI,DEF,ABC,
3,name,JKL,GHI,ABC,DEF

This is the company CSV (look closely at the commas; the data itself is listed further down this page):

This is the code:

import pandas as pd
import numpy as np


#This solution assumes that no non-empty value is repeated
#within a row. If that is not the case for your data, transform the
#values first so that the non-empty values are unique within each row.

#"get_company" returns the company if the value is non-empty and
#NaN if the value was empty to begin with:
def get_company(company_item):
    if pd.isnull(company_item):
        return np.nan
    else:
        company=company_item.split('-')[-1]
        return company

#"define_sort_order" builds a template (the model row) that reorder_data_rows later
#uses to sort every row. With by_largest_row = False the template is derived from all
#non-empty values in the matrix. With by_largest_row = True the single largest row
#serves as the template for all other rows. Both options give the same result when
#every row is a subset of the largest row, i.e. every element of every
#other row can be found in the largest row.

#They differ when some rows contain elements that the largest row lacks:
#the first option creates a column for every distinct element, while
#the second simply excludes elements that are not in the largest row.

#If value_filtering_function is given, the template is built from the transformed
#values rather than from the original values of the dataset.

def define_sort_order(data,by_largest_row = False,value_filtering_function = None):
    if not by_largest_row: 
        if value_filtering_function:
            data = data.applymap(value_filtering_function)
        #data.values returns a numpy array                 
        #with rows and columns. .flatten()
        #puts all elements in a 1 dim array
        #set gets all unique values in the array
        filtered_values = list(set((data.values.flatten())))
        filtered_values = [data_value for data_value in filtered_values if not_empty(data_value)]
        #sorted returns a list, even with np.arrays as inputs

        model_row = sorted(filtered_values)
    else:
        if value_filtering_function:
            data = data.applymap(value_filtering_function)
        row_lengths = data.apply(lambda data_row: data_row.notnull().sum(),axis = 1)
        #locates the index label of the row with the most non-empty elements:
        model_row_idx = row_lengths.idxmax()
        #filter out empty values first (NaN cannot be compared with strings),
        #then sort the row with the most values:
        filtered_values = [data_value for data_value in set(data.loc[model_row_idx]) if not_empty(data_value)]

        model_row = sorted(filtered_values)

    return model_row

#"not_empty" is used in the above function to filter the model row so that
#no empty elements remain
def not_empty(value):
    return pd.notnull(value) and value not in ['','  ',None]

#Places every element of a data row at the position of its match in the model row.
#Elements of the model row that are missing from the current data_row are replaced with np.nan

def reorder_data_rows(data_row,model_row,check_by_function=None):
    #When check_by_function is given, apply the same function that was used to build
    #the model_row: each value is transformed temporarily to determine whether the
    #transformed value is in the model row. If so, the original value is placed at
    #that position.
    if check_by_function: 
        sorted_data_row = [np.nan]*len(model_row) #creating an empty vector that is the
                          #same length as the template, or model_row

        data_row = [value for value in data_row.values if not_empty(value)]

        for value in data_row:
            value_lookup = check_by_function(value)
            if value_lookup in model_row:
                idx = model_row.index(value_lookup)
                #placing company items in their respective row positions as indicated by
                #the model_row
                sorted_data_row[idx] = value    
    else:
        sorted_data_row = [value if value in data_row.values else np.nan for value in model_row]
    return pd.Series(sorted_data_row)

##################### ABC ######################
#Reading the data:
#pandas treats the first row as the header unless header = None is passed.
#Note: "name" and the 1,2,3 columns are not in the index.
abc = pd.read_csv("abc.csv",header = None,index_col = None)
# Returns a sorted, non-empty list. If you hard-code the order you want,
# you can pass that list as model_row directly and skip
# every function except reorder_data_rows.
model_row = define_sort_order(abc.iloc[:,2:],False)

#applying the "reorder_data_rows" function we created earlier to each row before saving back into
#the original dataframe
#lambda allows us to create our own function without giving it a name.
#it is useful here because reorder_data_rows takes two inputs


abc.iloc[:,2:] = abc.iloc[:,2:].apply(lambda abc_row: reorder_data_rows(abc_row,model_row),axis = 1).values

#Saving to a new csv that won't include the pandas created indices (0,1,2)
#or columns names (0,1,2,3,4):

abc.to_csv("sorted_abc.csv",header = False,index = False)
################################################


################## COMPANY #####################
company = pd.read_csv("company.csv",header=None,index_col=None)

model_row = define_sort_order(company.iloc[:,2:],by_largest_row = False,value_filtering_function=get_company)
#the only thing that changes here is that we tell the sort function what specific
#criteria to use to reorder each row by. We're using the result from the
#get_company function to do so. The custom function get_company, takes an input
#such as Kenstar-Walmart, and outputs Walmart (what's after the "-").
#we would then sort by the resulting list of companies. 

#Because we used the define_sort_order function to retrieve companies rather than company items in order,
#We need to use the same function to reorder each element in the DataFrame
company.iloc[:,2:] = company.iloc[:,2:].apply(lambda companies_row: reorder_data_rows(companies_row,model_row,check_by_function=get_company),axis=1).values
company.to_csv("sorted_company.csv",header = False,index = False)
#################################################

Here is the first result, the sorted abc.csv:

1  name  ABC  DEF  GHI  JKL
2  name  ABC  DEF  GHI  NaN
3  name  ABC  DEF  GHI  JKL
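The template idea behind define_sort_order and reorder_data_rows can be sketched without pandas. The function name align_to_template and the sample rows are assumptions for illustration; the second row is the "2,name,DEF,GHI,JKL" case raised in the comments, where plain sorting would shift values into the wrong columns.

```python
def align_to_template(rows, n_fixed=2):
    """Place each value in the column of its match in a shared sorted template.

    rows    : list of lists of strings, '' marks an empty cell
    n_fixed : number of leading columns (id, name) that are left untouched
    """
    # Non-empty values of each row, ignoring the fixed leading columns
    values = [[v for v in row[n_fixed:] if v] for row in rows]
    # Template: every distinct value across all rows, sorted alphabetically
    template = sorted({v for row_values in values for v in row_values})
    # For each row, keep a template value if the row has it, else leave ''
    aligned = [
        row[:n_fixed] + [v if v in row_values else '' for v in template]
        for row, row_values in zip(rows, values)
    ]
    return template, aligned


rows = [
    ['1', 'name', 'ABC', 'GHI', 'DEF', 'JKL'],
    ['2', 'name', 'DEF', 'GHI', 'JKL', ''],
    ['3', 'name', 'JKL', 'GHI', 'ABC', 'DEF'],
]

template, aligned = align_to_template(rows)
for row in aligned:
    print(','.join(row))
```

Because the gap is filled per template position, the second row keeps its empty cell in the ABC column instead of at the end.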

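After adapting the code to the follow-up format in the question, I ran the script on the company CSV as well; the sorted company output is listed further down this page.

I hope this helps.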

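Comments:

Do you already have a dataframe?

No, I added csv, pandas and numpy and read the file into a DF.

Possible duplicate of:

@YOLO Sorry, I don't want them rearranged by column; I want the data rearranged within each row.

In your second row you only have 5 columns. Is that a formatting issue, or is the last column an empty string?

I get the error TypeError: ("

@AnoopD It looks like you did not run df=df.fillna('') first.

df1=df.iloc[:,1:].apply(lambda x: sorted(x,key=lambda y: (y=='',y)),1) gives

`df1`

0 [ABC, DEF, ]

dtype: object

ValueError: could not broadcast input array from shape (2,5) into shape (2,3)

I used the first df.

@AnoopD It works for me. Maybe try preparing your dataframe in pandas and showing us the dataframe?

Sorry, this will not work. The data given is only dummy data and has no sortable order. What I need is to use regex: I have to find the occurrence in each row and reorder accordingly.

If it is not alphabetical order, what is the rule by which you want to rearrange your data?

I think pandas Series.str.contains would work, but I'm not sure...

Can you be more specific? Find the occurrence of what in the regex? How exactly should the rows be reordered?

1, W/M, 1012B-Amazon, 2044C-Flipcart, Bosh27-Walmart

2, R/F, Kelvi20-Flipcart, LG-Walmart

3, E/O, Kenstar-Walmart, Sony-Amazon, Kenstar-Flipcart

I need these reordered as

1, W/M, 1012B-Amazon, 2044C-Flipcart, Bosh27-Walmart

2, R/F, , Kelvi20-Flipcart, LG-Walmart

3, E/O, Sony-Amazon, Kenstar-Flipcart, Kenstar-Walmart

Thanks for the solution. It works for the exact case, but if you change the second row to

2,name,DEF,GHI,JKL

it will not work.

It has now been modified. I hope this works for you.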

company.csv (the company CSV referenced in the answer):

1,name,1012B-Amazon,2044C-Flipcart,Bosh27-Walmart
2,name,Kelvi20-Flipcart,LG-Walmart,
3,name,Kenstar-Walmart,Sony-Amazon,Kenstar-Flipcart
4,name,LG18-Walmart,Bravia-Amazon,

Result of running the script on the company CSV (sorted_company.csv):

1  name    1012B-Amazon    2044C-Flipcart   Bosh27-Walmart
2  name             NaN  Kelvi20-Flipcart       LG-Walmart
3  name     Sony-Amazon  Kenstar-Flipcart  Kenstar-Walmart
4  name   Bravia-Amazon               NaN     LG18-Walmart
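For comparison, the same company alignment can be sketched with only the standard library. This is an illustration under assumptions, not the answer's code: company_of and align_by_company are hypothetical names, the CSV text is inlined instead of read from company.csv, and each row is assumed to hold at most one item per company.

```python
import csv
import io

# Inlined copy of the company CSV shown in the answer
RAW = """1,name,1012B-Amazon,2044C-Flipcart,Bosh27-Walmart
2,name,Kelvi20-Flipcart,LG-Walmart,
3,name,Kenstar-Walmart,Sony-Amazon,Kenstar-Flipcart
4,name,LG18-Walmart,Bravia-Amazon,
"""


def company_of(item):
    """Return the part after the last '-', e.g. 'Kenstar-Walmart' -> 'Walmart'."""
    return item.rsplit('-', 1)[-1].strip()


def align_by_company(rows, n_fixed=2):
    """Put every item in the column of its company; missing companies stay ''."""
    items = [[c.strip() for c in row[n_fixed:] if c.strip()] for row in rows]
    # Template: all companies seen anywhere in the data, sorted alphabetically
    template = sorted({company_of(i) for row_items in items for i in row_items})
    aligned = []
    for row, row_items in zip(rows, items):
        slots = dict.fromkeys(template, '')
        for item in row_items:
            # A second item for the same company would overwrite the first
            slots[company_of(item)] = item
        aligned.append([c.strip() for c in row[:n_fixed]] + [slots[c] for c in template])
    return template, aligned


rows = list(csv.reader(io.StringIO(RAW)))
template, aligned = align_by_company(rows)
for row in aligned:
    print(','.join(row))
```

The empty cells come out as empty strings here rather than NaN, which keeps the written CSV in the shape the question asked for.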