Python: computing a function using data from files

python function numpy for-loop


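I have written a Python function for some calculations that use the following two datasets. I want to compute z for every value in data_2, using row 1, row 2, row 3, row 4, row 5 of data_1 in turn. I'm new to Python, so I tried to write the function myself, but I couldn't get it to work. Please help. Thanks.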

       data_1                                      data_2        
file    a    b    c    d                             x
file1  0.5  0.6  0.8  0.3                           0.5
file1  0.2  0.2  0.4  0.1                           0.8
file1  0.1  0.4  0.5  0.2                           0.9

My code is as follows:

import numpy as np
file1=np.loadtxt('data_1',skiprows=1,usecols=(1,2,3))
file2=np.loadtxt('data_2',skiprows=1,usecols=(0))

def calculation(a,b,c,x):
    z=(a+b+c)*x
    return z

for value in file2:
    print(value)
    calculation

My expected output should be:

   data_3                                            
file    a    b    c    d       z                          
file1  0.5  0.6  0.8  0.3      -
file1  0.5  0.6  0.8  0.3      -                     
file1  0.5  0.6  0.8  0.3      -                     
file1  0.2  0.2  0.4  0.1      -
file1  0.2  0.2  0.4  0.1      -                     
file1  0.2  0.2  0.4  0.1      -                        
file1  0.1  0.4  0.5  0.2      -
file1  0.1  0.4  0.5  0.2      -                       
file1  0.1  0.4  0.5  0.2      -

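Python is a dynamic language, and `numpy` tends to override the normal operators so that an operation applies to a whole collection of data. Generally, if you have a for loop, you aren't taking advantage of it.

`numpy` arrays can only hold one data type, but you have a string in column 0. `pandas` wraps `numpy` and makes multiple data types easier to deal with, so I switched to reading `pandas.DataFrame` objects instead of arrays.

It looks like you want the cartesian product of `file2["x"]` and the rows in `file1`. One way to do that is to create a dummy column with matching values in both dataframes and then merge. Use the `sum` method for `a+b+c`, then multiply by `x` to get the result.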

import pandas as pd

# read space separated tables
file1=pd.read_table('data_1', sep=r"\s+")
file2=pd.read_table('data_2', sep=r"\s+")

# we want (a+b+c)*x, for each value in file2["x"]. Do the sum, then
# use `merge` with a temporary key to create the cartesian product 
# with x. For each `x`, merge will create a row for each matching
# key and since all keys match, we've got a cartesian product.
# Finally, multiply.
file1["_tmpsums"] = file1[["a", "b", "c"]].sum(axis=1)
file1["_tmpmergekey"] = file2["_tmpmergekey"] = 1
file1 = pd.merge(file1, file2, on="_tmpmergekey")
file1["z"] = file1["_tmpsums"] * file1["x"]
file1 = file1.drop(["_tmpsums", "_tmpmergekey", "x"], axis=1)

print("   data_3")
print(file1.to_string(col_space=6, index=False, justify="center"))

Result

   data_3
 file     a      b      c      d      z  
 file1   0.5    0.6    0.8    0.3   0.95 
 file1   0.5    0.6    0.8    0.3   1.52 
 file1   0.5    0.6    0.8    0.3   1.71 
 file1   0.2    0.2    0.4    0.1   0.40 
 file1   0.2    0.2    0.4    0.1   0.64 
 file1   0.2    0.2    0.4    0.1   0.72 
 file1   0.1    0.4    0.5    0.2   0.50 
 file1   0.1    0.4    0.5    0.2   0.80 
 file1   0.1    0.4    0.5    0.2   0.90
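As a side note to the temporary-key trick above: pandas 1.2 and later accept `how="cross"` in `merge`, which builds the cartesian product directly. The following is only a sketch of that variant, not the answer's original code. The tables are typed inline through `io.StringIO` (an assumption made so the snippet runs without the data files), and the name `data_3` is chosen here for illustration.

```python
import io
import pandas as pd

# Inline copies of the question's tables, so the sketch runs without files
data_1 = io.StringIO("""file a b c d
file1 0.5 0.6 0.8 0.3
file1 0.2 0.2 0.4 0.1
file1 0.1 0.4 0.5 0.2
""")
data_2 = io.StringIO("""x
0.5
0.8
0.9
""")

file1 = pd.read_csv(data_1, sep=r"\s+")
file2 = pd.read_csv(data_2, sep=r"\s+")

# how="cross" pairs every row of file1 with every row of file2 (pandas >= 1.2)
data_3 = file1.merge(file2, how="cross")

# (a + b + c) * x, computed on whole columns at once
data_3["z"] = data_3[["a", "b", "c"]].sum(axis=1) * data_3["x"]
data_3 = data_3.drop(columns="x")

print(data_3.to_string(index=False))
```

On older pandas versions the dummy-column merge shown in the answer remains the way to go.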

Using pandas, as follows

import pandas as pd

# Load Data
data_1 = pd.read_csv('data_1.txt', delimiter = r"\s+")
data_2 = pd.read_csv('data_2.txt', delimiter = r"\s+")

# Compute the cartesian product of data_1 with data_2
# since for each row in data_1, we need sequence of rows in data_2
# We do this using DataFrame merge by injecting a key that is repeated for each row
# i.e. 'merge_key'
data_1['merge_key'] = pd.Series([1]*len(data_1))
data_2['merge_key'] = pd.Series([1]*len(data_2))
df = pd.merge(data_1, data_2, on = 'merge_key')
# Drop merge key from result
df.drop('merge_key', axis = 'columns', inplace = True)

# DataFrame df now has columns file, a, b, c, d, x
# We can apply the function calculation (defined in the question) to each row
# using apply, specifying the columns to send to calculation
df['z'] = df.apply(lambda row: calculation(row['a'], row['b'], row['c'], row['x']), axis = 'columns')

# Drop x column
df.drop('x', axis = 'columns', inplace = True)

# Write to CSV file
df.to_csv('data_3.txt', index=False, sep = " ")

Output

DataFrame df

    file    a   b   c   d   z
0   file1   0.5 0.6 0.8 0.3 0.95
1   file1   0.5 0.6 0.8 0.3 1.52
2   file1   0.5 0.6 0.8 0.3 1.71
3   file1   0.2 0.2 0.4 0.1 0.40
4   file1   0.2 0.2 0.4 0.1 0.64
5   file1   0.2 0.2 0.4 0.1 0.72
6   file1   0.1 0.4 0.5 0.2 0.50
7   file1   0.1 0.4 0.5 0.2 0.80
8   file1   0.1 0.4 0.5 0.2 0.90

CSV file data_3.txt

file a b c d z
file1 0.5 0.6 0.8 0.3 0.9500000000000001
file1 0.5 0.6 0.8 0.3 1.5200000000000002
file1 0.5 0.6 0.8 0.3 1.7100000000000002
file1 0.2 0.2 0.4 0.1 0.4
file1 0.2 0.2 0.4 0.1 0.6400000000000001
file1 0.2 0.2 0.4 0.1 0.7200000000000001
file1 0.1 0.4 0.5 0.2 0.5
file1 0.1 0.4 0.5 0.2 0.8
file1 0.1 0.4 0.5 0.2 0.9
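The long tails in the file, such as 0.9500000000000001, are ordinary binary floating-point rounding, not a mistake in the formula. If two decimals are enough (an assumption, the question doesn't say), one option is to round only the `z` column before writing. The sketch below uses a small stand-in DataFrame with two of the question's rows rather than the merged `df` from the answer, and it computes `z` with plain column arithmetic instead of `apply`.

```python
import pandas as pd

# A small stand-in for the merged DataFrame (two row/x pairs from the question)
df = pd.DataFrame({
    "file": ["file1", "file1"],
    "a": [0.5, 0.2], "b": [0.6, 0.2], "c": [0.8, 0.4], "d": [0.3, 0.1],
    "x": [0.5, 0.8],
})

# Column arithmetic works on every row at once, no apply needed
df["z"] = (df["a"] + df["b"] + df["c"]) * df["x"]

# Round only the computed column, so a..d are written exactly as read
df["z"] = df["z"].round(2)

# With no path argument, to_csv returns the text instead of writing a file
text = df.drop(columns="x").to_csv(index=False, sep=" ")
print(text)
```

Passing `float_format` to `to_csv` would also work, but it reformats every float column, including a to d.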

Basic Python

This gives the same output:

# Get data from first file
with open('data_1.txt', 'r') as f:
    # first file header
    header1 = f.readline()
    
    # Let's get the lines of data
    data_1 = []
    for line in f:
        new_data = line.rstrip().split()  # strip '\n' and split on whitespace
        for i in range(1, len(new_data)):
            new_data[i] = float(new_data[i])  # convert columns after file to float
        data_1.append(new_data)
  
# Get data from second file
with open('data_2.txt', 'r') as f:
    # second file header
    header2 = f.readline()
    
    # Let's get the lines of data
    data_2 = []
    for line in f:
        new_data = float(line.rstrip())  # only one value per line
        data_2.append(new_data)


with open('data_3.txt', 'w') as f:
    # Output file
    # Write Header
    f.write("file a b c d z\n")
    
    # Use double loop to loop through all rows of data_2 for each row in data_1
    for v1 in data_1:
        # For each row in data_1
        file, a, b, c, d = v1  # unpacking the values in v1 to individual variables
        for v2 in data_2:
            # for each row in data_2
            x = v2  # data2 just has a single value per row
           
            # Calculation using posted formula
            z = calculation(a, b, c, x)
            
            # Write result
            f.write(f"{file} {a} {b} {c} {d} {z}\n")
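The nested loops above pair every row of data_1 with every value of data_2. The standard library's `itertools.product` expresses the same pairing in a single loop. This is just a sketch of that alternative: the lists are typed in by hand to stand for the parsed files, the result goes into a list called `lines` (a name picked for illustration) instead of a file, and `z` is formatted to two decimals, which is an assumption about the desired output.

```python
from itertools import product

def calculation(a, b, c, x):
    # Formula from the question
    return (a + b + c) * x

# Rows as the basic-Python version holds them after parsing the files
data_1 = [
    ["file1", 0.5, 0.6, 0.8, 0.3],
    ["file1", 0.2, 0.2, 0.4, 0.1],
    ["file1", 0.1, 0.4, 0.5, 0.2],
]
data_2 = [0.5, 0.8, 0.9]

lines = ["file a b c d z"]
# product yields (row, x) pairs in the same order as the nested loops
for (file, a, b, c, d), x in product(data_1, data_2):
    z = calculation(a, b, c, x)
    lines.append(f"{file} {a} {b} {c} {d} {z:.2f}")

print("\n".join(lines))
```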

Numpy version

import numpy as np

def calculation(a, b, c, x):
    # Formula from the question
    return (a + b + c) * x

# Column 0 holds strings, so load the file names separately from the numeric columns
names = np.loadtxt('data_1.txt', skiprows=1, usecols=0, dtype=str)
file1 = np.loadtxt('data_1.txt', skiprows=1, usecols=(1, 2, 3, 4))
file2 = np.loadtxt('data_2.txt', skiprows=1, usecols=0)

with open('data_3.txt', 'w') as f:
    # Write header
    f.write("file a b c d z\n")

    # Double loop through the rows of file1 and the values of file2
    for name, val1 in zip(names, file1):
        for val2 in file2:
            # val1[:3] unpacks a, b, c into calculation (d is ignored)
            z = calculation(*val1[:3], val2)
            # Write the file name, then a b c d and z as a space-separated line
            f.write(' '.join([name, *map(str, val1), str(z)]) + "\n")
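The numpy version above still loops in Python. In the spirit of the earlier remark that a for loop usually means numpy isn't being used to its full extent, the same z values can be produced with broadcasting. This is a sketch under a few assumptions: the arrays are typed inline rather than loaded with `np.loadtxt`, only the numeric columns are handled (the file-name column would have to be added back separately), and `sums`, `rows` and `data_3` are names picked here for illustration.

```python
import numpy as np

# Numeric columns a, b, c, d of data_1 and the x column of data_2,
# typed inline here instead of loaded with np.loadtxt
file1 = np.array([[0.5, 0.6, 0.8, 0.3],
                  [0.2, 0.2, 0.4, 0.1],
                  [0.1, 0.4, 0.5, 0.2]])
file2 = np.array([0.5, 0.8, 0.9])

# Sum a+b+c per row -> shape (3,); add an axis -> shape (3, 1)
sums = file1[:, :3].sum(axis=1)[:, np.newaxis]

# Broadcasting (3, 1) against (3,) gives a (3, 3) grid:
# z[i, j] = (a_i + b_i + c_i) * x_j
z = sums * file2

# Repeat each data_1 row once per x, and flatten z in the same row-major order
rows = np.repeat(file1, len(file2), axis=0)
data_3 = np.column_stack([rows, z.ravel()])

print(data_3.round(2))
```

Each row of `data_3` holds a, b, c, d and z in that order, in the same row order as the expected output in the question.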

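1) Remove the "each" in the for loop, and 2) `calculation` is being called without arguments. `for each value in file2:` is not valid in Python. It should be `for value in file2:`

Please let me know if it's possible. I updated the code, but I'm not getting the expected output.

You have mixed data types (a string in the first column), which is easier to handle in `pandas` than directly in `numpy`. Can you use the pandas package?

@tdelaney Yes.

The output is quite different from mine... Can you correct your code so that the output matches?

I don't know how it's different. I adjusted the output format. Hopefully that's close.

Still different... I need the function to calculate z for each value: file a b c d z file1 0.5 0.6 0.8 0.3 - file1 0.5 0.6 0.8 0.3 - file1 0.5 0.6 0.8 0.3 - file1 0.5 0.6 0.8 0.3 - file1 0.2 0.2 0.4 0.1 - file1 0.2 0.2 0.4 0.1 - file1 0.2 0.2 0.4 0.1 - file1 0.1 0.4 0.5 0.2 - file1 0.1 0.4 0.5 0.2 - file1 0.1 0.4 0.5 0.2 -

Oh, I see. I'll have to rework this a bit.

Is it possible using numpy? Please let me know... it's difficult for me to understand.

It's good, but where is the function that does the calculation?

Is it possible using numpy? Let me know... it's difficult for me to understand.

@anonymossi -- `df['z'] = df.apply(lambda row: calculation(row['a'], row['b'], row['c'], row['x']), axis = 'columns')` uses your calculation function. Since you said you're just starting with Python, I'll make another version using simpler functions.

@anonymossi -- Updated with a version that uses basic Python. Does that make sense?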
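For reference, here is roughly what the loop from the question looks like once the two points from the first comment are applied: a plain `for value in ...` loop, and `calculation` actually called with its arguments. It is only a sketch. The arrays stand in for the `np.loadtxt` calls in the question, and the `results` list is a name added here for illustration.

```python
import numpy as np

def calculation(a, b, c, x):
    z = (a + b + c) * x
    return z

# Stand-ins for np.loadtxt('data_1', ...) and np.loadtxt('data_2', ...)
file1 = np.array([[0.5, 0.6, 0.8],
                  [0.2, 0.2, 0.4],
                  [0.1, 0.4, 0.5]])
file2 = np.array([0.5, 0.8, 0.9])

results = []
for a, b, c in file1:        # one row of data_1 at a time
    for value in file2:      # "for value in ...", not "for each value in ..."
        # Writing just `calculation` on a line does nothing; it has to be called
        z = calculation(a, b, c, value)
        results.append(z)
        print(a, b, c, round(z, 2))
```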