Writing PyTest unit tests for a Python script that randomly samples DataFrames?

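I'm writing unit tests for one of my projects that processes data. I have some scripts that take CSVs, concatenate them with Pandas, and then randomly sample them to create train/dev/test sets for a machine learning task.

My unit tests generate some random data CSVs to test against. But how do I create reference data for what the script under test is supposed to return?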

Example of my test setup:

```python
import numpy as np
import pandas as pd
import pytest

from my_project import collect_data  # hypothetical import path for the script under test


@pytest.fixture
def input_dir(tmp_path):
    # Create some random (but seeded) test input CSVs in a temporary directory
    in_dir = tmp_path / "in"
    in_dir.mkdir()
    rng = np.random.default_rng(0)
    for i in range(3):
        df = pd.DataFrame({"x": rng.random(10), "label": rng.integers(0, 2, 10)})
        df.to_csv(in_dir / f"input_{i}.csv", index=False)
    return in_dir


@pytest.fixture
def reference_output_data(input_dir):
    # Build, from the CSVs in input_dir, the output I expect the script to produce.
    # I will assert the script's actual output against these DataFrames.
    # ??? This is the part I don't know how to do.
    return reference_train_df, reference_test_df, reference_dev_df


def test_collect_data(input_dir, tmp_path, reference_output_data):
    # Run the script under test: it concatenates the input CSVs and randomly
    # samples them into train/test/dev splits.
    test_data = collect_data(input_dir, tmp_path / "out", test_split=0.10, dev_split=0.20)

    # Compare each reference split with the corresponding produced split
    for reference_df, produced_df in zip(reference_output_data, test_data):
        pd.testing.assert_frame_equal(reference_df, produced_df)
```

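Hopefully this pseudocode makes sense. I understand seeding and so on. But how do I manually create the test data that the script should produce, and then assert that it matches what the script actually produces when called?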
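One way around hand-building the exact random output is to fix the seed and assert properties that every valid split must have: the split sizes, that no row lands in two splits, that the splits together reproduce the input, and that the same seed yields the same split. The sketch below uses a hypothetical `split_data(df, test_split, dev_split, seed)` as a stand-in for the real script; it assumes the script shuffles with `DataFrame.sample` and then slices, which may not be how `collect_data` actually works.

```python
import numpy as np
import pandas as pd


def split_data(df, test_split, dev_split, seed):
    """Hypothetical stand-in for the script under test: seeded shuffle, then slice."""
    shuffled = df.sample(frac=1, random_state=seed)  # seeded shuffle of all rows
    n_test = int(len(df) * test_split)
    n_dev = int(len(df) * dev_split)
    test = shuffled.iloc[:n_test]
    dev = shuffled.iloc[n_test:n_test + n_dev]
    train = shuffled.iloc[n_test + n_dev:]
    return train, test, dev


def check_split_invariants(df, train, test, dev, test_split, dev_split):
    """Assert properties any correct split must satisfy, whatever the seed."""
    assert len(test) == int(len(df) * test_split)
    assert len(dev) == int(len(df) * dev_split)
    assert len(train) + len(test) + len(dev) == len(df)
    # No row (identified by its original index label) appears in two splits
    assert set(train.index).isdisjoint(test.index)
    assert set(train.index).isdisjoint(dev.index)
    assert set(test.index).isdisjoint(dev.index)
    # Together, the splits contain exactly the input rows
    recombined = pd.concat([train, test, dev]).sort_index()
    pd.testing.assert_frame_equal(recombined, df)


def test_split_data():
    rng = np.random.default_rng(0)
    df = pd.DataFrame({"x": rng.random(50), "label": rng.integers(0, 2, 50)})
    train, test, dev = split_data(df, test_split=0.10, dev_split=0.20, seed=42)
    check_split_invariants(df, train, test, dev, 0.10, 0.20)
    # Same seed -> identical split, so any failure is reproducible
    train2, _, _ = split_data(df, test_split=0.10, dev_split=0.20, seed=42)
    pd.testing.assert_frame_equal(train, train2)
```

These checks do not depend on which rows the random generator happens to pick, so they keep passing if the sampling code changes while still catching leaks between splits or dropped rows. Pinning exact rows against a seed is also possible, but it ties the test to the specific sampling implementation.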

This is a multi-part question. Have you looked at using Faker to create the test data? Have you checked how to write to tmpdir?