Split a string every N characters, iterating over the starting character, and write each piece to a new file - python


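I asked a more general version of this question in an earlier post, but I am still stuck on writing the results to individual files. I want to iterate over a long string, starting at position 1 (Python index 0), and print every 100 characters. Then I want to shift by one character, start at position 2 (Python index 1), and repeat the process until I reach the last 100 characters. I want to write each 100-character chunk to a new file. Here is what I have so far: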

seq = 7524       # I get this number from a raw_input 
read_num=100

for raw_reads in range(100):
    def nlength_parts(seq,read_num):
        return map(''.join,zip(*[seq[i:] for i in range(read_num)]))


f = open('read' + str(raw_reads), 'w')
f.write("read" '\n')
f.write(nlength_parts(seq,read_num))
f.close
The error I keep getting is:

f.write(nlength_parts(seq,read_num))
TypeError: expected a character buffer object
Any help with this would be greatly appreciated.


After some help I made a few changes, but it still does not work:

seq = 7524       # I get this number from a raw_input 
read_num=100

def nlength_parts(seq,read_num):
    return map(''.join,zip(*[seq[i:] for i in range(read_num)]))

for raw_reads in range(100):   # Should be gene length - 100
    f = open('read' + str(raw_reads), 'w')
    f.write("read" + str(raw_reads))
    f.write(nlength_parts)
    f.close

I probably left out some important variables and definitions to keep my post short, and that caused confusion. I have pasted my full code below:

#! /usr/bin/env python

import sys,os
import random
import string

raw = raw_input("Text file: " )

with open(raw) as f:
    joined = "".join(line.strip() for line in f)
    f = open(raw + '.txt', 'w')
    f.write(joined)
    f.closed

seq = str(joined)
read_num = 100

def nlength_parts(seq,read_num):
    return map(''.join,zip(*[seq[i:] for i in range(read_num)]))

for raw_reads in range(100):   # ideally I want range to be len(seq)-100
    f = open('read' + str(raw_reads), 'w')
    f.write("read" + str(raw_reads))
    f.write('\n')
    f.write(str(nlength_parts))
    f.close
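A few things:

  • You define the variables seq and read_num in the global scope, and then use the same names as the function's parameters. Give the parameters in the function definition different names, and pass those two variables to the function when you call it.
  • When you call nlength_parts, you neither pass it the arguments it was defined with nor include the (). Fix this together with #1.
  • You do not seem to define the string you want to slice. You slice seq inside the function, but seq is an integer in your code. Is seq the processed output of the file you mentioned in your comment? If so, is it much larger in your actual code?
  • That said, I believe this code will do what you want it to do: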

    def nlength_parts(myStr, length, paddingChar=" "):
        # Pad short input so that at least one full-length window exists
        if len(myStr) < length:
            myStr += paddingChar * (length - len(myStr))
        sequences = []
        # One window per start index, shifting by one character each time
        for i in range(0, len(myStr) - length + 1):
            sequences.append(myStr[i:i+length])
        return sequences

    foo = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    nlengthfoo = nlength_parts(foo, 10)
    for x in range(0, len(nlengthfoo)):
        # Files are named read1, read2, ...
        with open("read" + str(x+1), "w") as f:
            f.write(nlengthfoo[x])
    
    Edit: sorry, I changed my code in response to your comment.
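As a rough sketch of how the answer above could be adapted to the layout in the question (one file per window, each starting with a "read<N>" header line), something like the following may work. The helper name `write_reads` and the temporary output directory are assumptions made for illustration, not part of the original answer, and this version does not pad inputs shorter than the window.

```python
import os
import tempfile


def nlength_parts(text, length):
    """Return every window of `length` characters, shifting by one each time."""
    return [text[i:i + length] for i in range(len(text) - length + 1)]


def write_reads(text, length, out_dir):
    """Write each window to its own file (read1, read2, ...) and return the paths.

    Hypothetical helper: the name and the returned list are illustrative only.
    """
    paths = []
    for number, part in enumerate(nlength_parts(text, length), start=1):
        path = os.path.join(out_dir, "read" + str(number))
        with open(path, "w") as f:
            f.write("read" + str(number) + "\n")  # header line, as in the question
            f.write(part + "\n")
        paths.append(path)
    return paths


out_dir = tempfile.mkdtemp()  # assumption: write into a scratch directory
paths = write_reads("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 10, out_dir)
print(len(paths))  # 26 - 10 + 1 = 17 files
```

Using `with open(...)` closes each file automatically, which sidesteps the `f.close` (without parentheses) lines in the question that never actually close the file.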


    Edit in response to the clarifying comment: basically, you want a rolling window over the string. Say long_string = "012345678901234567890123456789…", with a total length of 100:

    In [18]: long_string
    Out[18]: '0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789'
    
    In [19]: window = 10
    
    In [20]: for i in range(len(long_string) - window +1):
       .....:     chunk = long_string[i:i+window]
       .....:     print(chunk)
       .....:     with open('chunk_' + str(i+1) + '.txt','w') as f:
       .....:         f.write(chunk)
       .....:         
    0123456789
    1234567890
    2345678901
    3456789012
    4567890123
    5678901234
    6789012345
    7890123456
    8901234567
    9012345678
    0123456789
    1234567890
    2345678901
    3456789012
    4567890123
    5678901234
    6789012345
    7890123456
    8901234567
    9012345678
    0123456789
    1234567890
    2345678901
    3456789012
    4567890123
    5678901234
    6789012345
    7890123456
    8901234567
    9012345678
    0123456789
    1234567890
    2345678901
    3456789012
    4567890123
    5678901234
    6789012345
    7890123456
    8901234567
    9012345678
    0123456789
    1234567890
    2345678901
    3456789012
    4567890123
    5678901234
    6789012345
    7890123456
    8901234567
    9012345678
    0123456789
    1234567890
    2345678901
    3456789012
    4567890123
    5678901234
    6789012345
    7890123456
    8901234567
    9012345678
    0123456789
    1234567890
    2345678901
    3456789012
    4567890123
    5678901234
    6789012345
    7890123456
    8901234567
    9012345678
    0123456789
    1234567890
    2345678901
    3456789012
    4567890123
    5678901234
    6789012345
    7890123456
    8901234567
    9012345678
    0123456789
    1234567890
    2345678901
    3456789012
    4567890123
    5678901234
    6789012345
    7890123456
    8901234567
    9012345678
    0123456789
    
    Finally,

    In [21]: ls
    chunk_10.txt  chunk_20.txt  chunk_30.txt  chunk_40.txt  chunk_50.txt  chunk_60.txt  chunk_70.txt  chunk_80.txt  chunk_90.txt
    chunk_11.txt  chunk_21.txt  chunk_31.txt  chunk_41.txt  chunk_51.txt  chunk_61.txt  chunk_71.txt  chunk_81.txt  chunk_91.txt
    chunk_12.txt  chunk_22.txt  chunk_32.txt  chunk_42.txt  chunk_52.txt  chunk_62.txt  chunk_72.txt  chunk_82.txt  chunk_9.txt
    chunk_13.txt  chunk_23.txt  chunk_33.txt  chunk_43.txt  chunk_53.txt  chunk_63.txt  chunk_73.txt  chunk_83.txt
    chunk_14.txt  chunk_24.txt  chunk_34.txt  chunk_44.txt  chunk_54.txt  chunk_64.txt  chunk_74.txt  chunk_84.txt
    chunk_15.txt  chunk_25.txt  chunk_35.txt  chunk_45.txt  chunk_55.txt  chunk_65.txt  chunk_75.txt  chunk_85.txt
    chunk_16.txt  chunk_26.txt  chunk_36.txt  chunk_46.txt  chunk_56.txt  chunk_66.txt  chunk_76.txt  chunk_86.txt
    chunk_17.txt  chunk_27.txt  chunk_37.txt  chunk_47.txt  chunk_57.txt  chunk_67.txt  chunk_77.txt  chunk_87.txt
    chunk_18.txt  chunk_28.txt  chunk_38.txt  chunk_48.txt  chunk_58.txt  chunk_68.txt  chunk_78.txt  chunk_88.txt
    chunk_19.txt  chunk_29.txt  chunk_39.txt  chunk_49.txt  chunk_59.txt  chunk_69.txt  chunk_79.txt  chunk_89.txt
    chunk_1.txt   chunk_2.txt   chunk_3.txt   chunk_4.txt   chunk_5.txt   chunk_6.txt   chunk_7.txt   chunk_8.txt
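The rolling window in the session above can also be wrapped in a small generator, which makes the number of files easy to reason about: a string of length n with window w yields n - w + 1 windows. This is only a sketch; the name `sliding_windows` is an assumption introduced here, not something from the original answer.

```python
def sliding_windows(text, window):
    """Yield each `window`-character slice, advancing one character at a time."""
    for start in range(len(text) - window + 1):
        yield text[start:start + window]


long_string = "0123456789" * 10   # 100 characters, as in the session above
chunks = list(sliding_windows(long_string, 10))
print(len(chunks))                # 100 - 10 + 1 = 91
print(chunks[0], chunks[1], chunks[-1])
```

The count of 91 matches the chunk_1.txt through chunk_91.txt files in the directory listing.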
    
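    Original response: I would just treat the string as a file. That avoids any slicing hassle and is very simple, because the file API lets you easily "read" in chunks.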

    In [1]: import io
    
    In [2]: long_string = 'a'*100 + 'b'*100 + 'c'*100 + 'e'*88
    
    In [3]: print(long_string)
    aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccceeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
    
    In [4]: string_io = io.StringIO(long_string)
    
    In [5]: chunk = string_io.read(100)
    
    In [6]: chunk_no = 1
    
    In [7]: while chunk:
       ....:     print(chunk)
       ....:     with open('chunk_' + str(chunk_no) + '.txt','w') as f:
       ....:         f.write(chunk)
       ....:     chunk = string_io.read(100)
       ....:     chunk_no += 1
       ....:     
    aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
    cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
    eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
    
    Note that I am using the IPython terminal, so you can use terminal commands inside the interpreter session:

    In [8]: ls chunk*
    chunk_1.txt  chunk_2.txt  chunk_3.txt  chunk_4.txt
    
    In [9]: cat chunk_1.txt
    aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    In [10]: cat chunk_2.txt
    bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
    In [11]: cat chunk_3.txt
    cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
    In [12]: cat chunk_4.txt
    eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
    In [13]: 
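For comparison, the file-style approach in this original response can be sketched as a generator. Note that it produces consecutive, non-overlapping chunks, so it does not shift by one character the way the rolling window does. The name `fixed_chunks` is an assumption used for illustration, and the sketch assumes Python 3, where `io.StringIO` accepts an ordinary `str`.

```python
import io


def fixed_chunks(text, size):
    """Yield consecutive, non-overlapping chunks of at most `size` characters."""
    reader = io.StringIO(text)
    chunk = reader.read(size)
    while chunk:                  # read() returns '' once the text is exhausted
        yield chunk
        chunk = reader.read(size)


long_string = "a" * 100 + "b" * 100 + "c" * 100 + "e" * 88
lengths = [len(c) for c in fixed_chunks(long_string, 100)]
print(lengths)  # [100, 100, 100, 88]
```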
    

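    Why are you defining your function inside the for loop? Either that makes no sense or I haven't had enough coffee today.
    I admit that was probably not the best decision. I am trying to write each nlength_parts result to a new file, but cannot find the best way to do it.
    I mean, it looks like gibberish. What are you trying to accomplish by defining it in the loop, since it will do exactly the same thing as if you took it out of the loop? Also, you seem to be confusing your global variables with your parameters...
    Yes, raw_reads is out of scope by the time you reference it the second time.
    My code is a mess... I moved the for loop, but I still cannot get it to print the output of my nlength_parts... here is the code: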
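To illustrate the point raised in the comments and in the first answer, here is a small sketch of the difference between referring to a function and calling it. The list-comprehension body is a simplified stand-in for the `zip`-based version in the question, used here as an assumption to keep the example short.

```python
def nlength_parts(seq, read_num):
    """Defined once, outside any loop; returns all windows of read_num characters."""
    return [seq[i:i + read_num] for i in range(len(seq) - read_num + 1)]


# Without parentheses this is the function object itself, so str() gives
# its repr rather than any windows.
not_called = str(nlength_parts)

# With arguments and parentheses the function runs and returns the windows.
called = nlength_parts("ABCDEF", 4)

print(not_called)
print(called)
```

Writing `str(nlength_parts)` to a file therefore stores the function's repr, and writing the returned list directly fails because `write` expects a string; joining the windows or writing them one at a time avoids both problems.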