Python: adding multi-line text to a single cell in a CSV after scraping a site

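As the title says, I am trying to work out how to make a multi-line block of text fit in a single cell. For some context on what I am doing: I am using Beautiful Soup to extract mtDNA sequences along with other data from a site, and I am putting those values into a CSV.

I tried using str.strip('\n') to turn the text into a single line, but it did not work, and the text still spilled onto the following rows. Here is my program's code: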

import requests

theSequenceLink = 'https://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?id=1877761016&db=nuccore&report=fasta&extrafeat=null&conwithfeat=on&hide-cdd=on&retmode=html&withmarkup=on&tool=portal&log$=seqview&maxdownloadsize=1000000'
res = requests.get(theSequenceLink)
dna_sequence = res.text.strip()

#cleaning up the sequence
split = 'genome'
mtDNA_sequence = dna_sequence.partition(split)[2]

#you can ignore the genbank and haplogroup stuff
f.write(genbank_ID + ", " + haplogroup.replace(",", "|") + ", " + mtDNA_sequence + "\n")

Any help would be greatly appreciated.

The problem is that the DNA sequence contains newline characters, so you have to replace them:

import requests
theSequenceLink = 'https://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?id=1877761016&db=nuccore&report=fasta&extrafeat=null&conwithfeat=on&hide-cdd=on&retmode=html&withmarkup=on&tool=portal&log$=seqview&maxdownloadsize=1000000'
res = requests.get(theSequenceLink)
dna_sequence = res.text.strip()

#cleaning up the sequence
split = 'genome'
mtDNA_sequence = dna_sequence.partition(split)[2].strip().replace("\n","")

f = open("a.csv","w")
genbank_ID = "hi"
haplogroup = "world"

#you can ignore the genbank and haplogroup stuff
f.write(genbank_ID + ", " + haplogroup.replace(",", "|") + ", \"" + mtDNA_sequence + "\"\n")
f.close()
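To illustrate why str.strip('\n') alone left the sequence spread over several lines, here is a minimal sketch. The `raw` string is a made-up stand-in for the response text (a FASTA header followed by short sequence lines), not real data from the site. It also shows `"".join(body.split())` as one possible alternative to `replace("\n", "")`, which would additionally drop any `\r` characters or spaces.

```python
# Hypothetical stand-in for the downloaded text: a FASTA header line
# followed by sequence lines separated by newlines (not real NCBI data).
raw = ">MT123456.1 Homo sapiens mitochondrion, complete genome\nGATCACAGGT\nCTATCACCCT\nATTAACCACT\n"

# Same clean-up step as in the post: keep everything after 'genome'.
body = raw.partition("genome")[2]

# strip() only trims characters at the two ends of the string,
# so the newlines between sequence lines survive.
stripped = body.strip("\n")
print("\n" in stripped)

# split() with no arguments breaks on any run of whitespace
# (\n, \r\n, spaces); joining the pieces gives one continuous line.
one_line = "".join(body.split())
print(one_line)
```

Either `replace` or the split/join form should work for this purpose; the latter is simply a little more tolerant of Windows-style line endings.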

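Strip the \n after splitting:

mtDNA_sequence = dna_sequence.partition(split)[2].strip()

and enclose the sequence in quotes so that the CSV reader understands it is a single field.

Do you know how I should enclose the whole sequence in quotes when the sequence is held in a variable?

What information do you want to store in the CSV?

I want to store the GenBank sample ID, the haplogroup, and the whole sequence. I have no problem with the first two; it is only the sequence that messes everything up, because it spans multiple lines and breaks the spreadsheet layout instead of fitting in one cell.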
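On the question of quoting a value held in a variable: one option, sketched below, is to let the standard library `csv` module do the quoting instead of building the line by hand. This is only an illustration under assumptions: the ID, haplogroup, and sequence values are invented, and an in-memory `io.StringIO` buffer stands in for the real output file. With the default settings, `csv.writer` quotes any field that contains a comma or a line break, so even an uncleaned multi-line sequence stays in a single cell.

```python
import csv
import io

# Hypothetical values, chosen to exercise the quoting rules.
genbank_ID = "MT123456"
haplogroup = "H1a, H1b"                    # contains a comma on purpose
mtDNA_sequence = "GATCACAGGT\nCTATCACCCT"  # contains a newline on purpose

# In-memory buffer standing in for open("a.csv", "w", newline="").
buffer = io.StringIO()
writer = csv.writer(buffer)

# The writer adds quotes around fields that need them.
writer.writerow([genbank_ID, haplogroup, mtDNA_sequence])
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back yields one row with three fields,
# the third still holding the embedded newline.
rows = list(csv.reader(io.StringIO(csv_text, newline="")))
print(len(rows), len(rows[0]))
```

When writing to a real file, the `csv` documentation recommends opening it with `newline=""` so that line endings inside quoted fields are not altered. With this approach the `haplogroup.replace(",", "|")` workaround would no longer be needed, since commas inside a quoted field do not split the column.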