Python/Pandas: split text into columns by a delimiter and create a CSV file

python pandas dataframe


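I have a long text in which I inserted the delimiter ";" exactly where I want the text to be split into separate columns. So far, whenever I try to split the text into 'ID' and 'ADText', I only get the first row. However, both columns should contain 1439 rows.

My text looks like this: 1234; the text is written out over multiple lines and sentences until at some point the next ID is written down 2345; then the new ad text begins until the next ID 3456; and so on

I want to use this ";" to split my text into two columns, one with the ID and the other with the ad text.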

import pandas as pd

# read the text file into Python
jobads = pd.read_csv("jobads.txt", header=None)
print(jobads)

# create a DataFrame
df = pd.DataFrame(jobads, index=None, columns=None)
print(type(df))
print(df)

# name the column so it can be targeted for the split
df = df.rename(columns={0: "Job"})
print(df)

# split it into two columns. Problem: I only get the first row.
print(pd.DataFrame(df.Job.str.split(';', n=1).tolist(),
                   columns=['ID', 'AD']))

Unfortunately, this only works for the first entry and then stops. The output looks like this:

               ID                                                 AD
0            1234                                   text in written from with ...

What am I doing wrong? I would appreciate any advice. Thank you all!

Example text:

FullName;ISO3;ISO1;molecular_weight
Alanine;Ala;A;89.09
Arginine;Arg;R;174.20
Asparagine;Asn;N;132.12
Aspartic_Acid;Asp;D;133.10
Cysteine;Cys;C;121.16

Create columns based on the ";" delimiter:

import pandas as pd
f = "aminoacids"
df = pd.read_csv(f,sep=";")
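To try the same call without a file on disk, here is a minimal sketch that feeds a shortened copy of the sample text to `pd.read_csv` through `io.StringIO`. The variable name `sample` is only illustrative.

```python
import io

import pandas as pd

# Shortened copy of the sample text above, held in memory instead of a file.
sample = """FullName;ISO3;ISO1;molecular_weight
Alanine;Ala;A;89.09
Arginine;Arg;R;174.20
"""

# sep=";" tells pandas to split each line on semicolons instead of commas.
df = pd.read_csv(io.StringIO(sample), sep=";")

print(df.columns.tolist())
print(df.shape)
```

This only works when every record sits on its own line, as in the sample.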

Edit: Taking the comments into account, I assume the text looks more like this:

t = """1234; text in written from with multiple sentences going over multiple lines until at some point the next ID is written dwon 2345; then the new Ad-Text begins until the next ID 3456; and so on1234; text in written from with multiple """

In that case, a regular expression like the following splits the string into IDs and texts, which can then be used to build a DataFrame:

import re
r = re.compile("([0-9]+);")
re.split(r,t)

Output:

['',
 '1234',
 ' text in written from with multiple sentences going over multiple lines until at some point the next ID is written dwon ',
 '2345',
 ' then the new Ad-Text begins until the next ID ',
 '3456',
 ' and so on',
 '1234',
 ' text in written from with multiple ']

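Edit 2: This answers the follow-up question the asker added in the comments: how to convert this string into a DataFrame with two columns, ID and text.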
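The code for this step did not survive on this page, so what follows is only a sketch of one way to do it, not necessarily what the answer originally showed. It assumes the list returned by `re.split` alternates ID, text, ID, text after a leading element, as in the output above. The shortened string `t` and the column names `ID` and `ADText` are illustrative.

```python
import re

import pandas as pd

# Shortened sample in the same shape as the text in the question (assumed format).
t = "1234; first ad text 2345; second ad text 3456; third ad text"

# Splitting on a capturing group keeps the matched IDs in the result list.
parts = re.split(r"([0-9]+);", t)

# parts[0] is whatever precedes the first ID (empty here); after that the
# list alternates ID, text, ID, text, ...
ids = parts[1::2]
texts = [p.strip() for p in parts[2::2]]

df = pd.DataFrame({"ID": ids, "ADText": texts})
print(df)

# With no path, to_csv returns the CSV as a string; pass a path such as
# "jobads.csv" (a placeholder name) to write a file instead.
csv_text = df.to_csv(index=False)
print(csv_text)
```

One caveat: any run of digits followed by ";" inside an ad text would also be treated as an ID, so it is worth comparing the resulting row count with the expected number of ads.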

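Why not use the `sep` parameter of `pd.read_csv`?

Thank you very much for your answer. I already tried that, and it gave me 0 rows and 19090 columns, because my text is not laid out like your example. My IDs are not neatly written at the start of each line; they float freely within the text.

Ah, I see, so you don't even have newlines? It's just one very long line?

Yes, sadly it is a single line.

@Nina Have you checked the edit to my answer? Does it help? Or are there other scenarios it doesn't capture?

Thank you very much for your time and your answer! It helps, and it works! Awesome, thanks!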