Python: how to split a DataFrame into chunks by its strings?


I have a pandas DataFrame that was generated by appending a series of lists. It consists mainly of strings containing a delimiter ("\n"), as shown below:

   content

0   American Regent/Luitpold (Reverified 10/26/2016)\nCompany Contact Information:\n800-645-1706\n\nPresentation Availability and Estimated Shortage Duration Related Information Shortage Reason (per FDASIA)\n2 mL single-dose vial, package of 10 (NDC 00517-2502-10) Available for NDC 00517-2502-10. Demand increase for the drug
1   Amphastar Pharmaceuticals, Inc./IMS (Reverified 08/18/2016)\nCompany Contact Information:\n800-423-4136\n\nPresentation Availability and Estimated Shortage Duration Related Information Shortage Reason (per FDASIA)\nCalcium Chloride Inj. USP, 10%, 10mL Luer-Jet Prefilled Syringe, (NDC 0548-3304-00), new (NDC 76329-3304-1) Product available Demand increase for the drug\nHospira, Inc. (Reverified 10/21/2016)
2   American Regent/Luitpold (Reverified 10/26/2016)\nCompany Contact Information:\n800-645-1706\n\nPresentation Availability and Estimated Shortage Duration Related Information Shortage Reason (per FDASIA)\n10%, 50 mL vial; Calcium (0.465 mEq/mL), Preservative Free (NDC 0517-3950-25) Unavailable for NDC 00517-3950-25. No product available for release. No plan to manufacture. American Regent is currently not releasing Calcium Gluconate 50 mL vial (NDC 00517-3950-25). Other\n10%, 100 mL vial; Calcium (0.465 mEq/mL), Preservative Free (NDC 0517-3900-25) Unavailable for NDC 00517-3900-25. American Regent is currently not releasing Calcium Gluconate 100 mL vial (NDC 0517-3900-25). Other\nFresenius Kabi USA, LLC (Revised 11/01/2016)
 .......
n   Apotex Corp. (Revised 05/16/2016)\nCompany Contact Information:\n800-706-5575\n\nPresentation\n1gm; (25 Vials) (NDC 60505-0749-5)\n1gm; (25 Vials)(NDC 60505-6093-5)\n10 gm; (10 Vials) (NDC 60505-0769-0)\n10 gm; (10 Vials) (NDC 60505-6094-0)\nNote:\nAvailable\nB. Braun Medical Inc. (Revised 05/16/2016)\n\n\nBaxter Healthcare (Revised 05/16/2016)\n\n\nFresenius Kabi USA, LLC (Revised 05/16/2016)\n\n\nHospira, Inc. (Revised 05/16/2016)\n\n\nSagent Pharmaceuticals (Revised 05/16/2016)\n\n\nSandoz (Revised 05/16/2016)\n\n\nWest-Ward Pharmaceuticals (Revised 05/16/2016)\n\n\nWG Critical Care (Revised 05/16/2016)
n-1 Apotex Corp. (Reverified 10/26/2016)\nCompany Contact Information:\n800-706-5575\n\nPresentation Availability and Estimated Shortage Duration Related Information Shortage Reason (per FDASIA)\nCefepime for Injection, USP 1 gm (10 Vials) (NDC 60505-6030-4) On backorder. Shortage duration is unknown. Requirements relating to complying with current good manufacturing practices (cGMP).\nCefepime for Injection, USP 2 gm (10 Vials)(NDC 60505-6031-4) On backorder. Shortage duration is unknown. Requirements relating to complying with current good manufacturing practices (cGMP).\nCefepime for injection, USP 1 gm (10 Vials) (NDC 60605-0834-04) On backorder. Shortage duration is unknown. Requirements relating to complying with current good manufacturing practices (cGMP).\nCefepime for injection, USP 2 gm (10 Vials) (NDC 60505-0681-4) On backorder. Shortage duration is unknown. Requirements relating to complying with current good manufacturing practices (cGMP).\nCefepime for injection, USP 1 gm (1 Vial) (NDC 60505-0834-00) On backorder. Shortage duration is unknown. Requirements relating to complying with current good manufacturing practices (cGMP).\nCefepime for injection, USP 2 gm (10 Vials) (NDC 60505-0681-0) On backorder. Shortage duration is unknown. Requirements relating to complying with current good manufacturing practices (cGMP).\nB. Braun Medical Inc. (New 07/22/2015)\n\n\nBaxter Healthcare (Reverified 10/25/2016)\n\n\nFresenius Kabi USA, LLC (Revised 11/01/2016)\n\n\nHospira, Inc. (Reverified 10/21/2016)\n\n\nSagent Pharmaceuticals (Revised 08/29/2016)\n\n\nWG Critical Care (Revised 06/08/2016)
How can I split the content of the DataFrame on the newline \n into more columns, like this:

   col1              col2        col3        col4
0  Shire US Inc. (Reverified 07/01/2016)   and so  on.... 
1  Hospira, Inc. (Reverified 10/21/2016)   and so  on....  
2  Mission Pharmacal (Reverified 01/21/2015)   and so  on....  
....
n  Mission Pharmacal (Reverified 01/21/2015)   and so  on....  
I tried:

df['col'] = df['content'].str.split('\n', expand = True)
Obviously, I get the error "Wrong number of items passed 45, placement implies 1". Also, because I am building the DataFrame like this:

df = pd.DataFrame(lis, columns = ['content'])
I cannot use the answers to similar questions.
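For reference, the error above can be understood with a small sketch: `str.split(..., expand=True)` returns a whole DataFrame (one column per piece), which cannot be assigned to the single column `df['col']`. The toy data and the `col1`, `col2`, ... names below are made up for illustration, and the sketch assumes the strings contain real newline characters.

```python
import pandas as pd

# Toy stand-in for the real data (hypothetical values).
df = pd.DataFrame({"content": ["Company A\nContact:\n800-000-0000",
                               "Company B\nContact"]})

# expand=True returns a DataFrame with one column per piece,
# so it is kept as its own frame instead of being assigned to one column.
parts = df["content"].str.split("\n", expand=True)

# Give the pieces readable names and attach them to the original frame.
parts.columns = [f"col{i + 1}" for i in range(parts.shape[1])]
result = df.join(parts)
print(result)
```

Rows with fewer pieces than the longest row are padded with missing values.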

Edit: following the discussion here, the updated code loads multiple files into a single DataFrame:

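import pandas as pd

allFiles_df = None
for it, currFile in enumerate(files):

    # Read each physical line of the file into a single column.
    # Note: newer pandas releases may reject sep='\n'; this is the call as originally posted.
    df = pd.read_csv(currFile, sep = '\n', header = None)
    df.columns = ['data']

    # Split on the literal two-character text "\n" (backslash + n), pieces in reverse order.
    splitFunc = lambda x: pd.Series([i for i in reversed(x.split('\\n'))])

    df = df['data'].apply(splitFunc)
    # One row per piece: keep the original row number, drop the piece position.
    df = df.stack().to_frame().reset_index().drop(['level_1'],axis = 1)
    # Discard empty or near-empty pieces.
    df = df[df[0].str.len() >2]
    df['fileNo'] = it

    # Append this file's rows (pd.concat silently drops the initial None).
    allFiles_df = pd.concat([allFiles_df,df])

allFiles_df.columns = ['rowNo','text','fileNo']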
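Key things to note: the "\n" is literal text in the raw data, so Python reads it in as "\\n". With the default C engine, the sep keyword of read_csv does not accept a multi-character separator (longer separators are interpreted as regular expressions by the Python engine), which is why you ran into this problem.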
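The difference between literal "\n" text and a real newline can be checked with plain strings. This is a small illustration only; the sample text is shortened from the question's data.

```python
# Backslash followed by "n", stored as two ordinary characters (raw string).
raw = r"Apotex Corp.\nCompany Contact Information:\n800-706-5575"
# The same text with actual newline characters.
real = "Apotex Corp.\nCompany Contact Information:\n800-706-5575"

print(raw.split("\n"))    # one piece: there is no real newline to split on
print(raw.split("\\n"))   # three pieces: splits on the literal backslash + n
print(real.split("\n"))   # three pieces: splits on real newlines
```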


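This outputs the file number and row number that each string came from. It assumes the files variable contains a list of file names with their paths.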
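The reshaping step can be tried without any files. The following is a minimal sketch, assuming an in-memory frame with made-up values stands in for one file; it keeps the pieces in their original order instead of reversing them, and `pieces` and `long_df` are hypothetical names.

```python
import pandas as pd

# Hypothetical stand-in for one file: each row holds literal "\n" text.
df = pd.DataFrame({"data": [r"Company A\nContact:\n800-000-0000",
                            r"Company B\n\n\nCompany C"]})

# Split every row on the literal backslash + n; one column per piece.
pieces = df["data"].apply(lambda x: pd.Series(x.split("\\n")))

# Stack the columns into rows, keeping the original row number only.
long_df = pieces.stack().to_frame().reset_index().drop(["level_1"], axis=1)

# Drop empty or near-empty pieces, then tag the rows with a file number.
long_df = long_df[long_df[0].str.len() > 2]
long_df["fileNo"] = 0
long_df.columns = ["rowNo", "text", "fileNo"]
print(long_df)
```

Each surviving piece becomes its own row, so rowNo repeats once per piece taken from the same source row.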

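When loading the pandas DataFrame, you have the option to define the separator; in this case you can use "\n". Let me know if that helps.
I already tried sep='\n', but without success @BernardL
lis is a list of lists, thanks guys! It is actually a nested list, not a string!
Your example above looks like a Series or a 1-D DataFrame. Please post something that reflects your raw data, similar to how I created df. In the future, providing raw data that users can easily work with will get you the best and fastest results.
I tried: normalize = lambda x: pd.Series([i for i in reversed(x.split('\n'))]); newdf = df['0','1'].apply(normalize); print(newdf) and got KeyError: ('0', '1').
Edit your data above so that we can see what you are working with. Create a df that reflects your raw data, like I did. Edit: we don't need to see your entire data set; about 3 rows of fake data in a format similar to yours is enough for you to apply the result to your full data set. If this lets you answer the question above, please mark it with the green check.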