
Python: reading multiple txt files into a dict and then a DataFrame

Tags: python, pandas, dataframe, nlp

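I am trying to load multiple txt files into a DataFrame. I know how to load URLs, CSV, and Excel files, but I could not find any reference on how to load multiple txt files into a DataFrame and match them with a dictionary, or vice versa.

The text files are not comma- or tab-separated; they are just plain text containing song lyrics.

I checked the pandas documentation. Any help is welcome.

Ideally, the DataFrame

The DataFrame I hope to achieve looks like this example: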

                 |                                                        lyrics
    -------------+-----------------------------------------------------------------------------------------
    bonjovi      |    some text from the text files HiHello! WelcomeThank you Thank you for coming.
    -------------+---------------------------------------------------------------------------------------
    lukebryan    |    some other text from the text files.Hi.Hello WelcomeThank you Thank you for coming. 
    -------------+-----------------------------------------------------------------------------------------
    johnprine    |    yet some text from the text files. Hi.Hello WelcomeThank you Thank you for coming. 

Basic example. Folder structure: /lyrics/

urls = [
    'lyrics/bonjovi.txt',
    'lyrics/lukebryan.txt',
    'lyrics/johnprine.txt',
    'lyrics/brunomars.txt',
    'lyrics/methodman.txt',
    'lyrics/bobmarley.txt',
    'lyrics/nickcannon.txt',
    'lyrics/weeknd.txt',
    'lyrics/dojacat.txt',
    'lyrics/ladygaga.txt',
    'lyrics/dualipa.txt',
    'lyrics/justinbieber.txt',
]
Musician names

Open the text files. These files are in the directory where I run my Jupyter notebook.

for i, c in enumerate(bands):
    with open("lyrics/" + c + ".txt", "wb") as file:
        pickle.dump(lyrics[i], file)
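
Note that opening a file in "wb" mode and calling pickle.dump writes to it, which overwrites the existing lyrics instead of loading them. A minimal sketch of reading the files into a dictionary instead, assuming each file is named after its band and is UTF-8 encoded. The temporary folder and the two sample files are hypothetical and exist only to make the example self-contained:

```python
import os
import tempfile

# Hypothetical sample data so the sketch runs on its own;
# in the question these files already exist under lyrics/.
lyrics_dir = os.path.join(tempfile.mkdtemp(), "lyrics")
os.makedirs(lyrics_dir)
samples = {"bonjovi": "some text", "lukebryan": "some other text"}
for name, text in samples.items():
    with open(os.path.join(lyrics_dir, name + ".txt"), "w", encoding="utf-8") as f:
        f.write(text)

bands = ["bonjovi", "lukebryan"]

# Read each file in text mode ("r") into a dict keyed by band name.
data = {}
for band in bands:
    path = os.path.join(lyrics_dir, band + ".txt")
    with open(path, "r", encoding="utf-8") as f:
        # Keep the text in a one-element list so a join-based helper
        # such as combine_text can process it later.
        data[band] = [f.read()]

print(data.keys())
```

Each value is a one-element list here only so that it fits a helper that joins a list of strings; storing the plain string would work just as well if no joining is needed.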
Double-check to make sure the data has been loaded correctly. I hope to get a result like this:

dict_keys(['bonjovi', 'lukebryan', 'johnprine', 'brunomars', 'methodman', 'bobmarley', 'nickcannon', 'weeknd', 'dojacat', 'ladygaga', 'dualipa', 'justinbieber'])

# We are going to change this to key: artist, value: string format
def combine_text(list_of_text):
    '''Takes a list of text and combines them into one large chunk of text.'''
    combined_text = ' '.join(list_of_text)
    return combined_text


# Combine it! (combine_text must be defined before this line runs)
data_combined = {key: [combine_text(value)] for (key, value) in data.items()}
We can either keep it in dictionary format or put it into a DataFrame.

import pandas as pd

pd.set_option('display.max_colwidth', 150)

data_df = pd.DataFrame.from_dict(data_combined).transpose()
data_df.columns = ['lyrics']
data_df = data_df.sort_index()
data_df
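
The transpose step can also be avoided. A minimal sketch, assuming a dictionary shaped like data_combined (artist name mapped to a one-element list); the two entries below are hypothetical stand-ins, not real lyrics:

```python
import pandas as pd

# Hypothetical stand-in for a data_combined-style dict.
data_combined = {
    "lukebryan": ["some other text"],
    "bonjovi": ["some text"],
}

# orient="index" turns each key into a row, so no transpose is needed;
# columns names the single value column.
data_df = pd.DataFrame.from_dict(data_combined, orient="index", columns=["lyrics"])
data_df = data_df.sort_index()
print(data_df)
```

This gives the same layout as the transpose-and-rename version: artists in the index and one lyrics column.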

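This is how I would do it. Note that I generalized the file handling, so I don't have to worry about manually creating the list of keys and making sure everything matches up.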

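Works perfectly, and pulling the names from the txt files is a better solution too! Thank you for the concise comments; they helped me understand your code. Thanks again for your time.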
import os
import re
import pandas as pd

# Get the full path of every txt file in the lyrics folder
filePath = []
for file in os.listdir("./lyrics"):
    if file.endswith(".txt"):
        filePath.append(os.path.join("./lyrics", file))

# Pull the file name from the path with a regex, capturing the text between
# the last path separator (/ or \) and the .txt extension
fileName = re.compile(r'([^\\/]+)\.txt$')

# Make dict data with the file name as the key, and the text in the file as the value
data = {}
for file in filePath:
    # Capture the file name
    key = fileName.search(file)
    with open(file, "r") as readFile:
        # Note that key[1] is the capture group from our search, and that the text is put into a list
        data[key[1]] = [readFile.read()]

# Make a DataFrame from the dict, and rename the columns
df = pd.DataFrame(data).T.reset_index().rename(columns={'index': 'bands', 0: 'lyrics'})
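
The code above returns the band names as a regular column. If the artist names should be the index instead, as in the table in the question, a variation with pathlib is one option. This is a sketch under assumptions, not the answerer's method: the files are assumed to be UTF-8, and the temporary folder with two sample files is hypothetical and only there so the example runs on its own:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample files so the sketch is self-contained.
lyrics_dir = Path(tempfile.mkdtemp()) / "lyrics"
lyrics_dir.mkdir()
(lyrics_dir / "bonjovi.txt").write_text("some text", encoding="utf-8")
(lyrics_dir / "lukebryan.txt").write_text("some other text", encoding="utf-8")

# Path.stem is the file name without its extension, e.g. "bonjovi".
lyrics = {
    path.stem: path.read_text(encoding="utf-8")
    for path in sorted(lyrics_dir.glob("*.txt"))
}

# A Series keyed by artist becomes a one-column DataFrame indexed by artist.
df = pd.Series(lyrics, name="lyrics").to_frame()
df.index.name = "bands"
print(df)
```

Because glob("*.txt") only picks up text files and Path.stem handles the separator on any platform, no regex or manual key list is needed here.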