Python: Converting output into a DataFrame (Python, Pandas, Dataframe) - Fatal编程技术网



I'm tracking cargo ships from Maersk and want to automate the process. So far I can get the data, but the cleaning part is killing me.

I'm using BS4:

from bs4 import BeautifulSoup
import pandas as pd
import requests
import time

header = "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0"

#gets the data: the text of every <td> cell on the vessel's page
def get_data(page_url):
    soup = BeautifulSoup(requests.get(page_url, headers={"User-Agent": header}).text, 'lxml')
    cells = soup.find_all("td")
    return [cell.text for cell in cells]

#convert to a dictionary that can easily be converted to a pandas dataframe
#(the cells alternate label, value, label, value, ...)
def Convert(cells):
    it = iter(cells)
    res_dct = dict(zip(it, it))
    return res_dct

# makes it a dataframe with the required columns
def make_df():
    todf = Convert(get_data(url))
    df = pd.DataFrame((todf), index=[0])
    keep_flag = df[['Flag']]
    keep_ETA = df[['ETA']]
    keep_speed = df[['Course / Speed']]
    keep_report = df[['Last report ']]
    new_df = pd.concat([keep_flag, keep_ETA, keep_speed, keep_report], axis = 1).T
    #date = pd.Timestamp.today()
    return new_df

# how I print    
urls = {
    "EMMA MAERSK": "https://www.vesselfinder.com/vessels/EMMA-MAERSK-IMO-9321483-MMSI-220417000",
    "MANILA MAERSK": "https://www.vesselfinder.com/vessels/MANILA-MAERSK-IMO-9780469-MMSI-219038000"
    }
for ele, url in urls.items():
    print(ele, make_df())
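The `dict(zip(it, it))` line in `Convert` works because both arguments to `zip` are the same iterator, so each tuple consumes two consecutive cells: a label and then its value. A small offline sketch, assuming the `<td>` texts alternate label/value as the output suggests (the `cells` list is made up from the printed values, not scraped):

```python
# Hypothetical <td> texts, alternating label / value
cells = ["Flag", "Denmark", "ETA", "Nov 24, 00:01", "Course / Speed", "232.0° / 11.7 kn"]

it = iter(cells)
# zip pulls one item from each argument in turn; since both arguments are the
# same iterator, every tuple is (label, value)
pairs = dict(zip(it, it))
print(pairs)

# With an odd number of cells, the last unpaired label is silently dropped
odd = iter(["Flag", "Denmark", "ETA"])
odd_pairs = dict(zip(odd, odd))
print(odd_pairs)
```

If the page ever contains a cell without a partner, every following label/value pair shifts by one, so it may be worth checking that `len(cells)` is even before trusting the dictionary.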
The output looks like this:


EMMA MAERSK                                       0
Flag                            Denmark
ETA                       Nov 24, 00:01
Course / Speed         232.0° / 11.7 kn
Last report      Nov 22, 2019 08:10 UTC
MANILA MAERSK                                       0
Flag                            Denmark
ETA                       Nov 23, 11:30
Course / Speed         182.4° / 13.4 kn
Last report      Nov 22, 2019 08:31 UTC
That's a nice format, but I'm wondering how to turn it into a single DataFrame.

I tried this:

new_df = []
for ele, url in urls.items():
    data = ele, make_df()
    ddf = new_df.append(data)

appended_data = pd.DataFrame(new_df)
appended_data.to_excel('appended.xlsx')
But it doesn't give me the output I want.

I want the two columns side by side rather than one below the other, so that EMMA MAERSK and MANILA MAERSK sit next to each other.


Thanks, everyone!
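For the side-by-side layout described in the question, one option is to keep one pandas Series per vessel and concatenate them along `axis=1`, using `keys=` to label each column with the vessel name. A minimal offline sketch, assuming each vessel's fields are already in a dictionary like the one `Convert` returns; the `vessels` dict and its values are copied from the printed output above, and the names `vessels` and `side_by_side` are illustrative:

```python
import pandas as pd

# Illustrative data: in the real script each inner dict would be the result of
# Convert(get_data(url)) for one vessel. "Last report " keeps the trailing space
# used as a key in the question's code.
vessels = {
    "EMMA MAERSK": {"Flag": "Denmark", "ETA": "Nov 24, 00:01",
                    "Course / Speed": "232.0° / 11.7 kn",
                    "Last report ": "Nov 22, 2019 08:10 UTC"},
    "MANILA MAERSK": {"Flag": "Denmark", "ETA": "Nov 23, 11:30",
                      "Course / Speed": "182.4° / 13.4 kn",
                      "Last report ": "Nov 22, 2019 08:31 UTC"},
}

# One Series per vessel; concat along axis=1 places them next to each other,
# and keys= turns the vessel names into the column labels
side_by_side = pd.concat(
    [pd.Series(fields) for fields in vessels.values()],
    axis=1,
    keys=list(vessels.keys()),
)
print(side_by_side)
```

Calling `side_by_side.to_excel('appended.xlsx')` would then write one column per vessel (this needs an Excel writer engine such as openpyxl installed).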

Using your own functions:

dictionary_list = []
for ele, url in urls.items():
    values_dict = Convert(get_data(url))
    values_dict["Name"] = ele
    dictionary_list.append(values_dict)
Then build a DataFrame from the list of dictionaries:

pd.DataFrame(dictionary_list)[["Name", "Flag", "ETA", "Course / Speed", "Last report "]]
Returns:

   Name           Flag     ETA            Course / Speed     Last report
0  EMMA MAERSK    Denmark  Nov 24, 00:01  240.5° / 11.9 kn   Nov 22, 2019 08:59 UTC
1  MANILA MAERSK  Denmark  Nov 23, 11:30  179.6° / 14.1 kn   Nov 22, 2019 09:01 UTC

You can then rename the columns as needed with DataFrame.rename.
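A minimal offline sketch of that last step, assuming the scraped dictionaries look like the rows in the table above (the `dictionary_list` contents are copied from that output, and the new column names chosen here are only examples). `rename` takes an old-to-new mapping and leaves any label not in the mapping unchanged:

```python
import pandas as pd

# Illustrative records in the shape the loop above builds;
# "Last report " keeps the trailing space used as a key in the scraped dict
dictionary_list = [
    {"Flag": "Denmark", "ETA": "Nov 24, 00:01", "Course / Speed": "240.5° / 11.9 kn",
     "Last report ": "Nov 22, 2019 08:59 UTC", "Name": "EMMA MAERSK"},
    {"Flag": "Denmark", "ETA": "Nov 23, 11:30", "Course / Speed": "179.6° / 14.1 kn",
     "Last report ": "Nov 22, 2019 09:01 UTC", "Name": "MANILA MAERSK"},
]

# One row per dict; the column list selects and orders the wanted fields
df = pd.DataFrame(dictionary_list)[["Name", "Flag", "ETA", "Course / Speed", "Last report "]]

# Example renames: drop the trailing space and tighten one label
df = df.rename(columns={"Last report ": "Last report", "Course / Speed": "Course/Speed"})
print(df.columns.tolist())
```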

You just need to collect all the data in one place and then convert it to a DataFrame:

from bs4 import BeautifulSoup
import pandas as pd
import requests
import time

header = "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0"

#gets the data: the text of every <td> cell on the vessel's page
def get_data(page_url):
    soup = BeautifulSoup(requests.get(page_url, headers={"User-Agent": header}).text, 'lxml')
    cells = soup.find_all("td")
    return [cell.text for cell in cells]

#pair up the label/value cells and append the wanted fields to the global `data` list
def Convert(cells):
    it = iter(cells)
    res_dct = dict(zip(it, it))
    data.append({'flag': res_dct.get('Flag', ''),
                 'ETA': res_dct.get('ETA', ''),
                 'Course / Speed': res_dct.get('Course / Speed', ''),
                 'Last report': res_dct.get('Last report ', '')})



# how I print    
urls = {
    "EMMA MAERSK": "https://www.vesselfinder.com/vessels/EMMA-MAERSK-IMO-9321483-MMSI-220417000",
    "MANILA MAERSK": "https://www.vesselfinder.com/vessels/MANILA-MAERSK-IMO-9780469-MMSI-219038000"
    }
data = []
for ele, url in urls.items():
    Convert(get_data(url))

df = pd.DataFrame(data)
Output:

   flag     ETA            Course / Speed     Last report
0  Denmark  Nov 24, 00:01  241.6° / 12.0 kn   Nov 22, 2019 09:04 UTC
1  Denmark  Nov 23, 11:30  184.8° / 13.9 kn   Nov 22, 2019 09:07 UTC

Exactly what I needed, along with the vessel name. Thanks!
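To carry the vessel name along in the second answer's approach, one option is to add it to each record and use it as the index; transposing the result then gives the one-column-per-vessel layout from the question. A minimal offline sketch, assuming the records look like the output above plus a hypothetical `"Name"` key (in the scraping loop the name would have to be passed into `Convert` from `ele`, which the original function does not do):

```python
import pandas as pd

# Illustrative records in the shape the second answer's Convert appends,
# plus a hypothetical "Name" key; values copied from the output above
data = [
    {"Name": "EMMA MAERSK", "flag": "Denmark", "ETA": "Nov 24, 00:01",
     "Course / Speed": "241.6° / 12.0 kn", "Last report": "Nov 22, 2019 09:04 UTC"},
    {"Name": "MANILA MAERSK", "flag": "Denmark", "ETA": "Nov 23, 11:30",
     "Course / Speed": "184.8° / 13.9 kn", "Last report": "Nov 22, 2019 09:07 UTC"},
]

# One row per vessel, labelled by name instead of 0, 1, ...
df = pd.DataFrame(data).set_index("Name")

# Transpose: one column per vessel, one row per field
wide = df.T
print(wide)
```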