Python: rearranging table data into a single row

We have a piece of code that extracts data from an iframe (thanks, Cody):

import requests
from bs4 import BeautifulSoup

s = requests.Session()
r = s.get("https://www.aliexpress.com/store/feedback-score/1665279.html")

soup = BeautifulSoup(r.content, "html.parser")
iframe_src = soup.select_one("#detail-displayer").attrs["src"]

r = s.get(f"https:{iframe_src}")

soup = BeautifulSoup(r.content, "html.parser")
for row in soup.select(".history-tb tr"):
    print("\t".join([e.text for e in row.select("th, td")]))
This returns:

Feedback                1 Month  3 Months  6 Months
Positive (4-5 Stars)    154      562       1,550
Neutral (3 Stars)       8        19        65
Negative (1-2 Stars)    8        20        57
Positive feedback rate  95.1%    96.6%     96.5%
We need this output all in a single row:

How can we do that?

Just set the index and unstack:

df:

Then:

Or you can use pivot:
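As a minimal sketch of both approaches, assume the scraped table has already been loaded into a DataFrame with a `store` column (filled here with the ID from the question's URL) and that the "Positive feedback rate" row has been dropped. The values are copied from the table in the question; the names `wide_unstack` and `wide_pivot` are illustrative only.

```python
import pandas as pd

# Values copied from the table printed in the question; the 'store' value
# is the ID taken from the question's URL.
df = pd.DataFrame({
    "store": ["1665279"] * 3,
    "Feedback": ["Positive (4-5 Stars)", "Neutral (3 Stars)", "Negative (1-2 Stars)"],
    "1 Month": ["154", "8", "8"],
    "3 Months": ["562", "19", "20"],
    "6 Months": ["1,550", "65", "57"],
})

# Approach 1: move 'store' and 'Feedback' into the index, then unstack 'Feedback'
wide_unstack = df.set_index(["store", "Feedback"]).unstack(level=1)

# Approach 2: pivot directly on the same two columns
wide_pivot = df.pivot(index="store", columns="Feedback")

# Both results have two-level columns such as ('1 Month', 'Negative (1-2 Stars)');
# flatten them into short single-level names like '1 Month Neg'
for wide in (wide_unstack, wide_pivot):
    wide.columns = [f"{period} {label[:3]}" for period, label in wide.columns]

print(wide_unstack)
```

Either way the result is one row per store with one column per period and feedback type, which can then be written out with `to_excel`.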

Both perform about the same:

unstack: 3.61 ms ± 186 µs per loop (mean ± std. dev. of 3 runs, 1000 loops each)

pivot: 3.59 ms ± 114 µs per loop (mean ± std. dev. of 3 runs, 1000 loops each)
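A rough way to reproduce such a comparison outside IPython is the standard-library `timeit` module. The sketch below is only an assumption about how the measurement could be set up: the helper names `with_unstack` and `with_pivot` are made up for illustration, the data comes from the question's table, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Same small table as in the question, plus a 'store' column
df = pd.DataFrame({
    "store": ["1665279"] * 3,
    "Feedback": ["Positive (4-5 Stars)", "Neutral (3 Stars)", "Negative (1-2 Stars)"],
    "1 Month": ["154", "8", "8"],
    "3 Months": ["562", "19", "20"],
    "6 Months": ["1,550", "65", "57"],
})


def with_unstack():
    return df.set_index(["store", "Feedback"]).unstack(level=1)


def with_pivot():
    return df.pivot(index="store", columns="Feedback")


runs = 200  # number of calls to average over
t_unstack = timeit.timeit(with_unstack, number=runs) / runs
t_pivot = timeit.timeit(with_pivot, number=runs) / runs

print(f"unstack: {t_unstack * 1e3:.2f} ms per call")
print(f"pivot:   {t_pivot * 1e3:.2f} ms per call")
```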

Here is the complete code that does the job:

import pandas as pd
import requests
from bs4 import BeautifulSoup

pd.set_option('display.width', 1000)
pd.set_option('display.max_columns', 50)

url = "https://www.aliexpress.com/store/feedback-score/1665279.html"
s = requests.Session()
r = s.get(url)

soup = BeautifulSoup(r.content, "html.parser")
iframe_src = soup.select_one("#detail-displayer").attrs["src"]

r = s.get(f"https:{iframe_src}")

soup = BeautifulSoup(r.content, "html.parser")
rows = []
for row in soup.select(".history-tb tr"):
    # collect the cell texts once, then both print and store them
    cells = [e.text for e in row.select("th, td")]
    print("\t".join(cells))
    rows.append(cells)
print()

df = pd.DataFrame.from_records(
    rows,
    columns=['Feedback', '1 Month', '3 Months', '6 Months'],
)

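# remove first row with column names; copy so a new column can be added safely
df = df.iloc[1:].copy()
# the shop ID is the page name without its extension, e.g. '1665279'
df['Shop'] = url.split('/')[-1].split('.')[0]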

pivot = df.pivot(index='Shop', columns='Feedback')
pivot.columns = [' '.join(col).strip() for col in pivot.columns.values]

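# NOTE: truncating to 12 characters is only a placeholder. It gives
# 'Positive (4-5 Stars)' and 'Positive feedback rate' the same short name
# (see below), so replace the values with names of your choice.
column_mapping = dict(
    zip(pivot.columns.tolist(), [col[:12] for col in pivot.columns.tolist()]))
# column_mapping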
# {'1 Month Negative (1-2 Stars)': '1 Month Nega',
#  '1 Month Neutral (3 Stars)': '1 Month Neut',
#  '1 Month Positive (4-5 Stars)': '1 Month Posi',
#  '1 Month Positive feedback rate': '1 Month Posi',
#  '3 Months Negative (1-2 Stars)': '3 Months Neg',
#  '3 Months Neutral (3 Stars)': '3 Months Neu',
#  '3 Months Positive (4-5 Stars)': '3 Months Pos',
#  '3 Months Positive feedback rate': '3 Months Pos',
#  '6 Months Negative (1-2 Stars)': '6 Months Neg',
#  '6 Months Neutral (3 Stars)': '6 Months Neu',
#  '6 Months Positive (4-5 Stars)': '6 Months Pos',
#  '6 Months Positive feedback rate': '6 Months Pos'}
pivot.columns = [column_mapping[col] for col in pivot.columns]

pivot.to_excel('Report.xlsx')

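You may need to order pivot.columns manually, because they are sorted alphabetically: '1 Month Negative (1-2 Stars)' comes before '1 Month Neutral (3 Stars)'. With a mapping for the columns, you only need to pick a suitable name for each of them once. They are then mapped by dictionary lookup, so you do not have to rename them again every time you decide to swap the positions of Neutral and Negative.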
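One way to combine the ordering and the renaming is to let a single dictionary carry both. This is a sketch under stated assumptions: the small `pivot` frame is a stand-in for the one-row table built in the script (only the 1 Month columns, with values from the question), and the short names such as '1M Pos' are hypothetical choices.

```python
import pandas as pd

# Stand-in for the one-row pivot table from the script (values from the question)
pivot = pd.DataFrame(
    [["8", "8", "154"]],
    index=pd.Index(["1665279"], name="Shop"),
    columns=[
        "1 Month Negative (1-2 Stars)",
        "1 Month Neutral (3 Stars)",
        "1 Month Positive (4-5 Stars)",
    ],
)

# Hypothetical short names; the dict's insertion order doubles as the
# desired column order (Positive, Neutral, Negative)
column_mapping = {
    "1 Month Positive (4-5 Stars)": "1M Pos",
    "1 Month Neutral (3 Stars)": "1M Neu",
    "1 Month Negative (1-2 Stars)": "1M Neg",
}

# Select the columns in the desired order, then rename them via the mapping
report = pivot[list(column_mapping)].rename(columns=column_mapping)
print(report)
```

Changing the order later only means reordering the entries of `column_mapping`; the selection and rename lines stay the same.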

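Sounds like a job for df.pivot.

@Yuca Nice, thanks. Do you know how to get the for loop into pandas first?

Unfortunately I don't have much time to do proper research, otherwise I would give you a better answer. I'm sure someone will come along soon!

Very cool @Alex Tereshenkov, thank you very much. The order can be managed later in Excel, so that is no problem. But the shop name is important, because I want to run this over a list of URLs. Do you know how to get the number from the original URL file name? I.e. 1665279.html => 1665279

No problem, happy to help! I extracted a url variable and then split on the HTML page name. Regarding the columns, you can define the correct order once in the Python script by accessing pivot.columns, reordering them manually and setting them, and then even apply a simple mapping such as col_mapping = {'1 Month Negative (1-2 Stars)': '1M Neg', '1 Month Neutral (3 Stars)': '1M Neu'} to replace the columns before exporting to Excel. That saves some manual work.

I added more code on column mapping. Hope this saves you some time!

@FrankC, sorry, I just opened the page and saw that an answer had already been given. Glad it worked out for you!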
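For running over a list of URLs, the shop number can be pulled out of each URL with the standard library instead of chained `split` calls. This is only a sketch: the helper name `shop_id_from_url` is made up for illustration, and the list contains just the one URL from the question.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def shop_id_from_url(url: str) -> str:
    """Return the page name without its extension, e.g. '1665279'."""
    # urlparse drops any query string; .stem strips the '.html' suffix
    return PurePosixPath(urlparse(url).path).stem


urls = [
    "https://www.aliexpress.com/store/feedback-score/1665279.html",
    # add further store URLs here
]

shop_ids = [shop_id_from_url(u) for u in urls]
print(shop_ids)
```

Each ID could then be assigned to the 'Shop' column of the DataFrame built for that URL before pivoting.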
# Using set_index + unstack:
df = df[~df['Feedback'].str.contains('Positive feedback rate')]
new = df.set_index(['store', 'Feedback']).unstack(level=1)
# flatten the two-level columns using f-strings in a list comprehension
new.columns = [f'{x} {y[:3]}' for x, y in new.columns]

# Using pivot:
df = df[~df['Feedback'].str.contains('Positive feedback rate')]
new = df.pivot(index='store', columns='Feedback')
new.columns = [f'{x} {y[:3]}' for x, y in new.columns]