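I have a DataFrame with a list of URLs, and I want to extract a few values from each page. The returned keys and values should then be added to the original DataFrame, with the keys as new columns holding the corresponding values.

I thought this would happen automatically with result_type='expand', but apparently it does not.

df5["data"] = df5.apply(lambda x: request_function(x['url']), axis=1, result_type='expand')

Instead, all my results end up in a single data column:

[{'title': ['Python Notebooks: Connect to Google Search Console API and Extract Data - Adapt'], 'description': []}]

My goal is a DataFrame with the following 3 columns:

| URL | Title | Description |

This is my code: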
import requests
from requests_html import HTMLSession
import pandas as pd
from urllib import parse

ex_dic = {'url': ['https://www.searchenginejournal.com/reorganizing-xml-sitemaps-python/295539/', 'https://searchengineland.com/check-urls-indexed-google-using-python-259773', 'https://adaptpartners.com/technical-seo/python-notebooks-connect-to-google-search-console-api-and-extract-data/']}
df5 = pd.DataFrame(ex_dic)
df5

# The session was not defined in the original snippet; it is needed for session.get()
session = HTMLSession()

def request_function(url):
    try:
        found_results = []
        r = session.get(url)
        title = r.html.xpath('//title/text()')
        description = r.html.xpath("//meta[@name='description']/@content")
        # Note: this returns a list containing one dict, and each value is itself a list
        found_results.append({'title': title, 'description': description})
        return found_results
    except requests.RequestException:
        print("Connectivity error")
    except KeyError:
        print("another error")

df5.apply(lambda x: request_function(x['url']), axis=1, result_type='expand')

Answered 2019-04-04 14:18:49
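The values under ex_dic['url'] should be a list of dicts, so that the applied function can update each dict with the extracted attributes.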
import requests
from requests_html import HTMLSession
import pandas as pd
from urllib import parse

ex_dic = {'url': ['https://www.searchenginejournal.com/reorganizing-xml-sitemaps-python/295539/', 'https://searchengineland.com/check-urls-indexed-google-using-python-259773', 'https://adaptpartners.com/technical-seo/python-notebooks-connect-to-google-search-console-api-and-extract-data/']}
# Wrap every URL in its own dict so the function can add keys to it
ex_dic['url'] = [{'url': item} for item in ex_dic['url']]
df5 = pd.DataFrame(ex_dic)
session = HTMLSession()

def request_function(url):
    try:
        print(url)
        r = session.get(url['url'])
        title = r.html.xpath('//title/text()')
        description = r.html.xpath("//meta[@name='description']/@content")
        # Return a single dict (not a list) so result_type='expand' creates one column per key
        url.update({'title': title, 'description': description})
        return url
    except requests.RequestException:
        print("Connectivity error")
    except KeyError:
        print("another error")

df6 = df5.apply(lambda x: request_function(x['url']), axis=1, result_type='expand')
print(df6)

Answered 2019-04-04 14:20:17
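Actually, if your function returns a single dictionary rather than a list of dictionaries, it works the way you expect. Also, give each key a plain string as its value, not a list. See my example code: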
import requests
import pandas as pd
from urllib import parse

ex_dic = {'url': ['https://www.searchenginejournal.com/reorganizing-xml-sitemaps-python/295539/', 'https://searchengineland.com/check-urls-indexed-google-using-python-259773', 'https://adaptpartners.com/technical-seo/python-notebooks-connect-to-google-search-console-api-and-extract-data/']}
df5 = pd.DataFrame(ex_dic)
# print(df5)

def request_function(url):
    # Stub that returns one dict with string values instead of fetching the page
    return {'title': 'Python Notebooks: Connect to Google Search Console API and Extract Data - Adapt',
            'description': ''}

df6 = df5.apply(lambda x: request_function(x['url']), axis=1, result_type='expand')
# Put the original url column next to the expanded columns
df7 = pd.concat([df5, df6], axis=1)
df7

This gives you a DataFrame with three columns: url, title, and description.
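To make the difference in return types concrete, here is a minimal sketch that needs no network access. The two helper functions and the placeholder URLs 'a' and 'b' are made up for illustration; only the apply call mirrors the code above.

```python
import pandas as pd

# Placeholder "URLs" (hypothetical), so nothing is actually fetched
df = pd.DataFrame({'url': ['a', 'b']})

def returns_list(url):
    # Mirrors the question: a list wrapping a single dict
    return [{'title': 'T ' + url, 'description': 'D ' + url}]

def returns_dict(url):
    # Mirrors this answer: one plain dict per row
    return {'title': 'T ' + url, 'description': 'D ' + url}

as_list = df.apply(lambda row: returns_list(row['url']), axis=1, result_type='expand')
as_dict = df.apply(lambda row: returns_dict(row['url']), axis=1, result_type='expand')

# The list is expanded by position, so its only element (the dict) lands in one column
print(as_list.shape)
# The dict is expanded by key, so each key becomes its own column
print(sorted(as_dict.columns))
```

With the list version, 'expand' has only one list element to spread out, which is why everything ended up in a single column.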

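You can also adjust the lambda function so it picks the first element of the returned list:

df6 = df5.apply(lambda x: request_function(x['url'])[0], axis=1, result_type='expand')

But you still need to make sure the values are strings, not lists.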
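One way to get string values is to take the first XPath match and fall back to an empty string when nothing matched. The sketch below is only an illustration under assumptions: first_or_empty and fake_pages are hypothetical names, and fake_pages stands in for the lists that r.html.xpath(...) would return, so the example runs offline.

```python
import pandas as pd

def first_or_empty(matches):
    """Return the first match as a string, or '' if the list is empty."""
    return matches[0] if matches else ''

# Hypothetical stand-in for the xpath results: url -> (title matches, description matches)
fake_pages = {
    'https://example.com/one': (['Page one'], ['First description']),
    'https://example.com/two': (['Page two'], []),  # no meta description found
}

def request_function(url):
    title_matches, description_matches = fake_pages[url]
    # One dict per row, with plain strings as values
    return {
        'title': first_or_empty(title_matches),
        'description': first_or_empty(description_matches),
    }

df5 = pd.DataFrame({'url': list(fake_pages)})
df6 = df5.apply(lambda row: request_function(row['url']), axis=1, result_type='expand')
df7 = pd.concat([df5, df6], axis=1)
print(df7)
```

In the real function you would replace the fake_pages lookup with the two r.html.xpath(...) calls and keep the rest unchanged.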
https://stackoverflow.com/questions/55517891