Python 3: Scraping Web Content with the Splinter Tool
The content of this blog post was scraped with the Python code below.
Reference: https://hackernoon.com/mastering-python-web-scraping-get-your-data-back-e9a5cc653d88
Two third-party Python modules are used:
- splinter
- pandas
```python
#!/usr/bin/python
# -*- coding: utf-8 -*-
from splinter import Browser
import pandas as pd

# Open a Chrome browser window and load the target page
browser = Browser('chrome')
browser.visit('https://medium.mybridge.co/python-top-45-tutorials-for-the-past-year-v-2018-1b4d46c9e857')

# (Leftover from the referenced tutorial: filling and submitting a Google
# search box by XPath. Kept here for reference only.)
# search_bar_xpath = '//*[@id="lst-ib"]'
# search_bar = browser.find_by_xpath(search_bar_xpath)[0]
# search_bar.fill("CodingStartups.com")
# search_button_xpath = '//*[@id="tsf"]/div[2]/div[3]/center/input[1]'
# search_button = browser.find_by_xpath(search_button_xpath)[0]
# search_button.click()

# Each article link sits inside a paragraph with this class; single quotes
# around the XPath avoid escaping the double quotes inside it
search_results_xpath = '//*[@class="graf graf--p graf-after--figure"]/a'
search_results = browser.find_by_xpath(search_results_xpath)

scraped_data = []
for search_result in search_results:
    title = search_result.text          # visible text of the link
    print(title)
    link = search_result["href"]        # href attribute of the <a> element
    scraped_data.append((title, link))  # collect as (title, link) tuples

# Write the results to CSV; utf_8_sig prepends a BOM so Excel opens it correctly
df = pd.DataFrame(data=scraped_data, columns=["Title", "Link"])
df.to_csv("links.csv", encoding='utf_8_sig')

browser.quit()
```
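To see what the XPath above selects without launching a browser, here is a minimal sketch against a hand-written HTML fragment (the fragment is an illustrative assumption, not Medium's real markup), using Python's standard-library ElementTree, which supports a subset of XPath:

```python
import xml.etree.ElementTree as ET

# Illustrative fragment mimicking the structure the XPath targets
html = """
<body>
  <p class="graf graf--p graf-after--figure">
    <a href="https://example.com/tutorial-1">Tutorial One</a>
  </p>
  <p class="graf graf--p graf-after--figure">
    <a href="https://example.com/tutorial-2">Tutorial Two</a>
  </p>
</body>
"""

root = ET.fromstring(html)
# ElementTree's limited XPath: every <a> under a <p> with exactly that class
results = [(a.text, a.get("href"))
           for a in root.findall('.//p[@class="graf graf--p graf-after--figure"]/a')]
print(results)
```

Note that ElementTree matches the `class` attribute as an exact string, whereas a browser-driven tool like Splinter evaluates full XPath against the live DOM; this is only a quick way to reason about what the expression targets.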
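After the script finishes, the saved CSV can be loaded back with pandas for a quick sanity check. A minimal sketch, using illustrative sample rows in place of real scraped results:

```python
import pandas as pd

# Sample rows standing in for real scraped results (illustrative only)
scraped_data = [("Example Title", "https://example.com/post")]
df = pd.DataFrame(data=scraped_data, columns=["Title", "Link"])
df.to_csv("links.csv", encoding="utf_8_sig", index=False)

# Read the file back; the utf_8_sig codec strips the BOM on the way in
df2 = pd.read_csv("links.csv", encoding="utf_8_sig")
print(df2.head())
```

Passing `index=False` keeps the row index out of the file, so the columns round-trip cleanly as `Title` and `Link`.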