Python 3: Scraping Web Content with the Splinter Tool
The content of this blog post was scraped with the Python code below.
Reference: https://hackernoon.com/mastering-python-web-scraping-get-your-data-back-e9a5cc653d88
Two third-party Python modules are used:
- splinter
- pandas
```python
#!/usr/bin/python
# -*- coding: utf-8 -*-
from splinter import Browser
import pandas as pd

# Open a Chrome browser window and load the target page
browser = Browser('chrome')
browser.visit('https://medium.mybridge.co/python-top-45-tutorials-for-the-past-year-v-2018-1b4d46c9e857')

# (Leftover from the referenced tutorial: filling and submitting a Google
# search box by XPath. Kept here for reference only.)
# search_bar_xpath = '//*[@id="lst-ib"]'
# search_bar = browser.find_by_xpath(search_bar_xpath)[0]
# search_bar.fill("CodingStartups.com")
# search_button_xpath = '//*[@id="tsf"]/div[2]/div[3]/center/input[1]'
# search_button = browser.find_by_xpath(search_button_xpath)[0]
# search_button.click()

# Each article link sits inside a paragraph with this class; single quotes
# around the XPath avoid escaping the double quotes inside it
search_results_xpath = '//*[@class="graf graf--p graf-after--figure"]/a'
search_results = browser.find_by_xpath(search_results_xpath)

scraped_data = []
for search_result in search_results:
    title = search_result.text          # visible text of the link
    print(title)
    link = search_result["href"]        # href attribute of the <a> element
    scraped_data.append((title, link))  # collect as (title, link) tuples

# Write the results to CSV; utf_8_sig prepends a BOM so Excel opens it correctly
df = pd.DataFrame(data=scraped_data, columns=["Title", "Link"])
df.to_csv("links.csv", encoding='utf_8_sig')

browser.quit()
```
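To see what the XPath above selects without launching a browser, here is a minimal sketch against a hand-written HTML fragment (the fragment is an illustrative assumption, not Medium's real markup), using Python's standard-library ElementTree, which supports a subset of XPath:

```python
import xml.etree.ElementTree as ET

# Illustrative fragment mimicking the structure the XPath targets
html = """
<body>
  <p class="graf graf--p graf-after--figure">
    <a href="https://example.com/tutorial-1">Tutorial One</a>
  </p>
  <p class="graf graf--p graf-after--figure">
    <a href="https://example.com/tutorial-2">Tutorial Two</a>
  </p>
</body>
"""

root = ET.fromstring(html)
# ElementTree's limited XPath: every <a> under a <p> with exactly that class
results = [(a.text, a.get("href"))
           for a in root.findall('.//p[@class="graf graf--p graf-after--figure"]/a')]
print(results)
```

Note that ElementTree matches the `class` attribute as an exact string, whereas a browser-driven tool like Splinter evaluates full XPath against the live DOM; this is only a quick way to reason about what the expression targets.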
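After the script finishes, the saved CSV can be loaded back with pandas for a quick sanity check. A minimal sketch, using illustrative sample rows in place of real scraped results:

```python
import pandas as pd

# Sample rows standing in for real scraped results (illustrative only)
scraped_data = [("Example Title", "https://example.com/post")]
df = pd.DataFrame(data=scraped_data, columns=["Title", "Link"])
df.to_csv("links.csv", encoding="utf_8_sig", index=False)

# Read the file back; the utf_8_sig codec strips the BOM on the way in
df2 = pd.read_csv("links.csv", encoding="utf_8_sig")
print(df2.head())
```

Passing `index=False` keeps the row index out of the file, so the columns round-trip cleanly as `Title` and `Link`.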