Sharing a Beginner-Level Web Scraper

Published 2021-10-07 · 78 views


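I recently switched to a 27-inch 2K monitor, and my old wallpapers look blurry on it.

So I wrote a scraper to grab 2K images.

P.S. A wallpaper site I recommend:

https://wallhaven.cc

This scraper took about three hours of scattered free time over four days of the National Day holiday.

The tutorials online all use Python, and I don't know Java well anyway, so Python it was :)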

🐂🐂🐂🐂🐂🐂🐂🐂 🐂🐂🐂🐂🐂🐂🐂🐂

Here is the code:

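import requests
from bs4 import BeautifulSoup

# Search results page: 2560x1440 landscape wallpapers sorted by views (page 2)
res = requests.get('https://wallhaven.cc/search?categories=111&purity=100&resolutions=2560x1440&ratios=landscape&sorting=views&order=desc&page=2')
soup = BeautifulSoup(res.text, 'html.parser')

temp = 0  # running counter, used as the file name
for link in soup.select('a'):
    href = link.attrs.get('href')
    # Keep only 29-character links (the wallpaper detail pages);
    # the /untagged page has the same length, so skip it explicitly
    if href is None or len(href) != 29 or href == 'https://wallhaven.cc/untagged':
        continue

    # Second pass: parse the detail page and look for the full-size image
    red = requests.get(href)
    soup1 = BeautifulSoup(red.text, 'html.parser')

    for pic in soup1.select('img'):
        src = pic.attrs.get('src')
        # Full-size image URLs are 51 characters long; skip <img> tags without src
        if src is None or len(src) != 51:
            continue

        temp += 1
        fileurl = 'F:/4k/' + str(temp) + '.jpg'  # the F:/4k/ folder must already exist

        r = requests.get(src)
        with open(fileurl, 'wb') as f:
            f.write(r.content)
        print(str(temp) + ' is ok')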

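The scraper's logic is simple: parse HTML twice (first the search results page, then each wallpaper's detail page), keep only the URLs whose length matches the ones I want, and save the images locally.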
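To make the length filter concrete, here is a minimal offline sketch of the two checks the scraper relies on. The helper names (`is_detail_link`, `is_full_image`, `local_path`) and the sample URLs are made up for illustration; the lengths 29 and 51, the `/untagged` exclusion, and the `F:/4k/` folder come from the code above. It needs no network access.

```python
# Hypothetical helpers illustrating the length-based filter (not part of any library).
DETAIL_LINK_LENGTH = 29   # length of a wallpaper detail-page URL in the scraper above
FULL_IMAGE_LENGTH = 51    # length of a full-size image URL in the scraper above
EXCLUDED_LINKS = {"https://wallhaven.cc/untagged"}  # same length, but not a wallpaper


def is_detail_link(href):
    """Return True if href looks like a wallpaper detail page (by length only)."""
    return (
        href is not None
        and len(href) == DETAIL_LINK_LENGTH
        and href not in EXCLUDED_LINKS
    )


def is_full_image(src):
    """Return True if src looks like a full-size image URL (by length only)."""
    return src is not None and len(src) == FULL_IMAGE_LENGTH


def local_path(index, folder="F:/4k/"):
    """Build the output file name the same way the scraper does."""
    return folder + str(index) + ".jpg"


# Made-up sample values, only to exercise the functions.
sample_hrefs = [
    None,                             # an <a> tag without an href
    "https://wallhaven.cc/untagged",  # 29 characters, explicitly excluded
    "https://wallhaven.cc/w/abc123",  # 29 characters, kept
    "https://wallhaven.cc/latest",    # different length, dropped
]
kept = [h for h in sample_hrefs if is_detail_link(h)]
print(kept)
print(local_path(1))
```

A length check like this is brittle: any unrelated URL that happens to be 29 or 51 characters long would slip through, so matching on the URL path would probably be a sturdier variation.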

(BeautifulSoup really is a great tool.)