# Web Crawler Example

## Requirements

- Wallpaper site: https://www.bizhihui.com/dongman/1/?order=time
- The `1` in the URL is the page number; change it to reach other pages
- Write a program that asks how many pages to fetch, then automatically downloads every image on those pages into an `images` folder

## Reference code

```python
from urllib.request import urlopen, Request, build_opener, install_opener, HTTPSHandler
import re, os
import ssl

# Build an HTTPS handler that skips certificate verification.
# (ssl._create_unverified_context is a private helper; acceptable for a
# teaching demo, but do not disable verification in production code.)
context = ssl._create_unverified_context()
opener = build_opener(HTTPSHandler(context=context))
install_opener(opener)  # make every urlopen() call use this opener

# Define the User-Agent once, before both loops, so it is always bound
user_agent = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/131.0.0.0 Safari/537.36')

tmp = []
pages = input("How many pages do you want to download? ")
if pages.isdigit():
    pages = int(pages)
else:
    # Raising a bare string is invalid in Python 3; raise an exception instead
    raise ValueError("Your input is not a number!")

for i in range(1, pages + 1):
    url = f'https://www.bizhihui.com/dongman/{i}/?order=time'
    req = Request(url, headers={'User-Agent': user_agent})
    content = urlopen(req).read().decode('utf-8')
    # Two capture groups: (image URL, image name), so findall returns tuples.
    # NOTE: the original pattern was lost; this placeholder has the right
    # shape, but you must adjust it to match the site's actual HTML.
    link = re.findall(r'<img src="(.*?)".*?alt="(.*?)"', content)
    tmp.extend(link)

path = os.getcwd() + os.sep + 'images'
if not os.path.isdir(path):
    os.makedirs(path)

for item in tmp:
    jpgname = item[-1].replace(' ', '_')  # image name, spaces -> underscores
    req = Request(item[0], headers={'User-Agent': user_agent})  # item[0] is the URL
    data = urlopen(req).read()
    with open(f'{path}{os.sep}{jpgname}.jpg', 'wb') as f:
        f.write(data)
    print(jpgname, "downloaded")

input("Download complete. Press Enter to exit....")
```

> Homework 9.1 deliverables
>
> - Understand how the program runs
> - Send a screenshot of a successful run directly to your group leader
> - Also send your group leader screenshots of the wallpapers the crawler downloaded
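The key step to understand is how `re.findall` behaves when the pattern has two capture groups: it returns a list of `(group1, group2)` tuples, which is why the reference code can index `item[0]` for the URL and `item[-1]` for the name. Here is a minimal offline sketch you can run without hitting the network; the HTML fragment and the pattern are illustrative inventions, not the wallpaper site's actual markup:

```python
import re

# Illustrative HTML fragment (made up for the demo, not the real site's markup)
html = '''
<li><a href="https://example.com/img/sunset.jpg" title="Sunset Beach">x</a></li>
<li><a href="https://example.com/img/moon_rise.jpg" title="Moon Rise">x</a></li>
'''

# Two capture groups -> findall returns a list of (url, title) tuples
matches = re.findall(r'<a href="(.*?)" title="(.*?)"', html)

for url, title in matches:
    # Same filename cleanup as the crawler: spaces become underscores
    filename = title.replace(' ', '_') + '.jpg'
    print(url, '->', filename)
```

Printing the pairs first, before writing any download loop, is a cheap way to verify your pattern against the real page source: if `matches` comes back empty or the tuples look wrong, fix the regex before fetching anything.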