Scraping Images

痛定思痛。 2022-06-07 02:54

Scraping images from Baidu Tieba:

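# coding=utf-8
# Python 2 script: urllib.urlopen and urllib.urlretrieve are Python 2 APIs.
import re
import urllib


def getHtml(url):
    # Download the page and return its raw HTML.
    page = urllib.urlopen(url)
    html = page.read()
    page.close()
    return html


def getImg(html):
    # Post images appear as src="....jpg" followed by a pic_ext attribute.
    reg = r'src="(.+?\.jpg)" pic_ext'
    imgre = re.compile(reg)
    imglist = imgre.findall(html)
    # Save each image as 0.jpg, 1.jpg, ... in the current directory.
    x = 0
    for imgurl in imglist:
        urllib.urlretrieve(imgurl, '%s.jpg' % x)
        x += 1
    # Return the URLs so the caller has something meaningful to print.
    return imglist


html = getHtml("http://tieba.baidu.com/p/2460150866")
print getImg(html)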
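The Tieba script targets Python 2. For readers on Python 3, the same idea might be sketched as follows. This is only an illustration under a few assumptions: the function names (get_html, find_images, save_images, main) are made up for this example, the page is assumed to decode as UTF-8, and the pattern is tightened slightly with [^"] so that a single match cannot run across several tags.

```python
import re
import urllib.request

# Tightened form of the pattern: capture the src URL of tags that carry pic_ext,
# without letting the match cross a closing quote.
IMG_PATTERN = re.compile(r'src="([^"]+?\.jpg)" pic_ext')


def get_html(url):
    """Download a page and decode it to text (UTF-8 is an assumption)."""
    with urllib.request.urlopen(url) as page:
        return page.read().decode("utf-8", errors="replace")


def find_images(html):
    """Return every image URL matched by IMG_PATTERN, in page order."""
    return IMG_PATTERN.findall(html)


def save_images(urls):
    """Save each URL as 0.jpg, 1.jpg, ... in the current directory."""
    for index, img_url in enumerate(urls):
        urllib.request.urlretrieve(img_url, "%d.jpg" % index)


def main():
    """Fetch the thread used in this post and download its images."""
    html = get_html("http://tieba.baidu.com/p/2460150866")
    urls = find_images(html)
    save_images(urls)
    print(urls)
```

Calling main() performs the download. Keeping the URL extraction in find_images, separate from the network calls, makes it possible to check the pattern against a saved HTML snippet before fetching anything.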

Scraping images from Douban:

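#!/usr/bin/python
# encoding:utf-8
# Python 2 script: urllib2 does not exist in Python 3.
import re
import urllib2

# Fetch the Douban movie home page as raw HTML.
a = urllib2.urlopen('https://movie.douban.com/').read()
# Match https URLs ending in .jpg. Excluding whitespace and double quotes
# keeps one match from spanning several URLs on the same line.
b = re.findall(r'https://[^\s"]+?\.jpg', a)
i = 0
try:
    for c in b:
        req = urllib2.urlopen(c)
        buf = req.read()
        # Save the image as 0.jpg, 1.jpg, ... and close the file each time.
        f = open(str(i) + '.jpg', 'wb')
        f.write(buf)
        f.close()
        i += 1
except Exception as e:
    print e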
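For Python 3, the Douban script could be sketched roughly as below. Several things here are assumptions and not part of the original: the helper names (fetch, extract_jpg_urls, download_all, main), the User-Agent value (some sites appear to reject urllib's default one), the 10-second timeout, and the choice to skip duplicate URLs. Unlike the original, which stops at the first error, this version skips a failed image and continues.

```python
import re
import urllib.request

# Stop at whitespace or quotes so one match cannot span several URLs.
JPG_URL = re.compile(r'https://[^\s"\']+?\.jpg')

# Hypothetical header value; some sites reject urllib's default User-Agent.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch(url):
    """Return the raw bytes of a URL, sending the headers above."""
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


def extract_jpg_urls(html):
    """Return the unique .jpg URLs found in html, in first-seen order."""
    seen = []
    for url in JPG_URL.findall(html):
        if url not in seen:
            seen.append(url)
    return seen


def download_all(urls):
    """Save each URL as 0.jpg, 1.jpg, ...; return how many were saved."""
    saved = 0
    for url in urls:
        try:
            data = fetch(url)
        except OSError as exc:
            # URLError and socket timeouts are OSError subclasses.
            print("skipped %s: %s" % (url, exc))
            continue
        # The with-block closes the file even if the write fails.
        with open("%d.jpg" % saved, "wb") as f:
            f.write(data)
        saved += 1
    return saved


def main():
    """Fetch the Douban movie home page and download the .jpg files it links."""
    html = fetch("https://movie.douban.com/").decode("utf-8", errors="replace")
    print(download_all(extract_jpg_urls(html)))
```

Calling main() runs the whole download. Numbering files by the count of successful saves keeps the names contiguous even when some images fail.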
