Python Crawler in Practice (2): Scraping Popular Zhihu Posts and Storing Them in a MySQL Database


野性酷女 2023-10-17 09:16

Some background reading first:

Python basics: regular expressions (MISAYAONE's blog, CSDN)

Python basics: introduction to web crawlers (MISAYAONE's blog, CSDN)

MySQL database notes: http://blog.csdn.net/misayaaaaa/article/details/53079953

Sample code (with detailed comments):

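# coding=utf-8
import urllib2
import re
import MySQLdb

# connect() opens a connection to the database; running SQL also needs the cursor created below
conn = MySQLdb.connect(host='localhost', db='test', user='root', passwd='root1', charset='utf8')
cur = conn.cursor()  # create a cursor from the connection
url = 'http://www.zhihu.com/topic/19607535/top-answers'
netthings = urllib2.urlopen(url).read()  # fetch the page with urllib2
print netthings
# re.findall returns every match as a list; re.S (DOTALL) lets '.' match newlines too,
# so a single match may span several lines
links = re.findall('<a class="question_link"(.*?)/a>', netthings, re.S)
print links  # the raw fragments captured by the regular expression
# getting the regular expression right is one of the harder parts
p = '>(.*?)<'
for x in links:
    m = re.search(p, x, re.S)
    if m is None:  # skip fragments that contain no title text
        continue
    title = m.group(1).strip()
    print title
    # let the driver quote the value; building the SQL with % breaks on titles containing a quote
    cur.execute("insert into test(title) values(%s)", (title,))
conn.commit()  # commit the inserted rows
conn.close()  # close the database connection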
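
The two-step extraction in the listing above can be tried without touching the network or the database. The sketch below runs the same two patterns against a small hand-written HTML fragment. The fragment and the extract_titles helper are made up for illustration and are only assumed to resemble the markup the topic page used at the time. It is written in Python 3 syntax, unlike the Python 2 listing.

```python
import re

# Hypothetical sample markup, assumed to resemble the question links on the topic page.
SAMPLE_HTML = '''
<h2><a class="question_link" href="/question/1">
First question title
</a></h2>
<h2><a class="question_link" href="/question/2">Second 'quoted' title</a></h2>
'''


def extract_titles(html):
    """Return the text of every question_link anchor found in html."""
    titles = []
    # Step 1: capture everything between the class attribute and the closing /a>.
    # re.S lets '.' match newlines, so anchors spanning several lines still match.
    for fragment in re.findall(r'<a class="question_link"(.*?)/a>', html, re.S):
        # Step 2: the title is the text between the first '>' and the next '<'.
        match = re.search(r'>(.*?)<', fragment, re.S)
        if match:
            titles.append(match.group(1).strip())
    return titles


if __name__ == "__main__":
    for title in extract_titles(SAMPLE_HTML):
        print(title)
```

The second sample title contains a single quote on purpose: a title like that is exactly what breaks an insert statement assembled with string formatting, which is why passing the value as a query parameter is the safer choice.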

Viewing the table contents from cmd:

[Screenshot: result of viewing the test table]

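I ran into quite a few problems along the way; here they are one by one:

1. Creating, viewing, and dropping the table in MySQL.

Create: the create table command creates a data table.

create table syntax: create table <table name> (<column 1> <type 1> [, ... <column n> <type n>]);

For example, to create a table named MyClass: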

Field name	Data type	Width	Nullable	Primary key	Auto increment	Default
id	int	4	no	primary key	auto_increment
name	char	20	no
sex	int	4	no			0
degree	double	16	yes

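For this exercise I created a table named test: create table test(title char(200)); (note that this must come after selecting the database with use)

View the table contents: select * from test; returns every row in the test table.

Drop the table: DROP TABLE test; deletes the test table.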
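
The create, select, and drop statements above can be rehearsed from Python before pointing them at a real server. The sketch below is only an approximation: it uses the standard-library sqlite3 module with an in-memory database as a stand-in for MySQL, so that it runs without any server, and the table_lifecycle function is a hypothetical helper. The SQL text matches the statements above, but the placeholder style differs: sqlite3 uses ?, while MySQLdb uses %s. Python 3 syntax.

```python
import sqlite3


def table_lifecycle(titles):
    """Create the test table, insert titles, read them back, then drop the table."""
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL 'test' database
    cur = conn.cursor()
    cur.execute("create table test(title char(200))")
    # sqlite3 uses '?' placeholders; MySQLdb uses '%s' for the same purpose.
    cur.executemany("insert into test(title) values(?)", [(t,) for t in titles])
    conn.commit()
    cur.execute("select * from test")
    rows = cur.fetchall()
    cur.execute("drop table test")
    # After the drop, the table no longer appears in SQLite's schema catalogue.
    cur.execute("select count(*) from sqlite_master where name = 'test'")
    remaining = cur.fetchone()[0]
    conn.close()
    return rows, remaining


if __name__ == "__main__":
    rows, remaining = table_lifecycle(["first title", "it's a quoted title"])
    print(rows)
    print(remaining)
```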

2. Basic MySQLdb operations from Python:

# -*- coding: utf-8 -*-
# MySQLdb basics
import time, MySQLdb
# connect
conn = MySQLdb.connect(host="localhost", user="root", passwd="root", db="test", charset="utf8")
cursor = conn.cursor()
# drop the table if it already exists
sql = "drop table if exists user"
cursor.execute(sql)
# create the table
sql = "create table if not exists user(name varchar(128) primary key, created int(10))"
cursor.execute(sql)
# insert one row
sql = "insert into user(name,created) values(%s,%s)"
param = ("aaa", int(time.time()))
n = cursor.execute(sql, param)
print 'insert', n
# insert several rows
sql = "insert into user(name,created) values(%s,%s)"
param = (("bbb", int(time.time())), ("ccc", 33), ("ddd", 44))
n = cursor.executemany(sql, param)
print 'insertmany', n
# update
sql = "update user set name=%s where name='aaa'"
param = ("zzz",)  # the trailing comma makes this a one-element tuple
n = cursor.execute(sql, param)
print 'update', n
# query: print each row, then each column of the row
n = cursor.execute("select * from user")
for row in cursor.fetchall():
    print row
    for r in row:
        print r
# delete
sql = "delete from user where name=%s"
param = ("bbb",)
n = cursor.execute(sql, param)
print 'delete', n
# query again
n = cursor.execute("select * from user")
print cursor.fetchall()
# commit the changes, then release the cursor and the connection
conn.commit()
cursor.close()
conn.close()

A simple example for learning how to work with MySQL from Python.
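
The update and delete statements in the listing above each pass a single parameter, and there is a small Python detail worth knowing at that point: parentheses alone do not create a tuple, the trailing comma does. The snippet below only illustrates that language rule in Python 3 syntax. The as_params helper is hypothetical and not part of MySQLdb.

```python
def as_params(value):
    """Wrap a single value in a one-element tuple suitable for cursor.execute()."""
    return (value,)


not_a_tuple = ("zzz")   # parentheses alone do nothing: this is the string 'zzz'
one_tuple = ("zzz",)    # the trailing comma is what makes a tuple

if __name__ == "__main__":
    print(type(not_a_tuple).__name__)
    print(type(one_tuple).__name__)
    print(len(as_params("bbb")))
```

Passing a real sequence keeps the call consistent with the two-parameter insert, where the parameters are already a tuple.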

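3. Runtime error: ERROR 1045 (28000): Access denied for user 'mysql'@'localhost' (using password: NO)

This error means the server rejected the credentials, that is, the user name and password shown below. The "using password: NO" part says that no password was sent at all. Supply the account's actual user name and password, or look up how to reset the MySQL password.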

user="root",passwd="root"

4. Connecting to MySQL from Python fails with: (2003, "Can't connect to MySQL server on 'localhost' (10061)")

conn=MySQLdb.connect(host="localhost",user="root",passwd="root",db="test",charset="utf8")

Change localhost to 127.0.0.1 in the connect() call.
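
Code 10061 is the Windows "connection refused" error, so if the change above does not help it is worth checking whether anything is listening on the MySQL port at all. The sketch below is one way to do that with the standard library, in Python 3 syntax. The can_connect helper is hypothetical, and port 3306 is assumed because it is MySQL's default; a server configured differently would need another value.

```python
import socket


def can_connect(host, port=3306, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("127.0.0.1:3306 reachable:", can_connect("127.0.0.1"))
```

If this reports False, the MySQL service is probably not running or is listening on a different port, and no change to the host name in connect() will fix that.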
