让Python帮你搞定MySQL数据库

布满荆棘的人生 2022-11-28 13:39 368阅读 0赞

format_png

Mysql是常用的数据库之一,也是面试工作必备技能之一。

本文通过一个实战,将Python与Sql语句结合起来使用,搞定MySQL数据库。

// 实战开始 //

我们在github上下载fifa18球员数据,将这些信息存入到mysql。

①数据下载地址:

https://github.com/amanthedorkknight/fifa18-all-player-statistics

②选择:Complete->basicplayerdata.csv

// 将数据从CSV导入到MySQL //

1)创建对应库与表

  1. #创建公司数据库,编码格式utf-8
  2. CREATE DATABASE fifa18_db DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
  3. #选择数据库:
  4. use fifa18_db;
  5. #创建球员表:id,名称,海报地址,俱乐部,年龄,薪资等,与csv文件对应
  6. create table player(id int Primary key auto_increment,
  7. player_id int, name char(64), age int, poster char(64),
  8. flag char(64), overall int, potential int, club char(64) default '', club_Logo char(64),
  9. value char(16), wage char(16), special int) default charset =utf8;

2)CSV读取文件

数据集中某些球员字段为空,插入时候需要补充默认值,使用DictReader读取。

  1. path = '/home/linux/workdir/data/basicplayerdata.csv'
  2. #字段名称
  3. field = ['','ID','Name','Age','Photo','Nationality','Flag','Overall','Potential','Club','Club Logo','Value','Wage','Special']
  4. f = open(path)
  5. fcsv = csv.DictReader(f, fieldnames = field)
  6. #第一行去掉
  7. line = next(fcsv)
  8. #第一行有效数据
  9. line = next(fcsv)
  10. print(line)

结果:

  1. OrderedDict([('', '0'), ('ID', '158023'), ('Name', 'L. Messi'), ('Age', '30'),
  2. ('Photo', 'https://cdn.sofifa.org/players/4/18/158023.png'), ('Nationality', 'Argentina'),
  3. ('Flag', 'https://cdn.sofifa.org/flags/52.png'),
  4. ('Overall', '94'), ('Potential', '94'), ('Club', 'FC Barcelona'),
  5. ('Club Logo', 'https://cdn.sofifa.org/teams/2/18/light/241.png'),
  6. ('Value', '€118.5M'), ('Wage', '€565K'), ('Special', '2161')])

3)读取一行并整理数据格式

CSV读取字段与数据库对应起来,且不要CSV文件种的第一列,整理数据格式为字典,key为数据库列名,value为CSV对应内容,代码实现。

  1. sqlfield = ['player_id','name','age','poster','nationality','flag','overall','potential','club','club_logo','value','wage','special']
  2. csvfield = ['ID','Name','Age','Photo','Nationality','Flag','Overall','Potential','Club','Club Logo','Value','Wage','Special']
  3. #转成字典:
  4. keysinfo = dict(zip(sqlfield, csvfield))
  5. #填充数据:
  6. data = {}
  7. for sfield, cfield in keysinfo.items():
  8. ele = line.get(cfield, '')
  9. data.setdefault(sfield, ele)
  10. print(data)

结果:

  1. {'player_id': '158023', 'name': 'L. Messi', 'age': '30', 'poster': 'https://cdn.sofifa.org/players/4/18/158023.png',
  2. 'nationality': 'Argentina', 'flag': 'https://cdn.sofifa.org/flags/52.png',
  3. 'overall': '94', 'potential': '94', 'club': 'FC Barcelona',
  4. 'club_logo': 'https://cdn.sofifa.org/teams/2/18/light/241.png',
  5. 'value': '€118.5M', 'wage': '€565K', 'special': '2161'}

4)拼接sql语句

尝试插入一条数据:获取插入字段与对应数据,拼接成sql语句。

  1. tablename = 'player'
  2. keys = data.keys()
  3. fields = ','.join(keys)
  4. vals = ','.join(["'%s'"% val for val in data.values()])
  5. sql = f"INSERT INTO {tablename}({fields}) VALUES({vals})"
  6. print(sql)

结果:

  1. INSERT INTO player(player_id,name,age,poster,nationality,flag,overall,potential,club,club_logo,value,wage,special)
  2. VALUES('158023','L. Messi','30','https://cdn.sofifa.org/players/4/18/158023.png','Argentina','https://cdn.sofifa.org/flags/52.png','94','94','FC Barcelona','https://cdn.sofifa.org/teams/2/18/light/241.png','€118.5M','€565K','2161')

5)连接数据库,并插入数据

  1. #编码格式:utf-8
  2. db = pymysql.connect("localhost","root","abc123","fifa18_db", charset='utf8')
  3. #获取游标
  4. cursor = db.cursor()
  5. cursor.execute(sql)
  6. #提交数据
  7. db.commit()
  8. #断开连接
  9. cursor.close()
  10. db.close()

查看player中信息,插入成功。

format_png 1

基本功能都已经实现,对代码进行整理,根据功能定义类及方法。

  1. import csv
  2. import pymysql
  3. class LoadDataFromCsvToMysql:
  4. def __init__(self, csvpath, csvfield, mysqlfield, table, sqlconfig):
  5. #初始化参数
  6. pass
  7. def connectSql(self):
  8. #连接数据库,获取游标
  9. pass
  10. def disconnectSql(self):
  11. #断开数据库,获取游标
  12. pass
  13. def processSql(self, sql):
  14. #处理sql语句
  15. pass
  16. def loadCsv(self):
  17. #打开csv文件,返回csv对象
  18. pass
  19. def closeCsv(self):
  20. #关闭csv文件
  21. pass
  22. def gensql(self, linedata):
  23. #生成sql语句
  24. pass
  25. def process(self):
  26. #对外接口,打开文件并写数据库
  27. pass

类定义完成之后,我们可以将功能明确的方法实现。

  1. import csv
  2. import pymysql
  3. class LoadDataFromCsvToMysql:
  4. def __init__(self, csvpath, csvfield, mysqlfield, table, sqlconfig):
  5. #初始化参数
  6. self.dbconfig = sqlconfig
  7. self.inpath = csvpath
  8. self.sqlfield = mysqlfield
  9. self.csvfield = csvfield
  10. self.fieldmap = dict(zip(mysqlfield, csvfield[1:]))
  11. self.tablename = table
  12. def connectSql(self):
  13. self.db = pymysql.connect(**self.dbconfig, charset='utf8')
  14. self.cursor = self.db.cursor()
  15. def disconnectSql(self):
  16. self.cursor.close()
  17. self.db.close()
  18. def processSql(self, sql):
  19. ret = self.cursor.execute(sql)
  20. self.db.commit()
  21. return ret
  22. def loadCsv(self):
  23. #打开csv文件
  24. self.f = open(path)
  25. fcsv = csv.DictReader(self.f, fieldnames = self.csvfield)
  26. next(fcsv)
  27. return fcsv
  28. def closeCsv(self):
  29. self.f.close()
  30. def gensql(self, linedata):
  31. pass
  32. def process(self):
  33. #测试数据库与csv文件打开关闭
  34. self.connectSql()
  35. print('connect sql...')
  36. fcsv = self.loadCsv()
  37. self.disconnectSql()
  38. self.closeCsv()
  39. print('disconnect sql') path = '/home/linux/workdir/data/basicplayerdata.csv'
  40. sqlfield= ['player_id','name','age','poster','nationality','flag','overall','potential','club','club_logo','value','wage','special']
  41. csvfield = ['','ID','Name','Age','Photo','Nationality','Flag','Overall','Potential','Club','Club Logo','Value','Wage','Special']
  42. dbconfig = {'host':'localhost','port':3306,'user':'root','passwd':'abc123','db':'fifa18_db'}
  43. obj = LoadDataFromCsvToMysql(path, csvfield, sqlfield, 'player', dbconfig)
  44. obj.process()

输出:

  1. connect sql...
  2. load csvfile
  3. disconnect sql

6)完善process与gensql

先完善process方法实现:

  1. def process(self):
  2. #测试数据库与csv文件打开关闭
  3. self.connectSql()
  4. print('connect sql...')
  5. fcsv = self.loadCsv()
  6. for line in fcsv:
  7. sql = self.gensql(line)
  8. print(sql)
  9. n = self.processSql(sql)
  10. self.disconnectSql()
  11. self.closeCsv()
  12. print('disconnect sql')

然后完善gensql方法,将前面实现添加到此方法中。

  1. def gensql(self, linedata):
  2. #产生sql语句
  3. data = {}
  4. for sfield, cfield in self.fieldmap.items():
  5. ele = linedata.get(cfield, '')
  6. data.setdefault(sfield, ele)
  7. tablename = self.tablename
  8. keys = data.keys()
  9. fields = ','.join(keys)
  10. vals = ','.join(["'%s'"% val for val in data.values()])
  11. sql = f"INSERT INTO {tablename}({fields}) VALUES({vals})"
  12. return sql

测试前将player表中内容删除,然后使用代码先插入一条,并查看结果;如果测试没有问题,可以插入所有数据。实际运行出现问题:sql语句错误 ,字符串拼接错误。

出错sql语句:

  1. "('158023','L. Messi','30','https://cdn.sofifa.org/players/4/18/158023.png',..)"

双引号+单引号,但是文件中,某些字符串带单引号,所以出现字段错误。

修改sql拼接方法:

  1. def gensql(self, linedata):
  2. #产生sql语句
  3. data = {}
  4. for sfield, cfield in self.fieldmap.items():
  5. ele = linedata.get(cfield, '')
  6. data.setdefault(sfield, ele)
  7. tablename = self.tablename
  8. keys = data.keys()
  9. fields = ','.join(keys)
  10. vals = ','.join(['"%s"'% val for val in data.values()])
  11. sql = f'INSERT INTO {tablename}({fields}) VALUES({vals})'
  12. return sql

运行结果:

  1. dstram","18","https://cdn.sofifa.org/players/4/18/238813.png","England","https://cdn.sofifa.org/flags/14.png","47","65","Crewe Alexandra","https://cdn.sofifa.org/teams/2/18/light/121.png","€60K","€1K","1305")
  2. INSERT INTO player(player_id,name,age,poster,nationality,flag,overall,potential,club,club_logo,value,wage,special)
  3. VALUES("238306","A. Conway","19","https://cdn.sofifa.org/players/4/18/238306.png","Republic of Ireland","https://cdn.sofifa.org/flags/25.png","47","63","Galway United","https://cdn.sofifa.org/teams/2/18/light/1571.png","€60K","€1K","1314")
  4. disconnect sql
  5. 1555235373.5942054 1555235392.2122808

插入18000条数据,花费时间大概为20S左右;后续优化:每次插入500条数据,然后在查看花费时间,这个大家可以参考前面案例自己实现。

// 在SQL中查询数据 //

需求:查询player表中阿根廷国家球员姓名,年龄,头像信息。读者朋友可以自己尝试去实现,考虑使用继承。

①sql中查询数据我们步骤:连接数据库,执行sql语句,关闭数据库;

②查询与写入很多方法通用,考虑继承LoadDataFromCsvToMysql类;

③需要重载init,process,processSql方法;

代码实现:

  1. class QueryMysql(LoadDataFromCsvToMysql):
  2. def __init__(self, sqlconfig):
  3. #调用父类方法,初始化传一些无效参数
  4. super(QueryMysql, self).__init__('',[], [], '', dbconfig)
  5. def genSql(self, table, fields, condition=None):
  6. #查询语句生成
  7. fds = ','.join(fields)
  8. cond = ''
  9. print(condition)
  10. if condition:
  11. cond = f' where {condition}'
  12. sql = f'select {fds} from {table}{cond}'
  13. return sql
  14. def process(self, tablename, fields, condition=None):
  15. #对外接口
  16. #调用父类中的连接数据库,关闭数据库方法
  17. self.connectSql()
  18. sql = self.genSql(tablename, fields, condition)
  19. self.processSql(sql)
  20. items = self.cursor.fetchall()
  21. for item in items:
  22. print(item)
  23. print('all Argentina player:', len(items))
  24. self.disconnectSql() dbconfig= {'host':'localhost','port':3306,'user':'root','passwd':'abc123','db':'fifa18_db'}
  25. tablename = 'player'
  26. files = ['name','poster', 'age']
  27. obj = QueryMysql(dbconfig)
  28. cond = 'nationality="Argentina"'
  29. obj.process(tablename, files, cond)

结果:

  1. ('L. Messi', 'https://cdn.sofifa.org/players/4/18/158023.png', 30)
  2. ('G. Higuaín', 'https://cdn.sofifa.org/players/4/18/167664.png', 29)
  3. ('P. Dybala', 'https://cdn.sofifa.org/players/4/18/211110.png', 23)...('A. Miño', 'https://cdn.sofifa.org/players/4/18/243298.png', 23)
  4. ('T. Durso', 'https://cdn.sofifa.org/players/4/18/240955.png', 18)
  5. ('K. Humeler', 'https://cdn.sofifa.org/players/4/18/240291.png', 20)
  6. ('J. Mendive', 'https://cdn.sofifa.org/players/4/18/241584.png', 20)
  7. all Argentina player: 966

结果:

数据集中一共有966名Argentina球员。

转载自:https://blog.csdn.net/LG\_227/article/details/105148841

format_png 2

好文章,我 在看❤

发表评论

表情:
评论列表 (有 0 条评论,368人围观)

还没有评论,来说两句吧...

相关阅读