python3 [Crawler Basics in Practice] Using selenium to simulate a QQ login and scrape friends' Shuoshuo posts (placeholder)

朴灿烈づ我的快乐病毒、 2021-06-24 15:56

### Unfortunately, some of the data has problems, but it can still be scraped

### Source code first

    # encoding=utf8
    import time

    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException

    # 1. Start the browser with selenium.
    # phantomjs.exe sits in the py35 directory here, so no path is needed; otherwise:
    # driver = webdriver.PhantomJS(executable_path="D:\\phantomjs.exe")
    # Note: webdriver.PhantomJS and the find_element_by_* helpers need Selenium 3.x;
    # they were removed in Selenium 4 releases.
    driver = webdriver.PhantomJS()


    def getdata_byQQ(QQ):
        # 2. Call get() to open the target user's Shuoshuo page
        driver.get('https://user.qzone.qq.com/{}/311'.format(QQ))

        # 3. Scroll the page down
        driver.execute_script("window.scrollBy(0,3000)")
        time.sleep(3)
        driver.execute_script("window.scrollBy(0,5000)")
        time.sleep(3)

        # 4. Grab the page content. At this point we are not logged in yet,
        # so the content is not the real data and a login is required.
        page_data = driver.page_source
        # print('page_data ' + page_data)

        # Look for the login entry point
        try:
            driver.find_element_by_id('login_div')
            need_login = True
        except NoSuchElementException:
            need_login = False

        if need_login:
            driver.switch_to.frame('login_frame')
            driver.find_element_by_id('switcher_plogin').click()
            driver.find_element_by_id('u').clear()  # username field
            driver.find_element_by_id('u').send_keys('your QQ number')
            driver.find_element_by_id('p').clear()  # password field
            driver.find_element_by_id('p').send_keys('your QQ password')
            driver.find_element_by_id('login_button').click()
            time.sleep(3)
            # Leave the login iframe so later lookups run against the main page
            driver.switch_to.default_content()

        driver.implicitly_wait(3)

        # Check whether access permissions let us see this space
        try:
            driver.find_element_by_id('QM_OwnerInfo_Icon')
            has_access = True
        except NoSuchElementException:
            has_access = False

        if has_access:
            driver.switch_to.frame('app_canvas_frame')
            # find_elements_* (plural) returns lists, which zip() needs
            content = driver.find_elements_by_css_selector('.content')
            stime = driver.find_elements_by_css_selector('.c_tx.c_tx3.goDetail')
            for con, sti in zip(content, stime):
                data = {
                    'time': sti.text,
                    'shuos': con.text
                }
                print(data)
            pages = driver.page_source
            soup = BeautifulSoup(pages, 'lxml')  # parsed page, kept for further extraction

        # Try to collect the cookies here
        cookie = driver.get_cookies()
        cookie_parts = []
        for c in cookie:
            cookie_parts.append("{0}={1};".format(c['name'], c['value']))
        cookie_str = ''.join(cookie_parts)
        print('Cookies:', cookie_str)
        print("==========Done================")

        # quit() closes every window and ends the browser session
        driver.quit()


    getdata_byQQ(643435675)

The blog post I learned from: [http://zmister.com/archives/98.html][http_zmister.com_archives_98.html]

[http_zmister.com_archives_98.html]: http://zmister.com/archives/98.html
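
The cookie step in the script above builds a header-style string by hand. Below is a minimal sketch of that step as a standalone function. It assumes the list-of-dicts shape that `driver.get_cookies()` returns, where each dict has at least a `name` and a `value` key. The function name `cookies_to_header` and the sample values are hypothetical and only there for illustration.

```python
def cookies_to_header(cookies):
    """Join Selenium-style cookie dicts into one Cookie header value."""
    # Each cookie is expected to be a dict with 'name' and 'value' keys.
    return "; ".join("{0}={1}".format(c["name"], c["value"]) for c in cookies)


# Made-up sample data in the shape returned by driver.get_cookies()
sample = [
    {"name": "uin", "value": "o0123456"},
    {"name": "skey", "value": "@abcdef"},
]
header = cookies_to_header(sample)
print(header)
```

Unlike the loop in the script, this version separates pairs with `"; "` and leaves no trailing semicolon. The resulting string could then be sent as the `Cookie` header by an HTTP client, which might allow requesting pages without driving the browser again, provided the session is still valid.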