【Web Scraping】Scraping Weibo's Top 50 Hot Searches with Python

待我称王封你为后i · 2022-08-30 05:30

1. Overview

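Weibo's hot search list is quite useful. Its web version at https://s.weibo.com/top/summary can be viewed without logging in, so we can download it with requests and organize the results. Please be responsible: do not scrape it indiscriminately or add unnecessary load to Weibo's servers. The code below is for learning purposes only.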
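If you plan to check the list repeatedly, one way to follow that advice is to cache the last result and not re-fetch within a fixed interval. Below is a minimal sketch of that idea. `CachedFetcher`, `min_interval` and `clock` are hypothetical names introduced here; they are not part of requests or any Weibo API. The helper wraps any zero-argument fetch function, such as the `get_weibo_top()` defined in the code section.

```python
import time
from typing import Callable, List, Optional


class CachedFetcher:
    """Wrap a fetch function so it runs at most once per `min_interval` seconds.

    Calls made sooner than that return the cached result instead of
    sending another request to the server.
    """

    def __init__(self, fetch: Callable[[], List[dict]], min_interval: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._min_interval = min_interval
        self._clock = clock  # injectable so the behavior can be tested without waiting
        self._last_time: Optional[float] = None
        self._cached: Optional[List[dict]] = None

    def get(self) -> Optional[List[dict]]:
        now = self._clock()
        # Fetch only on the first call or once the interval has fully elapsed
        if self._last_time is None or now - self._last_time >= self._min_interval:
            self._cached = self._fetch()
            self._last_time = now
        return self._cached
```

Usage, with an arbitrarily chosen five-minute interval: `fetcher = CachedFetcher(get_weibo_top, min_interval=300)`, then call `fetcher.get()` wherever the list is needed. The clock is a parameter, rather than a direct call to `time.monotonic()`, so the caching logic can be checked with a fake clock.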

2. Code

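from lxml import etree
import requests


def get_weibo_top():
    """Fetch the realtime hot search list and return it as a list of dicts."""
    url = "https://s.weibo.com/top/summary?cate=realtimehot"
    # A browser-like User-Agent plus a timeout. If the returned list is empty,
    # Weibo may have served a verification page instead of the list; sending a
    # Cookie header copied from a browser session is a common workaround.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    html = etree.HTML(response.text)
    nodes = html.xpath("//div[@class='data']/table/tbody/tr")
    all_hot_list = []
    for tr_node in nodes:
        # Rank cell: the pinned row and promoted rows have no numeric rank, and
        # text() returns an empty list for them, so check before indexing
        rank_texts = tr_node.xpath('./td[1]/text()')
        rank_top = rank_texts[0].strip() if rank_texts else ""
        if not rank_top.isdigit():
            continue
        keywords = tr_node.xpath('./td[2]/a/text()')
        hrefs = tr_node.xpath('./td[2]/a/@href')
        nums = tr_node.xpath('./td[2]/span/text()')
        if not keywords or not hrefs:
            continue
        hot_object = {
            "rank_top": rank_top,
            "keyword": keywords[0],
            # Some rows may lack a search-count <span>
            "search_nums": nums[0].strip() if nums else "",
            "search_url": "https://s.weibo.com" + hrefs[0],
        }
        all_hot_list.append(hot_object)
    return all_hot_list


if __name__ == '__main__':
    print(get_weibo_top())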
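You can check the XPath logic above without sending any request to Weibo. To do so, separate parsing from downloading and run it against a local HTML string. `SAMPLE_HTML` below is a hypothetical, heavily simplified stand-in that only mirrors the structure the XPath expressions expect; the real page has more attributes and rows. `parse_hot_list` is an illustrative name. The JSON output shows one way to "organize" the results.

```python
import json

from lxml import etree

# Hypothetical, simplified markup mimicking the structure the XPath expects:
# one pinned row without a numeric rank, followed by two ranked rows.
SAMPLE_HTML = """
<html><body>
<div class="data"><table><tbody>
  <tr><td class="td-01"><i class="icon-top"></i></td>
      <td class="td-02"><a href="/weibo?q=pinned">pinned topic</a></td></tr>
  <tr><td class="td-01">1</td>
      <td class="td-02"><a href="/weibo?q=alpha">alpha</a><span>1200</span></td></tr>
  <tr><td class="td-01">2</td>
      <td class="td-02"><a href="/weibo?q=beta">beta</a><span>3400</span></td></tr>
</tbody></table></div>
</body></html>
"""


def parse_hot_list(html_text):
    """Extract ranked rows using the same XPath expressions as get_weibo_top()."""
    html = etree.HTML(html_text)
    rows = []
    for tr in html.xpath("//div[@class='data']/table/tbody/tr"):
        rank = tr.xpath("./td[1]/text()")
        rank = rank[0].strip() if rank else ""
        if not rank.isdigit():
            continue  # skip the pinned / unranked rows
        keyword = tr.xpath("./td[2]/a/text()")
        href = tr.xpath("./td[2]/a/@href")
        nums = tr.xpath("./td[2]/span/text()")
        if not keyword or not href:
            continue
        rows.append({
            "rank_top": rank,
            "keyword": keyword[0],
            "search_nums": nums[0].strip() if nums else "",
            "search_url": "https://s.weibo.com" + href[0],
        })
    return rows


if __name__ == "__main__":
    rows = parse_hot_list(SAMPLE_HTML)
    # ensure_ascii=False keeps Chinese keywords readable instead of \u escapes
    print(json.dumps(rows, ensure_ascii=False, indent=2))
```

To save a real page for repeated offline testing, write `response.text` to a file once and pass its contents to `parse_hot_list` instead of `SAMPLE_HTML`.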
