Scraping Baidu Hot Songs with PHP

快来打我* 2022-02-23 02:44

Use PHP's cURL extension to scrape the singles on Baidu's hot songs list.

Requires the PHP cURL extension.
The scraper relies mainly on regular expressions.
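
As a minimal sketch of the regex approach, the snippet below runs `preg_match_all` with a capture group over a hand-written HTML fragment. The fragment and the variable names are assumptions made up for illustration; they are not the actual Baidu markup.

```php
<?php
// Hypothetical HTML fragment, invented for illustration only.
$html = '<a href="/song/121353608">半壶纱</a><a href="/song/42">Demo</a>';

// The parentheses capture just the digits, so the match needs no trimming afterwards.
preg_match_all('/href="\/song\/(\d+)"/', $html, $matches);

// $matches[0] holds the full matches; $matches[1] holds the captured song IDs.
$ids = $matches[1];
print_r($ids);
```

Capturing the ID directly avoids a second pass with string functions such as `strtok()` or `strtr()` to strip the surrounding text.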

    <?php
    /*
     * Fetch the site page (http://music.baidu.com/tag/tagname) and match the relevant HTML.
     * The page data has the following format:
     * <a href="http://music.baidu.com/song/121353608" target="_blank" class="" data-provider="" title="刘珂矣 半壶纱">半壶纱</a>
     * Then generate a PHP file in this format:
     * <?php
     * return array();
     * ?>
     */
    class Fetch {
        // Fetch the page and return the parsed song links (an empty array on failure).
        function getData($url) {
            $data = array();
            $str = $this->http($url);
            if ($str) {
                $data = $this->parseHtml($str);
            }
            return $data;
        }

        // No.1: fetch the page content for the given URL.
        function http($url) {
            $ch = curl_init();
            curl_setopt($ch, CURLOPT_URL, $url);
            // Return the response from curl_exec() as a string instead of printing it.
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
            // Follow HTTP redirects.
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
            $output = curl_exec($ch); // false if the request fails
            return $output;
        }

        // No.2: parse the page to get each song's ID, title, and artist name.
        function parseHtml($str) {
            $ids = array();    // Baidu song IDs
            $titles = array(); // song titles
            $names = array();  // artist names

            // Song IDs: capture the digits directly instead of trimming with strtok().
            if (preg_match_all('/href="\/song\/(\d+)/', $str, $matches)) {
                $ids = $matches[1];
            }
            // Song titles: [^"]+ stops at the closing quote and allows digits in a title.
            if (preg_match_all('/title="收藏([^"]+)" href="#">/', $str, $matches)) {
                $titles = $matches[1];
            }
            // Artist names
            if (preg_match_all('/author_list" title="([^"]+)">/', $str, $matches)) {
                $names = $matches[1];
            }

            // Merge the three lists into one link per song.
            $url = array();
            foreach ($ids as $key => $id) {
                // Fall back to an empty string if the lists differ in length.
                $title = isset($titles[$key]) ? $titles[$key] : '';
                $name = isset($names[$key]) ? $names[$key] : '';
                $url[] = '<a href="http://music.baidu.com/song/'.$id.'" target="_blank" class="" data-provider="" title="'.$name.' '.$title.'">'.$title.'</a>';
            }
            return $url;
        }
    }

    $url = 'http://music.baidu.com/tag/%E7%83%AD%E6%AD%8C';
    $fetch = new Fetch();
    $data = $fetch->getData($url);
    print_r($data);
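
The header comment in the code mentions generating a PHP file that returns an array, but the script itself only prints the result. One possible way to do that last step is sketched below with `var_export()`. The sample data and the output path are assumptions for illustration, and the closing `?>` tag is left out, which PHP allows.

```php
<?php
// Hypothetical sample data standing in for the array returned by getData().
$data = array(
    '<a href="http://music.baidu.com/song/121353608" target="_blank" class="" data-provider="" title="刘珂矣 半壶纱">半壶纱</a>',
);

// Hypothetical output path; change it to wherever the file should live.
$file = sys_get_temp_dir() . '/hot_songs.php';

// var_export() with true as the second argument returns valid PHP source for the array.
$code = "<?php\nreturn " . var_export($data, true) . ";\n";
file_put_contents($file, $code);

// Including the generated file hands the array back to the caller.
$loaded = include $file;
```

Another script can then load the list with a single `include`, without fetching or parsing the page again.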
