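1. Opening banter

Last time I wrote an example that scraped novels from Biquge (笔趣阁). This time, let's head over to NetEase Cloud Music and borrow a few songs to listen to.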

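2. Page analysis

First, open the target site: music.163.com/#/artist?id…

Find the target list, song-list-pre-cache, and most importantly the song ids. In the code below, the song data is read as JSON from the #song-list-pre-data element.

Because NetEase Cloud Music requires login, we also need our own cookie.

The original audio link is encrypted, and I don't know how to crack it. After some searching on Baidu I found another link, music.163.com/song/media/…, which gives us the song's MP3 address directly. Next comes the code.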
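The two addresses from this section can be sketched as small helpers. The browser shows a hash-route address (`/#/artist?...`), while the `main` method in this post requests the page without the `/#`; the download address is the `outer/url?id=` prefix from the `downloadUrl` field plus a song id. The names `MusicUrls`, `toPageUrl`, and `toDownloadUrl` are made up for illustration, and `12345` is a placeholder id, not a real song.

```java
public class MusicUrls {

    // Prefix copied from the downloadUrl field used in this post
    private static final String OUTER_URL = "http://music.163.com/song/media/outer/url?id=";

    /** Turn the browser's hash-route address into the plain page address. */
    public static String toPageUrl(String browserUrl) {
        return browserUrl.replace("/#/", "/");
    }

    /** Build the direct download address for a song id. */
    public static String toDownloadUrl(long songId) {
        return OUTER_URL + songId;
    }

    public static void main(String[] args) {
        System.out.println(toPageUrl("https://music.163.com/#/artist?id=3684"));
        System.out.println(toDownloadUrl(12345L)); // placeholder id
    }
}
```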

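3. Code implementation

First, add the dependencies. Note that the main code also uses JSONArray/JSONObject (the parseArray call matches fastjson) and Lombok's @Data; neither is listed here, so they need their own dependencies.

<dependency>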
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.6</version>
</dependency>
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.10.2</version>
</dependency>

Main code

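import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import org.apache.http.Header;
import org.apache.http.message.BasicHeader;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Note: JSONArray/JSONObject are assumed to come from fastjson (the post does not
// list that dependency). HttpClientUtil is a helper class that is not shown in this post.
public class MusicSpider {

    private String downloadUrl = "http://music.163.com/song/media/outer/url?id=";

    // Target directory; it must already exist
    private String path = "D:/music/";

    /**
     * Fetch the song list from an artist page and download every song.
     * @param url artist page URL
     */
    public void getSongList(String url) {
        Document document = getHtml(url);
        if (document != null) {
            Elements elements = document.select("#song-list-pre-data");
            String resJson = elements.text();
            System.out.println("JSON data: " + resJson);
            // Parse the song list
            JSONArray jsonArray = JSONObject.parseArray(resJson);
            if (jsonArray == null) {
                // Nothing to parse, e.g. the element was missing from the page
                return;
            }
            // Walk the list and collect each song's info
            for (int i = 0; i < jsonArray.size(); i++) {
                JSONObject item = jsonArray.getJSONObject(i);
                Music music = new Music();
                String singer = item.getJSONArray("artists").getJSONObject(0).get("name").toString();
                music.setSinger(singer);
                music.setSongUrl(downloadUrl + item.get("id").toString());
                music.setSong(item.get("name").toString());
                try {
                    // Pause before each download to go easy on the server
                    Thread.sleep(10000);
                    System.out.println("Rested 10 seconds, continuing");
                    downloadFile(music.getSongUrl(), path + music.getSong() + "-" + music.getSinger() + ".mp3");
                } catch (Exception e) {
                    // One failed song should not stop the rest
                    e.printStackTrace();
                }
            }
        }
    }

    /**
     * Download a file.
     * @param fileUrl       file URL
     * @param fileLocal     local path to save the file to
     * @throws Exception    if the server does not answer with HTTP 200 or the copy fails
     */
    public void downloadFile(String fileUrl, String fileLocal) throws Exception {
        URL url = new URL(fileUrl);
        HttpURLConnection urlCon = (HttpURLConnection) url.openConnection();
        urlCon.setConnectTimeout(6000);
        urlCon.setReadTimeout(6000);
        int code = urlCon.getResponseCode();
        if (code != HttpURLConnection.HTTP_OK) {
            throw new Exception("Failed to read file, HTTP status " + code);
        }

        // Copy the response stream to the local file; try-with-resources closes both streams
        try (InputStream in = urlCon.getInputStream();
             OutputStream out = new FileOutputStream(fileLocal)) {
            byte[] buffer = new byte[2048];
            int count;
            // read() returns -1 at end of stream
            while ((count = in.read(buffer)) != -1) {
                out.write(buffer, 0, count);
            }
        }
    }

    /**
     * Fetch a page and parse it.
     * @param url page URL
     * @return the parsed document, or null if the request failed or hit the 404 page
     */
    private Document getHtml(String url) {
        List<Header> headerList = new ArrayList<>();
        headerList.add(new BasicHeader("origin", "https://music.163.com"));
        headerList.add(new BasicHeader("referer", "https://music.163.com/"));
        headerList.add(new BasicHeader("user-agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"));
        headerList.add(new BasicHeader("cookie", "put the cookie from your logged-in session here"));
        String result = HttpClientUtil.doGet(url, headerList, "utf-8");
        // "n-for404" marks the site's 404 page
        if (result != null && !result.contains("n-for404")) {
            return Jsoup.parse(result);
        }
        return null;
    }

}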
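One pitfall with building the file name from the song and singer names: a title containing characters such as `/`, `:` or `?` cannot be used as a file name on Windows, so the download would fail for that song. A minimal sketch of a clean-up step, assuming a simple replace-with-underscore policy is acceptable; `FileNames`, `sanitize`, and `mp3Name` are hypothetical names, not part of the original code.

```java
public class FileNames {

    /** Replace characters that Windows does not allow in file names with '_'. */
    public static String sanitize(String name) {
        // Character class: \ / : * ? " < > |
        return name.replaceAll("[\\\\/:*?\"<>|]", "_").trim();
    }

    /** Build "<song>-<singer>.mp3" with unsafe characters replaced. */
    public static String mp3Name(String song, String singer) {
        return sanitize(song + "-" + singer) + ".mp3";
    }

    public static void main(String[] args) {
        System.out.println(mp3Name("What/Why?", "Some:Singer"));
    }
}
```

In `getSongList`, the second argument of `downloadFile` could then be written as `path + FileNames.mp3Name(music.getSong(), music.getSinger())`.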

Entity class that stores the song information

import lombok.Data;

// @Data (Lombok) generates the getters and setters that MusicSpider calls
@Data
public class Music {

    private String singer;

    private String song;

    private String songUrl;

    public Music() {

    }

    public Music(String singer, String song, String songUrl) {
        this.singer = singer;
        this.song = song;
        this.songUrl = songUrl;
    }
}

Finally, run it

public static void main(String[] args) {
    MusicSpider spider = new MusicSpider();
    // The artist id has to be changed here by hand. For some reason I can't get the
    // chart data, so it stays this way for now; I'll update it if I find a fix.
    spider.getSongList("https://music.163.com/artist?id=3684");
}

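4. Result

5. Summary

The basic functionality works. However, because of some odd problems I couldn't get the chart content, so for now songs can only be fetched semi-automatically, one artist id at a time. If anyone knows a fix, feel free to contact me. Also, some songs can't be downloaded at all because of copyright and similar restrictions. Sigh, there are quite a few pitfalls.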
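For the songs that cannot be fetched, one possible guard is to check the response's Content-Type before saving anything. This is only a sketch under an assumption that this post does not confirm: that an unavailable song comes back as something other than audio (for example an HTML page), while a real MP3 is served with an `audio/...` type. `AudioCheck` and `isAudio` are hypothetical names.

```java
public class AudioCheck {

    /** Return true if the Content-Type header value describes audio data. */
    public static boolean isAudio(String contentType) {
        // Assumption: real tracks are served as audio/*, anything else is not a song
        return contentType != null
                && contentType.trim().toLowerCase().startsWith("audio/");
    }

    public static void main(String[] args) {
        System.out.println(isAudio("audio/mpeg;charset=UTF-8"));
        System.out.println(isAudio("text/html"));
    }
}
```

Inside `downloadFile`, this could be called with `urlCon.getContentType()` right after the status check, skipping the song when it returns false instead of writing a broken `.mp3` file.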

This article is reposted from: Juejin