Scraping Pretty Pictures from Bilibili's Drawing and Photo Album Sections


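Lately I've noticed how pretty the girls on Bilibili are, so I casually poked around Bilibili's API. Open any work, press F12 to bring up the developer console, switch to the Network tab, and look at the XHR requests.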

One of the XHR requests stands out:

https://api.vc.bilibili.com/link_draw/v1/doc/detail?doc_id=1226480
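The request can be reproduced outside the browser. Below is a minimal Python sketch, not the author's code (the author worked in 易语言). It assumes the endpoint still answers the way it did when this was written and that a browser-like User-Agent is enough to get a response. `build_detail_url` and `fetch_detail` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Endpoint observed in the XHR panel; it may have changed or been retired since.
DETAIL_API = "https://api.vc.bilibili.com/link_draw/v1/doc/detail"


def build_detail_url(doc_id: int) -> str:
    """Build the detail-request URL for one work, e.g. doc_id=1226480."""
    return DETAIL_API + "?" + urllib.parse.urlencode({"doc_id": doc_id})


def fetch_detail(doc_id: int, timeout: float = 10.0) -> dict:
    """GET the detail JSON for one work and decode it into a dict."""
    req = urllib.request.Request(
        build_detail_url(doc_id),
        # Assumption: a browser-like User-Agent; the API may reject or
        # rate-limit requests that lack one.
        headers={"User-Agent": "Mozilla/5.0"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Because `doc_id` is a plain integer, a bulk crawl amounts to calling `fetch_detail` over a range of IDs, ideally with a delay between requests.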

Requesting it returns the following:

{
  "code": 0,
  "msg": "success",
  "message": "success",
  "data": {
    "user": {
      "uid": 8615585,
      "head_url": "https://i1.hdslb.com/bfs/face/f05440d2299637ddef3ba87b399accc3440ed387.jpg",
      "name": "chiyo小千代",
      "upload_count": 13
    },
    "item": {
      "biz": 2,
      "doc_id": 1226480,
      "poster_uid": 8615585,
      "category": "sifu",
      "type": 0,
      "title": "ANP中村十字",
      "tags": [
        {"tag": "摄影扶持计划", "type": 2, "category": "sifu", "text": "摄影扶持计划", "name": "摄影扶持计划", "link": "https://www.bilibili.com/blackboard/PhotoSupport.html"},
        {"tag": "lolita", "type": 3, "category": "sifu", "text": "lolita", "name": "lolita"},
        {"tag": "ANP", "type": 3, "category": "sifu", "text": "ANP", "name": "ANP"},
        {"tag": "BABY,TSSB", "type": 3, "category": "sifu", "text": "BABY,TSSB", "name": "BABY,TSSB"}
      ],
      "pictures": [
        {"img_src": "https://i0.hdslb.com/bfs/album/12bddc347234e8c4a045e79acaa5b8916e3ea9c6.jpg", "img_width": 1200, "img_height": 1265, "img_size": 1184},
        {"img_src": "https://i0.hdslb.com/bfs/album/a1ad670db0968adaf57accc1a3b6946994ddec1b.jpg", "img_width": 871, "img_height": 918, "img_size": 685},
        {"img_src": "https://i0.hdslb.com/bfs/album/791b058d10aaf9198650da700fbb7461194a458c.jpg", "img_width": 1600, "img_height": 1687, "img_size": 1837},
        {"img_src": "https://i0.hdslb.com/bfs/album/5c48a98d3e91947b6e199f8e612bd00ee83d77ed.jpg", "img_width": 1200, "img_height": 1265, "img_size": 1168},
        {"img_src": "https://i0.hdslb.com/bfs/album/499ccd7e043d22a6cc733d6dfb0afc214f7e79f2.jpg", "img_width": 1600, "img_height": 1687, "img_size": 1784},
        {"img_src": "https://i0.hdslb.com/bfs/album/d7f8bbb40a8c740a38922f3b7b1d9a093c9671c2.jpg", "img_width": 1600, "img_height": 1687, "img_size": 1597},
        {"img_src": "https://i0.hdslb.com/bfs/album/16cf7587961b9f04b162f83c6dc51469cca52af3.jpg", "img_width": 1200, "img_height": 1265, "img_size": 1209}
      ],
      "source": null,
      "upload_time": "2017-12-05 19:46:43",
      "upload_timestamp": 1512474403,
      "upload_time_text": "5月前",
      "description": "ginger doll\n\n\nCN:Chiyo\nPHX:独角兽",
      "role": null,
      "settings": {"copy_forbidden": 3},
      "already_collected": 0,
      "already_liked": 0,
      "user_status": 0,
      "at_control": "",
      "extension": "",
      "view_count": 1209,
      "like_count": 0,
      "collect_count": 6,
      "verify_status": 1,
      "already_voted": 0,
      "vote_count": 12,
      "comment_count": 3
    }
  }
}

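Once the JSON is formatted, the fields are easy to read: the work's tags, upload time, work ID (`doc_id`), the image URLs, and more. We only keep the parts we need.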
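Trimming the response down to the useful parts could look like the sketch below. It assumes the response shape shown above (`code`, `data.user`, `data.item.tags`, `data.item.pictures`) and treats any non-zero `code` as a failed lookup. The choice of which fields count as "useful" is an assumption, and `extract_work` is a hypothetical name.

```python
from typing import Optional


def extract_work(resp: dict) -> Optional[dict]:
    """Reduce one detail response to the fields needed later.

    Returns None when the API reports an error (assumed: code != 0).
    """
    if resp.get("code") != 0:
        return None
    data = resp["data"]
    item = data["item"]
    return {
        "doc_id": item["doc_id"],
        "uid": data["user"]["uid"],
        "author": data["user"]["name"],
        "title": item["title"],
        "category": item["category"],
        "upload_time": item["upload_time"],
        # Keep only the tag string; "text" and "name" repeat it.
        "tags": [t["tag"] for t in item.get("tags", [])],
        # Full-size image URLs; width/height/size are dropped.
        "images": [p["img_src"] for p in item.get("pictures", [])],
    }
```

For the sample above this yields `doc_id` 1226480, the four tags, and the seven `img_src` URLs.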

By the way, the girls on Bilibili really are gorgeous, haha.

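I wrote this part in Easy Programming Language (易语言), crawling the data in bulk and saving it to a database; so far I have collected roughly several million records. From there you could write a script to download the images in bulk, but because the data volume is so large I won't demonstrate that here. PHP and Python offer simpler ways to do it, and you can filter by tag to download only the categories you want.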
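As a rough Python take on the store-then-filter workflow (not the author's 易语言 code), the sketch below saves works to SQLite and looks up image URLs by tag. It then downloads those images. It assumes each work has already been reduced to a dict with `doc_id`, `title`, `upload_time`, `tags` and `images` keys. The schema, function names and file naming are hypothetical, and the image CDN might require headers (such as a Referer) that `urlretrieve` does not send.

```python
import json
import os
import sqlite3
import urllib.request
from typing import List


def open_db(path: str = "bilibili_draw.db") -> sqlite3.Connection:
    """Open (or create) a database with a hypothetical two-table schema."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS works (
            doc_id INTEGER PRIMARY KEY,
            title TEXT,
            upload_time TEXT,
            images TEXT  -- JSON array of img_src URLs
        );
        CREATE TABLE IF NOT EXISTS work_tags (
            doc_id INTEGER,
            tag TEXT,
            PRIMARY KEY (doc_id, tag)
        );
        """
    )
    return conn


def save_work(conn: sqlite3.Connection, work: dict) -> None:
    """Insert or update one work and its tags in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO works VALUES (?, ?, ?, ?)",
            (work["doc_id"], work["title"], work["upload_time"], json.dumps(work["images"])),
        )
        conn.executemany(
            "INSERT OR IGNORE INTO work_tags VALUES (?, ?)",
            [(work["doc_id"], tag) for tag in work["tags"]],
        )


def image_urls_for_tag(conn: sqlite3.Connection, tag: str) -> List[str]:
    """Return image URLs of every work carrying the tag, ordered by doc_id."""
    rows = conn.execute(
        "SELECT w.images FROM works w JOIN work_tags t ON w.doc_id = t.doc_id "
        "WHERE t.tag = ? ORDER BY w.doc_id",
        (tag,),
    )
    urls: List[str] = []
    for (images_json,) in rows:
        urls.extend(json.loads(images_json))
    return urls


def download_images(urls: List[str], out_dir: str) -> None:
    """Save each image into out_dir, named after the last URL path segment."""
    os.makedirs(out_dir, exist_ok=True)
    for url in urls:
        dest = os.path.join(out_dir, url.rsplit("/", 1)[-1])
        if not os.path.exists(dest):  # skip files fetched on an earlier run
            urllib.request.urlretrieve(url, dest)
```

For example, `download_images(image_urls_for_tag(open_db(), "lolita"), "lolita")` would pull every stored image tagged lolita into a `lolita/` folder. Keeping tags in their own table turns the tag filter into a plain join instead of substring matching over a text column.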
