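I have been posting articles on CSDN for a while, but I never knew that CSDN runs a "Blog Star" event every year. I recently came across it by chance, read up on it, and found it quite interesting. The competition has rules: entrants are judged against reference criteria such as follower count, number of original articles, and activity level. I quietly checked my own numbers, and they fall miles short. I am not eligible to enter, but we can still use web scraping to see who the Blog Stars of past years were, and let the data show how strong these experts really are.

First, find the target address: http://csdn.bytedemo.com/getStatistics. Once the request endpoint is known, a simple crawler can fetch the data easily. The data can then be processed and analyzed, and the entries ranked by vote count.

The request code is below. It is written in Java with Apache Commons HttpClient 3.x, not Python, and it sends the request through an authenticated HTTP proxy. As written, it requests https://httpbin.org/ip, which returns the caller's outbound IP address, so it only checks that the proxy works. To fetch the statistics, swap in the endpoint above:

```java
import org.apache.commons.httpclient.Credentials;
import org.apache.commons.httpclient.HostConfiguration;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.UsernamePasswordCredentials;
import org.apache.commons.httpclient.auth.AuthScope;
import org.apache.commons.httpclient.methods.GetMethod;

import java.io.IOException;

public class Main {
    // Proxy server (product website: www.16yun.cn)
    private static final String PROXY_HOST = "t.16yun.cn";
    private static final int PROXY_PORT = 31111;

    public static void main(String[] args) {
        HttpClient client = new HttpClient();
        // Test URL that echoes the caller's IP; replace it with the endpoint to fetch
        HttpMethod method = new GetMethod("https://httpbin.org/ip");

        // Route requests through the proxy
        HostConfiguration config = client.getHostConfiguration();
        config.setProxy(PROXY_HOST, PROXY_PORT);

        // Send proxy credentials with the first request instead of waiting for a challenge
        client.getParams().setAuthenticationPreemptive(true);

        // Proxy account credentials
        String username = "16ABCCKJ";
        String password = "712323";
        Credentials credentials = new UsernamePasswordCredentials(username, password);
        AuthScope authScope = new AuthScope(PROXY_HOST, PROXY_PORT);
        client.getState().setProxyCredentials(authScope, credentials);

        try {
            client.executeMethod(method);
            if (method.getStatusCode() == HttpStatus.SC_OK) {
                String response = method.getResponseBodyAsString();
                System.out.println("Response = " + response);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Return the connection to the connection manager
            method.releaseConnection();
        }
    }
}
```

The data on past Blog Stars confirms they really are experts: their follower counts and output are far beyond anything I can reach. I hope to keep publishing and sharing well-written articles to gain more followers, so that one day I too get the chance to take part in a competition this big!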
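The code above does not include the ranking step the post describes. Here is a minimal sketch of that step in Java. It assumes the statistics have already been fetched and parsed into a list of (name, votes) entries. The `RankByVotes` class, the `Candidate` record, its field names, and the sample values are all hypothetical. They do not reflect the endpoint's actual response format or any real results:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RankByVotes {
    // Hypothetical shape of one entry in the statistics data;
    // the real field names returned by the endpoint may differ.
    record Candidate(String name, int votes) {}

    // Returns a new list sorted by vote count, highest first.
    // The input list is left unchanged.
    static List<Candidate> rank(List<Candidate> candidates) {
        List<Candidate> ranked = new ArrayList<>(candidates);
        ranked.sort(Comparator.comparingInt(Candidate::votes).reversed());
        return ranked;
    }

    public static void main(String[] args) {
        // Made-up sample values for illustration only, not real results.
        List<Candidate> sample = List.of(
                new Candidate("blogger_a", 120),
                new Candidate("blogger_b", 350),
                new Candidate("blogger_c", 275));

        List<Candidate> ranked = rank(sample);
        for (int i = 0; i < ranked.size(); i++) {
            Candidate c = ranked.get(i);
            System.out.println((i + 1) + ". " + c.name() + " - " + c.votes() + " votes");
        }
    }
}
```

The sketch sorts a copy so the parsed data stays intact for further analysis. `List.sort` is stable, so candidates with equal votes keep their original relative order. Records require Java 16 or later.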