{"id":385,"date":"2025-12-02T15:14:26","date_gmt":"2025-12-02T07:14:26","guid":{"rendered":"https:\/\/www.52runoob.com\/?p=385"},"modified":"2025-12-02T15:14:26","modified_gmt":"2025-12-02T07:14:26","slug":"%e3%80%90java%e3%80%91%e7%88%ac%e8%99%ab_java%e7%88%ac%e8%99%ab","status":"publish","type":"post","link":"https:\/\/www.52runoob.com\/index.php\/2025\/12\/02\/%e3%80%90java%e3%80%91%e7%88%ac%e8%99%ab_java%e7%88%ac%e8%99%ab\/","title":{"rendered":"[Java] Web Crawlers: Java Web Crawling"},"content":{"rendered":"\n<p>This is a <strong>practical, end-to-end guide to Java web crawling, from first steps to real projects<\/strong>. It covers how crawlers work, the commonly used libraries, sample code, ways to get past anti-crawling measures, and Java crawler demos you can run right away.<\/p>\n\n\n\n<p>It is aimed at readers who are new to Java crawlers or about to start a crawler project.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">\ud83d\udd77\ufe0f Java Web Crawling: A Getting-Started and Hands-On Guide (2025 Stable Edition)<\/h1>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">\u2b50 1. What Can a Java Crawler Do?<\/h1>\n\n\n\n<p>Java crawlers are commonly used to:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Collect web page data (news, products, reviews)<\/li>\n\n\n\n<li>Call APIs to fetch JSON 
data<\/li>\n\n\n\n<li>Run large-scale, multithreaded crawls<\/li>\n\n\n\n<li>Build enterprise-grade crawlers (distributed crawling, scheduling, database storage)<\/li>\n<\/ul>\n\n\n\n<p>This is useful if you work on data projects, NLP sentiment analysis, or data governance.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">\u2b50 2. The Basic Java Crawler Workflow<\/h1>\n\n\n\n<p>A crawler has four core steps:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Send a request (HTTP GET\/POST)<\/strong><\/li>\n\n\n\n<li><strong>Fetch the page content (HTML \/ JSON)<\/strong><\/li>\n\n\n\n<li><strong>Parse the data (CSS \/ XPath \/ JSON)<\/strong><\/li>\n\n\n\n<li><strong>Store the data (database \/ file)<\/strong><\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">\u2b50 3. Common Java Crawler Libraries<\/h1>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Purpose<\/th><th>Library<\/th><th>Notes<\/th><\/tr><\/thead><tbody><tr><td>Send HTTP requests<\/td><td><strong>Jsoup<\/strong><\/td><td>Beginner-friendly; simple yet capable<\/td><\/tr><tr><td>Send HTTP requests<\/td><td><strong>OkHttp<\/strong><\/td><td>High performance; widely used in production<\/td><\/tr><tr><td>Send HTTP requests<\/td><td><strong>HttpClient<\/strong><\/td><td>An Apache project<\/td><\/tr><tr><td>HTML parsing<\/td><td><strong>Jsoup<\/strong><\/td><td>Strong CSS selector support<\/td><\/tr><tr><td>JSON parsing<\/td><td><strong>Gson \/ fastjson2 \/ 
Jackson<\/strong><\/td><td>Choose based on your project<\/td><\/tr><tr><td>Distributed crawling<\/td><td><strong>WebMagic<\/strong><\/td><td>A mature Java crawler framework<\/td><\/tr><tr><td>Browser automation<\/td><td><strong>Selenium + ChromeDriver<\/strong><\/td><td>Handles JavaScript-rendered pages<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">\u2b50 4. Quick Start: Crawling a Single Page (Jsoup)<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udccc 1) Maven Dependency<\/h2>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n&lt;dependency&gt;\n    &lt;groupId&gt;org.jsoup&lt;\/groupId&gt;\n    &lt;artifactId&gt;jsoup&lt;\/artifactId&gt;\n    &lt;version&gt;1.17.1&lt;\/version&gt;\n&lt;\/dependency&gt;\n\n<\/pre><\/div>\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udccc 2) Java Code Example<\/h2>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nimport org.jsoup.Jsoup;\nimport org.jsoup.nodes.Document;\nimport org.jsoup.select.Elements;\n\npublic class SimpleCrawler {\n    public static void main(String&#x5B;] args) {\n        try {\n            \/\/ 1. Download the page\n            Document doc = Jsoup.connect(&quot;https:\/\/news.baidu.com&quot;).get();\n\n            \/\/ 2. Select all link elements with the CSS selector &quot;a&quot;\n            Elements titles = doc.select(&quot;a&quot;);\n\n            \/\/ 3. 
Print the text and href of the first 10 links\n            titles.stream().limit(10).forEach(e -&gt; \n                System.out.println(e.text() + &quot; \u2192 &quot; + e.attr(&quot;href&quot;))\n            );\n\n        } catch (Exception e) {\n            e.printStackTrace();\n        }\n    }\n}\n\n<\/pre><\/div>\n\n\n<p>\u2714 Well suited to crawling HTML content<br>\u2714 Supports a custom browser User-Agent, cookies, timeouts, and more<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">\u2b50 5. Fetching a JSON API in Java (OkHttp Example)<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udccc Maven Dependency<\/h2>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n&lt;dependency&gt;\n    &lt;groupId&gt;com.squareup.okhttp3&lt;\/groupId&gt;\n    &lt;artifactId&gt;okhttp&lt;\/artifactId&gt;\n    &lt;version&gt;4.11.0&lt;\/version&gt;\n&lt;\/dependency&gt;\n\n<\/pre><\/div>\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udccc Code Example<\/h2>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nimport okhttp3.OkHttpClient;\nimport okhttp3.Request;\nimport okhttp3.Response;\n\npublic class JsonApiCrawler {\n    public static void main(String&#x5B;] args) throws Exception {\n        OkHttpClient client = new OkHttpClient();\n        Request request = new Request.Builder()\n                .url(&quot;https:\/\/api.bilibili.com\/x\/web-interface\/ranking\/v2&quot;)\n                .header(&quot;User-Agent&quot;, &quot;Mozilla\/5.0&quot;)\n                .build();\n\n        \/\/ try-with-resources closes the response and releases the connection\n        try (Response response = client.newCall(request).execute()) {\n            String json = response.body().string();\n            System.out.println(json);\n        }\n    }\n}\n\n<\/pre><\/div>\n\n\n<p>\u2714 Well suited to web endpoints, app APIs, and JSON data<br>\u2714 Supports custom headers, POST, proxies, and asynchronous requests<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">\u2b50 6. Java 
Distributed Crawlers (WebMagic)<\/h1>\n\n\n\n<p>For an enterprise-grade crawler project, WebMagic is a good choice.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udccc Maven Dependency<\/h2>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n&lt;dependency&gt;\n    &lt;groupId&gt;us.codecraft&lt;\/groupId&gt;\n    &lt;artifactId&gt;webmagic-core&lt;\/artifactId&gt;\n    &lt;version&gt;0.9.1&lt;\/version&gt;\n&lt;\/dependency&gt;\n\n<\/pre><\/div>\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udccc Simple Demo<\/h2>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nimport us.codecraft.webmagic.Page;\nimport us.codecraft.webmagic.Site;\nimport us.codecraft.webmagic.Spider;\nimport us.codecraft.webmagic.processor.PageProcessor;\n\npublic class MySpider implements PageProcessor {\n\n    \/\/ Crawl settings: retry failed downloads up to 3 times, sleep 500 ms between requests\n    private Site site = Site.me()\n            .setRetryTimes(3)\n            .setSleepTime(500)\n            .setUserAgent(&quot;Mozilla\/5.0&quot;);\n\n    @Override\n    public void process(Page page) {\n        \/\/ Extract the page title\n        page.putField(&quot;title&quot;, page.getHtml().xpath(&quot;\/\/title\/text()&quot;));\n    }\n\n    @Override\n    public Site getSite() {\n        return site;\n    }\n\n    public static void main(String&#x5B;] args) {\n        Spider.create(new MySpider())\n                .addUrl(&quot;https:\/\/www.jd.com&quot;)\n                .thread(5)\n                .run();\n    }\n}\n\n<\/pre><\/div>\n\n\n<p>\u2714 Supports scheduling, queues, and pipelines<br>\u2714 Multithreaded, high-concurrency crawling<br>\u2714 Distributed crawling (Redis + WebMagic)<\/p>\n\n\n\n<p>It is a good fit for projects that combine data governance, crawling, and NLP.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">\u2b50 
7. How Do Java Crawlers Get Past Anti-Crawling Measures?<\/h1>\n\n\n\n<p>The table below lists <strong>the most common anti-crawling measures<\/strong> and the usual way to handle each.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Anti-crawling measure<\/th><th>Java countermeasure<\/th><\/tr><\/thead><tbody><tr><td>User-Agent restrictions<\/td><td>Set a custom User-Agent<\/td><\/tr><tr><td>Cookie checks<\/td><td>Send the required cookies with each request<\/td><\/tr><tr><td>Blocked for requesting too fast<\/td><td>Add random delays; use proxy IPs<\/td><\/tr><tr><td>Login required<\/td><td>Simulate login and maintain the session<\/td><\/tr><tr><td>JavaScript-rendered pages<\/td><td>Selenium + ChromeDriver<\/td><\/tr><tr><td>Encrypted parameters<\/td><td>Capture traffic and reverse-engineer the JavaScript code<\/td><\/tr><tr><td>CAPTCHAs<\/td><td>Third-party recognition services (e.g., Alibaba Cloud OCR) or manual entry<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">User-Agent Example<\/h3>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n\/\/ url is the address of the page to fetch\nJsoup.connect(url)\n     .userAgent(&quot;Mozilla\/5.0 Chrome\/120&quot;)\n     .get();\n\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\">Proxy IP Example<\/h3>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n\/\/ Requires java.net.Proxy, java.net.InetSocketAddress, and okhttp3.OkHttpClient.\n\/\/ 127.0.0.1:8888 is a placeholder; substitute your proxy's host and port.\nProxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(&quot;127.0.0.1&quot;, 8888));\nOkHttpClient client = new OkHttpClient.Builder().proxy(proxy).build();\n\n<\/pre><\/div>\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">\u2b50 
8. Storing Crawled Data (Ready to Use in Your Project)<\/h1>\n\n\n\n<p>You can store crawled data in:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>MySQL<\/strong> (works directly with Spring Boot)<\/li>\n\n\n\n<li><strong>MongoDB<\/strong><\/li>\n\n\n\n<li><strong>CSV\/Excel<\/strong><\/li>\n\n\n\n<li><strong>Elasticsearch<\/strong><\/li>\n\n\n\n<li><strong>Redis cache<\/strong><\/li>\n<\/ul>\n\n\n\n<p>Example: writing to MySQL (JDBC)<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n\/\/ url, user, pass, title, and link are assumed to be defined by the caller\ntry (Connection conn = DriverManager.getConnection(url, user, pass);\n     PreparedStatement ps = conn.prepareStatement(&quot;INSERT INTO news(title,url) VALUES(?,?)&quot;)) {\n    ps.setString(1, title);\n    ps.setString(2, link);\n    ps.executeUpdate();\n}\n\n<\/pre><\/div>\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">\u2b50 9. If You Need More, I Can Also Provide:<\/h1>\n\n\n\n<p>\u2714 <strong>A complete Java crawler project template (Maven + WebMagic + MySQL)<\/strong><br>\u2714 <strong>An integrated Spring Boot + crawler + NLP sentiment analysis example<\/strong> (in line with your data governance project)<br>\u2714 <strong>Utility code for handling anti-crawling measures<\/strong><br>\u2714 <strong>A browser automation crawling example (Selenium)<\/strong><br>\u2714 <strong>A tutorial on traffic capture and reverse-engineering encrypted parameters<\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is a practical, end-to-end guide to Java 
web crawling, from first steps to real projects. It covers how crawlers work, the commonly used libraries&#8230; <a class=\"more-link\" href=\"https:\/\/www.52runoob.com\/index.php\/2025\/12\/02\/%e3%80%90java%e3%80%91%e7%88%ac%e8%99%ab_java%e7%88%ac%e8%99%ab\/\">Continue Reading &rarr;<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[36],"tags":[],"class_list":["post-385","post","type-post","status-publish","format-standard","hentry","category-java"],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.52runoob.com\/index.php\/wp-json\/wp\/v2\/posts\/385","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.52runoob.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.52runoob.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.52runoob.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.52runoob.com\/index.php\/wp-json\/wp\/v2\/comments?post=385"}],"version-history":[{"count":1,"href":"https:\/\/www.52runoob.com\/index.php\/wp-json\/wp\/v2\/posts\/385\/revisions"}],"predecessor-version":[{"id":386,"href":"https:\/\/www.52runoob.com\/index.php\/wp-json\/wp\/v2\/posts\/385\/revisions\/386"}],"wp:attachment":[{"href":"https:\/\/www.52runoob.com\/index.php\/wp-json\/wp\/v2\/media?parent=385"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.52runoob.com\/index.php\/wp-json\/wp\/v2\/categories?post=385"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.52runoob.com\/index.php\/wp-json\/wp\/v2\/tags?post=385"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}