[Python Scraper] Jandan (煎蛋网) OOXX Girl-Pics Scraper (1): Decrypting the Image Addresses

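The goal here is an image scraper for the girl-pics (OOXX) board on Jandan (煎蛋网), and the crux is decrypting the hidden image addresses. Jandan often does not write image URLs into the HTML in plain text; instead they are encoded or assembled by JavaScript.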

We will use https://jandan.net/ooxx as the example and build the image scraper step by step (Part 1: decrypting the image addresses).

📍 1. Analyzing the Jandan OOXX page structure

Open https://jandan.net/ooxx and press F12 to open the developer tools. Inside each picture's <li> tag, the image address is hidden in an element like this:

<span class="img-hash">aGVsbG8ud29ybGQ=</span>

The text inside is the real image address encoded in Base64, so it has to be decoded first.
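
The code later in this post selects these elements with BeautifulSoup (soup.select('span.img-hash')). As a minimal, dependency-free sketch of what that selection does, the standard library's html.parser.HTMLParser can collect the same text. The class name ImgHashParser and the sample HTML are illustrative assumptions, not Jandan's real markup.

```python
from html.parser import HTMLParser


class ImgHashParser(HTMLParser):
    """Collect the text of every <span class="img-hash"> element."""

    def __init__(self):
        super().__init__()
        self._in_hash = False  # True while inside a matching <span>
        self._buf = []         # text chunks of the current span
        self.hashes = []       # collected, whitespace-stripped hash strings

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            # attrs is a list of (name, value) pairs; value may be None
            classes = (dict(attrs).get("class") or "").split()
            if "img-hash" in classes:
                self._in_hash = True
                self._buf = []

    def handle_data(self, data):
        if self._in_hash:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._in_hash:
            self.hashes.append("".join(self._buf).strip())
            self._in_hash = False


sample = '<ol><li><p><span class="img-hash">aGVsbG8ud29ybGQ=</span></p></li></ol>'
parser = ImgHashParser()
parser.feed(sample)
print(parser.hashes)  # ['aGVsbG8ud29ybGQ=']
```

Feeding it the downloaded page text (parser.feed(res.text)) should yield the same list of hash strings that the BeautifulSoup version iterates over.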

🔐 2. Decrypting the image address (the key step)

import base64

img_hash = "aGVsbG8ud29ybGQ="  # example value
img_name = base64.b64decode(img_hash).decode('utf-8')
print("Image address:", img_name)  # Image address: hello.world
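
base64.b64decode raises binascii.Error when the trailing = padding is missing, and by default it silently discards characters outside the Base64 alphabet. A minimal sketch of a stricter but more forgiving decoder, assuming scraped values may arrive with whitespace or without padding (an assumption, not something observed on the site); decode_img_hash is a hypothetical helper name.

```python
import base64


def decode_img_hash(img_hash: str) -> str:
    """Base64-decode an img-hash value into a UTF-8 string.

    Assumption: scraped values may carry whitespace or have lost their
    trailing '=' padding, so both are repaired before decoding.
    """
    cleaned = "".join(img_hash.split())             # drop spaces and newlines
    cleaned += "=" * (-len(cleaned) % 4)            # restore missing padding
    raw = base64.b64decode(cleaned, validate=True)  # reject non-Base64 characters
    return raw.decode("utf-8")                      # raises UnicodeDecodeError on bad bytes


print(decode_img_hash("aGVsbG8ud29ybGQ="))  # hello.world
print(decode_img_hash("aGVsbG8ud29ybGQ"))   # hello.world (padding restored)
```

Both failure modes (binascii.Error and UnicodeDecodeError) are subclasses of ValueError, so a caller can catch ValueError to skip a bad hash.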

Extract the text of <span class="img-hash"> from the page and Base64-decode it to get the image address (hello.world in this example), then join it into a full URL such as:

https://cdn.jandan.net//data/ooxx/202407/15/hello.world

Or, in the format currently in use:

https://img.jandan.net/file/ooxx/<image name>
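
Rather than pasting strings together with an f-string, urllib.parse.urljoin can handle the forms a decoded value might plausibly take: a bare file name, a protocol-relative //host/... address, or an already absolute URL. Which form Jandan actually returns is an assumption here; the IMG_BASE prefix is copied from the pattern above and may be out of date, and build_img_url is a hypothetical helper.

```python
from urllib.parse import urljoin

# Prefix taken from the post; it must end with "/" or urljoin replaces the last segment.
IMG_BASE = "https://img.jandan.net/file/ooxx/"


def build_img_url(decoded: str, base: str = IMG_BASE) -> str:
    """Turn a decoded img-hash value into an absolute image URL.

    - bare file name ("hello.world")     -> appended to base
    - protocol-relative ("//host/a.jpg") -> inherits base's scheme (https)
    - already absolute ("https://...")   -> returned unchanged
    """
    return urljoin(base, decoded)


print(build_img_url("hello.world"))                # https://img.jandan.net/file/ooxx/hello.world
print(build_img_url("//example.com/mw600/a.jpg"))  # https://example.com/mw600/a.jpg
```

Note that a value starting with a single "/" would replace the whole path of the base, which is standard URL resolution behaviour.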

🧪 3. Full code: fetch the page and print the decrypted image addresses

import base64

import requests
from bs4 import BeautifulSoup

url = "https://jandan.net/ooxx"

headers = {
    'User-Agent': 'Mozilla/5.0'
}

# Fetch the page; fail fast on timeouts and HTTP errors
res = requests.get(url, headers=headers, timeout=10)
res.raise_for_status()
soup = BeautifulSoup(res.text, 'html.parser')

# Extract every img-hash and decode it
hash_tags = soup.select('span.img-hash')

for idx, tag in enumerate(hash_tags):
    b64_hash = tag.text.strip()
    try:
        decoded = base64.b64decode(b64_hash).decode('utf-8')
        # Build the full image URL (adjust to the site's actual path rule)
        img_url = f"https://img.jandan.net/file/ooxx/{decoded}"
        print(f"[{idx}] {img_url}")
    except Exception as e:
        # Broad catch so that one bad hash (or a failed download added here later) skips only that image
        print(f"Decoding failed: {b64_hash} error: {e}")

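💾 4. Want to download the images? Add these lines inside the try block of the loop above, right after the print line:

        img_res = requests.get(img_url, headers=headers, timeout=10)
        img_res.raise_for_status()  # don't save an HTML error page as an image
        with open(f'ooxx_{idx}.jpg', 'wb') as f:
            f.write(img_res.content)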

📌 5. Next steps (advanced)

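Pagination: OOXX pages follow the /page-N URL pattern, e.g. https://jandan.net/ooxx/page-2
Better saving: detect the file extension (.jpg/.gif/.png)
Asynchronous or multithreaded downloading: for higher speed
Add a GUI, or generate an HTML gallery page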
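
For the extension item above, one option is to look at the downloaded bytes instead of trusting the file name: JPEG, PNG, GIF and WebP files begin with fixed signatures. A minimal sketch; sniff_image_ext is a hypothetical helper, and including WebP is an assumption about what the site might serve.

```python
def sniff_image_ext(data: bytes, default: str = ".jpg") -> str:
    """Guess an image file extension from the file's leading "magic" bytes."""
    if data.startswith(b"\xff\xd8\xff"):                 # JPEG
        return ".jpg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):            # PNG
        return ".png"
    if data.startswith((b"GIF87a", b"GIF89a")):          # GIF (static or animated)
        return ".gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":    # WebP container
        return ".webp"
    return default                                       # unknown: fall back


print(sniff_image_ext(b"GIF89a..."))  # .gif
```

In the Part 2 loop below, this could replace the os.path.splitext guess, e.g. ext = sniff_image_ext(img_res.content). An HTML error page falls through to the default, so checking the HTTP status first still matters.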

🧠 Reminder: use scraped content legally and responsibly!

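Images on Jandan are subject to copyright and the site's access rules. Use this for learning and research only, never for illegal or commercial purposes.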

Now for Part 2 of the scraper: download Jandan OOXX pictures page by page and save them automatically.

🧰 Features:

Automatically decodes the image addresses (Base64)
Supports pagination (configurable page range)
Downloads the images and saves them locally

🧱 Example code: scrape the first 3 pages of OOXX pictures and save them

import base64
import os
import time

import requests
from bs4 import BeautifulSoup

# Output directory
save_dir = 'jandan_ooxx'
os.makedirs(save_dir, exist_ok=True)

headers = {
    'User-Agent': 'Mozilla/5.0'
}

# Page range to scrape, starting from page 1 (adjust as needed)
start_page = 1
end_page = 3

for page in range(start_page, end_page + 1):
    print(f'\n=== Fetching page {page} ===')
    url = f'https://jandan.net/ooxx/page-{page}'

    try:
        res = requests.get(url, headers=headers, timeout=10)
        res.raise_for_status()
        soup = BeautifulSoup(res.text, 'html.parser')
        hash_tags = soup.select('span.img-hash')

        for idx, tag in enumerate(hash_tags):
            b64 = tag.text.strip()
            try:
                filename = base64.b64decode(b64).decode('utf-8')
                img_url = f'https://img.jandan.net/file/ooxx/{filename}'

                # Download the image; treat HTTP errors as failures
                img_res = requests.get(img_url, headers=headers, timeout=10)
                img_res.raise_for_status()

                # Reuse the decoded name's extension; fall back to .jpg if it is missing or implausible
                ext = os.path.splitext(filename)[1]
                if not ext or len(ext) > 5:
                    ext = '.jpg'
                file_path = os.path.join(save_dir, f'{page}_{idx}{ext}')

                with open(file_path, 'wb') as f:
                    f.write(img_res.content)

                print(f'[✓] Downloaded: {img_url}')
            except Exception as e:
                print(f'[✗] Decode or download failed: {b64} error: {e}')
            time.sleep(0.5)  # polite delay after every attempt, success or not, to reduce the risk of an IP ban

    except Exception as e:
        print(f'[!] Failed to fetch page: {url}, error: {e}')
    time.sleep(1)  # pause between pages as well
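
The loop above logs a failure and moves on, so a single timeout loses that image or page. A minimal sketch of retrying with exponential backoff using only the standard library; with_retries and its parameters are hypothetical names, not part of requests.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, retry_on=(Exception,)):
    """Call func() and retry on failure, doubling the wait each time.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last exception once all `attempts` calls have failed.
    Exceptions not listed in retry_on propagate immediately.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)
```

For example, wrap the page request in a small function that calls requests.get(url, headers=headers, timeout=10) followed by raise_for_status(), and pass retry_on=(requests.RequestException,) so that only network and HTTP errors are retried.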

📦 Result

Creates a jandan_ooxx folder in the current directory
Saves every picture from the first 3 pages there, named page_index plus extension, e.g. 1_0.jpg, 2_1.jpg

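💡 Possible upgrades

Concurrent, multithreaded downloads (faster)
Automatic page-count detection (read the total number of pages from the first page)
Filtering animated GIFs vs. static images
File checks on download (to avoid duplicates)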
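
The first and last items above (concurrent downloads and avoiding duplicates) can be combined. A minimal sketch using concurrent.futures.ThreadPoolExecutor and SHA-256 content hashes; download_all and its fetch parameter are hypothetical, and the caller supplies the actual HTTP call so the sketch stays independent of requests.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(jobs, fetch, save_dir, max_workers=4):
    """Download (url, filename) jobs concurrently, skipping duplicate content.

    fetch(url) -> bytes is supplied by the caller. Files whose SHA-256
    digest was already seen in this run are not written.
    Returns the list of paths actually written.
    """
    os.makedirs(save_dir, exist_ok=True)
    seen = set()   # SHA-256 digests of content already saved
    written = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Only the fetches run in worker threads.
        futures = {pool.submit(fetch, url): name for url, name in jobs}
        # Results are handled here in the main thread, so `seen` needs no lock.
        for fut in as_completed(futures):
            name = futures[fut]
            try:
                data = fut.result()
            except Exception as e:
                print(f"[✗] Download failed: {name} error: {e}")
                continue
            digest = hashlib.sha256(data).hexdigest()
            if digest in seen:
                continue  # identical bytes already saved under another name
            seen.add(digest)
            path = os.path.join(save_dir, name)
            with open(path, "wb") as f:
                f.write(data)
            written.append(path)
    return written
```

With requests, fetch could be a small function that calls requests.get(url, headers=headers, timeout=10), then raise_for_status(), and returns .content. The seen set only covers a single run; skipping files saved by earlier runs would also require hashing the files already on disk.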

lichongyang