Hands-on | A Home-Built, API-Based Exporter for Every Youdao Note Format

Using the Youdao Note web API, this script recursively exports every note as an editable Markdown file and preserves the original folder structure.

Features

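| Feature | Description |
| --- | --- |
| Authentication | Cookie login (cookies copied from the browser) |
| Supported formats | .md (saved as-is); XML, JSON and HTML (all converted to Markdown) |
| Folder structure | Preserves Youdao Note's original folder hierarchy |
| Incremental export | Skips files that already exist by default; --no-skip forces overwriting |
| Rate limiting | Sleeps 150 ms after each note download and 100 ms before listing each subfolder, to avoid triggering rate limits |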
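The script rate-limits with fixed time.sleep calls. As an alternative sketch (not what the script does), a throttle that enforces a minimum interval between consecutive requests only sleeps for whatever part of the 150 ms has not already elapsed, for example while a note was being converted and written. MinIntervalThrottle is a hypothetical name; the clock and sleep functions are injectable so the logic can be checked without actually waiting.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Hypothetical helper: keep at least `interval` seconds between calls to wait()."""

    def __init__(self, interval: float = 0.15,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request, None before the first

    def wait(self) -> float:
        """Sleep only as long as needed and return the number of seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()  # the next interval is measured from now
        return delay
```

Calling throttle.wait() right before each client request would replace the separate 0.1 s and 0.15 s sleeps with a single policy.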

Requirements

  • Python 3.7+
  • Dependency: requests
pip install requests

Getting your cookies

The script authenticates against the Youdao Note API with your browser cookies, which you need to export manually:

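  1. Log in to note.youdao.com in your browser
  2. Press F12 to open the developer tools
  3. Go to Application → Cookies → note.youdao.com
  4. Find the following key cookies:
    • YNOTE_CSTK (required; CSRF token)
    • YNOTE_SESS (required; session identifier)
    • YNOTE_LOGIN (required; login state)
    • Copying the other cookies whose names start with YNOTE_ is also recommended
  5. Save the cookies to a file named cookies.json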

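💡 The EditThisCookie browser extension can also export cookies as JSON in one click. Its export is an array of cookie objects (with name, value, domain and path fields), which the script does not read directly, so convert it into one of the two formats below first.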
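A minimal conversion sketch, assuming the extension's export is a JSON array of objects that each carry name, value and domain fields (the shape EditThisCookie produces). It keeps only youdao.com cookies and writes the dictionary format (format two) described below. The function names and the exported_cookies.json file name are illustrative.

```python
import json
from typing import Dict, List


def to_dict_format(exported: List[dict], domain: str = "youdao.com") -> Dict[str, str]:
    """Turn a list of cookie objects into the {name: value} dictionary format."""
    cookies: Dict[str, str] = {}
    for item in exported:
        host = item.get("domain", "").lstrip(".")
        # Keep cookies for youdao.com itself or any subdomain such as note.youdao.com
        if host == domain or host.endswith("." + domain):
            cookies[item["name"]] = item["value"]
    return cookies


def convert_file(src: str = "exported_cookies.json", dst: str = "cookies.json") -> None:
    with open(src, encoding="utf-8") as f:
        exported = json.load(f)
    cookies = to_dict_format(exported)
    if "YNOTE_CSTK" not in cookies:
        raise SystemExit("YNOTE_CSTK not found; export the cookies while on note.youdao.com")
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(cookies, f, ensure_ascii=False, indent=4)
```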

cookies.json format

Two formats are supported; use either one:

Format 1: array format (recommended)

{
    "cookies": [
        ["YNOTE_CSTK", "cstk值", ".youdao.com", "/"],
        ["YNOTE_SESS", "sess值", ".youdao.com", "/"],
        ["YNOTE_LOGIN", "login值", ".youdao.com", "/"]
    ]
}

Format 2: dictionary format

{
    "YNOTE_CSTK": "cstk值",
    "YNOTE_SESS": "sess值",
    "YNOTE_LOGIN": "login值"
}
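Before a long export it can help to validate the file up front. The sketch below (hypothetical helpers, not part of the script) accepts either format above, reads them the same way the script does, and reports which of the three required cookies are missing.

```python
import json
from typing import List

REQUIRED = ("YNOTE_CSTK", "YNOTE_SESS", "YNOTE_LOGIN")


def cookie_names(data) -> List[str]:
    """Collect cookie names from either the array format or the dictionary format."""
    cookies = data.get("cookies", data) if isinstance(data, dict) else data
    if isinstance(cookies, dict):
        return list(cookies)
    # Array format: each entry is [name, value, domain, path]
    return [entry[0] for entry in cookies]


def missing_cookies(data) -> List[str]:
    """Return the required cookie names that are absent, in a fixed order."""
    names = set(cookie_names(data))
    return [name for name in REQUIRED if name not in names]


def check_cookies(path: str = "cookies.json") -> List[str]:
    with open(path, encoding="utf-8") as f:
        return missing_cookies(json.load(f))
```

An empty list from check_cookies() means all three required cookies are present.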

Main script: youdao_note_export.py

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import argparse
import json
import logging
import os
import re
import sys
import time
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, List, Optional, Tuple

try:
    import requests
except ImportError:
    print("请先安装 requests: pip install requests")
    sys.exit(1)

# Logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s",
    datefmt="%H:%M:%S",
)
log = logging.getLogger("youdao-export")

# Youdao Note API endpoints
BASE = "https://note.youdao.com"
API_ROOT_DIR = BASE + "/yws/api/personal/file?method=getByPath&keyfrom=web&cstk={cstk}"
API_LIST_DIR = (
    BASE
    + "/yws/api/personal/file/{dir_id}?all=true&f=true&len=1000&sort=1"
    "&isReverse=false&method=listPageByParentId&keyfrom=web&cstk={cstk}"
)
API_DOWNLOAD = (
    BASE
    + "/yws/api/personal/sync?method=download&_system=macos&_systemVersion="
    "&_screenWidth=1280&_screenHeight=800&_appName=ynote&_appuser=0123456789abcdeffedcba9876543210"
    "&_vendor=official-website&_launch=16&_firstTime=&_deviceId=0123456789abcdef"
    "&_platform=web&_cityCode=110000&_cityName=&sev=j1&keyfrom=web&cstk={cstk}"
)

HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
}

# API client
class YoudaoNoteClient:
    def __init__(self, cookies_path: str = "cookies.json"):
        self.session = requests.Session()
        self.session.headers.update(HEADERS)
        self.cstk: Optional[str] = None
        self._load_cookies(cookies_path)

    # Load cookies from the JSON file
    def _load_cookies(self, path: str):
        with open(path, "rb") as f:
            data = json.loads(f.read().decode("utf-8"))

        cookies = data.get("cookies", data)  # 兼容两种格式
        if isinstance(cookies, list):
            for c in cookies:
                name, value = c[0], c[1]
                domain = c[2] if len(c) > 2 else ".youdao.com"
                path_ = c[3] if len(c) > 3 else "/"
                self.session.cookies.set(name, value, domain=domain, path=path_)
                if name == "YNOTE_CSTK":
                    self.cstk = value
        elif isinstance(cookies, dict):
            for name, value in cookies.items():
                self.session.cookies.set(name, value, domain=".youdao.com", path="/")
                if name == "YNOTE_CSTK":
                    self.cstk = value

        if not self.cstk:
            raise ValueError("cookies.json 中缺少 YNOTE_CSTK 字段,无法通过 CSRF 验证")
        log.info("✅ Cookies 加载成功,YNOTE_CSTK=%s...", self.cstk[:8])

    # Get the root directory ID
    def get_root_id(self) -> str:
        url = API_ROOT_DIR.format(cstk=self.cstk)
        resp = self.session.post(url, data={"path": "/", "entire": "true", "purge": "false", "cstk": self.cstk})
        resp.raise_for_status()
        data = resp.json()
        root_id = data["fileEntry"]["id"]
        log.info("📁 根目录 ID: %s", root_id)
        return root_id

    # List the entries in a directory (the request asks for up to 1000 entries)
    def list_dir(self, dir_id: str) -> List[Dict]:
        url = API_LIST_DIR.format(dir_id=dir_id, cstk=self.cstk)
        resp = self.session.get(url)
        resp.raise_for_status()
        data = resp.json()
        entries = data.get("entries", [])
        return entries

    # Download a file's raw content
    def download_file(self, file_id: str) -> bytes:
        url = API_DOWNLOAD.format(cstk=self.cstk)
        data = {
            "fileId": file_id,
            "version": -1,
            "convert": "true",
            "editorType": 1,
            "cstk": self.cstk,
        }
        resp = self.session.post(url, data=data)
        resp.raise_for_status()
        return resp.content

# Format conversion

def _sanitize_filename(name: str) -> str:
    # Replace characters that are not allowed in file names (backslash included) with "_"
    return re.sub(r'[\\/:*?"<>|]', "_", name).strip()

# XML → Markdown
def _xml_get_text(children, key="text") -> str:
    for child in children:
        if key in child.tag:
            return child.text or ""
    return ""

def _xml_convert_element(element, list_items: dict) -> str:
    """Convert a single XML element into Markdown text."""
    tag = element.tag.replace("{http://note.youdao.com}", "").replace("-", "_")
    text = _xml_get_text(list(element))

    converters = {
        "para": lambda e, t: t,
        "heading": lambda e, t: f"{'#' * int(e.attrib.get('level', 1))} {t}" if t else "",
        "image": lambda e, t: f"![{t}]({_xml_get_text(list(e), 'source')})",
        "attach": lambda e, t: f"[{_xml_get_text(list(e), 'filename')}]({_xml_get_text(list(e), 'resource')})",
        "code": lambda e, t: f"```{_xml_get_text(list(e), 'language')}n{t}n```",
        "todo": lambda e, t: f"- [ ] {t}",
        "quote": lambda e, t: f"> {t}",
        "horizontal_line": lambda e, t: "---",
    }

    if tag in converters:
        return converters[tag](element, text)

    if tag == "list_item":
        list_id = element.attrib.get("list-id", "")
        is_ordered = list_items.get(list_id, "unordered")
        prefix = "1." if is_ordered == "ordered" else "-"
        return f"{prefix} {text}"

    if tag == "table":
        content = _xml_get_text(list(element), "content")
        if not content:
            return ""
        try:
            table_data = json.loads(content)
        except (json.JSONDecodeError, TypeError):
            return ""
        widths = table_data.get("widths", [])
        if not widths:
            return ""
        cols = len(widths)
        rows: List[List[str]] = []
        row: List[str] = []
        for cell in table_data.get("cells", []):
            row.append(cell.get("value", "").replace("|", "\|"))
            if len(row) == cols:
                rows.append(row)
                row = []
        if not rows:
            return ""
        lines = ["| " + " | ".join(rows[0]) + " |"]
        lines.append("| " + " | ".join(["---"] * cols) + " |")
        for r in rows[1:]:
            lines.append("| " + " | ".join(r) + " |")
        return "n".join(lines)

    return text  # fallback

def xml_to_markdown(content: bytes) -> str:
    """Convert a Youdao XML-format note into Markdown."""
    try:
        root = ET.fromstring(content)
    except ET.ParseError:
        return content.decode("utf-8", errors="replace")

    # Map each list ID to its type (ordered / unordered)
    list_items: Dict[str, str] = {}
    meta = root[0] if len(root) > 0 else None
    if meta is not None:
        for child in meta:
            if "list" in child.tag:
                list_items[child.attrib.get("id", "")] = child.attrib.get("type", "unordered")

    body = root[1] if len(root) > 1 else root
    parts: List[str] = []
    for element in body:
        md = _xml_convert_element(element, list_items)
        if md:  # skip elements that produced no text
            parts.append(md)
    return "\n\n".join(parts)

# JSON → Markdown

class JsonConverter:
    """Convert JSON-format notes into Markdown."""

    def _get_common_text(self, content: dict) -> str:
        text = ""
        five = content.get("5")
        if five:
            seven = five[0].get("7")
            if not seven:
                return ""
            for item in seven:
                raw = item.get("8", "")
                attrs = item.get("9")
                if raw and attrs:
                    raw = self._apply_attrs(raw, attrs)
                text += raw
        return text

    def _apply_attrs(self, text: str, attrs: list) -> str:
        for attr in attrs:
            t = attr.get("2", "")
            if t == "b":
                text = f"**{text}**"
            elif t == "i":
                text = f"*{text}*"
        return text

    def _convert_text(self, content: dict) -> str:
        all_text = ""
        for block in content.get("5", []):
            five_inner = block.get("5")
            text_type = block.get("6")
            seven = block.get("7")

            if seven and not five_inner:
                text = ""
                for item in seven:
                    raw = item.get("8", "")
                    attrs = item.get("9")
                    if raw and attrs:
                        raw = self._apply_attrs(raw, attrs)
                    text += raw
            elif text_type == "li" and five_inner:
                source = self._get_common_text(block)
                hf = (block.get("4") or {}).get("hf", "")
                text = f"[{source}]({hf})" if hf else ""
            else:
                text = ""
            if text:
                all_text += text
        return all_text

    def _convert_h(self, content: dict) -> str:
        level = int((content.get("4") or {}).get("l", "h1").replace("h", ""))
        text = self._get_common_text(content)
        return f"{'#' * level} {text}" if text else ""

    def _convert_im(self, content: dict) -> str:
        url = (content.get("4") or {}).get("u", "")
        return f"![]({url})"

    def _convert_a(self, content: dict) -> str:
        fn = (content.get("4") or {}).get("fn", "")
        re_url = (content.get("4") or {}).get("re", "")
        return f"[{fn}]({re_url})"

    def _convert_cd(self, content: dict) -> str:
        lang = (content.get("4") or {}).get("la", "")
        lines = []
        for code in content.get("5", []):
            lines.append(self._get_common_text(code))
        return f"```{lang}n{''.join(lines)}n```"

    def _convert_la(self, content: dict) -> str:
        lines = []
        for line in content.get("5", []):
            lines.append(self._get_common_text(line))
        return f"```n{''.join(lines)}n```"

    def _convert_q(self, content: dict) -> str:
        parts = []
        for q in content.get("5", []):
            t = self._get_common_text(q).replace("\n", "")
            parts.append(f"> {t}")
        return "n".join(parts)

    def _convert_l(self, content: dict) -> str:
        text = self._get_common_text(content)
        lt = (content.get("4") or {}).get("lt", "unordered")
        ll = (content.get("4") or {}).get("ll", 1)
        if lt == "unordered":
            return "t" * (ll - 1) + f"- {text}"
        return f"1. {text}"

    def _convert_t(self, content: dict) -> str:
        tr_list = content.get("5", [])
        rows = []
        for tr in tr_list:
            cells = []
            for tc in tr.get("5", []):
                inner = tc.get("5", [{}])[0].get("5", [{}])[0].get("7")
                cells.append(inner[0]["8"] if inner else " ")
            rows.append(cells)
        if not rows:
            return ""
        cols = len(rows[0])
        lines = ["| " + " | ".join(rows[0]) + " |"]
        lines.append("| " + " | ".join(["---"] * cols) + " |")
        for r in rows[1:]:
            lines.append("| " + " | ".join(r) + " |")
        return "n".join(lines)

    def convert(self, content: bytes) -> str:
        try:
            data = json.loads(content)
        except (json.JSONDecodeError, ValueError):
            return content.decode("utf-8", errors="replace")

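        # Untyped blocks are plain paragraphs and fall back to _convert_text below.
        # "t" is the table type (handled by _convert_t), so it is mapped only once.
        converter_map = {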
            "h": self._convert_h,
            "im": self._convert_im,
            "a": self._convert_a,
            "cd": self._convert_cd,
            "la": self._convert_la,
            "q": self._convert_q,
            "l": self._convert_l,
            "t": self._convert_t,
        }

        parts = []
        for block in data.get("5", []):
            btype = block.get("6")
            if btype and btype in converter_map:
                md = converter_map[btype](block)
            else:
                md = self._convert_text(block)
            if md:
                parts.append(md)
        return "nn".join(parts)

_json_converter = JsonConverter()

# Unified conversion entry point

def convert_to_markdown(raw: bytes, suffix: str) -> str:
    """Pick a conversion strategy based on the file type and return Markdown text."""
    if not raw:
        return ""

    # .md files are returned as-is
    if suffix == ".md":
        return raw.decode("utf-8", errors="replace")

    # Detect XML
    if raw[:5] == b"<?xml" or b"<note " in raw[:200]:
        return xml_to_markdown(raw)

    # Detect JSON
    stripped = raw.lstrip()
    if stripped[:1] == b"{":
        return _json_converter.convert(raw)

    # Fall back to HTML detection, otherwise return plain text
    text = raw.decode("utf-8", errors="replace")
    if "<html" in text[:500].lower() or "<div" in text[:500].lower():
        try:
            from markdownify import markdownify
            return markdownify(text).strip()
        except ImportError:
            log.warning("HTML 笔记需要 markdownify 库: pip install markdownify")
            return text

    return text

# Recursive export
class YoudaoExporter:
    def __init__(self, client: YoudaoNoteClient, output_dir: str, skip_existing: bool = True):
        self.client = client
        self.output_dir = Path(output_dir)
        self.skip_existing = skip_existing
        self.stats = {"dirs": 0, "notes": 0, "skipped": 0, "errors": 0}

    def run(self, root_dir_id: Optional[str] = None):
        self.output_dir.mkdir(parents=True, exist_ok=True)
        if root_dir_id is None:
            root_dir_id = self.client.get_root_id()
        log.info("🚀 开始导出到: %s", self.output_dir.resolve())
        self._walk(root_dir_id, self.output_dir)
        self._print_summary()

    def _walk(self, dir_id: str, local_path: Path):
        # Recursively walk the directory
        entries = self.client.list_dir(dir_id)
        for entry in entries:
            fe = entry.get("fileEntry", {})
            name = fe.get("name", "未命名")
            entry_id = fe.get("id", "")
            is_dir = fe.get("dir", False)
            suffix = fe.get("suffix", "")

            safe_name = _sanitize_filename(name)

            if is_dir:
                sub_dir = local_path / safe_name
                sub_dir.mkdir(parents=True, exist_ok=True)
                self.stats["dirs"] += 1
                log.info("📂 %s/", sub_dir.relative_to(self.output_dir))
                time.sleep(0.1)  # avoid sending requests too quickly
                self._walk(entry_id, sub_dir)
            else:
                self._export_note(entry_id, safe_name, suffix, local_path)

    def _export_note(self, file_id: str, name: str, suffix: str, local_path: Path):
        """Export a single note."""
        # Make sure the file name ends in .md
        if suffix == ".md" or not suffix:
            md_name = name if name.endswith(".md") else f"{name}.md"
        else:
            md_name = f"{name}.md"

        md_path = local_path / md_name

        # Skip files that already exist
        if self.skip_existing and md_path.exists():
            self.stats["skipped"] += 1
            log.debug("⏭️  跳过: %s", md_path.relative_to(self.output_dir))
            return

        try:
            raw = self.client.download_file(file_id)
            md_content = convert_to_markdown(raw, suffix)

            md_path.write_text(md_content, encoding="utf-8")
            self.stats["notes"] += 1
            log.info("✅ %s (%d bytes)", md_path.relative_to(self.output_dir), len(md_content))

        except Exception as e:
            self.stats["errors"] += 1
            log.error("❌ %s - %s: %s", name, type(e).__name__, e)

        time.sleep(0.15)  # pause between requests to avoid rate limiting

    def _print_summary(self):
        s = self.stats
        log.info("=" * 50)
        log.info("📊 导出完成!")
        log.info("   📁 目录数: %d", s["dirs"])
        log.info("   📝 笔记数: %d", s["notes"])
        log.info("   ⏭️  跳过数: %d", s["skipped"])
        log.info("   ❌ 错误数: %d", s["errors"])
        log.info("   📍 输出路径: %s", self.output_dir.resolve())
        log.info("=" * 50)

def main():
    parser = argparse.ArgumentParser(
        description="有道云笔记全量导出为 Markdown",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Example cookies.json:
  {
    "cookies": [
      ["YNOTE_CSTK", "XXX", ".youdao.com", "/"],
      ["YNOTE_SESS", "XXX", ".youdao.com", "/"],
      ...
    ]
  }

Or, in the shorter dictionary format:
  {
    "YNOTE_CSTK": "XXX",
    "YNOTE_SESS": "XXX",
    ...
  }
""",
    )
    parser.add_argument(
        "-c", "--cookies",
        default="cookies.json",
        help="Cookies JSON 文件路径 (默认: cookies.json)",
    )
    parser.add_argument(
        "-o", "--output",
        default="youdao_notes_export",
        help="导出输出目录 (默认: youdao_notes_export)",
    )
    parser.add_argument(
        "-d", "--dir",
        default=None,
        help="只导出指定的有道云笔记顶层目录名 (默认: 全部)",
    )
    parser.add_argument(
        "--no-skip",
        action="store_true",
        help="不跳过已存在的文件,强制重新导出",
    )
    parser.add_argument(
        "-v", "--verbose",
        action="store_true",
        help="显示详细日志",
    )

    args = parser.parse_args()

    if args.verbose:
        logging.getLogger().setLevel(logging.DEBUG)

    if not os.path.exists(args.cookies):
        log.error("❌ Cookies 文件不存在: %s", args.cookies)
        log.info("请按README说明获取Cookies并保存到 %s", args.cookies)
        sys.exit(1)

    # Initialize the client
    client = YoudaoNoteClient(args.cookies)

    # Get the root directory
    root_id = client.get_root_id()

    # If a directory was specified, look it up among the top-level entries
    if args.dir:
        entries = client.list_dir(root_id)
        found = False
        for entry in entries:
            fe = entry.get("fileEntry", {})
            if fe.get("name") == args.dir and fe.get("dir"):
                root_id = fe["id"]
                found = True
                log.info("📁 指定目录: %s (ID: %s)", args.dir, root_id)
                break
        if not found:
            log.error("❌ 未找到目录: %s", args.dir)
            log.info("可用顶层目录:")
            for entry in entries:
                fe = entry.get("fileEntry", {})
                if fe.get("dir"):
                    log.info("  - %s", fe.get("name"))
            sys.exit(1)

    # Start the export
    exporter = YoudaoExporter(
        client=client,
        output_dir=args.output,
        skip_existing=not args.no_skip,
    )
    exporter.run(root_id)

if __name__ == "__main__":
    main()

Usage

Basic usage

# Export all notes into the youdao_notes_export directory
python3 youdao_note_export.py -c cookies.json

Full options

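Usage: python3 youdao_note_export.py [options]

Options:
  -h, --help            Show the help message
  -c, --cookies PATH    Path to the cookies JSON file (default: cookies.json)
  -o, --output DIR      Output directory (default: youdao_notes_export)
  -d, --dir NAME        Export only the named top-level Youdao Note directory (default: all)
  --no-skip             Do not skip existing files; force a fresh export
  -v, --verbose         Show verbose (debug-level) logs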

Examples

# Export all notes
python3 youdao_note_export.py -c cookies.json -o ./my_notes

# Export only the directory named "My Notes"
python3 youdao_note_export.py -c cookies.json -o ./my_notes -d "My Notes"

# Force a fresh export of every note (do not skip existing files), with verbose logs
python3 youdao_note_export.py --no-skip -v

# Specify the cookie file path and the output directory
python3 youdao_note_export.py -c ~/Desktop/cookies.json -o ~/Desktop/youdao_backup

Export result

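The exported tree mirrors the folder structure in Youdao Note (characters not allowed in file names, such as / and :, are replaced with _):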

youdao_notes_export/
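├── My Notes/
│   ├── Study Notes/
│   │   ├── Training Large Models with MimoLLM.md
│   │   ├── RestAPI Developer Guide.md
│   │   └── LLM Algorithm Notes.md
│   ├── Work Log/
│   │   ├── Weekly Report 2024-01.md
│   │   └── Meeting Minutes.md
│   └── Quick Notes.md
├── Travel Guides/
│   ├── Independent Trip to Thailand.md
│   └── Suzhou Xishan Road Trip.md
└── Tech Bookmarks/
    └── Ceph Distributed Storage in Depth.md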
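To cross-check an export against the totals the script logs, a short standard-library sketch (summarize_export is a hypothetical helper) can count the subdirectories and .md files under the output directory. The counts match the logged "Directories" and "Notes" only on a fresh run, since skipped files are not counted as exported notes and failed downloads are not written.

```python
from pathlib import Path
from typing import Dict


def summarize_export(root: str = "youdao_notes_export") -> Dict[str, int]:
    """Recursively count directories and Markdown files below the export root."""
    base = Path(root)
    dirs = sum(1 for p in base.rglob("*") if p.is_dir())
    notes = sum(1 for p in base.rglob("*.md") if p.is_file())
    return {"dirs": dirs, "notes": notes}
```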

Sample run log (excerpt)

14:30:01 [INFO] ✅ Cookies loaded, YNOTE_CSTK=a1b2c3d4...
14:30:01 [INFO] 📁 Root directory ID: Wf8d9a...
14:30:01 [INFO] 🚀 Exporting to: /home/richard/youdao_notes_export
14:30:01 [INFO] 📂 My Notes/
14:30:02 [INFO] 📂 My Notes/Study Notes/
14:30:02 [INFO] ✅ My Notes/Study Notes/Training Large Models with MimoLLM.md (3245 bytes)
14:30:03 [INFO] ✅ My Notes/Study Notes/RestAPI Developer Guide.md (1872 bytes)
14:30:04 [INFO] ✅ My Notes/Quick Notes.md (512 bytes)
14:30:04 [INFO] 📂 Travel Guides/
14:30:05 [INFO] ✅ Travel Guides/Independent Trip to Thailand.md (4096 bytes)
14:30:05 [INFO] ==================================================
14:30:05 [INFO] 📊 Export complete!
14:30:05 [INFO]    📁 Directories: 3
14:30:05 [INFO]    📝 Notes: 5
14:30:05 [INFO]    ⏭️  Skipped: 0
14:30:05 [INFO]    ❌ Errors: 0
14:30:05 [INFO]    📍 Output path: /home/richard/youdao_notes_export
14:30:05 [INFO] ==================================================

How the API works

The script uses Youdao Note's reverse-engineered web API and, authenticated with your browser cookies, calls three core endpoints:

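| Endpoint | Method | Purpose |
| --- | --- | --- |
| /yws/api/personal/file?method=getByPath | POST | Get the root directory ID |
| /yws/api/personal/file/{id}?method=listPageByParentId | GET | List all files and subdirectories in a directory |
| /yws/api/personal/sync?method=download | POST | Download a note's raw content |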
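Put together, these endpoints drive a depth-first walk: getByPath yields the root ID, listPageByParentId yields each directory's children, and every non-directory entry is downloaded. The sketch below reproduces only that recursion against an in-memory stand-in for listPageByParentId. The fake_tree data is made up; it relies only on the fileEntry fields the script itself reads (id, name and dir).

```python
from typing import Callable, Dict, List, Tuple

# Made-up stand-in for listPageByParentId: directory ID -> list of entries
fake_tree: Dict[str, List[dict]] = {
    "root": [
        {"fileEntry": {"id": "d1", "name": "My Notes", "dir": True}},
        {"fileEntry": {"id": "n1", "name": "todo.md", "dir": False}},
    ],
    "d1": [
        {"fileEntry": {"id": "n2", "name": "weekly", "dir": False}},
    ],
}


def walk(dir_id: str, prefix: str,
         list_dir: Callable[[str], List[dict]]) -> List[Tuple[str, str]]:
    """Return (relative path, file ID) for every note under dir_id, depth-first."""
    found: List[Tuple[str, str]] = []
    for entry in list_dir(dir_id):
        fe = entry.get("fileEntry", {})
        path = f"{prefix}/{fe['name']}" if prefix else fe["name"]
        if fe.get("dir"):
            found.extend(walk(fe["id"], path, list_dir))  # recurse into the subdirectory
        else:
            found.append((path, fe["id"]))  # the real script downloads the note here
    return found


print(walk("root", "", lambda dir_id: fake_tree.get(dir_id, [])))
# [('My Notes/weekly', 'n2'), ('todo.md', 'n1')]
```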

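Every request passes the YNOTE_CSTK value for CSRF validation (as the cstk URL parameter, and also as a form field in the POST requests). Note content comes back in a different format depending on the editor type: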

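  • .md notes: returned directly as Markdown text
  • XML format: produced by older editor versions; parsed with xml.etree.ElementTree and converted
  • JSON format: produced by newer editor versions; the JSON structure is parsed and each block type is mapped to Markdown
  • HTML format: converted with the markdownify library (must be installed separately)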
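To make the JSON branch easier to follow, here is a hand-made miniature note in the shape JsonConverter reads: top-level blocks sit under "5", a block's type is in "6" (no type means a plain paragraph), its properties are in "4", and each inner block holds text runs under "7", with the text in "8" and style attributes in "9". The sample is illustrative, not a captured API response, and mini_convert is a hypothetical reduction that handles only headings and paragraphs.

```python
import json
from typing import List


def runs_to_text(runs: List[dict]) -> str:
    """Join text runs ("8"), wrapping bold/italic runs as the script's _apply_attrs does."""
    out = ""
    for run in runs:
        text = run.get("8", "")
        if text:
            for attr in run.get("9") or []:
                if attr.get("2") == "b":
                    text = f"**{text}**"
                elif attr.get("2") == "i":
                    text = f"*{text}*"
        out += text
    return out


def mini_convert(raw: str) -> str:
    """Convert headings ("h") and untyped paragraphs; other block types are ignored here."""
    parts = []
    for block in json.loads(raw).get("5", []):
        # Each inner block under "5" carries its own list of text runs under "7"
        text = "".join(runs_to_text(inner.get("7") or []) for inner in block.get("5", []))
        btype = block.get("6")
        if btype == "h" and text:
            # Heading level comes from "4" -> "l", e.g. "h2" -> level 2
            level = int((block.get("4") or {}).get("l", "h1").replace("h", ""))
            text = f"{'#' * level} {text}"
        elif btype is not None:
            continue  # tables, images, code etc. are out of scope for this sketch
        if text:
            parts.append(text)
    return "\n\n".join(parts)


sample = json.dumps({"5": [
    {"6": "h", "4": {"l": "h2"}, "5": [{"7": [{"8": "Title"}]}]},
    {"5": [{"7": [{"8": "Hello "}, {"8": "world", "9": [{"2": "b"}]}]}]},
]})
print(mini_convert(sample))
```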

Note format conversion reference

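| Youdao Note element | Markdown output |
| --- | --- |
| Headings (h1-h6) | # through ###### |
| Bold | **text** |
| Italic | *text* |
| Code block | ```lang ... ``` |
| Quote | > text |
| Ordered list | 1. text |
| Unordered list | - text |
| To-do item | - [ ] text |
| Image | ![alt](url) |
| Attachment | [filename](url) |
| Table | Markdown table syntax |
| Horizontal rule | --- |
| Link | [text](url) |

Bold, italic and inline links are converted only for JSON-format notes (the XML converter keeps plain text), while to-do items and horizontal rules are recognized only in XML-format notes.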
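Both converters build tables the same way: cell values arrive in row-major order (as a flat list in XML notes), the first row becomes the header, and a --- separator row follows it. A standalone sketch of that step, with a hypothetical function name; it escapes literal | characters as the XML path does, and, unlike the script (which drops an incomplete final row), it pads a short last row with empty cells.

```python
from typing import List


def cells_to_markdown_table(cells: List[str], cols: int) -> str:
    """Lay a flat, row-major list of cell strings out as a Markdown table."""
    if cols <= 0 or not cells:
        return ""
    # Escape pipes so cell text cannot break the column layout
    escaped = [c.replace("|", "\\|") for c in cells]
    rows = [escaped[i:i + cols] for i in range(0, len(escaped), cols)]
    rows[-1] += [""] * (cols - len(rows[-1]))  # pad an incomplete last row
    lines = ["| " + " | ".join(rows[0]) + " |",
             "| " + " | ".join(["---"] * cols) + " |"]
    lines += ["| " + " | ".join(r) + " |" for r in rows[1:]]
    return "\n".join(lines)
```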

FAQ

Q: Error saying YNOTE_CSTK is missing

Check that cookies.json contains the YNOTE_CSTK cookie. It is the required CSRF token.

Q: Exported notes are empty or garbled

Your cookies may have expired. Log in to Youdao Note again and re-export the cookies.

Q: Requests are throttled / HTTP 429 errors

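The script already sleeps 150 ms after each note download. If you are still throttled, increase the value in time.sleep(0.15) in the script. A failed download is logged and counted as an error, whereas a failed directory listing stops the run; because existing files are skipped by default, rerunning the script picks up roughly where it stopped.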
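If a longer fixed sleep is not enough, one option (not part of the script) is to retry a throttled call with exponential backoff. The sketch below is generic: call_with_backoff is a hypothetical helper, and the is_retryable predicate decides which exceptions count as throttling. The sleep function is injectable so the schedule can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(fn: Callable[[], T],
                      is_retryable: Callable[[Exception], bool],
                      retries: int = 4,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); on a retryable error wait base_delay * 2**attempt, then try again."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception as exc:
            # Re-raise non-retryable errors, or the last error once retries are used up
            if attempt == retries or not is_retryable(exc):
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

With requests, a download could be wrapped as call_with_backoff(lambda: client.download_file(file_id), lambda e: getattr(getattr(e, "response", None), "status_code", None) == 429), since the HTTPError raised by raise_for_status() carries the response.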

Q: How do I export just one directory?

Pass the directory name with -d:

python3 youdao_note_export.py -c cookies.json -d "My Notes"

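The name must match a top-level directory exactly; if it is not found, the script lists all available top-level directories and exits.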

Q: HTML-format notes are not converted

Install the markdownify library. Without it, the script logs a warning and saves the raw HTML in the .md file:

pip install markdownify

Acknowledgments

The reverse-engineered API endpoints are based on DeppWang/youdaonote-pull (MIT License); this script is a rewrite of that work with additional enhancements.

License

MIT License
