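This script uses the Youdao Note (有道云笔记) web API to recursively export every note as an editable Markdown file, preserving the complete directory structure.
Features
| Feature | Description |
|---|---|
| Authentication | Cookie login (cookies copied from the browser) |
| Supported formats | .md (saved as-is); XML, JSON, and HTML (all converted to Markdown; HTML needs `markdownify`) |
| Directory structure | Preserves Youdao Note's original folder hierarchy |
| Incremental export | Skips files that already exist by default; `--no-skip` forces overwriting |
| Rate limiting | Pauses 150 ms after each note download and 100 ms before listing each subdirectory |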
Requirements
- Python 3.7+
- Dependency: `requests`

```bash
pip install requests
```
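Getting the cookies
The script authenticates against the Youdao Note API with your browser cookies, which you need to export by hand:
- Log in to note.youdao.com in your browser.
- Press F12 to open the developer tools.
- Go to Application → Cookies → note.youdao.com.
- Find these key cookies:
  - YNOTE_CSTK (required, CSRF token)
  - YNOTE_SESS (required, session identifier)
  - YNOTE_LOGIN (required, login state)
  - It is also a good idea to copy any other cookies whose names start with YNOTE_.
- Save the cookies to a file named cookies.json.
💡 You can also export cookies as JSON in one click with the EditThisCookie browser extension. Its export is a list of cookie objects, however, so rearrange it into one of the two formats below before use.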
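cookies.json format
Two formats are supported; use either one.
Format 1: array (recommended). Each entry is [name, value, domain, path]; domain and path are optional and default to .youdao.com and /.

```json
{
  "cookies": [
    ["YNOTE_CSTK", "cstk-value", ".youdao.com", "/"],
    ["YNOTE_SESS", "sess-value", ".youdao.com", "/"],
    ["YNOTE_LOGIN", "login-value", ".youdao.com", "/"]
  ]
}
```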
Format 2: dict

```json
{
  "YNOTE_CSTK": "cstk-value",
  "YNOTE_SESS": "sess-value",
  "YNOTE_LOGIN": "login-value"
}
```
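If you exported with a browser extension such as EditThisCookie, the file is typically a JSON array of cookie objects with `name`, `value`, `domain`, and `path` keys, which neither format above accepts as-is. A minimal sketch, assuming that object layout, rearranges it into Format 1. `to_format_one` and `convert_file` are hypothetical helpers, not part of the script.

```python
import json
from typing import Dict, List

# Cookies this README lists as required; the script itself only enforces YNOTE_CSTK.
REQUIRED = ("YNOTE_CSTK", "YNOTE_SESS", "YNOTE_LOGIN")


def to_format_one(exported: List[Dict]) -> Dict[str, List[List[str]]]:
    """Turn a list of cookie objects (assumed name/value/domain/path keys)
    into Format 1: {"cookies": [[name, value, domain, path], ...]}."""
    rows = [
        [c["name"], c["value"], c.get("domain", ".youdao.com"), c.get("path", "/")]
        for c in exported
        if "name" in c and "value" in c
    ]
    names = {row[0] for row in rows}
    missing = [n for n in REQUIRED if n not in names]
    if missing:
        raise ValueError("missing required cookies: " + ", ".join(missing))
    return {"cookies": rows}


def convert_file(src: str, dst: str = "cookies.json") -> None:
    """Hypothetical helper: read an extension export from src, write Format 1 to dst."""
    with open(src, encoding="utf-8") as f:
        data = to_format_one(json.load(f))
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
```

For example, `convert_file("editthiscookie.json")` would write cookies.json in the current directory; the input file name is only an example.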
Main script: youdao_note_export.py

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import argparse
import json
import logging
import os
import re
import sys
import time
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, List, Optional

try:
    import requests
except ImportError:
    print("Please install requests first: pip install requests")
    sys.exit(1)

# Logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s",
    datefmt="%H:%M:%S",
)
log = logging.getLogger("youdao-export")

# Youdao Note web API endpoints
BASE = "https://note.youdao.com"
API_ROOT_DIR = BASE + "/yws/api/personal/file?method=getByPath&keyfrom=web&cstk={cstk}"
API_LIST_DIR = (
    BASE
    + "/yws/api/personal/file/{dir_id}?all=true&f=true&len=1000&sort=1"
    "&isReverse=false&method=listPageByParentId&keyfrom=web&cstk={cstk}"
)
API_DOWNLOAD = (
    BASE
    + "/yws/api/personal/sync?method=download&_system=macos&_systemVersion="
    "&_screenWidth=1280&_screenHeight=800&_appName=ynote&_appuser=0123456789abcdeffedcba9876543210"
    "&_vendor=official-website&_launch=16&_firstTime=&_deviceId=0123456789abcdef"
    "&_platform=web&_cityCode=110000&_cityName=&sev=j1&keyfrom=web&cstk={cstk}"
)
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
}


# API client
class YoudaoNoteClient:
    def __init__(self, cookies_path: str = "cookies.json"):
        self.session = requests.Session()
        self.session.headers.update(HEADERS)
        self.cstk: Optional[str] = None
        self._load_cookies(cookies_path)

    # Load cookies (array or dict format) into the session
    def _load_cookies(self, path: str):
        with open(path, "rb") as f:
            data = json.loads(f.read().decode("utf-8"))
        cookies = data.get("cookies", data)  # accept both formats
        if isinstance(cookies, list):
            for c in cookies:
                name, value = c[0], c[1]
                domain = c[2] if len(c) > 2 else ".youdao.com"
                path_ = c[3] if len(c) > 3 else "/"
                self.session.cookies.set(name, value, domain=domain, path=path_)
                if name == "YNOTE_CSTK":
                    self.cstk = value
        elif isinstance(cookies, dict):
            for name, value in cookies.items():
                self.session.cookies.set(name, value, domain=".youdao.com", path="/")
                if name == "YNOTE_CSTK":
                    self.cstk = value
        if not self.cstk:
            raise ValueError("cookies.json is missing YNOTE_CSTK; CSRF validation cannot pass")
        log.info("✅ Cookies loaded, YNOTE_CSTK=%s...", self.cstk[:8])

    # Get the root directory ID
    def get_root_id(self) -> str:
        url = API_ROOT_DIR.format(cstk=self.cstk)
        resp = self.session.post(url, data={"path": "/", "entire": "true", "purge": "false", "cstk": self.cstk})
        resp.raise_for_status()
        data = resp.json()
        root_id = data["fileEntry"]["id"]
        log.info("📁 Root directory ID: %s", root_id)
        return root_id

    # List every entry in a directory
    def list_dir(self, dir_id: str) -> List[Dict]:
        url = API_LIST_DIR.format(dir_id=dir_id, cstk=self.cstk)
        resp = self.session.get(url)
        resp.raise_for_status()
        return resp.json().get("entries", [])

    # Download a note's raw content
    def download_file(self, file_id: str) -> bytes:
        url = API_DOWNLOAD.format(cstk=self.cstk)
        data = {
            "fileId": file_id,
            "version": -1,
            "convert": "true",
            "editorType": 1,
            "cstk": self.cstk,
        }
        resp = self.session.post(url, data=data)
        resp.raise_for_status()
        return resp.content


# Format conversion
def _sanitize_filename(name: str) -> str:
    # Replace characters that are illegal in file names (including the backslash for Windows)
    return re.sub(r'[\\/:*?"<>|]', "_", name).strip()


# XML → Markdown
def _xml_get_text(children, key="text") -> str:
    # Return the text of the first child whose tag contains `key`
    for child in children:
        if key in child.tag:
            return child.text or ""
    return ""


def _xml_convert_element(element, list_items: dict) -> str:
    # Convert a single XML element to Markdown text
    tag = element.tag.replace("{http://note.youdao.com}", "").replace("-", "_")
    text = _xml_get_text(list(element))
    converters = {
        "para": lambda e, t: t,
        "heading": lambda e, t: f"{'#' * int(e.attrib.get('level', 1))} {t}" if t else "",
        "image": lambda e, t: f"![{t}]({_xml_get_text(list(e), 'source')})",
        "attach": lambda e, t: f"[{_xml_get_text(list(e), 'filename')}]({_xml_get_text(list(e), 'resource')})",
        "code": lambda e, t: f"```{_xml_get_text(list(e), 'language')}\n{t}\n```",
        "todo": lambda e, t: f"- [ ] {t}",
        "quote": lambda e, t: f"> {t}",
        "horizontal_line": lambda e, t: "---",
    }
    if tag in converters:
        return converters[tag](element, text)
    if tag == "list_item":
        # The list type (ordered/unordered) is declared in the note's head section
        list_id = element.attrib.get("list-id", "")
        is_ordered = list_items.get(list_id, "unordered")
        prefix = "1." if is_ordered == "ordered" else "-"
        return f"{prefix} {text}"
    if tag == "table":
        # Table content is a JSON string: column widths plus a flat list of cells
        content = _xml_get_text(list(element), "content")
        if not content:
            return ""
        try:
            table_data = json.loads(content)
        except (json.JSONDecodeError, TypeError):
            return ""
        widths = table_data.get("widths", [])
        if not widths:
            return ""
        cols = len(widths)
        rows: List[List[str]] = []
        row: List[str] = []
        for cell in table_data.get("cells", []):
            row.append(cell.get("value", "").replace("|", "\\|"))  # escape pipes for Markdown
            if len(row) == cols:
                rows.append(row)
                row = []
        if not rows:
            return ""
        lines = ["| " + " | ".join(rows[0]) + " |"]
        lines.append("| " + " | ".join(["---"] * cols) + " |")
        for r in rows[1:]:
            lines.append("| " + " | ".join(r) + " |")
        return "\n".join(lines)
    return text  # fallback: plain text


def xml_to_markdown(content: bytes) -> str:
    # Convert a Youdao XML-format note to Markdown
    try:
        root = ET.fromstring(content)
    except ET.ParseError:
        return content.decode("utf-8", errors="replace")
    # Map list IDs to list types from the head section
    list_items: Dict[str, str] = {}
    meta = root[0] if len(root) > 0 else None
    if meta is not None:
        for child in meta:
            if "list" in child.tag:
                list_items[child.attrib.get("id", "")] = child.attrib.get("type", "unordered")
    body = root[1] if len(root) > 1 else root
    parts: List[str] = []
    for element in body:
        md = _xml_convert_element(element, list_items)
        if md:  # skip empty blocks
            parts.append(md)
    return "\n\n".join(parts)


# JSON → Markdown
class JsonConverter:
    # Convert JSON-format notes to Markdown. Keys as read here: "5" children,
    # "6" block type, "4" block properties, "7" text runs, "8" run text, "9" run marks.
    def _get_common_text(self, content: dict) -> str:
        # Text of the block's first child paragraph
        text = ""
        five = content.get("5")
        if five:
            seven = five[0].get("7")
            if not seven:
                return ""
            for item in seven:
                raw = item.get("8", "")
                attrs = item.get("9")
                if raw and attrs:
                    raw = self._apply_attrs(raw, attrs)
                text += raw
        return text

    def _apply_attrs(self, text: str, attrs: list) -> str:
        # Inline marks: "b" = bold, "i" = italic
        for attr in attrs:
            t = attr.get("2", "")
            if t == "b":
                text = f"**{text}**"
            elif t == "i":
                text = f"*{text}*"
        return text

    def _convert_text(self, content: dict) -> str:
        # Plain paragraph; "li" children are inline links with the URL under "hf"
        all_text = ""
        for block in content.get("5", []):
            five_inner = block.get("5")
            text_type = block.get("6")
            seven = block.get("7")
            if seven and not five_inner:
                text = ""
                for item in seven:
                    raw = item.get("8", "")
                    attrs = item.get("9")
                    if raw and attrs:
                        raw = self._apply_attrs(raw, attrs)
                    text += raw
            elif text_type == "li" and five_inner:
                source = self._get_common_text(block)
                hf = (block.get("4") or {}).get("hf", "")
                text = f"[{source}]({hf})" if hf else ""
            else:
                text = ""
            if text:
                all_text += text
        return all_text

    def _convert_h(self, content: dict) -> str:
        level = int((content.get("4") or {}).get("l", "h1").replace("h", ""))
        text = self._get_common_text(content)
        return f"{'#' * level} {text}" if text else ""

    def _convert_im(self, content: dict) -> str:
        url = (content.get("4") or {}).get("u", "")
        return f"![]({url})"

    def _convert_a(self, content: dict) -> str:
        fn = (content.get("4") or {}).get("fn", "")
        re_url = (content.get("4") or {}).get("re", "")
        return f"[{fn}]({re_url})"

    def _convert_cd(self, content: dict) -> str:
        lang = (content.get("4") or {}).get("la", "")
        lines = []
        for code in content.get("5", []):
            lines.append(self._get_common_text(code))
        return f"```{lang}\n{''.join(lines)}\n```"

    def _convert_la(self, content: dict) -> str:
        lines = []
        for line in content.get("5", []):
            lines.append(self._get_common_text(line))
        return f"```\n{''.join(lines)}\n```"

    def _convert_q(self, content: dict) -> str:
        parts = []
        for q in content.get("5", []):
            t = self._get_common_text(q).replace("\n", "")
            parts.append(f"> {t}")
        return "\n".join(parts)

    def _convert_l(self, content: dict) -> str:
        # List item: "lt" is the list type, "ll" the nesting level (1-based)
        text = self._get_common_text(content)
        props = content.get("4") or {}
        lt = props.get("lt", "unordered")
        ll = int(props.get("ll", 1))
        if lt == "unordered":
            return "\t" * (ll - 1) + f"- {text}"
        return f"1. {text}"

    def _convert_t(self, content: dict) -> str:
        # Table: "5" holds rows, each row's "5" holds cells
        rows = []
        for tr in content.get("5", []):
            cells = []
            for tc in tr.get("5", []):
                # First text run of the cell's first paragraph (guarding empty lists)
                paras = (tc.get("5") or [{}])[0].get("5") or [{}]
                runs = paras[0].get("7")
                cells.append(runs[0].get("8", " ").replace("|", "\\|") if runs else " ")
            rows.append(cells)
        if not rows or not rows[0]:
            return ""
        cols = len(rows[0])
        lines = ["| " + " | ".join(rows[0]) + " |"]
        lines.append("| " + " | ".join(["---"] * cols) + " |")
        for r in rows[1:]:
            lines.append("| " + " | ".join(r) + " |")
        return "\n".join(lines)

    def convert(self, content: bytes) -> str:
        try:
            data = json.loads(content)
        except (json.JSONDecodeError, ValueError):
            return content.decode("utf-8", errors="replace")
        # "t" is a table; untyped blocks fall through to _convert_text
        converter_map = {
            "h": self._convert_h,
            "im": self._convert_im,
            "a": self._convert_a,
            "cd": self._convert_cd,
            "la": self._convert_la,
            "q": self._convert_q,
            "l": self._convert_l,
            "t": self._convert_t,
        }
        parts = []
        for block in data.get("5", []):
            btype = block.get("6")
            if btype and btype in converter_map:
                md = converter_map[btype](block)
            else:
                md = self._convert_text(block)
            if md:
                parts.append(md)
        return "\n\n".join(parts)


_json_converter = JsonConverter()


# Unified conversion
def convert_to_markdown(raw: bytes, suffix: str) -> str:
    # Pick a conversion strategy from the file type and content; return a Markdown string
    if not raw:
        return ""
    # .md files are returned as-is
    if suffix == ".md":
        return raw.decode("utf-8", errors="replace")
    # Detect XML
    if raw[:5] == b"<?xml" or b"<note " in raw[:200]:
        return xml_to_markdown(raw)
    # Detect JSON
    stripped = raw.lstrip()
    if stripped[:1] == b"{":
        return _json_converter.convert(raw)
    # HTML fallback
    text = raw.decode("utf-8", errors="replace")
    if "<html" in text[:500].lower() or "<div" in text[:500].lower():
        try:
            from markdownify import markdownify
            return markdownify(text).strip()
        except ImportError:
            log.warning("HTML notes need the markdownify library: pip install markdownify")
            return text
    return text


# Recursive export
class YoudaoExporter:
    def __init__(self, client: YoudaoNoteClient, output_dir: str, skip_existing: bool = True):
        self.client = client
        self.output_dir = Path(output_dir)
        self.skip_existing = skip_existing
        self.stats = {"dirs": 0, "notes": 0, "skipped": 0, "errors": 0}

    def run(self, root_dir_id: Optional[str] = None):
        self.output_dir.mkdir(parents=True, exist_ok=True)
        if root_dir_id is None:
            root_dir_id = self.client.get_root_id()
        log.info("🚀 Exporting to: %s", self.output_dir.resolve())
        self._walk(root_dir_id, self.output_dir)
        self._print_summary()

    def _walk(self, dir_id: str, local_path: Path):
        # Recursively walk a directory
        entries = self.client.list_dir(dir_id)
        for entry in entries:
            fe = entry.get("fileEntry", {})
            name = fe.get("name", "Untitled")
            entry_id = fe.get("id", "")
            is_dir = fe.get("dir", False)
            suffix = fe.get("suffix", "")
            safe_name = _sanitize_filename(name)
            if is_dir:
                sub_dir = local_path / safe_name
                sub_dir.mkdir(parents=True, exist_ok=True)
                self.stats["dirs"] += 1
                log.info("📂 %s/", sub_dir.relative_to(self.output_dir))
                time.sleep(0.1)  # avoid sending requests too quickly
                self._walk(entry_id, sub_dir)
            else:
                self._export_note(entry_id, safe_name, suffix, local_path)

    def _export_note(self, file_id: str, name: str, suffix: str, local_path: Path):
        # Export a single note, making sure the file name ends in .md
        if suffix == ".md" or not suffix:
            md_name = name if name.endswith(".md") else f"{name}.md"
        else:
            md_name = f"{name}.md"
        md_path = local_path / md_name
        # Skip files that already exist
        if self.skip_existing and md_path.exists():
            self.stats["skipped"] += 1
            log.debug("⏭️ Skipped: %s", md_path.relative_to(self.output_dir))
            return
        try:
            raw = self.client.download_file(file_id)
            md_content = convert_to_markdown(raw, suffix)
            md_path.write_text(md_content, encoding="utf-8")
            self.stats["notes"] += 1
            log.info("✅ %s (%d bytes)", md_path.relative_to(self.output_dir),
                     len(md_content.encode("utf-8")))  # UTF-8 size, not character count
        except Exception as e:
            self.stats["errors"] += 1
            log.error("❌ %s - %s: %s", name, type(e).__name__, e)
        time.sleep(0.15)  # pause between requests to avoid rate limiting

    def _print_summary(self):
        s = self.stats
        log.info("=" * 50)
        log.info("📊 Export finished!")
        log.info(" 📁 Directories: %d", s["dirs"])
        log.info(" 📝 Notes: %d", s["notes"])
        log.info(" ⏭️ Skipped: %d", s["skipped"])
        log.info(" ❌ Errors: %d", s["errors"])
        log.info(" 📍 Output path: %s", self.output_dir.resolve())
        log.info("=" * 50)


def main():
    parser = argparse.ArgumentParser(
        description="Export all Youdao Note notes to Markdown",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
cookies.json format example:
{
  "cookies": [
    ["YNOTE_CSTK", "XXX", ".youdao.com", "/"],
    ["YNOTE_SESS", "XXX", ".youdao.com", "/"],
    ...
  ]
}
or the shorter dict format:
{
  "YNOTE_CSTK": "XXX",
  "YNOTE_SESS": "XXX",
  ...
}
""",
    )
    parser.add_argument(
        "-c", "--cookies",
        default="cookies.json",
        help="Path to the cookies JSON file (default: cookies.json)",
    )
    parser.add_argument(
        "-o", "--output",
        default="youdao_notes_export",
        help="Output directory (default: youdao_notes_export)",
    )
    parser.add_argument(
        "-d", "--dir",
        default=None,
        help="Export only the named top-level Youdao Note directory (default: all)",
    )
    parser.add_argument(
        "--no-skip",
        action="store_true",
        help="Do not skip existing files; re-export everything",
    )
    parser.add_argument(
        "-v", "--verbose",
        action="store_true",
        help="Show verbose (debug) logs",
    )
    args = parser.parse_args()
    if args.verbose:
        logging.getLogger().setLevel(logging.DEBUG)
    if not os.path.exists(args.cookies):
        log.error("❌ Cookies file not found: %s", args.cookies)
        log.info("Obtain cookies as described in the README and save them to %s", args.cookies)
        sys.exit(1)
    # Initialize
    client = YoudaoNoteClient(args.cookies)
    # Get the root directory
    root_id = client.get_root_id()
    # If a directory was specified, look it up among the top-level entries
    if args.dir:
        entries = client.list_dir(root_id)
        found = False
        for entry in entries:
            fe = entry.get("fileEntry", {})
            if fe.get("name") == args.dir and fe.get("dir"):
                root_id = fe["id"]
                found = True
                log.info("📁 Selected directory: %s (ID: %s)", args.dir, root_id)
                break
        if not found:
            log.error("❌ Directory not found: %s", args.dir)
            log.info("Available top-level directories:")
            for entry in entries:
                fe = entry.get("fileEntry", {})
                if fe.get("dir"):
                    log.info(" - %s", fe.get("name"))
            sys.exit(1)
    # Start the export
    exporter = YoudaoExporter(
        client=client,
        output_dir=args.output,
        skip_existing=not args.no_skip,
    )
    exporter.run(root_id)


if __name__ == "__main__":
    main()
```
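Usage
Basic usage

```bash
# Export all notes to the youdao_notes_export directory
python3 youdao_note_export.py -c cookies.json
```

Full options

```text
usage: python3 youdao_note_export.py [options]
options:
  -h, --help           Show the help message
  -c, --cookies PATH   Path to the cookies JSON file (default: cookies.json)
  -o, --output DIR     Output directory (default: youdao_notes_export)
  -d, --dir NAME       Export only the named top-level Youdao Note directory (default: all)
  --no-skip            Do not skip existing files; re-export everything
  -v, --verbose        Show verbose (debug) logs
```

Examples

```bash
# Export all notes
python3 youdao_note_export.py -c cookies.json -o ./my_notes
# Export only the directory named "My Notes"
python3 youdao_note_export.py -c cookies.json -o ./my_notes -d "My Notes"
# Force a full re-export (do not skip existing files)
python3 youdao_note_export.py --no-skip -v
# Specify the cookie file path and the output directory
python3 youdao_note_export.py -c ~/Desktop/cookies.json -o ~/Desktop/youdao_backup
```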
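Export result
The exported directory tree mirrors your Youdao Note structure; characters that are not allowed in file names are replaced with `_`, and every note is saved with a `.md` extension:

```text
youdao_notes_export/
├── My Notes/
│   ├── Study Notes/
│   │   ├── Training with MimoLLM.md
│   │   ├── RestAPI Developer Guide.md
│   │   └── LLM Algorithm Notes.md
│   ├── Work Log/
│   │   ├── Weekly Report 2024-01.md
│   │   └── Meeting Minutes.md
│   └── Quick Notes.md
├── Travel Guides/
│   ├── Thailand Independent Trip.md
│   └── Suzhou Xishan Road Trip.md
└── Tech Bookmarks/
    └── Ceph Distributed Storage Explained.md
```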
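The mapping from a remote folder chain to a local file can be sketched on its own. This is a simplified restatement, not the script's exact code: it replaces the characters `\ / : * ? " < > |` with `_` and always ends the name in `.md`, whereas the script only avoids doubling `.md` for notes whose suffix is `.md` or empty. `local_note_path` is a hypothetical name.

```python
import re
from pathlib import PurePosixPath
from typing import List

# Characters that are not allowed in file names on common file systems.
_ILLEGAL = re.compile(r'[\\/:*?"<>|]')


def _clean(name: str) -> str:
    return _ILLEGAL.sub("_", name).strip()


def local_note_path(remote_dirs: List[str], note_name: str) -> PurePosixPath:
    """Relative path of a note given the names of its enclosing folders."""
    file_name = _clean(note_name)
    if not file_name.endswith(".md"):
        file_name += ".md"
    return PurePosixPath(*(_clean(d) for d in remote_dirs), file_name)
```

Because `/` inside a remote name becomes `_`, a folder named `a/b` cannot accidentally create an extra nested directory.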
Sample run log

```text
14:30:01 [INFO] ✅ Cookies loaded, YNOTE_CSTK=a1b2c3d4...
14:30:01 [INFO] 📁 Root directory ID: Wf8d9a...
14:30:01 [INFO] 🚀 Exporting to: /home/richard/youdao_notes_export
14:30:01 [INFO] 📂 My Notes/
14:30:02 [INFO] 📂 My Notes/Study Notes/
14:30:02 [INFO] ✅ My Notes/Study Notes/Training with MimoLLM.md (3245 bytes)
14:30:03 [INFO] ✅ My Notes/Study Notes/RestAPI Developer Guide.md (1872 bytes)
14:30:04 [INFO] ✅ My Notes/Quick Notes.md (512 bytes)
14:30:04 [INFO] 📂 Travel Guides/
14:30:05 [INFO] ✅ Travel Guides/Thailand Independent Trip.md (4096 bytes)
14:30:05 [INFO] ==================================================
14:30:05 [INFO] 📊 Export finished!
14:30:05 [INFO] 📁 Directories: 3
14:30:05 [INFO] 📝 Notes: 5
14:30:05 [INFO] ⏭️ Skipped: 0
14:30:05 [INFO] ❌ Errors: 0
14:30:05 [INFO] 📍 Output path: /home/richard/youdao_notes_export
14:30:05 [INFO] ==================================================
```
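How the API works
The script relies on reverse-engineered endpoints of the Youdao Note web client and authenticates with your browser cookies against these three core endpoints:
| Endpoint | Method | Purpose |
|---|---|---|
| /yws/api/personal/file?method=getByPath | POST | Get the root directory ID |
| /yws/api/personal/file/{id}?method=listPageByParentId | GET | List the files and subdirectories in a directory |
| /yws/api/personal/sync?method=download | POST | Download a note's raw content |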
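As an illustration of how these endpoints are addressed, the sketch below rebuilds the list-directory URL from the query parameters hard-coded in the script's `API_LIST_DIR`. `list_dir_url` is a hypothetical helper; the endpoint is undocumented, so its parameters may change without notice.

```python
from urllib.parse import quote, urlencode

BASE = "https://note.youdao.com"


def list_dir_url(dir_id: str, cstk: str) -> str:
    """Build the GET URL that lists the entries of one directory."""
    params = {
        "all": "true",
        "f": "true",
        "len": 1000,  # page length, as set in the script
        "sort": 1,
        "isReverse": "false",
        "method": "listPageByParentId",
        "keyfrom": "web",
        "cstk": cstk,  # the YNOTE_CSTK value, repeated as a query parameter
    }
    return f"{BASE}/yws/api/personal/file/{quote(dir_id, safe='')}?{urlencode(params)}"
```

The two POST endpoints follow the same pattern and additionally carry `cstk` in the form body, as the script's `get_root_id` and `download_file` do.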
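The YNOTE_CSTK value doubles as a CSRF token: besides being sent as a cookie, it is passed as the cstk query parameter on every request and in the form body of the POST requests. Depending on the editor that created it, a note is returned in one of several formats:
- .md notes: Markdown text, saved as-is
- XML: produced by older editor versions; parsed with xml.etree.ElementTree and converted
- JSON: produced by newer editor versions; the JSON structure is parsed and each block type is mapped to Markdown
- HTML: converted with the markdownify library (must be installed separately)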
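The format is chosen by sniffing the content rather than trusting the file suffix alone. The function below restates the checks from `convert_to_markdown` in isolation so they can be exercised without network access; `detect_format` and its return labels are hypothetical.

```python
def detect_format(raw: bytes, suffix: str) -> str:
    """Return which converter the script's checks would pick for this content."""
    if not raw:
        return "empty"
    if suffix == ".md":
        return "markdown"  # saved without conversion
    if raw[:5] == b"<?xml" or b"<note " in raw[:200]:
        return "xml"  # XML prolog or <note> root near the start
    if raw.lstrip()[:1] == b"{":
        return "json"  # first non-whitespace byte opens a JSON object
    head = raw.decode("utf-8", errors="replace")[:500].lower()
    if "<html" in head or "<div" in head:
        return "html"  # needs markdownify
    return "text"  # written out unchanged
```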
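Conversion reference
| Youdao Note element | Markdown output |
|---|---|
| Heading (h1-h6) | `#` to `######` |
| Bold | `**text**` |
| Italic | `*text*` |
| Code block | ```lang ... ``` |
| Quote | `> text` |
| Ordered list | `1. text` |
| Unordered list | `- text` |
| To-do item | `- [ ] text` |
| Image | `![](url)` |
| Attachment | `[filename](url)` |
| Table | Markdown table syntax |
| Horizontal rule | `---` |
| Link | `[text](url)` |
Bold, italic, and inline links are converted only for JSON-format notes; the XML converter keeps paragraph text plain, and to-do items are recognized only in XML notes.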
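For JSON notes, inline marks are applied to each text run before the runs are joined. A stripped-down sketch, assuming the key layout the script's `JsonConverter` reads (`"5"` children, `"6"` block type, `"4"` block properties, `"7"` text runs, `"8"` run text, `"9"` run marks with the mark name under `"2"`). These keys come from the reverse-engineered format, and the function names are hypothetical.

```python
from typing import Any, Dict

Block = Dict[str, Any]


def runs_to_text(para: Block) -> str:
    """Join a paragraph's text runs, wrapping bold/italic runs in Markdown marks."""
    out = ""
    for run in para.get("7", []):
        text = run.get("8", "")
        for mark in run.get("9") or []:
            if mark.get("2") == "b":
                text = f"**{text}**"
            elif mark.get("2") == "i":
                text = f"*{text}*"
        out += text
    return out


def block_to_md(block: Block) -> str:
    """Headings ("6" == "h") get # marks; untyped blocks are plain paragraphs."""
    children = block.get("5") or []
    if block.get("6") == "h":
        level = int((block.get("4") or {}).get("l", "h1").lstrip("h"))
        return "#" * level + " " + (runs_to_text(children[0]) if children else "")
    return "".join(runs_to_text(c) for c in children)


def doc_to_md(doc: Block) -> str:
    """Convert the top-level blocks and separate them with blank lines."""
    parts = (block_to_md(b) for b in doc.get("5", []))
    return "\n\n".join(p for p in parts if p)
```

A heading block followed by a paragraph with one bold run converts to `## Title`, a blank line, then `plain **bold**`.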
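FAQ
Q: Error saying YNOTE_CSTK is missing
Check that cookies.json contains a YNOTE_CSTK entry. This cookie is the required CSRF token.
Q: Exported content is empty or garbled
Your cookies may have expired. Log in to Youdao Note again and re-export the cookies.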
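Q: Requests are rate-limited / HTTP 429 errors
The script already pauses 150 ms after each note download. If you are still rate-limited, increase the value in `time.sleep(0.15)` in the script.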
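The script itself does not retry: a 429 makes `raise_for_status()` raise, so the note is counted as an error, or, if it happens while listing a directory, the run stops. If a longer fixed delay is not enough, exponential backoff is one option. A minimal sketch, assuming only that the response object exposes `status_code` as `requests` responses do; `send_with_backoff` is hypothetical and would wrap calls such as `session.post`.

```python
import random
import time
from typing import Any, Callable


def send_with_backoff(
    send: Callable[..., Any],
    url: str,
    retries: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    **kwargs: Any,
) -> Any:
    """Call send(url, **kwargs); on HTTP 429, wait and retry with a doubling delay."""
    for attempt in range(retries + 1):
        resp = send(url, **kwargs)
        if resp.status_code != 429 or attempt == retries:
            return resp  # success, a non-429 error, or retries exhausted
        # 0.5 s, 1 s, 2 s, ... plus up to base_delay of random jitter
        sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

Inside `download_file`, for example, `self.session.post(url, data=data)` could become `send_with_backoff(self.session.post, url, data=data)`; `raise_for_status()` still handles whatever response comes back.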
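Q: How do I export only one directory?
Pass the directory name with `-d`:

```bash
python3 youdao_note_export.py -c cookies.json -d "My Notes"
```

Only top-level directories are matched. If the name is not found, the script lists all available top-level directories for reference.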
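To start from a nested folder instead, one could follow a slash-separated path one level at a time with the same listing call the script uses. A minimal sketch: `resolve_dir` is hypothetical, and it assumes each entry is shaped like the script's `fileEntry` dicts (`id`, `name`, `dir`).

```python
from typing import Callable, Dict, List


def resolve_dir(list_dir: Callable[[str], List[Dict]], root_id: str, path: str) -> str:
    """Follow a slash-separated folder path from root_id; return the last folder's ID."""
    current = root_id
    for segment in (s for s in path.split("/") if s):
        for entry in list_dir(current):
            fe = entry.get("fileEntry", {})
            if fe.get("dir") and fe.get("name") == segment:
                current = fe["id"]
                break
        else:  # no break: this segment does not exist at this level
            raise KeyError(f"directory not found: {segment}")
    return current
```

With the script's client this might be called as `exporter.run(resolve_dir(client.list_dir, client.get_root_id(), "My Notes/Work Log"))`. Folder names that themselves contain `/` cannot be addressed this way.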
Q: HTML notes are saved as raw HTML, with a warning about markdownify
Install the markdownify library:

```bash
pip install markdownify
```
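Acknowledgements
The API endpoints were reverse-engineered with reference to DeppWang/youdaonote-pull (MIT License); this script is a rewritten and extended version of that work.
License
MIT License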