A timed, slide-by-slide walk-through. Each slide tells you what to show on screen, what to say, what to click, and what to ask the class. You'll move between this deck (your notes) and the student materials (the four real editions plus the portal).
一份按时间表展开、逐页讲解的教学手册。每一页告诉你:要展示什么、要说什么、要点哪里、要让学生回答什么。讲课时你会在这份幻灯(你的备课笔记)与学生材料(四个真实版本加门户)之间切换。
Digital-humanities, classics, archaeology, or library-science students with mixed technical backgrounds. Equally usable in advanced undergrad or grad courses.
数字人文、古典学、考古学或图书馆学专业学生(技术背景不一)。本科高年级与研究生课程均可使用。
All five student-facing files plus this deck are bilingual EN/中文. Toggle once via the top-right pill — the choice is sticky across all six files via localStorage.
五份学生材料 + 本备课幻灯都中英双语。在右上角切换一次,localStorage 会让选择在六份文件中保持一致。
Pre-class prep: open all five in browser tabs so you can flip without delay during the lesson.
课前准备:把五份文件都开成浏览器标签页,上课时随手切换,无须等待加载。
| # | File | Role in class | 课堂角色 |
|---|---|---|---|
| 1 | sdam-portal.html | Portal · the chooser students see first | 门户页 · 学生最先看到 |
| 2 | sdam-visual-slideshow.html | 27-slide slideshow · the spine of Block 1+2 | 27 张幻灯片 · 第一与第二段的主线 |
| 3 | sdam-reference.html | Long-scroll reference · used in Block 4 | 长滚动参考 · 第四段使用 |
| 4 | sdam-paper-jdh2021.html | JDH 2021 paper walkthrough · used in Block 4 | JDH 2021 论文导读 · 第四段使用 |
| 5 | sdam-case-study-isic000470.html | ISic000470 case study · the whole of Block 3 | ISic000470 案例研究 · 整个第三段 |
Language toggle: localStorage keeps the choice consistent across all six files.
语言切换:localStorage 会让六份文件保持一致。
Photograph: sdam_intro/ISic000470 cil x 7296 image.png, next to the HTML files. Have it ready to display in the opening hook.
照片:sdam_intro/ISic000470 cil x 7296 image.png,与 HTML 文件同目录。开场钩子时立即展示。
The case study (Block 3) gets the most time: it is the most concrete and most memorable beat. Give it your most energetic teaching.
案例研究(第三段)分配的时间最多,它最具体、最易记。这段最值得你投入最饱满的精力。
Need 60 / 30 / 150 minutes instead? See the compression and extension notes near the end of this deck (slide 35).
如果只有 60 / 30 / 150 分钟?见本幻灯末尾(第 35 页)的压缩与扩展说明。
Show the photograph of the stonecutter's bilingual shop sign (full screen). Read the translation aloud. Now ask the class:
展示这块双语石匠铺招牌的照片(全屏)。把译文念出来。然后向全班提问:
"How would you turn this stone into data?"
"你会怎么把这块石头变成数据?"
The full-screen photograph of ISic000470. The image file is ISic000470 cil x 7296 image.png, in this same folder.
全屏展示 ISic000470 的照片。图片文件位于本目录下 ISic000470 cil x 7296 image.png。
"How would you turn this stone into data?" — let students answer for ~2 min. Don't direct them; just listen.
"你会怎么把这块石头变成数据?",给学生约 2 分钟自由回答。不要引导,只听。
Photograph it. Transcribe the text. Record where it's from. Note dimensions. Date it. Classify the type. (All real fields in modern databases.)
拍照、转写文本、记录出土地、记尺寸、定年、分类型,这些都是现代数据库的真实字段。
Lump everything into one column. (Tell the class: "Both kinds of answer are useful — one names what to keep, the other warns what's easy to lose.")
"把所有信息塞进一列"。告诉学生:"这两种回答都有用,前者列出该保留的,后者提醒容易丢掉的。"
Bridge to Block 1 (one sentence): "The hardest data work in ancient-world digital humanities is making the abstract good answers concrete and consistent across half a million inscriptions. That's what SDAM does. Today we'll see how — and where they still struggle."
过渡到第一段(一句话):"古代世界数字人文最难的工作,就是把这些抽象的好答案,一致而具体地落在五十万件铭文上。SDAM 做的就是这件事。今天我们看看他们怎么做,以及他们仍然力有不逮的地方。"
Mode: instructor walks through Visual slides 1–13. Students follow on their own laptops or watch on a projector.
模式:教师讲解视觉版第 1–13 张幻灯。学生用自己的电脑跟随,或集中观看投影。
| # | Topic / 主题 | What to do | 怎么做 |
|---|---|---|---|
| 1 | From Stone to Data / 从石头到数据 | Title only; advance with the arrow key. | 仅标题;按方向键继续。 |
| 2 | Why this exists / 缘起 | Stress the 600,000 figure. Click the dotted-underlined "inscriptions" gloss to demo the popover system. | 强调"60 万件"。点击带下划线的"inscriptions"演示术语弹窗。 |
| 3 | Three steps · E·T·L / 三个字母 · E·T·L | Click each E / T / L button to reveal the definition. Tell the class: "This whole class is just unpacking these three letters." | 点击 E、T、L 三个按钮逐一展开定义。告诉学生:"今天整堂课其实就是在拆这三个字母。" |
| 4 | Three core pipelines / 三条核心流水线 | Pause here. Have students click each pipeline card. Establish: EDH and EDCS are different sources; LI merges them. | 在此暂停。让学生点开每张流水线卡。讲明:EDH 与 EDCS 是不同来源;LI 把两者合并。 |
| 5 | Build the pipeline / 搭建流水线 | 3-min activity: have a student call out the order. Click the blocks live. Each correct placement reveals a definition card; read each one aloud. | 3 分钟互动:让学生说出顺序,现场点击方块。每个正确位置弹出说明卡,念出来。 |
| 6 | Extract intro / 提取概览 | "Extract = make a copy of everything, no changes yet." | "提取 = 把所有内容做一份副本,此时不做任何修改。" |
| 7 | EDH "vending machine" / EDH "自动售货机" | Click "Make the request." Let the counter tick up to 81,883. Read aloud the info-panel that pops up afterward, which names the actual notebook (1_1_py_EXTRACTION…). | 点击"发起请求",让计数跳到 81,883。然后念出弹出的说明面板,它给出了真实笔记本名 (1_1_py_EXTRACTION…)。 |
| 8 | EDCS scraper / EDCS 抓取 | Click "Start the scraper." Watch the 18 provinces light up. Note the time difference: API 12 min vs scraper 4–5 hours. | 点击"启动抓取"。看 18 个行省依次亮起。强调时间差:API 12 分钟 vs 抓取 4–5 小时。 |
| 9 | Transform intro / 转换概览 | "Every cleaning rule is a scholarly choice, and SDAM keeps the original alongside the cleaned version." | "每条清洗规则都是一个学术抉择,SDAM 在保留清洗版的同时也保留原始版。" |
| 10 | Cleaning chips / 清洗标签 | Click each chip in turn. The dirty value crosses out, the clean value appears in green. Each chip's info-panel names the actual SDAM notebook. | 依次点击每个标签。原值划掉,清洗后绿色显示。说明面板给出对应的 SDAM 笔记本名。 |
| 11 | Merge duplicates / 合并重复 | Click "Merge the duplicates": two records collapse into one. The info-panel describes ML harmonization of inscription-type taxonomies. | 点击"合并重复",两条记录折叠为一条。说明面板介绍机器学习如何统一类型分类。 |
| 12 | Load · two destinations / 加载 · 两个目的地 | Two cards: sciencedata.dk (working drive) vs Zenodo (permanent DOI archive). Click each for the popover. | 两张卡片:sciencedata.dk(工作存储)vs Zenodo(带 DOI 的永久存档)。点击每一张看弹窗。 |
| 13 | Full journey / 完整旅程 | Click "Play." A single inscription travels left → right through Extract / Transform / Load. Good moment to recap Block 1. | 点击"播放"。一条铭文从左到右穿越提取 / 转换 / 加载。是做第一段总结的好时机。 |
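For the Block 1 recap, a toy version of the three stages can make "grab, clean, publish" concrete. The sketch below is not SDAM code: the record, the `text_clean` column name, and the single whitespace rule are all made up for illustration. It only shows the pattern of copying first, cleaning into a new column while keeping the original, and serializing at the end.

```python
import json
import re

def extract(source):
    """Extract: copy every record as-is, with no changes yet."""
    return [dict(record) for record in source]

def transform(records):
    """Transform: add a cleaned column and keep the original next to it."""
    cleaned = []
    for record in records:
        row = dict(record)
        # One hypothetical cleaning rule: collapse runs of whitespace.
        row["text_clean"] = re.sub(r"\s+", " ", row["text"]).strip()
        cleaned.append(row)
    return cleaned

def load(records):
    """Load: serialize for publication; ensure_ascii=False keeps non-Latin text readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)

# Made-up record, not taken from any database.
source = [{"id": "demo-1", "text": "  D M   Iuliae  "}]
published = load(transform(extract(source)))
print(published)
```

The original `text` value survives untouched beside `text_clean`, which is the point to stress before moving on.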
"Can someone reproduce E-T-L in their own words?" — listen for: extract = grab, transform = clean, load = publish.
"谁能用自己的话复述 E-T-L?",期望听到:提取 = 抓取,转换 = 清洗,加载 = 发布。
Continuing in the Visual edition · slides 14–19. Each slide unpacks a "next move beyond the basics."
继续在视觉版 · 第 14–19 张幻灯。每页展开"在 ETL 之外"的进一步内容。
| # | Topic / 主题 | Talking point | 讲解要点 |
|---|---|---|---|
| 14 | tempun · dating / tempun · 定年 | Click toggle: "Midpoint" → tall fake spike. "Monte Carlo" → smooth curve. Ask: "Which is more honest? Which is easier to make a graph from?" Click the gloss on tempun and Monte Carlo. | 点击切换:"中点定年"→ 一根虚高峰。"蒙特卡洛"→ 平滑曲线。提问:"哪种更诚实?哪种更容易画图?"点击 tempun 与蒙特卡洛的术语弹窗。 |
| 15 | Six corpora / 六个语料库 | Click each card. Make the point: "ETL works for Greek inscriptions, Greek texts, and Bulgarian burial mounds, not just Latin." | 点击每张卡片。强调:"ETL 不仅可用于拉丁文,还可用于希腊文铭文、希腊文献、保加利亚坟丘。" |
| 16 | Analysis ecosystem / 分析生态 | Each card links to a real SDAM repo on GitHub. Click 1–2 (e.g. epigraphic_roads, social_diversity). Stress: "These are the research questions that the pipelines exist to support." | 每张卡片都链到 GitHub 上真实的 SDAM 仓库。点开 1–2 个(如 epigraphic_roads、social_diversity)。强调:"这些是流水线服务于的研究问题。" |
| 17 | sddk + sdam packages / sddk + sdam 包 | Click each card. Notice: sddk is Python (PyPI), sdam is R (CRAN). Tee up Block 4: "We'll see the actual code soon." | 点击每张卡。注意:sddk 是 Python (PyPI),sdam 是 R (CRAN)。为第四段铺垫:"稍后我们会看真实代码。" |
| 18 | JDH paper · 3 layers / JDH 论文 · 三层 | Click each layer card (Narrative / Hermeneutic / Data). Stress: "The journal itself is part of the argument." | 点击每张三层卡(叙事 / 诠释 / 数据)。强调:"这份期刊本身就是论文论点的一部分。" |
| 19 | Seven figures gallery / 七图廊 | Click any tile; it links to the Paper Edition's deep dive. Don't enter the Paper Edition yet; just show the link works. | 点击任意一格,链接到论文版的深度讲解。但暂不进入论文版,只演示链接可用。 |
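If a student asks why the midpoint view on slide 14 produces a spike, a small simulation helps. The sketch below is an illustration only and does not use tempun's API: the date ranges are invented, the 50-year bins are arbitrary, and drawing one uniformly random year per range is an assumption about how the uncertainty is modelled.

```python
import random
from collections import Counter

# Hypothetical (not_before, not_after) ranges in years CE; not real records.
ranges = [(1, 100), (1, 100), (1, 100), (101, 200), (50, 150)]

def midpoint_counts(ranges, bin_size=50):
    """Put each inscription in the bin that contains the midpoint of its range."""
    counts = Counter()
    for start, end in ranges:
        mid = (start + end) / 2
        counts[int(mid // bin_size) * bin_size] += 1
    return counts

def monte_carlo_counts(ranges, runs=1000, bin_size=50, seed=0):
    """Average bin counts over many runs, drawing one uniform random year per range."""
    rng = random.Random(seed)  # fixed seed so the demo is repeatable
    totals = Counter()
    for _ in range(runs):
        for start, end in ranges:
            year = rng.uniform(start, end)
            totals[int(year // bin_size) * bin_size] += 1
    return {bin_start: n / runs for bin_start, n in sorted(totals.items())}

print("midpoint:   ", dict(sorted(midpoint_counts(ranges).items())))
print("monte carlo:", monte_carlo_counts(ranges))
```

Three inscriptions dated only to "first century" all land in one midpoint bin, while the simulated counts spread them across the bins their ranges actually cover. Both views hold the same five inscriptions.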
"Notice we've gone from one stone, to 600,000 inscriptions, to a published paper. What got lost between the stone and the chart?" Hold the answer — Block 3 is about exactly that.
"注意,我们从一块石头,到 60 万件铭文,到一篇发表论文。从石头到图表的过程中丢了什么?"先不答,第三段就讲这件事。
Five minutes off-screen. Encourage water, walking, talking.
五分钟离开屏幕。喝水、走动、聊天。
Suggestion: invite students to open the walkthrough portal on their phones to see the responsive layout — it works.
建议:让学生在手机上打开门户页,看看响应式布局,是能用的。
Open sdam-case-study-isic000470.html. Walk the section sidebar top-to-bottom. This is the most concrete, most memorable block of the class — give it the most energy.
打开 sdam-case-study-isic000470.html。沿侧栏从上到下讲解。这是整堂课最具体、最易记忆的一段:投入最饱满的精力。
The stylized stele rendering with the Greek and Latin columns side by side. Open §1 ↗
展示双栏石碑的程式化渲染(希腊文+拉丁文)。打开 §1 ↗
Have students hover over a Greek line — the parallel Latin lights up. Read the translation aloud.
让学生把鼠标悬停到希腊文某行,对应拉丁文会高亮。把译文念出来。
Note the archaic Latin spellings: heic (later hic), aidibus sacreis (later aedibus sacris), qum (later cum). The archaic forms are part of the dating evidence.
注意拉丁文的古拼写:heic(后期 hic)、aidibus sacreis(后期 aedibus sacris)、qum(后期 cum)。这些古形 是定年证据的一部分。
"What is this stone for? What's the social context?" — let a student read the answer from the rendered text.
"这块石头有什么用?社会背景是什么?",让学生从展示文本中读出答案。
Two side-by-side SVG facsimiles: CIL X 7296 (Mommsen 1883) and IG XIV 297 (Kaibel 1890). This is the centerpiece of the class. Take it slow.
两份并置的 SVG 摹本:CIL X 7296(Mommsen 1883)与 IG XIV 297(Kaibel 1890)。这是全课最关键的一段。慢慢讲。
QVM → CVM. The actual stone says QVM. Only I.Sicily's <choice> markup encodes both forms.
QVM → CVM。原石上写的是 QVM。只有 I.Sicily 的 <choice> 同时编码两种形式。
Read aloud: "Stones lose nothing visually but require a viewer; print editions lose visual evidence but encode argument; digital databases lose argument but enable scale."
念出来:"石头不丢任何视觉信息,但需要观者;印本丢掉视觉证据,但保留论证;数字数据库丢掉论证,但获得规模。"
Click through the five database tiles. Each is a real outbound link to the actual record. Note the EDH absence: this single inscription, recorded in five other databases, is not in EDH at all. It is therefore not in the JDH paper's most rigorous filtered subset (EDCSx) either.
点击五个数据库卡片。每张都是通往真实记录的外链。注意 EDH 缺席,这条铭文出现在另外五个库中,但完全不在 EDH 里。因此也不在论文最严谨的 EDCSx 子集中。
Scroll. Highlight orange-tinted cells (the conflicts): inventory number (3574 vs 8822), width (24.5 vs 14.5 cm), date (four ranges), Latin line 12 (qum vs cum). Don't deep-dive each row — just establish the disagreement is everywhere.
滚动。指出橙色单元(冲突):馆藏号(3574 vs 8822)、宽度(24.5 vs 14.5 cm)、定年(四种)、拉丁文第 12 行(qum vs cum)。不要逐行深入:让学生看到分歧"处处皆是"即可。
Skim, don't deep-dive. Stop on Issue 6 (text↔image anchoring) — it has the only interactive demo.
略览,不深入。在问题 6(文图锚定)停下,仅这一项有交互演示。
"If you were building a corpus right now, which of these seven would worry you the least? The most? Why?"
"如果你现在要建一个语料库,这七项里哪一项最让你担心?哪一项最不担心?为什么?"
Click "Run the merge". Six conflict lines cascade in over ~4 seconds. Read each as it appears.
点击 "运行合并"。六条冲突信息在约 4 秒内依次出现。逐条念出。
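For students who want to see what a conflict-aware merge could look like in code, here is a minimal sketch. It is not the simulator's implementation. The conflicting values echo the ones highlighted in the comparison table (inventory number, width, line 12), but which database holds which value is not asserted: `source_a`, `source_b`, `source_c` and the `languages` field are placeholders for illustration.

```python
from collections import defaultdict

# Placeholder records: values echo the conflicts named in these notes,
# but the assignment of values to sources is hypothetical.
records = [
    {"source": "source_a", "languages": "Greek + Latin",
     "inventory_no": "3574", "width_cm": 24.5, "line_12": "qum"},
    {"source": "source_b", "languages": "Greek + Latin",
     "inventory_no": "8822", "width_cm": 14.5, "line_12": "cum"},
    {"source": "source_c", "languages": "Greek + Latin",
     "inventory_no": "3574", "width_cm": 24.5, "line_12": "cum"},
]

def merge_keeping_conflicts(records):
    """Return (merged, conflicts): agreed fields flatten, disputed ones keep every reading."""
    readings = defaultdict(lambda: defaultdict(list))
    for record in records:
        for field, value in record.items():
            if field != "source":
                readings[field][value].append(record["source"])
    merged, conflicts = {}, {}
    for field, by_value in readings.items():
        if len(by_value) == 1:
            merged[field] = next(iter(by_value))  # every source agrees
        else:
            conflicts[field] = dict(by_value)     # keep each reading with its sources
    return merged, conflicts

merged, conflicts = merge_keeping_conflicts(records)
print("agreed:  ", merged)
for field, by_value in conflicts.items():
    print("conflict:", field, by_value)
```

Only the field every source agrees on can be flattened; forcing the rest into a single row would mean silently picking a winner for each.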
"This inscription cannot be merged into a single flat row without losing scholarly content from four of the five records."
"这条铭文无法被压扁为一行,否则就要丢掉另外四条记录中的学术内容。"
"What should the editor's interpretive judgment look like in a digital edition — structured enough to query, but not flattened to a label?"
"在数字版中,编者的诠释性判断应当长什么样,既要结构化可查询,又不能被压扁成单一标签?"
Open sdam-reference.html § 4.5. Then close on the JDH Paper Edition's three Stance cards.
打开 sdam-reference.html §4.5。最后回到论文版的三张"立场"卡。
Read the four properties out loud:
把四条属性念出来:
1_0_py_… / 1_4_r_….
1_0_py_… / 1_4_r_…。
One sentence: "A language built for statistics and data wrangling since 1993." Show the chained |> regex calls in 1_5_r_TEXT_INSCRIPTION_CLEANING.Rmd. Each call is a scholarly decision.
一句话:"1993 年起为统计与数据整理而生的语言。"展示 1_5_r_TEXT_INSCRIPTION_CLEANING.Rmd 中链式 |> 正则调用。每次调用都是 一次学术抉择。
Click the link to the actual EDH 1_1 notebook on GitHub — show what one looks like in the wild.
点击进入 EDH 1_1 真实 GitHub 笔记本,让学生看真实场景中的样貌。
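Before or after showing the real notebook, a toy paginated extraction loop can be run offline on the projector. The sketch below is not SDAM's code and does not call the EDH API: `fetch_page`, `fake_fetch` and `FAKE_DB` are hypothetical stand-ins for an HTTP request, while the page size of 200 and the 0.2-second pause mirror the figures quoted in these notes.

```python
import json
import time

def extract_all(fetch_page, page_size=200, pause=0.2):
    """Request pages until an empty one comes back, pausing between calls."""
    records, offset = [], 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            break  # an empty page means there is nothing left to fetch
        records.extend(page)
        offset += page_size
        time.sleep(pause)  # courtesy pause so the server is not hammered
    return records

# Hypothetical stand-in for the remote service: 450 made-up records.
FAKE_DB = [{"id": i, "text": "στῆλαι"} for i in range(450)]

def fake_fetch(offset, limit):
    """Serve one slice of the in-memory list, like one page of an API."""
    return FAKE_DB[offset:offset + limit]

records = extract_all(fake_fetch, pause=0)  # pause=0 only to keep the demo instant
readable = json.dumps(records[:1], ensure_ascii=False)
escaped = json.dumps(records[:1])  # default ensure_ascii=True escapes non-ASCII
print(len(records))
print(readable)
print(escaped)
```

Printing `readable` next to `escaped` shows the same record twice: once with the Greek intact, once as backslash-u escape sequences.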
In Reference §4.5 there's a real ~25-line Python loop. Don't read every character — just point at three things:
参考版 §4.5 中有一段约 25 行的真实 Python 循环。不必逐字念,只点三处:
while True loop: this is the entire ETL "Extract" stage. It runs ~410 times for EDH (81,883 records ÷ 200 per page).
while True 循环:它就是整个 ETL "提取"阶段。对 EDH 而言约执行 410 次(81,883 条 ÷ 每页 200)。
time.sleep(0.2): a courtesy. It pauses 200 ms between calls so the EDH server doesn't see this as a denial-of-service attack.
time.sleep(0.2):一种礼让。每次调用之间暂停 200 毫秒,避免 EDH 服务器把抓取误判为拒绝服务攻击。
ensure_ascii=False, indent=2: ensure_ascii=False is what keeps Greek characters readable in the JSON output (indent=2 only pretty-prints). Without it, στῆλαι is written as a run of \uXXXX escape sequences (\u03c3\u03c4…).
ensure_ascii=False, indent=2:ensure_ascii=False 让希腊字符在 JSON 输出中可读(indent=2 只负责缩进排版)。否则 στῆλαι 会被写成一串 \uXXXX 转义序列(如 \u03c3\u03c4…)。
Show the regex example: input D(is) [M(anibus)] / Iuliae [- - -] / vix(it) ann(os) XX.
展示正则例子:输入 D(is) [M(anibus)] / Iuliae [- - -] / vix(it) ann(os) XX。
"Dis Manibus Iuliae vixit annos XX"
"献给 Iulia 的诸神之灵,她活了 20 岁"
Restorations preserved.
保留修复部分。
"Dis Iuliae vixit annos XX"
"献给 Iulia 的神(缺)……活了 20 岁"
"Manibus" was a restoration → dropped.
"Manibus" 是修复 → 丢弃。
The same regex with one character of difference ([^\]]* vs nothing inside the brackets) flips between the two readings. SDAM ships both columns. EDCS ships neither — only its own one transcription, with no record of which form it chose.
同一个正则只差一个字符(方括号内是 [^\]]* 还是空)就在两种读法之间切换。SDAM 同时发布两列;EDCS 都不发布,只给一种转写,且不说明选了哪种。
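The two readings can be reproduced with a few lines of Python. This is a sketch written for these notes, not the regex chain from SDAM's cleaning notebooks: the function name `clean` and the `keep_restorations` flag are made up, and the handling of lacunae and line breaks is an assumption chosen so that the output matches the two readings above.

```python
import re

def clean(text, keep_restorations):
    """Produce one of the two readings from an epigraphic transcription."""
    # Lacunae such as [- - -] carry no letters; drop them in both readings.
    text = re.sub(r"\[[\s\-]*\]", "", text)
    # Restorations in [ ]: keep the bracket contents, or drop them entirely.
    text = re.sub(r"\[([^\]]*)\]", r"\1" if keep_restorations else "", text)
    # Abbreviation expansions in ( ): keep the letters, remove the parentheses.
    text = re.sub(r"[()]", "", text)
    # Line-break slashes and runs of whitespace collapse to single spaces.
    text = re.sub(r"[/\s]+", " ", text)
    return text.strip()

source = "D(is) [M(anibus)] / Iuliae [- - -] / vix(it) ann(os) XX"
print(clean(source, keep_restorations=True))   # Dis Manibus Iuliae vixit annos XX
print(clean(source, keep_restorations=False))  # Dis Iuliae vixit annos XX
```

The only difference between the two calls is the replacement string in the second substitution, which is where the scholarly decision sits.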
Switch to the Paper Edition § 7. Read the three Stance cards aloud:
切换到论文版 §7。把三张"立场"卡逐张念出:
Every figure must be re-derivable from the same code and data.
每张图都必须能由同一份代码与数据重新生成。
Dating uncertainty must be propagated, not hidden.
日期不确定性必须传播,不能掩盖。
Disagreement between EDH and EDCS is not noise to be averaged out — it's scholarly evidence.
EDH 与 EDCS 之间的分歧不是要平均掉的噪声,而是学术证据。
Have a student volunteer paste this short snippet into a notebook (or a Python REPL on the projector):
让一位志愿者学生把这段短代码粘到笔记本(或投影上的 Python REPL):
    import pandas as pd

    # Cleaned EDH dataset, read straight from the public sciencedata.dk share
    EDH = pd.read_json(
        "https://sciencedata.dk/public/b6b6afdb969d378b70929e86e58ad975/EDH_text_cleaned_2022_11_03.json"
    )
    print(EDH.shape)  # (rows, columns)
If it works, you'll see (81883, ~70) — 81,883 inscriptions × ~70 columns. The dataset that drives the JDH paper is now in memory on a student's laptop. Under fluorescent classroom light, this is a small dramatic moment. Pause for it.
如果成功,会看到 (81883, ~70),81,883 条铭文 × 约 70 列。驱动 JDH 论文的数据集,现在就在学生的笔记本电脑里。在教室灯光下,这是个小小的戏剧时刻。停一下,让它发生。
If the network is down, show a screenshot — but the URL is genuinely public and stable.
如果网络故障,展示截图,但这个 URL 是真实公开且稳定的。
Pose one of the following and let students answer briefly. Aim for ~5 min. Don't moderate too tightly — let the conversation thread.
从下列三题中选一题,让学生简短回答。目标约 5 分钟。不要管得太紧,让讨论自然延展。
"What's a research question you'd ask of this dataset that would have been impossible without ETL?"
"你会向这个数据集提一个什么样的研究问题,是没有 ETL 就根本问不出来的?"
"If you could add one field to the SDAM data model that no current database has, what would it be?"
"如果让你给 SDAM 数据模型新增一个字段(任何现有数据库都没有的),你会加什么?"
"Where should the editor's interpretive judgment live in a digital edition — and how can it be made structured enough to query without losing its argumentative character?"
"在数字版中,编者的诠释性判断应放在哪里,怎么做才能既结构化可查询,又不丢掉它的论证特质?"
Pick a Greek inscription in I.Sicily; trace it across the database layer. Write a one-page note on which of the seven issues recur and which don't.
在 I.Sicily 选一条希腊文铭文,追踪它在各数据库中的样貌。写一页笔记:七项问题中哪几项重现、哪几项不重现。
Run the JDH companion notebook end-to-end. Reproduce the epigraphic-habit curve using midpoint dating, then re-run with tempun's Monte Carlo. Compare the two.
把 JDH 配套笔记本完整跑一遍。先用中点定年复现"铭文习俗"曲线,再用 tempun 蒙特卡洛跑一遍,对比两条曲线。
Which of the case study's seven issues do you think the model could close, and how? Which are irreducibly about which questions get asked rather than which fields exist?
案例的七项问题中,你认为哪些可以被数据模型关闭,怎么做?哪些其实关乎 提什么问题,而非 有什么字段?
60 minutes: Drop Block 4 entirely. Shorten Block 3 to 15 min by skipping §4's seven-issue walkthrough. Skip the stretch break.
60 分钟:整段去掉第四段。第三段缩到 15 分钟,跳过 §4 七项问题的逐项讲解。取消休息。
30 minutes: Hook + Visual slides 1–13 only. The case study and the paper become take-home material. Recap the JDH paper's argument in one sentence at the end.
30 分钟:仅做开场钩子 + 视觉版第 1–13 页。案例研究与论文留作课后材料。结尾用一句话概括 JDH 论文的论点。
150 minutes: Add a hands-on lab where every student loads the EDH JSON in Python or R and produces a single chart of inscription types over time.
150 分钟:增加一段动手实验:让每个学生用 Python 或 R 加载 EDH JSON,画一张"铭文类型随时间变化"的图。
Each slide has a stable hash anchor; you can deep-link from a syllabus, browser tab, or quiz question.
每张幻灯都有稳定的哈希锚点;可在教学大纲、浏览器标签、测验题中深链。
| Anchor | Slide | 幻灯片 |
|---|---|---|
| #slide-title | Title page | 标题 |
| #slide-why | Why ETL? | 缘起 |
| #slide-etl | E·T·L explained | E·T·L 释义 |
| #slide-pipelines | Three core pipelines | 三流水线 |
| #slide-builder | Click-to-build | 搭建互动 |
| #slide-extract | Extract intro | 提取概览 |
| #slide-extract-edh | EDH vending machine | EDH 售货机 |
| #slide-extract-edcs | EDCS scraper | EDCS 抓取 |
| #slide-transform | Transform intro | 转换概览 |
| #slide-clean | Cleaning interactive | 清洗互动 |
| #slide-merge | Dedup / harmonize | 去重统一 |
| #slide-load | Load destinations | 加载目的地 |
| #slide-flow | Full journey | 完整旅程 |
| #slide-tempun | Probabilistic dating | 概率定年 |
| #slide-corpora | Greek + mounds | 希腊+坟丘 |
| #slide-analysis | Analysis ecosystem | 分析生态 |
| #slide-packages | sddk + sdam | 辅助包 |
| #slide-article | JDH 3-layer format | JDH 三层式 |
| #slide-figures | Seven figures gallery | 七图画廊 |
| #slide-onestone | Case study intro | 案例引入 |
| #slide-eleven-ids | 11-IDs problem | 十一编号 |
| #slide-print-digital | CIL/IG print → digital | CIL/IG 纸到数字 |
| #slide-jupyter-r | Jupyter & R explainer | Jupyter 与 R |
| #slide-anchor | Text↔image SVG | 文图锚定 |
| #slide-collision | Merge collision sim | 合并冲突 |
| #slide-try | No-code data load | 免代码加载 |
| #slide-recap | Wrap & resources | 总结与资源 |

| If you want to… | Open … | 如果你想…… | 打开 … |
|---|---|---|---|
| browse all 37 SDAM repos by theme | Reference §2 — sortable interactive table | 按主题浏览全部 37 个 SDAM 仓库 | 参考版 §2 — 可排序交互表 |
| understand the JDH paper section by section | Paper Edition | 逐节理解 JDH 论文 | 论文版 |
| see the actual TEI XML behind I.Sicily | github.com/ISicily/ISicily | 看 I.Sicily 背后的 TEI XML | github.com/ISicily/ISicily |
| read about epigraphic habit in the original | MacMullen 1982 (JSTOR) | 原典读"铭文习俗" | MacMullen 1982 (JSTOR) |
| play with tempun Monte Carlo dating | github.com/sdam-au/tempun_demo | 把玩 tempun 蒙特卡洛定年 | github.com/sdam-au/tempun_demo |
| see the JDH article itself | doi.org/10.1515/jdh-2021-1004 | 原文阅读 JDH 文章 | doi.org/10.1515/jdh-2021-1004 |
Open the five student-facing files in tabs. Pick a language. Start the timer. Welcome the class. Begin with the photograph.
把五份学生材料打开为标签页。选定语言。计时开始。欢迎学生进教室。从照片开始。
Companion markdown: sdam_teaching_guide_100min.md in this folder.
配套 Markdown:本目录下 sdam_teaching_guide_100min.md。