SDAM ETL · 100-min Teaching Slides / 100 分钟教学幻灯

Teaching SDAM ETL

SDAM ETL 教学

A 100-minute lesson plan for the five-file SDAM ETL bundle.
面向五件 SDAM ETL 文件的 100 分钟课堂方案。
37 slides · bilingual EN / 中文 · arrow-key + swipe nav
37 张幻灯片 · 中英双语 · 方向键与手势导航
Use arrows · Esc for overview · swipe on touch
使用 方向键 · 触屏可滑动

What this deck is · An instructor's companion for the SDAM ETL Suite.

这个幻灯是什么 · SDAM ETL 套件的教师配套

A timed, slide-by-slide walk-through. Each slide tells you what to show on screen, what to say, what to click, and what to ask the class. You'll move between this deck (your notes) and the student materials (the four real editions plus the portal).

一份按时间表展开、逐页讲解的教学手册。每一页告诉你:要展示什么、要说什么、要点哪里、要让学生回答什么。讲课时你会在这份幻灯(你的备课笔记)与学生材料(四个真实版本加门户)之间切换。

Audience / 受众

Digital-humanities, classics, archaeology, or library-science students with mixed technical backgrounds. Equally usable in advanced undergrad or grad courses.

数字人文、古典学、考古学或图书馆学专业学生(技术背景不一)。本科高年级与研究生课程均可使用。

Language / 授课语言

All five student-facing files plus this deck are bilingual EN/中文. Toggle once via the top-right pill — the choice is sticky across all six files via localStorage.

五份学生材料 + 本备课幻灯都中英双语。在右上角切换一次,localStorage 会让选择在六份文件中保持一致。

Open these in tabs · Five files to teach with.

把这些先开成标签页 · 用以授课的五件文件

Pre-class prep: open all five in browser tabs so you can flip without delay during the lesson.

课前准备:把五份文件都开成浏览器标签页,上课时随手切换,无须等待加载。

# | File | Role in class
1 | sdam-portal.html | Portal · the chooser students see first / 门户页 · 学生最先看到
2 | sdam-visual-slideshow.html | 27-slide slideshow · the spine of Block 1+2 / 27 张幻灯片 · 第一与第二段的主线
3 | sdam-reference.html | Long-scroll reference · used in Block 4 / 长滚动参考 · 第四段使用
4 | sdam-paper-jdh2021.html | JDH 2021 paper walkthrough · used in Block 4 / JDH 2021 论文导读 · 第四段使用
5 | sdam-case-study-isic000470.html | ISic000470 case study · the whole of Block 3 / ISic000470 案例研究 · 整个第三段

By minute 100 · Five things students should be able to do.

100 分钟后 · 学生应能做到五件事。

01

Explain ETL (Extract → Transform → Load) in plain language and apply it to inscription data.

用通俗语言解释 ETL(提取→转换→加载),并应用于铭文数据。

02

Name the three core Latin pipelines (EDH_ETL, EDCS_ETL, LI_ETL) and describe how each handles its source.

说出三条核心拉丁文流水线(EDH_ETL、EDCS_ETL、LI_ETL)并描述各自如何处理来源。

03

Distinguish what gets retained from what gets discarded when an inscription becomes data — using ISic000470 as the worked example.

区分铭文数字化时哪些信息被保留、哪些被丢弃,以 ISic000470 为示例。

04

Read a Python extraction loop and an R cleaning regex chain, line by line.

逐行读懂一段 Python 提取循环和一段 R 清洗正则链。

05

Articulate the JDH 2021 paper's methodological argument: macro-history needs documented dataset construction.

阐述 JDH 2021 论文的方法论主张:宏观史需要被记录的"数据建构"。

10 minutes before class · Pre-class prep checklist.

课前 10 分钟 · 课前清单

  • Open sdam-portal.html once and click each of the four cards to pre-load the editions in tabs.
  • 先打开 sdam-portal.html,点四张卡片把四个版本预加载到标签页。
  • Pick a language: EN or 中文. Toggle once in the top-right of any file; the choice is sticky across all six files via localStorage.
  • 选定授课语言:EN 或中文。在任一文件右上角切换一次,localStorage 会让六份文件保持一致。
  • Locate the photograph of ISic000470 — it's at sdam_intro/ISic000470 cil x 7296 image.png next to the HTML files. Have it ready to display in the opening hook.
  • 找到 ISic000470 的照片:在 sdam_intro/ISic000470 cil x 7296 image.png,与 HTML 文件同目录。开场钩子时立即展示。
  • (Optional) Open a Jupyter / Colab tab if you plan to let a student volunteer run the short Python data load at minute 90.
  • (可选)若打算在第 90 分钟让学生现场跑 Python 加载数据,先打开一个 Jupyter / Colab 标签页。
  • Keep this deck in a separate window or projector tab so you can flip between teacher notes and student-facing screens.
  • 把本备课幻灯放在独立窗口或投影标签页,便于在教师笔记与学生界面之间切换。

100 minutes · The whole arc, time-budgeted.

一百分钟 · 完整时间表

Start | Length | Segment
00:00 | 5 min | Hook
05:00 | 20 min | Block 1 · ETL
25:00 | 20 min | Block 2 · Beyond
45:00 | 5 min | Stretch
50:00 | 25 min | Block 3 · Case
75:00 | 15 min | Block 4 · Code
90:00 | 10 min | Synth

The case study (Block 3) gets the most time — it's the most concrete and most memorable beat. Give it your most energetic teaching.

案例研究(第三段)分配的时间最多,它最具体、最易记。这段最值得你投入最饱满的精力。

Need 60 / 30 / 150 minutes instead? See compression notes near the end of this deck (slide 35).

如果只有 60 / 30 / 150 分钟?见本幻灯末尾(第 35 页)的压缩与扩展说明。

⏱ 00:00 – 05:00 · 5 MINUTES · OPENING HOOK / 5 分钟 · 开场钩子

The bilingual stonecutter.

双语 石匠

🪨
ISic000470 · Palermo · 1st–2nd c. CE
"Inscriptions are designed and carved here for sacred temples in connection with public works."
"此处为神圣庙宇与公共工程刻制铭文。"

Show the photograph of the bilingual stonecutter's plaque (full screen). Read the translation aloud. Now ask the class:

展示这块双语石匠铺招牌的照片(全屏)。把译文念出来。然后向全班提问:

"How would you turn this stone into data?"

"你会怎么把这块石头变成数据?"

Hook · the unfolding · What you'll hear, and the bridge.

钩子展开 · 你会听到什么,以及如何过渡。

SHOW

The full-screen photograph of ISic000470. Photograph file is at ISic000470 cil x 7296 image.png in this same folder.

全屏展示 ISic000470 的照片。图片文件位于本目录下 ISic000470 cil x 7296 image.png

ASK

"How would you turn this stone into data?" — let students answer for ~2 min. Don't direct them; just listen.

"你会怎么把这块石头变成数据?",给学生约 2 分钟自由回答。不要引导,只听。

SAY · GOOD ANSWERS GESTURE AT

Photograph it. Transcribe the text. Record where it's from. Note dimensions. Date it. Classify the type. (All real fields in modern databases.)

拍照、转写文本、记录出土地、记尺寸、定年、分类型,这些都是现代数据库的真实字段。

SAY · BAD ANSWERS

Lump everything into one column. (Tell the class: "Both kinds of answer are useful — one names what to keep, the other warns what's easy to lose.")

"把所有信息塞进一列"。告诉学生:"这两种回答都有用,前者列出该保留的,后者提醒容易丢掉的。"

Bridge to Block 1 (one sentence): "The hardest data work in ancient-world digital humanities is making the abstract good answers concrete and consistent across half a million inscriptions. That's what SDAM does. Today we'll see how — and where they still struggle."

过渡到第一段(一句话):"古代世界数字人文最难的工作,就是把这些抽象的好答案,一致而具体地落在五十万件铭文上。SDAM 做的就是这件事。今天我们看看他们怎么做,以及他们仍然力有不逮的地方。"

⏱ 05:00 – 25:00 · 20 MINUTES · BLOCK 1 / 20 分钟 · 第一段

What is ETL?

什么是 ETL

Mode: instructor walks through Visual slides 1–13. Students follow on their own laptops or watch on a projector.

模式:教师讲解视觉版第 1–13 张幻灯。学生用自己的电脑跟随,或集中观看投影。
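
For instructors who want something concrete behind the three letters before the slides begin, here is a minimal sketch of Extract, Transform, and Load on two toy records. Everything in it is an assumption for illustration: the records, the field names, and the cleaning rules are hypothetical and are not SDAM's actual pipeline code. It does mirror two points made in class: extraction copies without changing anything, and the cleaned value sits alongside the original rather than replacing it.

```python
import json

# Hypothetical raw records, standing in for rows fetched from a source database.
RAW_SOURCE = [
    {"id": "demo-1", "text": "D(is) M(anibus)", "findspot": " Roma "},
    {"id": "demo-2", "text": "vix(it) ann(os) XX", "findspot": "Ostia"},
]


def extract(source):
    """Extract: make a copy of everything, no changes yet."""
    return [dict(record) for record in source]


def transform(records):
    """Transform: add cleaned fields while keeping the originals alongside."""
    cleaned = []
    for record in records:
        row = dict(record)
        # Toy rule: expand abbreviations by dropping the parentheses.
        row["text_clean"] = record["text"].replace("(", "").replace(")", "")
        # Toy rule: trim stray whitespace around place names.
        row["findspot_clean"] = record["findspot"].strip()
        cleaned.append(row)
    return cleaned


def load(records):
    """Load: serialise to JSON text, ready to be published somewhere."""
    return json.dumps(records, ensure_ascii=False, indent=2)


published = load(transform(extract(RAW_SOURCE)))
print(published)
```

Each function maps onto one button on Visual slide 3, which makes it easy to point at while students click through E, T, and L.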

Block 1 · slide-by-slide · Visual slides 1–7: hook, ETL, pipelines, builder, extract.

第一段 · 逐页讲解 · 视觉版第 1–7 页:开场、ETL、流水线、搭建、提取。

# | Topic / 主题 | What to do / 怎么做
1 | From Stone to Data / 从石头到数据 | Just title; arrow-key advance. / 仅标题;按方向键继续。
2 | Why this exists / 缘起 | Stress the 600,000 figure. Click the dotted-underlined "inscriptions" gloss to demo the popover system. / 强调"60 万件"。点击带下划线的"inscriptions"演示术语弹窗。
3 | Three steps · E·T·L / 三个字母 · E·T·L | Click each E / T / L button to reveal the definition. Tell the class: "This whole class is just unpacking these three letters." / 点击 E、T、L 三个按钮逐一展开定义。告诉学生:"今天整堂课其实就是在拆这三个字母。"
4 | Three core pipelines / 三条核心流水线 | Pause here. Have students click each pipeline card. Establish: EDH and EDCS are different sources; LI merges them. / 在此暂停。让学生点开每张流水线卡。讲明:EDH 与 EDCS 是不同来源;LI 把两者合并。
5 | Build the pipeline / 搭建流水线 | 3-min activity: have a student call out the order. Click the blocks live. Each correct placement reveals a definition card; read each aloud. / 3 分钟互动:让学生说出顺序,现场点击方块。每个正确位置弹出说明卡,念出来。
6 | Extract intro / 提取概览 | "Extract = make a copy of everything, no changes yet." / "提取 = 把所有内容做一份副本,此时不做任何修改。"
7 | EDH "vending machine" / EDH "自动售货机" | Click "Make the request." Let the counter tick up to 81,883. Read aloud the info-panel that pops up afterward; it names the actual notebook (1_1_py_EXTRACTION…). / 点击"发起请求",让计数跳到 81,883。然后念出弹出的说明面板,它给出了真实笔记本名 (1_1_py_EXTRACTION…)。

Block 1 · slide-by-slide · Visual slides 8–13: scraper, transform, merge, load, journey.

第一段 · 逐页讲解 · 视觉版第 8–13 页:抓取、转换、合并、加载、旅程。

# | Topic / 主题 | What to do / 怎么做
8 | EDCS scraper / EDCS 抓取 | Click "Start the scraper." Watch the 18 provinces light up. Note the time difference: API 12 min vs scraper 4–5 hours. / 点击"启动抓取"。看 18 个行省依次亮起。强调时间差:API 12 分钟 vs 抓取 4–5 小时。
9 | Transform intro / 转换概览 | "Every cleaning rule is a scholarly choice, and SDAM keeps the original alongside the cleaned version." / "每条清洗规则都是一个学术抉择,SDAM 在保留清洗版的同时也保留原始版。"
10 | Cleaning chips / 清洗标签 | Click each chip in turn. The dirty value crosses out, the clean value appears in green. Each chip's info-panel names the actual SDAM notebook. / 依次点击每个标签。原值划掉,清洗后绿色显示。说明面板给出对应的 SDAM 笔记本名。
11 | Merge duplicates / 合并重复 | Click "Merge the duplicates": two records collapse into one. Info-panel describes ML harmonization of inscription-type taxonomies. / 点击"合并重复",两条记录折叠为一条。说明面板介绍机器学习如何统一类型分类。
12 | Load · two destinations / 加载 · 两个目的地 | Two cards: sciencedata.dk (working drive) vs Zenodo (permanent DOI archive). Click each for the popover. / 两张卡片:sciencedata.dk(工作存储)vs Zenodo(带 DOI 的永久存档)。点击每一张看弹窗。
13 | Full journey / 完整旅程 | Click "Play." A single inscription travels left → right through Extract / Transform / Load. Good moment to recap Block 1. / 点击"播放"。一条铭文从左到右穿越提取 / 转换 / 加载。好时机做第一段总结。
RECAP · ASK

"Can someone reproduce E-T-L in their own words?" — listen for: extract = grab, transform = clean, load = publish.

"谁能用自己的话复述 E-T-L?",期望听到:提取 = 抓取,转换 = 清洗,加载 = 发布。

⏱ 25:00 – 45:00 · 20 MINUTES · BLOCK 2 / 20 分钟 · 第二段

Beyond ETL: uncertainty, ecosystem, paper.

超越 ETL:不确定性、生态、论文。

Continuing in the Visual edition · slides 14–19. Each slide unpacks a "next move beyond the basics."

继续在视觉版 · 第 14–19 张幻灯。每页展开"在 ETL 之外"的进一步内容。

Block 2 · slide-by-slide · Visual slides 14–19.

第二段 · 逐页讲解 · 视觉版第 14–19 页

# | Topic / 主题 | Talking point / 讲解要点
14 | tempun · dating / tempun · 定年 | Click toggle: "Midpoint" → tall fake spike. "Monte Carlo" → smooth curve. Ask: "Which is more honest? Which is easier to make a graph from?" Click the gloss on tempun and Monte Carlo. / 点击切换:"中点定年"→ 一根虚高峰。"蒙特卡洛"→ 平滑曲线。提问:"哪种更诚实?哪种更容易画图?"点击 tempun 与蒙特卡洛的术语弹窗。
15 | Six corpora / 六个语料库 | Click each card. Make the point: "ETL works for Greek inscriptions, Greek texts, and Bulgarian burial mounds, not just Latin." / 点击每张卡片。强调:"ETL 不仅可用于拉丁文,还可用于希腊文铭文、希腊文献、保加利亚坟丘。"
16 | Analysis ecosystem / 分析生态 | Each card links to a real SDAM repo on GitHub. Click 1–2 (e.g. epigraphic_roads, social_diversity). Stress: "These are the research questions that the pipelines exist to support." / 每张卡片都链到 GitHub 上真实的 SDAM 仓库。点开 1–2 个(如 epigraphic_roads、social_diversity)。强调:"这些是流水线服务于的研究问题。"
17 | sddk + sdam packages / sddk + sdam 包 | Click each card. Notice: sddk is Python (PyPI), sdam is R (CRAN). Tee up Block 4: "We'll see the actual code soon." / 点击每张卡。注意:sddk 是 Python (PyPI),sdam 是 R (CRAN)。为第四段铺垫:"稍后我们会看真实代码。"
18 | JDH paper · 3 layers / JDH 论文 · 三层 | Click each layer card (Narrative / Hermeneutic / Data). Stress: "The journal itself is part of the argument." / 点击每张三层卡(叙事 / 诠释 / 数据)。强调:"这份期刊本身就是论文论点的一部分。"
19 | Seven figures gallery / 七图廊 | Click any tile; each links to the Paper Edition's deep dive. Don't enter the Paper Edition yet; just show the link works. / 点击任意一格,链接到论文版的深度讲解。但暂不进入论文版,只演示链接可用。
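
Slide 14's contrast between midpoint and Monte Carlo dating can be sketched in a few lines of Python. This is not tempun's API: the date ranges, the function names, and the choice of uniform random draws are all assumptions made for illustration. The point is only to show why midpoints pile up into a spike while repeated random draws spread each inscription across its whole date range.

```python
import random
from collections import Counter

# Hypothetical date ranges (not_before, not_after) in years CE.
DATE_RANGES = [(1, 200), (1, 200), (1, 200), (50, 150), (101, 300)]


def midpoint_years(ranges):
    """One year per inscription: the middle of its date range."""
    return [(start + end) // 2 for start, end in ranges]


def monte_carlo_years(ranges, draws, seed=0):
    """Many random years per inscription, drawn uniformly from its range."""
    rng = random.Random(seed)
    return [rng.randint(start, end) for start, end in ranges for _ in range(draws)]


def per_century(years):
    """Count years by century (1 = years 1-100, 2 = years 101-200, ...)."""
    return Counter((year - 1) // 100 + 1 for year in years)


midpoints = per_century(midpoint_years(DATE_RANGES))
simulated = per_century(monte_carlo_years(DATE_RANGES, draws=1000))

print("midpoint:   ", sorted(midpoints.items()))
print("monte carlo:", sorted(simulated.items()))
```

With these toy ranges, four of the five midpoints land in the same century, while the simulated years reach into every century that any range touches. That is the "spike versus smooth curve" contrast students see on the slide.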
PIVOT TO BLOCK 3 · ASK

"Notice we've gone from one stone, to 600,000 inscriptions, to a published paper. What got lost between the stone and the chart?" Hold the answer — Block 3 is about exactly that.

"注意,我们从一块石头,到 60 万件铭文,到一篇发表论文。从石头到图表的过程中丢了什么?"先不答,第三段就讲这件事。

⏱ 45:00 – 50:00 · 5 MINUTES · STRETCH / 5 分钟 · 休息

Stretch break.

休息 一下。

Five minutes off-screen. Encourage water, walking, talking.

五分钟离开屏幕。喝水、走动、聊天。

Suggestion: invite students to open the walkthrough portal on their phones to see the responsive layout — it works.

建议:让学生在手机上打开门户页,看看响应式布局,是能用的。

⏱ 50:00 – 75:00 · 25 MINUTES · BLOCK 3 / 25 分钟 · 第三段

One stone, five databases.

一块石头,五个数据库

Open sdam-case-study-isic000470.html. Walk the section sidebar top-to-bottom. This is the most concrete, most memorable block of the class — give it the most energy.

打开 sdam-case-study-isic000470.html。沿侧栏从上到下讲解。这是整堂课最具体、最易记忆的一段:投入最饱满的精力。

Case study · § 1 · The text · 5 minutes.

案例 · § 1 · 文本 · 5 分钟

SHOW

The stylized stele rendering with the Greek and Latin columns side by side. Open §1 ↗

展示双栏石碑的程式化渲染(希腊文+拉丁文)。打开 §1 ↗

DO

Have students hover over a Greek line — the parallel Latin lights up. Read the translation aloud.

让学生把鼠标悬停到希腊文某行,对应拉丁文会高亮。把译文念出来。

SAY

Note the archaic Latin spellings: heic (later hic), aidibus sacreis (later aedibus sacris), qum (later cum). The archaic forms are part of the dating evidence.

注意拉丁文的古拼写:heic(后期 hic)、aidibus sacreis(后期 aedibus sacris)、qum(后期 cum)。这些古形 是定年证据的一部分

ASK

"What is this stone for? What's the social context?" — let a student read the answer from the rendered text.

"这块石头有什么用?社会背景是什么?",让学生从展示文本中读出答案。

Case study · § 1.5 · the centerpiece · From paper to digital · 10 minutes.

案例 · § 1.5 · 全堂高峰 · 从纸到数字 · 10 分钟

Two side-by-side SVG facsimiles: CIL X 7296 (Mommsen 1883) and IG XIV 297 (Kaibel 1890). This is the centerpiece of the class. Take it slow.

两份并置的 SVG 摹本:CIL X 7296(Mommsen 1883)IG XIV 297(Kaibel 1890)。这是全课最关键的一段。慢慢讲。

  1. What both editors recorded: interpretive prose argument, provenance history ("formerly with the Jesuits"), per-editor judgment of the stonecutter (Mommsen: infantia; Kaibel: nec Graecus opinor nec Romanus).
     两位编者都记录了:论证性散文、流转史("曾在耶稣会")、对石匠的判断(Mommsen:infantia;Kaibel:nec Graecus opinor nec Romanus)。
  2. What they silently changed: the orange marker on line 6 shows QVM → CVM. The actual stone says QVM. Only I.Sicily's <choice> markup encodes both forms.
     他们悄悄改动的:第 6 行的橙色标记显示 QVM → CVM。原石上写的是 QVM。只有 I.Sicily 的 <choice> 同时编码两种形式。
  3. What Kaibel added that the digital era kept: the dashed-bordered line of normalized Greek (Στῆλαι ἐνθάδε…) is Kaibel's invention. It is the canonical PHI text today, four iterations away from the stone.
     Kaibel 添加而数字时代继承的:那行虚线框内的规范希腊文(Στῆλαι ἐνθάδε…)是 Kaibel 的发明。今天 PHI 把它当"原文"用,距离石头已有四次转录。
SAY · CLOSING LINE

Read aloud: "Stones lose nothing visually but require a viewer; print editions lose visual evidence but encode argument; digital databases lose argument but enable scale."

念出来:"石头不丢任何视觉信息,但需要观者;印本丢掉视觉证据,但保留论证;数字数据库丢掉论证,但获得规模。"

Case study · §§ 2 + 3 · Five views + comparison · 8 min.

案例 · §§ 2 + 3 · 五库视图 + 对比 · 8 分钟

§ 2 · 5 MIN · Five database views / 五个数据库视图

Click through the five database tiles. Each is a real outbound link to the actual record. Note the EDH absence: this single inscription, recorded in five other databases, is not in EDH at all, and is therefore also missing from the JDH paper's most rigorous filtered subset (EDCSx).

点击五个数据库卡片。每张都是通往真实记录的外链。注意 EDH 缺席,这条铭文出现在另外五个库中,但完全不在 EDH 里。因此也不在论文最严谨的 EDCSx 子集中。

§ 3 · 3 MIN · Comparison table / 对比表

Scroll. Highlight orange-tinted cells (the conflicts): inventory number (3574 vs 8822), width (24.5 vs 14.5 cm), date (four ranges), Latin line 12 (qum vs cum). Don't deep-dive each row — just establish the disagreement is everywhere.

滚动。指出橙色单元(冲突):馆藏号(3574 vs 8822)、宽度(24.5 vs 14.5 cm)、定年(四种)、拉丁文第 12 行(qum vs cum)。不要逐行深入:让学生看到分歧"处处皆是"即可。
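
The disagreements highlighted in the comparison table can be made concrete with a small sketch that tries to collapse two records into one flat row and reports every field where they differ. The conflicting values (3574 vs 8822, 24.5 vs 14.5 cm, qum vs cum) come from the table; the record labels, the field names, and the pairing of values to records are placeholders, since this deck does not say which database holds which value.

```python
# Hypothetical records: labels, field names, and the pairing of values to
# records are placeholders; only the conflicting values come from the table.
RECORDS = {
    "record_a": {"stone": "ISic000470", "inventory_no": "3574", "width_cm": 24.5, "latin_word": "qum"},
    "record_b": {"stone": "ISic000470", "inventory_no": "8822", "width_cm": 14.5, "latin_word": "cum"},
}


def merge_flat(records):
    """Try to build one flat row; report every field where sources disagree."""
    merged, conflicts = {}, {}
    fields = sorted({field for record in records.values() for field in record})
    for field in fields:
        values = {name: record[field] for name, record in records.items() if field in record}
        distinct = set(values.values())
        if len(distinct) == 1:
            merged[field] = distinct.pop()  # every source agrees
        else:
            conflicts[field] = values  # keep each reading with its source
    return merged, conflicts


merged, conflicts = merge_flat(RECORDS)
print("agreed:   ", merged)
print("conflicts:", conflicts)
```

Keeping the conflicts as a per-source mapping, rather than picking a winner, is one way to avoid flattening the disagreement out of the data.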

Case study · § 4 · The seven issues · 7 min, ~1 each.

案例 · § 4 · 七项问题 · 7 分钟(每项约 1 分钟)。

Skim, don't deep-dive. Stop on Issue 6 (text↔image anchoring) — it has the only interactive demo.

略览,不深入。在问题 6(文图锚定)停下,仅这一项有交互演示。

  1. ID proliferation: eleven names for the same stone.
     编号泛滥:同一块石头有十一个名字。
  2. Inventory mismatch: 3574 ≠ 8822 (same museum).
     馆藏号冲突:3574 ≠ 8822(同一博物馆)。
  3. Date disagreement: four databases, four ranges.
     定年分歧:四个库给四种范围。
  4. Text variants: same DB transcribes "qum" vs "cum."
     文本异读:同一库转写为 qum vs cum。
  5. Image asymmetry: tiled TIF (I.Sicily) to nothing (PHI).
     图像不对称:从分块 TIF(I.Sicily)到完全无图(PHI)。
  6. Text↔image anchoring: stop here, run the SVG zone demo.
     文图锚定:在此停下,演示 SVG 分区交互。
  7. The EDH gap: this stone falls outside the rigorous subset.
     EDH 缺席:此石不在论文的严谨子集内。
DISCUSSION (3 MIN)

"If you were building a corpus right now, which of these seven would worry you the least? The most? Why?"

"如果你现在要建一个语料库,这七项里哪一项最让你担心?哪一项最不担心?为什么?"

Case study · § 5 · The merge simulator · 3 min.

案例 · § 5 · 合并模拟 · 3 分钟。

DO

Click "Run the merge". Six conflict lines cascade in over ~4 seconds. Read each as it appears.

点击 "运行合并"。六条冲突信息在约 4 秒内依次出现。逐条念出。

SAY · CLOSING LINE

"This inscription cannot be merged into a single flat row without losing scholarly content from four of the five records."

"这条铭文无法被压扁为一行,否则就要丢掉另外四条记录中的学术内容。"

ASK · TO BRIDGE BLOCK 4

"What should the editor's interpretive judgment look like in a digital edition — structured enough to query, but not flattened to a label?"

"在数字版中,编者的诠释性判断应当长什么样,既要结构化可查询,又不能被压扁成单一标签?"

⏱ 75:00 – 90:00 · 15 MINUTES · BLOCK 4 / 15 分钟 · 第四段

Code & method.

代码与 方法

Open sdam-reference.html § 4.5. Then close on the JDH Paper Edition's three Stance cards.

打开 sdam-reference.html §4.5。最后回到论文版的三张"立场"卡。

Block 4 · Jupyter + R · What are these, in plain language? · 6 min.

第四段 · Jupyter + R · 用大白话说,这俩是啥? · 6 分钟。

JUPYTER · 3 MIN / JUPYTER · 3 分钟

Read the four properties out loud:

把四条属性念出来:

  1. The narrative is part of the artifact.
     叙述是产物的一部分。
  2. Outputs are first-class.
     输出是一等对象。
  3. Notebooks are diffable (file is JSON).
     笔记本可做差异比对(文件是 JSON)。
  4. Naming convention: 1_0_py_… / 1_4_r_….
     命名规则:1_0_py_… / 1_4_r_…
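
If a student asks what "diffable because the file is JSON" means in practice, a minimal sketch such as the following may help. It builds two tiny notebook-like documents that differ in one line of code and compares them with Python's standard difflib. The cell contents are invented for the demo, and the dictionaries follow the general shape of a version 4 notebook file rather than any real SDAM notebook.

```python
import difflib
import json


def notebook(source_lines):
    """Build a minimal notebook-like document with one code cell."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": source_lines,
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


# Two versions of the same (hypothetical) notebook, one line of code apart.
before = json.dumps(notebook(["time.sleep(0.1)\n"]), indent=1).splitlines()
after = json.dumps(notebook(["time.sleep(0.2)\n"]), indent=1).splitlines()

diff = list(difflib.unified_diff(before, after, "before.ipynb", "after.ipynb", lineterm=""))
print("\n".join(diff))
```

Because the whole notebook is plain JSON text, the change shows up as one removed line and one added line, the same way a change to any source file would.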
R · 3 MIN / R · 3 分钟

One sentence: "A language built for statistics and data wrangling since 1993." Show the chained |> regex calls in 1_5_r_TEXT_INSCRIPTION_CLEANING.Rmd. Each call is a scholarly decision.

一句话:"1993 年起为统计与数据整理而生的语言。"展示 1_5_r_TEXT_INSCRIPTION_CLEANING.Rmd 中链式 |> 正则调用。每次调用都是 一次学术抉择

SHOW

Click the link to the actual EDH 1_1 notebook on GitHub — show what one looks like in the wild.

点击进入 EDH 1_1 真实 GitHub 笔记本,让学生看真实场景中的样貌。

Block 4 · reading code · Walking the extraction loop · 5 min.

第四段 · 读代码 · 逐行讲提取循环 · 5 分钟。

In Reference §4.5 there's a real ~25-line Python loop. Don't read every character — just point at three things:

参考版 §4.5 中有一段约 25 行的真实 Python 循环。不必逐字念,只点三处:

  1. The while True loop: this is the entire ETL "Extract" stage. It runs ~410 times for EDH (81,883 records ÷ 200 per page).
     while True 循环:它就是整个 ETL "提取"阶段。对 EDH 而言约执行 410 次(81,883 条 ÷ 每页 200)。
  2. time.sleep(0.2): a courtesy. Pauses 200 ms between calls so the EDH server doesn't see this as a denial-of-service attack.
     time.sleep(0.2):一种礼让。每次调用之间暂停 200 毫秒,避免 EDH 服务器把抓取误判为拒绝服务攻击。
  3. ensure_ascii=False, indent=2: ensure_ascii=False is what keeps Greek characters readable in the JSON output; indent=2 only pretty-prints it. Without ensure_ascii=False, στῆλαι is written as \u03c3\u03c4\u1fc6\u03bb\u03b1\u03b9.
     ensure_ascii=False, indent=2:ensure_ascii=False 让希腊字符在 JSON 输出中可读;indent=2 只负责缩进排版。否则 στῆλαι 会被写成 \u03c3\u03c4\u1fc6\u03bb\u03b1\u03b9。
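
The three points above can be tried without touching the EDH server. The sketch below is not the loop from Reference §4.5: fake_fetch is a hypothetical stand-in for the real HTTP request, and the 450-record total is invented so the loop finishes in three pages. It keeps the same shape, though: a while True loop over pages of 200, a pause between calls, and JSON output with and without ensure_ascii=False.

```python
import json
import math
import time

PAGE_SIZE = 200


def fake_fetch(offset, limit, total=450):
    """Stand-in for one HTTP call to a paginated API (hypothetical, no network)."""
    stop = min(offset + limit, total)
    return [{"id": i, "text": "στῆλαι"} for i in range(offset, stop)]


def extract_all(fetch, pause=0.2):
    """Page through the source until an empty page comes back."""
    records, offset = [], 0
    while True:
        page = fetch(offset, PAGE_SIZE)
        if not page:
            break  # nothing left to fetch
        records.extend(page)
        offset += PAGE_SIZE
        time.sleep(pause)  # courtesy pause between requests
    return records


records = extract_all(fake_fetch, pause=0)  # no pause needed for the fake source
print(len(records))

# Pages needed for 81,883 records at 200 per page.
print(math.ceil(81_883 / PAGE_SIZE))

word = "στῆλαι"
print(json.dumps(word))                      # escaped, ASCII-only output
print(json.dumps(word, ensure_ascii=False))  # readable Greek
```

Passing pause=0 only makes the demo instant; against a real server the 0.2-second default is the polite choice.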

Block 4 · cleaning regex · The qum vs cum decision, in code · 3 min.

第四段 · 清洗正则 · 代码层面的 qum vs cum 抉择 · 3 分钟。

Show the regex example: input D(is) [M(anibus)] / Iuliae [- - -] / vix(it) ann(os) XX.

展示正则例子:输入 D(is) [M(anibus)] / Iuliae [- - -] / vix(it) ann(os) XX

INTERPRETIVE

"Dis Manibus Iuliae vixit annos XX"

"献给 Iulia 的诸神之灵,她活了 20 岁"

Restorations preserved.

保留修复部分。

CONSERVATIVE

"Dis Iuliae vixit annos XX"

"献给 Iulia 的神(缺)……活了 20 岁"

"Manibus" was a restoration → dropped.

"Manibus" 是修复 → 丢弃。

KEY INSIGHT

The same regex with one character of difference ([^\]]* vs nothing inside the brackets) flips between the two readings. SDAM ships both columns. EDCS ships neither — only its own one transcription, with no record of which form it chose.

同一个正则只差一个字符(方括号内是 [^\]]* 还是空)就在两种读法之间切换。SDAM 同时发布两列;EDCS 都不发布,只给一种转写,且不说明选了哪种。
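
For instructors who prefer to demonstrate the two readings live, here is a minimal Python sketch that reproduces both outputs from the example input. It is a stand-in, not SDAM's cleaning chain (which is written in R), and the function name and the exact patterns are assumptions. The only difference between the two readings is whether the square brackets are removed on their own or together with everything inside them.

```python
import re

RAW = "D(is) [M(anibus)] / Iuliae [- - -] / vix(it) ann(os) XX"


def clean(text, keep_restorations):
    """Return the interpretive (True) or conservative (False) reading."""
    # Lacunae such as [- - -] contain no letters, so drop them in both readings.
    text = re.sub(r"\[[-\s]*\]", "", text)
    if keep_restorations:
        # Interpretive: remove only the bracket characters, keep what is inside.
        text = re.sub(r"[\[\]]", "", text)
    else:
        # Conservative: remove the brackets together with their contents.
        text = re.sub(r"\[[^\]]*\]", "", text)
    # Expand abbreviations by dropping the parentheses: D(is) -> Dis.
    text = re.sub(r"[()]", "", text)
    # Line-break markers become spaces, then runs of whitespace collapse.
    text = text.replace("/", " ")
    return re.sub(r"\s+", " ", text).strip()


print(clean(RAW, keep_restorations=True))   # Dis Manibus Iuliae vixit annos XX
print(clean(RAW, keep_restorations=False))  # Dis Iuliae vixit annos XX
```

Publishing both outputs as separate columns, as the slide says SDAM does, leaves the choice between them to the researcher instead of burying it in the pipeline.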

Block 4 · close · The methodological argument, in three stances · 1 min.

第四段 · 收尾 · 方法论的三个立场 · 1 分钟。

Switch to the Paper Edition § 7. Read the three Stance cards aloud:

切换到论文版 §7。把三张"立场"卡逐张念出:

STANCE 1

Reproducibility

可复现

Every figure must be re-derivable from the same code and data.

每张图都必须能由同一份代码与数据重新生成。

STANCE 2

Honest uncertainty

诚实的不确定性

Dating uncertainty must be propagated, not hidden.

日期不确定性必须传播,不能掩盖。

STANCE 3

Bias is content

偏差也是内容

Disagreement between EDH and EDCS is not noise to be averaged out — it's scholarly evidence.

EDH 与 EDCS 之间的分歧不是要平均掉的噪声,而是学术证据。

⏱ 90:00 – 95:00 · 5 MINUTES · HANDS-ON / 5 分钟 · 动手

Two lines of Python · 81,883 inscriptions in memory.

两行 Python · 把 81,883 条铭文加载到内存。

Have a student volunteer paste two lines into a notebook (or a Python REPL on the projector):

让一位志愿者学生把两行代码粘到笔记本(或投影上的 Python REPL):

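import pandas as pd

# Read the cleaned EDH dataset straight from its public sciencedata.dk URL.
EDH = pd.read_json(
    "https://sciencedata.dk/public/b6b6afdb969d378b70929e86e58ad975/EDH_text_cleaned_2022_11_03.json"
)
# Print (rows, columns): one row per inscription.
print(EDH.shape)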

If it works, you'll see (81883, ~70) — 81,883 inscriptions × ~70 columns. The dataset that drives the JDH paper is now in memory on a student's laptop. Under fluorescent classroom light, this is a small dramatic moment. Pause for it.

如果成功,会看到 (81883, ~70),81,883 条铭文 × 约 70 列。驱动 JDH 论文的数据集,现在就在学生的笔记本电脑里。在教室灯光下,这是个小小的戏剧时刻。停一下,让它发生。

If the network is down, show a screenshot — but the URL is genuinely public and stable.

如果网络故障,展示截图,但这个 URL 是真实公开且稳定的。

95:00 – 100:00 · Closing question · pick one.

95:00 – 100:00 · 结课提问 · 三选一。

Pose one of the following and let students answer briefly. Aim for ~5 min. Don't moderate too tightly — let the conversation thread.

从下列三题中选一题,让学生简短回答。目标约 5 分钟。不要管得太紧,让讨论自然延展。

QUESTION 1

"What's a research question you'd ask of this dataset that would have been impossible without ETL?"

"你会向这个数据集提一个什么样的研究问题,是没有 ETL 就根本问不出来的?"

QUESTION 2

"If you could add one field to the SDAM data model that no current database has, what would it be?"

"如果让你给 SDAM 数据模型新增一个字段(任何现有数据库都没有的),你会加什么?"

QUESTION 3

"Where should the editor's interpretive judgment live in a digital edition — and how can it be made structured enough to query without losing its argumentative character?"

"在数字版中,编者的诠释性判断应放在哪里,怎么做才能既结构化可查询,又不丢掉它的论证特质?"

After class · Take-home assignments · pick one.

课后 · 课后作业 · 三选一。

Option A

Walk a Greek inscription through the same five databases

把一条希腊文铭文带过同样的五个数据库

Pick a Greek inscription in I.Sicily; trace it across the database layer. Write a one-page note on which of the seven issues recur and which don't.

在 I.Sicily 选一条希腊文铭文,追踪它在各数据库中的样貌。写一页笔记:七项问题中哪几项重现、哪几项不重现。

Option B

Reproduce JDH Fig 1 with two different dating methods

用两种不同定年方法复现 JDH 图 1

Run the JDH companion notebook end-to-end. Reproduce the epigraphic-habit curve using midpoint dating, then re-run with tempun's Monte Carlo. Compare the two.

把 JDH 配套笔记本完整跑一遍。先用中点定年复现"铭文习俗"曲线,再用 tempun 蒙特卡洛跑一遍,对比两条曲线。

Option C

Critique the SDAM data model

批评 SDAM 的数据模型

Which of the case study's seven issues do you think the model could close, and how? Which are irreducibly about which questions get asked rather than which fields exist?

案例的七项问题中,你认为哪些可以被数据模型关闭,怎么做?哪些其实关乎 提什么问题,而非 有什么字段

Other lengths · If you have less or more time.

其他时长 · 若你只有更少或更多时间

60 MINUTES / 60 分钟

Drop Block 4 entirely. Shorten Block 3 to 15 min by skipping § 4's seven-issue walkthrough. Skip the stretch.

整段去掉第四段。第三段缩到 15 分钟,跳过 §4 七项问题的逐项讲解。取消休息。

30 MINUTES / 30 分钟

Hook + Visual slides 1–13 only. The case study and the paper become take-home material. Recap the JDH paper's argument in one sentence at the end.

仅做开场钩子 + 视觉版第 1–13 页。案例研究与论文留作课后材料。结尾用一句话概括 JDH 论文的论点。

150 MINUTES / 150 分钟

Add a hands-on lab where every student loads the EDH JSON in Python or R and produces a single chart of inscription types over time.

增加一段动手实验:让每个学生用 Python 或 R 加载 EDH JSON,画一张"铭文类型随时间变化"的图。

Bookmark this · Visual edition · slide-ID quick reference.

建议收藏 · 视觉版幻灯锚点速查

Each slide has a stable hash anchor; you can deep-link from a syllabus, browser tab, or quiz question.

每张幻灯都有稳定的哈希锚点;可在教学大纲、浏览器标签、测验题中深链。

Anchor | Slide / 幻灯
#slide-title | Title page / 标题
#slide-why | Why ETL? / 缘起
#slide-etl | E·T·L explained / E·T·L 释义
#slide-pipelines | Three core pipelines / 三流水线
#slide-builder | Click-to-build / 搭建互动
#slide-extract | Extract intro / 提取概览
#slide-extract-edh | EDH vending machine / EDH 售货机
#slide-extract-edcs | EDCS scraper / EDCS 抓取
#slide-transform | Transform intro / 转换概览
#slide-clean | Cleaning interactive / 清洗互动
#slide-merge | Dedup / harmonize / 去重统一
#slide-load | Load destinations / 加载目的地
#slide-flow | Full journey / 完整旅程
#slide-tempun | Probabilistic dating / 概率定年
#slide-corpora | Greek + mounds / 希腊+坟丘
#slide-analysis | Analysis ecosystem / 分析生态
#slide-packages | sddk + sdam / 辅助包
#slide-article | JDH 3-layer format / JDH 三层式
#slide-figures | Seven figures gallery / 七图画廊
#slide-onestone | Case study intro / 案例引入
#slide-eleven-ids | 11-IDs problem / 十一编号
#slide-print-digital | CIL/IG print → digital / CIL/IG 纸到数字
#slide-jupyter-r | Jupyter & R explainer / Jupyter 与 R
#slide-anchor | Text↔image SVG / 文图锚定
#slide-collision | Merge collision sim / 合并冲突
#slide-try | No-code data load / 免代码加载
#slide-recap | Wrap & resources / 总结与资源

After-class resources · Where to dig deeper.

课后资源 · 进一步深入

If you want to… / 如果你想…… | Open … / 打开 …
browse all 37 SDAM repos by theme / 按主题浏览全部 37 个 SDAM 仓库 | Reference §2 (sortable interactive table) / 参考版 §2(可排序交互表)
understand the JDH paper section by section / 逐节理解 JDH 论文 | Paper Edition / 论文版
see the actual TEI XML behind I.Sicily / 看 I.Sicily 背后的 TEI XML | github.com/ISicily/ISicily
read about epigraphic habit in the original / 原典读"铭文习俗" | MacMullen 1982 (JSTOR)
play with tempun Monte Carlo dating / 把玩 tempun 蒙特卡洛定年 | github.com/sdam-au/tempun_demo
see the JDH article itself / 原文阅读 JDH 文章 | doi.org/10.1515/jdh-2021-1004
⏱ 100:00 · CLASS COMPLETE / 完成

Now go teach.

现在去 上课 吧。

Open the five student-facing files in tabs. Pick a language. Start the timer. Welcome the class. Begin with the photograph.

把五份学生材料打开为标签页。选定语言。计时开始。欢迎学生进教室。从照片开始。

100 minutes · five files · one stone / 100 分钟 · 五件文件 · 一块石头

Companion markdown: sdam_teaching_guide_100min.md in this folder.

配套 Markdown:本目录下 sdam_teaching_guide_100min.md