Prosodic is a Python library and web app for metrical-phonological analysis of poetry. It parses text into a linguistic hierarchy (text → stanza → line → word → syllable → phoneme), runs a constraint-based metrical parser, and classifies meter types (iambic, trochaic, anapestic, dactylic), foot/syllable schemes, and named rhyme schemes (sonnet variants, couplet, ballad, etc.).
Try the hosted version at prosodic.app — paste a poem, see scansions, rhyme schemes, and form classification immediately. This notebook walks through the full Python API — from parsing a single line up to poem-level form classification. Click the Open in Colab badge above to run it in your browser.
Built by Ryan Heuser, Josh Falk, and Arto Anttila, with contributions from Sam Bowman.
pip install prosodic
# or for development:
pip install git+https://github.com/quadrismegistus/prosodic
You'll also need espeak (a free text-to-speech engine) to phonemize words not in the CMU pronouncing dictionary:
- Mac: brew install espeak
- Linux: apt-get install espeak libespeak1 libespeak-dev
- Windows: download an installer from the espeak-ng releases page
Skip this cell when running locally. It installs system + Python deps in a Colab runtime.
# Auto-install dependencies if running in Google Colab.
# Locally this is a no-op.
import sys
IN_COLAB = "google.colab" in sys.modules
if IN_COLAB:
    import subprocess
    subprocess.run(
        ["apt-get", "-qq", "install", "-y",
         "espeak", "libespeak1", "libespeak-dev"],
        check=True,
    )
    subprocess.run(["pip", "install", "-q", "prosodic"], check=True)
    print("Colab setup complete.")
else:
    print("Local environment — skipping Colab setup.")
Local environment — skipping Colab setup.
A complete tour of Prosodic in five lines.
import prosodic
sonnet = prosodic.Text("""When in the chronicle of wasted time
I see descriptions of the fairest wights,
And beauty making beautiful old rhyme
In praise of ladies dead and lovely knights,
Then, in the blazon of sweet beauty's best,
Of hand, of foot, of lip, of eye, of brow,
I see their antique pen would have express'd
Even such a beauty as you master now.
So all their praises are but prophecies
Of this our time, all you prefiguring;
And, for they look'd but with divining eyes,
They had not skill enough your worth to sing:
For we, which now behold these present days,
Had eyes to wonder, but lack tongues to praise.""")
sonnet.parse()
print(sonnet.summary())
  #st    #ln  parse        rhyme    #feet    #syll    #parse
----- ----- ----------- ------- ------- ------- --------
1 1 -+-+-+-+-+ a 5 10 2
1 2 -+-+-+-+-+ b 5 10 1
1 3 -+-+-+-+-+ a 5 10 3
1 4 -+-+-+-+-+ b 5 10 1
1 5 -+-+-+-+-+ - 5 10 8
1 6 -+-+-+-+-+ c 5 10 1
1 7 -+--++-+-+ - 4 10 8
1 8 +-+-+-+-+-+ c 6 11 2
1 9 -+-+-+-+-- - 4 10 3
1 10 -+-+-+-+-- d 4 10 6
1 11 -+-+-+-+-+ e 5 10 2
1 12 -+-+-+-+-+ d 5 10 2
1 13 -+-+-+-+-+ e 5 10 2
1 14 -+-+-+-+-+ e 5 10 3
estimated schema
----------
meter: Iambic
feet: Pentameter
syllables: 10
rhyme: Sonnet A (abab cdcd eefeff)
You can build a Text from a string, a file, or just a single line.
# from a string
short = prosodic.Text("A horse, a horse, my kingdom for a horse!")
# from a file (local path or URL)
shaksonnets = prosodic.Text(fn='https://raw.githubusercontent.com/quadrismegistus/prosodic/refs/heads/master/corpora/corppoetry_en/en.shakespeare.txt')
# a single line via .line1
line = prosodic.Text("Shall I compare thee to a summer's day?").line1
print(f"short: {len(short.lines)} line(s)")
print(f"sonnets: {len(shaksonnets.lines):,} lines, {len(shaksonnets.stanzas):,} stanzas")
print(f"single line: {line}")
short: 1 line(s)
sonnets: 2,155 lines, 154 stanzas
single line: Line(num=1, txt="Shall I compare thee to a summer's day?")
Prosodic organizes text into a tree of linguistic entities. Children are constructed lazily on first access — the underlying source of truth is a per-syllable DataFrame.
# tree access
print(f"sonnet has {len(sonnet.stanzas)} stanzas, {len(sonnet.lines)} lines")
print(f"line 1 has {len(sonnet.lines[0].wordtokens)} word tokens")
print(f"first word: {sonnet.lines[0].wordtokens[0]}")
sonnet has 1 stanzas, 14 lines
line 1 has 7 word tokens
first word: WordToken(num=1, txt='When', lang='en', para_num=1, line_num=1, sent_num=1, sentpart_num=1, linepart_num=1)
# attribute shortcut: text.line1 == text.lines[0]
sonnet.line1
Line
| wordtoken_is_punc | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| stanza_num | line_num | linepart_num | sent_num | sentpart_num | wordtoken_num | wordtoken_txt | wordtype_txt | wordform_num | wordform_ipa_origin | syll_num | syll_txt | syll_ipa | |
| 1 | 1 | 1 | 1 | 1 | 1 | When | When | 1 | dict | 1 | When | wɛn | 0 |
| 2 | dict | 1 | When | 'wɛn | 0 | ||||||||
| 2 | in | in | 1 | dict | 1 | in | ɪn | 0 | |||||
| 2 | dict | 1 | in | 'ɪn | 0 | ||||||||
| 3 | the | the | 1 | dict | 1 | the | ðə | 0 | |||||
| ... | ... | ... | ... | ... | ... | ... | ... | ... | |||||
| 4 | chronicle | chronicle | 1 | dict | 3 | cle | kəl | 0 | |||||
| 5 | of | of | 1 | dict | 1 | of | ʌv | 0 | |||||
| 6 | wasted | wasted | 1 | dict | 1 | was | 'weɪ | 0 | |||||
| 2 | ted | stəd | 0 | ||||||||||
| 7 | time | time | 1 | dict | 1 | time | 'taɪm | 0 |
12 rows × 1 columns
# wordform → syllable → phoneme
wordform = sonnet.line1.wordtokens[1].wordform
print(f"wordform: {wordform}")
for syll in wordform.syllables:
    print(f"  syllable: {syll}, IPA={syll.ipa!r}, stressed={syll.is_stressed}, heavy={syll.is_heavy}")
    for phon in syll.phonemes:
        print(f"    phon: {phon.txt!r}")
wordform: WordForm(num=1, txt='in', force_ambig_stress=True, ipa_origin='dict')
syllable: Syllable(num=1, txt='in', ipa='ɪn'), IPA='ɪn', stressed=False, heavy=True
phon: 'ɪ'
phon: 'n'
The whole text is also accessible as a flat per-syllable DataFrame. This is the source of truth — entities are constructed from it on demand.
# .df is the syllable-level DataFrame
sonnet.df.head(8)
| word_num | line_num | para_num | sent_num | sentpart_num | linepart_num | word_txt | is_punc | form_idx | num_forms | syll_idx | syll_ipa | syll_text | is_stressed | is_heavy | is_strong | is_weak | is_functionword | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | When | 0 | 0 | 2 | 0 | wɛn | When | False | True | False | False | True |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | When | 0 | 1 | 2 | 0 | 'wɛn | When | True | True | False | False | False |
| 2 | 2 | 1 | 1 | 1 | 1 | 1 | in | 0 | 0 | 2 | 0 | ɪn | in | False | True | False | False | True |
| 3 | 2 | 1 | 1 | 1 | 1 | 1 | in | 0 | 1 | 2 | 0 | 'ɪn | in | True | True | False | False | False |
| 4 | 3 | 1 | 1 | 1 | 1 | 1 | the | 0 | 0 | 1 | 0 | ðə | the | False | False | False | False | True |
| 5 | 4 | 1 | 1 | 1 | 1 | 1 | chronicle | 0 | 0 | 1 | 0 | 'krɑ | chro | True | False | True | False | False |
| 6 | 4 | 1 | 1 | 1 | 1 | 1 | chronicle | 0 | 0 | 1 | 1 | nɪ | ni | False | False | False | True | False |
| 7 | 4 | 1 | 1 | 1 | 1 | 1 | chronicle | 0 | 0 | 1 | 2 | kəl | cle | False | True | False | False | False |
# columns
list(sonnet.df.columns)
['word_num',
'line_num',
'para_num',
'sent_num',
'sentpart_num',
'linepart_num',
'word_txt',
'is_punc',
'form_idx',
'num_forms',
'syll_idx',
'syll_ipa',
'syll_text',
'is_stressed',
'is_heavy',
'is_strong',
'is_weak',
'is_functionword']
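Because these columns are plain pandas data, ordinary DataFrame operations work directly on them — for instance, the stress rate per line. A minimal sketch (built on a hand-made toy frame with the same column names, rather than an actual parsed text, so it runs standalone):

```python
import pandas as pd

# Toy stand-in for a few of the syllable-level columns shown above;
# in practice you would operate on sonnet.df itself.
df = pd.DataFrame({
    "line_num":    [1, 1, 1, 1, 2, 2],
    "syll_text":   ["When", "in", "the", "chro", "I", "see"],
    "is_stressed": [False, False, False, True, False, True],
})

# Fraction of stressed syllables per line
stress_rate = df.groupby("line_num")["is_stressed"].mean()
print(stress_rate)
```

The same groupby pattern extends to any of the boolean columns (`is_heavy`, `is_functionword`, etc.).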
text.parse() runs an exhaustive vectorized parser: it evaluates every possible scansion against a configurable set of metrical constraints (numpy on CPU, torch on GPU when available), then uses harmonic bounding to identify optimal parses. Constraints include w_peak (no peak in weak position), w_stress (no stress in weak), s_unstress (no unstress in strong), unres_within/unres_across (no unresolved disyllables), foot_size. See prosodic/parsing/constraints.py for the full list.
# parse a single line
line = prosodic.Text("Shall I compare thee to a summer's day?").line1
line.parse()
print(line.best_parse)
Parse(txt="shall I com PARE thee TO a SUM mer's DAY")
# inspect the parse
bp = line.best_parse
print(f"meter: {bp.meter_str} (- = weak, + = strong)")
print(f"stress: {bp.stress_str} (- = unstressed, + = stressed)")
print(f"score: {bp.score} (sum of weighted constraint violations)")
print(f"feet: {bp.feet}")
print(f"foot_type: {bp.foot_type} (per-parse classification)")
print(f"is_rising: {bp.is_rising}")
meter: -+-+-+-+-+ (- = weak, + = strong)
stress: ---+---+-+ (- = unstressed, + = stressed)
score: 2.0 (sum of weighted constraint violations)
feet: ['ws', 'ws', 'ws', 'ws', 'ws']
foot_type: iambic (per-parse classification)
is_rising: True
# all unbounded parses for the line, sorted by score
for p in line.parses.unbounded:
    print(f"{p.meter_str} score={p.score}")
-+-+-+-+-+ score=2.0
# parse the full sonnet
sonnet.parse()
for line in sonnet.lines[:6]:
    bp = line.best_parse
    print(f"L{line.num:2d} {bp.meter_str} score={bp.score:.1f} ambig={len(line.parses.unbounded)}")
L 1 -+-+-+-+-+ score=2.0 ambig=2
L 2 -+-+-+-+-+ score=1.0 ambig=1
L 3 -+-+-+-+-+ score=2.0 ambig=3
L 4 -+-+-+-+-+ score=0.0 ambig=1
L 5 -+-+-+-+-+ score=3.0 ambig=8
L 6 -+-+-+-+-+ score=0.0 ambig=1
Per-syllable parse results across the whole text — useful for analysis, plotting, or export.
sonnet.parsed_df.head(10)
| line_num | word_num | form_idx | syll_idx | line_syll_idx | parse_idx | parse_rank | parse_score | is_best | is_bounded | ... | pos_size | meter_val | syll_txt | syll_ipa | is_stressed | *w_peak | *w_stress | *s_unstress | *unres_across | *unres_within | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 2.0 | True | False | ... | 1 | w | When | wɛn | False | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 2 | 0 | 0 | 1 | 1 | 1 | 2.0 | True | False | ... | 1 | s | in | ɪn | False | 0 | 0 | 1 | 0 | 0 |
| 2 | 1 | 3 | 0 | 0 | 2 | 1 | 1 | 2.0 | True | False | ... | 1 | w | the | ðə | False | 0 | 0 | 0 | 0 | 0 |
| 3 | 1 | 4 | 0 | 0 | 3 | 1 | 1 | 2.0 | True | False | ... | 1 | s | chro | 'krɑ | True | 0 | 0 | 0 | 0 | 0 |
| 4 | 1 | 4 | 0 | 1 | 4 | 1 | 1 | 2.0 | True | False | ... | 1 | w | ni | nɪ | False | 0 | 0 | 0 | 0 | 0 |
| 5 | 1 | 4 | 0 | 2 | 5 | 1 | 1 | 2.0 | True | False | ... | 1 | s | cle | kəl | False | 0 | 0 | 1 | 0 | 0 |
| 6 | 1 | 5 | 0 | 0 | 6 | 1 | 1 | 2.0 | True | False | ... | 1 | w | of | ʌv | False | 0 | 0 | 0 | 0 | 0 |
| 7 | 1 | 6 | 0 | 0 | 7 | 1 | 1 | 2.0 | True | False | ... | 1 | s | was | 'weɪ | True | 0 | 0 | 0 | 0 | 0 |
| 8 | 1 | 6 | 0 | 1 | 8 | 1 | 1 | 2.0 | True | False | ... | 1 | w | ted | stəd | False | 0 | 0 | 0 | 0 | 0 |
| 9 | 1 | 7 | 0 | 0 | 9 | 1 | 1 | 2.0 | True | False | ... | 1 | s | time | 'taɪm | True | 0 | 0 | 0 | 0 | 0 |
10 rows × 21 columns
# every column you might want for analysis
list(sonnet.parsed_df.columns)
['line_num',
'word_num',
'form_idx',
'syll_idx',
'line_syll_idx',
'parse_idx',
'parse_rank',
'parse_score',
'is_best',
'is_bounded',
'pos_idx',
'pos_size',
'meter_val',
'syll_txt',
'syll_ipa',
'is_stressed',
'*w_peak',
'*w_stress',
'*s_unstress',
'*unres_across',
'*unres_within']
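The starred columns hold per-syllable constraint-violation counts, so tallying violations per line is a one-line aggregation. A sketch on a toy frame shaped like the columns above (hand-made here so it runs standalone; in practice you would use `sonnet.parsed_df`):

```python
import pandas as pd

# Toy stand-in for parsed_df with two violation columns.
df = pd.DataFrame({
    "line_num":    [1, 1, 2, 2],
    "is_best":     [True, True, True, True],
    "*w_stress":   [0, 1, 0, 0],
    "*s_unstress": [1, 0, 0, 1],
})

# Restrict to best parses, then sum each constraint's violations per line
viol_cols = [c for c in df.columns if c.startswith("*")]
per_line = df[df["is_best"]].groupby("line_num")[viol_cols].sum()
totals = per_line.sum(axis=1)  # total violations per line's best parse
print(totals)
```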
The default Meter allows up to 2-syllable strong/weak positions. You can change constraints, weights, position widths, or unit of parsing.
# stricter binary meter
strict = prosodic.Meter(
constraints=['w_peak', 'w_stress', 's_unstress', 'foot_size'],
max_s=1, max_w=1,
)
print(strict)
Meter(constraints={'w_peak': 1.0, 'w_stress': 1.0, 's_unstress': 1.0, 'foot_size': 1.0}, max_s=1, max_w=1, resolve_optionality=True, parse_unit='line')
# parse with a custom meter
sonnet.parse(meter=strict)
print(sonnet.line1.best_parse)
Parse(txt='when IN the CHRO ni CLE of WAS ted TIME')
Prosodic 3 includes prosodic/analysis/ (a port of the standalone poesy package) for higher-order summary statistics over a parsed text.
# meter classification (iambic / trochaic / anapestic / dactylic)
sonnet.meter_type
{'foot': 'binary',
'head': 'final',
'type': 'iambic',
'mpos_freqs': {'w': 0.48175182481751827,
's': 0.48905109489051096,
'ww': 0.021897810218978103,
'ss': 0.0072992700729927005},
'perc_lines_starting': {'w': 0.9285714285714286, 's': 0.07142857142857142},
'perc_lines_ending': {'s': 0.8571428571428571, 'w': 0.14285714285714285},
'perc_lines_fourth': {'s': 0.8571428571428571, 'w': 0.14285714285714285},
'ambiguity': 2.4793969867312335}
# repeating beat-length template (e.g. invariable pentameter, ballad meter)
print('feet scheme:', sonnet.line_scheme)
print('syll scheme:', sonnet.syllable_scheme)
feet scheme: {'combo': (5,), 'diff': 8}
syll scheme: {'combo': (10,), 'diff': 1}
Rhyme is computed via feature-weighted edit distance over IPA segments (panphon). 0 = perfect rhyme; higher = slant rhyme.
# pairwise rime distance
sonnet.line1.rime_distance(sonnet.lines[2])  # 'time' vs 'rhyme'
0.0
# every rhyming line in the text, with its closest partner
for line, (dist, partner) in list(sonnet.get_rhyming_lines().items())[:6]:
    print(f"L{line.num:2d} ↔ L{partner.num:2d} dist={dist:.2f} '{line.txt.strip()[:35]}' / '{partner.txt.strip()[:35]}'")
L 3 ↔ L 1 dist=0.00 'And beauty making beautiful old rhy' / 'When in the chronicle of wasted tim'
L 8 ↔ L 6 dist=0.00 'Even such a beauty as you master no' / 'Of hand, of foot, of lip, of eye, o'
L14 ↔ L13 dist=0.00 'Had eyes to wonder, but lack tongue' / 'For we, which now behold these pres'
# per-line rhyme group IDs (0 = no rhyme partner)
print('IDs: ', sonnet.rhyme_ids)
from prosodic.analysis import nums_to_scheme
print('letters:', ''.join(nums_to_scheme(sonnet.rhyme_ids)))
IDs:  [1, 2, 1, 2, 0, 3, 0, 3, 0, 4, 5, 4, 5, 5]
letters: abab-c-c-dedee
Match observed rhyme groups against a 39-form catalog (Sonnet variants, Couplet, Sestet, Triplet, Rhyme Royal, Spenserian, etc.) by Jaccard similarity over rhyme-edge sets.
rs = sonnet.rhyme_scheme
print(f"name: {rs['name']}")
print(f"form: {rs['form']}")
print(f"accuracy: {rs['accuracy']:.2f}")
print()
print("top candidates:")
for name, form, score in rs['candidates'][:5]:
    print(f"  {score:.2f}  {name:30s} {form}")
name: Sonnet A
form: abab cdcd eefeff
accuracy: 0.70
top candidates:
0.70 Sonnet A abab cdcd eefeff
0.56 Sonnet, Shakespearean abab cdcd efefgg
0.43 Sonnet E abab cbcd cdedee
0.40 Sonnet B abab cdcd effegg
0.36 Sonnet D ababbcdc ceceff
# form predicates
print('is_sonnet: ', sonnet.is_sonnet)
print('is_shakespearean_sonnet: ', sonnet.is_shakespearean_sonnet)
is_sonnet:  True
is_shakespearean_sonnet: False
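The Jaccard matching described above is easy to sketch: turn each scheme into the set of line-index pairs that share a rhyme letter, then score a candidate by the overlap of the two edge sets. This is an illustrative re-implementation, not Prosodic's actual code (which lives in prosodic/analysis/):

```python
from itertools import combinations

def rhyme_edges(scheme: str) -> set:
    """Set of (i, j) line-index pairs sharing a rhyme letter ('-' = unrhymed)."""
    scheme = scheme.replace(" ", "")
    groups = {}
    for i, ch in enumerate(scheme):
        if ch != "-":
            groups.setdefault(ch, []).append(i)
    return {pair for lines in groups.values() for pair in combinations(lines, 2)}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

observed = "abab-c-c-dedee"       # letters detected for the sonnet above
candidate = "abab cdcd eefeff"    # Sonnet A
score = jaccard(rhyme_edges(observed), rhyme_edges(candidate))
print(round(score, 2))  # → 0.7, matching the 'accuracy' reported above
```

Because unrhymed lines contribute no edges, a missed rhyme only shrinks the intersection rather than adding spurious mismatches.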
text.summary() rolls everything together: per-line parse + rhyme letter + foot/syllable count + ambiguity, plus an estimated-schema block.
print(sonnet.summary())
  #st    #ln  parse        rhyme    #feet    #syll    #parse
----- ----- ----------- ------- ------- ------- --------
1 1 -+-+-+-+-+ a 5 10 2
1 2 -+-+-+-+-+ b 5 10 1
1 3 -+-+-+-+-+ a 5 10 3
1 4 -+-+-+-+-+ b 5 10 1
1 5 -+-+-+-+-+ - 5 10 8
1 6 -+-+-+-+-+ c 5 10 1
1 7 -+--++-+-+ - 4 10 8
1 8 +-+-+-+-+-+ c 6 11 2
1 9 -+-+-+-+-- - 4 10 3
1 10 -+-+-+-+-- d 4 10 6
1 11 -+-+-+-+-+ e 5 10 2
1 12 -+-+-+-+-+ d 5 10 2
1 13 -+-+-+-+-+ e 5 10 2
1 14 -+-+-+-+-+ e 5 10 3
estimated schema
----------
meter: Iambic
feet: Pentameter
syllables: 10
rhyme: Sonnet A (abab cdcd eefeff)
Meter.fit() learns constraint weights from a target scansion (or annotated data) using L-BFGS-B Maximum Entropy optimization (Goldwater & Johnson 2003 / Hayes MaxEnt OT). The learned weights can be split by syllable position (zones) so positional sensitivity transfers to parsing.
# Train weights to match an iambic pentameter target across all sonnet lines
import warnings
warnings.filterwarnings('ignore')
meter = prosodic.Meter()
meter.fit(sonnet, 'wswswswsws', zones=3)
print('top learned weights (zone × constraint):')
for name, w in sorted(meter.zone_weights.items(), key=lambda x: -abs(x[1]))[:8]:
    print(f"  {w:+.3f}  {name}")
[0.83s] prosodic.parsing.maxent.MaxEntTrainer._build_line_data(): 1/14 lines had no matching scansion among parser candidates (syllable count mismatch?)
top learned weights (zone × constraint):
+6.162 s_unstress_z1
+4.707 unres_within_z3
+4.149 unres_across_z2
+3.641 unres_across_z3
+3.557 unres_within_z2
+2.884 s_unstress_z3
+2.374 unres_across_z1
+2.075 w_stress_z2
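Under MaxEnt, a parse's probability is proportional to exp of minus its weighted violation total, normalized over the candidate set. A toy sketch of that scoring step (not the L-BFGS-B trainer itself; weights and violation counts are made up for illustration):

```python
import math

def maxent_probs(violations, weights):
    """P(parse) ∝ exp(-Σ_i w_i · v_i), normalized over all candidates."""
    harmonies = [math.exp(-sum(w * v for w, v in zip(weights, vs)))
                 for vs in violations]
    z = sum(harmonies)
    return [h / z for h in harmonies]

# Two candidate parses, two constraints (say, w_stress and s_unstress).
weights = [2.0, 1.0]
violations = [[0, 1],   # candidate A: one s_unstress violation
              [1, 0]]   # candidate B: one w_stress violation
probs = maxent_probs(violations, weights)
print([round(p, 3) for p in probs])  # → [0.731, 0.269]
```

Training then adjusts the weights so the target scansion's probability is maximized; raising a constraint's weight pushes probability away from parses that violate it.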
With syntax=True, Prosodic uses spaCy's dependency parser to compute phrasal prominence (Liberman & Prince 1977) per word. This adds a phrasal_stress column to the syllable DataFrame and enables the w_prom and s_demoted constraints. Requires pip install prosodic[syntax].
t = prosodic.Text("...", syntax=True)
t.parse()
# phrasal_stress: 0 = sentence root, -1 = direct dependent, deeper = more embedded
Parquet-backed save/load preserves the syllable DataFrame and any computed parse results — no need to re-parse on reload.
import tempfile, os, shutil
out = tempfile.mkdtemp(prefix='prosodic_demo_')
sonnet.save(out)
print('saved files:')
for f in sorted(os.listdir(out)):
    print(f'  {f}')
# reload
loaded = prosodic.TextModel.load(out)
print(f'\nreloaded: {len(loaded.lines)} lines, parse cached?',
      loaded._cached_parsed_df is not None)
shutil.rmtree(out)
saved files:
meta.json
parsed.parquet
syll.parquet
text.txt.gz
reloaded: 14 lines, parse cached? True
A hosted instance is live at prosodic.app — no install required. To run it locally:
prosodic web # http://127.0.0.1:8181
prosodic web --port 5111
prosodic web --dev # auto-reload backend + frontend
Five tabs: Parse (text input + corpus dropdown + sortable, paginated results), Line (single-line scansion detail showing all candidates), Meter (constraint config + weights), MaxEnt (annotated-data training), Settings. See prosodic/web/ for the implementation.
If you have access to a Prosodic server (prosodic web or prosodic.app), you can use the remote client to parse without installing torch / espeak / numpy locally — only requests is required.
import prosodic
prosodic.set_server('https://prosodic.app')
t = prosodic.Text("From fairest creatures we desire increase")
t.parse() # delegates to /api/parse
print(t.lines[0].best_parse.meter_str)
result = t.fit(target_scansion='wswswswsws', zones=3) # delegates to /api/maxent/fit
print(result.weights, result.accuracy)
- prosodic/parsing/constraints.py: every metrical constraint, with a vectorized lambda for the parser
- prosodic/parsing/maxent.py: MaxEnt OT weight learner
- prosodic/analysis/: poem-level form classification (this notebook's meter_type / rhyme_scheme / summary)
- prosodic/profiling.py: performance benchmarks (run python -m prosodic.profiling)
- CLAUDE.md: architectural overview and design notes