
Prosodic 3

Open In Colab Demo Code coverage

Prosodic is a Python library and web app for metrical-phonological analysis of poetry. It parses text into a linguistic hierarchy (text → stanza → line → word → syllable → phoneme), runs a constraint-satisfaction metrical parser, and identifies meter types (iambic, trochaic, anapestic, dactylic), foot/syllable schemes, and named rhyme schemes (sonnet variants, couplet, ballad, etc.).

Try the hosted version at prosodic.app — paste a poem, see scansions, rhyme schemes, and form classification immediately. This notebook walks through the full Python API — from parsing a single line up to poem-level form classification. Click the Open in Colab badge above to run it in your browser.

Built by Ryan Heuser, Josh Falk, and Arto Anttila, with contributions from Sam Bowman.

Install

pip install prosodic
# or for development:
pip install git+https://github.com/quadrismegistus/prosodic

You'll also need espeak (free TTS) to phonemize words not in the CMU dictionary:

  • Mac: brew install espeak
  • Linux: apt-get install espeak libespeak1 libespeak-dev
  • Windows: download from the espeak-ng releases

Setup (Colab only)

Skip this cell when running locally. It installs system + Python deps in a Colab runtime.

# Auto-install dependencies if running in Google Colab.
# Locally this is a no-op.
import sys
IN_COLAB = "google.colab" in sys.modules
if IN_COLAB:
    import subprocess
    subprocess.run(
        ["apt-get", "-qq", "install", "-y",
         "espeak", "libespeak1", "libespeak-dev"],
        check=True,
    )
    subprocess.run(["pip", "install", "-q", "prosodic"], check=True)
    print("Colab setup complete.")
else:
    print("Local environment — skipping Colab setup.")
Local environment — skipping Colab setup.

Quickstart

A complete tour of Prosodic in five lines.

import prosodic

sonnet = prosodic.Text("""When in the chronicle of wasted time
I see descriptions of the fairest wights,
And beauty making beautiful old rhyme
In praise of ladies dead and lovely knights,
Then, in the blazon of sweet beauty's best,
Of hand, of foot, of lip, of eye, of brow,
I see their antique pen would have express'd
Even such a beauty as you master now.
So all their praises are but prophecies
Of this our time, all you prefiguring;
And, for they look'd but with divining eyes,
They had not skill enough your worth to sing:
For we, which now behold these present days,
Had eyes to wonder, but lack tongues to praise.""")

sonnet.parse()
print(sonnet.summary())
  #st    #ln  parse        rhyme      #feet    #syll    #parse
-----  -----  -----------  -------  -------  -------  --------
    1      1  -+-+-+-+-+   a              5       10         2
    1      2  -+-+-+-+-+   b              5       10         1
    1      3  -+-+-+-+-+   a              5       10         3
    1      4  -+-+-+-+-+   b              5       10         1
    1      5  -+-+-+-+-+   -              5       10         8
    1      6  -+-+-+-+-+   c              5       10         1
    1      7  -+--++-+-+   -              4       10         8
    1      8  +-+-+-+-+-+  c              6       11         2
    1      9  -+-+-+-+--   -              4       10         3
    1     10  -+-+-+-+--   d              4       10         6
    1     11  -+-+-+-+-+   e              5       10         2
    1     12  -+-+-+-+-+   d              5       10         2
    1     13  -+-+-+-+-+   e              5       10         2
    1     14  -+-+-+-+-+   e              5       10         3


estimated schema
----------
meter: Iambic
feet: Pentameter
syllables: 10
rhyme: Sonnet A (abab cdcd eefeff)

Reading texts

You can build a Text from a string, a file, or just a single line.

# from a string
short = prosodic.Text("A horse, a horse, my kingdom for a horse!")

# from a file (local path or URL)
shaksonnets = prosodic.Text(fn='https://raw.githubusercontent.com/quadrismegistus/prosodic/refs/heads/master/corpora/corppoetry_en/en.shakespeare.txt')

# a single line via .line1
line = prosodic.Text("Shall I compare thee to a summer's day?").line1

print(f"short: {len(short.lines)} line(s)")
print(f"sonnets: {len(shaksonnets.lines):,} lines, {len(shaksonnets.stanzas):,} stanzas")
print(f"single line: {line}")
short: 1 line(s)

sonnets: 2,155 lines, 154 stanzas
single line: Line(num=1, txt="Shall I compare thee to a summer's day?")

The hierarchy: stanzas → lines → words → syllables → phonemes

Prosodic organizes text into a tree of linguistic entities. Children are constructed lazily on first access — the underlying source of truth is a per-syllable DataFrame.

# tree access
print(f"sonnet has {len(sonnet.stanzas)} stanzas, {len(sonnet.lines)} lines")
print(f"line 1 has {len(sonnet.lines[0].wordtokens)} word tokens")
print(f"first word: {sonnet.lines[0].wordtokens[0]}")
sonnet has 1 stanzas, 14 lines
line 1 has 7 word tokens
first word: WordToken(num=1, txt='When', lang='en', para_num=1, line_num=1, sent_num=1, sentpart_num=1, linepart_num=1)
# attribute shortcut: text.line1 == text.lines[0]
sonnet.line1

Line

A per-syllable DataFrame whose index carries the positional numbers (stanza_num, line_num, linepart_num, sent_num, sentpart_num, all 1 here) and whose single data column is wordtoken_is_punc. Abridged:

    wordtoken_num  wordtoken_txt  wordform_num  ipa_origin  syll_num  syll_txt  syll_ipa   is_punc
    1              When           1             dict        1         When      wɛn        0
    1              When           2             dict        1         When      'wɛn       0
    2              in             1             dict        1         in        ɪn         0
    2              in             2             dict        1         in        'ɪn        0
    3              the            1             dict        1         the       ðə         0
    ...
    4              chronicle      1             dict        3         cle       kəl        0
    5              of             1             dict        1         of        ʌv         0
    6              wasted         1             dict        1         was       'weɪ       0
    6              wasted         1             dict        2         ted       stəd       0
    7              time           1             dict        1         time      'taɪm      0

12 rows × 1 columns

# wordform → syllable → phoneme
wordform = sonnet.line1.wordtokens[1].wordform
print(f"wordform: {wordform}")
for syll in wordform.syllables:
    print(f"  syllable: {syll}, IPA={syll.ipa!r}, stressed={syll.is_stressed}, heavy={syll.is_heavy}")
    for phon in syll.phonemes:
        print(f"    phon: {phon.txt!r}")
wordform: WordForm(num=1, txt='in', force_ambig_stress=True, ipa_origin='dict')
  syllable: Syllable(num=1, txt='in', ipa='ɪn'), IPA='ɪn', stressed=False, heavy=True
    phon: 'ɪ'
    phon: 'n'

DataFrame view

The whole text is also accessible as a flat per-syllable DataFrame. This is the source of truth — entities are constructed from it on demand.

# .df is the syllable-level DataFrame
sonnet.df.head(8)
word_num line_num para_num sent_num sentpart_num linepart_num word_txt is_punc form_idx num_forms syll_idx syll_ipa syll_text is_stressed is_heavy is_strong is_weak is_functionword
0 1 1 1 1 1 1 When 0 0 2 0 wɛn When False True False False True
1 1 1 1 1 1 1 When 0 1 2 0 'wɛn When True True False False False
2 2 1 1 1 1 1 in 0 0 2 0 ɪn in False True False False True
3 2 1 1 1 1 1 in 0 1 2 0 'ɪn in True True False False False
4 3 1 1 1 1 1 the 0 0 1 0 ðə the False False False False True
5 4 1 1 1 1 1 chronicle 0 0 1 0 'krɑ chro True False True False False
6 4 1 1 1 1 1 chronicle 0 0 1 1 ni False False False True False
7 4 1 1 1 1 1 chronicle 0 0 1 2 kəl cle False True False False False
# columns
list(sonnet.df.columns)
['word_num',
 'line_num',
 'para_num',
 'sent_num',
 'sentpart_num',
 'linepart_num',
 'word_txt',
 'is_punc',
 'form_idx',
 'num_forms',
 'syll_idx',
 'syll_ipa',
 'syll_text',
 'is_stressed',
 'is_heavy',
 'is_strong',
 'is_weak',
 'is_functionword']

Metrical parsing

text.parse() runs an exhaustive vectorized parser: it evaluates every possible scansion against a configurable set of metrical constraints (numpy on CPU, torch on GPU when available), then uses harmonic bounding to identify optimal parses. Constraints include w_peak (no peak in weak position), w_stress (no stress in weak), s_unstress (no unstress in strong), unres_within/unres_across (no unresolved disyllables), foot_size. See prosodic/parsing/constraints.py for the full list.
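Harmonic bounding can be illustrated with a small sketch (not Prosodic's vectorized implementation): parse A harmonically bounds parse B when A has no more violations than B on every constraint and strictly fewer on at least one, so B can never win under any positive weighting and is discarded before scoring.

```python
def bounds(a, b):
    """True if violation vector `a` harmonically bounds `b`:
    no worse on any constraint, strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# violation counts over (w_peak, w_stress, s_unstress)
parse_a = (0, 1, 1)
parse_b = (1, 1, 2)   # worse or equal everywhere -> bounded, discarded
parse_c = (2, 0, 0)   # trades violations -> unbounded, kept as a candidate

print(bounds(parse_a, parse_b))  # True
print(bounds(parse_a, parse_c))  # False
```

Parses that merely trade one violation type for another survive as `parses.unbounded`; only a reweighting of constraints could change their ranking.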

# parse a single line
line = prosodic.Text("Shall I compare thee to a summer's day?").line1
line.parse()
print(line.best_parse)
Parse(txt="shall I com PARE thee TO a SUM mer's DAY")
# inspect the parse
bp = line.best_parse
print(f"meter:     {bp.meter_str}    (- = weak, + = strong)")
print(f"stress:    {bp.stress_str}    (- = unstressed, + = stressed)")
print(f"score:     {bp.score}    (sum of weighted constraint violations)")
print(f"feet:      {bp.feet}")
print(f"foot_type: {bp.foot_type}    (per-parse classification)")
print(f"is_rising: {bp.is_rising}")
meter:     -+-+-+-+-+    (- = weak, + = strong)
stress:    ---+---+-+    (- = unstressed, + = stressed)
score:     2.0    (sum of weighted constraint violations)
feet:      ['ws', 'ws', 'ws', 'ws', 'ws']
foot_type: iambic    (per-parse classification)
is_rising: True
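The relationship between `meter_str`, `feet`, and `foot_type` can be sketched in a few lines (a simplification that assumes strictly binary feet; Prosodic's own parser also allows 2-syllable positions):

```python
def classify(meter_str):
    # chunk a strictly binary scansion into 2-syllable feet
    feet = [meter_str[i:i + 2] for i in range(0, len(meter_str), 2)]
    labels = ['ws' if f == '-+' else 'sw' for f in feet]
    foot_type = 'iambic' if labels.count('ws') >= labels.count('sw') else 'trochaic'
    is_rising = labels[0] == 'ws'
    return labels, foot_type, is_rising

labels, foot_type, is_rising = classify('-+-+-+-+-+')
print(labels)      # ['ws', 'ws', 'ws', 'ws', 'ws']
print(foot_type)   # iambic
print(is_rising)   # True
```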
# all unbounded parses for the line, sorted by score
for p in line.parses.unbounded:
    print(f"{p.meter_str}  score={p.score}")
-+-+-+-+-+  score=2.0
# parse the full sonnet
sonnet.parse()
for line in sonnet.lines[:6]:
    bp = line.best_parse
    print(f"L{line.num:2d}  {bp.meter_str}  score={bp.score:.1f}  ambig={len(line.parses.unbounded)}")
L 1  -+-+-+-+-+  score=2.0  ambig=2
L 2  -+-+-+-+-+  score=1.0  ambig=1
L 3  -+-+-+-+-+  score=2.0  ambig=3
L 4  -+-+-+-+-+  score=0.0  ambig=1
L 5  -+-+-+-+-+  score=3.0  ambig=8
L 6  -+-+-+-+-+  score=0.0  ambig=1

The parsed DataFrame

Per-syllable parse results across the whole text — useful for analysis, plotting, or export.

sonnet.parsed_df.head(10)
line_num word_num form_idx syll_idx line_syll_idx parse_idx parse_rank parse_score is_best is_bounded ... pos_size meter_val syll_txt syll_ipa is_stressed *w_peak *w_stress *s_unstress *unres_across *unres_within
0 1 1 0 0 0 1 1 2.0 True False ... 1 w When wɛn False 0 0 0 0 0
1 1 2 0 0 1 1 1 2.0 True False ... 1 s in ɪn False 0 0 1 0 0
2 1 3 0 0 2 1 1 2.0 True False ... 1 w the ðə False 0 0 0 0 0
3 1 4 0 0 3 1 1 2.0 True False ... 1 s chro 'krɑ True 0 0 0 0 0
4 1 4 0 1 4 1 1 2.0 True False ... 1 w ni False 0 0 0 0 0
5 1 4 0 2 5 1 1 2.0 True False ... 1 s cle kəl False 0 0 1 0 0
6 1 5 0 0 6 1 1 2.0 True False ... 1 w of ʌv False 0 0 0 0 0
7 1 6 0 0 7 1 1 2.0 True False ... 1 s was 'weɪ True 0 0 0 0 0
8 1 6 0 1 8 1 1 2.0 True False ... 1 w ted stəd False 0 0 0 0 0
9 1 7 0 0 9 1 1 2.0 True False ... 1 s time 'taɪm True 0 0 0 0 0

10 rows × 21 columns

# every column you might want for analysis
list(sonnet.parsed_df.columns)
['line_num',
 'word_num',
 'form_idx',
 'syll_idx',
 'line_syll_idx',
 'parse_idx',
 'parse_rank',
 'parse_score',
 'is_best',
 'is_bounded',
 'pos_idx',
 'pos_size',
 'meter_val',
 'syll_txt',
 'syll_ipa',
 'is_stressed',
 '*w_peak',
 '*w_stress',
 '*s_unstress',
 '*unres_across',
 '*unres_within']
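The starred columns make `parse_score` auditable by hand: a parse's score is the weighted sum of its per-syllable constraint violations, and the default weights are all 1.0. A sketch using line 1's best parse from the table above (only `*s_unstress` fires, on 'in' and 'cle'):

```python
# per-syllable violation counts for line 1's best parse (from parsed_df above)
violations = {
    '*w_peak':       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    '*w_stress':     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    '*s_unstress':   [0, 1, 0, 0, 0, 1, 0, 0, 0, 0],  # 'in', 'cle' unstressed in strong position
    '*unres_across': [0] * 10,
    '*unres_within': [0] * 10,
}
weights = dict.fromkeys(violations, 1.0)  # default: every constraint weighted 1.0
score = sum(weights[c] * sum(v) for c, v in violations.items())
print(score)  # 2.0 -- matches parse_score for line 1
```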

Custom meters

The default Meter allows up to 2-syllable strong/weak positions. You can change constraints, weights, position widths, or unit of parsing.

# stricter binary meter
strict = prosodic.Meter(
    constraints=['w_peak', 'w_stress', 's_unstress', 'foot_size'],
    max_s=1, max_w=1,
)
print(strict)
Meter(constraints={'w_peak': 1.0, 'w_stress': 1.0, 's_unstress': 1.0, 'foot_size': 1.0}, max_s=1, max_w=1, resolve_optionality=True, parse_unit='line')
# parse with a custom meter
sonnet.parse(meter=strict)
print(sonnet.line1.best_parse)
Parse(txt='when IN the CHRO ni CLE of WAS ted TIME')

Poem-level analysis

Prosodic 3 includes prosodic/analysis/ (a port of the standalone poesy package) for higher-order summary statistics over a parsed text.

# meter classification (iambic / trochaic / anapestic / dactylic)
sonnet.meter_type
{'foot': 'binary',
 'head': 'final',
 'type': 'iambic',
 'mpos_freqs': {'w': 0.48175182481751827,
  's': 0.48905109489051096,
  'ww': 0.021897810218978103,
  'ss': 0.0072992700729927005},
 'perc_lines_starting': {'w': 0.9285714285714286, 's': 0.07142857142857142},
 'perc_lines_ending': {'s': 0.8571428571428571, 'w': 0.14285714285714285},
 'perc_lines_fourth': {'s': 0.8571428571428571, 'w': 0.14285714285714285},
 'ambiguity': 2.4793969867312335}
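The classification can be approximated from these statistics (a toy sketch, not Prosodic's actual `meter_type` logic; the 0.25 threshold is invented): mostly one-syllable positions imply binary feet, and lines ending on a strong position imply a final-headed (rising) meter.

```python
def classify_meter(mpos_freqs, perc_lines_ending):
    # mostly one-syllable metrical positions -> binary feet (invented 0.25 threshold)
    multi = mpos_freqs.get('ww', 0) + mpos_freqs.get('ss', 0)
    foot = 'binary' if multi < 0.25 else 'ternary'
    # lines mostly ending on a strong position -> final-headed (rising) meter
    head = 'final' if perc_lines_ending.get('s', 0) >= 0.5 else 'initial'
    return {('binary', 'final'): 'iambic',
            ('binary', 'initial'): 'trochaic',
            ('ternary', 'final'): 'anapestic',
            ('ternary', 'initial'): 'dactylic'}[(foot, head)]

print(classify_meter(
    {'w': 0.48, 's': 0.49, 'ww': 0.022, 'ss': 0.007},  # the sonnet's mpos_freqs
    {'s': 0.86, 'w': 0.14},                            # perc_lines_ending
))  # iambic
```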
# repeating beat-length template (e.g. invariable pentameter, ballad meter)
print('feet  scheme:', sonnet.line_scheme)
print('syll  scheme:', sonnet.syllable_scheme)
feet  scheme: {'combo': (5,), 'diff': 8}
syll  scheme: {'combo': (10,), 'diff': 1}

Rhyme detection

Rhyme is computed via feature-weighted edit distance over IPA segments (panphon). 0 = perfect rhyme; higher = slant rhyme.

# pairwise rime distance
sonnet.line1.rime_distance(sonnet.lines[2])  # 'time' vs 'rhyme'
0.0
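For intuition, the distance behaves like an edit distance over the lines' final rimes. A plain unit-cost Levenshtein sketch (Prosodic instead weights each edit by panphon phonological-feature differences, so substitutions between similar segments cost less):

```python
def edit_distance(a, b):
    # unit-cost Levenshtein; panphon feature-weighted costs would replace the 1s
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

# the rimes (nucleus + coda) of 'time' and 'rhyme' are identical -> perfect rhyme
print(edit_distance('aɪm', 'aɪm'))  # 0
print(edit_distance('aɪm', 'iːz'))  # 3: no shared segments, a non-rhyme
```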
# every rhyming line in the text, with its closest partner
for line, (dist, partner) in list(sonnet.get_rhyming_lines().items())[:6]:
    print(f"L{line.num:2d} ↔ L{partner.num:2d}  dist={dist:.2f}  '{line.txt.strip()[:35]}' / '{partner.txt.strip()[:35]}'")
L 3 ↔ L 1  dist=0.00  'And beauty making beautiful old rhy' / 'When in the chronicle of wasted tim'
L 8 ↔ L 6  dist=0.00  'Even such a beauty as you master no' / 'Of hand, of foot, of lip, of eye, o'
L14 ↔ L13  dist=0.00  'Had eyes to wonder, but lack tongue' / 'For we, which now behold these pres'
# per-line rhyme group IDs (0 = no rhyme partner)
print('IDs:    ', sonnet.rhyme_ids)
from prosodic.analysis import nums_to_scheme
print('letters:', ''.join(nums_to_scheme(sonnet.rhyme_ids)))
IDs:     [1, 2, 1, 2, 0, 3, 0, 3, 0, 4, 5, 4, 5, 5]
letters: abab-c-c-dedee
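The ID-to-letter conversion can be sketched as follows (a simplification of `nums_to_scheme`; 0 maps to '-' for unrhymed lines, 1 to 'a', 2 to 'b', and so on):

```python
def ids_to_letters(ids):
    # 0 = no rhyme partner -> '-'; 1 -> 'a', 2 -> 'b', ...
    return ''.join('-' if i == 0 else chr(ord('a') + i - 1) for i in ids)

print(ids_to_letters([1, 2, 1, 2, 0, 3, 0, 3, 0, 4, 5, 4, 5, 5]))
# abab-c-c-dedee
```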

Named rhyme scheme matching

Match observed rhyme groups against a 39-form catalog (Sonnet variants, Couplet, Sestet, Triplet, Rhyme Royal, Spenserian, etc.) by Jaccard similarity over rhyme-edge sets.
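The edge-set comparison can be sketched as follows (a minimal illustration of the idea, not Prosodic's implementation): each scheme becomes the set of line-index pairs it says rhyme, and two schemes are scored by the Jaccard similarity of those sets.

```python
def rhyme_edges(scheme):
    """Set of rhyming line-index pairs implied by a scheme string like 'abab'."""
    groups = {}
    for i, ch in enumerate(scheme.replace(' ', '')):
        groups.setdefault(ch, []).append(i)
    return {(a, b) for lines in groups.values()
            for a in lines for b in lines if a < b}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

observed = rhyme_edges('abab')
print(jaccard(observed, rhyme_edges('abab')))  # 1.0: identical edge sets
print(jaccard(observed, rhyme_edges('abba')))  # 0.0: no shared rhyming pairs
```

Because the comparison is over pairs rather than letters, it tolerates relabeling ('abab' and 'baba' produce the same edge set) and degrades gracefully for partial matches.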

rs = sonnet.rhyme_scheme
print(f"name:     {rs['name']}")
print(f"form:     {rs['form']}")
print(f"accuracy: {rs['accuracy']:.2f}")
print()
print("top candidates:")
for name, form, score in rs['candidates'][:5]:
    print(f"  {score:.2f}  {name:30s} {form}")
name:     Sonnet A
form:     abab cdcd eefeff
accuracy: 0.70

top candidates:
  0.70  Sonnet A                       abab cdcd eefeff
  0.56  Sonnet, Shakespearean          abab cdcd efefgg
  0.43  Sonnet E                       abab cbcd cdedee
  0.40  Sonnet B                       abab cdcd effegg
  0.36  Sonnet D                       ababbcdc ceceff
# form predicates
print('is_sonnet:               ', sonnet.is_sonnet)
print('is_shakespearean_sonnet: ', sonnet.is_shakespearean_sonnet)
is_sonnet:                True
is_shakespearean_sonnet:  False

Tabular summary

text.summary() rolls everything together: per-line parse + rhyme letter + foot/syllable count + ambiguity, plus an estimated-schema block.

print(sonnet.summary())
  #st    #ln  parse        rhyme      #feet    #syll    #parse
-----  -----  -----------  -------  -------  -------  --------
    1      1  -+-+-+-+-+   a              5       10         2
    1      2  -+-+-+-+-+   b              5       10         1
    1      3  -+-+-+-+-+   a              5       10         3
    1      4  -+-+-+-+-+   b              5       10         1
    1      5  -+-+-+-+-+   -              5       10         8
    1      6  -+-+-+-+-+   c              5       10         1
    1      7  -+--++-+-+   -              4       10         8
    1      8  +-+-+-+-+-+  c              6       11         2
    1      9  -+-+-+-+--   -              4       10         3
    1     10  -+-+-+-+--   d              4       10         6
    1     11  -+-+-+-+-+   e              5       10         2
    1     12  -+-+-+-+-+   d              5       10         2
    1     13  -+-+-+-+-+   e              5       10         2
    1     14  -+-+-+-+-+   e              5       10         3


estimated schema
----------
meter: Iambic
feet: Pentameter
syllables: 10
rhyme: Sonnet A (abab cdcd eefeff)

MaxEnt weight learning

Meter.fit() learns constraint weights from a target scansion (or annotated data) using L-BFGS-B Maximum Entropy optimization (Goldwater & Johnson 2003 / Hayes MaxEnt OT). The learned weights can be split by syllable position (zones) so positional sensitivity transfers to parsing.

# Train weights to match an iambic pentameter target across all sonnet lines
import warnings
warnings.filterwarnings('ignore')

meter = prosodic.Meter()
meter.fit(sonnet, 'wswswswsws', zones=3)

print('top learned weights (zone × constraint):')
for name, w in sorted(meter.zone_weights.items(), key=lambda x: -abs(x[1]))[:8]:
    print(f"  {w:+.3f}  {name}")
[0.83s] prosodic.parsing.maxent.MaxEntTrainer._build_line_data(): 1/14 lines had no matching scansion among parser candidates (syllable count mismatch?)


top learned weights (zone × constraint):
  +6.162  s_unstress_z1
  +4.707  unres_within_z3
  +4.149  unres_across_z2
  +3.641  unres_across_z3
  +3.557  unres_within_z2
  +2.884  s_unstress_z3
  +2.374  unres_across_z1
  +2.075  w_stress_z2
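The objective being optimized can be sketched in miniature (a hedged illustration of the MaxEnt idea, not Prosodic's trainer): each candidate parse gets probability proportional to exp of its negative harmony, the weighted violation sum, and training raises the weights that push probability onto the target parse.

```python
import math

def parse_probs(violation_vectors, weights):
    # harmony = weighted violation sum; P(parse) proportional to exp(-harmony)
    harmonies = [sum(w * v for w, v in zip(weights, vec))
                 for vec in violation_vectors]
    z = sum(math.exp(-h) for h in harmonies)
    return [math.exp(-h) / z for h in harmonies]

# two candidate parses; the first (the target) violates constraint 1 less often
cands = [(0, 1), (2, 1)]
low = parse_probs(cands, weights=[1.0, 1.0])
high = parse_probs(cands, weights=[5.0, 1.0])
print(low[0], high[0])  # raising constraint 1's weight pushes probability toward the target
```

L-BFGS-B searches the weight space to maximize the (log-)probability of the target scansions, which is what drives the large positive weights above.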

Phrasal stress (optional)

With syntax=True, Prosodic uses spaCy's dependency parser to compute phrasal prominence (Liberman & Prince 1977) per word. This adds a phrasal_stress column to the syllable DataFrame and enables the w_prom and s_demoted constraints. Requires pip install prosodic[syntax].

t = prosodic.Text("...", syntax=True)
t.parse()
# phrasal_stress: 0 = sentence root, -1 = direct dependent, deeper = more embedded

Save and load

Parquet-backed save/load preserves the syllable DataFrame and any computed parse results — no need to re-parse on reload.

import tempfile, os, shutil
out = tempfile.mkdtemp(prefix='prosodic_demo_')
sonnet.save(out)
print('saved files:')
for f in sorted(os.listdir(out)):
    print(f'  {f}')

# reload
loaded = prosodic.TextModel.load(out)
print(f'\nreloaded: {len(loaded.lines)} lines, parse cached?',
      loaded._cached_parsed_df is not None)
shutil.rmtree(out)
saved files:
  meta.json
  parsed.parquet
  syll.parquet
  text.txt.gz



reloaded: 14 lines, parse cached? True

Web app

A hosted instance is live at prosodic.app — no install required. To run it locally:

prosodic web                     # http://127.0.0.1:8181
prosodic web --port 5111
prosodic web --dev               # auto-reload backend + frontend

Five tabs: Parse (text input + corpus dropdown + sortable, paginated results), Line (single-line scansion detail showing all candidates), Meter (constraint config + weights), MaxEnt (annotated-data training), Settings. See prosodic/web/ for the implementation.

Remote client

If you have access to a Prosodic server (prosodic web or prosodic.app), you can use the remote client to parse without installing torch / espeak / numpy locally — only requests is required.

import prosodic
prosodic.set_server('https://prosodic.app')

t = prosodic.Text("From fairest creatures we desire increase")
t.parse()                            # delegates to /api/parse
print(t.lines[0].best_parse.meter_str)

result = t.fit(target_scansion='wswswswsws', zones=3)  # delegates to /api/maxent/fit
print(result.weights, result.accuracy)

About

Prosodic: a metrical-phonological parser, written in Python. For English and Finnish, with flexible language support.
