
Prosodic 3

Open In Colab Demo Code coverage

Prosodic is a Python library and web app for metrical-phonological analysis of poetry. It parses text into a linguistic hierarchy (text → stanza → line → word → syllable → phoneme), runs a constraint-satisfaction metrical parser, and identifies meter types (iambic, trochaic, anapestic, dactylic), foot/syllable schemes, and named rhyme schemes (sonnet variants, couplet, ballad, etc.).

Try the hosted version at prosodic.app — paste a poem, see scansions, rhyme schemes, and form classification immediately. This notebook walks through the full Python API — from parsing a single line up to poem-level form classification. Click the Open in Colab badge above to run it in your browser.

Built by Ryan Heuser, Josh Falk, and Arto Anttila, with contributions from Sam Bowman.

Install

pip install prosodic
# or for development:
pip install git+https://github.com/quadrismegistus/prosodic

You'll also need espeak (free TTS) to phonemize words not in the CMU dictionary:

  • Mac: brew install espeak
  • Linux: apt-get install espeak libespeak1 libespeak-dev
  • Windows: download from the espeak-ng releases

Setup (Colab only)

Skip this cell when running locally. It installs system + Python deps in a Colab runtime.

# Auto-install dependencies if running in Google Colab.
# Locally this is a no-op.
import sys
IN_COLAB = "google.colab" in sys.modules
if IN_COLAB:
    import subprocess
    subprocess.run(
        ["apt-get", "-qq", "install", "-y",
         "espeak", "libespeak1", "libespeak-dev"],
        check=True,
    )
    subprocess.run(["pip", "install", "-q", "prosodic"], check=True)
    print("Colab setup complete.")
else:
    print("Local environment — skipping Colab setup.")
Local environment — skipping Colab setup.

Quickstart

A complete tour of Prosodic in five lines.

import prosodic

sonnet = prosodic.Text("""When in the chronicle of wasted time
I see descriptions of the fairest wights,
And beauty making beautiful old rhyme
In praise of ladies dead and lovely knights,
Then, in the blazon of sweet beauty's best,
Of hand, of foot, of lip, of eye, of brow,
I see their antique pen would have express'd
Even such a beauty as you master now.
So all their praises are but prophecies
Of this our time, all you prefiguring;
And, for they look'd but with divining eyes,
They had not skill enough your worth to sing:
For we, which now behold these present days,
Had eyes to wonder, but lack tongues to praise.""")

sonnet.parse()
print(sonnet.summary())
  #st    #ln  parse        rhyme      #feet    #syll    #parse
-----  -----  -----------  -------  -------  -------  --------
    1      1  -+-+-+-+-+   a              5       10         2
    1      2  -+-+-+-+-+   b              5       10         1
    1      3  -+-+-+-+-+   a              5       10         3
    1      4  -+-+-+-+-+   b              5       10         1
    1      5  -+-+-+-+-+   -              5       10         8
    1      6  -+-+-+-+-+   c              5       10         1
    1      7  -+--++-+-+   -              4       10         8
    1      8  +-+-+-+-+-+  c              6       11         2
    1      9  -+-+-+-+--   -              4       10         3
    1     10  -+-+-+-+--   d              4       10         6
    1     11  -+-+-+-+-+   e              5       10         2
    1     12  -+-+-+-+-+   d              5       10         2
    1     13  -+-+-+-+-+   e              5       10         2
    1     14  -+-+-+-+-+   e              5       10         3


estimated schema
----------
meter: Iambic
feet: Pentameter
syllables: 10
rhyme: Sonnet A (abab cdcd eefeff)

Reading texts

You can build a Text from a string, a file, or just a single line.

# from a string
short = prosodic.Text("A horse, a horse, my kingdom for a horse!")

# from a file (local path or URL)
shaksonnets = prosodic.Text(fn='https://raw.githubusercontent.com/quadrismegistus/prosodic/refs/heads/master/corpora/corppoetry_en/en.shakespeare.txt')

# a single line via .line1
line = prosodic.Text("Shall I compare thee to a summer's day?").line1

print(f"short: {len(short.lines)} line(s)")
print(f"sonnets: {len(shaksonnets.lines):,} lines, {len(shaksonnets.stanzas):,} stanzas")
print(f"single line: {line}")
short: 1 line(s)

sonnets: 2,155 lines, 154 stanzas
single line: Line(num=1, txt="Shall I compare thee to a summer's day?")

The hierarchy: stanzas → lines → words → syllables → phonemes

Prosodic organizes text into a tree of linguistic entities. Children are constructed lazily on first access — the underlying source of truth is a per-syllable DataFrame.

# tree access
print(f"sonnet has {len(sonnet.stanzas)} stanzas, {len(sonnet.lines)} lines")
print(f"line 1 has {len(sonnet.lines[0].wordtokens)} word tokens")
print(f"first word: {sonnet.lines[0].wordtokens[0]}")
sonnet has 1 stanzas, 14 lines
line 1 has 7 word tokens
first word: WordToken(num=1, txt='When', lang='en', para_num=1, line_num=1, sent_num=1, sentpart_num=1, linepart_num=1)
# attribute shortcut: text.line1 == text.lines[0]
sonnet.line1

Line

A per-syllable DataFrame whose index carries the positional numbers (stanza_num, line_num, linepart_num, sent_num, sentpart_num, all 1 here) and whose single data column is wordtoken_is_punc. Abridged:

    wordtoken_num  wordtoken_txt  wordform_num  ipa_origin  syll_num  syll_txt  syll_ipa   is_punc
    1              When           1             dict        1         When      wɛn        0
    1              When           2             dict        1         When      'wɛn       0
    2              in             1             dict        1         in        ɪn         0
    2              in             2             dict        1         in        'ɪn        0
    3              the            1             dict        1         the       ðə         0
    ...
    4              chronicle      1             dict        3         cle       kəl        0
    5              of             1             dict        1         of        ʌv         0
    6              wasted         1             dict        1         was       'weɪ       0
    6              wasted         1             dict        2         ted       stəd       0
    7              time           1             dict        1         time      'taɪm      0

12 rows × 1 columns

# wordform → syllable → phoneme
wordform = sonnet.line1.wordtokens[1].wordform
print(f"wordform: {wordform}")
for syll in wordform.syllables:
    print(f"  syllable: {syll}, IPA={syll.ipa!r}, stressed={syll.is_stressed}, heavy={syll.is_heavy}")
    for phon in syll.phonemes:
        print(f"    phon: {phon.txt!r}")
wordform: WordForm(num=1, txt='in', force_ambig_stress=True, ipa_origin='dict')
  syllable: Syllable(num=1, txt='in', ipa='ɪn'), IPA='ɪn', stressed=False, heavy=True
    phon: 'ɪ'
    phon: 'n'

DataFrame view

The whole text is also accessible as a flat per-syllable DataFrame. This is the source of truth — entities are constructed from it on demand.

# .df is the syllable-level DataFrame
sonnet.df.head(8)
word_num line_num para_num sent_num sentpart_num linepart_num word_txt is_punc form_idx num_forms syll_idx syll_ipa syll_text is_stressed is_heavy is_strong is_weak is_functionword
0 1 1 1 1 1 1 When 0 0 2 0 wɛn When False True False False True
1 1 1 1 1 1 1 When 0 1 2 0 'wɛn When True True False False False
2 2 1 1 1 1 1 in 0 0 2 0 ɪn in False True False False True
3 2 1 1 1 1 1 in 0 1 2 0 'ɪn in True True False False False
4 3 1 1 1 1 1 the 0 0 1 0 ðə the False False False False True
5 4 1 1 1 1 1 chronicle 0 0 1 0 'krɑ chro True False True False False
6 4 1 1 1 1 1 chronicle 0 0 1 1 ni False False False True False
7 4 1 1 1 1 1 chronicle 0 0 1 2 kəl cle False True False False False
# columns
list(sonnet.df.columns)
['word_num',
 'line_num',
 'para_num',
 'sent_num',
 'sentpart_num',
 'linepart_num',
 'word_txt',
 'is_punc',
 'form_idx',
 'num_forms',
 'syll_idx',
 'syll_ipa',
 'syll_text',
 'is_stressed',
 'is_heavy',
 'is_strong',
 'is_weak',
 'is_functionword']

Metrical parsing

text.parse() runs an exhaustive vectorized parser: it evaluates every possible scansion against a configurable set of metrical constraints (numpy on CPU, torch on GPU when available), then uses harmonic bounding to identify optimal parses. Constraints include w_peak (no peak in weak position), w_stress (no stress in weak), s_unstress (no unstress in strong), unres_within/unres_across (no unresolved disyllables), foot_size. See prosodic/parsing/constraints.py for the full list.
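Harmonic bounding can be illustrated with a small sketch (not Prosodic's vectorized implementation): parse A harmonically bounds parse B when A has no more violations than B on every constraint and strictly fewer on at least one, so B can never win under any positive weighting and is discarded before scoring.

```python
def bounds(a, b):
    """True if violation vector `a` harmonically bounds `b`:
    no worse on any constraint, strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# violation counts over (w_peak, w_stress, s_unstress)
parse_a = (0, 1, 1)
parse_b = (1, 1, 2)   # worse or equal everywhere -> bounded, discarded
parse_c = (2, 0, 0)   # trades violations -> unbounded, kept as a candidate

print(bounds(parse_a, parse_b))  # True
print(bounds(parse_a, parse_c))  # False
```

Parses that merely trade one violation type for another survive as `parses.unbounded`; only a reweighting of constraints could change their ranking.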

# parse a single line
line = prosodic.Text("Shall I compare thee to a summer's day?").line1
line.parse()
print(line.best_parse)
Parse(txt="shall I com PARE thee TO a SUM mer's DAY")
# inspect the parse
bp = line.best_parse
print(f"meter:     {bp.meter_str}    (- = weak, + = strong)")
print(f"stress:    {bp.stress_str}    (- = unstressed, + = stressed)")
print(f"score:     {bp.score}    (sum of weighted constraint violations)")
print(f"feet:      {bp.feet}")
print(f"foot_type: {bp.foot_type}    (per-parse classification)")
print(f"is_rising: {bp.is_rising}")
meter:     -+-+-+-+-+    (- = weak, + = strong)
stress:    ---+---+-+    (- = unstressed, + = stressed)
score:     2.0    (sum of weighted constraint violations)
feet:      ['ws', 'ws', 'ws', 'ws', 'ws']
foot_type: iambic    (per-parse classification)
is_rising: True
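The relationship between `meter_str`, `feet`, and `foot_type` can be sketched in a few lines (a simplification that assumes strictly binary feet; Prosodic's own parser also allows 2-syllable positions):

```python
def classify(meter_str):
    # chunk a strictly binary scansion into 2-syllable feet
    feet = [meter_str[i:i + 2] for i in range(0, len(meter_str), 2)]
    labels = ['ws' if f == '-+' else 'sw' for f in feet]
    foot_type = 'iambic' if labels.count('ws') >= labels.count('sw') else 'trochaic'
    is_rising = labels[0] == 'ws'
    return labels, foot_type, is_rising

labels, foot_type, is_rising = classify('-+-+-+-+-+')
print(labels)      # ['ws', 'ws', 'ws', 'ws', 'ws']
print(foot_type)   # iambic
print(is_rising)   # True
```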
# all unbounded parses for the line, sorted by score
for p in line.parses.unbounded:
    print(f"{p.meter_str}  score={p.score}")
-+-+-+-+-+  score=2.0
# parse the full sonnet
sonnet.parse()
for line in sonnet.lines[:6]:
    bp = line.best_parse
    print(f"L{line.num:2d}  {bp.meter_str}  score={bp.score:.1f}  ambig={len(line.parses.unbounded)}")
L 1  -+-+-+-+-+  score=2.0  ambig=2
L 2  -+-+-+-+-+  score=1.0  ambig=1
L 3  -+-+-+-+-+  score=2.0  ambig=3
L 4  -+-+-+-+-+  score=0.0  ambig=1
L 5  -+-+-+-+-+  score=3.0  ambig=8
L 6  -+-+-+-+-+  score=0.0  ambig=1

The parsed DataFrame

Per-syllable parse results across the whole text — useful for analysis, plotting, or export.

sonnet.parsed_df.head(10)
line_num word_num form_idx syll_idx line_syll_idx parse_idx parse_rank parse_score is_best is_bounded ... pos_size meter_val syll_txt syll_ipa is_stressed *w_peak *w_stress *s_unstress *unres_across *unres_within
0 1 1 0 0 0 1 1 2.0 True False ... 1 w When wɛn False 0 0 0 0 0
1 1 2 0 0 1 1 1 2.0 True False ... 1 s in ɪn False 0 0 1 0 0
2 1 3 0 0 2 1 1 2.0 True False ... 1 w the ðə False 0 0 0 0 0
3 1 4 0 0 3 1 1 2.0 True False ... 1 s chro 'krɑ True 0 0 0 0 0
4 1 4 0 1 4 1 1 2.0 True False ... 1 w ni False 0 0 0 0 0
5 1 4 0 2 5 1 1 2.0 True False ... 1 s cle kəl False 0 0 1 0 0
6 1 5 0 0 6 1 1 2.0 True False ... 1 w of ʌv False 0 0 0 0 0
7 1 6 0 0 7 1 1 2.0 True False ... 1 s was 'weɪ True 0 0 0 0 0
8 1 6 0 1 8 1 1 2.0 True False ... 1 w ted stəd False 0 0 0 0 0
9 1 7 0 0 9 1 1 2.0 True False ... 1 s time 'taɪm True 0 0 0 0 0

10 rows × 21 columns

# every column you might want for analysis
list(sonnet.parsed_df.columns)
['line_num',
 'word_num',
 'form_idx',
 'syll_idx',
 'line_syll_idx',
 'parse_idx',
 'parse_rank',
 'parse_score',
 'is_best',
 'is_bounded',
 'pos_idx',
 'pos_size',
 'meter_val',
 'syll_txt',
 'syll_ipa',
 'is_stressed',
 '*w_peak',
 '*w_stress',
 '*s_unstress',
 '*unres_across',
 '*unres_within']
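The starred columns make `parse_score` auditable by hand: a parse's score is the weighted sum of its per-syllable constraint violations, and the default weights are all 1.0. A sketch using line 1's best parse from the table above (only `*s_unstress` fires, on 'in' and 'cle'):

```python
# per-syllable violation counts for line 1's best parse (from parsed_df above)
violations = {
    '*w_peak':       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    '*w_stress':     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    '*s_unstress':   [0, 1, 0, 0, 0, 1, 0, 0, 0, 0],  # 'in', 'cle' unstressed in strong position
    '*unres_across': [0] * 10,
    '*unres_within': [0] * 10,
}
weights = dict.fromkeys(violations, 1.0)  # default: every constraint weighted 1.0
score = sum(weights[c] * sum(v) for c, v in violations.items())
print(score)  # 2.0 -- matches parse_score for line 1
```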

Custom meters

The default Meter allows up to 2-syllable strong/weak positions. You can change constraints, weights, position widths, or unit of parsing.

# stricter binary meter
strict = prosodic.Meter(
    constraints=['w_peak', 'w_stress', 's_unstress', 'foot_size'],
    max_s=1, max_w=1,
)
print(strict)
Meter(constraints={'w_peak': 1.0, 'w_stress': 1.0, 's_unstress': 1.0, 'foot_size': 1.0}, max_s=1, max_w=1, resolve_optionality=True, parse_unit='line')
# parse with a custom meter
sonnet.parse(meter=strict)
print(sonnet.line1.best_parse)
Parse(txt='when IN the CHRO ni CLE of WAS ted TIME')

Poem-level analysis

Prosodic 3 includes prosodic/analysis/ (a port of the standalone poesy package) for higher-order summary statistics over a parsed text.

# meter classification (iambic / trochaic / anapestic / dactylic)
sonnet.meter_type
{'foot': 'binary',
 'head': 'final',
 'type': 'iambic',
 'mpos_freqs': {'w': 0.48175182481751827,
  's': 0.48905109489051096,
  'ww': 0.021897810218978103,
  'ss': 0.0072992700729927005},
 'perc_lines_starting': {'w': 0.9285714285714286, 's': 0.07142857142857142},
 'perc_lines_ending': {'s': 0.8571428571428571, 'w': 0.14285714285714285},
 'perc_lines_fourth': {'s': 0.8571428571428571, 'w': 0.14285714285714285},
 'ambiguity': 2.4793969867312335}
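The classification can be approximated from these statistics (a toy sketch, not Prosodic's actual `meter_type` logic; the 0.25 threshold is invented): mostly one-syllable positions imply binary feet, and lines ending on a strong position imply a final-headed (rising) meter.

```python
def classify_meter(mpos_freqs, perc_lines_ending):
    # mostly one-syllable metrical positions -> binary feet (invented 0.25 threshold)
    multi = mpos_freqs.get('ww', 0) + mpos_freqs.get('ss', 0)
    foot = 'binary' if multi < 0.25 else 'ternary'
    # lines mostly ending on a strong position -> final-headed (rising) meter
    head = 'final' if perc_lines_ending.get('s', 0) >= 0.5 else 'initial'
    return {('binary', 'final'): 'iambic',
            ('binary', 'initial'): 'trochaic',
            ('ternary', 'final'): 'anapestic',
            ('ternary', 'initial'): 'dactylic'}[(foot, head)]

print(classify_meter(
    {'w': 0.48, 's': 0.49, 'ww': 0.022, 'ss': 0.007},  # the sonnet's mpos_freqs
    {'s': 0.86, 'w': 0.14},                            # perc_lines_ending
))  # iambic
```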
# repeating beat-length template (e.g. invariable pentameter, ballad meter)
print('feet  scheme:', sonnet.line_scheme)
print('syll  scheme:', sonnet.syllable_scheme)
feet  scheme: {'combo': (5,), 'diff': 8}
syll  scheme: {'combo': (10,), 'diff': 1}

Rhyme detection

Rhyme is computed via feature-weighted edit distance over IPA segments (panphon). 0 = perfect rhyme; higher = slant rhyme.

# pairwise rime distance
sonnet.line1.rime_distance(sonnet.lines[2])  # 'time' vs 'rhyme'
0.0
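For intuition, the distance behaves like an edit distance over the lines' final rimes. A plain unit-cost Levenshtein sketch (Prosodic instead weights each edit by panphon phonological-feature differences, so substitutions between similar segments cost less):

```python
def edit_distance(a, b):
    # unit-cost Levenshtein; panphon feature-weighted costs would replace the 1s
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

# the rimes (nucleus + coda) of 'time' and 'rhyme' are identical -> perfect rhyme
print(edit_distance('aɪm', 'aɪm'))  # 0
print(edit_distance('aɪm', 'iːz'))  # 3: no shared segments, a non-rhyme
```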
# every rhyming line in the text, with its closest partner
for line, (dist, partner) in list(sonnet.get_rhyming_lines().items())[:6]:
    print(f"L{line.num:2d} ↔ L{partner.num:2d}  dist={dist:.2f}  '{line.txt.strip()[:35]}' / '{partner.txt.strip()[:35]}'")
L 3 ↔ L 1  dist=0.00  'And beauty making beautiful old rhy' / 'When in the chronicle of wasted tim'
L 8 ↔ L 6  dist=0.00  'Even such a beauty as you master no' / 'Of hand, of foot, of lip, of eye, o'
L14 ↔ L13  dist=0.00  'Had eyes to wonder, but lack tongue' / 'For we, which now behold these pres'
# per-line rhyme group IDs (0 = no rhyme partner)
print('IDs:    ', sonnet.rhyme_ids)
from prosodic.analysis import nums_to_scheme
print('letters:', ''.join(nums_to_scheme(sonnet.rhyme_ids)))
IDs:     [1, 2, 1, 2, 0, 3, 0, 3, 0, 4, 5, 4, 5, 5]
letters: abab-c-c-dedee
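The ID-to-letter conversion can be sketched as follows (a simplification of `nums_to_scheme`; 0 maps to '-' for unrhymed lines, 1 to 'a', 2 to 'b', and so on):

```python
def ids_to_letters(ids):
    # 0 = no rhyme partner -> '-'; 1 -> 'a', 2 -> 'b', ...
    return ''.join('-' if i == 0 else chr(ord('a') + i - 1) for i in ids)

print(ids_to_letters([1, 2, 1, 2, 0, 3, 0, 3, 0, 4, 5, 4, 5, 5]))
# abab-c-c-dedee
```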

Named rhyme scheme matching

Match observed rhyme groups against a 39-form catalog (Sonnet variants, Couplet, Sestet, Triplet, Rhyme Royal, Spenserian, etc.) by Jaccard similarity over rhyme-edge sets.
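The edge-set comparison can be sketched as follows (a minimal illustration of the idea, not Prosodic's implementation): each scheme becomes the set of line-index pairs it says rhyme, and two schemes are scored by the Jaccard similarity of those sets.

```python
def rhyme_edges(scheme):
    """Set of rhyming line-index pairs implied by a scheme string like 'abab'."""
    groups = {}
    for i, ch in enumerate(scheme.replace(' ', '')):
        groups.setdefault(ch, []).append(i)
    return {(a, b) for lines in groups.values()
            for a in lines for b in lines if a < b}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

observed = rhyme_edges('abab')
print(jaccard(observed, rhyme_edges('abab')))  # 1.0: identical edge sets
print(jaccard(observed, rhyme_edges('abba')))  # 0.0: no shared rhyming pairs
```

Because the comparison is over pairs rather than letters, it tolerates relabeling ('abab' and 'baba' produce the same edge set) and degrades gracefully for partial matches.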

rs = sonnet.rhyme_scheme
print(f"name:     {rs['name']}")
print(f"form:     {rs['form']}")
print(f"accuracy: {rs['accuracy']:.2f}")
print()
print("top candidates:")
for name, form, score in rs['candidates'][:5]:
    print(f"  {score:.2f}  {name:30s} {form}")
name:     Sonnet A
form:     abab cdcd eefeff
accuracy: 0.70

top candidates:
  0.70  Sonnet A                       abab cdcd eefeff
  0.56  Sonnet, Shakespearean          abab cdcd efefgg
  0.43  Sonnet E                       abab cbcd cdedee
  0.40  Sonnet B                       abab cdcd effegg
  0.36  Sonnet D                       ababbcdc ceceff
# form predicates
print('is_sonnet:               ', sonnet.is_sonnet)
print('is_shakespearean_sonnet: ', sonnet.is_shakespearean_sonnet)
is_sonnet:                True
is_shakespearean_sonnet:  False

Tabular summary

text.summary() rolls everything together: per-line parse + rhyme letter + foot/syllable count + ambiguity, plus an estimated-schema block.

print(sonnet.summary())
  #st    #ln  parse        rhyme      #feet    #syll    #parse
-----  -----  -----------  -------  -------  -------  --------
    1      1  -+-+-+-+-+   a              5       10         2
    1      2  -+-+-+-+-+   b              5       10         1
    1      3  -+-+-+-+-+   a              5       10         3
    1      4  -+-+-+-+-+   b              5       10         1
    1      5  -+-+-+-+-+   -              5       10         8
    1      6  -+-+-+-+-+   c              5       10         1
    1      7  -+--++-+-+   -              4       10         8
    1      8  +-+-+-+-+-+  c              6       11         2
    1      9  -+-+-+-+--   -              4       10         3
    1     10  -+-+-+-+--   d              4       10         6
    1     11  -+-+-+-+-+   e              5       10         2
    1     12  -+-+-+-+-+   d              5       10         2
    1     13  -+-+-+-+-+   e              5       10         2
    1     14  -+-+-+-+-+   e              5       10         3


estimated schema
----------
meter: Iambic
feet: Pentameter
syllables: 10
rhyme: Sonnet A (abab cdcd eefeff)

MaxEnt weight learning

Meter.fit() learns constraint weights from a target scansion (or annotated data) using L-BFGS-B Maximum Entropy optimization (Goldwater & Johnson 2003 / Hayes MaxEnt OT). The learned weights can be split by syllable position (zones) so positional sensitivity transfers to parsing.

# Train weights to match an iambic pentameter target across all sonnet lines
import warnings
warnings.filterwarnings('ignore')

meter = prosodic.Meter()
meter.fit(sonnet, 'wswswswsws', zones=3)

print('top learned weights (zone × constraint):')
for name, w in sorted(meter.zone_weights.items(), key=lambda x: -abs(x[1]))[:8]:
    print(f"  {w:+.3f}  {name}")
[0.83s] prosodic.parsing.maxent.MaxEntTrainer._build_line_data(): 1/14 lines had no matching scansion among parser candidates (syllable count mismatch?)


top learned weights (zone × constraint):
  +6.162  s_unstress_z1
  +4.707  unres_within_z3
  +4.149  unres_across_z2
  +3.641  unres_across_z3
  +3.557  unres_within_z2
  +2.884  s_unstress_z3
  +2.374  unres_across_z1
  +2.075  w_stress_z2
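The objective being optimized can be sketched in miniature (a hedged illustration of the MaxEnt idea, not Prosodic's trainer): each candidate parse gets probability proportional to exp of its negative harmony, the weighted violation sum, and training raises the weights that push probability onto the target parse.

```python
import math

def parse_probs(violation_vectors, weights):
    # harmony = weighted violation sum; P(parse) proportional to exp(-harmony)
    harmonies = [sum(w * v for w, v in zip(weights, vec))
                 for vec in violation_vectors]
    z = sum(math.exp(-h) for h in harmonies)
    return [math.exp(-h) / z for h in harmonies]

# two candidate parses; the first (the target) violates constraint 1 less often
cands = [(0, 1), (2, 1)]
low = parse_probs(cands, weights=[1.0, 1.0])
high = parse_probs(cands, weights=[5.0, 1.0])
print(low[0], high[0])  # raising constraint 1's weight pushes probability toward the target
```

L-BFGS-B searches the weight space to maximize the (log-)probability of the target scansions, which is what drives the large positive weights above.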

Phrasal stress (optional)

With syntax=True, Prosodic uses spaCy's dependency parser to compute phrasal prominence (Liberman & Prince 1977) per word. This adds a phrasal_stress column to the syllable DataFrame and enables the w_prom and s_demoted constraints. Requires pip install prosodic[syntax].

t = prosodic.Text("...", syntax=True)
t.parse()
# phrasal_stress: 0 = sentence root, -1 = direct dependent, deeper = more embedded

Save and load

Parquet-backed save/load preserves the syllable DataFrame and any computed parse results — no need to re-parse on reload.

import tempfile, os, shutil
out = tempfile.mkdtemp(prefix='prosodic_demo_')
sonnet.save(out)
print('saved files:')
for f in sorted(os.listdir(out)):
    print(f'  {f}')

# reload
loaded = prosodic.TextModel.load(out)
print(f'\nreloaded: {len(loaded.lines)} lines, parse cached?',
      loaded._cached_parsed_df is not None)
shutil.rmtree(out)
saved files:
  meta.json
  parsed.parquet
  syll.parquet
  text.txt.gz



reloaded: 14 lines, parse cached? True

Web app

A hosted instance is live at prosodic.app — no install required. To run it locally:

prosodic web                     # http://127.0.0.1:8181
prosodic web --port 5111
prosodic web --dev               # auto-reload backend + frontend

Five tabs: Parse (text input + corpus dropdown + sortable, paginated results), Line (single-line scansion detail showing all candidates), Meter (constraint config + weights), MaxEnt (annotated-data training), Settings. See prosodic/web/ for the implementation.

Remote client

If you have access to a Prosodic server (prosodic web or prosodic.app), you can use the remote client to parse without installing torch / espeak / numpy locally — only requests is required.

import prosodic
prosodic.set_server('https://prosodic.app')

t = prosodic.Text("From fairest creatures we desire increase")
t.parse()                            # delegates to /api/parse
print(t.lines[0].best_parse.meter_str)

result = t.fit(target_scansion='wswswswsws', zones=3)  # delegates to /api/maxent/fit
print(result.weights, result.accuracy)

About

Prosodic: a metrical-phonological parser, written in Python. For English and Finnish, with flexible language support.
