TalkLess uses an optimization-based approach to select the best transcript edits. For each segment, we generate 25 candidate shortened transcripts and score each one with a multi-objective function that balances four criteria.
Evaluation Function
Variable Definitions
- $S_i$: The $i$-th original transcript segment
- $C_{i,j}$: The $j$-th candidate shortened transcript for segment $S_i$
- $\tau$: Target compression ratio (e.g., 0.15, 0.25, 0.5, 0.75)
- $\lambda_1, \lambda_2, \lambda_3, \lambda_4$: Weighting coefficients for each optimization component
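The role of the weighting coefficients can be illustrated with a minimal sketch. It assumes the objective is a plain weighted sum of component scores and that every component is oriented so that higher is better; neither assumption is confirmed by the text above. The names `Component`, `total_score`, and `select_best` are hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical type: a component maps (original, candidate) to a score
# where higher is better.
Component = Callable[[str, str], float]


def total_score(original: str, candidate: str,
                components: Sequence[Component],
                weights: Sequence[float]) -> float:
    """Weighted sum of component scores (an assumed form of the objective)."""
    if len(components) != len(weights):
        raise ValueError("need exactly one weight per component")
    return sum(w * f(original, candidate) for w, f in zip(weights, components))


def select_best(original: str, candidates: Sequence[str],
                components: Sequence[Component],
                weights: Sequence[float]) -> str:
    """Return the candidate with the highest weighted score."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    return max(candidates,
               key=lambda c: total_score(original, c, components, weights))
```

With the weights exposed as plain arguments, the trade-off between components can be changed without touching the components themselves.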
Optimization Components
1. Compression Score
Measures how well the candidate achieves the target compression ratio $\tau$. Higher scores indicate better alignment with the desired compression level.
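One possible way to score alignment with $\tau$ is sketched below. It assumes that $\tau$ is the fraction of words retained, that length is measured in whitespace-separated words, and that the score falls off linearly with the distance from the target; the exact formula used by TalkLess is not given here, and `compression_score` is a hypothetical name.

```python
def compression_score(original: str, candidate: str, tau: float) -> float:
    """Score in [0, 1]; 1.0 when the candidate/original word ratio equals tau.

    Assumption: tau is the fraction of words kept, and the penalty is linear.
    """
    n_original = len(original.split())
    if n_original == 0:
        raise ValueError("original segment is empty")
    ratio = len(candidate.split()) / n_original
    # Linear falloff with distance from the target, floored at zero.
    return max(0.0, 1.0 - abs(ratio - tau))
```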
2. Number of Edits
Penalizes candidates that require many edit operations, since each edit is a potential audio artifact. The edit count is the minimum edit distance between the original and shortened transcripts, computed with the Needleman-Wunsch algorithm and normalized by segment length.
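The edit-count computation can be sketched as a Needleman-Wunsch style global alignment with unit costs, which yields the minimum edit distance. The sketch assumes word-level tokens, a cost of 1 for every gap or mismatch, and normalization by the number of words in the original segment; the function names are hypothetical.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn token sequence a into token sequence b."""
    # prev[j] holds the cost of aligning the processed prefix of a with b[:j].
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]  # aligning a[:i] with an empty prefix costs i deletions
        for j, tok_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                      # delete tok_a
                curr[j - 1] + 1,                  # insert tok_b
                prev[j - 1] + (tok_a != tok_b),   # match or substitute
            ))
        prev = curr
    return prev[-1]


def normalized_edits(original: str, candidate: str) -> float:
    """Edit distance divided by the original length in words (assumed)."""
    orig_tokens = original.split()
    if not orig_tokens:
        raise ValueError("original segment is empty")
    return edit_distance(orig_tokens, candidate.split()) / len(orig_tokens)
```

Only two rows of the alignment table are kept at a time, so memory grows with the candidate length rather than with the product of both lengths.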
3. Insertion Length
Favors candidates that preserve important content without relying on long insertions. Lengthy inserted phrases are penalized because they may disrupt natural speech flow.
4. Coverage Score
Ensures that important information is preserved. Each sentence $s$ in the original segment is matched to its most similar sentence $c$ in the candidate using sentence-transformer embeddings, and the resulting similarity scores are averaged over all original sentences.
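The match-then-average step can be sketched as follows. To keep the example self-contained, the similarity function is a simple token-overlap (Jaccard) stand-in rather than sentence-transformer embeddings; it is passed in as a parameter so that an embedding-based cosine similarity could replace it. The names `token_jaccard` and `coverage_score` are hypothetical, and sentence splitting is assumed to have been done already.

```python
from typing import Callable, Sequence


def token_jaccard(a: str, b: str) -> float:
    """Stand-in similarity: word-set overlap. Not an embedding similarity."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def coverage_score(original_sentences: Sequence[str],
                   candidate_sentences: Sequence[str],
                   similarity: Callable[[str, str], float] = token_jaccard) -> float:
    """Average, over original sentences, of the best similarity to any
    candidate sentence."""
    if not original_sentences:
        raise ValueError("original segment has no sentences")
    if not candidate_sentences:
        return 0.0  # nothing in the candidate can cover the original
    best_matches = [
        max(similarity(s, c) for c in candidate_sentences)
        for s in original_sentences
    ]
    return sum(best_matches) / len(best_matches)
```

Because each original sentence only needs one good match, a candidate that merges or drops redundant sentences is not penalized as long as every original sentence is still represented.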