I’ve spent a lot of money on music over the years, and one website I have bought MP3s from is JunoDownload. It’s a digital download store used predominantly by DJs, with a huge back catalogue of tracks for sale.
It’s a great music resource, and they provide a generous two-minute sample MP3 for each song they have for sale. The only problem is that it’s really hard to find music on the site that isn’t a new release or currently at the top of the sales charts.
The website is heavily geared towards promoting new content, and that makes sense as it’s going to be the new music that generates the most revenue – but what about the other 99% of tracks for sale on the website?
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment).
aeneas automatically generates a synchronization map between a list of text fragments and an audio file containing the narration of the text. In computer science, this task is known as (automatically computing a) forced alignment.
The synchronization map can be output to file in several formats, depending on its application:
- research: Audacity (AUD), ELAN (EAF), TextGrid;
- digital publishing: SMIL for EPUB 3;
- closed captioning: SubRip (SRT), SubViewer (SBV/SUB), TTML, WebVTT (VTT);
- Web: JSON;
- further processing: CSV, SSV, TSV, TXT, XML.
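
To make the idea of a synchronization map concrete, here is a minimal sketch in plain Python. It does not use aeneas and does not reproduce its output exactly: the `Fragment` class, the helper names, the JSON layout, and the sample timings are all assumptions made for illustration. It only shows how one list of timed text fragments could be serialized to two of the formats listed above (SRT and JSON).

```python
import json
from dataclasses import dataclass


@dataclass
class Fragment:
    """One entry of a (hypothetical) synchronization map."""
    begin: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # the text fragment spoken in that interval


def to_srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SubRip."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(fragments) -> str:
    """Serialize fragments as numbered SubRip cues separated by blank lines."""
    cues = []
    for index, frag in enumerate(fragments, start=1):
        timing = f"{to_srt_timestamp(frag.begin)} --> {to_srt_timestamp(frag.end)}"
        cues.append(f"{index}\n{timing}\n{frag.text}")
    return "\n\n".join(cues) + "\n"


def to_json(fragments) -> str:
    """Serialize fragments as a JSON array (layout assumed, not aeneas's own)."""
    records = [{"begin": f.begin, "end": f.end, "text": f.text} for f in fragments]
    return json.dumps(records, indent=2)


# Made-up timings, standing in for what a forced aligner would compute.
sync_map = [
    Fragment(0.0, 2.5, "From fairest creatures we desire increase,"),
    Fragment(2.5, 5.12, "That thereby beauty's rose might never die,"),
]

if __name__ == "__main__":
    print(to_srt(sync_map))
    print(to_json(sync_map))
```

The same in-memory map feeds both writers, which is why a single alignment can serve captioning, Web, and further-processing use cases alike.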