TREC Podcasts Track

The Podcasts Track at the Text Retrieval Conference (TREC) was intended to encourage research in podcast retrieval and access by researchers from the information retrieval, NLP, and speech analysis fields. The track was first organised in 2020 and continued with small modifications in 2021. It did not run in 2022, and the organisers intended to explore a modified relaunch in 2023 in some form. In 2022, the related CHIIR workshop on Audio Collection Human Interaction discussed use cases and usage situations of podcast material.

Dataset

The core data for the challenge is the Spotify English-language Podcast Dataset, described here. Unfortunately, due to shifting priorities, the team no longer maintains the dataset and, as of December 2023, no longer takes requests to access it.

Tasks

The Podcasts Track was organised with two challenge tasks: segment retrieval and podcast episode summarization.

Task 1: Segment retrieval

Given a query, retrieve relevant two-minute segments from the data. A segment is a two-minute chunk starting on the minute, e.g. [0.0-119.9] seconds, [60-179.9] seconds, [120-239.9] seconds, etc., so consecutive segments overlap by one minute. The two-minute segments were judged manually by NIST assessors for their relevance to the query, using both the transcript of the podcast and the corresponding audio segment. Assessments were made on a graded scale of Perfect, Excellent, Good, Fair, Bad. (The Perfect score pertains only to queries with a given target item.)
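The segment definition above can be made concrete with a short sketch. It is illustrative only and not part of the track's tooling: the function names are hypothetical, windows are written as half-open [start, end) intervals rather than the [0.0-119.9] notation used above, and clipping the last windows to the episode length is an assumption, since this page does not say how the end of an episode was handled.

```python
from typing import List, Tuple

SEGMENT_LENGTH_S = 120  # two-minute segments, per the task definition
SEGMENT_STEP_S = 60     # a new segment starts on every minute


def segment_windows(episode_duration_s: float) -> List[Tuple[float, float]]:
    """Return half-open (start, end) offsets in seconds for each segment.

    Assumption: windows that would run past the end of the episode are
    clipped to the episode duration.
    """
    windows = []
    start = 0
    while start < episode_duration_s:
        end = min(start + SEGMENT_LENGTH_S, episode_duration_s)
        windows.append((start, end))
        start += SEGMENT_STEP_S
    return windows


def segments_containing(time_s: float) -> List[int]:
    """Return the start offsets of the segments that cover a given time."""
    latest_start = int(time_s // SEGMENT_STEP_S) * SEGMENT_STEP_S
    # Because segments overlap by one minute, the previous segment
    # (if there is one) also covers this time.
    return [s for s in (latest_start - SEGMENT_STEP_S, latest_start) if s >= 0]


if __name__ == "__main__":
    print(segment_windows(300))
    print(segments_containing(130))
```

Because of the one-minute overlap, any moment after the first minute of an episode falls inside two candidate segments, which is worth keeping in mind when deduplicating retrieval results.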

In 2020 there were three types of queries: topical, known-item, and refinding queries. In 2021 the refinding and known-item queries were combined, and non-topical target notions were added, asking participants to find segments that are entertaining, opinionated, or contain discussion.

Resources

Search topics for the 2020 segment retrieval task
Relevance assessments (“qrels”)

Task 2: Summarization

Given a podcast episode, its audio, and its transcription, return a short text snippet capturing the most important information in the content. Returned summaries should be grammatical, standalone utterances of significantly shorter length than the input episode description. The quality of the summary was assessed manually by NIST assessors on a graded scale of Excellent, Good, Fair, Bad, and automatically via the standard ROUGE metric against the creator-provided descriptions of the episodes.
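To give a sense of what a ROUGE-style comparison against a creator-provided description measures, here is a minimal sketch of unigram overlap (ROUGE-1 F1). It is a simplification for illustration only: this page does not state which ROUGE variant or implementation the track used, and the whitespace tokenisation, lowercasing, and function name are assumptions made here.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate summary and a reference text.

    Assumption: tokens are lowercased, whitespace-separated words. Real
    ROUGE implementations typically add stemming and other normalisation.
    """
    candidate_counts = Counter(candidate.lower().split())
    reference_counts = Counter(reference.lower().split())

    # Clipped overlap: each word counts at most as often as it occurs
    # in both texts.
    overlap = sum((candidate_counts & reference_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(candidate_counts.values())
    recall = overlap / sum(reference_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    summary = "an interview about training for a first marathon"
    description = "in this episode we talk about training for your first marathon"
    print(round(rouge1_f1(summary, description), 3))
```

A score of this kind only rewards word overlap with the episode description, which is one reason the track paired it with manual assessment.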

In 2020, the objective for the task was “to provide a short text summary that a user might read when deciding whether to listen to a podcast. The summary should accurately convey the content of the podcast, be human-readable, and be short enough to be quickly read on a smartphone screen.”

In 2021 the objective was reworded very slightly to give participants more guidance on the type of summary sought, and participants were also asked to submit an audio clip giving the listener a sense of what the episode sounds like.

Resources

Submitted summaries from the 2020 edition can be found in the data set directory.

Reports

An overview of the 2020 edition of the track is given in

Rosie Jones, Ben Carterette, Ann Clifton, Maria Eskevich, Gareth J. F. Jones, Jussi Karlgren, Aasish Pappu, Sravana Reddy, and Yongze Yu. 2020. TREC 2020 Podcasts Track Overview. Proceedings from the 29th Text Retrieval Conference (TREC). NIST.

A first version of the overview of the 2021 edition of the track is given in

Rosie Jones, Ben Carterette, Ann Clifton, Maria Eskevich, Gareth J. F. Jones, Jussi Karlgren, Aasish Pappu, Sravana Reddy, and Yongze Yu. 2020. TREC 2020 Podcasts Track Overview. (Notebook version: final version to appear in the TREC proceedings in early 2022)

Reports from the individual participants can be found on the TREC website

TREC 2020 publications

Organisers

2020

Ann Clifton, Spotify
Sravana Reddy, Spotify
Yongze Yu, Spotify
Aasish Pappu, Spotify
Jussi Karlgren, Spotify
Ben Carterette, Spotify
Jen McFadden, Spotify
Gareth Jones, Dublin City University
Maria Eskevich, CLARIN ERIC
Rosie Jones, Spotify

2021

Ben Carterette, Spotify
Ann Clifton, Spotify
Maria Eskevich, CLARIN ERIC
Gareth J. F. Jones, Dublin City University
Rosie Jones, Spotify
Jussi Karlgren, Spotify
Sravana Reddy, Spotify
Md Iftekhar Tanveer, Spotify