TREC Podcasts Track

The Podcasts Track at the Text Retrieval Conference (TREC) is intended to encourage research in podcast retrieval and access by researchers from the information retrieval, natural language processing (NLP), and speech analysis fields. The track, which was first organised in 2020 and continued with small modifications in 2021, will not run in 2022; the organisers will instead explore a potential relaunch, in modified form, in 2023. In the meantime, the CHIIR workshop on Audio Collection Human Interaction welcomes participants with ideas about the use cases and usage situations of podcast material!


The core data for the challenge is the Spotify English-language Podcast Dataset, which is available for non-commercial use.


The Podcasts Track has two challenge tasks: segment retrieval and podcast episode summarization.

Task 1: Segment retrieval

Given a query, retrieve relevant two-minute segments from the data. A segment is a two-minute chunk starting on the minute, e.g. [0.0-119.9] seconds, [60.0-179.9] seconds, [120.0-239.9] seconds, etc., so consecutive segments overlap by one minute. The segments are judged manually by NIST assessors for their relevance to the query, using both the transcript of the podcast and the corresponding audio segment. Assessments are made on a graded scale of Perfect, Excellent, Good, Fair, Bad. (The Perfect grade applies only to queries with a given target item.)
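Because segments start on every minute but last two minutes, each point in an episode belongs to at most two segments. The following Python sketch enumerates segment boundaries for an episode and finds the segments that contain a given timestamp. It is illustrative only: the function names are hypothetical, and clipping the final segments at the episode's end is an assumption rather than a documented track rule.

```python
import math
from typing import List, Tuple

SEGMENT_LENGTH = 120.0  # each segment spans two minutes
SEGMENT_STEP = 60.0     # a new segment starts on every minute


def segment_bounds(duration: float) -> List[Tuple[float, float]]:
    """Return (start, end) pairs in seconds for all segments of an episode.

    Assumption (hypothetical): segments that would run past the end of the
    episode are clipped at the episode duration.
    """
    bounds = []
    start = 0.0
    while start < duration:
        bounds.append((start, min(start + SEGMENT_LENGTH, duration)))
        start += SEGMENT_STEP
    return bounds


def segments_containing(t: float, duration: float) -> List[float]:
    """Return the start times of the segments that contain timestamp t.

    Since a segment lasts two steps, only the segment starting at the latest
    minute boundary and the one starting a minute earlier can contain t.
    """
    if not 0.0 <= t < duration:
        return []
    latest = math.floor(t / SEGMENT_STEP) * SEGMENT_STEP
    candidates = (latest - SEGMENT_STEP, latest)
    return [s for s in candidates if s >= 0.0 and t < s + SEGMENT_LENGTH]


if __name__ == "__main__":
    print(segment_bounds(300.0))
    print(segments_containing(130.0, 300.0))  # segments starting at 60 s and 120 s
```

The overlap means a relevant passage near a minute boundary can be returned through either of two segments, so a system may need to decide which one to rank.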

In 2020 there were three types of queries: topical, known-item, and refinding queries. In 2021, the refinding and known-item queries were combined, and non-topical target notions were added to get participants to find segments that are entertaining, that are opinionated, and that contain discussion.


Task 2: Summarization

Given a podcast episode, its audio, and its transcription, return a short text snippet capturing the most important information in the content. Returned summaries should be grammatical, standalone utterances of significantly shorter length than the input episode description. The quality of the summaries is assessed manually by NIST assessors on a graded scale of Excellent, Good, Fair, Bad, and automatically with the standard ROUGE metric against the creator-provided descriptions of the episodes.
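ROUGE scores a summary by its n-gram overlap with a reference text, here the creator-provided episode description. The Python sketch below computes a simplified ROUGE-1 (unigram) precision, recall, and F1. It is not the official scorer: the lowercase alphanumeric tokenization and the absence of stemming and stopword handling are simplifying assumptions, and the function names are hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens (simplified)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1(candidate: str, reference: str) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of unigram overlap, clipped by counts."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    print(rouge1("the cat sat", "the cat sat on the mat"))
```

A short summary can reach high precision but low recall against a longer description, which is why the F1 combination, rather than either value alone, is usually reported.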

In 2020, the objective for the task was "to provide a short text summary that a user might read when deciding whether to listen to a podcast. The summary should accurately convey the content of the podcast, be human-readable, and be short enough to be quickly read on a smartphone screen."

In 2021, the objective was slightly reworded to give participants more guidance on the type of summary we are looking for, and participants were also asked to submit an audio clip that gives the listener a sense of what the episode sounds like.



An overview of the 2020 edition of the track is given in

A first version of the overview of the 2021 edition of the track is given in

Reports from the individual participants can be found on the TREC website.