
TREC 2021 Podcasts Track Guidelines

Guidelines V2.9, September 1, 2021

The latest change makes the format of an “Episode URI” more explicit.

Task 1: Fixed-length Segment Retrieval

Given a retrieval topic (a phrase, sentence, or set of words) and a set of ranking criteria, retrieve and rank relevant two-minute segments from the data. A segment is a two-minute chunk starting on the minute; e.g. [0.0-119.9] seconds, [60.0-179.9] seconds, [120.0-239.9] seconds, etc. The segments are to be submitted as four separately ranked lists.
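
As an informal illustration (not part of the official task definition), the sketch below enumerates the two-minute segment start offsets for an episode and joins an episode URI with an offset in the style used later in the submission format; the episode URI and duration are hypothetical.

def segment_offsets(duration_seconds, step=60.0):
    """Yield the start time (in seconds) of every two-minute segment
    that begins on a whole minute within the episode."""
    offset = 0.0
    while offset < duration_seconds:
        yield offset
        offset += step

def segment_id(episode_uri, offset):
    """Join an episode URI and a start offset into an
    EPISODEID_OFFSET identifier, e.g. '..._360.0'."""
    return "%s_%.1f" % (episode_uri, offset)

# Hypothetical five-minute episode:
# for off in segment_offsets(300.0):
#     print(segment_id("spotify:episode:EXAMPLEID", off))
# -> ..._0.0, ..._60.0, ..._120.0, ..._180.0, ..._240.0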

Topics for the Fixed-length Segment Retrieval Task

Topics consist of a topic number, keyword query, a query type, and a description of the user’s information need. The query types in 2021 are “topical” and “known-item”.

The test topics for 2021 are found on the active participants’ site at NIST. You need to register for the track to be able to access them.

Topical queries

40 of the 50 segment retrieval queries are of this type.

<topic>
<num>3</num>
<query>black hole image</query>
<type>topical</type>
<description>In May 2019 astronomers released the first-ever picture of a black hole. I would like to hear some conversations and educational discussion about the science of astronomy, black holes, and of the picture itself.</description>
</topic>
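
As a rough sketch (assuming the test topics are distributed as <topic> elements like the one above, wrapped in a single root element; the exact file packaging is determined by NIST), the topics can be read with a standard XML parser:

import xml.etree.ElementTree as ET

def parse_topics(path):
    """Parse a topics file into a list of dicts with num, query,
    type, and description fields."""
    topics = []
    for t in ET.parse(path).getroot().iter("topic"):
        topics.append({
            "num": int(t.findtext("num")),
            "query": t.findtext("query"),
            "type": t.findtext("type"),
            "description": t.findtext("description"),
        })
    return topics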

Ranking criteria for topical queries

For topical queries, as in 2020, participants are asked to submit a ranked list of topically relevant segments for each query topic. In addition, this year we ask for three reranked lists of those same topically relevant segments. This means that for each query topic we expect four segment lists. For some queries the reranking may have little effect, which we look forward to studying after submissions are in.

Assessment of topical queries in the fixed-length segment retrieval task

The submitted two-minute segments will be judged by NIST assessors for their topical relevance to the topic description, and each relevant segment will also be assessed for its adherence to the reranking criteria. NIST assessors will have access to both the text transcript of the episode (including text before and after the two-minute segment, which can be used as context) and the corresponding audio segment.

For the ad hoc submission, the assessments will be made on the PEGFB graded scale (Perfect, Excellent, Good, Fair, Bad) as follows:

For the three other criteria, the assessments will be made on a three-grade scale as follows:

Known-item queries

<topic>
<num>54</num>
<query>bias in college admissions</query>
<type>known item</type>
<description>I read an article that mentioned a podcast about bias in college admissions.  I would like to listen to it but I don’t know the name of the show.</description>
</topic>

Assessment of known-item queries in the fixed-length segment retrieval task

Evaluation Metric for fixed-length segment retrieval task

The primary metrics for evaluation will be nDCG at a cutoff of 30 documents, precision at 10, and nDCG over the entire ranked list. A single episode may contribute one or more relevant segments, some of which may overlap, but these will be treated as independent items for the purpose of nDCG computation. We expect to assess at a pool depth of 10, meaning the top 10 segments of each ranked list.
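
For reference, a minimal sketch of the nDCG@k computation named above follows, using the common log2 discount. The mapping from PEGFB grades to gain values is set by the organizers and is assumed here only for illustration, and the ideal ranking is computed over the judged gains of the submitted list for simplicity.

import math

def dcg(gains, k=None):
    """Discounted cumulative gain over the first k gains."""
    gains = gains[:k] if k is not None else gains
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(ranked_gains, k=30):
    """nDCG at cutoff k for one topic's ranked list of graded gains."""
    ideal = dcg(sorted(ranked_gains, reverse=True), k)
    return dcg(ranked_gains, k) / ideal if ideal > 0 else 0.0

# Hypothetical gain mapping, e.g. Perfect=4 ... Bad=0:
# ndcg([4, 0, 2, 3], k=30)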

Submission Format for the fixed-length segment retrieval task

Submissions for the ad hoc retrieval task should be in standard whitespace-delimited TREC 6-column format.

Note: the submission format has been changed in V2.5 of these instructions.

TOPICID   QTYPE   EPISODEID_OFFSET   RANK   SCORE   RUNID

Participants may submit up to 4 runs. Each run should be a single 6-column file containing ranked results for all topics in the test set, with at most 1,000 segments per topic. The QTYPE field is used to distinguish the various ranking criteria for each topic.

Known-item topics are only to be submitted with the QR ranking, since the various reranking criteria are not applicable to them. The RANK field must be an ascending integer starting from 1, restarting for each ranking criterion. Note that the assessment will be made from the top of the ranked list down to the pool depth, which will be determined after submissions are in but is almost certain to be less than 100 and likely to be less than 50.

Example submission

9 QR spotify:episode:3ZUU9IO0V8kaZaUPD6qqDY_360.0 1 121.2 myrun1
9 QR spotify:episode:000HP8n3hNIfglT2wSI2cA_60.0 2 87.8 myrun1
...
9 QS spotify:episode:0CDSmklC42307Ktg86ER7Y_840.0 1 87.98 myrun1
9 QS spotify:episode:000A9sRBYdVh66csG2qEdj_120.0 2 67.2 myrun1
...
9 QD spotify:episode:6O8djf3RL94yNfaoWqvk3r_840.0 1 4.82 myrun1
9 QD spotify:episode:6v0auT8BbeXzMTmg9FjyDB_1980.0 2 1.17 myrun1
...
9 QE spotify:episode:0bI9g00ZrF5VczfdkWds7a_2280.0 1 82.0 myrun1
9 QE spotify:episode:000HP8n3hNIfglT2wSI2cA_480.0 2 31.4 myrun1
...
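
As an informal sanity check before submitting (the file name is hypothetical), a run file can be verified to have six whitespace-delimited columns, ranks ascending from 1 for each topic and ranking criterion, and at most 1,000 segments per list:

def check_run(path, max_per_list=1000):
    """Validate the 6-column format and rank ordering of a run file."""
    counts = {}
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            topic, qtype, segment, rank, score, runid = line.split()
            key = (topic, qtype)
            counts[key] = counts.get(key, 0) + 1
            assert int(rank) == counts[key], "RANK must ascend from 1 per list"
            assert counts[key] <= max_per_list, "at most 1,000 segments per list"
            float(score)  # SCORE must be numeric

# check_run("myrun1.txt")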

Timeline for segment retrieval task

Task 2: Summarization

The user task is to provide a short description of a podcast episode to help the user decide whether to listen to it. This user task is the background for the assessment of both the text snippet and the audio clip.

Given a podcast episode, its audio, and its transcription, return a short text snippet and a short audio clip that summarize the episode for the user.

Assessment criteria for the summarization task

The assessment and scoring of the text snippet will be done similarly to last year, on the EGFB (Excellent, Good, Fair, Bad) scale, approximately as follows:

The audio clips will be assessed with a yes/no question:

Evaluation Set for Summarization

The evaluation set will consist of 500 held-out episodes used to test the summarization systems. These test episodes will be provided later in the task.

Submission Format for the summarization task

A run will comprise exactly two files per summary, where the name of each summary file is the ID of the episode it is intended to represent, with the suffix “_summary.txt” appended for the text summary file and “_clips.ogg” appended for the audio file. Please include both files for every episode, even if your system happens to produce no output for that episode. Each text summary file will be read and assessed as a plain text file, so no special characters or markup are allowed. The audio file is expected in an OGG container; most reasonable, listenable audio formats are acceptable. The summary files should be delivered in a directory whose name is the concatenation of the Team ID and a number (1 to N) for the run. (For example, if the Team ID is “SYSX” then the directory name for the first run should be “SYSX1”.) Please package the directory in a tarfile and gzip the tarfile before submitting it to NIST.
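
The following is a minimal packaging sketch under the naming rules above; the team ID, run number, episode ID, and the source of the summary text and OGG audio bytes are assumptions standing in for your own system’s output.

import os
import tarfile

def package_run(team_id, run_number, summaries, out_dir="."):
    """summaries maps an episode ID to a (summary_text, ogg_audio_bytes) pair.
    Writes <EPISODEID>_summary.txt and <EPISODEID>_clips.ogg into a
    directory named <TEAMID><RUN_NUMBER> and produces a gzipped tarfile."""
    run_dir = os.path.join(out_dir, "%s%d" % (team_id, run_number))
    os.makedirs(run_dir, exist_ok=True)
    for episode_id, (text, audio) in summaries.items():
        with open(os.path.join(run_dir, episode_id + "_summary.txt"), "w") as f:
            f.write(text)
        with open(os.path.join(run_dir, episode_id + "_clips.ogg"), "wb") as f:
            f.write(audio)
    # Package the run directory and gzip it for submission to NIST.
    with tarfile.open(run_dir + ".tar.gz", "w:gz") as tar:
        tar.add(run_dir, arcname=os.path.basename(run_dir))

# Example (hypothetical episode ID and content):
# package_run("SYSX", 1, {"EXAMPLEID": ("A short summary.", b"...")})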

Timeline for summarization task

TREC Podcasts 2021 Track Organizers