Understanding the Lede: Taking Action Against Information Overload

Spotting the most relevant and informative section of news articles

By: Andrii Elyiv, Nikhil Aggarwal and Aldo Visibelli

Living in a fast-paced society characterized by information overload, also known as infobesity or infoxication, strongly limits the processing capacity of decision makers and is likely to seriously hamper decision quality. The ability of individuals and organizations to promptly make sense of the key messages within large volumes of information is therefore becoming indispensable.

The high volume of contradictory and, in some cases, unreliable content published online about the recent Covid-19 crisis is a clear example of the challenges that individuals and organizations must address to make well-informed decisions. Overwhelmed by the wide variety of narratives and perspectives, they are looking for efficient text mining solutions that speed up the process of understanding and interpreting news. They should be able to grasp the meaning of a piece by reading only one section of it. Extracting comprehensive summaries and highly informative passages may be a viable first step.

Information overload has put pressure on text analysis and text mining technologies to deliver an automated solution capable of identifying the lede across a huge sample of news articles in different languages and with different structures. Identifying the lede means spotting the key message within a piece of content. It is a significant first step towards understanding published content, making sense of large volumes of unstructured text data, and producing relevant news summaries.

The lede, or lead paragraph (sometimes shortened to lead), is journalistic jargon for the introductory portion of a news story. Strictly speaking, the lede is the first sentence or short portion of an article that gives the gist of the story and contains the most important points readers need to know. Journalistic ledes aim to grab readers’ attention. Failing to mention the most important or attention-grabbing elements of a story is sometimes called “burying the lede”. In journalism, the first paragraph that summarizes or introduces the story is also called the “blurb paragraph”, “teaser text” or, in the United Kingdom, the “standfirst”.

The lede usually does not exceed 40 words and can be found at the very beginning of the piece of content. The following analysis shows our attempt to spot the most relevant and informative section of news articles. We assessed the importance of sentences within news articles using nearly 13,000 news articles and their brief human-written summaries. The level of importance, shown in the 2D plot below, is based on the occurrence of words that appear in both the human-written summary and the article’s body.

Source: Connexun’s News API
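
The overlap-based notion of importance described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method actually used for the plot: the function names `tokenize` and `sentence_importance`, the simple regex tokenizer, and the choice to normalize the overlap by the sentence's own token count are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its word tokens (assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())


def sentence_importance(sentences, summary):
    """Score each sentence by the share of its tokens found in the summary.

    Assumption: importance = shared tokens / sentence tokens. The article
    does not specify the exact formula, so this is one plausible reading.
    """
    summary_words = set(tokenize(summary))
    scores = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        if not tokens:
            # An empty sentence carries no information.
            scores.append(0.0)
            continue
        shared = sum(1 for token in tokens if token in summary_words)
        scores.append(shared / len(tokens))
    return scores


# Toy article body and human-written summary (invented for illustration).
sentences = [
    "The city council approved the new budget on Monday.",
    "Council members debated for several hours.",
    "Local weather was sunny.",
]
summary = "City council approved the budget."
scores = sentence_importance(sentences, summary)
```

In this toy example the first sentence shares most of its words with the summary and therefore receives the highest score, while the last sentence shares none and scores zero.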

We used two attributes for the analysis: the length of a sentence relative to the average sentence length in the text (Y-axis) and the sentence's position within the body of the article (X-axis). As shown above, the longer a sentence is and the closer it is to the end of the article, the less informative it tends to be. The yellow area marks the most informative, or “important”, part of the news lede, which usually lies between the 4th and 9th sentences of the piece of content.
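
As a rough sketch, the two attributes could be computed as follows. The helper name `sentence_features` is hypothetical, and two details are assumptions because the article does not state them: length is measured in whitespace-separated words, and position is the 1-based index of the sentence in the article body.

```python
def sentence_features(sentences):
    """Return (position, relative_length) for each sentence.

    position: 1-based index of the sentence in the article body (assumed).
    relative_length: word count divided by the mean word count (assumed unit).
    """
    lengths = [len(sentence.split()) for sentence in sentences]
    if not lengths:
        return []
    mean_length = sum(lengths) / len(lengths)
    if mean_length == 0:
        # All sentences are empty, so relative length is undefined; use 0.0.
        return [(index, 0.0) for index in range(1, len(lengths) + 1)]
    return [
        (index, length / mean_length)
        for index, length in enumerate(lengths, start=1)
    ]


# Toy article body (invented for illustration): 4, 8 and 12 words.
article = [
    "Markets fell sharply today.",
    "Analysts blamed rising interest rates and weak earnings.",
    "Investors expect the central bank to hold rates steady until next spring.",
]
features = sentence_features(article)
```

Each resulting pair corresponds to one point in a plot like the one described: position on the X-axis and relative length on the Y-axis. A value above 1 on the Y-axis means the sentence is longer than the article's average.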

For more information on Connexun, follow us on LinkedIn.

Connexun | news api

Connexun is the ultimate AI news engine — turning unstructured news content into multi-purpose actionable data.