A brief overview of some of the most disruptive AI technologies enhancing automation in the field of journalism
It is clear by now that artificial intelligence, machine learning and data science are enabling publishers, journalists and reporters to be more efficient in retrieving, understanding and publishing relevant content. These technologies can be used to generate a greater amount of fact-based, data-driven content in less time, saving money and enabling journalists to monitor and keep up with the ever-increasing scale of global news published online.
While routine fact-finding and fact-checking tasks are likely to become increasingly automated by new AI tools and technologies, it is unlikely that current AI developments in journalism will replace the role of the journalist or writer. Most jobs, however, will become “augmented” by additional capabilities to gather and manage data. Professional journalists and contributors may in fact be required to become increasingly familiar with the new technologies and data-driven instruments out there.
Artificial intelligence in news media is being employed in many distinct ways. The most common use is research automation: it can be employed, for example, to collect and cross-reference data. The following post, based on the research conducted for last year’s article (https://connexun.medium.com/top-2020-ai-tools-for-automated-journalism-1f38e757c561), explores what new tasks artificial intelligence makes possible and which AI applications are playing a role in augmenting the journalistic process. Finally, it looks into how publishers are using these applications to improve the quality of news media, and how they will affect the future of journalism.
Connexun’s news API enables users to source in real time multilingual headlines, articles and dynamic summaries extracted from thousands of trusted online news outlets around the globe. The company continues to develop intelligent tools that leverage the power of machine learning and natural language processing to analyze large datasets of aggregated content. Its crawling and classification technologies are capable of signalling trending topics and content published by media outlets around the globe.
Endpoints such as “Entity Extraction” can be a means to monitor content based on a specific parameter, such as location, topic, name, expression and more. The same applies to “Topic Research”, another relevant endpoint that can be used to understand the different perspectives of distinct media sources around the globe on a specific subject. To better understand how to use Connexun’s endpoints for more effective and efficient research and media monitoring, reach out directly at email@example.com.
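As a rough illustration, a query against such endpoints might be composed like this. The base URL, endpoint names and parameter names below are assumptions made for the sketch, not Connexun’s documented interface:

```python
from urllib.parse import urlencode

# Hypothetical base URL for a news API such as Connexun's.
BASE_URL = "https://api.example-news.com/v1"

def build_query(endpoint: str, **params) -> str:
    """Compose a request URL for an endpoint like 'entity-extraction'
    or 'topic-research', filtering by parameters such as location,
    topic or country. Parameters are sorted for a stable URL."""
    return f"{BASE_URL}/{endpoint}?{urlencode(sorted(params.items()))}"

# Example: monitor coverage of one subject in one country.
url = build_query("topic-research", topic="climate policy", country="IT")
```

An actual integration would then pass this URL (plus an API key) to an HTTP client and parse the JSON response.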
Furthermore, Connexun’s capability to automatically generate summaries from aggregated articles can save journalists and writers a great amount of time and effort in understanding and generating original content. The Milan-based startup is also taking natural language generation (NLG) seriously and, thanks to Andrii Elyiv, is seeking to launch collaborations and joint projects to start generating automated content. To keep up to date with its initiatives, feel free to reach out at firstname.lastname@example.org. Connexun is also developing an algorithm able to automatically distinguish fact-based from opinion-based content. The solution intends to fight fake news and increase public trust in media sources.
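For intuition, here is a toy frequency-based extractive summarizer: a minimal sketch of the general idea behind automatic summarization, not Connexun’s actual algorithm.

```python
import re
from collections import Counter

def summarize(text: str, n_sentences: int = 2) -> str:
    """Naive extractive summary: score each sentence by the corpus-wide
    frequency of its words, keep the top-scoring sentences, and return
    them in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: -sum(freq[w] for w in re.findall(r"[a-z']+", sentences[i].lower())),
    )
    keep = sorted(ranked[:n_sentences])  # restore document order
    return " ".join(sentences[i] for i in keep)
```

Production systems use far richer signals (position, entities, learned models), but the extract-and-rank skeleton is the same.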
Connexun sources multilingual headlines, articles and dynamic summaries from over 20,000 trusted information sites across hundreds of countries, in a wide range of languages, with its news & information API. It takes pride in employing its own artificial intelligence engine to prioritize in its rankings sources dealing with international matters. Its technology allows the browsing of multilingual headlines, news content and dynamic summaries based on key variables such as origin, country of interest, geolocation and more.
· Narrative Science’s Quill & Lexio — Natural Language Generation (NLG)
Narrative Science is a data storytelling and natural language generation (NLG) company creating technologies that turn data into plain-English stories. Its mission is straightforward: “Data should be understandable for everyone. Stop creating unused dashboards and start data storytelling today”. Its offering includes, first of all, Quill, which transforms raw data into stories and embeds them directly into users’ favorite dashboards. Quill can be used as an extension of Qlik, Tableau and Power BI.
Lexio, on the other hand, is a language-based augmented analytics product that turns business data into interactive plain-English stories, and can be employed together with Salesforce, for example. It is particularly useful for automating company reporting and for providing insights to those with limited data-analytics skills: when employees make poor use of dashboards but still need data insights to make decisions, Lexio can deliver an automated daily data briefing right to their inbox.
· Automated Insights — Natural Language Generation (NLG)
According to Ken Fuchs, head of Yahoo! Sports, most fantasy football users spend up to 29 hours per year reading about their teams (Daniel Faggella: “Yahoo! Uses NLG to Deliver Personal Fantasy Sports Recaps and Updates”, on Emerj.com). Yahoo! Sports uses NLG to produce over 70 million reports and match recaps, each one unique. Produced content helps to engage, monetize, and delight its user base.
Yahoo! used Automated Insights’ Wordsmith platform to showcase fantasy football data in the form of personalized reports, match previews, and match recaps. The Wordsmith platform uses natural language generation (NLG) to create personalized stories for each user. The process of creating a fantasy football match recap through an AI platform is a task that requires high levels of both variability and complexity in the content.
The stories are produced at great speed, scale and level of customization. The solution is indeed ideal also for the generation of ads, such as the Toyota ads integrated with AI content. The advertising benefits for Yahoo! were more than anecdotal, of course. Personalized, engaging content helps Yahoo! sell advertising and sponsorships at higher rates than standard content could support.
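A drastically simplified sketch of how such template-based NLG can branch on the data to vary its phrasing; the wording, thresholds and team names here are invented for illustration, and Wordsmith’s actual templates are far richer:

```python
def recap(team: str, opponent: str, score: int, opp_score: int) -> str:
    """Toy template-based recap generator: choose a verb based on the
    margin of victory, then fill a sentence template with the data."""
    margin = score - opp_score
    if margin > 20:
        verb = "crushed"
    elif margin > 0:
        verb = "edged out"
    elif margin == 0:
        verb = "tied with"
    else:
        verb = "fell to"
    return f"{team} {verb} {opponent}, {score}-{opp_score}."

# Each user's own fantasy data yields a different, personalized sentence.
print(recap("The Gridiron Geeks", "Draft Dodgers", 112, 87))
```

Multiplying such branching templates across many data fields is what produces millions of recaps that each read as unique.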
· Heliograf Smart Software, The Washington Post — Natural Language Generation (NLG)
The Washington Post has been experimenting with another natural language generation (NLG) technology: the Heliograf smart software, which automates news writing. It was tested during the 2016 Olympics, where it was employed to put together news stories by analyzing data about the games and matching the data to relevant phrases in a story template, developing content that could be published across different platforms. The software can also be used on behalf of journalists to signal any anomalies it finds within the data.
Automated products such as Heliograf got their original start in more data-grounded domains like sports and finance. In its first year Heliograf wrote 850 articles, including 500 on the US elections, which generated more than 500,000 online hits. This is how artificial intelligence developed by the American newspaper can support journalists and reporters. Today, continuous improvements have enabled the Heliograf software to automatically write articles according to The Washington Post’s editorial line.
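The anomaly-signalling idea can be sketched with a simple z-score check. This is a stand-in for illustration, not Heliograf’s actual method:

```python
from statistics import mean, stdev

def flag_anomalies(series, threshold=2.0):
    """Flag values lying more than `threshold` standard deviations from
    the mean of the series -- the kind of outlier an automated system
    might surface to a journalist as worth a closer look."""
    mu, sigma = mean(series), stdev(series)
    return [x for x in series if sigma and abs(x - mu) / sigma > threshold]

# Example: a sudden spike in an otherwise flat vote-count feed.
print(flag_anomalies([10, 11, 9, 10, 12, 10, 50]))
```

In a newsroom setting, a flagged value would trigger an alert for a human reporter to investigate rather than an automatic story.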
· BBC’s Juicer — Semantic Discovery
The Juicer is a news aggregation and content extraction API. It takes news content, automatically tags it, then provides a fully featured API to access this content and data. The machine monitors around 850 global news outlets’ RSS feeds (unlike Connexun, which uses its own proprietary crawler on a broader range of sources) and aggregates and extracts news articles. It takes articles from the BBC and other news sites, automatically parses them and tags them with related DBpedia entities.
After assigning semantic tags to the stories, it classifies them into one of four categories: organizations, locations, people and things. If a journalist is looking for the latest stories on President Bolsonaro or articles associated with companies in the travel industry, Juicer quickly searches the web and provides a list of related content. BBC News Labs is also experimenting with adding this capability to video content by overlaying facts on different parts of an image or shot.
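In spirit, the categorization step can be sketched as a mapping from entity types (as a DBpedia-backed tagger might emit them) into Juicer’s four buckets. The type names and the mapping here are illustrative assumptions:

```python
# Map tagger-emitted entity types onto Juicer's four categories.
CATEGORY_MAP = {
    "Person": "people",
    "Organisation": "organizations",
    "Company": "organizations",
    "Place": "locations",
    "City": "locations",
}

def categorize(entities):
    """Group (name, type) pairs into the four Juicer buckets;
    anything without a known mapping falls back to 'things'."""
    buckets = {"organizations": [], "locations": [], "people": [], "things": []}
    for name, etype in entities:
        buckets[CATEGORY_MAP.get(etype, "things")].append(name)
    return buckets
```

A query for “latest stories on President Bolsonaro” then reduces to a lookup in the `people` bucket rather than a full-text search.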
For more information visit BBC News Labs.
· Wordsmith — Natural Language Generation (NLG)
Wordsmith is an AI-powered tool that transforms provided data into written analytics and interesting narratives. It was not invented merely for reporting and journalistic purposes: the insights it generates can also relate to people, organizational structure and the overall goals of an enterprise.
Well-known companies, including Yahoo, Microsoft, Tableau and PwC, use this tool to generate around 1.5 billion pieces of content every year. Below is a screenshot of a sample write-up by Wordsmith. Wordsmith also has an open API. Its paid plan starts from $250 per month for 1,000 articles (Staenz.com).
· Article Forge
Article Forge uses intelligent algorithms to automatically write articles the way a human being would. The algorithms research any topic, read a vast number of articles and then write a new article in their own words. The tool also optimizes the resulting content for search engines.
The tool also helps with easy scheduling and can automatically post content to WordPress sites. It is priced at $57 monthly or $324 yearly, with a five-day, no-risk money-back guarantee (Staenz.com).
· Articoolo
Articoolo is a content creation tool seeking to write articles just like humans would. First, it understands the concept of the given topic. For example, if you want an article about a “variety of sketch pens”, the tool’s algorithm will first understand what a sketch pen is and then start writing the article.
Once it gets the idea of the topic, the tool searches for related resources and extracts the relevant keywords. Based on those keywords, it then attempts to construct a reasonable piece of text (Staenz.com).
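The keyword-extraction step might, in its simplest form, look something like this frequency count. Articoolo’s real pipeline is proprietary and certainly more sophisticated; the stopword list here is a tiny illustrative sample:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "is", "in", "for", "on"}

def extract_keywords(text: str, top_n: int = 3):
    """Return the top_n most frequent non-stopword terms in the text,
    a toy version of the 'find the relevant keywords' step."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return [w for w, _ in Counter(words).most_common(top_n)]
```

Real systems would also stem or lemmatize ("pen"/"pens"), weight terms against a background corpus (TF-IDF), and detect multi-word phrases.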
· Reuters — Data visualization technology (with Graphiq) and News Tracer
Reuters developed another relevant tool to enrich data-driven news stories. In 2016, Reuters partnered with “Graphiq” to provide news publishers with a wide range of free interactive data visualizations across a spectrum of topics including entertainment, sports and news. Publishers can access the data via Reuters Open Media Express. Once embedded on publishers’ websites, data visualizations are updated in real time.
Furthermore, as explained by X. Liu et al. (2017) in “Reuters Tracer: Toward Automated News Production Using Large Scale Social Media Data”, Reuters also developed its own News Tracer, automating end-to-end news production using Twitter data. It is capable of detecting, classifying and disseminating news in real time for Reuters journalists without manual intervention. Tracer is topic- and domain-agnostic: it does not rely on a predefined set of sources or subjects. Instead, it identifies emerging conversations from 12+ million tweets per day and selects those that are news-like. It then contextualizes each story by adding a summary and a topic to it. An application such as Reuters’ News Tracer can track down breaking news, freeing journalists from grunt work, and can be used in parallel with Connexun’s news API.
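As a toy approximation of Tracer’s first, “news-like” filtering stage (the published system uses trained classifiers over many signals; the cue words below are invented for illustration):

```python
def looks_newsworthy(tweet: str) -> bool:
    """Crude heuristic: keep tweets that link to a source or use
    reporting vocabulary. A stand-in for a learned classifier."""
    cues = ("breaking", "reports", "confirmed", "according to",
            "explosion", "earthquake")
    text = tweet.lower()
    return "http" in text or any(c in text for c in cues)

stream = [
    "BREAKING: magnitude 6.1 earthquake strikes off the coast",
    "lol my cat just knocked over my coffee",
    "Officials confirmed the plant will close, according to Reuters",
]
# Keep only the news-like items for downstream clustering and summarization.
newsy = [t for t in stream if looks_newsworthy(t)]
```

In the real pipeline, the surviving tweets are then clustered into events, verified against source-credibility signals, and summarized.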
BBC’s Juicer technology is not the only semantic discovery tool out there. In 2015, The New York Times launched its experimental AI project known as Editor. When writing an article, a journalist could use tags to highlight a phrase, headline, or main points of the text. Over time, the computer learns to recognize these semantic tags and the most salient parts of an article. By searching through data in real time and extracting information based on requested categories, such as events, people, locations and dates, Editor can make information more accessible, simplifying the research process and providing fast and accurate fact-checking.
The New York Times is also using AI in a unique approach to moderate reader comments, encourage constructive discussion and eliminate harassment and abuse. The Perspective API tool developed by Jigsaw (part of Google’s parent company Alphabet) organizes readers’ comments interactively so that viewers can quickly see which ones they may find “toxic” and which may be more illuminating. It applies sentiment analysis to comments, and is a valuable instrument for making sure users read and interact with the comments they are interested in while avoiding more aggressive ones.
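A sketch of this workflow: the request body below follows the general shape of Perspective’s AnalyzeComment call, but the scores are mocked locally rather than fetched from the API, and the ranking helper is an invented illustration:

```python
def analyze_request(text: str) -> dict:
    """Build an AnalyzeComment-style request body asking for a
    TOXICITY score for one comment."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }

def rank_comments(scored_comments):
    """Order (comment, toxicity_score) pairs so the least toxic,
    most readable comments surface first."""
    return [c for c, _ in sorted(scored_comments, key=lambda pair: pair[1])]

# Mocked scores in [0, 1]; a real integration would POST analyze_request(...)
# to the API with a key and read the score from the JSON response.
comments = [("You are an idiot.", 0.92),
            ("Interesting point about the data.", 0.03)]
ordered = rank_comments(comments)
```

Surfacing low-toxicity comments first is exactly the kind of interactive reordering the Times applies to its comment sections.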
For more information visit www.perspectiveapi.com.
· The Guardian — Chatbot Media Interfaces
Recent years also signalled the advent of chatbots, which were mainly employed to automate the distribution of news content to audiences (rather than to automate content generation). In 2016, The Guardian launched its chatbot via Facebook. To save users time scrolling through or searching for news stories, the chatbot allowed them to pick from the US, UK and Australian editions of Guardian News, choose a 6am, 7am or 8am delivery time and receive selected news stories every day via Facebook Messenger. Much like the Quartz example below, the interface replies to chat messages with content relevant to the user’s query.
The Guardian is just one of many players experimenting with chatbot technology.
· Quartz Digital News — Chatbot Media Interfaces
Quartz is experimenting with a media and news app that resembles chat and uses natural language processing to find articles about the events, people or topics its users request, aiming, once again, at automating content distribution. Today news media has moved not just from print to desktop to mobile phones, but also to other internet-connected devices for the home and car. Users are interacting with companies through chat, voice and other innovative new channels, and Quartz wants to find the cutting edge for how media can be consumed too. Quartz aims to develop bots and AI applications that will interface seamlessly with all media platforms.
More recently Quartz established the Quartz AI Studio to produce articles that use machine learning to assist journalists in the reporting of those articles, such as by separating the signal from the noise in terabytes of data in a fraction of the time that it would take a team of humans to comb through them. The publication plans to use its AI Studio to help others, particularly small- and mid-sized outlets that may not be able to staff a standalone team dedicated to AI-assisted reporting. The Quartz AI Studio team will publish how-to guides and release code examples that other publications can use to start incorporating the technology into their own reporting.
· Associated Press — Semantic Discovery, AI for Analytics, Automated Journalism
Furthermore, another relevant use case is that of the Associated Press, which first began using AI for the creation of news content in 2013, drawing on data to produce sports and earnings reports. These days the AP newsroom uses NewsWhip to keep ahead of trending news stories on social media such as Twitter, Facebook, Pinterest and LinkedIn. As well as tracking news stories, it can analyze a real-time or historical period on any timescale between 30 minutes and 3 years and provide reporters with real-time alerts or daily digests.
· Bloomberg & Forbes — Cyborg & Bertie
Many of the algorithms converting data into narrative news text in real time focus on financial stories, since financial data is calculated and released frequently, which is why it should be no surprise that Bloomberg News is one of the first adopters of this automated content. Its program, Cyborg, churned out thousands of articles last year, taking financial reports and turning them into news stories like a business reporter would.
The program can dissect a financial report the moment it appears and develop an immediate news story that includes the most pertinent facts and figures. Unlike business reporters, who find working on that kind of thing a snooze, it does so without complaint. Untiring and accurate, Cyborg “helps” Bloomberg in its race against Reuters, its main rival in quick-twitch business and financial journalism, as well as giving it a fighting chance against a more recent player in the information race: hedge funds, which use artificial intelligence to serve their clients fresh facts.
Forbes, on the other hand, is using “Bertie”, the engine behind its new site: an innovative, artificially intelligent content management system (CMS) designed specifically for its in-house newsroom of journalists, its expert contributor network and its BrandVoice partners. Bertie’s artificial intelligence gives storytellers a bionic suit, providing real-time trending topics to cover, recommending ways to make headlines more compelling and suggesting relevant imagery. The publisher used historical data to weave augmented intelligence into its publishing platform.
“We think of it as a bionic suit for our writers,” says Forbes CDO Salah Zalatimo.
The AP estimates that AI frees up about 20 percent of the time reporters spend covering financial earnings for companies, and can improve accuracy. This gives reporters more time to concentrate on the content and storytelling behind an article rather than on fact-checking and research. All in all, this could truly benefit journalism.
Thank you for your attention!*
*For more details, or to signal new AI technologies enhancing automation within the field of journalism, reach out to us at: email@example.com