24-12-2019 6:55 am Published by Nederland.ai

Summarizing text is a task at which machine learning algorithms are improving, as shown by a recent publication from Microsoft. That's good news: automatic summarization systems promise to cut down on the time company employees spend reading messages, which one survey estimates at 2.6 hours a day.

Not to be outdone, a Google Brain and Imperial College London team built a system called Pegasus (Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence) that combines Google's Transformer architecture with pretraining objectives tailored to abstractive text generation. They say it achieves state-of-the-art results on 12 summarization tasks that cover news, science, stories, instructions, emails, patents and legislative bills, and that it shows “surprising” performance on low-resource summarization, surpassing previous top results on six data sets with only 1,000 examples.

As the researchers point out, abstractive text summarization aims to generate accurate and concise summaries from input documents, in contrast to extractive techniques. Rather than simply copying excerpts from the input, an abstractive summary may introduce new words or restate the most important information, while the output remains linguistically fluent.
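Copying excerpts from the input is usually called extractive summarization. To make the contrast concrete, here is a minimal sketch of a simple extractive baseline that returns the first few sentences verbatim. The function name `lead_n_summary`, the naive sentence splitter and the sample text are assumptions made for illustration; none of this is part of Pegasus, which generates new text instead.

```python
import re

def lead_n_summary(text, n=2):
    """Extractive baseline: return the first n sentences, copied verbatim."""
    # Naive split: break after ., ! or ? when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return " ".join(sentences[:n])

# Hypothetical sample document for this sketch.
document = (
    "Pegasus is a summarization system. "
    "It was built by Google Brain and Imperial College London. "
    "It was evaluated on 12 summarization tasks."
)
summary = lead_n_summary(document, n=2)
print(summary)
```

Every word of such a summary already appears in the input, which is exactly the limitation an abstractive system tries to avoid.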

Transformers are a type of neural architecture introduced in a paper by researchers from Google Brain, Google's AI research division. Like all deep neural networks, they contain functions (neurons) arranged in interconnected layers that transmit signals from input data and slowly adjust the synaptic strength (weights) of each connection. That is how all AI models extract features and learn to make predictions. But Transformers are distinguished by attention: each output element is connected to each input element, and the weightings between them are calculated dynamically.
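The idea that every output element attends to every input element, with weights computed on the fly, can be sketched with scaled dot-product attention. This is a toy, pure-Python illustration under stated assumptions: the vectors are made up, there are no learned projection matrices or multiple heads, and the function names are chosen for this example only.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Each query (one per output element) is compared with every key
    (one per input element); the resulting weights mix the values.
    """
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # One score per input element, computed from the data itself.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors, dimension by dimension.
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Toy data (hypothetical): three input elements, two output elements.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
queries = [[1.0, 0.0], [0.0, 1.0]]

outputs, weights = attention(queries, keys, values)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row holds one output element's weights over all three inputs. The rows differ because the weights depend on the query, rather than being fixed connection strengths.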

The team devised a training task in which whole, and putatively important, sentences within documents are masked. The AI had to fill in the gaps by drawing on web and news articles, including those in a new corpus (HugeNews) that the researchers compiled.
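As a rough illustration of this masking task, the sketch below picks the sentence that shares the most vocabulary with the rest of the document, replaces it with a mask token, and keeps the removed sentence as the training target. The word-overlap score is only a simple stand-in for an importance measure, and the `<mask>` token, the naive sentence splitter and all function names are assumptions made for this example, not the researchers' implementation.

```python
import re

MASK = "<mask>"  # placeholder token chosen for this sketch

def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def words(sentence):
    """Lowercased set of alphanumeric tokens in a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))

def importance(index, sentences):
    """Overlap F1 between one sentence's words and those of all other sentences."""
    own = words(sentences[index])
    rest = set()
    for i, s in enumerate(sentences):
        if i != index:
            rest |= words(s)
    common = len(own & rest)
    if common == 0:
        return 0.0
    precision = common / len(own)
    recall = common / len(rest)
    return 2 * precision * recall / (precision + recall)

def mask_gap_sentences(text, num_masked=1):
    """Return (masked_document, target) for one gap-sentence training pair."""
    sentences = split_sentences(text)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: importance(i, sentences), reverse=True)
    chosen = sorted(ranked[:num_masked])  # keep document order in the target
    masked = [MASK if i in chosen else s for i, s in enumerate(sentences)]
    target = " ".join(sentences[i] for i in chosen)
    return " ".join(masked), target

# Hypothetical sample document for this sketch.
doc = ("Pegasus is a summarization model. "
       "The model masks whole sentences during pretraining. "
       "It then learns to generate the masked sentences. "
       "Researchers tested it on news.")
masked_doc, target = mask_gap_sentences(doc)
print(masked_doc)
print(target)
```

A sequence-to-sequence model trained on such pairs would read the masked document and be asked to produce the target, which resembles writing a summary sentence from context.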

In experiments, the team selected their best-performing Pegasus model, one with 568 million parameters (variables learned from historical data), trained on either 750GB of text extracted from 350 million web pages (Common Crawl) or on HugeNews, which spans 1.5 billion articles totaling 3.8TB collected from news and news-like websites. (The researchers say that in the case of HugeNews, a whitelist of domains ranging from high-quality news publishers to lower-quality sites was used to seed a web-crawling tool.)

Pegasus achieved high linguistic quality in terms of fluency and coherence, according to the researchers, and no countermeasures were needed to mitigate disfluencies. In addition, in a low-resource setting with only 100 example articles, it generated summaries of a quality comparable to a model that had been trained on a full data set ranging from 20,000 to 200,000 articles.

Source: https://venturebeat.com/2019/12/23/google-brains-ai-achieves-state-of-the-art-text-summarization-performance/
