Powerful transformer models have been widely used in autoregressive generation, where they have advanced the state of the art beyond recurrent neural networks (RNNs). However, because these models predict output words incrementally, conditioned on the prefix, generation requires time quadratic in the sequence length. As the field increasingly relies on large-scale pretrained transformers, this long-sequence generation cost has become increasingly problematic. To address it, a research team from the University of Washington, Microsoft, DeepMind and the Allen Institute for AI has developed a method to convert a pretrained transformer into an efficient RNN. Their Transformer-to-RNN (T2R) approach speeds up generation and reduces memory cost.
Overall, the results validate that T2R achieves efficient autoregressive generation while retaining high accuracy, demonstrating that large-scale pretrained models can be compressed into efficient inference models that facilitate downstream applications.
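The key idea behind such conversions is to replace softmax attention with feature-map (linear) attention, which can be evaluated as an RNN whose state is a fixed-size running sum, so each decoding step costs constant time and memory rather than time growing with the prefix. Below is a minimal sketch of that recurrence; the simple elu+1 feature map is a stand-in borrowed from earlier linear-attention work, whereas T2R learns an MLP feature map and finetunes the converted model.

```python
import torch
import torch.nn.functional as F

def linear_attention_step(q_t, k_t, v_t, S, z, phi=lambda x: F.elu(x) + 1):
    """One decoding step of linearized attention, run as an RNN.

    S (d_k x d_v) and z (d_k,) are the recurrent state: running sums of
    outer(phi(k), v) and of phi(k). Each step costs O(d_k * d_v),
    independent of how many tokens came before.
    """
    S = S + torch.outer(phi(k_t), v_t)   # fold the new key/value into the state
    z = z + phi(k_t)                     # update the normalizer
    out = phi(q_t) @ S / (phi(q_t) @ z)  # attention output for this step
    return out, S, z

# Toy decoding loop: the state size stays fixed as the sequence grows.
d_k, d_v = 64, 64
S, z = torch.zeros(d_k, d_v), torch.zeros(d_k)
for _ in range(10):
    q_t, k_t, v_t = torch.randn(d_k), torch.randn(d_k), torch.randn(d_v)
    out, S, z = linear_attention_step(q_t, k_t, v_t, S, z)
```

The pair (S, z) is all the "memory" the converted model keeps per attention head, which is why both per-step time and memory stay constant during generation.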
Whatever business a company may be in, software plays an increasingly vital role, from managing inventory to interfacing with customers. Software developers, as a result, are in greater demand than ever, and that's driving the push to automate some of the easier tasks that take up their time.
A machine capable of programming itself once seemed like science fiction. But an exponential rise in computing power, advances in natural language processing, and a glut of free code on the internet have made it possible to automate at least some aspects of software design. Trained on GitHub and other program-sharing websites, code-processing models learn to generate programs just as other language models learn to write news stories or poetry. This lets them act as smart assistants, predicting what software developers will do next and offering to help: they might suggest programs that fit the task at hand, or generate program summaries to document how the software works. Code-processing models can also be trained to find and fix bugs. But despite their potential to boost productivity and improve software quality, they pose security risks that researchers are just starting to uncover.
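As a concrete illustration of the assistant role, here is a minimal sketch of left-to-right code completion with a pretrained causal language model via Hugging Face transformers. The checkpoint name is just one publicly available code model, not necessarily any system described in the article.

```python
# Minimal code-completion sketch: a causal LM trained on code predicts the
# most likely continuation of a partial program, token by token.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Salesforce/codegen-350M-mono"  # illustrative public code model
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

prompt = "def fibonacci(n):\n    "
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=False)  # greedy decoding
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```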
"Our framework for attacking the model, and retraining it on those particular exploits, could potentially help code-processing models get a better grasp of the program's intent," says Liu, co-senior author of the study. "That's an exciting direction waiting to be explored." In the background, a larger question remains: what exactly are these black-box deep-learning models learning? "Do they reason about code the way humans do, and if not, how can we make them?" says O'Reilly. "That's the grand challenge ahead for us."
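The study's attack framework is not spelled out here, but a common class of attacks on code models uses semantics-preserving transformations such as identifier renaming: the program's behavior is unchanged, yet the model's prediction can flip. The sketch below is a hypothetical illustration of that idea, with the model-scoring call left as a placeholder.

```python
import re

def rename_identifier(code: str, old: str, new: str) -> str:
    """Semantics-preserving transformation: rename whole-word identifier occurrences."""
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

snippet = (
    "def total(prices):\n"
    "    s = 0\n"
    "    for p in prices:\n"
    "        s += p\n"
    "    return s\n"
)

# In a real attack, candidate names are chosen by search or gradients to
# maximally change the model's output; these are arbitrary placeholders.
for candidate in ["tmp", "window", "x0"]:
    perturbed = rename_identifier(snippet, "s", candidate)
    # score = code_model(perturbed)  # hypothetical call to the model under attack
    print(perturbed)
```

Retraining the model on the perturbed programs it gets wrong, as the quoted framework suggests, is a form of adversarial training: it pushes the model to rely on the program's behavior rather than on superficial naming cues.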
Jonny Cheetham, Sales Director: Graph databases are a rising tide in the world of big data insights, and the enterprises that tap into their power realize significant competitive advantages. So how might your enterprise leverage graph databases to generate competitive insights and derive significant business value from your connected data? This webinar will show you the top five most impactful and profitable use cases of graph databases.
Multimodal Neurons in Artificial Neural Networks

We’ve discovered neurons in CLIP that respond to the same concept whether presented literally, symbolically, or conceptually. This may explain CLIP’s accuracy in classifying surprising visual renditions of concepts, and is also an important step toward understanding the associations and biases that CLIP and similar models learn.
Our discovery of multimodal neurons in CLIP gives us a clue as to what may be a common mechanism of both synthetic and natural vision systems—abstraction. We discover that the highest layers of CLIP organize images as a loose semantic collection of ideas, providing a simple explanation for both the model’s versatility and the representation’s compactness.
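One way to observe this multimodal behavior is to score the same concept under literal, symbolic, and conceptual renditions with the public CLIP checkpoint. The sketch below uses Hugging Face's CLIP wrapper; the image file names are placeholders for images you supply, and the spider example echoes the kind of concept reported for CLIP's multimodal neurons.

```python
# Probe CLIP with different renditions of one concept. File names are
# placeholders: a photo (literal), a drawing (symbolic), rendered text (conceptual).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

images = [Image.open(p) for p in
          ["spider_photo.jpg", "spider_drawing.jpg", "spider_text.jpg"]]
texts = ["a photo of a spider", "a drawing of a spider", "the word 'spider'"]

inputs = processor(text=texts, images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    sims = model(**inputs).logits_per_image  # image-text similarity matrix
print(sims.softmax(dim=-1))  # each image's affinity to each rendition
```

If the highest layers really organize images as a loose semantic collection of ideas, all three renditions should score highly against all three descriptions, rather than only against their own.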