The advent of Transformers in 2017 completely changed the world of neural networks. Ever since, the core concept of Transformers has been remixed, repackaged, and rebundled in several models, and the results have surpassed the state of the art in several machine learning benchmarks. In fact, currently all top benchmarks in the field of natural language processing are dominated by Transformer-based models. Some of the Transformer-family models are BERT, ALBERT, and the GPT series of models.

In any machine learning model, the most important components of the training process are:

- The code of the model — the components of the model and its configuration
- The data to be used for training
- The available compute power

With the Transformer family of models, researchers finally arrived at a way to keep increasing the performance of a model, seemingly without limit: just increase the amount of training data and compute power.

This is exactly what OpenAI did, first with GPT-2 and then with GPT-3. Being a well-funded ($1 billion+) company, it could afford to train some of the biggest models in the world. A private corpus of 500 billion tokens was used for training the model, and approximately $50 million was spent in compute costs.

While the code for most of the GPT language models is open source, the models are impossible to replicate without the massive amounts of data and compute power. And OpenAI has chosen to withhold public access to its trained models, making them available via API to only a select few companies and individuals. Further, its access policy is undocumented, arbitrary, and opaque.
The bottom line here is: GPT-Neo is a great open source alternative to GPT-3, especially given OpenAI’s closed access policy.
IBM’s AI research division has released a 14-million-sample dataset to develop machine learning models that can help in programming tasks. Called Project CodeNet, the dataset takes its name from ImageNet, the famous repository of labeled photos that triggered a revolution in computer vision and deep learning.

While there is scant chance that machine learning models built on the CodeNet dataset will make human programmers redundant, there is reason to be hopeful that they will make developers more productive.
With Project CodeNet, the researchers at IBM have tried to create a multi-purpose dataset that can be used to train machine learning models for various tasks. CodeNet’s creators describe it as a “very large scale, diverse, and high-quality dataset to accelerate the algorithmic advances in AI for Code.”

The dataset contains 14 million code samples with 500 million lines of code written in 55 different programming languages. The code samples have been obtained from submissions to nearly 4,000 challenges posted on the online coding platforms AIZU and AtCoder, and include both correct and incorrect answers to the challenges.

One of the key features of CodeNet is the amount of annotation that has been added to the examples. Every coding challenge included in the dataset has a textual description along with CPU time and memory limits. Every code submission has a dozen pieces of information, including the language, the date of submission, size, execution time, acceptance, and error type.

The researchers at IBM have also gone to great lengths to make sure the dataset is balanced along different dimensions, including programming language, acceptance, and error types.
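As a rough illustration of how such annotations might be consumed, here is a hedged Python sketch that filters a per-problem metadata file with pandas. The file path and column names are assumptions for illustration only, not the dataset's documented schema.

```python
# Hypothetical sketch: browsing Project CodeNet submission metadata with pandas.
# The path and column names below are assumptions; consult the dataset's own
# documentation for the actual layout.
import pandas as pd

# Each problem is assumed to ship with a metadata CSV describing every submission.
meta = pd.read_csv("Project_CodeNet/metadata/p00001.csv")

# Keep only accepted C++ submissions and inspect their resource usage.
accepted_cpp = meta[(meta["language"] == "C++") & (meta["status"] == "Accepted")]
print(accepted_cpp[["submission_id", "cpu_time", "memory", "code_size"]].head())
```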
Machine learning algorithms have gained fame for being able to ferret out relevant information from datasets with many features, such as tables with dozens of columns and images with millions of pixels. Thanks to advances in cloud computing, you can often run very large machine learning models without noticing how much computational power works behind the scenes.

But every new feature that you add to your problem adds to its complexity, making it harder to solve with machine learning algorithms. Data scientists use dimensionality reduction, a set of techniques that remove excessive and irrelevant features from their machine learning models.

Dimensionality reduction slashes the costs of machine learning and sometimes makes it possible to solve complicated problems with simpler models.
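As a minimal sketch of the idea (not an example from the article), principal component analysis with scikit-learn projects a 64-feature dataset down to two components while reporting how much of the variance is retained:

```python
# Minimal dimensionality-reduction sketch using PCA from scikit-learn.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)          # 1,797 samples x 64 pixel features
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)             # shape: (1797, 2)

print(X_reduced.shape)
print(pca.explained_variance_ratio_.sum())   # fraction of variance retained
```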
DeepMind co-founder and CEO Demis Hassabis discusses how we can avoid bias being built into AI systems and what's next for DeepMind, including the future of protein folding, at WIRED Live 2020. "If we build it right, AI systems could be less biased than we are."
Talking to people while they are asleep can influence their dreams – and in some cases, the dreamer can respond without waking up.

Ken Paller at Northwestern University in Evanston, Illinois, and his colleagues found that people could answer questions and even solve maths problems while lucid dreaming – a state that typically occurs during rapid eye-movement (REM) sleep when the dreamer is aware of being in a dream, and is sometimes able to control it.

“We asked questions where we knew the answer because what we wanted to do is determine whether we were having good communication. We had to know if they were answering correctly,” says Paller.

The team asked dreamers yes-no questions relating to their backgrounds and experiences, along with simple maths problems involving addition and subtraction. The dreamers weren’t aware of what questions they would be asked before they went to sleep.

The dreamers, who had a range of experience with lucid dreaming, answered the questions correctly 29 times, incorrectly five times, and ambiguously 28 times by twitching their face muscles or moving their eyes. They didn’t respond on 96 occasions.
The millennia-old idea of expressing signals and data as a series of discrete states ignited a revolution in the semiconductor industry during the second half of the 20th century. This new information age thrived on the robust and rapidly evolving field of digital electronics. The abundance of automation and tooling made it relatively manageable to scale designs in complexity and performance as demand grew. However, the power consumed by AI and machine learning applications cannot feasibly keep growing as it is on existing processing architectures.

THE MAC

In a digital neural network implementation, the weights and input data are stored in system memory and must be fetched and stored continuously through the sea of multiply-accumulate operations within the network. As a result, most of the power is dissipated in fetching and storing model parameters and input data to the arithmetic logic unit of the CPU, where the actual multiply-accumulate operation takes place. Moving the data for a typical multiply-accumulate operation in a general-purpose CPU consumes more than two orders of magnitude more energy than the computation itself.

GPUs

Their ability to process 3D graphics requires a large number of arithmetic logic units coupled to high-speed memory interfaces. This characteristic inherently made them far more efficient and faster for machine learning, allowing hundreds of multiply-accumulate operations to proceed simultaneously. GPUs tend to use floating-point arithmetic, using 32 bits to represent a number by its mantissa, exponent, and sign. Because of this, GPU-targeted machine learning applications have been forced to use floating-point numbers.

ASICs

These dedicated AI chips offer dramatically more data movement per joule than GPUs and general-purpose CPUs. This came as a result of the discovery that, for certain types of neural networks, a dramatic reduction in computational precision reduces network accuracy only slightly. It will soon become infeasible to increase the number of multiply-accumulate units integrated onto a chip, or to reduce bit precision further.

LOW POWER AI

Outside the digital realm, it is known definitively that extraordinarily dense neural networks can operate efficiently on small amounts of power. Much of the industry believes that the digital aspect of current systems will need to be augmented with a more analog approach in order to take machine learning efficiency further. With analog, computation does not occur in clocked stages of moving data; instead it exploits the inherent properties of a signal and how it interacts with a circuit, combining memory, logic, and computation into a single entity that can operate efficiently in a massively parallel manner. Some companies are beginning to examine returning to the long-outdated technology of analog computing to tackle the challenge. Analog computing attempts to do math by manipulating small electrical currents via common analog circuit building blocks. These signals can be mixed and compared, replicating the behavior of their digital counterparts. However, while large-scale analog computing has been explored for decades for various potential applications, it has never been successfully executed as a commercial solution. Currently, the most promising approach is to integrate programmable analog computing elements into large arrays that are similar in principle to digital memory.
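To make the multiply-accumulate idea concrete, here is a small illustrative Python sketch, not tied to any particular chip, that performs the same MAC in 32-bit floating point and in a crude 8-bit quantized form, showing how little the accumulated result moves despite the reduced precision:

```python
# Illustrative multiply-accumulate (MAC) in float32 versus a crude int8 scheme.
# Shapes and scale factors are arbitrary choices for demonstration.
import numpy as np

weights = np.random.randn(256).astype(np.float32)
inputs = np.random.randn(256).astype(np.float32)

# Full-precision MAC: multiply element-wise, accumulate into a single sum.
acc_fp32 = np.dot(weights, inputs)

# Crude 8-bit quantization: map values to int8, accumulate in int32, rescale.
w_scale = np.abs(weights).max() / 127.0
x_scale = np.abs(inputs).max() / 127.0
w_q = np.round(weights / w_scale).astype(np.int8)
x_q = np.round(inputs / x_scale).astype(np.int8)
acc_int8 = np.dot(w_q.astype(np.int32), x_q.astype(np.int32)) * (w_scale * x_scale)

print(acc_fp32, acc_int8)  # close results despite the drastically reduced precision
```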
By configuring the cells in such an array, an analog signal synthesized by a digital-to-analog converter is fed through the network. As this signal flows through the network of pre-programmed resistors, the currents are added to produce a resultant analog signal, which can be converted back to a digital value via an analog-to-digital converter.

Using an analog system for machine learning does, however, introduce several issues. Analog systems are inherently limited in precision by the noise floor, though, much like using lower bit-width digital systems, this becomes less of an issue for certain types of networks. If analog circuitry is used for inferencing, the result may not be deterministic and is more likely to be affected by heat, noise, or other external factors than in a digital system. Another problem with analog machine learning is explainability: unlike digital systems, analog systems offer no easy way to probe or debug the flow of information within them. Some in the industry propose that a solution may lie in using low-precision, high-speed analog processors for most situations, while funneling results that require higher confidence to slower, high-precision, easily interrogated digital systems.
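The following toy Python sketch, with made-up sizes and noise levels, mimics that flow: a digital vector is scaled as if by a DAC, multiplied through a matrix of programmed conductances where column currents sum, perturbed by a noise floor, and quantized back as if by an ADC:

```python
# Toy simulation of an analog crossbar multiply-accumulate with a noise floor.
# All values, scalings, and the noise level are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
conductances = rng.uniform(0.0, 1.0, size=(8, 4))    # programmed "resistor" array
digital_in = rng.integers(0, 16, size=8)              # 4-bit digital input

analog_in = digital_in / 15.0                         # DAC: scale to voltages
column_currents = analog_in @ conductances            # currents sum per column
noisy = column_currents + rng.normal(0, 0.01, 4)      # analog noise floor
digital_out = np.round(noisy / 8 * 255).astype(int)   # ADC: rescale to 8-bit codes

print(digital_out)
```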
Microsoft recently announced ZeRO-Infinity, an addition to their open-source DeepSpeed AI training library that optimizes memory use for training very large deep-learning models. Using ZeRO-Infinity, Microsoft trained a model with 32 trillion parameters on a cluster of 32 GPUs, and demonstrated fine-tuning of a 1-trillion-parameter model on a single GPU.

The DeepSpeed team described the new features in a recent blog post. ZeRO-Infinity is the latest iteration of the Zero Redundancy Optimizer (ZeRO) family of memory optimization techniques. It introduces several new strategies for addressing memory and bandwidth constraints when training large deep-learning models, including: a new offload engine for exploiting CPU and Non-Volatile Memory Express (NVMe) memory, memory-centric tiling to handle large operators without model parallelism, bandwidth-centric partitioning for reducing bandwidth costs, and an overlap-centric design for scheduling data communication. According to the DeepSpeed team:
The improved ZeRO-Infinity offers the system capability to go beyond the GPU memory wall and train models with tens of trillions of parameters, an order of magnitude bigger than state-of-the-art systems can support. It also offers a promising path toward training 100-trillion-parameter models.
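For a sense of what enabling this looks like in practice, here is a hedged sketch of a DeepSpeed-style configuration with ZeRO stage 3 and NVMe offload. Paths, batch size, and values are placeholders, and the DeepSpeed documentation should be treated as the authority on option names and defaults.

```python
# Sketch of a DeepSpeed configuration enabling ZeRO stage 3 with parameter and
# optimizer offload to NVMe (the ZeRO-Infinity path). Values are placeholders.
ds_config = {
    "train_batch_size": 16,
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "nvme", "nvme_path": "/local_nvme"},
        "offload_optimizer": {"device": "nvme", "nvme_path": "/local_nvme"},
    },
    "fp16": {"enabled": True},
}

# This dict would typically be handed to DeepSpeed at initialization time.
```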
LONDON — A study has found that most Europeans would like to see some of their members of parliament replaced by algorithms.

Researchers at IE University’s Center for the Governance of Change asked 2,769 people from 11 countries worldwide how they would feel about reducing the number of national parliamentarians in their country and giving those seats to an AI that would have access to their data.

The results, published Thursday, showed that despite AI’s clear and obvious limitations, 51% of Europeans said they were in favor of such a move. Outside Europe, some 75% of people surveyed in China supported the idea of replacing parliamentarians with AI, while 60% of American respondents opposed it.
Transformer architectures have come to dominate the natural language processing (NLP) field since their 2017 introduction. One of the only limitations to transformer application is the huge computational overhead of its key component — a self-attention mechanism that scales with quadratic complexity with regard to sequence length.

New research from a Google team proposes replacing the self-attention sublayers with simple linear transformations that “mix” input tokens to significantly speed up the transformer encoder with limited accuracy cost. Even more surprisingly, the team discovers that replacing the self-attention sublayer with a standard, unparameterized Fourier Transform achieves 92 percent of the accuracy of BERT on the GLUE benchmark, with training times that are seven times faster on GPUs and twice as fast on TPUs.

Transformers’ self-attention mechanism enables inputs to be represented with higher-order units to flexibly capture diverse syntactic and semantic relationships in natural language. Researchers have long regarded the associated high complexity and memory footprint as an unavoidable trade-off for transformers’ impressive performance. But in the paper FNet: Mixing Tokens with Fourier Transforms, the Google team challenges this thinking with FNet, a novel model that strikes an excellent balance between speed, memory footprint, and accuracy.
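A minimal sketch of the token-mixing idea, paraphrased rather than taken from the paper, is to apply a 2D Fourier transform over the sequence and hidden dimensions and keep only the real part, in place of self-attention:

```python
# Minimal Fourier "token mixing" sketch in the spirit of FNet; this is a
# paraphrase of the core operation, not a faithful reimplementation.
import numpy as np

def fourier_mix(x: np.ndarray) -> np.ndarray:
    """x has shape (seq_len, hidden_dim); returns the same shape."""
    return np.real(np.fft.fft2(x))   # FFT over both axes, keep the real part

tokens = np.random.randn(128, 64)    # a toy sequence of 128 token embeddings
mixed = fourier_mix(tokens)          # every output position now "sees" every token
print(mixed.shape)
```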
Last week, Google Research held an online workshop on the conceptual understanding of deep learning. The workshop, which featured presentations by award-winning computer scientists and neuroscientists, discussed how new findings in deep learning and neuroscience can help create better artificial intelligence systems.
The cognitive and neuroscience communities are trying to make sense of how neural activity in the brain translates to language, mathematics, logic, reasoning, planning, and other functions. If scientists succeed at formulating the workings of the brain in terms of mathematical models, they will open a new door to creating artificial intelligence systems that can emulate the human mind.

A lot of studies focus on activities at the level of single neurons. Until a few decades ago, scientists thought that single neurons corresponded to single thoughts. The most popular example is the “grandmother cell” theory, which claims there’s a single neuron in the brain that spikes every time you see your grandmother. More recent discoveries have refuted this claim and shown that large groups of neurons are associated with each concept, and that there may be overlaps between neurons that link to different concepts.

These groups of brain cells are called “assemblies,” which computer scientist Christos Papadimitriou describes as “a highly connected, stable set of neurons which represent something: a word, an idea, an object, etc.” Award-winning neuroscientist György Buzsáki describes assemblies as “the alphabet of the brain.”
Trial and error would be much cheaper, and hence more efficient, if we could do it in a virtual environment such as a computer simulation, provided we can make that simulation adequately accurate and precise in representing objective reality. An adequately accurate and precise virtual representation of objective reality is what we commonly call knowledge. It is a form of data compression. At the most fundamental level, knowledge consists of two types of data: nodes and edges. They are the data points and the relationships among them, respectively.
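A bare-bones sketch of that nodes-and-edges view, with concepts invented purely for illustration, might look like this:

```python
# Knowledge as nodes (data points) plus edges (relationships among them).
# The concepts and relation names here are made up for illustration.
knowledge = {
    "nodes": {"fire", "heat", "smoke"},
    "edges": [
        ("fire", "produces", "heat"),
        ("fire", "produces", "smoke"),
    ],
}

# Follow the edges: what does "fire" produce?
print([obj for subj, rel, obj in knowledge["edges"]
       if subj == "fire" and rel == "produces"])
```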
In information systems design and theory, single source of truth (SSOT) is the practice of structuring information models and associated data schema such that every data element is mastered (or edited) in only one place. Any possible linkages to this data element (possibly in other areas of the relational schema or even in distant federated databases) are by reference only. Because all other locations of the data just refer back to the primary "source of truth" location, updates to the data element in the primary location propagate to the entire system without the possibility of a duplicate value somewhere being forgotten.

Deployment of an SSOT architecture is becoming increasingly important in enterprise settings where incorrectly linked duplicate or de-normalized data elements (a direct consequence of intentional or unintentional denormalization of any explicit data model) pose a risk for retrieval of outdated, and therefore incorrect, information. A common example would be the electronic health record, where it is imperative to accurately validate patient identity against a single referential repository, which serves as the SSOT. Duplicate representations of data within the enterprise would be implemented by the use of pointers rather than duplicate database tables, rows, or cells. This ensures that data updates to elements in the authoritative location are comprehensively distributed to all federated database constituencies in the larger overall enterprise architecture.

SSOT systems provide data that are authentic, relevant, and referable.
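A toy sketch of the "by reference only" principle, with invented record names, shows how dependent records that store a key rather than a copy automatically see an update made in the authoritative location:

```python
# Dependent records reference the authoritative record by key instead of
# copying its fields, so a single update propagates everywhere.
patients = {  # the single source of truth for patient identity
    "p-001": {"name": "Jane Doe", "date_of_birth": "1980-04-02"},
}

lab_results = [
    {"patient_id": "p-001", "test": "HbA1c", "value": 5.4},  # reference, not a copy
]

patients["p-001"]["name"] = "Jane Doe-Smith"   # corrected once, in one place
for result in lab_results:
    print(patients[result["patient_id"]]["name"], result["test"])  # reflects the update
```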
What is a single source of truth (SSOT)?

Single source of truth (SSOT) is a concept used to ensure that everyone in an organization bases business decisions on the same data. Creating a single source of truth is straightforward: to put an SSOT in place, an organization must provide relevant personnel with one source that stores the data points they need.

Data-driven decision making has placed never-before-seen levels of importance on collecting and analyzing data. While acting on data-derived business intelligence is essential for competitive brands today, companies often spend far too much time debating which numbers, invariably from different sources, are the right numbers to use. Metrics from social platforms may paint one picture of a company’s target demographics while vendor feedback or online questionnaires may say something entirely different. How are corporate leaders to decide whose data points to use in such a scenario?

Establishing a single source of truth eliminates this issue. Instead of debating which of many competing data sources should be used for making company decisions, everyone can use the same, unified source for all their data needs. It provides data that can be used by anyone, in any way, across the entire organization.
In practice, we may think that we can make a statement precisely, without leaving any uncertainty, using a finite number of bits of information. For example, x - x = 0 and x/x = 1 (for x ≠ 0), with seemingly zero uncertainty. On the other hand, to write the ratio between the circumference and diameter of a circle (π) as a decimal number accurately, without uncertainty, infinitely many digits are required. What makes the difference here?
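One way to see the contrast, sketched below with Python's decimal module, is that the identity x - x = 0 is exact for any finite representation of x, while a decimal expansion of π can only ever be an approximation:

```python
# A symbolic identity such as x - x = 0 holds exactly with finitely many
# symbols, whereas any finite decimal expansion of pi is an approximation.
from decimal import Decimal

x = Decimal("123456.789")
print(x - x == 0)   # True: exact, regardless of how many digits x carries

pi_approx = Decimal("3.14159265358979323846264338327")   # a 30-digit approximation
print(pi_approx)    # adding digits shrinks the error but never eliminates it
```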
Sporting 1.75 trillion parameters, Wu Dao 2.0 is roughly ten times the size of OpenAI's GPT-3.
When OpenAI's GPT-3 model made its debut in May of 2020, its performance was widely considered to be the state of the art. Capable of generating text indiscernible from human-crafted prose, GPT-3 set a new standard in deep learning. But oh what a difference a year makes. Researchers from the Beijing Academy of Artificial Intelligence (BAAI) announced on Tuesday the release of their own generative deep learning model, Wu Dao, a mammoth AI seemingly capable of doing everything GPT-3 can do, and more.

First off, Wu Dao is flat out enormous. It's been trained on 1.75 trillion parameters (essentially, the model's self-selected coefficients), a full ten times larger than the 175 billion GPT-3 was trained on and 150 billion parameters larger than Google's Switch Transformer. With all that computing power comes a whole bunch of capabilities. Unlike most deep learning models, which perform a single task — write copy, generate deepfakes, recognize faces, win at Go — Wu Dao is multi-modal, similar in theory to Facebook's anti-hatespeech AI or Google's recently released MUM.

BAAI researchers demonstrated Wu Dao's abilities to perform natural language processing, text generation, image recognition, and image generation tasks during the lab's annual conference on Tuesday. The model can not only write essays, poems, and couplets in traditional Chinese, it can also generate alt text based on a static image and generate nearly photorealistic images based on natural language descriptions. Wu Dao also showed off its ability to power virtual idols (with a little help from Microsoft spinoff XiaoIce) and predict the 3D structures of proteins, as AlphaFold does.

“The way to artificial general intelligence is big models and big computer,” Dr. Zhang Hongjiang, chairman of BAAI, said during the conference on Tuesday. “What we are building is a power plant for the future of AI. With mega data, mega computing power, and mega models, we can transform data to fuel the AI applications of the future.”
At this point it should be clear that any new information must be related to preexisting common knowledge for it to be meaningful.
A common example I often see is government road closures that are not accurately represented in Google Maps.
Summary

Deep learning has emerged as the technique of choice for identifying hidden patterns in cell imaging data but is often criticized as “black box.” Here, we employ a generative neural network in combination with supervised machine learning to classify patient-derived melanoma xenografts as “efficient” or “inefficient” metastatic, validate predictions regarding melanoma cell lines with unknown metastatic efficiency in mouse xenografts, and use the network to generate in silico cell images that amplify the critical predictive cell properties. These exaggerated images unveiled pseudopodial extensions and increased light scattering as hallmark properties of metastatic cells. We validated this interpretation using live cells spontaneously transitioning between states indicative of low and high metastatic efficiency. This study illustrates how the application of artificial intelligence can support the identification of cellular properties that are predictive of complex phenotypes and integrated cell functions but are too subtle to be identified in the raw imagery by a human expert. A record of this paper’s transparent peer review process is included in the supplemental information.