Shortcuts are decision rules that perform well on standard benchmarks but fail to transfer to more challenging testing conditions. Shortcut opportunities come in many flavors and are ubiquitous across datasets and application domains. A few examples are visualized here.

In principle, shortcut learning is not a novel phenomenon: variants are known under different terms such as “learning under covariate shift”, “anti-causal learning”, “dataset bias”, the “tank legend” and the “Clever Hans effect”. Here we discuss how shortcut learning unifies many of deep learning’s problems and what we can do to better understand and mitigate it.

What is a shortcut?

In machine learning, the solutions a model can learn are constrained by the data, model architecture, optimizer and objective function. However, these constraints rarely pin down a single solution: there are typically many different ways to solve a problem. Shortcuts are solutions that perform well on a typical test set but fail under different circumstances, revealing a mismatch with our intentions.
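As a toy, hand-made illustration (the dataset and numbers below are invented for this sketch, not taken from the paper), a linear classifier given both a weak “intended” feature and a spurious feature that correlates with the label only in training will latch onto the shortcut, then collapse toward chance once that correlation is broken:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000

# Intended feature: causally related to the label, but noisy.
# Shortcut feature: spurious, yet almost noise-free *in training only*.
y = rng.integers(0, 2, n)
intended = y + rng.normal(0, 2.0, n)
shortcut = y + rng.normal(0, 0.1, n)

X_train = np.column_stack([intended, shortcut])
clf = LogisticRegression().fit(X_train, y)

# At test time the spurious correlation is broken (a distribution
# shift): the shortcut feature no longer tracks the label.
y_test = rng.integers(0, 2, n)
intended_t = y_test + rng.normal(0, 2.0, n)
shortcut_t = rng.normal(0, 0.1, n)
X_test = np.column_stack([intended_t, shortcut_t])

print("train accuracy:", clf.score(X_train, y))       # near-perfect
print("test accuracy:", clf.score(X_test, y_test))    # near chance
```

The model is never forced to use the intended feature because the shortcut explains the training labels more cheaply; benchmark performance alone cannot tell the two solutions apart.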
Shortcut learning beyond deep learning

Often such failures serve as examples of why machine learning algorithms are untrustworthy. However, biological learners suffer from strikingly similar failure modes. In an experiment at a University of Oxford lab, researchers observed that rats learned to navigate a complex maze apparently based on subtle colour differences - very surprising, given that the rat retina has only rudimentary machinery to support at best somewhat crude colour vision. Intensive investigation into this curious finding revealed that the rats had tricked the researchers: they did not use their visual system at all and instead simply discriminated the colours by the odour of the paint used on the walls of the maze. Once smell was controlled for, the remarkable colour discrimination ability disappeared.

Animals often trick experimenters by solving an experimental paradigm (i.e., a dataset) in an unintended way, without using the underlying ability one is actually interested in. This highlights how difficult it can be for humans to imagine solving a tough challenge in any way other than the human way. At Marr’s implementational level there may well be differences between rat and human colour discrimination, but at the algorithmic level there is often a tacit assumption that human-like performance implies a human-like strategy (or algorithm). This “same strategy assumption” has a parallel in deep learning: even though DNN units differ from biological neurons, if DNNs successfully recognise objects it seems natural to assume that they use object shape, as humans do. Consequently, we need to distinguish between performance on a dataset and acquisition of an ability, and exercise great care before attributing high-level abilities like “object recognition” or “language understanding” to machines, since there is often a much simpler explanation:

Never attribute to high-level abilities that which can be adequately explained by shortcut learning.
The consequences of this behaviour are striking failures in generalization. Have a look at the figure below. On the left side are a few directions in which humans would expect a model to generalize: a five is a five whether it is hand-drawn in black and white or a house number photographed in color. Similarly, slight distortions or changes in pose, texture or background don’t influence our prediction about the main object in the image. In contrast, a DNN can easily be fooled by all of them. Interestingly, this does not mean that DNNs can’t generalize at all: in fact, they generalize perfectly well, albeit in directions that hardly make sense to humans. The right side of the figure shows some examples, ranging from the somewhat comprehensible (scrambling the image so that only its texture remains) to the completely incomprehensible.
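The “keep only texture” manipulation can be approximated by shuffling image patches. The sketch below is a minimal stand-in for that idea (the patch size and the blank placeholder image are arbitrary choices of mine), not the stimulus-generation code used in the original experiments:

```python
import numpy as np

def scramble_patches(img: np.ndarray, patch: int, seed: int = 0) -> np.ndarray:
    """Destroy global shape while keeping local texture by shuffling
    non-overlapping square patches of the image."""
    h, w = img.shape[:2]
    h, w = h - h % patch, w - w % patch          # crop to a multiple of patch
    img = img[:h, :w]
    rows, cols = h // patch, w // patch
    # Split into patches, shuffle their order, and reassemble.
    patches = [img[r*patch:(r+1)*patch, c*patch:(c+1)*patch]
               for r in range(rows) for c in range(cols)]
    order = np.random.default_rng(seed).permutation(len(patches))
    patches = [patches[i] for i in order]
    return np.vstack([np.hstack(patches[r*cols:(r+1)*cols])
                      for r in range(rows)])

# Example: after scrambling, a human can no longer read the digit,
# but local texture statistics survive.
image = np.zeros((64, 64), dtype=np.uint8)       # stand-in for a digit image
scrambled = scramble_patches(image, patch=8)
```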
Deep learning systems used for applications such as autonomous driving are developed by training a machine learning model. Typically, the performance of the deep learning system is limited at least in part by the quality of the training set used to train the model.

In many instances, significant resources are invested in collecting, curating, and annotating the training data. Traditionally, much of the effort to curate a training data set is done manually, by reviewing potential training data and properly labeling the features associated with the data. The effort required to create a training set with accurate labels can be significant and is often tedious. Moreover, it is often difficult to collect and accurately label the data on which a machine learning model most needs improvement. There is therefore a need to improve the process for generating training data with accurately labeled features.

Tesla has published the patent 'Generating ground truth for machine learning from time series elements'.
Patent filing date: February 1, 2019
Patent publication date: August 6, 2020

The patent discloses a machine learning training technique for generating highly accurate machine learning results. A training data set is created from data captured by sensors on a vehicle; the sensor data may capture vehicle lane lines, vehicle lanes, other vehicle traffic, obstacles, traffic control signs, etc.
Currently, we produce ∼10^21 digital bits of information annually on Earth. Assuming a 20% annual growth rate, we estimate that ∼350 years from now the number of bits produced will exceed the number of atoms on Earth, ∼10^50. After ∼300 years, the power required to sustain this digital production will exceed 18.5 × 10^15 W, i.e., today's total planetary power consumption, and ∼500 years from now the digital content will account for more than half of Earth’s mass, according to the mass-energy-information equivalence principle. Besides existing global challenges such as climate, environment, population, food, health, energy, and security, our estimates point to another singular event for our planet, called the information catastrophe.
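These crossover dates follow from straightforward compounding. The back-of-the-envelope sketch below assumes only the figures quoted above (10^21 bits/year, 20% growth, ∼10^50 atoms, 18.5 × 10^15 W) plus Landauer's minimum energy of k_B·T·ln 2 per bit at T ≈ 300 K, and recovers years of roughly the stated magnitude:

```python
import math

BITS_PER_YEAR = 1e21          # current annual digital-bit production
GROWTH = 1.2                  # 20% annual growth
ATOMS_ON_EARTH = 1e50
K_B, T = 1.380649e-23, 300.0
E_BIT = K_B * T * math.log(2)     # Landauer limit, joules per bit
PLANETARY_POWER = 18.5e15         # watts
SECONDS_PER_YEAR = 3.154e7

def years_until(condition):
    """Count years until `condition(bits_this_year, total_bits)` holds."""
    bits_this_year, total_bits = BITS_PER_YEAR, 0.0
    for year in range(2000):
        total_bits += bits_this_year
        if condition(bits_this_year, total_bits):
            return year
        bits_this_year *= GROWTH

# Cumulative bits exceed the number of atoms on Earth: ~350 years.
print(years_until(lambda per_year, total: total > ATOMS_ON_EARTH))

# Power to write one year's bits exceeds today's total planetary
# power consumption: ~300 years.
print(years_until(lambda per_year, total:
                  per_year * E_BIT / SECONDS_PER_YEAR > PLANETARY_POWER))
```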
In conclusion, we established that the incredible growth of digital information production will reach a singularity point at which more digital bits are created than there are atoms on the planet. At the same time, digital information production alone will consume most of the planetary power capacity, leading to the ethical and environmental concerns already recognized by Floridi, who introduced the concept of the “infosphere” and considered the challenges posed by our digital information society [27]. These issues are valid regardless of future developments in data storage technologies. The mass-energy-information equivalence principle, formulated in 2019, has not yet been verified experimentally, but assuming it is correct, in the not very distant future most of the planet’s mass will be made up of bits of information. Applying a conservation law in conjunction with the mass-energy-information equivalence principle implies that the mass of the planet is unchanged over time. However, our technological progress would radically invert the distribution of the Earth’s matter from predominantly ordinary matter to the fifth form of matter: digital information. In this context, assuming the planetary power limitations are solved, one could envisage a future world mostly computer-simulated and dominated by digital bits and computer code.
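Under the mass-energy-information equivalence principle, a stored bit at temperature T carries a mass of m = k_B·T·ln 2 / c². A quick sketch under the same assumed growth figures as above (my assumptions, not numbers spelled out in this excerpt) reproduces the ∼500-year horizon for half of Earth's mass:

```python
import math

K_B, T, C = 1.380649e-23, 300.0, 2.998e8
M_BIT = K_B * T * math.log(2) / C**2   # ~3.2e-38 kg per bit at room temperature
HALF_EARTH_MASS = 5.97e24 / 2          # kg

bits_needed = HALF_EARTH_MASS / M_BIT  # ~9e61 bits

# Years until cumulative production reaches that many bits,
# at 20% growth from ~1e21 bits/year today.
bits, total, year = 1e21, 0.0, 0
while total < bits_needed:
    total += bits
    bits *= 1.2
    year += 1
print(year)                            # ~500 years
```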
During the training phase, when the neural network is being developed, GPT-3 is fed millions and millions of samples of text, which it converts into vectors: numeric representations of the words. That is a form of data compression. The program then tries to unpack the compressed text back into valid sentences. The task of compressing and decompressing hones the program's accuracy in calculating the conditional probability of words.
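The idea of “calculating the conditional probability of words” can be made concrete with a toy bigram model. This is vastly simpler than GPT-3's transformer, and the corpus below is a made-up placeholder, but the estimated quantity, P(next word | previous word), is the same kind of object:

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word (bigram counts).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def p_next(prev: str, word: str) -> float:
    """Conditional probability P(word | prev) estimated from counts."""
    counts = following[prev]
    return counts[word] / sum(counts.values()) if counts else 0.0

print(p_next("the", "cat"))  # 0.5: "the" is followed by "cat" in 2 of 4 bigrams
```

A language model like GPT-3 learns a far richer version of this distribution, conditioned on long contexts rather than a single previous word.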
In July, Debuild cofounder and CEO Sharif Shameem tweeted about a project he created that allowed him to build a website simply by describing its design. In a text box he typed, "the google logo, a search box, and 2 lightgrey buttons that say 'Search Google' and 'I'm Feeling Lucky'," and the program generated a virtual copy of the Google homepage.

The program uses GPT-3, a "natural language generation" tool from the research lab OpenAI, which was cofounded by Elon Musk. GPT-3 was trained on massive swathes of data and can spit out results that mimic human writing. Developers have used it for creative writing, designing websites, writing business memos, and more. Now Shameem is using GPT-3 for Debuild, a no-code tool for building web apps: the user simply types a description of what the application should look like and how it should work, and the tool creates the website from those descriptions.
San Francisco-based AI research laboratory OpenAI has added another member to its popular GPT (Generative Pre-trained Transformer) family. In a new paper, OpenAI researchers introduce GPT-f, an automated prover and proof assistant for the Metamath formalization language.

While artificial neural networks have made considerable advances in computer vision, natural language processing, robotics, and so on, OpenAI believes they also have potential in the relatively underexplored area of reasoning tasks. The new research explores this potential by applying a transformer language model to automated theorem proving.