Metalearning the Structure of Information
Category: Machine Intelligence
Practical Examples of Encoding Structure
- Attentional ShapeContextNet for Point Cloud Recognition
- Aggregated Residual Transformations for Deep Neural Networks
- Why does deep and cheap learning work so well?
- Symmetry Regularization
- Group Equivariant Convolutional Networks
- Spatial Transformer Networks
- Stochastic Video Generation with a Learned Prior
- Relational inductive biases, deep learning, and graph networks
- Distant Transfer for Continual Learning [Marc Pickett]
- Relational Deep Reinforcement Learning
I’m extremely surprised that I haven’t seen this comprehensive mode of thinking made explicit anywhere. If I’m actually the first person to come up with this abstraction (and the surprise of, and questions asked by, people like Pedro Domingos and Ryan Adams suggest that it’s rare), then I have a serious duty at hand. This is key to algorithm learning and to creating low-bias models that fight the curse of dimensionality. This is an ensemble model.
Types of structure in information:
- Hierarchical / Compositional / Combinatorial Structure
- Relational / Graphical Structure
- Recursive Structure
- Temporal / Sequential Structure
- Clustering Structure
- Discreteness - quantized
- Continuity - distribution
- Smoothness
- Sparsity
- Locality
- Linearity / Polynomial / Exponential Structure
Principles of Structure:
- Simplicity vs. complexity
- Bias - Variance Decomposition
- Abstraction - level of abstraction at which more or less structure, or different types of structure are present
- Framed as Compression
- Degree of Compression
- Directionality
- Discrete vs. Continuous
- Abstraction - fine vs. coarse grain structure
- Similarity, say, with a feature or set of features
- Randomness, degree to which there is structure, compressibility of data
- Homogeneity - degree to which the same operations can be run over objects in the structure
- Dimensionality - Interactions between features vs. single feature structure
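The “Framed as Compression” and “Randomness / compressibility” principles above can be made concrete with a small sketch; the byte strings below are toy choices of mine:

```python
import random
import zlib

random.seed(0)

structured = b"abcd" * 250                                 # 1,000 highly repetitive bytes
noisy = bytes(random.getrandbits(8) for _ in range(1000))  # 1,000 pseudo-random bytes

# Structure shows up as compressibility: the repetitive string
# compresses dramatically, while the noise barely compresses at all.
print(len(zlib.compress(structured)))
print(len(zlib.compress(noisy)))
```

The gap between the two compressed sizes is one operational measure of how much structure the data contains.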
Examples
- Hierarchical / Compositional / Combinatorial
- Images
- Language
- Set of axioms of Euclidean geometry
- Organizations’ management structure
- Relational / Graphical
- Social Network
- Worldview (Tension with Hierarchical)
- Recursive (Top-down hierarchical)
- Trees
- Temporal
- Periodicity
- Messages’ bursting structure
- Quantized, like hitting traffic lights when predicting arrival time
- Making food in a kitchen (Tempo)
- Dancing (Rhythmic / Periodic)
- Option / Permanence - School choice, Tattoos, Relationships
- Discreteness
- Categories - Number of Fields in an Academy
- Binary - Graduated or Not Graduated, Accepted or not Accepted, Given an offer or Not Given an Offer
- Continuity
- Intensity of emotion
- Amount of time on a task
- Causal
- Counterfactual - If I had done x, simulation.
- Imagination - If I do x, simulation.
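The “Recursive - Trees” example above can be sketched with nested lists, where the traversal code recurses exactly as the data does (toy tree, labels are mine):

```python
# A tree as nested lists: [label, child, child, ...].
tree = ["root", ["a", ["a1"], ["a2"]], ["b"]]

def depth(node):
    # The function's recursion mirrors the data's recursive structure.
    label, *children = node
    if not children:
        return 1
    return 1 + max(depth(child) for child in children)

print(depth(tree))  # 3
```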
Hierarchical Structure
- Abstraction
- Images
- Objects - Object Parts - Shapes - Lines / Curves
- Audio
- Words - Phonemes
- Businesses / Governments
- Sciences
- Physics
- Chemistry
- Biology
- Ontology of Species
- Organ Systems - Organs - Tissues - Cells - Nuclei + Organelles
- Brain
- Natural Language
- Fields - Concepts - Words (Combinatorial as well)
- Paragraph - Sentence - Phrase - Word - Character
- Time
- Centuries - Decades - Years - Months - Weeks - Days - Hours - Minutes - Seconds
- Measurement
- Kilometers - Meters - Centimeters - Millimeters
- Object Oriented Systems
- Classes - Objects
- Economy
- GDP = Consumer Spending + Investment + Government Spending + Exports - Imports
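The GDP identity above (the expenditure approach) can be checked numerically; the figures below are invented purely for illustration:

```python
# Expenditure approach: GDP = C + I + G + (X - M).
# All figures are made up, in arbitrary units.
consumer_spending = 14.0    # C
investment = 4.0            # I
government_spending = 3.5   # G
exports = 2.5               # X
imports = 3.0               # M

gdp = consumer_spending + investment + government_spending + (exports - imports)
print(gdp)  # 21.0
```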
Relational / Graphical Structure
- Object Oriented Structure
- Object (Entity)
- X is a Y relationships (Classification, Inheritance)
- X has a Y relationships (Composition / Aggregation)
- Properties of an Object
- Causal Graph - X leads to Y
- Dependency - X depends on Y
- Subject - Object relationships (in sentences)
- Linking verbs - ‘is’, ‘are’, ‘being’, sense verbs (‘seems’, ‘feels’), etc., between subject and object
- Co-occurrence
- Ex. Words mentioned in concert with one another
- Link - are connected
- Linkage Distribution
- Locality
- Edge Density
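The co-occurrence entry above (“words mentioned in concert with one another”) can be sketched by counting word pairs that share a sentence; the toy corpus is my own:

```python
from collections import Counter
from itertools import combinations

# Toy corpus; each inner list is one "sentence".
sentences = [
    ["graph", "node", "edge"],
    ["node", "edge", "path"],
    ["graph", "path"],
]

# Count unordered word pairs that appear in the same sentence.
cooccurrence = Counter()
for sentence in sentences:
    for pair in combinations(sorted(set(sentence)), 2):
        cooccurrence[pair] += 1

print(cooccurrence[("edge", "node")])  # 2
```

The resulting pair counts are the edge weights of a co-occurrence graph over the vocabulary.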
Temporal Structure
- Periodicity
- Hierarchical Periodicity
- Seasonality
- Burstiness
- Stationary vs. Non-Stationary Distributions
- Permanence / Option Structure
- Quantized
- Ex. hitting traffic lights when predicting arrival time
- Autoregression / Autocovariance
- Feedback
- Positive Feedback
- Negative Feedback
- Length of feedback loops
- Synchronicity vs. Asynchronicity
- Discrete vs. Continuous
- Exponential Decay vs. Windowing
- Continuity vs. Discreteness
- Stability & Equilibrium
- Derivatives - change over time
- Objectness - these pixels move together
- Asymmetry between past and future
- Exclusive ability to directly impact present
- Strong predictor of causality / anti-causality
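The “Autoregression / Autocovariance” entry above can be illustrated with a minimal AR(1) process, x_t = phi * x_(t-1) + noise; the coefficient and sample size are arbitrary choices:

```python
import random

random.seed(0)

phi = 0.8      # autoregressive coefficient (illustrative)
n_steps = 5000

# Simulate x_t = phi * x_{t-1} + noise.
x = [0.0]
for _ in range(n_steps):
    x.append(phi * x[-1] + random.gauss(0.0, 1.0))

# The empirical lag-1 autocorrelation approximately recovers phi.
mean = sum(x) / len(x)
num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, len(x)))
den = sum((v - mean) ** 2 for v in x)
print(num / den)
```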
Relevant Links
- https://sites.google.com/site/icml18limitedlabels/
- https://arxiv.org/pdf/1608.08225.pdf
Papers
Notes
Regularizers impose inductive biases; weight decay / L2 regularization, for example, imposes smoothness.
But at the end of the day, we do induction: we noticed that this bias worked well in the past and impose it on new data.
There is a big difference between having a causal model (the true relationship is smooth, so imposing that prior leads to a more efficient search in function space) and merely predicting that the bias will work well because it worked on past data (with no model for why it’s working). Every form of regularization imposes a different inductive bias; these should be listed and deliberately matched to the structure of the data.
- Dropout
- Algorithmic. Cuts the signal for inputs to a network.
- Does dropout work for linear models? For trees? How to deal with it at test time?
- Norm Penalties
- L1 (Sharp)
- L2 (Smooth) / Weight Decay
- Model averaging
- (The averaging step itself, not the step where variance is created through bagging, alternate parameterizations, feature elimination (à la RF & ERF), etc.)
- Intelligent Initialization
- Noise Injection
- Early Stopping
- Constraints on optimization
- Train and test time data augmentation
- Multi-task learning
- Multi-class as multi-task
- Pruning
- Weight Sharing
- Stochastic Optimization
- All models? All Priors?
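The L1 (sharp) vs. L2 (smooth) contrast above has a clean one-dimensional form: for a single least-squares coefficient w*, an L2 penalty shrinks it proportionally, while an L1 penalty soft-thresholds it, zeroing small coefficients outright (sparsity). A minimal sketch, with the penalty strength lam being my own choice:

```python
def ridge_shrink(w_star, lam):
    # Minimizer of (w - w_star)**2 / 2 + lam * w**2 / 2: proportional shrinkage.
    return w_star / (1 + lam)

def lasso_shrink(w_star, lam):
    # Minimizer of (w - w_star)**2 / 2 + lam * abs(w): soft-thresholding.
    if w_star > lam:
        return w_star - lam
    if w_star < -lam:
        return w_star + lam
    return 0.0

for w_star in [0.3, 1.5]:
    # Small w_star: L1 zeroes it out (sparsity); L2 only shrinks it.
    print(w_star, ridge_shrink(w_star, 0.5), lasso_shrink(w_star, 0.5))
```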
Source: Original Google Doc