Metalearning the Structure of Information

Category: Machine Intelligence

Practical Examples of Encoding Structure

  1. Attentional ShapeContextNet for Point Cloud Recognition
  2. Aggregated Residual Transformations for Deep Neural Networks
  3. Why does deep and cheap learning work so well?
  4. Symmetry Regularization
  5. Group Equivariant Convolutional Networks
  6. Spatial Transformer Networks
  7. Stochastic Video Generation with a Learned Prior
  8. Relational inductive biases, deep learning, and graph networks
  9. Distant Transfer for Continual Learning [Marc Pickett]
  10. Relational Deep Reinforcement Learning

I’m extremely surprised that I haven’t seen this comprehensive mode of thinking made explicit anywhere. If I’m actually the first person to arrive at this abstraction (and the surprise of, and questions asked by, people like Pedro Domingos and Ryan Adams suggest that it’s rare), then I have a serious duty at hand. This framing is key to algorithm learning and to building low-bias models that resist the curse of dimensionality. This is an ensemble model.

Types of structure in information:

  1. Hierarchical / Compositional / Combinatorial Structure
  2. Relational / Graphical Structure
  3. Recursive Structure
  4. Temporal / Sequential Structure
  5. Clustering Structure
  6. Discreteness - quantized
  7. Continuity - distribution
  8. Smoothness
  9. Sparsity
  10. Locality
  11. Linearity / Polynomial / Exponential Structure

Principles of Structure:

  1. Simplicity vs. complexity
  2. Bias - Variance Decomposition
  3. Abstraction - level of abstraction at which more or less structure, or different types of structure are present
  4. Framed as Compression
    1. Degree of Compression
  5. Directionality
  6. Discrete vs. Continuous
  7. Abstraction - fine- vs. coarse-grained structure
  8. Similarity, say, with a feature or set of features
  9. Randomness, degree to which there is structure, compressibility of data
  10. Homogeneity - degree to which the same operations can be run over objects in the structure
  11. Dimensionality - Interactions between features vs. single feature structure

Examples

  1. Hierarchical / Compositional / Combinatorial
    1. Images
    2. Language
    3. Set of axioms of Euclidean geometry
    4. An organization’s management structure
  2. Relational / Graphical
    1. Social Network
    2. Worldview (Tension with Hierarchical)
  3. Recursive (Top-down hierarchical)
    1. Trees
  4. Temporal
    1. Periodicity
    2. Messages’ bursting structure
    3. Quantized, like hitting lights for predicting arrival time
    4. Making food in a kitchen (Tempo)
    5. Dancing (Rhythmic / Periodic)
    6. Option / Permanence - School choice, Tattoos, Relationships
  5. Discreteness
    1. Categories - Number of Fields in an Academy
    2. Binary - Graduated or Not Graduated, Accepted or not Accepted, Given an offer or Not Given an Offer
  6. Continuity
    1. Intensity of emotion
    2. Amount of time on a task
  7. Causal
    1. Counterfactual - If I had done x, simulation.
    2. Imagination - If I do x, simulation.

Hierarchical Structure

  1. Abstraction
  2. Images
    1. Objects - Object Parts - Shapes - Lines / Curves
  3. Audio
    1. Words - Phonemes
  4. Businesses / Governments
  5. Sciences
    1. Physics
    2. Chemistry
    3. Biology
      1. Ontology of Species
      2. Organ Systems - Organs - Tissues - Cells - Nuclei + Organelles
      3. Brain
  6. Natural Language
    1. Fields - Concepts - Words (Combinatorial as well)
    2. Paragraph - Sentence - Phrase - Word - Character
  7. Time
    1. Centuries - Decades - Years - Months - Weeks - Days - Hours - Minutes - Seconds
  8. Measurement
    1. Kilometers - Meters - Centimeters - Millimeters
  9. Object Oriented Systems
    1. Classes - Objects
  10. Economy
    1. GDP = consumer spending + investment + government spending + exports - imports
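
The expenditure identity in the last item, GDP = C + I + G + (X - M), as a one-line function; the figures are made up for illustration.

```python
def gdp(consumption, investment, government, exports, imports):
    # Expenditure approach: C + I + G + net exports
    return consumption + investment + government + exports - imports

print(gdp(consumption=14_000, investment=3_500, government=3_000,
          exports=2_500, imports=3_000))  # → 20000
```
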

Relational / Graphical Structure

  1. Object Oriented Structure
    1. Object (Entity)
    2. X is a Y relationships (Classification, Inheritance)
    3. X has a Y relationships (Composition / Aggregation)
    4. Properties of an Object
  2. Causal Graph - X leads to Y
  3. Dependency - X depends on Y
  4. Subject - Object relationships (in sentences)
    1. Linking verbs - ‘is’, ‘has’, ‘are’, ‘being’, sense verbs, etc. between Subject and Object
  5. Co-occurrence
    1. Ex. Words mentioned in concert with one another
  6. Link - are connected
    1. Linkage Distribution
  7. Locality
  8. Edge Density
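
The co-occurrence relation above can be made concrete as a weighted graph: count how often word pairs appear in the same sentence. A minimal sketch with a made-up corpus:

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_graph(sentences):
    """Map each unordered word pair to the number of sentences it co-occurs in."""
    graph = defaultdict(int)
    for sentence in sentences:
        for a, b in combinations(sorted(set(sentence.lower().split())), 2):
            graph[(a, b)] += 1
    return dict(graph)

edges = cooccurrence_graph(["deep learning works", "deep models learn structure"])
# the edge ("deep", "learning") has weight 1; per-word degrees give the linkage distribution
```
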

Temporal Structure

  1. Periodicity
    1. Hierarchical Periodicity
    2. Seasonality
  2. Burstiness
  3. Stationary vs. Non-Stationary Distributions
  4. Permanence / Option Structure
  5. Quantized
    1. Ex. hitting lights when predicting arrival time
  6. Autoregression / Autocovariance
  7. Feedback
    1. Positive Feedback
    2. Negative Feedback
    3. Length of feedback loops
  8. Synchronicity vs. Asynchronicity
    1. Discrete vs. Continuous
  9. Exponential Decay vs. Windowing
    1. Continuity vs. Discreteness
  10. Stability & Equilibrium
  11. Derivatives - change over time
  12. Objectness - these pixels move together
  13. Asymmetry between past and future
  14. Exclusive ability to directly impact present
  15. Strong predictor of causality / anti-causality

Relevant Links

  1. https://sites.google.com/site/icml18limitedlabels/
  2. https://arxiv.org/pdf/1608.08225.pdf

Papers

Notes

Regularizers impose inductive biases; weight decay / L2 regularization happens to impose smoothness.

But at the end of the day, we do induction: we noticed that this bias worked well in the past, so we impose it on new data.

There is a big difference between having a causal model (the true relationship is smooth, so imposing that prior leads to a more efficient search in function space) and merely predicting that a bias will work because it worked on past data, with no model for why it works. Every form of regularization imposes a different inductive bias; these should be listed and maximized.
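
The smoothness bias of weight decay can be seen in a single SGD step: the L2 penalty λ‖w‖² contributes 2λw to the gradient, shrinking every weight toward zero even when the data gradient is zero. A minimal sketch with illustrative numbers:

```python
def sgd_step(w, grad, lr=0.1, weight_decay=0.01):
    """One SGD step with the L2 weight-decay term folded into the gradient."""
    return [wi - lr * (gi + 2 * weight_decay * wi) for wi, gi in zip(w, grad)]

w = sgd_step([1.0, -2.0], grad=[0.0, 0.0])
# even with zero data gradient, both weights shrink slightly toward zero
```
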

  1. Dropout
    1. Algorithmic. Cuts the signal for inputs to a network.
      1. Does dropout work for linear models? For trees? How to deal with it at test time?
  2. Norm Penalties
    1. L1 (Sharp)
    2. L2 (Smooth) / Weight Decay
  3. Model averaging
    1. The averaging step, not the step where variance is created through bagging, alternate parameterization, feature elimination (à la RF & ERF), etc.
  4. Intelligent Initialization
  5. Noise Injection
  6. Early Stopping
  7. Constraints on optimization
  8. Train and test time data augmentation
  9. Multi-task learning
    1. Multi-class as multi-task
  10. Pruning
  11. Weight Sharing
  12. Stochastic Optimization
  13. All models? All Priors?

Source: Original Google Doc
