DOI: 10.3389/fpsyg.2020.623237

One-Liner (thrice)
1. Fed features extracted by VGGish from raw acoustic audio into an SVM, a Perceptron, and 1NN; got 59.1% classification accuracy for dementia.
2. Then trained a CNN directly on the raw waveforms and got 63.6% accuracy.
3. Then fine-tuned VGGish on the raw waveforms, didn't report the numbers, and just said "we discovered that audio transfer learning with a pretrained VGGish feature extractor performs better". Gah!
(Hedged sketches of all three setups are at the bottom of these notes.)

Novelty
Threw the kitchen sink at processing raw acoustic input only, and most of it missed; the goal was zero human involvement in feature engineering. The last method (fine-tuning VGGish) seems the most promising.

Notable Methods
Fine-tuning VGGish on raw acoustic waveforms to build a CNN-based classifier.

Key Figs
Their fancy network: it's just a CNN, afaik, with a lot of max-pooling; it could have used some skip connections. I wonder if it overfit?
Their actual training results: they look generally pretty bad, but one run of their DemCNN seems to have reached state-of-the-art results. Not sure where the transfer-learning data went.

New Concepts
VGGish

Notes
Accuracy question: according to this, the state of the art at the time from pure audio was 56.6%? On a balanced binary task, chance is 50%, so that's only a few points above doing nothing. Still, it means somebody did beat chance before?
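
Sketch 1: VGGish as a frozen feature extractor feeding shallow classifiers. A minimal sketch of the first setup, assuming the harritaylor/torchvggish torch.hub port of VGGish and scikit-learn for the SVM/Perceptron/1NN; `wav_paths` and `labels` are hypothetical placeholders for the dataset, and mean-pooling the per-frame embeddings into one vector per recording is my guess, not something these notes confirm the paper did.

```python
import numpy as np
import torch
from sklearn.svm import SVC
from sklearn.linear_model import Perceptron
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Pretrained VGGish via torch.hub (harritaylor/torchvggish port of the
# original TensorFlow model); it emits one 128-d embedding per 0.96 s frame.
vggish = torch.hub.load('harritaylor/torchvggish', 'vggish')
vggish.eval()

def embed(path: str) -> np.ndarray:
    """Mean-pool the per-frame VGGish embeddings into one 128-d vector."""
    with torch.no_grad():
        frames = vggish.forward(path).float()  # shape: [n_frames, 128]
    return frames.mean(dim=0).cpu().numpy()

# Hypothetical data layout: wav_paths is a list of recording files,
# labels is the matching list of binary dementia/control labels.
X = np.stack([embed(p) for p in wav_paths])    # [n_files, 128]
y = np.asarray(labels)

for clf in (SVC(), Perceptron(), KNeighborsClassifier(n_neighbors=1)):
    acc = cross_val_score(clf, X, y, cv=5, scoring='accuracy').mean()
    print(type(clf).__name__, f'{acc:.3f}')
```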
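
Sketch 2: a CNN on raw waveforms. The paper's DemCNN details aren't in these notes, so this is only an illustrative PyTorch stand-in for "a CNN with much max-pooling" over raw 16 kHz audio; every layer size here is an assumption, not the paper's architecture.

```python
import torch
import torch.nn as nn

class RawWaveCNN(nn.Module):
    """Illustrative DemCNN-style classifier: stacked Conv1d + MaxPool1d
    blocks over the raw waveform, then a small dense head. All widths,
    kernel sizes, and depths are guesses for demonstration only."""
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=2), nn.ReLU(),
            nn.MaxPool1d(8),
            nn.Conv1d(16, 32, kernel_size=32, stride=2), nn.ReLU(),
            nn.MaxPool1d(8),
            nn.Conv1d(32, 64, kernel_size=16, stride=2), nn.ReLU(),
            nn.MaxPool1d(8),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),  # collapse the time axis
            nn.Flatten(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, 1, n_samples] raw audio samples
        return self.head(self.features(x))

model = RawWaveCNN()
logits = model(torch.randn(4, 1, 16000 * 10))  # four 10-second clips at 16 kHz
print(logits.shape)                            # torch.Size([4, 2])
```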
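
Sketch 3: fine-tuning VGGish. A rough sketch of the transfer-learning setup, again assuming the torchvggish port (including its postprocess flag, which skips the non-differentiable PCA/quantization step) and assuming its backbone exposes a `features` conv stack; the frozen-conv/trainable-head split and the linear head are my choices, since the paper's recipe isn't in these notes.

```python
import torch
import torch.nn as nn

# Pretrained VGGish backbone via the same torch.hub port as above;
# postprocess=False keeps the raw 128-d embeddings differentiable.
vggish = torch.hub.load('harritaylor/torchvggish', 'vggish', postprocess=False)
vggish.train()

# New binary classification head on top of the 128-d embedding.
head = nn.Linear(128, 2)

# One plausible transfer recipe: freeze the convolutional stack and
# fine-tune only VGGish's fully connected layers plus the new head.
for p in vggish.features.parameters():
    p.requires_grad = False

trainable = [p for p in vggish.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable + list(head.parameters()), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def train_step(wav_path: str, label: int) -> float:
    """One training step on a single recording (hypothetical data layout)."""
    frames = vggish.forward(wav_path)                # [n_frames, 128]
    logits = head(frames.mean(dim=0, keepdim=True))  # pool frames, classify
    loss = loss_fn(logits, torch.tensor([label]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```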