Import AI 211: In AI dogfight, Machines: 5, Humans: 0; Baidu releases a YOLO variant; and the Bitter Lesson and video action recognition

by Jack Clark

Which is the best system for video action recognition? Simple 2D convnets, says survey:
… Richard Sutton’s ‘bitter lesson’ strikes again…
Researchers with MIT have analyzed the performance of fourteen different models used for video action recognition – correctly labeling the action taking place in a video, a generically useful AI capability. The results show that simple techniques tend to beat complex ones. Specifically, the researchers benchmark a range of 2D convolutional networks (C2Ds) against temporal segment networks (TSNs), Long-term Recurrent Convolutional Networks (LRCNs), and Temporal Shift Modules (TSMs). They find the simple stuff – 2D convnets – performs best.

The bitter lesson results: Convolutional net models “significantly outperform” the other models they test. Specifically, Inception-ResNet-v2, ResNet50, DenseNet201, and MobileNetv2 are all top performers. These results also highlight some of the ideas in Sutton’s ‘bitter lesson’ essay – namely that simpler things that scale well tend to beat the smart stuff. “2D approaches can yield results comparable to their more complex 3D counterparts, and model depth, rather than input feature scale, is the critical component to an architecture’s ability to extract a video’s semantic action information,” they write.
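If you want a feel for what the ‘simple stuff’ looks like in practice, here’s a minimal sketch (not the paper’s code) of the frame-level 2D convnet recipe: run an ImageNet-style backbone over each frame independently, then average the per-frame predictions over time. The backbone choice and clip shape below are illustrative assumptions.
```python
# A minimal sketch (not the paper's code) of the 'simple 2D convnet' recipe for
# video action recognition: apply an ImageNet-style backbone to every frame
# independently, then average the per-frame logits over the time axis.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class FrameAveragingClassifier(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        backbone = resnet50(pretrained=True)  # older torchvision API; newer versions use weights=
        backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)
        self.backbone = backbone

    def forward(self, video):                      # video: (batch, time, 3, H, W)
        b, t = video.shape[:2]
        frames = video.flatten(0, 1)               # treat frames as one big image batch
        logits = self.backbone(frames)             # per-frame class scores
        return logits.view(b, t, -1).mean(dim=1)   # average scores over the time axis

model = FrameAveragingClassifier(num_classes=400)  # e.g. a Kinetics-400-sized label set
clips = torch.randn(2, 8, 3, 224, 224)             # two clips of eight frames each
print(model(clips).shape)                          # torch.Size([2, 400])
```
The only concession to video here is the mean over the time axis – everything else is a stock image model, which is a big part of why this family of approaches scales (and, per the paper, performs) so well.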
  Read more: Accuracy and Performance Comparison of Video Action Recognition Approaches (arXiv).

###################################################

Free education resources – fast.ai releases a ton of stuff:
…What has code, tutorials, and reference guides, costs zero bucks, and is made by nice people? This stuff!…
The terrifically nice people at fast.ai have released a rewrite of their fastai framework, bringing with it new libraries, an educational course – Practical Deep Learning for Coders – an O’Reilly book, and a ‘Practical Data Ethics’ course.

Why this matters: fastai is a library built around the idea that the best way to help people learn technology is to make it easy for them to build high-performance stuff while they learn how it works. “Fastai is organized around two main design goals: to be approachable and rapidly productive, while also being deeply hackable and configurable,” they write.
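To give a sense of that ‘approachable and rapidly productive’ goal, here’s a rough sketch of fastai v2’s high-level API, adapted from the library’s own pet-classification quickstart – check the current fastai docs for exact names and arguments.
```python
# A rough sketch of fastai v2's high-level API, adapted from the library's own
# pet-classification quickstart - check the current docs for exact names/arguments.
from fastai.vision.all import *

path = untar_data(URLs.PETS) / 'images'     # downloads a sample dataset of pet photos

def is_cat(fname):
    return fname[0].isupper()               # in this dataset, cat images have capitalized names

dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))

learn = cnn_learner(dls, resnet34, metrics=error_rate)  # transfer learning from ImageNet
learn.fine_tune(1)                                      # a single epoch of fine-tuning
```
A few lines of setup and one call to fine_tune is roughly the level of ceremony the library is aiming for.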
  Read more: fast.ai releases new deep learning course, four libraries, and 600-page book (fast.ai website).

###################################################

Amazon Echo + School = Uh-oh:
Mark Riedl, an AI professor, says on the first day of COVID-induced remote school for their 1st grader, the teacher read a story about a character named Echo the Owl. “It kept triggering peoples’ Amazon Echos,” Mark writes. “One of the echos asked if anyone wanted to buy a bible”.
    Read the tweet here (Mark Riedl, twitter).

###################################################

ICLR develops a code of conduct for its research community:
…Towards an AI Hippocratic oath…
ICLR, a popular and prestigious AI conference, has developed a code of ethics that it would like people who submit papers to follow. The code of ethics has the following core tenets:
– Contribute to society and to human well-being.
– Uphold high standards of scientific excellence.
– Avoid harm.
– Be honest, trustworthy and transparent.
– Be fair and take action not to discriminate.
– Respect the work required to produce new ideas and artefacts.
– Respect privacy.
– Honour confidentiality. 

Ethics are all well and good – but how do you enforce them? Right now, the code doesn’t seem like it’ll be enforced – ICLR writes that “The Code [sic] should not be seen as prescriptive but as a set of principles to guide ethical, responsible research.” In addition, it says people who submit to ICLR should familiarize themselves with the code and use it as “one source of ethical considerations”.

Why this matters: Machine learning is moving from a frontier part of research rife with experimentation to a more mainstream part of academia (and, alongside this, daily life). It makes sense to try and develop common ethical standards for AI researchers. ICLR’s move follows top AI conference NeurIPS requesting researchers write detailed ‘broader impacts’ segments of their papers (Import AI 189) and computer science researchers such as Brent Hecht arguing researchers should try to discuss the negative impacts of their work (Import AI 105).
  Read more: ICLR Code of Ethics (official ICLR website).

###################################################

AI progress is getting faster, says one Googler:
Alex Irpan, a software engineer at Google and part-time AI blogger, has written a post saying their AI timelines have sped up, due to recent progress in the field. In particular, Alex thinks it’s now somewhat more tractable to think about building AGI than it was in the past.

AGI – from implausible to plausible: “For machine learning, the natural version of this question is, ‘what problems need to be solved to get to artificial general intelligence?’ What waypoints do you expect the field to hit on the road to get there, and how much uncertainty is there about the path between those waypoints?” Irpan writes. “I feel like more of those waypoints are coming into focus. If you asked 2015-me how we’d build AGI, I’d tell you I have no earthly idea. I didn’t feel like we had meaningful in-roads on any of the challenges I’d associate with human-level intelligence. If you ask 2020-me how we’d build AGI, I still see a lot of gaps, but I have some idea how it could happen, assuming you get lucky.”
  Read more: My AI Timelines Have Sped Up (Alex Irpan, blog).

###################################################

Baidu publishes high-performance video object detector, PP-YOLO:
…After YOLO’s creator swore off developing the tech, others continued…
Baidu has published PP-YOLO, an object detection system. PP-YOLO isn’t the most accurate system in the world, but it does run at an extremely high FPS with reasonable accuracy. The authors have released the code on GitHub.

What does YOLO get used for? YOLO is a fairly generic object detection network designed to be run over streams of imagery, so it can be used for things like tracking pedestrians, labeling objects in factories, annotating satellite imagery, and more. (It’s notable that a team from Baidu is releasing a YOLO model – one can surmise this is because Baidu uses this stuff internally. In potentially related news, a Baidu team recently won the multi-class multi-movement vehicle counting and traffic anomaly detection components of a smart city AI challenge.)

What they did: YOLO is, by design, built for real world object detection, so YOLO models have grown increasingly baroque over time as developers build in various technical tricks to further improve performance. The Baidu authors state this themselves: “This paper is not intended to introduce a novel object detector. It is more like a receipt, which tell you how to build a better detector step by step.” Their evidence is that their PP-YOLO model gets a score of 45.2% mAP on the COCO dataset while running inference faster than YOLOv4.
  Specific tricks: Some of the tricks they use include a larger batch size, spatial pyramid pooling, using high-performance pre-trained models, and more.
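One of those tricks, spatial pyramid pooling, is simple enough to sketch. Below is a minimal, assumed version of the SPP block commonly used in YOLO-style detectors (not PP-YOLO’s exact implementation): max-pool the same feature map at several kernel sizes and concatenate the results with the input, so the detection head sees features at multiple receptive fields. The kernel sizes are the commonly used ones, not necessarily Baidu’s configuration.
```python
# Sketch of a spatial pyramid pooling (SPP) block of the kind used in YOLO-style
# detectors - illustrative only, not PP-YOLO's exact implementation.
import torch
import torch.nn as nn

class SPP(nn.Module):
    def __init__(self, kernel_sizes=(5, 9, 13)):   # a common choice; an assumption here
        super().__init__()
        # stride-1 max pools with 'same' padding keep the spatial size unchanged
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, x):
        # concatenate the original features with each pooled version along channels
        return torch.cat([x] + [pool(x) for pool in self.pools], dim=1)

features = torch.randn(1, 512, 19, 19)   # a dummy backbone feature map
print(SPP()(features).shape)             # torch.Size([1, 2048, 19, 19])
```
The appeal of tricks like this is cheap context: the pools add no parameters, but the concatenated output mixes information from progressively larger neighborhoods.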

YOLO’s strange, dramatic history: PP-YOLO is an extension of YOLOv3 and in benchmark tests has better performance than YOLOv4 (a successor to YOLOv3 developed by someone else). Joseph Redmon, the original YOLO developer (see: the YOLOv3 release in Import AI 88), stopped doing computer vision research over worries about military and privacy-infringing applications. But YOLO continues to be developed and used by others around the world, and progress in object detection over video streams creeps forward. (The development of PP-YOLO by Baidu and YOLOv4 by a Russian software developer provides some small crumb of evidence for my idea – written up for CSET here – that the next five years will see researchers affiliated with authoritarian nations originating a larger and larger fraction of powerful AI surveillance tech.)
  Read more: PP-YOLO: An Effective and Efficient Implementation of Object Detector (arXiv).
Get the code for PP-YOLO from here (Baidu PaddlePaddle, GitHub).

###################################################

DARPA’s AlphaDogFight shows us the future of AI-driven warfare:
…In the battle between humans and machines, humans lose 5-0…
This week, an AI system beat a top human F-16 pilot in an AI-vs-human simulated dogfight. The AI, named ‘Heron’, won 5-0 against a human pilot with the callsign ‘Banger’. This is a big deal with significant long-term implications, though whether entrenched interests in the military-industrial complex will adapt to this new piece of evidence remains to be seen.
  You can watch the match here, at about 4 hours and 40 minutes into the livestream (official DARPA video on YouTube).

State knowledge & control limits: The competition isn’t a perfect analog with real-world dogfighting – the agent had access to the state of the simulation and, like any reinforcement learning-driven agent, it ended up making some odd optimizations that took advantage of state knowledge. “The AI aircraft pretty consistently skirted the edge of stalling the aircraft,” writes national security reporter Zachary Fryer-Biggs in a tweet thread. “The winning system from Heron did show one superhuman skill that could be very useful – its ability to keep perfect aim on a target.”

Why this matters: This result is a genuinely big deal. A lot of military doctrine in the 2010s (at least, in the West) has been about developing very powerful ‘Centaur’ human pilots who fuse with technology to create something where the sum of capability is greater than the constituent parts – that’s a lot of the philosophy behind the massively-expensive F-35 aircraft, which is designed as a successor to the F-16.
  Results like the AlphaDogFight competition bring this big bet into focus – are we really sure that humans+machines are going to be better at fighting than just machines on their own? The F-16 is a much older combat platform than the (costly, notoriously buggy, might kill people when they eject from the plane) F-35, so we shouldn’t take these results to be definitive. But we should pay attention to the signal. And in the coming years it’ll be interesting to see how incumbents like the Air Force respond to the cybernetic opportunities and challenges of AI-driven war.

Why this might not matter: Of course, the above result takes place in simulation, and that’s important. An analogy: if a top US fighter pilot battled a foreign fighter pilot in a simulator, both using US hardware, and the US pilot lost, people would get a bit worried and question their training methods and the strengths and weaknesses of their air force curriculum. It could be the case that the system demoed here will fail to transfer over to reality, but I think that’s fairly unlikely – solving stuff in simulation is typically the step you need to take before you can solve stuff in reality, and the simulators used here are extraordinarily advanced, so it’s not like this took place in a (completely) unreal gameworld.
  Read more: AlphaDogFight Trials Go Virtual for Final Event (DARPA).
  Watch the video: AlphaDogfight Trials Final Event (YouTube).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

UK exam algorithm fiasco
High school students in the UK did not sit their final exams this summer. Since these grades partly determine acceptance to university, it was important that students still be graded in each subject. So the exam regulator decided to use an algorithm to grade students. This did not go well, and follows an earlier scandal in which the International Baccalaureate organization used an algorithm to automatically grade 160,000 students (that one also went wrong – covered in Import AI 205).

The problem: Every year, teachers provide estimated grades for their students. Taking these predictions at face value was deemed unsatisfactory, since they have been shown to consistently overestimate performance and to favour students from more advantaged backgrounds. Using them unadjusted was therefore expected to unfairly advantage this cohort over past and future cohorts, and to unfairly advantage more privileged students within the cohort. Instead, an algorithm was designed to adjust estimated grades on the basis of each school’s results over the last three years. The full details of the model have not been released.
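Press coverage (see the BBC explainer linked below) describes the core mechanism as mapping each teacher’s rank-ordering of their students onto the school’s historical grade distribution. The toy sketch below is my own illustration of that idea, not Ofqual’s actual model (which, as noted, was not fully published); the names and numbers are invented.
```python
# Toy illustration only - NOT Ofqual's actual model (full details weren't released).
# Reported core idea: keep the teacher's rank ordering of students, but force the
# grade distribution to match the school's historical results, so an individual's
# grade is bounded by how their school performed in previous years.
import numpy as np

GRADE_ORDER = "A* A B C D E U".split()      # best to worst (simplified)

def moderate(teacher_ranks, historical_grades):
    """teacher_ranks: student names, best first (from the teacher).
    historical_grades: grades awarded at this school in past years."""
    ordered_history = sorted(historical_grades, key=GRADE_ORDER.index)
    # Hand out grades by rank: the i-th ranked student gets the grade at roughly the
    # i-th quantile of the school's historical distribution, regardless of their
    # teacher-predicted grade.
    idx = np.floor(np.linspace(0, len(ordered_history) - 1, len(teacher_ranks))).astype(int)
    return {student: ordered_history[i] for student, i in zip(teacher_ranks, idx)}

# A school whose past cohorts mostly got Bs and Cs: even this year's top-ranked
# student can't be awarded anything better than the school's historical best.
print(moderate(["Asha", "Ben", "Cleo", "Dan"],
               ["B", "C", "C", "B", "D", "C", "B", "C"]))
# {'Asha': 'B', 'Ben': 'B', 'Cleo': 'C', 'Dan': 'D'}
```
The failure mode is then obvious: a strong student at a school with weak recent results gets dragged toward that history, which is the pattern that drove the protests described below.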

What happened: 39% of results were downgraded from teacher predictions. Students from the most disadvantaged backgrounds were downgraded more than others; students from private schools were upgraded more than others. This prompted outrage – students took to the streets, chanting “fuck the algorithm”. After a few days, the regulator decided to grant students whichever was highest of their teacher-predicted and algorithm-generated grades.

Matthew’s view: This is a clear example of how delegating a high-stakes decision to a statistical model can go wrong. As technology improves, we will be able to delegate more and more decisions to AI systems. This could help us make better decisions, solve complex problems that currently elude us, and patch well-known flaws in human judgement. Getting there will involve implementing the right safeguards, demonstrating these potential benefits, and gaining public confidence. The exam fiasco is the sort of sloppy foray into algorithmic decision-making that undermines trust and hardens public attitudes against technology.

  Read more: How did the exam algorithm work? (BBC).
  Read more: The UK exam debacle reminds us that algorithms can’t fix broken systems (MIT Tech Review).

Quantifying automation in the Industrial Revolution
We all know that the Industrial Revolution involved the substantial substitution of machine labour for human labour. This 2019 paper from a trio of economists paints a clear quantitative picture of automation in this period, using the 1899 US Hand and Machine Labor Study.

The dataset: The HML study is a remarkable dataset that has only recently been analyzed by economic historians. Commissioned by Congress and collected by the Bureau of Labor Statistics, the study collected observations on the production of 626 manufactured units (e.g. ‘men’s laced shoes’) and recorded in detail the tasks involved in their production and the relevant inputs to each task. For each unit, this data was collected for both machine production and hand production.

Key findings: The paper looks at transitions between hand- and machine- labour across tasks. It finds clear evidence for both the displacement and productivity effects of automation on labour:

  • 67% of hand tasks transitioned 1-to-1 to being performed by machines and a further 28% of hand tasks were subdivided or consolidated into machine tasks. Only 4% of hand tasks were abandoned. 
  • New tasks (not previously done by hand) represented one-third of machine tasks.
  • Machine labour reduced total production time by a factor of 7.
  • The net effect of new tasks on labour demand was positive — time taken up by new machine-tasks was 5x the time lost on abandoned hand-tasks.

Matthew’s view: The Industrial Revolution is perhaps the most transformative period in human history so far, with massive effects on labour, living standards, and other important variables. It seems likely that advances in AI could have a similarly transformative effect on society, and that we are in a position to influence this transformation and ensure that it goes well. This makes understanding past transitions particularly important. Aside from the paper’s object-level conclusions, I’m struck by how valuable this diligent empirical work from the 1890s is, and by the foresight of the people who saw the importance of gathering high-quality data in the midst of this transition. This should serve as inspiration for those involved in efforts to track metrics of AI progress.
  Read more: “Automation” of Manufacturing in the Late Nineteenth Century (AEA Web)

###################################################

Tech Tales:

CLUB-YOU versus The Recording Studio
Or: You and Me and Everyone We Admire.

[A venture capitalist office on Sand Hill Road. California. 2022.]

The founder finished his pitch. The venture capitalist stared at him for a while, then said “okay, there’s definitely a market here, and you’ve definitely found traction. But what do we do about the elephant in the room?”
“I’m glad you asked,” he said. “We deal with the elephant by running faster than it.”
“Say more,” said the venture capitalist.
“Let me show you,” said the founder. He clicked to the next slide and a video began to play, describing the elephant – The Recording Studio.

The Recording Studio was an organization formed by America’s top movie, music, radio, and podcast organizations. Its proposition was simple: commit to recording yourself acting or singing or performing your art, and The Recording Studio would synthesize a copy of your work and resell it to other streaming and media services.
You’re a successful music artist and would like to branch out into other media, but don’t have the time. What are your options?, read one blog post promoting The Recording Studio.
“Sure, I was nervous about it at first, but after I saw the first royalty checks, I changed my mind. You can make real money here,” read a testimonial from an actor who had subsequently become popular in South East Asia after someone advertised butter by deepfaking their Recording Studio-licensed face (and voice) onto a cow.
We’re constantly finding new ways to help your art show up in the world. Click here to find out more, read some of the copy on the About page of The Recording Studio’s website.

But artists weren’t happy about the way the studio worked – it was constantly developing more and more powerful systems that meant it needed less and less of an individual artist’s time and content to create a synthetic version of themselves and their outputs. And that meant The Recording Studio was paying lower rates to all but the superstars on its platform. The pattern was a familiar one, having first been proved out by the earlier success (and ensuing artistic issues) of Spotify and YouTube.

Now the video changed, switching to show a person walking in a field, wearing fashionable sunglasses with a heads-up display. The over-the-shoulder camera angle shows a cornfield at the golden hour of sunset, with the head of Joseph Gordon-Levitt floating in the heads-up display. You hear Joseph Gordon-Levitt’s voice – he’s answering questions about the cornfield, posed to him by the wearer. Next, the screen fades to black and the text ‘a more intimate way to get to know your fans’ appears, followed by the phrase CLUB-YOU. The video stops.

“CLUB-YOU is a response to The Recording Studio,” the Founder said. “We invert their model. Instead of centralizing all the artists in one place and then us figuring out how to make money off of them, we give the artists the ability to run their own ‘identity platforms’, where they can record as much or as little of themselves as they like and figure out the levels of access they want to give people. And the word ‘people’ is important – CLUB-YOU is direct-to-consumer: download the app, get access to some initial artist profiles, and use our no-code interface to customize stuff for your own needs. We don’t need to outthink The Recording Studio, we just need to outflank them, and then the people will figure out the rest – and the artists will be on our side.”

The venture capitalist leant back in his chair for a while and thought about it. Then his technical advisor drilled into some of the underlying technology – large-scale generative models, fine-tuning, certain datasets that had already been trained over to create models sufficient for single-artist customization, technologies for building custom applications in the cloud and piping them to smartphones, various encryption schemes, DMCA takedown functionality, and so on.

The VC and the founder talked a little more, and then the meeting ended and suddenly CLUB-YOU had $100 million, and it went from there.

In the ensuing years, The Recording Studio continued to grow, but CLUB-YOU went into an exponential growth pattern and, surprisingly, attained a growing cool cachet the larger it got, whereas The Recording Studio’s outputs started to seem too stilted and typical to draw the attention of younger people. The world began to flicker with people orbited by tens to hundreds of CLUB-YOU ghosts. Some artists switched entirely from acting to recording enough of their own thoughts that they could become personal mentors to their fans. Others discovered new talents by looking at what their fans did – one radio host learned to sing after seeing a viral video of a CLUB-YOU simulacrum of themselves singing with a couple of teenage girls; they got good enough at singing that it became their full career and they dropped radio – except for appearing as a guest.

The real fun began when, during CLUB-YOU’s second major funding round, the Founder pitched the idea of letting users propose business ideas to artists – users would build things, discover stuff they liked, then send a portion of the proceeds to the artist, with the artist able to set deal terms upfront. That’s when the really crazy stuff started happening.

Things that inspired this story: Deepfakes; the next few years of generative model development; ideas about how market-based incentives interact with AI.