
[–]gwern[S] 11 points12 points  (10 children)

[–]gwern[S] 10 points11 points  (9 children)

Current key points:

  • November was also AlphaStar
  • the stream is showing 2 of the 5 AS vs human replays (TLO & Mana), selected for interest; the rest will be available for download. There will be 1 live match against Mana with the newest AS.

    • Catalyst map, Protoss vs Protoss (Is it not trained on any other maps or match-ups? EDIT: apparently not, specialized individual NNs)
  • the discussion of map visibility/attention is confusing. What exactly does AS see and have access to?

  • 'relentless' is a word used very often of AG, OA5, and AS, I've noticed

  • architecture: 'AlphaStar League' sounds like PBT? and then DM's 'Nash' stuff is used to select a subset of the best least-exploitable agents

    Compute: 3 wall-clock days for imitation learning (very roughly human-level results, Vinyals says?); 7 days for the 'AlphaStar League'. Agents get ~200 years of SC2 samples to finetune in the league, so perhaps one can roughly estimate total compute from how many individuals you need in PBT... EDIT: Silver says 16 TPUs roughly equivalent to 60 GPUs for training (the NN itself presumably, with a lot more CPU cores for the SC2 environment workers)

    As expected, a relatively small NN - ~50ms forward pass on a GPU, runnable on a desktop in realtime.

  • A short history of the AS development with the matches with TLO & Mana: https://youtu.be/UuhECwm31dM

  • There is an ongoing AmA with Vinyals et al; they will answer questions tomorrow, so think'em up and type'em in.

  • Interesting to compare reactions to the two sets of games. Most watchers were not that impressed by the TLO shutout, pointing out he was barely in the top 100 SC2 players and playing the wrong race anyway, but seemed to be much more impressed by beating Mana; on the other hand, I was very impressed by beating TLO (because it meant the approach worked) and was unimpressed by beating Mana because all that really meant was dumping some more compute into training & maybe tweaking it some.

    From my point of view, people are vastly overrating the small absolute differences between individual human players and underrating the immense amount of work which goes into providing an approach which works at all to reach human level, after which it takes a lot less work to surpass human level... I suppose this is an example of "the narcissism of small differences" - to a sheep, other sheep look very distinct.
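As a rough illustration of the compute bullet above, here is a back-of-the-envelope sketch of what "~200 years of SC2 samples in 7 wall-clock days" implies per agent. Only the 200-year and 7-day figures come from the stream; the league size (`n_agents`) is a hypothetical parameter, since no population count was given.

```python
# Back-of-the-envelope only. The two inputs are the figures quoted in the
# stream; everything derived from them is arithmetic, not a DeepMind number.
DAYS_PER_YEAR = 365.25


def realtime_speedup(experience_years: float, wall_clock_days: float) -> float:
    """Game time elapsed per unit of wall-clock time for one agent."""
    return experience_years * DAYS_PER_YEAR / wall_clock_days


def league_experience_years(n_agents: int, years_per_agent: float = 200.0) -> float:
    """Total game time if every agent gets the same budget.

    n_agents is hypothetical: the league size was not stated.
    """
    return n_agents * years_per_agent


speedup = realtime_speedup(experience_years=200, wall_clock_days=7)
print(f"each agent plays at ~{speedup:,.0f}x real time")
```

At real-time game speed that would mean on the order of ten thousand concurrent games per agent; running the game faster than real time cuts that count proportionally.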

EDIT: current DM writeup: https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/


Matches

September 2018, TLO results: 5-0 AS (note: TLO is not a Protoss specialist; on the other hand, that was an earlier AS playing)

  1. swift AS victory over TLO, 1-base push after enduring TLO attacks on workers
  2. AS victory, carriers
  3. AS victory, massively heavy on disruptor units
  4. AS victory
  5. AS victory

December 2018, Mana results: (Mana is a Protoss specialist, playing against a further improved AS) 5-0

  1. AS victory
  2. AS victory
  3. AS victory
  4. AS victory; epic stalkers vs immortals battle at the end, but didn't work out for Mana...
  5. AS victory; described as especially bizarre

January 2019, Mana exhibition match: 0/1, AS lost.

  1. The one loss is interesting. What went wrong there? Am I imagining it, or are these the same issues as OA5's and AG's delusions?

Total: 10/11. Wow.

[–][deleted]  (5 children)

[deleted]

    [–]gwern[S] 6 points7 points  (0 children)

    At least one reason to not stream them all live is becoming apparent - it'd be way longer to sit through 10 full games, assuming the players could even do them back to back, and there would be less opportunity for commentary.

    [–]aquamarlin391 2 points3 points  (3 children)

    The rate may be the same, but those Mana PoV clips show that AlphaStar does not use the camera like a human: no scrolling (which is inefficient), and the ability to constantly swap between multiple locations even in the heat of battle.

    [–][deleted]  (2 children)

    [deleted]

      [–]gwern[S] 3 points4 points  (0 children)

      It's confusing because apparently the camera setup changed between versions and it's unclear exactly how much it had to learn for each one. Hopefully the paper will clear things up. Still, compared to OA5 getting the whole raw visible map encoded for it, I think we can agree that it makes the victories all the more impressive.

      [–]aquamarlin391 1 point2 points  (0 children)

      Very interested in seeing how the model performs with 1st PoV camera.

      EDIT: Mana 1-0

      AlphaStar cannot deal with warp prism drop harass.

      [–][deleted]  (1 child)

      [deleted]

        [–]gwern[S] 0 points1 point  (0 children)

        After that, Mana attacked AlphaStar's base with its entire army, and the commentators said something like: "Where are AlphaStar's units?" I dunno what it did there with its army.

        Yeah, I noticed that, and then I think Mana saw a whole set of AS units just go by not doing anything, and that was weird.

        [–]aquamarlin391 8 points9 points  (2 children)

        STALKERS ARE ALL YOU NEED

        [–]aquamarlin391 2 points3 points  (1 child)

        Curious how unit selection is done. Insane stalker micro.

        [–]hyperforce 6 points7 points  (0 children)

        With superhuman micro, ranged units are probably overfit for their mobility and opportunity to attack (kiting). This feels similar to OpenAI favoring ranged nuke champs over melee ones.

        [–]djangoblaster2 1 point2 points  (2 children)

        At one point he said 50ms response time. But earlier in the same livestream David Silver said 350ms response time.

        [–]tihokan 4 points5 points  (0 children)

        Yeah, that could have sparked some confusion. My understanding is that the feedforward pass through the network is 50ms, but they add some extra delay to ensure it doesn't have completely super-human reactions, resulting in a total response time of 350ms.

        [–]Roboserg 0 points1 point  (0 children)

        350 ms on average, 50 ms inference time; read the DeepMind blog
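To make the two numbers in this sub-thread concrete: a 50 ms forward pass and a ~350 ms overall response imply roughly 300 ms of added delay. The sketch below is one way such a hold could be modelled. It assumes a fixed per-action delay, which nothing in this thread confirms, and `DelayedActions` is a made-up name for illustration, not anything from DeepMind.

```python
from collections import deque


class DelayedActions:
    """Hold each action until a target reaction time has elapsed.

    Assumption: a fixed hold per action. The thread only gives the two
    figures (50 ms inference, ~350 ms response), not the mechanism.
    """

    def __init__(self, inference_ms: int = 50, target_ms: int = 350):
        if target_ms < inference_ms:
            raise ValueError("target latency cannot be below inference time")
        self.inference_ms = inference_ms
        self.hold_ms = target_ms - inference_ms  # extra delay added on top
        # (release_time_ms, action); assumes submissions arrive in time order
        self._pending = deque()

    def submit(self, observed_at_ms: int, action: str) -> None:
        release = observed_at_ms + self.inference_ms + self.hold_ms
        self._pending.append((release, action))

    def pop_ready(self, now_ms: int) -> list:
        """Return every action whose release time has passed."""
        ready = []
        while self._pending and self._pending[0][0] <= now_ms:
            ready.append(self._pending.popleft()[1])
        return ready


queue = DelayedActions()
queue.submit(observed_at_ms=0, action="blink stalker")
print(queue.hold_ms)         # 300
print(queue.pop_ready(349))  # []
print(queue.pop_ready(350))  # ['blink stalker']
```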

        [–][deleted] 1 point2 points  (0 children)

        Jan Leike had actually brought it up about a year ago during an interview...

        https://www.reddit.com/r/reinforcementlearning/comments/850kgl/jan_leike_dmfhi_interview_on_ai_safety_research/dvtt7e0

        [–]aquamarlin391 0 points1 point  (2 children)

        lol they will only show replays?

        big disappointment

        [–]gwern[S] 6 points7 points  (0 children)

        Nope, they're doing one live match with Mana against the latest AS, they just said.

        [–]aquamarlin391 1 point2 points  (0 children)

        Attention is applied on the whole map. Insane camera control.

        [–]physixer 0 points1 point  (0 children)

        Could someone update on the DeepMind Starcraft II tech timeline?

        I know they had some success last year, but there was some qualification (like the AI did well on PvP but not teams or something).