We are Oriol Vinyals and David Silver from DeepMind’s AlphaStar team, joined by StarCraft II pro players TLO and MaNa! Ask us anything

NikEy · 2019-01-25T16:17:13+00:00

Hi guys, really fantastic work, extremely impressive!

I'm an admin at the SC2 AI discord and we had a few questions in our #research channel that you may hopefully be able to shed light on:

From the earlier versions (and in fact, the current master version) of pysc2 it appeared that the DM development approach was based on mimicking human gameplay to the fullest extent, e.g. the bot was not even able to get info on anything outside of the screen-view. With this version you seemed to have relaxed these constraints, since feature layers are now "full map size" and new features have been added. Is that correct? If so, then how does this really differ from taking the raw data from the API and simply abstracting them into structured data as inputs for the NNs? The blog even suggests that you take raw data and properties directly as data in list form and feed it into the NNs - which seems to suggest that you're not really using feature layers anymore at all?
When I was working with pysc2 it turned out to be an incredibly difficult problem to maintain knowledge of what has been built, is in-progress, has completed, and so on, since I had to pan the camera view all the time to get that information. How is that info kept within the camera_interface approach? Presumably a lot of data must still be available in full via raw data access (e.g. counts of unitTypeID, buildings, etc) even in camera_interface mode?
How many games needed to be played out in order to get to the current level? Or in other words: how many games is 200 years of learning in your case?
How well does the learned knowledge transfer to other maps? Oriol mentioned on discord that it "worked" on other maps, and that we should guess which one it worked best on, so I guess it's a good time for the reveal ;) In my personal observations AlphaStar did seem to rely quite a bit on memorized map knowledge. Is it likely that it could execute good wall-offs or proxy cheeses on maps that it has never seen before? What would be the estimated difference in MMR when playing on a completely new map?
How well does it learn the concept of "save money for X", e.g. Nexus first. It is not a trivial problem, since if you learn from replays and take the non-actions (NOOPs) from the players into account, the RL algo will more often than not think that NOOP is the best decision at non-ideal points in the game. So how do you handle "save money for X" and do you exclude NOOPs in the learning stage?
What step size did you end up using? In the blog you write that each frame of StarCraft is used as one step of input. However, you also mention an average processing time of 50ms, which would exceed real time (which requires < 46ms given 22.4fps). So do you request every step, or every 2nd, 3rd, maybe dynamic?

I have lots more questions, but I guess I'll better ask these in person the next time ;)

Thanks!

gwern · 2019-01-24T21:42:32+00:00

what was going on with APM? I was under the impression it was hard-limited to 180 WPM by the SC2 LE, but watching, the average APM for AS seemed to go far above that for long periods of time, and the DM blog post reproduces the graphs & numbers mentioned without explaining why the APMs were so high.
how many distinct agents does it take in the PBT to maintain adequate diversity to prevent catastrophic forgetting? How does this scale with agent count, or does it only take a few to keep the agents robust? Is there any comparison with the efficiency of the usual strategy of historical checkpoints in?
what does total compute-time in terms of TPU & CPU look like?
the stream was inconsistent. Does the NN run in 50ms or 350ms on a GPU, or were those referring to different things (forward pass vs action restrictions)?
have any tests of generalizations been done? Presumably none of the agents can play different races (as the available units/actions are totally different & don't work even architecture-wise), but there should be at least some generalization to other maps, right?
what other approaches were tried? I know people were quite curious about whether any tree searches, deep environment models, or hierarchical RL techniques would be involved, and it appears none of them were; did any of them make respectable progress if tried?

Sub-question: do you have any thoughts about pure self-play ever being possible for SC2 given its extreme sparsity? OA5 did manage to get off the ground for DoTA2 without any imitation learning or much domain knowledge, so just being long games with enormous action-spaces doesn't guarantee self-play can't work...
speaking of OA5, given the way it seemed to fall apart in slow turtling DoTA2 games or whenever it fell behind, were any checks done to see if the SA self-play lead to similar problems, given the fairly similar overall tendencies of applying constant pressure early on and gradually picking up advantages?
At the November Blizzcon talk, IIRC Vinyals said he'd love to open up their SC2 bot to general play. Any plans for that?
First you do Go dirty, now you do Starcraft. Question: what do you guys have against South Korea?

dojoteef · 2019-01-24T22:53:49+00:00

1) Are there any plans to train an agent using only pixel inputs and physical mouse/keyboard actions now that you have demonstrated AlphaStar's current ability against professional players? 2) Have you gained any insights from this experience that you believe translates to other reinforcement learning problems where humans interact with AI-controlled agents?

4567890 · 2019-01-24T23:15:07+00:00

For the pro players, say you are coaching AlphaStar. What would you say are the best and worst aspects of its game? Do you think its victories were more from decision making or mechanics?

celeritasCelery · 2019-01-24T22:55:38+00:00

How does it handle invisible units? Human players can see the shimmer if they are looking really close. But if AI could see that, invisibility would be almost useless. However if it can't see them at all, it seems it would give a big advantage to mass cloaked unit strategies, since an observer would have to present to notice anything.

DreamhackSucks123 · 2019-01-24T23:57:55+00:00

Many people are attributing AlphaStar's single loss to the fact that the algorithm had restricted vision in the final match. I personally dont find this to be a convincing explanation because the warp prism was moving in and out of the fog of war, and the AI was moving its entire army back and forth in response. This definitely seemed like a gap in understanding rather than a mechanical limitation. What are your opinions about the reason why AlphaStar lost in this way?

Inori · 2019-01-25T00:39:29+00:00

Your agent contains quite a number of advanced approaches, including some very unconventional such as the transformer body. What was the process of building it like, e.g. was every part of the agent added incrementally, improving the overall performance at each step? Were there parts that initially degraded the performance, and if yes then how were you able to convince others (yourself?) to stick with it?
Speaking of the transformer body, I'm really surprised that essentially throwing away the full spatial information worked so well. Have you given any thought as to why it worked so well, relative to something like the promising DRC / Conv LSTM?
What is the reward function like? Specifically, I'm assuming it would be impossible to train with pure win/loss, but have you applied any special reward shaping?

Very impressive work either way! GGWP!

Bleyddyn · 2019-01-24T22:52:30+00:00

Are there any areas you would recommend for ML/RL hobbyists to focus on? Areas where it might be possible to make useful contributions without having more than desktop level compute resources?

TovarishGaming · 2019-01-24T23:17:07+00:00

First of all, thank you for your hard work and for being a part of today's awesome event!

@Deepmind team: We saw AlphaStar do some Blink Stalker micro today that everyone seemed to agree was simply above-human possibility. Do you expect to see this with other races? I imagine Zerg spreading creep tumors exactly every 4 seconds will lead to insane creep spread or things like that. What are you most excited to see?

@TLO: you said initially that you were still confident you would win while playing as Zerg. After seeing Mana's match today, and knowing that Deepmind will continue to learn exponentially, do you still feel confident in your rematch?

EnderSword · 2019-01-24T23:27:13+00:00

I was wondering if you've seen the AI Agents show any signs of 'deceiving' its opponent, through hiding buildings, cancelling something after it was scouted, giving a false impression of an attack in one area or mimicking a build order only to change it etc...?

kroken81 · 2019-01-24T22:52:16+00:00

How large is the "memory" of alphastar, how much data does it have to draw from while playing?

Hinrek · 2019-01-24T23:42:35+00:00

Hey guys! I was in awe while watching the stream, amazing stuff! Are you considering to release any AlphaStar vs. AlphaStar games?

denestra · 2019-01-24T22:49:06+00:00

@Deepmind team: Will we be able to play against AlphaStar at some point in the future?

harmonic- · 2019-01-24T23:11:35+00:00

Agents like AlphaGo and AlphaZero were trained on games with perfect information. How does a game of imperfect information like Starcraft affect the design of the agent? Does AlphaStar have a "memory" of its prior observations similar to humans?

p.s. Huge fan of DeepMind! thanks for doing this.

mlearner13 · 2019-01-24T22:44:34+00:00

Will you cap the next iterations to more human like capabilities?

4567890 · 2019-01-24T23:07:47+00:00

Several times you equate human APM with AlphaStar's APM. Are you sure this is fair? Isn't human APM inflated with warm-up click rates, double-entering commands, imperfect clicks, and other meaningless inputs? Meanwhile aren't all of AlphaStar's inputs meaningful and super accurate? Are the two really comparable?

The presentation and blog post references "average APM," but isn't burst APM something worth containing too? I would argue Human burst APM is from meaningless input, while I suspect AlphaStar's burst APM is from micro during the heavy battle periods. You want a level playing field and a focus on decision making, but are you sure AlphaStar wasn't using its burst APM and full map access to reach superhuman levels of unit control for short periods when it mattered most?

SwordShieldMouse · 2019-01-24T21:34:18+00:00

What is the next milestone after Starcraft II?

p2deeee · 2019-01-25T01:38:40+00:00

Could we see the visualization for the game Mana won? Would be interesting to see its win probability evaluation

did you ever consider a "gg" functionality when win probability <1%?

weiqiplayer · 2019-01-24T22:42:47+00:00

How long until AlphaStarZero (training from scratch without imitation learning) comes out?

Firefritos · 2019-01-24T22:50:32+00:00

Was there any particular reason Protoss was chosen to be the race for the AI to learn?

SeriousGains · 2019-01-24T23:38:13+00:00

@Deepmind team: We didn't see much of Alphastar's ability to use AoE caster spells like the oracle's statis ward, high templar's storm or sentry's force field. Is this a greater challenge to teach the AI or have these strategies been deemed inefficient through Alpha league play tests? Also, will we ever get to see an unlimited APM version of Alphastar?

@TLO: Do you think caster units have the potential to give AI's a greater advantage over human players? For example would Alphastar playing as zerg be even more difficult to beat with ravagers, infestors and vipers?

SoulDrivenOlives · 2019-01-24T23:26:29+00:00

I got the impression that different agents trained in the Alphastar league develop strategies that are quite inflexible. It seemed that the agents were unable to switch to a different composition in the middle of a game. I also noticed that the agents did not play in a reactionary style but rather wanted to be the aggressor and dictate the pace of the game. The progress of Deepmind's Starcraft team left me quite speechless but also somewhat disappointed in the seeming brittleness and hard-codedness of the agents.

To Orio Vinyals and David Silver: Are my impressions wrong? Are the alphastar agents able to change unit compositions mid-game as a reaction to new information they scout during the game? If not, what are your plans to allow that kind of long term strategical thinking to develop? You mentioned during the stream that the different agents learn independent of another and therefore don't share knowledge. This seems quite inelegant. How satisfied are you with the current training methods of Alphastar? I can't quite put my finger on it but something about the league seems clumsy.

To Mana and TLO: Do you think that Alphastar's unwillingness to switch compositions was sub-optimal play? How do you think the agent would deal with a late game zerg tech switch after a big fight?

Bonus question to Orio Vinyals and David Silver: Any plans on implementing a handicap on Alphastar's mouse accuracy? A perfect army control both in intent and execution will win most games even if the agent's play is not smart. Perhaps Deepmind should differentiate between Alphastar's intent and actual execution in some way? Maybe it could try to do the correct thing but be unable to execute with a 100% precision? It would act much more human that way.

ogs_kfp_t · 2019-01-24T23:00:07+00:00

During the Go match- commentators said: “it’s pretty close” while AlphaGo thought 70% win.

I had a deja vu.

Was that indeed repeated today?

Mogarfnsmash · 2019-01-24T23:53:12+00:00

For Blizzcon 2019, is it possible that we will have Pro players vs Alphastar show matches on the main stage? Or will the technology not be there yet?

sluggathugga · 2019-01-25T16:38:04+00:00

For MaNa,

You adopted the mass probe strategy for the final game. Did you practice it against humans first, or was it a decision just for this game? Do you think it worked in your favour (I can't help but imagine yes, given the level of Oracle harass)? And will you be using it in PvP?

LolThisGuyAgain · 2019-01-24T22:44:04+00:00

Loved the games tonight!

Was just wondering how the AI uses vision. For example, if it sees a glimmer moving across, is it able to recognise that it is a cloaked unit, or if it sees a widow mine pot hole, can it recognise that there's a widow mine there without detection (given enough training)? I guess i wanna know what input it receives

JesusK · 2019-01-25T02:07:28+00:00

AlphaStar seemed to end up going for the same group of units that it could abuse with perfect Micro, while the average was in the 300-400, during some micro intensive moments it would spike heavily and control in inhuman ways. Also when talking about APM you have to remember most APMs a player does are spam to check for information rather than micro.

While it was still doing decisions on how to proceed based on partial information, it was clear that it relied heavily on this micro units and this seemed to be the norm, so it was a lot less of adapting to what the opponent was doing, or countering unit compositions and more of checking if they could win with the stalker army they had.

Thus, I was wondering if you considered heavily limiting the APM, in an attempt promote the AI into going for more tactical maneuvers and builds instead.

Even more, if you could train an AI to play at lets say only 100 APM, and then drop it in the league with the the AlphaStar we saw, it would need to come up with different approaches to win games given that it cannot just compete in a stalker vs stalker game thus promoting more tactical adaptation from AlphaStar.

Is this even possible or in consideration trying to push the AI into paths that do not allow it to abuse micro?

asgardian28 · 2019-01-24T21:35:29+00:00

Thanks for combining my 2 favorite hobbies Sc2 and machine learning, awesome and congrats!

With AlphaZero you demonstrated on chess and go that the computer playing itself only, without being biased by human games, was yielding a superior agent.

Still wilt AlphaStar, you started with imitation learning (presumably to get some baseline agent that doesnt pulls all its workers from mining for instance)

Do you intend to develop an AlphaStarZero, removing the imitation learning phase, or is it a necessary phase in learning Sc2?

SuccessfulPackage · 2019-01-24T22:56:38+00:00

What would be your estimate of the time for the AI to learn the other match ups and other map ?
Would it be interesting to put the AI in a situation she never faced before (new map with weird layout, etc) to see if the previous experiences would be of any use to her ?
Do you think one day we will be able to create a model that enable AI to conceptualize the game based on his experience? (would be cool to see if AI develop bias too)
If we were to flood AI with stupid strategy (drone rushing all the time) would it mess up with weights and make the AI dumber?

Good luck for the rest ! May the peak of humanity evolution Serral guard our race !

TY for your work :3

iMPoopi · 2019-01-24T23:01:04+00:00

Hi, congrats for the amazing work so far!

I have a few questions regarding the latest game (vs MaNa, exhibition game on the camera interface):

It seems that the AI was ahead after the initial harass, and after dealing with MaNa two zealots counter harass (a lot more income, superior army value albeit maybe not as strong composition).

Do you think that if you were to replay the game again and again from that point in time (in human time since MaNa would have to play again and again as well), the agent in its current iteration would be able to win at least one game? The majority of the games? Or MaNa could find "holes" at least one time, or even again and again?

Do you think your current approach would be able to decide that making a phoenix to defend against the warp prism harass would be better than keep making oracles?

Does the agent only try to maximize the global probability of winning, and only makes decision based on that, or can he isolate certain situations (for example the warp prism harass) as critical situations that needs to be sorted in an entirely different way than the game overall.

For example are you able to know if the AI kept queuing oracles because overall it'll still help win the game, or if it made oracle because it was the "plan" for this game?

How high level can the explainability of your agent go?

Thank you for your time, and congratulations again for the amazing work in our beloved game and in machine learning.

QuestionMonkey · 2019-01-24T23:19:07+00:00

Would you ever consider open-sourcing your code?
In the visualization you showed, the times when the white arrow appeared (and the agent was "looking at" the game) didn't follow a consistent rhythm. As I understand LSTM networks, each set of layers in the network corresponds to a given "time step". What defines these time steps in AlphaStar? Are there certain things that define a new time step?

emanuelpalomar · 2019-01-24T23:34:39+00:00

What happened in the live game? When MaNa harassed AlphaStar's base, it just walked back and forth aimlessly with it's Stalkers, it looks like it's very brittle when pushed outside its training distribution even a little bit and doesn't really understand the game?

Since adverserial attacks (pun intended) like this are a common problem with neural networks, how do you intend to address this weakness?

koglerjs · 2019-01-25T05:55:48+00:00

Sorry for the wall of text. It starts with a sincere, very simple question:

Why didn't you cast game 5 vs MaNa?

It was an extraordinary demonstration of AlphaStar's capabilities. It:

tried to gas steal
multiple times!
essentially ran a distraction, successfully, with the single pylon
feigned a two-base build back home (or just looked like he wanted to get rushed)
all while displaying a very weird very aggressive totally new proxy build, because I defy you to find a pro game where someone built such an aggressive stargate and it is arguably the perfect place to put the stargate if you think it through long enough.

You can't ask AlphaStar if that's what it intended to do, and it could just be random noise, but every other game AlphaStar built its base high ground. Low ground cyber core and gateway says second base or "rush me" but it also helps the proxy because you're skipping warpgates because you reeeaally want that robo up ASAP and then you need gas for immos, so every bit of walking you can save is a bonus. MaNa doesn't know that this build does not allow him the time for those low ground buildings to be rushed, so he just sees something weird, twice. Pylon, low ground. What could it be? He gets a full scout and should suspect a proxy, but you have to kill the pylon. That's just a standard response. I am nobody and know nothing about 1s but I think that 99% of pros would have killed the pylon with both stalkers, and the bar for tricks is: do they work once?

I think Tasteless and Rotterdam would have flipped their shit if they'd seen this game on air. The stargate??? AlphaStar put down a stargate?? There? If they haven't seen it yet, tell them I want them to cast their first time watching it. Please. This stargate... it's creative. A person who executed this build in a high-profile match would be rightly termed imaginative.

Try not to be sad, but the game is going to change. Computer provided builds for human players will change this game. I thought I was prepared for DeepMind to be good at StarCraft, but I didn't realize what it would mean. I don't think AI is going to take over the world exactly, but it might be a bit like a cheat code for knowing things when we no longer discover things ourselves.

Heck, if you're any sort of streamer, go watch it and stream it now, and don't read anymore.

Today was historic for many reasons, but I wouldn't be surprised if Game 5 got overlooked at first. "Micro was like we expected it to go! Build orders are working! Wait you didn't restrict the map access?" Really cool stuff, but Game 5 is a first.

Watch that game and tell me if I'm wrong, I'm an idiot barely plat because I need something in my life I'm actively against being a tryhard at. Because the Robo Stargate 1-2 Punch (AlphaStar can't name it, so I can try) has some amount of nuance to it. The thing about these kinds of builds is they only work reliably once. They're like 0-day exploits, sorta. They get known in competitive play and people learn how to play against it.

Your machine came up with a new build. And it surprised MaNa with it... and it tricked him. It just did what was optimal: in games where it built that pylon, other AIs fought that pylon first, and won the time for the pylon to be up and the zealot and stalker to arrive. It doesn't know that AIs are known for doing dumb things like building pylons where they shouldn't, it couldn't possibly try to take advantage of that fact.

A possible counter to this build is to never use two units to kill a pylon built right in your face, otherwise the 3rd proxy pylon gets up in your natural and shield batteries are keeping a small handful of units alive while they're fighting you on your doorstep. MaNa even forces a cancel on some batteries in his natural. But if you scout it sooner, you're still going to have to contend with a zealot and a stalker covered by batteries, they'll just be further away. Those batteries don't get much use in Game 5, which means they're arguably waste, which means AlphaStar was prepared to fight there. This build as countered by MaNa has unused cushion.

I don't know if I have a question for you, but I have a question for MaNa: What did you think when that phoenix came out on top of your prism?

To do a Tastosis thing, "cover the name up and tell me who's playing," and I would have said "Perhaps not even sOs is bold enough to build a stargate in the face of his opponent like that." The only play I've seen with this degree of killer instinct was when Maru decided you were dead, as he would if you got really lucky and beat him in the first match of a best of 3, and then you faced his proxy twice, and the best of 3 was over. This is like an sOs build that Maru decided to use. It's too weird for sOs--too weird for sOs!--and so he'd say "Maru, I can't use this, people would laugh at me" and Maru would play one game as Protoss.

I've rewatched the replay a number of times. Everyone go watch it. Tell me I'm wrong. I don't know that much, I like 4s and I like area effect damage and getting good at 1s would just take too long. The proxy could have been scouted. Trick builds don't work if they're scouted, that's why they're tricks, but we haven't seen this build tested yet.

But I tell you that stargate was as deliberate as AlphaStar can plan to be. The phoenix prevented warp prism drop play from keeping the assault from ending. Void ray is a nice touch for any stalkers which come out.

People save builds like this to pull out in the high pride tournaments where more than just money is on the line. GSL quals are supposedly secret to keep a lid on builds like these. They're like 0-day exploits. Maybe this isn't as astoundingly good a build as I think it is, maybe now that it's an option, people can handle it, because that's the thing about these builds.

Once you see the trick, it doesn't work anymore. But now the work of figuring out how to beat this build begins. For at least a few days, the Masters league should be a bloodbath of people attempting the trick. "Why are you building a pylon in my face?" It's almost rude. But it has a purpose, and that purpose is literally to occupy your units while the build encroaches closer to your throat.

It might not adjust the meta. Probably won't. Meta's still settling a bit anyway from the patch.

But here's my point, and here's why I wrote all of this.

AlphaStar can write and demonstrate a fantastically complicated trick build. This build might not enter into commonly faced threats, but some build from one of your machines will soon do so, and it will adjust the meta.

And after that DeepMind will simply define the meta.

And it's just not going to be the same.

P.S. It's more fitting to call it the AlphaStargate. Congratulations on your success. This was cool to see.

Cock-tail · 2019-01-24T21:16:27+00:00

So there was an obvious difference between the live version of AlphaStar and the recordings. The new version didn't seem to care when its base was being attacked. How did the limited vision influence that?
The APM of AlphaStar seems to go as high as 1500. Do you think that is fair, considering that those actions are very precise when compared to those performed by a human player?
How well would AlphaStar perform if you changed the map?
An idea: what if you increase the average APM but hard cap the maximum achievable APM at, say, 600?
How come AlphaStar requires less compute power than AlphaZero at runtime?

Felewin · 2019-01-24T22:45:44+00:00

Is it possible that, by learning via mimicry of human replays, the AI is gimped – biased by learning bad habits? Or will it ultimately overcome them, given enough experience, meaning that the human replays are more useful than starting from scratch as a way of jumpstarting the learning process?

Dinok410 · 2019-01-25T00:37:18+00:00

While most of the focus seems to be on the visual aspects of the game, I'm curious if Alpha uses any form of sound recognition to help assess the situation. At least for me as a player I've encountered several situations in which sounds actually let you react faster than sight, be it the sound of units firing or dying, a medivac dropping units on the corner of your screen, or even the basic announcer stuff. So, does it utilise sound in any way or, if not, are there plans to implement it in the future?

TheSkunk_2 · 2019-01-25T02:51:39+00:00

Incredibly impressive and entertaining showcase. Congratulations!

Question 1: Distinct Agents

Is there a plan to move away from distinct, separate agents that the team curates (or randomly choses) to play in a specific order for a series? In my opinion it detracts from the accomplishment. I think the final live match against Mana is a good example of a flaw of this approach: other agents frequently used Phoenix, but because this particular agent is separate and distinct, it only built Stalkers and Oracles and never built a Phoenix to handle the Warp Prism.

Part of being a professional SC2 pro-gamer is mind-gaming your opponent and deciding which builds to play in which maps in a series. Some of the most ballsy SC2 pros have had to make the incredibly difficult decision do to an incredibly risky cheese in the deciding match of a series. The AI consciously deciding what builds to use on what maps of the series would be truly impressive.

Similarly, it seems to have less ability to switch up builds this way. In a real match, a player might initially have one plan, but decide to be cheesy if they scout their opponent being greedy, or decide to stop cheesing if it's scouted. However, by hand-picking Agents that were incentivized to develop specific builds, developers are essentially hand-picking the AI play-style before the match. The ramifications of this can be seen in many of the show-matches where the AI stuck with it's play-style even when it was not a good idea to do so.

Question 2: Short-Term Memory

In one of the matches, the AI can be seen using Phoenix to lift up a Stalker from Mana's advancing army, and then dropping it when it realized the rest of the army is coming. It did this repeatedly, wasting tons of valuable phoenix energy. This made me wonder if the AI has any kind of short-term memory -- does it literally forget the army exists as soon is it goes out of vision? Do you have any other comment on this particular mishap?

Question 3: Future Improvement

What are the DeepMind team's goals for fixing the AI's current weaknesses? You could simply use the current model, but train it longer - months instead of weeks - and beat stronger players, but this (to me) would be a disappointing approach. Are future goals to learn new maps, new races, new match-ups, and simply brute-force train the Agent(s) to until they can beat the raining world champion, or are there plans to to adjust the agent on a systematic level to shore up it's weaknesses in scouting, map vision, adaptability, and make it less reliant on individual separate agents and superhuman micro/multitasking? Is a version that starts from ground zero instead of using imitation learning planned?

Question 4: I Want to Play It

You'll here this a lot -- the SC2 community wants to play DeepMind themselves. I believe you expressed a desire to make this happen, so I wanted to frame this more specifically: what hurdles do you see preventing this from happening? For us laymen, what technical challenges are involved in say, publishing a separate ladder or game client where players can play against a changing rotation of DeepMind agents? If the main obstacle is the added development costs and time needed to make this happen, has DeepMind and Blizzard considered something like a WarChest to fund the inclusion of DeepMind client/ladder?

Question 5: Mana and TLO's Thoughts

Congratulations on the show! I would love for either of you to write a more detailed blog about your experience and post it to teamliquid.

My question is what you think of AlphaStars play in retrospect. Do you see abusable facets of it's play now (in hindsight) that you didn't originally see during the match? Do you agree with some fans that it's micro/multitasking (particularly the 1,500 APM 3-pronged Stalker surround micro) is unfair and needs to be limited more? How much of the AI's success would you attribute to the sheer unexpectedness of it's play-style and the general unfamiliarity of the play environment (TLO not being aware he was facing distinct agents each match, the fact that you both had to play on an old patch) and how much of it is the inherent strategy/play-style of the agent?

legoboomette · 2019-01-24T22:51:28+00:00

Why don't you continue training the agents after these exhibitions? Like with AlphaZero as well, it would be interesting to know just how good they would get after more than just a few hours or a week of training.

Imnimo · 2019-01-24T23:44:20+00:00

Many of DeepMind's recent high-profile successes have demonstrated the power of self-play to drive continuous improvement in agent strength. In competitive games, the intuitive value of self-play is clear - it provides an opponent of an appropriate difficulty which never gets too far ahead or falls too far behind. I'm curious about your thoughts on applying the self-play dynamic to cooperative games such as communication learning and mutli-agent coordination tasks. In these settings, is there an additional risk of self-play leading to convergence to trivial or mediocre strategies, due to the lack of a drive to exploit an opponent and avoid being exploitable? Or could a self-play system like AlphaZero be slotted into a cooperative setting pretty much as-is?

LetoAtreides82 · 2019-01-24T23:46:19+00:00

Thank you for demonstrating your project to us, I very much enjoyed it. I was a former StarCraft 2 player and often I had dreamed of what it would be like to have an AI master the game. I have a few questions:

Can we expect another demonstration soon, perhaps another 5 games against Mana with a much improved version of the camera interface agent?
Are there plans to challenge one of the current top three players in the world in the near future?
Would it be possible to make a version of AlphaStar that can play on the StarCraft 2 ladder against actual players? It'd be interesting to see how high of an MMR it can achieve on the ladder.

DeadWombats · 2019-01-25T00:54:21+00:00

One moment that particularly impressed me was when AlphaStar decided it was losing a fight and recalled its army to prevent more losses.

How does AlphaStar gauge confidence during a battle? What circumstances does it consider before deciding to pursue or retreat?

OldConfusedMan · 2019-01-24T23:16:28+00:00

Fantastic accomplishment today; have to admit I was cheering for AlphaStar! I work in machine translation; can you explain a bit about the natural language techniques you borrowed for your AlphaStar, and what innovations you believe could flow back to machine translation

fhuszar · 2019-01-25T09:22:59+00:00

Isn't the unit-level micro-management aspect inherently unfair in favour of computers in StarCraft?

In Go, any sequence of moves AlphaGo makes, Lee Sedol can easily imitate, and vice versa. This is because there is no critical sensorimotor control element there.

In StarCraft, when you play with a mouse and keyboard, there is a motor component. Any sequence of moves that a human player makes, AlphaStar can "effortlessly" imitate, because from its perspective it's just a sequence of symbols. But a human player might struggle to imitate an action sequence of AlphaStar, because a particular sequence of symbols might require unreasonable or very difficult motor sequence.

The metaphor I have in mind here is playing a piano: keystrokes-per-minute is not the only metric that describes the difficulty of playing a particular piece. For a human, hitting the same key 1000 times is a lot easier than playing a random sequence of 1000 notes. From a computer's perspective, hitting the same key 1000 times and playing a random sequence of 1000 notes is equally difficult from an execution standpoint (whether you can learn the sequence or not is besides my point now)

Nostrademous · 2019-01-25T02:40:10+00:00

Any plans to release a full write-up of the algorithms and their inter-connections and inter-relations?
Seems 20-year research is new again all over the place (i.e., your work on Relational Deep RL for example), what do you think will be the next old-becomes-new approach?
What are you thoughts on hierarchical RL approaches?
In you writeup (https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/) you seem to indicate Dota2 as an "easier" game than SC2, is that your stance or just bad wording? and how come?

Thanks

AxeLond · 2019-01-25T17:40:33+00:00

I looked over one of the replays, game 4 vs MaNa and I could spend the whole game looking over just how AlphaStar handles it's workers. 2/3 and 16/16 workers on gas/minerals is 100% efficient and 3/3 and up to 24/16 has diminishing returns. It seems to have very hard priorities like on 2 bases it maintains exactly 48 workers with 17 on minerals in the main base and 19 in the natural. After not building probes for almost 2 minutes when the third base finishes it build 1 extra probe from 48 to 49 and doesn't build any more for the rest of the game.

If you see an agent doing something specific like this is it possible to dissect it's brain to find out if there's a specific rule it follows to decide if it should build a 49th worker or not, or is it just impossible to understand what types of rules the neural network follows to decide what it should do in specific situations?

donkeykong1774 · 2019-01-24T21:26:17+00:00

Firstly, amazing job! Congrats to everyone on the team. This is an incredible feat, and it's a joy to watch the decision making and especially the reactions of those playing and commenting.

Could you go into some more detail on the networks used (especially the LSTMs), and what the visualization with the 3 regions with the pink colormaps meant. How does the network compare to the DQN networks used for playing atari, and the MCTS network used in Alphago zero?
How did you evaluate which 5 versions of AlphaStar were the least likely to be exploited? Were they simply the 5 strongest players?
I seem to recall someone mentioned briefly that there were reaction times of 50ms from AlphaStar? That seems faster than human capabilities.
Is there a version of AlphaStar trained purely using self-play, like Alphago Zero?
What did the likelihood of winning plot look like for the last live game? Did the game realize it had lost at the same time as the commentators? How did this compare for the other games?

Felewin · 2019-01-24T22:43:33+00:00

Do you expect the behavior to eventually condense toward a particular strategy, or to further diverge into counter-strategies? What trend do you see for amount of strategic diversity as experience increases?

AspiringInsomniac · 2019-01-24T23:25:22+00:00

- How hard is it to generalize the algorithms used to be able to train an NN to perform on a generic map? ie. have an NN that can play on a map that it hasn't trained on (but otherwise all other mechanics the same)

- You mentioned that there are a variety of agents and you handpicked them to play against the human player for some interesting play. Is there any plan to create a "superagent" which would be an AI to pick strategies based on trying to win series of games? One could imagine an AlphaStar League where matches are best-of-N's.

Voltz_sc2 · 2019-01-24T23:26:28+00:00

Do you have plans to continue training AlphaStar? Because in it's current state, while it is impressive, has no chance on taking on the world champion Joona Sotala.

MachineLearning

Rules For Posts

+Research

+Discussion

+Project

+News

@slashML on Twitter

Chat with us on Slack

Beginners:

MODERATORS

Isn't the unit-level micro-management aspect inherently unfair in favour of computers in StarCraft?

MachineLearning

Rules For Posts

+Research

+Discussion

+Project

+News

@slashML on Twitter

Chat with us on Slack

Beginners:

MODERATORS

Welcome to Reddit,

Want to add to the discussion?

Isn't the unit-level micro-management aspect inherently unfair in favour of computers in StarCraft?