[MUSIC PLAYING] LILY PENG: Hi everybody. My name is Lily Peng. I’m a physician by training and
I work on the Google medical– well, Google AI
health-care team. I am a product manager. And today we’re going to talk to
you about a couple of projects that we have been
working on in our group. So first off, I think
you’ll get a lot of this, so I’m not going to
go over this too much. But because we
apply deep learning to medical information,
I kind of wanted to just define a few terms
that get used quite a bit but are somewhat poorly defined. So first off, artificial
intelligence– this is a pretty broad term and it
encompasses that grand project to build a nonhuman
intelligence. Machine learning is
a particular type of artificial
intelligence, I suppose, that teaches machines
to be smarter. And deep learning
is a particular type of machine learning which
you guys have probably heard about quite a bit and will
hear about quite a bit more. So first of all, what
is deep learning? So it’s a modern reincarnation of artificial neural networks, which were actually invented back in the 1960s. It’s a collection of simple
trainable units, organized in layers. And they work together to solve
or model complicated tasks. So in general, with smaller
data sets and limited compute, which is what we had
in the 1980s and ’90s, other approaches
generally work better. But with larger data sets
and larger model sizes and more compute power, we
find that neural networks work much better. So there’s actually
just two takeaways that I want you guys
to get from this slide. One is that deep learning
trains algorithms that are very accurate
when given enough data. And two, that deep
learning can do this without feature engineering. And that means without
explicitly writing the rules. So what do I mean by that? Well, in traditional computer vision, we spend a lot of time writing the rules that a machine should follow to perform a certain prediction task. With convolutional neural networks, we actually spend very little time on feature engineering and writing those rules. Most of the time goes into data preparation, numerical optimization, and model architecture.
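To make that concrete, here is a minimal sketch of what "no feature engineering" looks like in code: raw pixels go in one end and a prediction comes out the other, with nothing hand-written in between. This is an illustrative toy in TensorFlow/Keras, not the team’s model; the layer sizes, image size, and five-class output are assumptions.

```
import tensorflow as tf

# A toy convolutional classifier: raw pixels in, class probabilities out.
# No hand-engineered features (edge detectors, texture statistics, and so on)
# are written by the developer; the filters are learned from labeled examples.
def build_toy_cnn(image_size=(299, 299, 3), num_classes=5):
    return tf.keras.Sequential([
        tf.keras.Input(shape=image_size),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

model = build_toy_cnn()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, ...)  # the labeled data does the rest
```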
So I get this question quite a bit. And the question is, how much data is enough data for a deep neural network? Well, in general, more is better. But there are diminishing
returns beyond a certain point. And a general rule
of thumb is that we like to have about 5,000
positives per class. But the key thing is
good and relevant data– so garbage in, garbage out. The model will predict very
well what you ask it to predict. So when you think about
where machine learning, and especially deep learning,
can make the biggest impact, it’s really in
places where there’s lots of data to look through. One of our directors, Greg
Corrado, puts it best. Deep learning is really good for
tasks that you’ve done 10,000 times, and on the 10,001st time,
you’re just sick of it and you don’t want to do it anymore. So this is really great for
health care in screening applications where you
see a lot of patients that are potentially normal. It’s also great where
expertise is limited. So here on the right
you see a graph of the shortage of
radiologists kind of worldwide. And this is also true for
other medical specialties, but radiologists
are sort of here. And we basically see a worldwide
shortage of medical expertise. So one of the
screening applications that our group has worked on
is with diabetic retinopathy. We call it DR
because it’s easier to say than diabetic
retinopathy. And it’s the fastest growing
cause of preventable blindness. All 450 million people with
diabetes are at risk and need to be screened once a year. This is done by taking
a picture of the back of the eye with a special
camera, as you see here. And the picture looks
a little bit like that. And so what a doctor does when
they get an image like this is they grade it on a scale of
one to five from no disease, so healthy, to
proliferative disease, which is the end stage. And when they do grading, they look for sometimes very subtle findings, little things called microaneurysms, which are outpouchings in the blood vessels of the eye. And that indicates how badly your diabetes is affecting your vision. So unfortunately in
many parts of the world, there are just not enough
eye doctors to do this task. So with one of our
partners in India, or actually a couple of
our partners in India, there is a shortage of 127,000
eye doctors in the nation. And as a result,
about 45% of patients suffer some sort of vision loss
before the disease is detected. Now as you recall, I
said that this disease was completely preventable. So again, this is something
that should not be happening. So what we decided to
do was we partnered with a couple of
hospitals in India, as well as a screening
provider in the US. And we got about 130,000 images
for this first go around. We hired 54 ophthalmologists
and built a labeling tool. And then the 54
ophthalmologists actually graded these images
on this scale, from no DR to proliferative. The interesting thing was
that there was actually a little bit of variability in
how doctors call the images. And so we actually got about
880,000 diagnoses in all. And with this labelled data set,
we put it through a fairly well-known convolutional neural net called Inception. I think a lot of you may be familiar with it. It’s generally used to classify cats and dogs for our photo app or for some other search apps. And we just repurposed it to do fundus images.
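The repurposing she describes is standard transfer learning. A rough sketch of that pattern in TensorFlow/Keras follows; it is not the published training setup, and the input size, optimizer, and head are assumptions.

```
import tensorflow as tf

# Start from Inception-v3 pretrained on everyday photos (ImageNet), drop its
# original classification head, and attach a new head that predicts the five
# DR grades from fundus images.
base = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet", pooling="avg",
    input_shape=(299, 299, 3))

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(5, activation="softmax", name="dr_grade"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(fundus_images, dr_grades, ...)  # hypothetical labeled fundus data
```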
So the other thing that we learned while we were doing this work was that, while it was really useful to have this five-point diagnosis, it was also incredibly useful to give doctors feedback on housekeeping predictions like image quality, whether this is a left or right eye, or which part of the retina this is. So we added that to the network as well.
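One natural way to express "add that to the network as well" is a multi-output model: a single shared image encoder with several prediction heads. The sketch below shows the general pattern, not the paper’s architecture; the head names and loss weights are invented for illustration.

```
import tensorflow as tf

# One shared encoder, several heads: the DR grade plus housekeeping
# predictions such as image quality and left eye versus right eye.
inputs = tf.keras.Input(shape=(299, 299, 3))
features = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet", pooling="avg")(inputs)

dr_grade = tf.keras.layers.Dense(5, activation="softmax", name="dr_grade")(features)
gradable = tf.keras.layers.Dense(1, activation="sigmoid", name="gradable")(features)
right_eye = tf.keras.layers.Dense(1, activation="sigmoid", name="right_eye")(features)

model = tf.keras.Model(inputs, [dr_grade, gradable, right_eye])
model.compile(
    optimizer="adam",
    loss={"dr_grade": "sparse_categorical_crossentropy",
          "gradable": "binary_crossentropy",
          "right_eye": "binary_crossentropy"},
    loss_weights={"dr_grade": 1.0, "gradable": 0.5, "right_eye": 0.5})
```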
So how well does it do? This is the first
version of our model that we published in a medical
journal in 2016 I believe. And right here on
the left is a chart of the performance of
the model in aggregate over about 10,000 images. Sensitivity is on the y-axis,
and then 1 minus specificity is on the x-axis. So sensitivity is the proportion of patients who have the disease that the model correctly calls positive. And specificity is the proportion of patients who don’t have the disease that the model, or the doctor, correctly calls negative.
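For readers who prefer the definitions in code, here is a tiny self-contained example, with made-up labels, of how sensitivity, specificity, and the point they define on that ROC-style chart are computed.

```
# Toy example: 1 = disease present, 0 = disease absent.
truth      = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
prediction = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

tp = sum(1 for t, p in zip(truth, prediction) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(truth, prediction) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(truth, prediction) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(truth, prediction) if t == 0 and p == 1)

sensitivity = tp / (tp + fn)   # 3 of 4 diseased patients caught
specificity = tn / (tn + fp)   # 5 of 6 healthy patients cleared
roc_point = (1 - specificity, sensitivity)   # (x, y) on the chart above
print(sensitivity, specificity, roc_point)
```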
And you can see you want something with high sensitivity and high specificity, so up and to the left is good. And you can see
here on the chart that the little dots
are the doctors that were grading the same set. So we get pretty
close to the doctor. And these are board-certified
US physicians. And these are ophthalmologists,
general ophthalmologists by training. In fact, if you look at the F score, which is a combined measure of sensitivity and positive predictive value, we’re just a little better than the median ophthalmologist in this particular study.
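For reference, an F score is a harmonic mean of two such measures; with illustrative numbers (not the paper’s), the arithmetic looks like this.

```
# F score as a harmonic mean (illustrative numbers only).
sensitivity = 0.90   # recall: diseased patients correctly flagged
precision   = 0.85   # positive predictive value: flagged patients truly diseased

f_score = 2 * precision * sensitivity / (precision + sensitivity)
print(round(f_score, 3))   # 0.874
```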
So since then we’ve improved the model. So last year, around December 2016, we were sort of on par with generalists. And then this year– this is a new paper
that we published– we actually used
retinal specialists to grade the images. So they’re specialists. We also had them argue
when they disagreed about what the diagnosis was. And you can see when we
train the model using that as the ground truth, the
model predicted that quite well as well. So this year we’re
sort of on par with the retina specialists. And this weighted kappa number is just agreement at the five-class level. And you can see that, essentially, we’re in between the general ophthalmologists and the retina specialists, in fact right in the range of the individual retinal specialists.
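For context, weighted kappa is a chance-corrected agreement statistic for ordered categories: it penalizes five-class disagreements more the further apart the two grades are. A quick illustration using scikit-learn, with invented grades:

```
from sklearn.metrics import cohen_kappa_score

# Two graders assigning the five DR grades (1 = none ... 5 = proliferative).
# Quadratic weighting penalizes a 1-versus-5 disagreement far more than 1-versus-2.
model_grades  = [1, 1, 2, 3, 3, 4, 5, 2, 1, 3]
expert_grades = [1, 2, 2, 3, 4, 4, 5, 2, 1, 2]

kappa = cohen_kappa_score(model_grades, expert_grades, weights="quadratic")
print(round(kappa, 3))   # 1.0 is perfect agreement, 0.0 is chance-level agreement
```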
Another thing that we’ve been working on, beyond improving the models, is actually trying to have the networks explain how they’re making a prediction. So again, taking a play out of the playbook from the consumer world, we started using this
technique called show me where. And this is where
using an image, we actually generate
a heat map of where the relevant pixels are for
this particular prediction. So here you can see a
picture of a Pomeranian. And the heat map
shows you that there is something in the
face of the Pomeranian that makes it look Pomeranian-y. And on the right here, you
kind of have an Afghan hound, and the network’s
highlighting the Afghan hound. So using a very similar technique, we applied it to the fundus images and we said, show me where.
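"Show me where" is essentially a saliency or class-activation heat map: ask which input pixels most influence the predicted class. A minimal gradient-based sketch in TensorFlow follows; `model` and `fundus_image` are hypothetical, and published methods vary in the details.

```
import tensorflow as tf

def saliency_heat_map(model, image):
    """Per-pixel importance map for the model's highest-scoring class."""
    image = tf.convert_to_tensor(image[None, ...], dtype=tf.float32)  # add batch dim
    with tf.GradientTape() as tape:
        tape.watch(image)
        probs = model(image)                        # e.g. five DR-grade probabilities
        top_score = tf.reduce_max(probs, axis=-1)   # score of the predicted class
    grads = tape.gradient(top_score, image)
    # Collapse the color channels; large absolute gradients mark relevant pixels.
    return tf.reduce_max(tf.abs(grads), axis=-1)[0]

# heat = saliency_heat_map(model, fundus_image)   # overlay `heat` on the photo
```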
So this is a case of mild disease. And I can tell it’s
mild disease because– well, it looks completely normal to me. I can’t tell that there is any disease there. But a highly trained doctor would be able to pick out little things called microaneurysms, where the green spots are. Here’s a picture of
moderate disease. And this is a little
worse because you can see some bleeding at the ends here. And actually I don’t
know if I can signal, but there’s bleeding there. And the heat map– so here’s a heat map. You can see that it picks up the bleeding. But there are two artifacts in this image. There is a dust spot,
just like a little dark spot. And then there is
this little reflection in the middle of the image. And you can tell that the model just ignores them, essentially. So what’s next? We trained a model. We showed that it’s
somewhat explainable. We think it’s doing
the right thing. What’s next? Well, we actually have to deploy
this into health-care systems. And we’re partnering with
health-care providers and companies to bring
this to patients. And actually Dr. Jess Mega,
who is going to speak after me, is going to have a little
more details about this effort there. So I’ve given the
screening application. And here’s an
application in diagnosis that we’re working on. So in this particular example,
we’re talking about a disease– well, we’re talking
about breast cancer, but we’re talking about
metastases of breast cancer into nearby lymph nodes. So when a patient is
diagnosed with breast cancer and the primary breast
cancer is removed, the surgeon spends
some time taking out what we call lymph nodes
so that we can examine to see whether or not the
breast cancer has metastasized to those nodes. And that has an impact on
how you treat the patient. So reading these lymph nodes
is actually not an easy task. And in fact, in about 24% of biopsies, when they went back to look at them, there was a change in nodal status, which means that a node originally read as positive was re-read as negative, or one read as negative was re-read as positive. So that’s a really big deal. It’s one in four. The interesting
thing is that there was another study
published that showed that a pathologist
with unlimited time, not overwhelmed
with data, actually is quite sensitive, so
94% sensitivity in finding the tumors. When you put a time constraint on the pathologist, the sensitivity drops. And people will
start overlooking where little metastases may be. So in this picture there’s a
tiny metastasis right there. And that’s usually small things
like this that are missed. And this is not surprising
given that so much information is in each slide. So one of these
slides, if digitized, is about 10 gigapixels. And that’s literally a
needle in a haystack. The interesting thing is that
pathologists can actually find 73% of the cancers if they spend all their time looking for them, with zero false positives per slide. So we trained a model that can help with this task. It actually finds about 95% of the cancer lesions, and it has eight false positives per slide.
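Gigapixel slides are far too large to feed to a network whole, so models like this are typically applied patch by patch, with the per-patch scores assembled into a heat map that points the pathologist at suspicious regions. A hedged sketch of that scanning loop follows; the `patch_model` classifier is hypothetical.

```
import numpy as np

def tumor_heat_map(slide, patch_model, patch=299, stride=299):
    """Score a whole-slide image patch by patch; returns a grid of tumor probabilities."""
    h, w, _ = slide.shape
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heat = np.zeros((rows, cols), dtype=np.float32)
    for i in range(rows):
        for j in range(cols):
            window = slide[i * stride:i * stride + patch,
                           j * stride:j * stride + patch]
            heat[i, j] = patch_model(window)   # probability of tumor in this patch
    return heat
```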
So clearly an ideal system is one that is very sensitive, using the model, but also quite specific, relying on the pathologist to look over the model’s false positives and call them out as such. So this is very
promising and we’re working on validation
in the clinic right now. In terms of reader
studies, how this actually interacts with the doctor
is really quite important. And clearly there are
applications to other tissues. I talked about lymph nodes,
but we have some early studies that actually show that this
works for prostate cancer, as well, for Gleason grading. So in the previous
examples we talked about how deep learning can
produce the algorithms that are very accurate. And they tend to make calls that
a doctor might already make. But what about predicting things
that doctors don’t currently do from imaging? So as you recall from the
beginning of the talk, one of the great things
about deep learning is that you can train
very accurate algorithms without explicitly
writing rules. So this allows us to make
completely new discoveries. So the picture on the
left is from a paper that we published
recently where we trained deep-learning
models to predict a variety of cardiovascular risk factors. And that includes age,
self-reported sex, smoking status, blood pressure,
things that doctors generally consider right now to assess the
patient’s cardiovascular risk and make proper treatment
recommendations. So it turns out
that we can not only predict many of these
factors, and quite accurately, but we can actually directly
predict a five-year risk of a cardiac event. So this work is quite early, really preliminary, and the AUC for this prediction is 0.7. What that number means is that, given two pictures, one of a patient who did not go on to have a cardiovascular event and one of a patient who did, the model gets it right about 70% of the time. Most doctors would be at around 50%, essentially chance, because it’s very hard to do from a retinal image alone.
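That reading of AUC, the probability of ranking a randomly chosen patient who had an event above a randomly chosen patient who did not, can be checked directly in a few lines; the scores and labels below are invented.

```
import itertools

# Model risk scores for patients who did (1) and did not (0) have an event.
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2]
events = [1,   1,   1,   1,   0,   0,   0,   0]

positives = [s for s, e in zip(scores, events) if e == 1]
negatives = [s for s, e in zip(scores, events) if e == 0]

pairs = list(itertools.product(positives, negatives))
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
print(wins / len(pairs))   # fraction of event/no-event pairs ranked correctly
```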
So why is this exciting? Well, normally when a doctor tries to assess your risk for cardiovascular disease, there are needles involved. So I don’t know if anyone
has gotten blood cholesterol screening. You fast the night before and
then we take some blood samples and then we assess your risk. So again, I want to emphasize
that this is really early on. But these results
support the idea that we may be able
to use something like an image to make new
predictions that we couldn’t make before. And this might be able
to be done in sort of a noninvasive manner. So I’ve given a few
examples, three examples of how deep learning can really
increase both availability and accuracy in health care. And one of the things that
I want to kind of also acknowledge here is the
reason why this has become more and more exciting is,
I think, because TensorFlow is open source. So this kind of open standard
from general machine learning is being applied everywhere. So I’ve given examples of work
that we’ve done at Google, but there’s a lot of work that’s
being done across the community at other medical centers
that are very similar. And so we’re really
excited about what this technology can bring
to the field of health care. And with that, I’d like
to introduce Jess Mega. Unlike me, she is a real doctor. And she’s the chief
medical officer at Verily. JESSICA MEGA: Well thank
you all for being here. And thank you Lily
for kicking us off. I think the excitement
around AI and health care could not be greater. As you heard, my
name is Jess Mega. I’m a cardiologist and
am so excited to be part of the Alphabet family. Verily grew out of
Google and Google X. And we are focused solely on
health care and life sciences. And our mission is to take
the world’s health information and make it useful so that
patients live healthier lives. And the example that I’ll talk
about today focuses on diabetes and really lends itself to the
conversation that Lily started. But I think it’s very
important to pause and think about
health data broadly. Right now, any individual
who’s in the audience today has about several
gigabytes of health data. But if you think about
health in the years to come and think
about genomics, molecular technologies,
imaging, sensor data, patient-reported data,
electronic health records and claims, we’re
talking about huge sums of data, gigabytes of data. And at Verily and
at Alphabet, we’re committed to stay ahead of this
so that we can help patients. The reason we’re focusing
initially some of our efforts on diabetes is this is
an urgent health issue. About 1 in 10
people has diabetes. And when you have
diabetes, it affects how you handle sugar, or glucose, in the body. And if you think
about prediabetes, the condition before
someone has diabetes, that’s one in three people. That would be the entire center
section of the audience today. Now, when your body handles glucose in a different way, you can have downstream effects. You heard Lily talk about
diabetic retinopathy. People can have problems
with their heart, kidneys, and peripheral neuropathy. So this is the type of disease
that we need to get ahead of. But we have two main issues
that we’re trying to address. The first one is
an information gap. So even the most adherent patient with diabetes– and my grandfather was one of these– would check his blood
anyone today has been able to have any of the snacks. I actually had some of
the caramel popcorn. Did anyone have any of that? Yeah, that was great,
right, except probably our biology and our glucose
is going up and down. So if I didn’t check my
glucose in that moment, we wouldn’t have
captured that data. So we know biology is
happening all of the time. When I see patients in the
hospital as a cardiologist, I can see someone’s heart
rate, their blood pressure, all of these vital
signs in real time. And then people go home, but
biology is still happening. So there’s an information
gap, especially with diabetes. The second issue
is a decision gap. You may see a care provider
once a year, twice a year, but health decisions are
happening every single day. They’re happening
weekly, daily, hourly. And how do we decide
to close this gap? At Verily we’re focusing
on three key missions. And this can be true for almost
every project we take on. We’re thinking
about how to shift from episodic and reactive care
to much more proactive care. And in order to do that
and to get to the point where we can really use
the power of that AI, we have to do three things. We have to think about
collecting the right data. And today I’ll be talking about
continuous glucose monitoring. How do you then organize this
data so that it’s in a format that we can unlock and activate
and truly help patients? So whether we do this
in the field of diabetes that you’ll hear about today
or with our surgical robots, this is the general premise. The first thing to think about
is the collection of data. And you heard Lily say
garbage in, garbage out. We can’t look for insights
unless we understand what we’re looking at. And one thing that has been
absolutely revolutionary is thinking about extremely
small biocompatible electronics. So we are working on
next-generation sensing. And you can see a
demonstration here. What this will lead to, for example with the extremely small continuous glucose monitors we’re partnering to create, is more-seamless integration. So again, you don’t just have a few glucose values; instead, we understand how your body, or the body of someone with type 2 diabetes, is handling sugar in a more continuous fashion. It also helps us
understand not only what happens at a
population level but what might happen
on an individual level when you are ingesting
certain foods. And the final thing is to really
try to reduce costs of devices so that we can really
democratize health. The next aim is, how do we
organize all of this data? And I can speak both as a
patient and as a physician. The thing that people will
say is, data’s amazing, but please don’t overwhelm
us with a tsunami of data. You need to organize it. And so we’ve
partnered with Sanofi on a company called Onduo. And the idea is
to put the patient in the center of
their care and help simplify diabetes management. This really gets to
the heart of someone who is going to be
happier and healthier. So what does it actually mean? What we try to do
is empower people with their glucose control. So we turned to the American Diabetes Association and looked at the glucose ranges that are recommended. People then get a
graph that shows you what your day looks like
and the percentage of time that you are in range– again, giving a
patient or a user that data so they can be the
center of their decisions– and then finally tracking
steps through Google Fit. The next goal then is
to try to understand how glucose is pairing with
your activity and your diet. So here there’s an app that prompts for a photo of the food. And then, using image recognition and Google’s TensorFlow, we can identify the food.
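Onduo’s production pipeline is not described here, but the general shape of photo-based food recognition is familiar: run the photo through a trained image classifier and map the top labels to foods. A generic sketch with an off-the-shelf TensorFlow model follows; a real system would use a food-specific classifier and label set.

```
import numpy as np
import tensorflow as tf

# Generic image-classification inference with a pretrained network.
model = tf.keras.applications.MobileNetV2(weights="imagenet")

def guess_food(jpeg_path):
    img = tf.keras.utils.load_img(jpeg_path, target_size=(224, 224))
    x = tf.keras.applications.mobilenet_v2.preprocess_input(
        np.array(img, dtype="float32")[None, ...])
    preds = model.predict(x)
    # Returns the top-3 (class_id, description, probability) guesses.
    return tf.keras.applications.mobilenet_v2.decode_predictions(preds, top=3)[0]

# guess_food("lunch.jpg")
```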
And this is where the true personal insights start to become real. Because if you eat
a certain meal, it’s helpful to understand
how your body ends up relating to it. And there’s some really
interesting preliminary data suggesting that the microbiome may change the way I respond to a banana, for example, versus the way you might respond. And that’s important
to know because all of a sudden those general
recommendations that we make as a doc– so if someone
comes to see me in clinic and they have type 2
diabetes I might say, OK, here are the things
you need to do. You need to watch
your diet, exercise, take your oral medications. I need you to also
take insulin, exercise. You’ve got to see your
foot doctor, your eye doctor, your
primary-care doctor, and the endocrinologist. And that’s a lot to integrate. And so what we try
to do is also pair all of this information in a
simple way with a care lead. This is a person that helps
someone on their journey as this information is surfaced. And if you look in the middle
of what I’m showing you here on what the care lead and
what the person is seeing, you’ll see a number
of different lines. And I want us to drill
down and look into that. This is showing you the
difference between the data you might see in an
episodic glucose example or what you’re seeing with
the continuous glucose monitor enabled by this new sensing. And so let’s say we drill down
into this continuous glucose monitor and we look
at a cluster of days. This is an example. We might start to see patterns. And as Lily mentioned, this
is not the type of thing that an individual patient, care
lead, or physician would end up digging through,
but this is where you start to unlock the
power of learning models. Because what we can start to see is a cluster of different mornings.
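As an illustration of what clustering mornings can mean in practice, here is a hedged sketch, not Onduo’s method, that groups days by their morning glucose traces with ordinary k-means; the data are synthetic.

```
import numpy as np
from sklearn.cluster import KMeans

# Each row is one day's morning CGM trace: one reading every five minutes
# from 6am to 10am (48 values, in mg/dL). Synthetic data for illustration.
rng = np.random.default_rng(0)
calm_days  = 100 + 10 * rng.standard_normal((20, 48))
spiky_days = 140 + 30 * rng.standard_normal((20, 48))
daily_traces = np.vstack([calm_days, spiky_days])

# Group similar mornings; a care lead can then ask what the days in each
# cluster have in common (a Wednesday commute, a particular breakfast, ...).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(daily_traces)
print(labels)
```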
We’ll make a positive association that everyone’s eating incredibly healthy here at Google I/O, so maybe that’s a cluster of the red mornings. But we go back into our regular
lives and we get stressed and we’re eating a
different cluster of foods. But instead of, again,
giving general advice, we can use different
models to point out, it seems like
something is going on. With one patient,
for example, we were seeing a cluster
around Wednesdays. So what’s going
on on Wednesdays? Is it that the person
is going and stopping by a particular location,
or maybe there’s a lot of stress that day. But again, instead of
giving general care, we can start to target care
in the most comprehensive and actionable example. So again, thinking about
what we’re talking about, collecting data, organizing
it, and then activating it and making it
extremely relevant. So that is the way we’re
thinking about diabetes care, and that is the way
AI is going to work. We heard this morning
in another discussion, we’ve got to think
about the problems that we’re going to solve and
use these tools to really make a difference. So what are some other
ways that we can think about activating information? And we heard from Lily
that diabetic retinopathy is one of the leading
causes of blindness. So even if we have
excellent glucose care, there may be times where you
start to have end organ damage. And I had mentioned
that elevated glucose levels can end up affecting
the fundus and the retina. Now we know that
people with diabetes should undergo screening. But earlier in the
talk I gave you the laundry list of what
we’re asking patients to do who have diabetes. And so what we’re trying to
do with this collaboration with Google is figure
out, how do we actually get ahead of the problem and think about an end-to-end solution, so that we recognize and bring down the challenges that exist today. Because the issue, in terms
of getting screened, is that one part of it is accessibility, and the other is having access to optometrists and ophthalmologists. And this is a problem in the United States as well as in the developing world. So this is a problem,
not something just local. This is something that we
think very globally about when we think about the solution. We looked at this data
earlier and this idea that we can take algorithms and
increase both the sensitivity and specificity of diagnosing
diabetic retinopathy and macular edema. And this is data that
was published in “JAMA” as Lily nicely outlined. The question then
is, how do we think about creating this product? Because the beauty of working
at places like Alphabet and working with partners
like you all here today is we can think about, what
problem are we solving, create the algorithms. But we then need to step
back and say, what does it mean to operate in the
space of health care and in the space
of life science? We need to think about the image
acquisition, the algorithm, and then delivering
that information both to physicians
as well as patients. So what we’re doing is
taking this information and now working with
some of our partners. There’s a promising pilot that’s
currently ongoing both here as well as in India, and
we’re so encouraged to hear the early feedback. And there are two
pieces of information I wanted to share with you. One is that looking at
this early observations, we’re seeing higher
accuracy with AI than with a manual greater. And the thing that’s
important as a physician– I don’t know if there are any
other doctors in the room, but the piece I always
tell people is there’s going to be room for
health-care providers. What these tools are doing is
merely helping us do our job. So sometimes people ask
me, is technology and AI going to replace physicians or
replace the health-care system? And the way I think about it
is, it just augments the work we do. If you think about
the stethoscope– so I’m a cardiologist,
and the stethoscope was invented about
200 years ago. It doesn’t replace
the work we do. It merely augments
the work we do. And I think you’re going to see
a similar theme as we continue to think about ways of
bringing care in a more effective way to patients. So the first thing here is that
the AI was performing better than the manual grader. And then the second
thing is to think about that base of patients. How do we truly
democratize care? And so the other encouraging
piece from the pilot was this idea that
we could start to increase the base of patients
treated with the algorithm. Now, I would love to say that it’s really easy to do everything in health care and life science. But as it turns out, it takes a huge village to do this kind of work. So what’s next? What is on the path
to clinical adoption? And this is what makes
it incredibly exciting to be a doctor working with
so many talented technologists and engineers. We need to now partner with
different clinical sites that I noted here. We also partner
deeply with the FDA, as well as regulatory
agencies in Europe and beyond. And one thing at Verily
that we’ve decided to do is to be part of what’s called
the FDA precertification program. We know that bringing new
technologies and new algorithms into health care is
critical, but we now need to figure out how to
do that in a way that’s both safe and effective. And I’m proud of us
at Alphabet for really staying ahead of
that and partnering with groups like the FDA. The second thing that’s
important to note is that we partner
deeply at Verily with Google as well as other
partners like Nikon and Optos. All of these pieces
come together to try to transform care. But I know that if
we do this correctly, there’s a huge opportunity not
only in diabetes but really in this entire world
of health information. It’s interesting to think about, as a physician who spends most of my time taking care of patients in the hospital: how can we start to push more
of the access to care outside of the hospital? But I know that
if we do this well and if we stay ahead of
it, we can close this gap. We can figure out ways to
become more preventative. We can collect the
right information. We can create the
infrastructure to organize it. And most importantly, we will
figure out how to activate it. But I want everyone
to know here, this is not the type of
work that we can do alone. It really takes
all of us together. And we at Verily, we at
Google, and we at Alphabet look forward to partnering
with all of you. So please help us
on this journey. Lily and I will be
here after these talks. We’re happy to chat
with all of you. And thank you for
spending time at I/O. [MUSIC PLAYING]