[MUSIC PLAYING] KURT ERICSON: My name is Kurt. I’m a product manager at Google. I work on the health care
and life science team. And I focus on Cloud
Healthcare API. We’re very lucky today
to be joined by David from Stratus Medicine, CEO. He’ll be speaking
in a little bit. And let me also say,
actually, before we jump in, I must freely admit we were
just talking about this, when David and I saw that we
were at the 5 PM slot and not in the main venue,
we were more than a little concerned that no
one would show up. So I just want to thank everyone
from the bottom of my heart for not forcing us to stand up
here and talk to an empty room. So thank you. [APPLAUSE] KURT ERICSON: In
today’s session, we’re going to be talking
about three things. In the first bit, I’m going to focus on the role that Cloud Healthcare API plays in health care infrastructure as an approach to serverless machine learning. I’ll then talk about a
general solution architecture that we use and we
take with customers who are looking to implement
a similar solution. And then I’m going to
hand over to David, who’s going to talk about,
with a heavy dose of pragmatism and insight from the
field, what this looks like in a real deployment. So let’s get started. I think it’s useful
in health care to help contextualize
conversations about caring for patients, working
with providers in the context of, in
this case, a nurse. So this is Kristi. She’s formerly an ICU nurse,
currently a nurse practitioner. And those of you
in the front row here may notice that Kristi
bears a striking resemblance to me. It’s a keen observation. Kristi is, in fact,
my little sister. Aww. Now the reason I
mention that is Kristi and I have this little game
we play of one-upmanship at family gatherings and
such where she tells me all the things that she’s
seen in clinical settings, and I try and counter with all
the things I’ve seen in health care IT. And needless to say, you
know, Kristi-- again, rolling in and out of the ED-- Kristi has seen things. Kristi usually wins. But I asked her, hey, I’m
going to be giving this talk. I’m going to be talking
about data driven problems. Can you give me
some examples where you think there ought to
be more data in the EHR, your medical records system,
and you just can’t quite find it at the moment you
need to make a decision? And she looked at me
and sort of said, wow, sighed, where do you
even want to start? And I mean, when all
of us in this room are familiar with health
care IT and clinical decision support, cognitive
assistance, you quickly realize there is an abundance
of opportunity here. There are thousands of problems
that need to be solved. I’m going to give you
a couple examples. In this case, assume
there’s a small inpatient clinic, relatively rural. Doesn’t have a lot
of specialists, or to the extent it does,
they only visit once a week. And this is problematic. Because if a
patient is admitted, gets their usual labs, and it
shows an elevated troponin-- troponin is an early indicator of cardiac arrest, though arrest doesn’t always follow. So the question is: should you refer this patient
from this rural care facility to a larger care facility
that has cardiac specialists? I don’t know. Let me give you another example. Assume a patient just suffered
from a cerebrovascular accident. It was a stroke. They’re lying in the
bed, just recovering. One physician says, oh, we need
to keep the blood pressure up. We need to basically
soak the brain in blood, oxygen-rich blood. Help it recover. Another physician says,
no, no, no, no, no. Look at the CT scan. Look at that– see that
bulge by the heart? If we keep the
blood pressure up, we’re going to shear the aorta. What do you do? What should the
blood pressure be? Right? These are just two
examples of the myriad of problems that exist in
clinical decision making. And the question
is, is there data, or is there a process
to sift through the data and arrive at a support and
a solution that can augment and extend the clinician? And I want to touch on a paper– this was a paper
actually presented. When I introduced health
care API last year at Next, I mentioned this paper. Because at the time,
it had just come out. This is a paper from
our Google AI team. And it shows among other
things that, first of all, you can map a bunch
of data to FHIR. We’ll come back to
that in a second. But secondarily, it shows
that having mapped your data to FHIR, you can make highly
sensitive and specific predictions for adverse
clinical outcomes. So again, I mentioned
that last year. Since last year,
we’ve been working with customers and
partners to start to bring these tools
into clinical workflow. One of those examples
I’ve highlighted here. This one’s public. So again, I’m not disclosing anything that isn’t already public. This is a partnership we worked on with Emory, an institution
in the United States, where they had an issue
with bloodborne infection in their ICU– so sepsis. Now there are a
lot of algorithms out there for addressing sepsis. But they had a particular
set of conditions that meant they needed to build a model. This team saw the work
and saw TensorFlow. And said, hey, can I build
a pipeline like that? Could I ingest my
labs and vitals? Could I deploy a solution? And in fact, they did. And not only did
they do it, they then won a grant from the HHS,
Health and Human Services, to expand the program. This is great, because it shows
that this pipeline is actually feasible in real
clinical settings. So with that in mind-- so I’ve set up the problem. I’ve now described why we’re even approaching this problem. Why are we building
what we’re building? And I think this
is where I’m going to shift into talking
about Cloud Healthcare API. And I think a lot of
people who approach data science, and especially
data science and medicine, you go into it with this mindset
that there are these three phases, right? These are common to any
data science project– discovery, model training,
model deployment. The question is, for
people who are newer, are these equal in time? Most people who are, again, new
to the field, say, oh, yeah. Sure. You know, I’ll just
do some discovery. I’ll train them. I’ll deploy it. These are not equal in time. In fact, most of us in this room
who have lived this experience know that we spend an enormous
amount of time and energy in that first
spot, in discovery. Training turns out
to be the easy part. Training turns out to be
the part that goes fast. And then there’s deployment. Yeah, there’s a lot of
issues around patient safety and security of the data. But at the end of
the day, it still pales in comparison
to the amount of time we’re spending in discovery. Let me put this problem
a slightly different way. On the one hand, you’ve got
all these clinical systems that institutions have been
investing in for the last 10, 20, 30 years. And you’ve got them over
here, they’re largely on prem. Over here, you’ve got
this thing called Cloud. Now Cloud is fabulous. Cloud is where we got
TensorFlow at scale. Cloud’s where we’ve got chips
designed for TensorFlow. Cloud’s where we have
BigQuery running. But there’s this enormous
gulf between these two worlds. Because you can’t just rip
these systems out and move them to Cloud. So the question is, how
do we fill that gap? And that’s the role
of the Healthcare API. The whole purpose of
the Healthcare API is to help bridge these existing
systems with the capabilities of Google Cloud. And in particular, and
I like to emphasize this point, because it’s often
lost in these discussions, at the end of the day,
we see an important step in deploying the Healthcare API: we need to speak and support the formats and protocols that are already native to the industry-- the data that these institutions already have and are already generating. Because if we can speak
these formats and protocols, we can bring this data
into Google and into Cloud. Specifically, and
again, this touches on this particular topic today
and our specific use case, we want to apply a variety
of analytics and machine learning services to this data. Which again, the role
of Healthcare API is to facilitate
that integration. Now briefly, I just
want to touch on this to answer the question–
we get a lot of questions about, well, where does this fit
in the Google product roadmap in terms of the offering. Is it like Gmail, fully managed Google SaaS? Is it raw infrastructure, raw cloud, VMs and so forth on Google Compute Engine? And it really sits
in the middle. It shouldn’t really matter
from your perspective, if you’re throwing one resource
in, 100 million resources in, one DICOM image, 100
million DICOM images, it should scale
transparently for you. That’s our goal. Now from an API perspective,
we call it Cloud Healthcare API because it’s very
much an API surface. You have this hierarchy
of your project. You control the project. Within your project,
you specify a location. This is critical in
health care, because that informs the API for
where to store data for the rest of that path. You create a data set-- a data set is a grouping of multiple modalities of data. So you can have imaging
data over here and FHIR data here and V2 data over here
for clinical messaging. And finally, you have
your stores, right? Stores implement the modality-specific APIs. And I emphasize the
point that it’s an API. Because if you’re
building an application, you’re just addressing an
API in a very familiar way. It’s no different than any
other API you interact with. The key difference is
that, under the hood, it’s scaling and implementing
these specific modality data types for health care. And of course, I
would be remiss– I don’t want to spend too
much, belabor the point, we have entire
sessions on this topic. But I do want to emphasize
this whole architecture is designed to support the
storage of protected health information. So yes, it is covered by our
Business Associates agreement. OK. Now, I love this slide. And every time I
show this slide, people are, like, oh,
that’s so boring, Kurt. Why do you put
this in your decks? And I like it a lot, because
what it shows in [INAUDIBLE]-- and let me explain what’s
going on in this example here. But in these four examples,
it shows, oh, we’ve created a data set
using G Cloud, our CLI. We’ve created a v2 store
for clinical messages. We’ve created a DICOM
store for clinical imagery. And we’ve created a FHIR
store for FHIR data. And we did this
in four commands. And it doesn’t matter. Again, you can send it
one image, 100,000 images. I take the fact that it’s
boring as a compliment. We have made the ability to
create this infrastructure so boring that this is the
trivial part of the process. And that’s great. The more and more we can make
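The four creation calls themselves aren’t reproduced in the transcript, but the hierarchy they build can be sketched in Python. All project, location, dataset, and store names below are hypothetical placeholders, and the exact REST surface may vary by API version:

```python
# Sketch of the Cloud Healthcare API resource hierarchy:
# project -> location -> dataset -> modality-specific store.
# All names below (my-project, us-central1, ...) are hypothetical.

def dataset_path(project: str, location: str, dataset: str) -> str:
    return f"projects/{project}/locations/{location}/datasets/{dataset}"

def store_path(project: str, location: str, dataset: str,
               collection: str, store: str) -> str:
    # collection is "hl7V2Stores", "dicomStores", or "fhirStores"
    return f"{dataset_path(project, location, dataset)}/{collection}/{store}"

# The four "boring" creation steps are one POST each against these paths,
# e.g. POST https://healthcare.googleapis.com/v1/<parent>?fhirStoreId=<id>
for collection in ("hl7V2Stores", "dicomStores", "fhirStores"):
    print(store_path("my-project", "us-central1", "my-dataset",
                     collection, "my-store"))
```

However you create them-- CLI or REST-- the same hierarchy applies to every store.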
this type of work mundane, the more time developer
teams that we work with can spend in the
application space, helping patients,
helping providers. And that’s where the value
is going to come from. And briefly, I just want to
cover some of the configuration options. These end up being useful. And you’ll see why as I start
to talk about the solution architecture. But in this
particular case, when we’re talking about
an HL7v2 store, there is one configuration option
for segment terminators. I don’t want to spend
too much time on that. But for those who have
deployed v2 in the field, know that you can configure
that in a v2 store. The piece I want to
mention explicitly here is that every single
store in Cloud Healthcare API can be associated with
a Pub/Sub topic. This becomes important. I’ll get to this in a minute. DICOM-- same idea--
it can be associated with a Pub/Sub topic. Every time a new image
is sent to a DICOM store, it generates a notification
saying, hey, I got a new image. Same thing with FHIR– you look at a FHIR store, we
have a number of parameters that we support, because FHIR is a very complex spec. So I don’t want to dwell
on the spec itself. But the thing is,
when you’re looking at the FHIR specification, know
that the different parameters that are in the
spec can be tuned via configuration on the store. Again, the important
point that I want to emphasize for
our discussion today is the fact that you
can associate a Pub/Sub topic with the store. So what do I mean by that? Again, you have your
health care store. It stores health
care specific data. When data is written,
when data changes, it generates a notification
to call Pub/Sub. That triggers a series
of notifications that any number of
applications that you build can be subscribed to. And from there,
this is an example of what that message looks like. So it’s just like any
other Pub/Sub message, namely, there’s a data
element that’s BASE64 encoded, like all Pub/Sub
messages on Google Cloud. That contains the path of
the resource that changed. So any subscribed application
can grab the data that changed. This will become a
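A minimal sketch of what a subscriber does with such a notification, assuming a hypothetical resource path and the standard Base64-encoded data field:

```python
import base64

# Hypothetical notification as a subscriber would see it: the data field
# is Base64-encoded, like all Pub/Sub messages, and for Healthcare API
# stores it carries the path of the resource that changed.
incoming = {
    "message": {
        "data": base64.b64encode(
            b"projects/my-project/locations/us-central1/datasets/"
            b"my-dataset/fhirStores/my-store/fhir/Observation/abc123"
        ).decode("ascii")
    }
}

resource_path = base64.b64decode(incoming["message"]["data"]).decode("utf-8")
resource_type, resource_id = resource_path.split("/fhir/")[1].split("/")
print(resource_type, resource_id)  # the subscriber can now fetch this resource
```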
central point in a second. Now, I emphasized earlier
that it’s just an API. So to get data in and out,
you simply call the API. In this particular case,
we have a sample resource. We’re going to pull it
out for some application. That’s great. Now this is where
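That pull is an ordinary HTTP GET. A sketch, with a hypothetical store path and a placeholder token:

```python
import urllib.request

# Sketch only: pulling one FHIR resource out of a store is a plain
# HTTP GET. The store path and token below are hypothetical placeholders.
url = ("https://healthcare.googleapis.com/v1/projects/my-project/"
       "locations/us-central1/datasets/my-dataset/fhirStores/my-store/"
       "fhir/Patient/abc123")
req = urllib.request.Request(url, headers={
    "Authorization": "Bearer <access-token>",   # e.g. from gcloud auth
    "Accept": "application/fhir+json",
})
print(req.get_full_url())  # urlopen(req) would perform the actual fetch
```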
things get interesting. I said at the beginning,
one of the key roles of Cloud Healthcare
API is the ability to export data and project
it from its native data type to BigQuery so you can run
SQL-based analysis against it. That, again, is
just an API method. You invoke it on the store. Say, hey, take this FHIR
data, project it to BigQuery. Take this DICOM data, project
the metadata to BigQuery. And that, under the hood, will trigger a job that projects the data to your own BigQuery store. For the case of FHIR,
there is a community supported specification
for an analytical schema. We implement this specification. So to the extent possible-- this is an important point-- we are implementing industry
standard specifications on Google Cloud technology. So again, the projection
is based on the schema. And to trigger it, you just
set an additional parameter, highlighted in
blue on this slide, to say I want my projection to
follow the analytical schema based on the community spec. Last point I want to make about
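A rough sketch of what that export request body looks like, assuming a hypothetical BigQuery dataset URI:

```python
import json

# Sketch of an export request body for projecting a FHIR store to
# BigQuery. The bq:// dataset URI is hypothetical; the schemaType field
# is the extra parameter that selects the community analytics schema.
export_request = {
    "bigqueryDestination": {
        "datasetUri": "bq://my-project.my_bq_dataset",
        "schemaConfig": {"schemaType": "ANALYTICS"},
    }
}
# POSTed to .../fhirStores/my-store:export
print(json.dumps(export_request, indent=2))
```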
interacting with FHIR store is that all of
these FHIR stores, rather, can export and
import directly to BigQuery. I actually should have
mentioned this point earlier. They can export and
import to BigQuery and be connected to your
business intelligence tools. So you’ve got R.
You’ve got Tableau. You’ve got all these services
that you’re used to running. They can interact directly
with that projected data. So you don’t have to teach
your data science team to use a different tool to
connect to the projected data. Ah, this is the point
I was trying to make. When you’re ready
to export your data for the purpose of
training a model, we have a convenient
API call to let you say, hey, store, take all my data and drop it in a bucket. And you can see again,
through a single API call, that that method will
take your data, whether it’s the DICOM data, or V2
data, or FHIR data, export it to Cloud storage
in this particular case, so you can train the model. Or in case you want to
move it back on prem, or take it from on
prem and shove it in the Cloud Healthcare API. All of this is possible
as an API method. Now the last point
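For the Cloud Storage direction, the request bodies can be sketched as follows; the bucket name and paths are hypothetical placeholders:

```python
import json

# Sketches of the bulk export/import bodies for moving FHIR data through
# Cloud Storage, e.g. to assemble training data or to round-trip on prem.
export_body = {"gcsDestination": {"uriPrefix": "gs://my-bucket/fhir-export/"}}
import_body = {
    "contentStructure": "RESOURCE",
    "gcsSource": {"uri": "gs://my-bucket/fhir-export/*"},
}
# POST .../fhirStores/my-store:export   with export_body
# POST .../fhirStores/my-store:import   with import_body
print(json.dumps(export_body), json.dumps(import_body))
```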
I want to make before we dive into the
solution architecture is to talk about the fact that
it’s in Beta, finally, yay! Over the last year, one point of
emphasis has been performance. One of the points of emphasis
has been conformance. So those are the two things that
our team is largely focused on. And again, I
mentioned this before, it’s in scope for the BAA and
importantly, for the terms of service. OK. So I’ve set the stage for
why Cloud Healthcare API. I want to spend a
little bit of time now talking through the
architecture for how we help these teams
deploy a solution which sets the pattern that
you’ll see repeated when David speaks about his
particular implementation. So before I dive
into that, there is one essential
point I like to make. Because this is a
recurring theme as we talk about these
implementations. And that is namely
the focus on FHIR. So FHIR serves three important roles in our approach. The first is that it’s a
very extensible data model, which is incredibly convenient. The second is that it’s a graph. So any graph-based
querying and technology use can work against FHIR data. And then third is
actually the fact that it’s also an
API specification. So you can use it as a
transactional target, in addition to being the target
for labeling and training data. Also there’s a recent– this
actually came out last week. CMS announced that
they’re transitioning some of their services
over the next few years, not immediately. But the initial wave, the
demo, was released last week. So you can actually
extract claims data in FHIR bulk format. For those of you who
are paying attention to the regulatory environment
during HIMSS, you probably also noticed
that both CMS and ONC released proposed rules
with a very heavy emphasis on exposing APIs. And the ONC rules– 700 pages– mentioned FHIR
a little over 300 times. So there’s both a clear data science
reason to be taking this approach, and a clear push
from the regulatory environment to say maybe this is something
worth investigating further. For the purpose of
our conversation today, I’m going to talk about
a very specific implementation. And that is something
called CDS Hooks. In the clinical space, CDS
Hook is one implementation of clinical decision support. There are, of course, many– best practice alerts,
[? flow sheet ?] integrations, [? clin KB ?] integrations. The point is there are a lot
of disparate integrations that are possible. I’m going to focus
on this pattern. Because I want to set up
the discussion for what you’ll see when David
presents his work. The CDS hook is fundamentally
no more than a web hook. An event occurs inside a care facility, and that triggers an alert, just like any other web hook you’re used to. Now in the case of
a CDS Hook, there are a number of events
that trigger that. Patient-view is the most
commonly implemented, to the extent CDS
Hooks are implemented. But there’s others. When an order is reviewed,
a medication prescribed, order-select and so on,
which I put in gray, those are technically
in the spec. We haven’t seen them
widely implemented. But I mention them in the
interest of thoroughness. Now the CDS service
is in turn expected to respond with a
payload that’s designed to trigger a native
rendering in the source EHR. So in the case of
Cerner, there’s a specific UI element
that gets rendered when this web hook is answered. This gives you, at
least for the purpose of our discussion,
a target when we’re talking about an
architecture for delivering clinical decision support,
cognitive assistance, into workflow. The other convenient
property about it, which I like for discussion
and developer purposes, is that there’s an open sandbox. So you could leave the
session, go test it on the sandbox that’s
developed by Cerner. So Cerner is one of the larger
EHR vendors in the world. They have a process to
just test these locally– again, convenient when you’re
doing development activities. So let’s return to
the three phases I set up at the beginning. You have discovery,
you have training, and you have deployment. In the case of discovery, we
actually spent a lot of time on this already. I talked about the
ability to project data from the health care
API to BigQuery. So you’re ingesting
these v2 messages. You’re ingesting
this DICOM data. And you want to project
it for the purpose of discovering what are some
abnormalities in my data. Connect your R process to
do some sifting around, oh, do I have some
biases expressed because of my collection procedures? All that work is enabled through
your projection into BigQuery. For training– this is another
area which I’ll point you to some open source work
that our friends at Google AI have performed– there’s a
whole step-by-step process we’ve identified, which
is, again, we’ve tested a number of scenarios. But the idea is how
do we take FHIR data, how do we label it, annotate
it, and then how do we train a model? This has a step-by-step guide
on how you would do that after exporting the data. And the basic flow,
which I’ll cover for the purpose of when
you’re deploying on Cloud, is you take your FHIR data. You export it to GCS. You then label it
however you see fit, using, again, the tools I
just showed on the last slide. And you create a model which
you can upload to ML Engine. Now the nice thing
about the work that my AI team showed
on the last slide was that that also
works locally. So if you’re looking to
develop this kind of tooling, you can test it directly
on your machine. Now in terms of deployment,
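The export-then-label step can be sketched in a few lines of plain Python. The FHIR bulk export produces newline-delimited JSON, one resource per line; the labeling rule below is a made-up placeholder for whatever outcome labels a real pipeline derives:

```python
import json

# Toy sketch: read an NDJSON export (one FHIR resource per line) and
# attach a hypothetical binary label before handing examples to training.
ndjson_export = "\n".join([
    json.dumps({"resourceType": "Patient", "id": "p1"}),
    json.dumps({"resourceType": "Patient", "id": "p2"}),
])

def label(resource: dict) -> int:
    # Placeholder rule; a real pipeline derives labels from outcomes data.
    return 1 if resource["id"] == "p1" else 0

examples = [
    {"resource": r, "label": label(r)}
    for r in (json.loads(line) for line in ndjson_export.splitlines())
]
print(len(examples))
```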
what we’re largely talking about is OK, how do we
implement the CDS Hook? How are we going to build a
solution that pieces together various parts of Cloud in a
way that answers that web hook request? So in this particular case,
I’ve used the Kubernetes engine logo for this EHR relay. And that’s because you’re
going to be ingesting clinical data somehow. And I say somehow,
because for the purpose of our example, what we want is
a stream of FHIR observations. We all know there are
very few EHR systems streaming FHIR
observations today. What you’re
realistically getting is an HL7v2 message,
streaming labs and vitals. In this case, I used
an ORU message as an example. So the question is
OK, great, now how do I turn that into
a FHIR observation? There is some work on our side
for ingesting HL7v2 messages-- this is an open source project that speaks MLLP. Minimal Lower Layer Protocol is the protocol necessary
to ingest v2 messages. We have an open source
container that you can run on Kubernetes engine– it’s why I used the Kubernetes
engine logo earlier– to ingest these
clinical messages. Now we also work with partners. So you may have interface
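MLLP itself is tiny-- each HL7v2 message is wrapped in a start byte and an end byte pair-- which a few lines of Python can sketch (the sample message below is truncated and illustrative):

```python
# MLLP just wraps each HL7v2 message in a start byte (0x0b) and an end
# pair (0x1c 0x0d). A minimal framing/unframing sketch:
START, END = b"\x0b", b"\x1c\x0d"

def mllp_frame(message: bytes) -> bytes:
    return START + message + END

def mllp_unframe(frame: bytes) -> bytes:
    assert frame.startswith(START) and frame.endswith(END), "bad MLLP frame"
    return frame[len(START):-len(END)]

# A truncated, illustrative ORU message carrying a lab observation:
hl7 = b"MSH|^~\\&|LAB|HOSP|||20190101||ORU^R01|1|P|2.3\rOBX|1|NM|TROPONIN||0.7"
assert mllp_unframe(mllp_frame(hl7)) == hl7
```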
engines in your institution today that are capable
of directly integrating with Cloud Healthcare API. One of them– this is
a team we work with, built an integration
for Mirth or NextGen, to ingest v2 messages directly
from an existing Mirth installation. So that a system that already
invested in Mirth slash NextGen could easily
connect their data to Cloud. So somehow, some way,
using either those services or one of your existing
interface engines, you’re streaming v2 messages
to Cloud Healthcare API. And I mentioned earlier that
it’s connected to Pub/Sub. And this is the
essential feature. Because now, you
have a mechanism to know when new data was
written to your v2 store. And because you have a
mechanism to know that, you can ultimately trigger Dataflow, again, either directly. Or-- this is an area of
investment for our team, how can we make this more
of a managed service? But that maps that v2
data, grabs the message, and turns it into
a FHIR observation. So this is the target. Somehow you have
this little engine running that’s mapping v2
messages to FHIR on a real time basis. So we go back to
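That little mapping engine can be sketched as a pure function from one OBX segment to a FHIR Observation. The field positions follow HL7v2 OBX conventions; the simplified code element and patient reference are hypothetical:

```python
# Toy sketch of the v2-to-FHIR mapping step: take one OBX segment's
# fields and emit a FHIR Observation resource.
def obx_to_observation(obx: str, patient_id: str) -> dict:
    fields = obx.split("|")
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"text": fields[3]},                   # OBX-3 observation id
        "subject": {"reference": f"Patient/{patient_id}"},
        "valueQuantity": {"value": float(fields[5])},  # OBX-5 value
    }

obs = obx_to_observation("OBX|1|NM|TROPONIN||0.7", "p1")
print(obs["code"]["text"], obs["valueQuantity"]["value"])
```

A real mapping has to deal with units, repeats, and coding systems, but the shape of the transformation is the same.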
the example, where let’s assume for
the sake of argument and for the sake of
my diagram, that we’re streaming observations from a
source, which may be this Dataflow process under the hood. Now we’ll go back to
the notification idea. So now you’ve got
these FHIR observations being
written to the store. Again, you can lean on the fact
that Pub/Sub is sending out notifications about
this new data. So your Cloud Function
can directly attach. And why is a Cloud Function necessary here? Well, because somehow you
need to invoke the model with the right FHIR data. So because the
Pub/Sub notification contained information about
what FHIR resource was mutated, it can grab the Patient-everything bundle from the Healthcare API, which is effectively a representation of everything about that patient, and send it off to ML
engine for a prediction. Once it receives
that prediction, it can then generate
a risk assessment and write the risk assessment
back to the FHIR store. So now you’ve got this pipeline
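The write-back step can be sketched as turning the model’s score into a FHIR RiskAssessment resource; the outcome wording and score below are hypothetical, while the field names follow the FHIR RiskAssessment resource:

```python
# Sketch: convert a prediction score into a FHIR RiskAssessment that a
# Cloud Function could POST back to the FHIR store.
def to_risk_assessment(patient_id: str, score: float) -> dict:
    return {
        "resourceType": "RiskAssessment",
        "status": "final",
        "subject": {"reference": f"Patient/{patient_id}"},
        "prediction": [{
            "outcome": {"text": "Adverse clinical outcome"},
            "probabilityDecimal": score,
        }],
    }

risk = to_risk_assessment("p1", 0.87)
print(risk["prediction"][0]["probabilityDecimal"])
```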
of your ingesting data. The model that you
trained in Phase 2 is running on ML engine, looking
at the Patient Everything bundle-- the model, again, was trained using TensorFlow sequence examples. And it’s generating
a prediction, which you’re converting
into a risk assessment. So that whole process is
running asynchronously, right? As new data is streaming in,
that whole engine is churning. But now you’ve got the
CDS Hook component. Because the hospital says,
oh, I have a new patient, or a clinician
looks at a patient. That triggers a notification
to a Cloud Function. Because for those familiar
with Cloud Functions, one of the triggers you can
use is an HTTPS invocation. So now it’s, oh,
I’ve got a web Hook request from this hospital. I know how to handle this. Let me reach out to the
Cloud Healthcare API, grab the risk assessment,
and format my payload. And I skipped over
the fact that CDS Hook has an authentication
authorization step. But that’s in there. That would be in your business
logic for the Cloud Function. But what you’re
generating here is: you’ve got that asynchronous loop running, generating risk assessments, plus a synchronous loop to inspect the data and generate a response
for the care facility in the immediate workflow. Again, I want a
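The synchronous half can be sketched as a small handler: it parses the patient-view request, looks up the most recent stored risk assessment (stubbed here rather than queried from the FHIR store), and answers with a card. All names and thresholds are illustrative:

```python
import json

# Toy sketch of the HTTPS-triggered function that answers a CDS Hook.
def latest_risk(patient_id: str) -> float:
    return {"p1": 0.87}.get(patient_id, 0.0)  # stand-in for a FHIR search

def handle_cds_hook(request_body: str) -> dict:
    hook = json.loads(request_body)
    patient_id = hook["context"]["patientId"]
    score = latest_risk(patient_id)
    return {"cards": [{
        "summary": f"Risk score {score:.2f} for this patient",
        "indicator": "warning" if score > 0.5 else "info",
        "source": {"label": "Example risk model"},
    }]}

resp = handle_cds_hook(json.dumps({"hook": "patient-view",
                                   "context": {"patientId": "p1"}}))
print(resp["cards"][0]["indicator"])
```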
point of emphasis, we have an open
source project that shows how that
pipeline is built. And this project
actually also has a UI, though I want to caution people. What this demo does
is it says, I’m going to go to country X.
Do I need an immunization? This is not a great problem
for machine learning. I freely admit that. And the reason is
because if you’re going to go to country
X, you would simply consult your relevant
authority and get your appropriate immunizations. You don’t really
want to predict that. But the underlying
pipeline is the same. So we wanted to pick a
topic where it was clear that the pipeline is feasible
to show you how the pieces fit together, even if the
problem space– should you or should you not
get an immunization– is relatively trivial. That pipeline is consistent
across all the implementations of the solution that we’ve seen. Also for imaging
data, I have focused on FHIR and clinical data for
the purpose of this discussion. But there’s a Codelab
also available on GitHub for running this exact
same flow, except using imaging data. So again, whether you’re using
streaming clinical data or data from an EHR, you have
a consistent system and architecture to apply
that’s able to generate these sensitive and
specific predictions. And with that, I
went to hand it over to David who’s going to
talk about this in practice. DAVID BURDICK: Fantastic. Thanks, Kurt. Hi. My name is David Burdick. And I’m the CEO of
Stratus Medicine. At Stratus Medicine, we
partner with health systems to help them accelerate
their innovation programs. We leverage technology
and data science to really make impact
in health care. And I’m going to
talk to you today about a case study where
we’ve actually applied these health care API tools. So like I mentioned, we apply
technology and data science to health care. And we’ve worked with
both internal innovation teams and external vendors. And we’re leveraging medical
records and genomics, images, and machine learning
methods to really impact clinical care, clinical
decision support, health system administration. And what we’ve
seen frequently is that innovators in this
space don’t have access to the tools that would really
accelerate their workflows. It’s oftentimes relative
to data security, data use, and privacy, which are
all very important things. So how can we help health
care IT move to the Cloud? Right now they’re in this
world of virtual machines. They’ve built a nice
security workflow. It works for them. The boundaries are well-defined. But they can’t incrementally
move towards infrastructure as a service or
platform as a service where there’s a lot of value,
because ad hoc use is risky. You’re not going to turn
your developers loose with an API key. And in fact, when you do, you
get things like the data breach at UW Medicine this spring,
which I was involved in– well, my name was involved in– where somebody just put a bunch
of– about a million patients’ worth of data into an unsecured
cloud bucket and Google indexed it. So it’s expensive for
IT to productize what the public clouds have created. And it’s hard for them
to support new trends like serverless or Kubernetes. And so our viewpoint
is that you really want health care to jump
from the VM, Gen 2 computing world to a software as a
service product that wraps up the public cloud
offerings into kind of an opinionated workflow
that’s specifically tailored to health care
and health care application deployment. So now, what we can
help health systems do is create a secure workflow
for application development and deployment, enforce
security best practices, do active breach prevention. And because a lot
of health systems don’t want to hire data
scientists and software engineers in-house, we can
provide professional services for that. So if we stand on the
shoulders of the public clouds with their infrastructure
and platform as a service, and we narrow that down to
a specific SaaS offering, then we can focus
on applications that provide real clinical value. And we want the
teams building those to be able to iterate
quickly and not do it with specialized skills. So let’s talk more about
how we can do innovation with these modern
tools, and specifically, clinical decision support. So for the last
five years, we have been a strategic partner of
the Sun Yat-Sen Cancer Center in Taiwan. They’re one of the premiere
cancer centers in Asia. They have a 325 bed hospital
that provides both inpatient and outpatient oncology care. And what’s really
interesting is they have been operating profitable
bundled payment for over 18 years. These are five year
long episodes of care. And in doing that,
their outcomes are as good or better
than top US institutions. In doing that, they’ve seen
what the keys to success are in the value
based care world. They need to leverage
data appropriately, have really good care
management, and effective use of their resources both in terms
of what treatments they use and the use of their staff. So the case study that
we’re going to talk about is distant metastasis
in breast cancer. So distant metastasis
is when a breast cancer spreads to other organs. And the typical
progression goes like this. You have the initial
adjuvant treatment. So that is your
first line treatment. And then, in some patients,
unfortunately, the cancer spreads to other organs. There’s a regimented follow
up, so every three months initially, then six months. But unfortunately,
there is this period, there’s this kind of problem
here where the cancer spreads. But the patient might
come into the clinic before a regularly scheduled
follow up with some symptoms. And are those symptoms
related to a recurrence? Or are they just pneumonia
or some other comorbidity? So it’s up to the physician’s
discretion to say, well, should I use advanced imaging? Right? That’s expensive and invasive
imaging, maybe a chest X-ray, and determine is this
related to their breast cancer that they had four years
ago, or is this just something else? So how do we help physicians
make that decision? And how do we standardize that
across the different medical oncologists in the group? So the goal here
is to create kind of the earliest identification
of recurrence in follow up. And we’ll create a
model that stratifies the risk of recurrence for
each individual patient, with the result
being that we have effective and efficient
use of advanced imaging, and we can reduce by
three to six months the time from the
actual cancer recurrence to start of
recurrence treatment. So generically, the
process looks like this. We have a DICOM
store for images. We leverage FHIR for
modeling the medical record. And we use registries
for curated parts of the medical record. We take these
three data sources, feed them into a machine
learning framework, and bring actionable
insight to the clinician. Let’s focus a little bit
on FHIR and registries. So like I said, we’re going to
model the entire medical record in FHIR here. And then we’ll
use the registries that are extracted from
that medical record for a specific clinical area. So in breast cancer,
this would be what are the patient demographics? What is the pathologic
staging for that tumor? What was the initial
adjuvant treatment? And then what is their
stage of recurrence? Have they had a local
regional recurrence? Have they had a
distant metastasis? These registries are super
valuable to the Sun Yat-Sen. They use them for
retrospective analysis in things like SAS and Tableau. They use them for
training models. And they use them for billing. Because when you’re in
a value-based care reimbursement program, you need to
use these deep pieces of the medical record
in your billing. So what is FHIR? Probably most people
are familiar with it. But if you’re not, it
is an HL7 specification that both models medical records
data and also provides an API. It’s a directed
graph of resources. And these resources are things
like patient, practitioner, diagnostic report,
medication prescription. And you know FHIR means a lot
of things to different people. It’s a young spec,
less than a decade old. And it was originally built for
health information exchange. So some people view it as
health system to health system. A lot of people with Apple’s
entrance to the market view it as a direct
to consumer model. But I think that it has
real value for data science. Because it can really help
us standardize our data engineering workflows. It’s a well-documented
specification where canonical data is stored
in a really consistent location throughout the spec. So for example, here’s
a procedure resource. And the procedure resource
has a subject key, which is who the procedure
was performed on. It also has a key
for the code, which is the identification
of the procedure. And we would like
it to be a code from the SNOMED CT ontology that describes that. And there’s just one of them. And it’s a CodeableConcept. The nice thing about that
is that we don’t need a data analyst to work hand-in-hand
with our programmers to understand the data
to build applications on top of the medical record. One organization that we’ve
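To make that concrete, here is a minimal sketch (with invented values) of pulling those two canonical fields out of a Procedure resource as plain JSON:

```python
import json

# A pared-down FHIR Procedure resource. The values here are hypothetical;
# real resources carry many more fields, but the spec puts subject and
# code in these same locations every time.
procedure_json = """
{
  "resourceType": "Procedure",
  "subject": {"reference": "Patient/example-123"},
  "code": {
    "coding": [{
      "system": "http://snomed.info/sct",
      "code": "392021009",
      "display": "Example procedure"
    }]
  }
}
"""

procedure = json.loads(procedure_json)

# subject: who the procedure was performed on (a reference to a Patient).
subject_ref = procedure["subject"]["reference"]

# code: a CodeableConcept -- one concept, possibly several codings;
# here we pick out the SNOMED CT coding by its system URL.
snomed = next(c for c in procedure["code"]["coding"]
              if c["system"] == "http://snomed.info/sct")

print(subject_ref)     # Patient/example-123
print(snomed["code"])  # 392021009
```

Because the spec pins these locations down, a developer can write this lookup without first interviewing a data analyst about the schema.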
worked with had a medical record with a
very large Oracle schema. All the table names were eight
characters long, in all caps, of course. The knowledge
built into that schema to get the data out we needed
to create clinical applications was very, very deep. And you don’t want
that to impede every one of your
developers’ experience in accessing data to
build applications. Another great use of FHIR
is to normalize variability from data sources. So whether you’re combining
legacy data sources, or you’re combining the data
warehouse with a real time store, you can use
FHIR and its model to just normalize the data
input and have a consistent data engineering workflow. Like I mentioned, creating
structured registries from FHIR is really valuable. Of course, in the
health care APIs, you can dump
directly to BigQuery. But as most people working in
the field know, a lot of value is in unstructured data
in medical records. And so working with
health systems, we’ve found that
there’s a lot of value in writing very small targeted
programs that leverage natural language processing or
parsing, multiple ways where you can just write these
ad hoc programs to create highly structured and
really valuable data sets. So now that I’ve told
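One of those small targeted programs might look roughly like this; the note text, field names, and pattern are invented for illustration, and real notes would need much more careful handling:

```python
import re

def extract_staging(note_text: str) -> dict:
    """Pull a pathologic TNM stage out of a free-text pathology note.

    This is a sketch. Production parsing would handle per-lab templates,
    negation, and fall back to human review when the pattern is unsure.
    """
    match = re.search(
        r"\bp?(T[0-4][a-c]?)\s*,?\s*p?(N[0-3][a-c]?)\s*,?\s*p?(M[01])\b",
        note_text, re.IGNORECASE)
    if not match:
        return {}  # no staging found: emit nothing rather than guess
    t, n, m = (g.upper() for g in match.groups())
    return {"t_stage": t, "n_stage": n, "m_stage": m}

note = "Final diagnosis: invasive ductal carcinoma, pT2 pN1 pM0, ER positive."
row = extract_staging(note)
print(row)  # {'t_stage': 'T2', 'n_stage': 'N1', 'm_stage': 'M0'}
```

A handful of programs like this, each targeting one registry field, is how the highly structured data sets get built.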
you why FHIR’s so great, let me tell you why a scalable
FHIR implementation is difficult. The specification
is really a deeply nested data structure. And when you try and
model this relationally, you end up with hundreds
of tables and thousands of foreign keys. And if you try and lay
an ORM on top of this, you can quickly overrun
the database’s capacity to do select queries,
because of a lot of that deep back linking. Also, each
reference between resources in this graph of medical record
models is essentially a generic foreign key. And so you have a ton
of generic foreign keys, which, again, are
really hard to deal with at a scalable
implementation level. So we were actually talking
about building a FHIR server on top of Spanner when I came
across the Google Healthcare APIs. And what’s great about
this is that they already did it for us. It’s a standards-compliant
HL7 implementation. And so essentially
we can just drop this into any of our previous
FHIR implementations. We don’t have to take
something off the shelf, manage the index size,
manage the database size. It’s fully managed
and high scalability. And this is just
an example of how I think that platform as
a service for health care can really accelerate things. Right? We’re not doing
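As a rough sketch of what addressing that managed FHIR store looks like over REST (the project, dataset, and store names here are hypothetical, and the exact path and auth details should be checked against current Cloud Healthcare API documentation):

```python
# Sketch of how the managed FHIR store is addressed over REST.
# Path shape follows the Cloud Healthcare API layout; verify the API
# version and the OAuth setup against current docs before relying on it.
BASE = "https://healthcare.googleapis.com/v1"

def fhir_store_url(project: str, location: str, dataset: str, store: str) -> str:
    return (f"{BASE}/projects/{project}/locations/{location}"
            f"/datasets/{dataset}/fhirStores/{store}/fhir")

# Hypothetical names for illustration.
url = fhir_store_url("my-project", "us-central1", "oncology", "ehr")

# Standard FHIR interactions then hang off that base, e.g. a patient
# search by MRN (the identifier system is also invented here):
search = f"{url}/Patient?identifier=urn:example:mrn|12345"
print(search)
```

The point is that this is the whole integration surface: no index sizing, no database administration, just FHIR over HTTPS.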
the nuts and bolts of building our infrastructure. If we step back and look
at imaging in health care, I think imaging
is interesting because it,
along with genomics, is one of the few
high-dimensional data sources in health care. So that’s where we can
really extract insight with machine learning. So of course, we can label our
images and build a classifier. In this case, it would
be in mammography. And we can take
TensorFlow off the shelf. We can train a model. And we can spin up
a TensorFlow server, or use ML engine
to serve our model. And off we go. But the problem is that data
sets in health care are small. There’s not a lot of health
information exchange. There are no
national aggregations that are widely accessible. And so transfer learning is
really key in health care. If you’re not familiar with
what transfer learning is, especially in the
context of images, it’s where you kind of
stand on the shoulders of a previous convolutional
neural network. And then you tack your
model onto the end. And so what’s encoded
over there, say, like, Inception
V3 from ImageNet, is what are the features
of an image generically? Can we take millions
or billions of images? And do line detection,
feature detection? And then we’ll just tack our
model onto the end of it. Again, we’ll sic our huge
team of data scientists inside our health system onto
that model and then deploy it. But as you guys
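As a toy illustration of transfer learning, with a fixed function standing in for the frozen pretrained backbone (this is not a real network, just the shape of the idea: freeze the feature extractor, train only a small head on your small data set):

```python
# Toy transfer learning: "backbone" stands in for a frozen pretrained
# network like Inception V3; we train only a tiny linear head on top.
def backbone(x):
    # Frozen feature extractor: never updated during our training.
    return [x[0] + x[1], x[0] - x[1], 1.0]  # last entry acts as a bias

def train_head(examples, epochs=20, lr=0.1):
    # Perceptron-style updates on the head weights only.
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, label in examples:  # label is +1 or -1
            feats = backbone(x)
            pred = 1 if sum(wi * f for wi, f in zip(w, feats)) > 0 else -1
            if pred != label:
                w = [wi + lr * label * f for wi, f in zip(w, feats)]
    return w

def predict(w, x):
    feats = backbone(x)
    return 1 if sum(wi * f for wi, f in zip(w, feats)) > 0 else -1

# A tiny "small data set" -- the whole reason transfer learning matters.
data = [((1.0, 1.0), 1), ((2.0, 1.5), 1),
        ((-1.0, -1.0), -1), ((-2.0, -0.5), -1)]
w = train_head(data)
print([predict(w, x) for x, _ in data])  # [1, 1, -1, -1]
```

Only the three head weights get trained; the millions of backbone parameters (here, one hardcoded function) stay exactly as the original training left them.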
probably guessed, there’s not a huge team of
data scientists at every health institution. So how do we do this quickly? How do we iterate
and leverage machine learning on this high
dimensional data? For us, we found a lot of
value in AutoML Vision. So AutoML Vision automates
that whole process for you. You don’t need specialized
TensorFlow knowledge to be able to train a
model on your DICOM images. And the DICOM API also
does a nice thing, which is transcode your DICOM
images into JPEGs, which you need to do for training. And then, once it trains
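A rough sketch of requesting that JPEG rendering over DICOMweb (the store path and UIDs below are hypothetical, and the rendered-retrieval details should be verified against current documentation):

```python
# DICOMweb's "rendered" retrieval is how an instance comes back as a
# consumer format like JPEG instead of raw DICOM. The path shape follows
# the DICOMweb standard; the project, store, and UIDs are hypothetical.
BASE = ("https://healthcare.googleapis.com/v1/projects/my-project"
        "/locations/us-central1/datasets/oncology/dicomStores/mammo/dicomWeb")

def rendered_url(study_uid: str, series_uid: str, instance_uid: str) -> str:
    return (f"{BASE}/studies/{study_uid}/series/{series_uid}"
            f"/instances/{instance_uid}/rendered")

url = rendered_url("1.2.3", "1.2.3.4", "1.2.3.4.5")
headers = {"Accept": "image/jpeg"}  # ask the server to transcode to JPEG
print(url)
```

The Accept header is what does the work: the store, not your pipeline, handles the DICOM-to-JPEG conversion the training step needs.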
and you find the right level of sensitivity and
specificity, it automatically deploys your model for you. So now that we have
our tools together, we’re going to use the
DICOM API for images, the FHIR API for a FHIR store. We use Cloud SQL
for our registries (you could also use
BigQuery), and AutoML Vision. Let’s talk about how we’re
actually building this at the architectural level. So this is work that’s in
progress at the Sun Yat-Sen now. And if you’ve ever built
an ETL on health care data, you know that it’s
a real challenge. You want it to be very robust. You don’t want to
have missing data. And so how you do that can
be a little bit tricky. Our opinion is that you want
to write once and succeed. You don’t want to
have a chatty API. You really want the rate limiter
to be your internal system. And in this diagram, I’m
going to use blue arrows to denote a Pub/Sub asynchronous
message queue just for clarity. So let’s say we write a FHIR
resource into Cloud Storage. It kicks off a message
into a Pub/Sub. And the nice thing about
asynchronous message queues is that you’ll get
durable delivery, right? It’s at least once delivery. But at least once means that
it’s maybe twice or maybe three times. So we’ll leverage Cloud
Dataflow, which will give us exactly once delivery. And that’s where we’ll write
our deduplicating application. We’ll leverage the FHIR
spec’s use of identifier to query the FHIR API, for
example, for this patient. And say, hey, do you have
this patient with this MRN? If not, create it. If so, update it. And similar for the DICOM store. From there, when we create that
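That create-or-update decision is an idempotent upsert keyed on the identifier. Here is a minimal in-memory sketch of the logic; a real implementation would search the FHIR API by identifier and issue the create or update accordingly:

```python
# Idempotent upsert keyed on MRN. Querying before writing makes redelivered
# Pub/Sub messages harmless: replaying the same write converges to one record.
# The dict is an in-memory stand-in for the FHIR store; a real version
# would search the API (e.g. Patient?identifier=...) and POST or PUT.
store = {}  # mrn -> patient resource

def upsert_patient(resource: dict) -> str:
    mrn = resource["identifier"][0]["value"]
    if mrn not in store:
        store[mrn] = resource       # not found: create it
        return "created"
    store[mrn].update(resource)     # found: update it
    return "updated"

patient = {"resourceType": "Patient",
           "identifier": [{"system": "urn:example:mrn", "value": "12345"}],
           "name": [{"family": "Doe"}]}

first = upsert_patient(patient)
second = upsert_patient(patient)  # at-least-once redelivery stays safe
print(first, second)  # created updated
```

Combined with Dataflow's exactly-once processing, this is what "write once and succeed" looks like in practice.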
resource in the FHIR store, that kicks off an
asynchronous message that Cloud Function consumes
and updates the registry. And I can’t emphasize enough
how important having a living registry is that’s
constantly being updated. That is a huge value
for a health system, both for retrospective
analytics and also for machine learning. So now that we have all of our
medical record and our images in, we can use the FHIR
store and the registries to create training, test,
and validation data sets, hand that bulk over
to AutoML Vision, and have it train the model. So now we’re all ready to go. Let’s say a new FHIR resource
comes in through the API, or in this case, a new image
lands in the DICOM store. Again, we get an
asynchronous message that is consumed by a Cloud
Function that will actually run our model and deliver it. So we use the Cloud Function
to look into the registry and select appropriate
models for this patient and for the image type. We then run that model
in AutoML Vision. And we use an asynchronous
queue to deliver it back to the health system’s EHR, where it’s surfaced clinically
in the workflow. And what we have built here is
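The selection step in that Cloud Function is mostly a lookup. A minimal sketch, with invented registry contents and model names:

```python
from typing import Optional

# Sketch of the Cloud Function's model-selection step. The registry rows
# and model IDs below are invented for illustration.
registry = {
    "patient-123": {"cancer_type": "breast", "stage": "II"},
}

models = {
    ("breast", "chest-xray"): "recurrence-cxr-v3",
    ("breast", "mammogram"): "recurrence-mammo-v1",
}

def select_model(patient_id: str, image_type: str) -> Optional[str]:
    # Look the patient up in the living registry, then pick the model
    # registered for that clinical context and image type.
    row = registry.get(patient_id)
    if row is None:
        return None  # no registry entry: nothing to score
    return models.get((row["cancer_type"], image_type))

model_id = select_model("patient-123", "mammogram")
print(model_id)  # recurrence-mammo-v1
# A real function would then run this model (e.g. in AutoML Vision) and
# publish the prediction back toward the EHR via the async queue.
```

Keeping the registry live is what makes this lookup trustworthy at prediction time.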
an entirely serverless system that is easy for us
to iterate and create new clinical decision
support products on. And we just have
to essentially do that upfront investment in ETL. So what we found is that the
key to successful innovation is that you need to have a
SaaS cloud workflow, right? You can’t do
everything yourself. But if you can
make a secure workflow that allows you to access these cloud
tools, all of a sudden now, number two, you can move
fast with these platform as a service things that
I’ve been talking about. You’re not
reinventing the wheel. You’re not running a database. You’re not taking a FHIR
store off the shelf. And so now that you can
quickly create health care specific applications, you’re
able to leverage your expertise in house. The hard work is to
empower individuals in your organization to
solve clinical problems, and, without specialized
skills, to really improve
the patient experience and health system operations. [MUSIC PLAYING]