Thanks for being here for Lecture 5 of CS230. Today we have the chance to host a guest speaker, Pranav Rajpurkar, who is a PhD student in computer science advised by Professor Andrew Ng and Professor Percy Liang. Pranav is working on AI for high-impact projects, specifically related to healthcare and natural language processing. Today he's going to present an overview of AI for healthcare and dig into some projects he has led, through case studies. So don't hesitate to interact; I think we have a lot to learn from Pranav, he's really an expert in AI for healthcare, and I'll leave you the mic, Pranav. Thanks for being here.

Thanks, Kian. Thanks for inviting me. Can you hear me at the back, is the mic on? All right, fantastic. Really glad to be here. I want to cover three things today. The first is to give you a broad overview of what AI applications in healthcare look like. The second is to bring you three case studies from the lab that I'm in, as demonstrations of AI and healthcare research. And finally, some ways you can get involved if you're interested in applying AI to high-impact problems in healthcare, or if you're from a healthcare background as well.

Let's start with the first. One way we can decompose the kinds of things AI can do in healthcare is by formulating levels of questions we can ask from data. At the lowest level are descriptive questions: here we're really trying to get at what happened. Then there are diagnostic questions, where we're asking why it happened: if a patient had chest pain and I took their X-ray, what does that chest X-ray show? If they have palpitations, what does their ECG show? Then there are predictive problems, where I care about the future: what's going to happen in the next six months? And at the highest level are prescriptive problems. Here I'm asking: I know this is the patient, these are the symptoms they're coming in with, this is what their trajectory will look like in terms of things they're at risk of, so what should I do? That's the real action point, and I'd say it's the gold mine, but getting there requires a lot of data and a lot of steps. We'll talk a little more about that.

In CS230 you're all well aware of the paradigm shift of deep learning. If we look at machine learning in the healthcare literature, we see a very similar pattern: we had a feature-extraction engineer who was responsible for getting from the input to a set of features that a classifier could understand, and the deep learning paradigm combines feature extraction and classification into one step by automatically extracting features, which is cool. Here's what I think will be the next paradigm shift for AI in healthcare, but also more generally. We still have a deep learning engineer up here (that's you, that's me) designing the networks, making decisions like "a convolutional neural network is the best architecture for this problem," or this specific type of architecture; there's an RNN and a CNN and whatever NN you can throw on there. But what if we could replace the ML engineer as well?
And I find this quite funny, because a question I get asked a lot in AI for healthcare is: are we going to replace doctors with all these AI solutions? Nobody realizes that we might replace machine-learning engineers faster than we replace doctors, if this is to be the case. A lot of research is developing algorithms that can automatically learn architectures, some of which you might go through in this class.

Great, so that's the general overview. Now I want to talk about three case studies from the lab of AI being applied to different problems. Because healthcare is so broad, I thought I'd focus on one narrow vertical and go deep on that: medical imaging. I've chosen three problems; the first is a 1D problem, the second is a 2D problem, and the third is a 3D problem, so we can walk through all the different kinds of data here.

This is some work that was done early last year in the lab, where we showed that we were able to detect arrhythmias at the level of cardiologists. Arrhythmias are an important problem affecting millions of people; this has especially come to light recently with devices like the Apple Watch, which now has ECG monitoring. The thing about arrhythmias is that sometimes you have symptoms and know you have them, but other times you may have no symptoms and still have arrhythmias that could be caught and addressed if you were to do an ECG. An ECG test basically shows the heart's electrical activity over time. Electrodes are attached to the skin, it's a safe test, and it takes a few minutes; this is what it looks like when you're hooked up to all the different electrodes. This test is often done for a few minutes in the hospital, and the finding is basically that in a few minutes you can't really capture a person's abnormal heart rhythms. So let's send them home for 24 to 48 hours with a Holter monitor and see what we can find. There are more recent devices, such as the Zio patch, which let patients be monitored for up to two weeks, and it's quite convenient: you can use it in the shower or while you're sleeping, so you really can capture a lot of what's happening in the heart's ECG activity.

But if we look at the amount of data generated in two weeks, it's 1.6 million heartbeats. That's a lot, and there are very few doctors who'd be willing to go through two weeks of ECG reading for each of their patients. This really motivates why we need automated interpretation here. But automated detection comes with its challenges. One of them is that in the hospital you have several electrodes, and in more recent devices we have just one. The way to think about several electrodes is that the electrical activity of the heart is 3D, and each electrode gives a different 2D perspective on that 3D activity; now that we have only one lead, we have only one of these perspectives available. The second challenge is that the differences between heart rhythms are very subtle. This is what a cardiac cycle looks like, and when we're looking at arrhythmias, or abnormal heart rhythms, one looks at the substructures within the cycle and at the structure between cycles as well.
And the differences are quite subtle. So when we started working on this problem; actually, maybe I should share this story. We started working on this problem, and it was me, my collaborator Awni, and Professor Ng. One of the things he mentioned we should do was: "Let's just go out and read ECG books and do the exercises." If you're in med school, there are books where you can learn ECG interpretation, with several exercises you can do to test yourself. So I went to the med school library (you know, they have those hand-cranked shelves, so you have to move them) and grabbed my books. Over two weeks we went through two books and learned ECG interpretation, and it was pretty challenging.

If we look at the previous literature on this, it was drawing on domain knowledge: we're looking at waves, so how can we extract the specific features from the waves that doctors are also looking at? There was a lot of feature engineering going on. If you're familiar with wavelet transforms, they were the most common approach, with lots of different mother wavelets, pre-processing, band-pass filters; everything you can imagine doing with signals was done. Then you fed it into your SVM and called it a day.

With deep learning, we can change things up a bit. On the left we have an ECG signal, and on the right are three heart rhythms; we'll call them A, B, and C. We're going to learn a mapping that goes straight from the input to the output. Here's how we break it up: we say that every label covers the same amount of the signal, so if we had four labels, the ECG would be split into four segments, and each rhythm labels its part. Then we use a deep neural network: we built a 1D convolutional neural network which runs over the time dimension of the input, because remember, we're getting one scalar value over time, and this architecture is 34 layers deep.

So I thought I'd talk a little bit about the architecture. Have you seen ResNets before? Should I go into this? I think they're going to learn about it next week. Okay, cool. Here's my one-minute spiel on ResNets, then. As you go deeper in terms of the number of layers in a network, you should be able to represent a larger set of functions. But when we look at the training error for these very deep networks, we find that it's worse than for a smaller network. Note that this is the training error, not the validation error: that means even with the ability to represent a more complex function, we aren't able to fit the training data. So the motivating idea of residual networks is to say, "Let's add shortcuts within the network so as to minimize the distance from the error signal to each of my layers." This is just the math saying the same thing. Further work on ResNets asked: we have the shortcut connection, so how should we make information flow through it best? And the finding was basically about what you add onto the shortcut, the highway; think of these as stop signs or signals on a highway.
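For the notes, here is a rough sketch of the kind of pre-activation 1D residual block being described, with an identity shortcut that the convolutional branch is added back onto. The channel count, kernel size, and dropout rate below are illustrative placeholders, not the actual 34-layer network.

```python
import torch
import torch.nn as nn

class ResidualBlock1D(nn.Module):
    """Pre-activation 1D residual block: BN -> ReLU -> Conv, twice, plus an identity shortcut."""
    def __init__(self, channels=64, kernel_size=15, dropout=0.2):
        super().__init__()
        pad = kernel_size // 2                     # 'same' padding for an odd kernel
        self.bn1 = nn.BatchNorm1d(channels)
        self.conv1 = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        self.bn2 = nn.BatchNorm1d(channels)
        self.dropout = nn.Dropout(dropout)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        self.relu = nn.ReLU()

    def forward(self, x):
        shortcut = x                               # the "highway": nothing on it but the addition below
        out = self.conv1(self.relu(self.bn1(x)))
        out = self.conv2(self.dropout(self.relu(self.bn2(out))))
        return out + shortcut                      # merge the residual branch back in

# Toy input: a batch of 32 single-lead ECG windows, 256 samples each, already lifted to 64 channels.
x = torch.randn(32, 64, 256)
print(ResidualBlock1D()(x).shape)                  # torch.Size([32, 64, 256])
```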
And it basically says that the fastest way on the highway is to have nothing on it but the addition. Then there were a few advancements on top of that, like adding dropout and increasing the number of filters in the convolutional neural network, which we also added to this network.

Okay, so that's the convolutional neural network. Let's talk a little bit about data. One thing that was cool about this project was that we got to partner with a start-up that manufactures these hardware patches, and we got data from patients who were wearing these patches for up to two weeks. This was from around 30,000 patients, and it's 600 times bigger than the largest dataset that was out there before. Each of these ECG signals is annotated by a clinical ECG expert who says, "Here's where rhythm A starts and here's where it ends," and marks the whole ECG that way. Obviously very time-intensive, but a good data source. Then we had a test set as well, and here we used a committee of cardiologists: they'd get together, sit in a room, and when they disagreed on a specific point, discuss which of them was right, or what the rhythm actually was, so they arrive at a ground truth after discussion. And then we can, of course, test cardiologists as well. The way we do this is we have them do it individually: not the same set that did the ground truth, but a different set of cardiologists coming in one at a time, and we say, you tell me what's going on here and we're going to test you.

When we compared the performance of our algorithm to cardiologists, we found that we were able to surpass them on the F1 metric, which combines precision and recall. When we looked at where the mistakes were made, we can see that the biggest confusion was between two rhythms which look very, very similar but actually don't differ in treatment. Here's another case where the model is not making a mistake that the experts are making, and it turns out this is a costly mistake: what experts thought was a benign heart rhythm was actually a pretty serious one. So that's one beauty of automation: we're able to catch these misdiagnoses. Here are three heart blocks, which are clinically relevant to catch, on which the model outperforms the experts; and on atrial fibrillation, which is probably the most common serious arrhythmia, the same holds.

One of the things that's neat about this application, and a lot of applications in healthcare, is that what automation with deep learning and machine learning enables is for us to continuously monitor patients. This is not something we've been able to do before, so a lot of even the science of understanding patients' risk factors, what they are or how they change, hasn't been done before, and this is an exciting opportunity to advance science as well. The Apple Watch has recently released its ECG monitoring, and it'll be exciting to see what new things we can find out about the health of our hearts from these inventions.

Okay, so that was our first one. Yes, a question? "How big of a problem did you find data privacy and confidentiality to be with this dataset, which is awesome, by the way?" Yeah.
So to repeat the question: how difficult was it to deal with data privacy and keep patients' information private? In this case we had completely de-identified data; it was just someone's ECG signal without any extra information about their clinical records or anything like that. So it was thoroughly de-identified. "Sorry, I guess I have to ask: how did you get approval for that, and were there problems in getting approval? Because, you know, there are a lot of concerns. Did you have to get it signed off by some credible authority, or what were the obstacles?" Oh, sure, and I think we can take this question offline as well, but one of the beauties of working at Stanford is that there are a lot of industry research collaborations, and we have great infrastructure to work with them.

Which brings me to my second case study. Sorry, yeah, go for it. [inaudible] "So how do you actually find that?" That's a good question. Just to repeat it: how did we define the gold standard when we have experts setting the gold standard? Here's how we did it. One way to come up with a gold standard is to ask what a consensus would say. So we got three cardiologists in a room to set the gold standard, and then, to measure the performance of experts, we used individuals separate from that group of cardiologists, who sat in another room and said what they thought of the ECG signals. So there can be disagreement with the gold standard set by the committee.

Great. So here we looked at how we can detect pneumonia from chest X-rays. Pneumonia is an infection that affects millions in the US; its big global burden is actually in kids, so that's where it's really useful to be able to detect it automatically and well. To detect pneumonia, there's a chest X-ray exam. Chest X-rays are the most common imaging procedure, with two billion chest X-rays done per year. The way abnormalities are detected in chest X-rays is that they present as areas of increased density: where things should appear dark, they appear brighter, or vice versa. Here's what pneumonia characteristically looks like, a fluffy cloud. But this is an oversimplification, of course, because pneumonia is when the alveoli fill up with pus, and the alveoli can fill up with a lot of other things as well, which lead to very different interpretations, diagnoses, and treatments for the patient. So it's quite confusing, which is why radiologists train for years to be able to do this.

The setup is: we take an input image of someone's chest X-ray and output a binary label, 0 or 1, indicating the presence or absence of pneumonia. Here we use a 2D convolutional neural network which is pre-trained on ImageNet. We looked at shortcut connections earlier, and DenseNets had the idea of taking shortcut connections to the extreme: what happens if we connect every layer to every other layer, instead of having just one shortcut, which is what ResNet had? A small sketch of that connectivity pattern follows.
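As a side note, here is a toy sketch of that dense connectivity pattern, in which each layer's input is the concatenation of all previous feature maps. The growth rate and depth are made up and stand in for the much larger DenseNet used in this work.

```python
import torch
import torch.nn as nn

class DenseBlock2D(nn.Module):
    """Toy dense block: every layer sees the concatenation of all previous feature maps."""
    def __init__(self, in_channels, growth_rate=12, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(),
                nn.Conv2d(channels, growth_rate, kernel_size=3, padding=1),
            ))
            channels += growth_rate            # the next layer also sees this layer's output

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))   # concatenate, don't add
            features.append(out)
        return torch.cat(features, dim=1)

x = torch.randn(2, 16, 32, 32)                 # (batch, channels, H, W)
block = DenseBlock2D(in_channels=16)
print(block(x).shape)                          # torch.Size([2, 64, 32, 32]) = 16 + 4 * 12 channels
```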
DenseNet beat the previous state of the art, and it generally has lower error and fewer parameters on the ImageNet challenge, so that's what we used. For the dataset: when we started working on this project, around October of last year, there was a large dataset released by the NIH of 100,000 chest X-rays. This was the largest public dataset at the time, and each X-ray is annotated with up to 14 different pathologies. The way this annotation works is that there is an NLP system which reads a report and then outputs, for each of several pathologies, whether there is a mention or a negation (like "not pneumonia," for instance), and then annotates accordingly. And for our test set, we had four radiologists here at Stanford independently annotate and tell us what they thought was going on in those X-rays.

One of the questions that comes up often in medical imaging is: we have a model, we have several experts, but we don't really have a ground truth. We don't have a ground truth for several reasons; one of them is simply that it's difficult to tell whether someone had pneumonia or not without additional information, like their clinical record, or even whether they got better once you gave them antibiotics. So really, one way to evaluate whether a model is better than a radiologist, or doing as well as a radiologist, is by asking: do they agree with other experts to a similar degree? That's the idea we used here. We say: let's have one of the radiologists be the prediction model we're evaluating, and set another radiologist to be the ground truth. We compute the F1 score once, then change the ground truth and do it a second time, change it again for a third, and then also use the model as the ground truth and do it again. And we use the same symmetric evaluation scheme for the model, but this time the model is evaluated against each of the four experts. We do that, and we get a score for all of the experts and for the model. And we showed in our work that we were able to do better than the average radiologist at this task. Two ways to extend this in the future are to look at patient history as well, and to look at lateral radiographs, to improve upon this diagnosis. At the time we released our work, we were able to outperform the previous state of the art on all 14 pathologies.

Okay. So, model interpretation. Model interpretation. Yes, there's a question. "Going back to that slide you had on future work; one more slide, please, yeah. If you have pneumonia and you present to the doctor, you have a fever, you're coughing, your ribs hurt from coughing too much, you can't sleep: all of those issues are not included in the model. So my question is, if you go to a dataset and you're trying to determine whether this person has pneumonia or not, that's one thing, but it's not just that you don't have that data; you're also not looking at other images. Does that person have cancer, or other infections of the body, because they feel cold [inaudible]? All those are images that you're not really looking at. So let's say, in a tough situation; the obvious situation isn't really the hard part, right?"
"But in a tough situation, you get a patient who has a fever and is coughing violently, and you don't know if it's cancer or pneumonia or some other lung disease. How do you get your algorithm to work in that condition? And if you're not including all those other cases, then what's the use of it? You know what I'm saying?" "Yeah. Well, to keep this technical, since it's a technical class: is there a neural network architecture that you would use to solve problem number one? Is it multi-task learning? Is it..." Sure, sure. Okay, let me try to boil that set of questions down. One is: patients are coming in, and we're not getting access to their clinical histories, so how are we able to make this determination at all? One answer is that when we're training the algorithm, we're training it on pathologies extracted from radiology reports, and these radiology reports are written with an understanding of the full clinical history and of the symptoms the patient presented with. So we're training the model on these radiology reports, which had access to more information. The second is that the utility of this is not so much in comparing a patient's X-rays day to day, as in: there is a new patient with a set of symptoms, and can we identify things from their chest X-ray?

Which brings us to model interpretation. So if you were an end user of the model... actually, back when I was in undergrad and in a lab working on autonomous cars, I thought about this a lot. How many of you have been in an autonomous car? How many of you would trust being in an autonomous car? [LAUGHTER] All right, cool. I thought about this as well: would I trust being in an autonomous car? And I thought it would be pretty sweet if the algorithm in the car would tell me, in advance, whatever decision it was going to make (I know that's not possible at high speeds), so that, just in case I disagreed with a particular decision, I could say "no, abort" and have the model remake its decision. I think the same holds true in healthcare as well, though one advantage in healthcare is that rather than having to make decisions within seconds, as in the autonomous car, there is often a larger time frame, minutes or hours. And here it's useful to be able to inform the clinician treating the patient: here's what my model thought, and why.

So here's the technique we used for that: class activation maps, which you may cover in another lecture. I'll just leave it at saying that there are ways of looking at which parts of the image are most indicative of a particular pathology, to generate these heat maps. Here's a heat map generated for pneumonia: this X-ray has pneumonia, and the algorithm, in red, highlights the areas it thought were most indicative of it. Here's one in which it's able to find a collapsed right lung. Here's one in which it's able to find a small cancer.
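For reference, a minimal sketch of how a class activation map can be computed, assuming the network ends in global average pooling followed by a single linear layer; the feature-map size and class count below are placeholders.

```python
import torch
import torch.nn.functional as F

def class_activation_map(feature_maps, fc_weights, class_idx, out_size):
    """
    feature_maps: (C, h, w) activations from the last conv layer for one image
    fc_weights:   (num_classes, C) weights of the final linear layer that follows
                  global average pooling
    Returns a heat map of shape out_size highlighting regions that drive class_idx.
    """
    weights = fc_weights[class_idx]                          # (C,)
    cam = torch.einsum('c,chw->hw', weights, feature_maps)   # weighted sum of feature maps
    cam = F.relu(cam)                                        # keep positive evidence only
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8) # normalize to [0, 1]
    cam = F.interpolate(cam[None, None], size=out_size,
                        mode='bilinear', align_corners=False)
    return cam[0, 0]                                         # (H, W) heat map

# Toy example: 512 feature maps of size 7x7, 14 output pathologies.
fmaps = torch.randn(512, 7, 7)
w_fc = torch.randn(14, 512)
heatmap = class_activation_map(fmaps, w_fc, class_idx=3, out_size=(224, 224))
print(heatmap.shape)   # torch.Size([224, 224])
```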
And here the goal is to improve healthcare delivery. In the developed world, one thing this is useful for is prioritizing the workflow: making sure radiologists get to the patients most in need of care before the ones whose X-rays look more normal. But the second part, which I'm quite excited about, is increasing access to medical imaging expertise globally. Right now, the World Health Organization estimates that about two-thirds of the world's population does not have access to diagnostics. So we thought, wouldn't it be cool if we made an app that allowed users to upload X-ray images and gave back a diagnosis?

This is still in the works, so I'll show you what we've got running locally. Here I'm presented with a screen that asks me to upload an X-ray. I have several X-rays here, and I'm going to pick the one that says cardiomegaly; cardiomegaly refers to enlargement of the heart. So I've uploaded it, the model is running in the back end, and within a couple of seconds it outputs its diagnoses on the right. You'll see the 14 pathologies the model is trained on listed, with a bar next to each. At the top of this list is cardiomegaly, which is what this patient has: the heart is extending out. If I hover on cardiomegaly, I can see the probability displayed. Now, we talked about interpretation: how do I know this model is actually looking at the heart rather than at something else? If I click on it, I get the class activation map for this prediction, which shows that it is indeed focused on the heart and looking at the right thing. So I guess you could say the algorithm's heart is in the right place. [LAUGHTER]

Cool. Now, this is an image I got from the NIH dataset we were using, but it's pretty cool if an algorithm is able to generalize to populations beyond that. So I thought we could just look up an image of cardiomegaly, download it, and see if our model handles it. This one looks pretty large, so does this; I don't want an annotated one. All right, that's good. So we save it to the desktop and upload it here. It's already done its thing, and at the top is cardiomegaly once again, and there's the highlight. So it's able to generalize to populations beyond just the ones it was trained on, and I'm very excited by that. What I got even more excited by is that we're thinking of deploying this in different parts of the world, and when we got an image showing how X-rays are read in a hospital we're working with in Africa, this is what we saw. The idea that one could snap a picture, upload it, and get a diagnosis seems very powerful.

The third case study I want to take you through is looking at MR. We've talked about a 1D setup with an ECG signal and a 2D setup with an X-ray. How many of you are thinking of working on a 3D problem for your project? Okay, a few. Well, that's good. Cool. So here we looked at knee MR.
MR of the knee is the standard of care to evaluate knee disorders, and more MR examinations are performed on the knee than on any other part of the body. The question we set out to answer was: can we identify knee abnormalities, two of the most common being an ACL tear and a meniscal tear, at the level of radiologists? Now, with a 3D problem, one thing we have that we don't have in a 2D setting is the ability to look at the same thing from different angles. When radiologists make this diagnosis, they look at three views (the sagittal, the coronal, and the axial), which are three ways of looking through the 3D structure of the knee. And in an MR you get different types of series based on the magnetic fields, so here are three different series that are used. What we're going to do is output, for a particular knee MR examination, the probability that it's abnormal, the probability of an ACL tear, and the probability of a meniscal tear. The important thing to recognize is that this is not a multi-class problem, since I could have both types of tears; it's a multi-label problem.

So we're going to train a convolutional neural network for every view-pathology pair (that's nine convolutional networks) and then combine them using a logistic regression; see the sketch after this passage. Here's what each convolutional neural network looks like: I have a bunch of slices within a view, I pass each of them through a feature extractor, and I get an output probability. We had 1,400 knee MR exams from the Stanford Medical Center, and we tested on 120 of them, where the majority vote of three subspecialty radiologists established the ground truth. We found that we did pretty well on the three tasks, and the model was able to pick up the different abnormalities pretty well. And one can extend these interpretability methods to 3D inputs as well; that's what we did here.

Okay. So I saw this cartoon a few weeks ago and thought it was pretty funny: a lot of machine learning engineers think they don't need to externally validate, that is, to find out how the model works on data that didn't come from where the original dataset came from, so there's a difference in distributions. But it's really quite exciting when a model does generalize to datasets it has not seen before. So we got a public dataset from a hospital in Croatia, and here's how it was different: it was a different kind of series with different magnetic properties, a different scanner, and a different institution in a different country. We asked: what happens when we run this model off the shelf, trained on Stanford data but tested on that kind of data? We found that it did relatively well without any training at all. And when we then trained on it, we were able to outperform the previous best reported result on that dataset. So there's still work to be done in getting a network trained on my data to work on datasets from different institutions and different countries, but we're making some steps along the way; it remains a very open problem.
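Here is the promised sketch of that per-view setup: one small CNN per view-pathology pair, pooled over the slices of an exam, with the three per-view scores for each label combined by a logistic regression. The feature extractor, the max pooling over slices, and all sizes here are stand-ins rather than the actual model.

```python
import torch
import torch.nn as nn

class ViewClassifier(nn.Module):
    """One CNN per (view, pathology) pair: featurize each slice, pool over slices, output one logit."""
    def __init__(self, feature_dim=64):
        super().__init__()
        self.features = nn.Sequential(             # stand-in feature extractor per 2D slice
            nn.Conv2d(1, feature_dim, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(feature_dim, 1)

    def forward(self, slices):                      # slices: (num_slices, 1, H, W) for one exam
        f = self.features(slices).flatten(1)        # (num_slices, feature_dim)
        f = f.max(dim=0).values                     # pool across slices of the stack
        return self.fc(f)                           # one logit for this view/pathology

# Nine view-pathology models: 3 views x 3 labels (abnormal, ACL tear, meniscal tear).
views = ['sagittal', 'coronal', 'axial']
labels = ['abnormal', 'acl', 'meniscus']
models = {(v, l): ViewClassifier() for v in views for l in labels}

exam = {v: torch.randn(24, 1, 128, 128) for v in views}   # toy exam: 24 slices per view

# For each label, combine the three per-view scores with a logistic regression.
combiner = {l: nn.Linear(3, 1) for l in labels}
for l in labels:
    scores = torch.stack([models[(v, l)](exam[v]).squeeze() for v in views])
    prob = torch.sigmoid(combiner[l](scores))
    print(l, float(prob))
```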
"And is it a very [inaudible] in a hospital that has some sort of [inaudible]?" Yeah. So we did the best we could in terms of preprocessing. One of the preprocessing steps that's important is getting the mean of the input data as close as possible to the mean of the input data you trained on. That was one preprocessing step we tried, but we were trying to keep that minimal, to ask: out of the box, how would this work? If we had never seen this data before, how would it work on that population?

One big topic across a lot of applied fields is this: we talk about models working automatically, autonomously, but how would these models work when working together with experts in different fields? Here we asked that question about radiologists and imaging models: would it be possible to boost performance if the model and the radiologist work together? So that's the setup: a radiologist with the model, is that better than the radiologist by themselves? Here's how we set it up: we have experts read the same cases twice, separated by a certain number of weeks, and see how they perform on the same set of cases. And what we found is that we were able to increase performance generally, with a significant increase in specificity for ACL tears. That means if a patient came in without an ACL tear, we'd be better able to establish that.

Yes, question? "Wouldn't you bias the opinion of the radiologist, or is that the intended thing, that you want to bias their opinion so that it actually helps the patient's health?" Yeah, that's a good question, and I think automation bias captures a lot of this. Once we have models working together with experts, can we expect that the experts will take the exam less seriously (that's a big concern) and start relying on what the model says: "I won't even look at this exam, I'm just going to trust what the model says blindly"? That's absolutely possible, and it's a very open area of research. One of the ways people have tried to address it is to say: from time to time, I'm going to pass the radiologist an exam for which I've flipped the answer and I know the right one, and if they get that wrong, I'll alert them that they're relying too much on the model. But there are a lot of more sophisticated ways to address automation bias, and as far as I know it's a very open field of research, especially as we're getting into deep learning assistants. And one utility of this is to say that a certain set of patients doesn't need a follow-up, so let's not send them for unnecessary surgery.

Great. So I shared three case studies from the lab. The final thing I want to do is talk a little bit about how you can get involved if you're interested in applications of AI to healthcare. The first is the ability to just get your hands dirty with datasets and try out your own models. From our lab, we've released the MURA dataset, which is a large dataset of bone X-rays; the task is to tell whether the X-rays are normal or not, and they come from different parts of the upper body. That's what the dataset's X-rays look like.
And this is a pretty interesting setup, because you have more than one view (more than one angle) for the same body part, for the same study, for the same patient, and the goal is to combine these well in a convolutional neural network and output the probability of an abnormality. One of the interesting things here, for transfer learning as well, is: do you want to train models separately per body part, train the same model for all body parts, or combine certain models? A lot of design decisions there. And this is what some trained models look like: this is a baseline model we released that's able to identify a fracture here and a piece of hardware on the right. You can download the dataset from our website: if you Google "MURA dataset" or go to stanfordmlgroup.github.io, you should be able to find it.

The second way to get involved is through the AI for Healthcare Bootcamp, a two-quarter program our lab runs, which gives students coming out of classes like 230 an opportunity to get involved in research. Students receive training from PhD students in the lab and medical school faculty, and work on structured projects over two quarters. If you have a background in AI, which you do, you're encouraged to apply. We're working on a wide set of problems across radiology, EHR, public health, and pathology right now. This is what the lab looks like; we have a lot of fun. Applications for the bootcamp starting in the winter are now open; the early application deadline is November 23rd, and you can go to this link and apply.

So that's my time. Thank you so much, and thanks for having me, Kian. [APPLAUSE] Let me set up the microphone. Do you want to take one or two questions? Yes, I'll take a couple of questions. All right, let me ask a question about the privacy concerns and other ethics concerns. What about compensation for the medical experts that you're potentially putting out of business with a free tool like the one you're developing, or just in general? Because their knowledge is being used to train these models; it's not free. Yeah. So the question was: we're training these automated AI models with the knowledge of medical experts, and what are the ways in which we're thinking of compensating these medical experts, now or in the future when we may have automated models? I think a lot of people are thinking about these problems and working on them right now. There are a variety of approaches people are considering in terms of economic incentives, and there's a lot of fear about whether AI will actually work with or augment experts in whatever field they're working in. I don't have a silver bullet for this, but I know there's a lot of work going on there. "I just wanted to know: when you're looking through MRIs, you're looking at four or five categories of issues, like you used there, and one of them is the most likely. Is it possible that a human looking at it could point out something that was not being looked at by the AI model at that time? So how do you address that?" Yeah, that's a great question.
Just to repeat the question: we're looking at MR exams and saying that for these three pathologies we can output probabilities, so what happens if there's another pathology we haven't looked at? I have a couple of answers for that. The first is that one of the categories here was simply whether the exam was normal or abnormal, so the idea is that the abnormality class will capture a lot of different pathologies, at least the ones seen at Stanford. But it's often the case that we're building for one particular pathology, and then there's obviously a burden on the model and the model developers to convey: "Look, our model only does this, and you really need to watch out for everything else the model doesn't cover." Unless there's one more question? No? All right, that's the last question we'll take, then. Thank you once again. Thanks, man. [APPLAUSE]

So now you've got the perspective of an AI researcher working in healthcare. Is the microphone working? Yeah. Now you are going to be the AI researcher working in healthcare. We're going to go over a case study, and it's targeted at skin disease. To detect skin disease, you sometimes take microscopic pictures of cells on the skin and then analyze those pictures. That's what we're going to talk about today.

So let me state the problem. You're a deep learning engineer, and you've been chosen by a group of healthcare practitioners to determine which parts of a microscopic image correspond to a cell. Here's how it looks. The input image is the one closer to me (it's not actually a black-and-white image, it's a color image, but it looks black and white), and the yellow one is the ground truth that has been labeled by a doctor, let's say. What you're trying to do is segment the cells in this image. We've only talked a little about segmentation so far. Segmentation is about producing a class for each of the pixels in an image; in this case, each pixel would correspond to either no cell or cell, zero or one. And once we output a matrix of zeros and ones telling us which pixels correspond to a cell, we should hopefully get a mask like the yellow one I overlaid on the input image. Does that make sense? "Yeah. Isn't there a third category, the boundary? Because in the colored image, the yellow one, you don't have the boundaries of the cells." Yeah, we'll talk about the boundary later. For now, assume it's a binary segmentation: zero and one, no cell and cell. Okay? This is going to be very interactive; I think we're going to use Menti for several questions and group you into teams of three. Here are other examples of images that were segmented with a mask.

Now, the doctors have collected 100,000 images coming from microscopes, but the images come from three different microscopes: type A, type B, and type C. The data is split between these three as 50 percent type A, 25 percent type B, and 25 percent type C.
The first question I have for you is: given that the doctors want to be able to use your algorithm on images from the type C microscope (this microscope is the latest one, the one that's going to be used widely in the field, and they want your network to work on it), how would you split your dataset into train, dev, and test sets? That's the question. Please group into teams of two or three and discuss for a minute how you would split this dataset. You can start going on Menti and writing down your answers as well. Okay, take 30 seconds to input your insights on Menti, one per team, and we'll start going over some of the answers here.

Okay: "Dev and test split from C, train on A plus B; 20k in train, 2.5k in dev and test." "Training 80k: all of A, all of B, plus 5k C; dev 10k C; test 10k C." "95-5 where test and dev are from the population we care about." I think these are good answers. There's no perfect answer here, but there are two things to take into consideration. You have a lot of data, so you probably want a split closer to 95-5 than to 60-20-20. And most importantly, you want C images in the dev and test sets, with the same distribution across the two (that's what you've seen in the third course), and we would actually prefer to have C images in the train set as well; you want your algorithm to see C images. So I would say a very good answer is this one: 95-5 where the 5-5 are exclusively from C, and you also have C images in the 90 percent of training images. Any other insights on that? Yeah? "How do we know that the microscope A and B datasets don't have some hidden feature that will mess up the training?" Yeah, there's much more that we didn't talk about here. One issue is: what is the distribution of microscope A and B images versus microscope C images? Do they look like each other? If they do, all good. If they don't, how can we make sure the model doesn't get bad hints from those two distributions? Another thing is data augmentation: we could augment this dataset and try to get as many C-distribution images as possible. We're going to talk about that. Okay: the split has to be roughly 95-5, not 60-20-20; the distribution of the dev and test sets has to be the same, containing images from C; and there should also be C images in the training set.

Now, on to data augmentation. Do you think you can augment this data? If yes, give three distinct methods you would use; if no, explain why you cannot. Take 30 seconds to talk about it with your neighbors. Okay, let's go over some of the answers. Rotation, zoom, blur: looking at the images we have of the cells, these might work very well. Rotation, zoom, blur, translation, combinations of those, stretching, symmetry; probably a lot of those work. One follow-up question I have is: can someone give an example of a task where data augmentation might hurt the model rather than help it? "If I want to overfit on the test set." If you want to overfit on the test set; can you be more precise? "Like, when you don't want to generalize too much." Oh, you don't want your model to generalize too much? Okay.
Yeah, there are some cases where you don't want the model to generalize too much, especially when doing encoding, but any other ideas? "If you're doing face detection, you wouldn't want the face to be upside down, or on either side." I see. So if you do face detection, you probably don't want the face to be upside down, although you never know depending on the use [LAUGHTER]; but it's not going to help much if the camera is always upright and it's filming humans that are not upside down. I don't think it's going to hurt the model, though; it's probably just not going to help it. Anything else? "If you stretch the image, then that will be inaccurate." Yeah, good point. There are algorithms, like FlowNet, that are used on videos to detect, say, the speed of a car; if you stretch the images, you probably can't detect the speed of the car anymore. Any other examples? "Character recognition." Character recognition, I think, is a good example. Let's say you're trying to detect what this is and you apply a symmetry flip: you end up labeling as B everything that was D, and as D everything that was B. For nine and six it's the same story. So these data augmentations actually hurt the model, because you don't relabel when you augment your data, right?

Okay. So yes, many augmentation methods are possible: cropping, adding random noise, changing contrast. I think data augmentation is super important. I remember a story of a company that was working on self-driving cars and also on in-car virtual assistants, the type of interaction you have with a virtual assistant in your car, and they noticed that the speech recognition system was not working well when the car was going backwards. No idea why; it just doesn't seem related to the speech recognition system of the car. They tested it, looked into it, and figured out that people were putting their hand on the passenger seat, looking back, and talking to the virtual assistant. And because the microphone was in the front, the voice was very different when you were talking toward the back of the car rather than the front. So they used data augmentation to augment their existing data: they didn't have data of people talking toward the back of the car, but by augmenting smartly you can change the voices so that they sound like someone talking toward the back of the car, and that solved the problem.

Okay, a small question, which we can do quickly. What is the mathematical relation between n_x and n_y? Remember, we have an RGB image that we can flatten into a vector of size n_x, and the output is a mask of size n_y. What's the relationship between n_x and n_y? Someone wants to go for it? "They're equal." They're equal; who thinks they're equal? Who thinks they are not equal, and why? "Because you have RGB on this side and you just have one color [inaudible]." Exactly: n_x would be 3 n_y, because you have RGB images, and for each RGB pixel you have one output, zero or one. Okay. That was a question on one of the midterms; it was a complicated question.
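To make that relation concrete, here is a tiny check with a hypothetical image size:

```python
import numpy as np

H, W = 64, 64                      # hypothetical image size
image = np.zeros((H, W, 3))        # RGB input from the microscope
mask = np.zeros((H, W))            # binary output mask: 0 = no cell, 1 = cell

n_x = image.reshape(-1).size       # flattened input vector
n_y = mask.reshape(-1).size        # flattened output vector

print(n_x, n_y, n_x == 3 * n_y)    # 12288 4096 True
```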
What's the last activation of your network? Sigmoid; you want an output between zero and one. And if you had several classes (later on we'll see that we can also segment per disease), then you would have a softmax. What loss function should we use? I'm going to give this one to you so we can go quickly, because we don't have too much time: you're going to use a binary cross-entropy loss summed over all the entries of the output of your network. Does that make sense? Thinking through the loss function is always interesting.

Okay. So you make a first attempt: you've coded your own neural network, which you've named model M1, and you've trained it for 1,000 epochs. It doesn't end up performing well; it looks like that. You give the input image to the model and get an output that is expected to be the following one, but it's not. So one of your friends tells you about transfer learning, and about another labeled dataset of one million microscope images that have been labeled for skin disease classification and are very similar to those you want to work with from microscope C. A model M2 has already been trained by another research lab on this new dataset, on a 10-class disease classification task. Here is an example input/output of that model: you have an input image that probably looks very similar to the ones you're working on; the network has a certain number of layers and a softmax classification at the end that gives you a probability distribution over the diseases that seem to correspond to this image. So they're not doing segmentation; they're doing classification.

Okay, so the question is: you want to perform transfer learning from M2 to M1, so what are the hyperparameters you will have to tune? It's more difficult than it looks. Think about it, discuss with your neighbors for a minute, and try to figure out the hyperparameters involved in this transfer learning process. Okay, take 15 more seconds to wrap it up. Okay, let's see what you have. "Learning rate": it is a hyperparameter; I don't know if it's specific to transfer learning. "Weights of the last layers": I don't think that's a hyperparameter, because weights are parameters. "New cost function for additional output layers": the choice of the loss you might count as a hyperparameter, but I don't think it's specifically related to transfer learning; you will have to train with the loss you used on your model M1. "Number of new layers": yes, that's a hyperparameter. "Last one or two layers of M2": so what do we train, what do we fine-tune? There's a lot about layers, actually. "Size of added layers": not sure. [LAUGHTER]

Okay, let's go over it together, because it seems there are a lot of different answers here. I'm trying to write it down here. Let's say we have the model M2 (is it big enough for the back?), and we give it an input image. The model M2 gives us a probability distribution, a softmax; so we have a softmax here. You'll agree that we probably don't need the softmax layer: we don't want it, we want to do segmentation. So one thing we have to choose is: how much of this pre-trained network do we keep?
Let's say we keep these layers, because they probably know the inherent salient features of the dataset, like the edges of the cells that we're very interested in. So we take them, and you'll agree that here we have a first hyperparameter: L, the number of layers from M2 that we take. Now, what other hyperparameters do we have to choose? We probably have to add a certain number of layers here in order to produce our segmentation, so there's another hyperparameter, L_0: how many layers do I stack on top of these? And remember, these layers are pre-trained, but those new ones are randomly initialized. That makes sense; so, two hyperparameters. Anyone see a third one? The third one comes when you decide to train this new network. You take the input image, give it to the network, and get the output segmentation mask. What you have to decide is how many of these layers you will freeze, that is, how many of the pre-trained layers you freeze. If you have a small dataset, you probably prefer keeping the features that are there, freezing them, and focusing on retraining the last few layers. So there is another hyperparameter, L_f: how much of this do I freeze? What does it mean to freeze? It means that during training I don't train those layers; I assume they've already seen a lot of data and understand the edges and the less complex features of the data very well, and I'm going to use my small dataset to train the last layers. So, three hyperparameters: L, L_0, and L_f. Does that make sense? Okay, so that's transfer learning; the question was more complicated than it looked.

Okay, let's move on; where am I? Okay, let's go over another question. So this part we've done. Now it's interesting, because here we have an input image; in the middle we have the output the doctors would like; and on the right you have the output of your algorithm. You see that there is a difference between what they want and what we're producing, and it goes back to what someone mentioned earlier. There is a problem here: how do you think you can correct the model, and/or the dataset, to satisfy the doctors' request? The issue with this image is that they want to be able to separate the cells from one another, and they can't do it based on your algorithm's output; it's still a little hard. There is something to add. Can someone come up with the answer? Or actually, you mentioned one of the answers earlier, so do you want to explain it so we can finish this slide? "You want to add boundaries, because right now it looks like you could have three cells on the bottom left blurring together; adding boundaries makes the cells more well defined." Good answer. So one way is: when you label your dataset, instead of labeling every pixel with zeros and ones as originally, you label with three classes: zero, one, or boundary; let's say zero, one, and two for boundary. Or, even better I would say, for each input pixel the output will be the corresponding probabilities (okay, this marker is not good): P(cell), P(boundary), and P(no cell). And instead of a sigmoid activation, you will use a softmax activation.
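Putting those pieces together, here is a minimal sketch of the three transfer-learning hyperparameters (L, L_0, L_f) and the per-pixel three-class softmax head just described. The layer sizes are made up and stand in for M2; this is an illustration under those assumptions, not the actual solution.

```python
import torch
import torch.nn as nn

# Pretend M2 is a stack of conv layers that originally fed a 10-class softmax head.
pretrained_body = nn.ModuleList([
    nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU())
    for c_in, c_out in [(3, 16), (16, 32), (32, 32), (32, 64)]
])

L   = 3   # how many of M2's layers we keep (hyperparameter 1)
L_0 = 2   # how many new, randomly initialized layers we stack on top (hyperparameter 2)
L_f = 2   # how many of the kept layers we freeze during training (hyperparameter 3)

kept = list(pretrained_body[:L])
for layer in kept[:L_f]:
    for p in layer.parameters():
        p.requires_grad = False            # frozen: not updated by the optimizer

out_channels = 32                          # channels coming out of the kept layers (layer L-1 above)
new_layers = [nn.Sequential(nn.Conv2d(out_channels, out_channels, 3, padding=1), nn.ReLU())
              for _ in range(L_0 - 1)]
new_layers.append(nn.Conv2d(out_channels, 3, 1))   # 3 scores per pixel: no cell / cell / boundary

model = nn.Sequential(*kept, *new_layers)

x = torch.randn(4, 3, 64, 64)
logits = model(x)                          # (4, 3, 64, 64)
probs = torch.softmax(logits, dim=1)       # per-pixel softmax over the 3 classes
print(probs.shape)
```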
Okay, and the softmax is applied per pixel. One other way to handle it, if it still doesn't work even once you've labeled the boundaries (you relabel your dataset taking the boundaries into account and the model still doesn't perform well), is all about the weighting of the loss function. It's likely that the number of pixels that are boundaries is much smaller than the number of pixels that are cells or no cells, so the network will be biased towards predicting cell or no cell. Instead, what you can do when you compute your loss function is give it three terms: one cross-entropy term for no cell, one for cell, and one for boundary, summed over i = 1 to n_y, over all the output pixel values. Then you can attach a coefficient to each of those: alpha, beta, or one. By tweaking these coefficients, putting a very low number on one and a very high number on another, you're telling your model to focus on the boundary: if you miss the boundary, it's a huge penalty, and we want you to train by figuring out all the boundaries. That's another trick you could use.

One question on that. Yeah? "When you say you're relabeling your dataset, do you do that manually, or is that [inaudible]?" Good question: what do I mean by relabeling your dataset? Last Friday's section was about labeling bounding boxes for the YOLO algorithm. The same kind of tools are available for segmentation, where you have an image and you draw the different contours. In practice, with the tool you were using before, everything inside what you draw, plus the boundary line itself, would count as cell, and the rest as no cell; it's just a line of code to make it different, so that the line you drew counts as boundary, everything inside counts as cell, and everything outside counts as no cell. So it's just in how you use your labeling tool. "Do we make alpha and beta static, or do we make them variable parameters that we fit?" I think they're not learnable parameters; they're hyperparameters to tune. The same way you tune lambda for your regularization, you would tune alpha and beta. "When you make a distinction like that, does it become an attention mechanism? How do you combine those two terms?" This is not an attention mechanism; it's just a training trick, I would say. It cannot tell you, for each image, how much the model is looking at this part versus that part; it's just a training trick. "What's the advantage of doing it this way as opposed to object detection, like detecting each cell?" So the question is: what's the advantage of doing segmentation rather than detection? Detection means you want to output a bounding box. If you output bounding boxes, you could crop each one out, analyze the cell, and try to find its contour. But if you want to separate the cells, if you want to be very precise, segmentation is going to work well; if you want to be very fast, bounding boxes will work better. I think that's the general trade-off: segmentation doesn't run as fast as the YOLO algorithm does for object detection. I would say that, but it's much more precise.
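Going back to the loss weighting: as a sketch, the three weighted terms can be collapsed into a class-weighted cross-entropy over the per-pixel softmax, where the class weights play the role of the alpha and beta coefficients. The weight values below are hypothetical.

```python
import torch
import torch.nn as nn

# Class indices: 0 = no cell, 1 = cell, 2 = boundary.
# Boundary pixels are rare, so we give them a large weight; alpha, beta, and the
# boundary weight are hyperparameters to tune, like lambda for regularization.
alpha, beta, boundary_weight = 1.0, 1.0, 10.0
class_weights = torch.tensor([alpha, beta, boundary_weight])

criterion = nn.CrossEntropyLoss(weight=class_weights)   # weighted cross-entropy per pixel

logits = torch.randn(4, 3, 64, 64)                      # model output: 3 scores per pixel
target = torch.randint(0, 3, (4, 64, 64))               # ground-truth class per pixel
loss = criterion(logits, target)                        # missing a boundary now costs 10x more
print(loss.item())
```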
I would say that, but it's much more precise. Okay. So: modify the dataset in order to label the boundaries, and on top of that you can change the loss function to give more weight to boundaries, or to penalize those false positives. Okay. Ah, we have one more slide, I think. Ah, so let's go over it. So now the doctors give you a new dataset that contains images similar to the previous ones. Uh, the difference is that each image is now labeled with zero or one: zero means there are no cancer cells in that image, and one means there is at least one cancer cell in the image. So we're not doing segmentation anymore; it's binary classification: image in, cancer or no cancer out. Okay. So you easily build a state-of-the-art model, because you're very strong in classification, uh, and you achieve 99 percent accuracy. The doctors are super happy, and they ask you to explain the network's predictions. So, given an image classified as one, how can you figure out based on which cells the model predicted one? So Pranav talked a little bit about that. There are other methods that you should be able to figure out right now, even if you don't know class activation maps. [NOISE] So to sum it up: [NOISE] we have an input image, [NOISE] we put it into our network, which is a binary classifier, [NOISE] and the network says one. You want to figure out why the network said one, based on which pixels. What do you do? Visualize the weights. [NOISE] Visualize the weights. Uh, what would you look for in the weights? The edges. So I think visualizing the weights, uh, is not related to the input; the weights are not going to change based on the input. Here you want to know why this particular input led to one, so it's not about the weights. [NOISE] Do you look at the gradient on each pixel of the input and see which one [inaudible]? Good idea. So, after you get the one here, this is y-hat, basically. It's not exactly one; let's say it's a 0.7 probability. What you have to remember is: the derivative of y-hat with respect to x, what is it? It's a matrix of the same shape as x, and each entry of the matrix is telling you how much moving that pixel influences y-hat. Do you agree? So the top-left number here is telling you how much x_1 is impacting y-hat. Is it, or not? Maybe it's not. If you have a cat detector and the cat is here, then changing this pixel over there is never going to change anything, so the value there is going to be very small, close to zero. If we assume the cancer cell is here, you will see high numbers in this part of the matrix, because these are the pixels that, if we move them, will change y-hat. Does that make sense? So that's a quick way to interpret your network. It's not too, too good; you're not going to get tremendous results. But you should see these pixels having higher derivative values than the others. Okay. That's one way. And then we will see in two weeks, uh, how to interpret neural networks: visualizing the weights, and all the other methods.
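As a minimal sketch of that gradient-with-respect-to-the-input idea (the model, image, and shapes here are hypothetical stand-ins, and later lectures cover more refined methods), it could look like this in TensorFlow:

```python
import tensorflow as tf

def saliency_map(binary_model, image):
    """Return |d y_hat / d x| per pixel for one image (model and image are hypothetical)."""
    x = tf.convert_to_tensor(image[None, ...], dtype=tf.float32)  # add a batch dimension
    with tf.GradientTape() as tape:
        tape.watch(x)
        y_hat = binary_model(x)        # e.g. 0.7 = predicted P(at least one cancer cell)
    grads = tape.gradient(y_hat, x)    # same shape as x; each entry says how much moving
                                       # that pixel changes y_hat
    return tf.reduce_max(tf.abs(grads), axis=-1)[0]  # collapse channels for display
```

Pixels around the cancer cell should show noticeably higher values than the background, which is exactly the check described above.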
Okay. So, next question: your model detects cancer cells on the test set images with 99 percent accuracy, while a doctor would on average reach 97 percent on the same task. Is this possible or not? Who thinks it's possible to have a network that achieves higher accuracy on the test set than the doctor? Okay. Can someone say why? Do you have an explanation? It can look at complex things that possibly the doctors didn't catch. Okay, so the network probably looks at complex things that the doctors didn't see; that's what you're saying. Possibly. I think there is a more rigorous explanation. Human error is an approximation of Bayes error, but we don't know exactly what Bayes error is, so theoretically we can do better. Yeah. So here we're talking about Bayes error, human-level performance, and all that stuff; this is where you should see it. One thing is that there are many concepts you will see in Course 3 that are actually used in industry, but it's not because you know them that you will recognize when it's time to use them, and that's what we want you to get to. So now, when I ask you this question, you have to think about Bayes error, human-level accuracy, and so on. The question you should ask here is: how was the dataset labeled? Where did the labels come from? If the dataset was labeled by individual doctors, it looks weird: it's very strange that the model performs better on the test set than the doctors who labeled it, simply because the labels are wrong three percent of the time on average. So you're teaching wrong things to your model three percent of the time. It would be surprising for it to do better; it could happen, but it would be surprising. But if every single image of the dataset has been labeled by a group of doctors, as Pranav talked about, then the average accuracy of this group of doctors is probably higher than that of one doctor. Maybe it's 99 percent, in which case it makes sense that the model can beat one doctor. Does that make sense? So you have Bayes error, which you try to approximate with the best error you can achieve; a group of doctors is probably better than one doctor, so that's your human-level performance, and then you should be able to beat one doctor. Okay. [NOISE] So: you want to build a pipeline that goes from the image taken by the front camera of your car to a steering direction, for autonomous driving. What you could do is send this image to a car detector that detects all the cars, and to a pedestrian detector that detects all the pedestrians, and then give those to a path planner, let's say, that plans the path and outputs the steering direction. So it's not end-to-end. End-to-end would be: I have an input image, and I want this output directly from it. So a few other disadvantages of this pipeline: something can go wrong anywhere in the model, you know. How do you know which part of the model went wrong? Can you tell me which part? I give you [NOISE] an image, and the steering direction is wrong. Why? Yes. Look at the different components and try to isolate it. Good idea, looking at the different components. So what you can do is look at what happens here and there. Do you think, based on this image, the car detector worked well or not? You can check it. Do you think the pedestrian detector worked well or not? You can check it. If there is something wrong there, the problem is probably in one of these two components; it doesn't mean this one is good, it just means these two are wrong. And how do you check that this one is good? You can label ground-truth detections and give them here, as input to the path planner, and see whether it gets the steering direction right or not. If it does, it seems the path planner is working well. If it does not, it means there's a problem here.
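Here is a minimal sketch, in plain Python with entirely hypothetical function names, of that component-wise layout and of the debugging step of feeding ground-truth detections straight to the path planner:

```python
# Modular (non end-to-end) pipeline: image -> detectors -> path planner -> steering.
def pipeline_steering(image, car_detector, pedestrian_detector, path_planner):
    cars = car_detector(image)                # e.g. bounding boxes for cars
    pedestrians = pedestrian_detector(image)  # e.g. bounding boxes for pedestrians
    return path_planner(cars, pedestrians)    # steering direction

# Isolate the path planner: bypass the detectors and feed hand-labeled
# ground-truth detections. If the output is still far from the labeled steering
# angle, the problem is in the planner (or in the hand-picked intermediate
# representation), not in the detectors.
def path_planner_is_ok(path_planner, gt_cars, gt_pedestrians, gt_steering, tol=2.0):
    predicted = path_planner(gt_cars, gt_pedestrians)
    return abs(predicted - gt_steering) <= tol
```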
Now, what if every single component seems to work properly, let's say these two work properly, but there is still a problem? It might be because what you selected by hand, as a human, was wrong. The path planner may not be able to get the steering direction right based only on the pedestrians and the cars; it probably needs the stop signs and things like that as well, you know. So because you made hand-engineering choices here, your model might go wrong. That's another thing. And another advantage of, uh, this type of pipeline is that data is probably easier to find for each component than for the whole end-to-end pipeline. If you want to collect data for the entire pipeline, you would need to take a car, put a camera on the front, and build a kind of steering-wheel-angle sensor that measures your steering wheel at every step while you're driving. So you would basically need to drive everywhere with a car that has this setup; it's pretty hard, you need a lot of data, a lot of roads. Whereas for this one, you can collect images anywhere and label the pedestrians on them, and you can do the same for the cars, okay? So these choices also depend on what data you can access easily and what data is harder to acquire. Any questions on that? You're going to learn about convolutional neural networks now; we're going to have fun with a lot of imaging. You have a quiz and two programming assignments for the first module, and the same for the second module. The midterm is next Friday, not this one. Everything up to C4M2 will be included in the midterm, so up to the videos you're watching this week. That includes the TA sections, including the next one, and every in-class lecture, including next Wednesday's. And this Friday you have a TA section. Any questions on that? Okay. See you next week, guys.