The image fed into the model from a webcam or CCTV camera is divided into a 13 × 13 grid of cells.

Each of these cells is responsible for predicting 5 bounding boxes. A bounding box describes the rectangle that encloses an object. YOLO also outputs a confidence score that tells us how certain it is that the predicted bounding box actually encloses some object. This score says nothing about what kind of object is in the box, only whether the shape of the box is any good.

For each bounding box, the cell also predicts a class. This works just like a classifier: it gives a probability distribution over all the possible classes. The confidence score for the bounding box and the class prediction are combined into one final score that tells us the probability that this bounding box contains a weapon.

Since there are 13 × 13 = 169 grid cells and each cell predicts 5 bounding boxes, we end up with 845 bounding boxes in total. Most of these boxes have very low confidence scores, so we keep only the boxes whose final score is 30% or more.
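
The scoring and filtering step can be sketched roughly as follows. This is a minimal illustration, not the actual detection code: it assumes the final score is the product of the box confidence and the highest class probability (as in the original YOLO formulation), and the names `final_score`, `filter_boxes`, and `fake_predictions`, as well as the two-class setup and the random data, are hypothetical stand-ins for the real network output.

```python
import random

GRID = 13            # the image is split into a 13 x 13 grid
BOXES_PER_CELL = 5   # each cell predicts 5 bounding boxes
THRESHOLD = 0.30     # keep boxes whose final score is 30% or more


def final_score(confidence, class_probs):
    """Return (class_id, score) for one box.

    Assumption: the score is box confidence times the best class probability.
    """
    class_id = max(range(len(class_probs)), key=lambda i: class_probs[i])
    return class_id, confidence * class_probs[class_id]


def filter_boxes(predictions, threshold=THRESHOLD):
    """Keep only the boxes whose final score reaches the threshold."""
    kept = []
    for row, col, box, confidence, class_probs in predictions:
        class_id, score = final_score(confidence, class_probs)
        if score >= threshold:
            kept.append((row, col, box, class_id, score))
    return kept


def fake_predictions(seed=0):
    """Hypothetical stand-in for the network output (random values)."""
    rng = random.Random(seed)
    predictions = []
    for row in range(GRID):
        for col in range(GRID):
            for box in range(BOXES_PER_CELL):
                confidence = rng.random() ** 4  # skewed toward low confidence
                p = rng.random()
                class_probs = [p, 1.0 - p]      # hypothetical classes: weapon, other
                predictions.append((row, col, box, confidence, class_probs))
    return predictions


predictions = fake_predictions()
kept = filter_boxes(predictions)
print(len(predictions))  # 845 = 13 * 13 * 5
print(len(kept))         # number of boxes that survive the threshold
```

With real model output, `predictions` would come from decoding the network's output tensor rather than from random numbers; only the thresholding logic carries over.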