Tutorial: Container Number Detection and Recognition with PaddleOCR
The complete tutorial is available at https://hyperai.com/console/open-tutorials/containers/XJsxhLTnKNu, where you can clone it and use it directly.
Project Introduction
A container number identifies the container used to ship export goods, and it is a required field on the consignment note. Standard container numbers follow the ISO 6346 (1995) standard.
A standard container number consists of 11 characters, for example in the layout CBHU 123456 7, and has three parts:
- The first part consists of 4 English letters. The first three letters mainly identify the container owner or operator, and the fourth letter indicates the container type. For example, a standard container number beginning with CBHU indicates that the owner and operator is COSCO Shipping.
- The second part consists of 6 digits. It is the container registration code, a unique identifier for the container body.
- The third part is the check digit, the 11th character. It is calculated from the preceding 4 letters and 6 digits using the verification rules and is used to detect errors in the number.
This tutorial uses PaddleOCR for container number detection and recognition. A detection model and a recognition model are trained separately on a small amount of data and then chained together to perform the full task.
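The check-digit rule described above can be sketched in a few lines of Python. This is a minimal sketch of the ISO 6346 calculation as it is commonly described: letters map to the values 10 to 38 with multiples of 11 skipped, each of the first ten characters is weighted by a power of two, and the sum is reduced modulo 11 (a remainder of 10 becomes 0). The function names are illustrative and are not part of PaddleOCR.

```python
import string


def _letter_values():
    """Map A-Z to 10..38, skipping the multiples of 11 (11, 22, 33)."""
    values, value = {}, 10
    for ch in string.ascii_uppercase:
        if value % 11 == 0:
            value += 1
        values[ch] = value
        value += 1
    return values


LETTER_VALUES = _letter_values()


def iso6346_check_digit(first_ten: str) -> int:
    """Compute the check digit from the 4 letters and 6 digits."""
    if len(first_ten) != 10:
        raise ValueError("expected 4 letters followed by 6 digits")
    total = 0
    for position, ch in enumerate(first_ten.upper()):
        value = LETTER_VALUES[ch] if ch in LETTER_VALUES else int(ch)
        total += value * (2 ** position)  # weights 1, 2, 4, ..., 512
    return total % 11 % 10  # a remainder of 10 is written as 0


def is_valid_container_number(number: str) -> bool:
    """Check the layout (4 letters + 7 digits) and the check digit."""
    compact = number.replace(" ", "").upper()
    if len(compact) != 11:
        return False
    if not all(ch in LETTER_VALUES for ch in compact[:4]):
        return False
    if not compact[4:].isdigit():
        return False
    return iso6346_check_digit(compact[:10]) == int(compact[10])


# EITU1786393 is one of the numbers in the inference log later in this tutorial
print(is_valid_container_number("EITU1786393"))  # True
```

A check like this could be used after recognition to flag strings whose check digit does not match, which usually points to a misread character.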
Environment preparation
1. On HyperAI, start a "model training" container, selecting the paddlepaddle-2.3 environment and a vGPU or other GPU resource.
2. Open a Terminal window in Jupyter and run the following commands:
cd PaddleOCR-release-2.5  # enter the PaddleOCR-release-2.5 folder
pip install -r requirements.txt  # install the dependencies required by PaddleOCR
python setup.py install  # install PaddleOCR
Dataset Introduction
This tutorial uses the Container Number Dataset, which contains 3,003 container images at a resolution of 1920×1080.
1. The annotation format that PaddleOCR uses for training detection models is as follows, with the two fields separated by "\t":
" image file name    image annotation information encoded with json.dumps "
ch4_test_images/img_61.jpg [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]
Before json.dumps encoding, the image annotation information is a list of dictionaries. In each dictionary, points holds the (x, y) coordinates of the four corners of the text box, ordered clockwise starting from the top-left corner, and transcription holds the text of that box. When transcription is "###", the text box is invalid and is skipped during training.
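As an illustration of this label format, the following sketch parses one detection label line and drops the boxes marked "###". The helper name parse_det_label and the second box in the sample are made up for the example; only the tab-separated layout and the meaning of the fields come from the description above.

```python
import json


def parse_det_label(line: str):
    """Split one detection label line into (image_path, valid_boxes)."""
    image_path, encoded = line.rstrip("\n").split("\t", 1)
    boxes = json.loads(encoded)
    # Boxes transcribed as "###" are invalid and are skipped
    valid_boxes = [box for box in boxes if box["transcription"] != "###"]
    return image_path, valid_boxes


# Sample line: the first box is the one shown above, the second is a made-up invalid box
sample = "ch4_test_images/img_61.jpg\t" + json.dumps([
    {"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]},
    {"transcription": "###", "points": [[0, 0], [10, 0], [10, 10], [0, 10]]},
])

path, boxes = parse_det_label(sample)
print(path)
print([box["transcription"] for box in boxes])
```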
2. The annotation format that PaddleOCR uses for training recognition models is as follows, with the two fields separated by "\t":
" Image file name Image annotation information "
train_data/rec/train/word_001.jpg simple and reliable
train_data/rec/train/word_002.jpg Make the complicated world simpler through technology
Data organization
Preparing data for the detection model
Split the 3,003 images in the dataset into a training set and a validation set at a ratio of roughly 2:1 (2,000 and 1,003 images). Run the following code:
from tqdm import tqdm

filename = "all_label.txt"
with open(filename) as f:
    lines = f.readlines()

# The first 2000 label lines form the training set; the rest form the validation set
with open('det_train_label.txt', 'w') as t, open('det_eval_label.txt', 'w') as v:
    count = 0
    for line in tqdm(lines):
        if count < 2000:
            t.write(line)
            count += 1
        else:
            v.write(line)
100%|██████████| 3003/3003 [00:00<00:00, 37908.32it/s]
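The split above takes the first 2,000 lines in file order. If the label file happens to be ordered (for example by camera or by capture time), a shuffled split may give a more representative validation set. The sketch below is an alternative, not the method used in this tutorial; the function name, the seed, and the demo lines are assumptions.

```python
import random


def split_labels(lines, train_size=2000, seed=0):
    """Shuffle the label lines with a fixed seed, then split them."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    return shuffled[:train_size], shuffled[train_size:]


# Demo with placeholder lines; in practice, pass the lines read from all_label.txt
demo_lines = [f"img_{i}.jpg\t[]\n" for i in range(3003)]
train_lines, eval_lines = split_labels(demo_lines)
print(len(train_lines), len(eval_lines))  # 2000 1003
```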
Preparing data for the recognition model
Following the annotations from the detection step, we crop each annotated text region out of the dataset images so that the recognition data contains, as far as possible, only the text. Run the following code:
import json
import math
import os

from PIL import Image, ImageDraw
from tqdm import tqdm
class Rotate(object):
    """Mask a quadrilateral text box, rotate the image, and crop the result."""

    def __init__(self, image: Image.Image, coordinate):
        self.image = image.convert('RGB')
        self.coordinate = coordinate
        # Corner points, clockwise from the top-left corner
        self.xy = [tuple(self.coordinate[k]) for k in ['left_top', 'right_top', 'right_bottom', 'left_bottom']]
        self._mask = None
        # Use the polygon mask as the alpha channel (the image becomes RGBA)
        self.image.putalpha(self.mask)

    @property
    def mask(self):
        # Build the mask once: 255 inside the text box, 0 elsewhere
        if self._mask is None:
            mask = Image.new('L', self.image.size, 0)
            draw = ImageDraw.Draw(mask, 'L')
            draw.polygon(self.xy, fill=255)
            self._mask = mask
        return self._mask

    def run(self):
        image = self.rotation_angle()
        # Crop to the bounding box reported by getbbox()
        box = image.getbbox()
        return image.crop(box)

    def rotation_angle(self):
        x1, y1 = self.xy[0]
        x2, y2 = self.xy[1]
        # Unsigned angle between the top edge and a horizontal reference vector, negated
        angle = self.angle([x1, y1, x2, y2], [0, 0, 10, 0]) * -1
        return self.image.rotate(angle, expand=True)

    def angle(self, v1, v2):
        """Return the included angle in whole degrees (0 to 180) between two segments [x1, y1, x2, y2]."""
        dx1 = v1[2] - v1[0]
        dy1 = v1[3] - v1[1]
        dx2 = v2[2] - v2[0]
        dy2 = v2[3] - v2[1]
        angle1 = math.atan2(dy1, dx1)
        angle1 = int(angle1 * 180 / math.pi)
        angle2 = math.atan2(dy2, dx2)
        angle2 = int(angle2 * 180 / math.pi)
        if angle1 * angle2 >= 0:
            included_angle = abs(angle1 - angle2)
        else:
            included_angle = abs(angle1) + abs(angle2)
            if included_angle > 180:
                included_angle = 360 - included_angle
        return included_angle
def image_cut_save(path, bbox, save_path):
    """
    Crop one annotated text box out of an image and save it.
    :param path: path of the source image
    :param bbox: four [x, y] corner points, clockwise from the top-left corner
    :param save_path: path where the cropped image is saved
    """
    img = Image.open(path)
    coordinate = {'left_top': bbox[0], 'right_top': bbox[1], 'right_bottom': bbox[2], 'left_bottom': bbox[3]}
    rotate = Rotate(img, coordinate)
    left, upper = bbox[0]
    right, lower = bbox[2]
    # Boxes that are taller than they are wide (vertical text) are turned by 90 degrees
    if lower - upper > right - left:
        rotate.run().convert('RGB').transpose(Image.ROTATE_90).save(save_path)
    else:
        rotate.run().convert('RGB').save(save_path)
    return True
# Read the detection labels and build the recognition dataset
files = ["det_train_label.txt", "det_eval_label.txt"]
filetypes = ["train", "eval"]
for index, filename in enumerate(files):
    data_dir = "RecTrainData" if index == 0 else "RecEvalData"
    if not os.path.exists(data_dir):
        os.mkdir(data_dir)
    with open(filename) as f, open('rec_' + filetypes[index] + '_label.txt', 'w') as label_file:
        lines = f.readlines()
        for line in tqdm(lines):
            image_name = line.split("\t")[0].split("/")[-1]
            annos = json.loads(line.split("\t")[-1])
            img_path = os.path.join("/input0/images", image_name)
            # Save one cropped image per annotated box, prefixed with the box index
            for i, anno in enumerate(annos):
                data_path = os.path.join(data_dir, str(i) + "_" + image_name)
                if image_cut_save(img_path, anno["points"], data_path):
                    label_file.write(str(i) + "_" + image_name + "\t" + anno["transcription"] + "\n")
100%|██████████| 2000/2000 [02:55<00:00, 11.39it/s]
100%|██████████| 1003/1003 [01:30<00:00, 11.05it/s]
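One detail of the cropping code is worth noting. The angle method returns an unsigned value, so rotation_angle always passes a non-positive angle to Image.rotate. Because Image.rotate treats positive angles as counter-clockwise, the crop is always turned clockwise, whichever way the box tilts. The sketch below shows a signed alternative, assuming boxes may tilt in either direction. It is not what the tutorial code does, and the helper name is hypothetical.

```python
import math


def signed_top_edge_angle(left_top, right_top):
    """Signed angle of the top edge in degrees, in image coordinates (y grows downward).

    Positive when the right end of the edge is lower than the left end.
    Passing this value to PIL's Image.rotate (counter-clockwise for positive
    angles) should level the edge for either tilt direction.
    """
    dx = right_top[0] - left_top[0]
    dy = right_top[1] - left_top[1]
    return math.degrees(math.atan2(dy, dx))


# Top edge of the sample annotation shown earlier: (310, 104) -> (416, 141)
print(signed_top_edge_angle((310, 104), (416, 141)))
```

Since the crops used for the results in this tutorial were produced with the original code, switching to a signed angle would change the recognition training data and may change the reported accuracy.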
Experiment
Because the dataset is relatively small, we use the PP-OCRv3 models in PaddleOCR for detection and recognition so that training converges better and faster. Compared with PP-OCRv2, PP-OCRv3 improves the end-to-end Hmean by 5% in Chinese scenarios and by 11% for the English and digit model. For details of the optimizations, see the PP-OCRv3 technical report.
Detection model
Detection model configuration
PaddleOCR provides many detection models. The models and their configuration files can be found under PaddleOCR-release-2.5/configs/det. Here we choose the ch_PP-OCRv3_det_student model, whose configuration file is PaddleOCR-release-2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml. Before use, the necessary settings, such as the training parameters and dataset paths, must be configured. Some key settings are shown below:
# Key training parameters
Global:
  use_gpu: true  # whether to use the GPU
  epoch_num: 1200  # number of training epochs
  save_model_dir: ./output/ch_PP-OCR_V3_det/  # model save path
  save_epoch_step: 200  # save the model every 200 epochs
  eval_batch_step: [0, 100]  # starting from iteration 0, evaluate every 100 iterations
  pretrained_model: ./PaddleOCR-release-2.5/pretrain_models/ch_PP-OCR_V3_det/best_accuracy.pdparams  # pretrained model path
# Training set paths
Train:
  dataset:
    name: SimpleDataSet
    data_dir: /input0/images  # image folder path
    label_file_list:
      - ./det_train_label.txt  # label file path
Model fine-tuning
Run the following command in the notebook to fine-tune the model. The -c option passes in the path to the model's configuration file.
%run PaddleOCR-release-2.5/tools/train.py \
-c PaddleOCR-release-2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml
Using the default hyperparameters, the ch_PP-OCRv3_det_student model reached an hmean of 96.96% on the validation set after 385 training epochs, with no significant improvement thereafter.
[2022/10/11 06:36:09] ppocr INFO: best metric, hmean: 0.969551282051282, precision: 0.9577836411609498,
recall: 0.981611681990265, fps: 20.347745459258228, best_epoch: 385
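The hmean in this log is the harmonic mean of precision and recall. As a quick check, the following sketch recomputes it from the precision and recall values printed above; the function name is illustrative.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the training log above
precision = 0.9577836411609498
recall = 0.981611681990265
print(round(hmean(precision, recall), 4))  # 0.9696
```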
Recognition model
Recognition model configuration
PaddleOCR also provides many recognition models. The models and their configuration files can be found under PaddleOCR-release-2.5/configs/rec. Here we choose the ch_PP-OCRv3_rec_distillation model, whose configuration file is PaddleOCR-release-2.5/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml. Before use, the necessary settings, such as the training parameters and dataset paths, must be configured. Some key settings are shown below:
# Key training parameters
Global:
  use_gpu: true  # whether to use the GPU
  epoch_num: 1200  # number of training epochs
  save_model_dir: ./output/rec_ppocr_v3_distillation  # model save path
  save_epoch_step: 200  # save the model every 200 epochs
  eval_batch_step: [0, 100]  # starting from iteration 0, evaluate every 100 iterations
  pretrained_model: ./PaddleOCR-release-2.5/pretrain_models/PPOCRv3/best_accuracy.pdparams  # pretrained model path
# Training set paths
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./RecTrainData/  # image folder path
    label_file_list:
      - ./rec_train_label.txt  # label file path
Model fine-tuning
Run the following command in the notebook to fine-tune the model. The -c option passes in the path to the model's configuration file.
%run PaddleOCR-release-2.5/tools/train.py \
-c PaddleOCR-release-2.5/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml
Using the default hyperparameters, the ch_PP-OCRv3_rec_distillation model reached an accuracy of 96.11% on the validation set after 136 training epochs, with no significant improvement thereafter.
[2022/10/11 20:04:28] ppocr INFO: best metric, acc: 0.9610600272522444, norm_edit_dis: 0.9927426548965615,
Teacher_acc: 0.9540291998159589, Teacher_norm_edit_dis: 0.9905629345025616, fps: 246.029195787707, best_epoch: 136
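The log reports two recognition metrics, acc and norm_edit_dis. The sketch below shows one way such metrics can be computed. It assumes that acc is the share of exact matches and that norm_edit_dis is one minus the mean normalized Levenshtein distance. PaddleOCR's own metric code may differ in details such as case or whitespace handling, and the ground-truth strings in the example are made up.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def rec_metrics(predictions, targets):
    """Exact-match accuracy and 1 - mean normalized edit distance."""
    total = len(targets)
    correct = sum(p == t for p, t in zip(predictions, targets))
    distance_sum = sum(
        edit_distance(p, t) / max(len(p), len(t), 1)
        for p, t in zip(predictions, targets)
    )
    return {"acc": correct / total, "norm_edit_dis": 1 - distance_sum / total}


# Predictions taken from the inference log later in this tutorial;
# the targets are hypothetical ground truth for illustration only
predictions = ["EITU1786393", "TTEMU3108252", "45G1"]
targets = ["EITU1786393", "TEMU3108252", "45G1"]
metrics = rec_metrics(predictions, targets)
print(metrics)
```

In this example one prediction has a single extra character, so acc drops to 2/3 while norm_edit_dis stays close to 1. This shows why the log above reports a norm_edit_dis that is higher than acc.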
Model inference
Detection model inference
Run the following command in the notebook to detect text in the test images with the fine-tuned model, where:
- Global.infer_img is the path to an image or an image folder
- Global.pretrained_model is the fine-tuned model
- Global.save_res_path is the path where the inference results are saved
%run PaddleOCR-release-2.5/tools/infer_det.py \
-c PaddleOCR-release-2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml \
-o Global.infer_img="/input0/images" Global.pretrained_model="./output/ch_PP-OCR_V3_det/best_accuracy" Global.save_res_path="./output/det_infer_res/predicts.txt"
Recognition model inference
Run the following command in the notebook to recognize text in the test images with the fine-tuned model, where:
- Global.infer_img is the path to an image or an image folder
- Global.pretrained_model is the fine-tuned model
- Global.save_res_path is the path where the inference results are saved
%run PaddleOCR-release-2.5/tools/infer_rec.py \
-c PaddleOCR-release-2.5/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml \
-o Global.infer_img="./RecEvalData/" Global.pretrained_model="./output/rec_ppocr_v3_distillation/best_accuracy" Global.save_res_path="./output/rec_infer_res/predicts.txt"
Cascaded detection and recognition inference
Model conversion
Before running cascaded inference, the trained and saved models must first be converted into inference models by running the following commands for the detection model and the recognition model in turn. Here -c passes in the configuration file of the model to be converted, -o Global.pretrained_model is the model file to be converted, and Global.save_inference_dir is the directory where the converted inference model is stored.
# Detection model conversion
%run PaddleOCR-release-2.5/tools/export_model.py \
-c PaddleOCR-release-2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml \
-o Global.pretrained_model="./output/ch_PP-OCR_V3_det/best_accuracy" Global.save_inference_dir="./output/det_inference/"
W1011 07:10:20.363173 544 gpu_context.cc:278] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.4, Runtime API Version: 11.1
W1011 07:10:20.366801 544 gpu_context.cc:306] device: 0, cuDNN Version: 8.0.
W1011 07:10:22.629678 544 gpu_context.cc:506] WARNING: device: 0. The installed Paddle is compiled with CUDNN 8.1, but CUDNN version in your machine is 8.0, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDNN version.
[2022/10/11 07:10:24] ppocr INFO: load pretrain successful from ./output/ch_PP-OCR_V3_det/best_accuracy
[2022/10/11 07:10:27] ppocr INFO: inference model is saved to ./output/det_inference/inference
# Recognition model conversion
%run PaddleOCR-release-2.5/tools/export_model.py \
-c PaddleOCR-release-2.5/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml \
-o Global.pretrained_model="./output/rec_ppocr_v3_distillation/best_accuracy" Global.save_inference_dir="./output/rec_inference/"
[2022/10/11 07:10:33] ppocr INFO: load pretrain successful from ./output/rec_ppocr_v3_distillation/best_accuracy
[2022/10/11 07:10:35] ppocr INFO: inference model is saved to ./output/rec_inference/Teacher/inference
[2022/10/11 07:10:36] ppocr INFO: inference model is saved to ./output/rec_inference/Student/inference
Cascaded model inference
After the conversion is complete, PaddleOCR provides a tool for chaining detection and recognition models, which can combine any trained detection model with any trained recognition model into a two-stage text recognition system. The input image passes through four main stages (text detection, detection-box rectification, text recognition, and score filtering), and the system outputs the text positions and the recognition results. The command is shown below, where:
- image_dir is the path to a single image or a folder of images
- det_model_dir is the path to the detection inference model
- rec_model_dir is the path to the recognition inference model
Visualized recognition results are saved to the ./inference_results folder by default.
%run PaddleOCR-release-2.5/tools/infer/predict_system.py \
--image_dir="OCRTest" \
--det_model_dir="./output/det_inference/" \
--rec_model_dir="./output/rec_inference/Student/"
[2022/10/11 07:10:46] ppocr INFO: In PP-OCRv3, rec_image_shape parameter defaults to '3, 48, 320', if you are using recognition model with PP-OCRv2 or an older version, please set --rec_image_shape='3,32,320
[2022/10/11 07:10:48] ppocr DEBUG: dt_boxes num : 2, elapse : 1.0023341178894043
[2022/10/11 07:10:48] ppocr DEBUG: rec_res num : 2, elapse : 0.02405834197998047
[2022/10/11 07:10:48] ppocr DEBUG: 0 Predict time of OCRTest/1-122700001-OCR-LF-C01.jpg: 1.041s
[2022/10/11 07:10:48] ppocr DEBUG: TTEMU3108252, 0.864
[2022/10/11 07:10:48] ppocr DEBUG: 22G1, 0.843
[2022/10/11 07:10:48] ppocr DEBUG: The visualized image saved in ./inference_results/1-122700001-OCR-LF-C01.jpg
[2022/10/11 07:10:48] ppocr DEBUG: dt_boxes num : 1, elapse : 0.047757863998413086
[2022/10/11 07:10:48] ppocr DEBUG: rec_res num : 1, elapse : 0.016452789306640625
[2022/10/11 07:10:48] ppocr DEBUG: 1 Predict time of OCRTest/1-122720001-OCR-AH-A01.jpg: 0.073s
[2022/10/11 07:10:48] ppocr DEBUG: The visualized image saved in ./inference_results/1-122720001-OCR-AH-A01.jpg
[2022/10/11 07:10:48] ppocr DEBUG: dt_boxes num : 2, elapse : 0.05301952362060547
[2022/10/11 07:10:48] ppocr DEBUG: rec_res num : 2, elapse : 0.020509719848632812
[2022/10/11 07:10:48] ppocr DEBUG: 2 Predict time of OCRTest/1-122720001-OCR-AS-B01.jpg: 0.081s
[2022/10/11 07:10:48] ppocr DEBUG: EITU1786393, 0.990
[2022/10/11 07:10:48] ppocr DEBUG: 45G1, 0.963
[2022/10/11 07:10:49] ppocr DEBUG: The visualized image saved in ./inference_results/1-122720001-OCR-AS-B01.jpg
[2022/10/11 07:10:49] ppocr DEBUG: dt_boxes num : 2, elapse : 0.049460411071777344
[2022/10/11 07:10:49] ppocr DEBUG: rec_res num : 2, elapse : 0.020053625106811523
[2022/10/11 07:10:49] ppocr DEBUG: 3 Predict time of OCRTest/1-122720001-OCR-LB-C02.jpg: 0.077s
[2022/10/11 07:10:49] ppocr DEBUG: LTU1, 0.814
[2022/10/11 07:10:49] ppocr DEBUG: 45G1, 0.997
[2022/10/11 07:10:49] ppocr DEBUG: The visualized image saved in ./inference_results/1-122720001-OCR-LB-C02.jpg
[2022/10/11 07:10:49] ppocr DEBUG: dt_boxes num : 2, elapse : 0.051781654357910156
[2022/10/11 07:10:49] ppocr DEBUG: rec_res num : 2, elapse : 0.020511150360107422
[2022/10/11 07:10:49] ppocr DEBUG: 4 Predict time of OCRTest/1-122720001-OCR-RF-D01.jpg: 0.081s
[2022/10/11 07:10:49] ppocr DEBUG: EITU1786393, 0.966
[2022/10/11 07:10:49] ppocr DEBUG: 45G1, 0.939
[2022/10/11 07:10:49] ppocr DEBUG: The visualized image saved in ./inference_results/1-122720001-OCR-RF-D01.jpg
[2022/10/11 07:10:49] ppocr DEBUG: dt_boxes num : 0, elapse : 0.04465031623840332
[2022/10/11 07:10:49] ppocr DEBUG: rec_res num : 0, elapse : 1.430511474609375e-06
[2022/10/11 07:10:49] ppocr DEBUG: 5 Predict time of OCRTest/1-122728001-OCR-AH-A01.jpg: 0.049s
[2022/10/11 07:10:49] ppocr DEBUG: The visualized image saved in ./inference_results/1-122728001-OCR-AH-A01.jpg
[2022/10/11 07:10:49] ppocr INFO: The predict total time is 2.9623537063598633
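The log shows that the cascaded system returns every text region it finds, including shorter codes such as 22G1 and a 12-character string with one extra letter. A simple post-processing step can keep only the strings that have the 4-letter, 7-digit layout of a container number. The following is a minimal sketch of that idea: the function name and the min_score threshold are assumptions, and the (text, score) pairs are copied from the log above.

```python
import re

# Layout of a standard container number: 4 letters followed by 7 digits
CONTAINER_NUMBER = re.compile(r"[A-Z]{4}\d{7}")


def keep_container_numbers(results, min_score=0.8):
    """Keep (text, score) pairs that look like container numbers.

    min_score is a hypothetical confidence threshold, not a PaddleOCR default.
    """
    return [
        (text, score)
        for text, score in results
        if score >= min_score and CONTAINER_NUMBER.fullmatch(text)
    ]


# (text, score) pairs copied from the log above
results = [
    ("TTEMU3108252", 0.864),
    ("22G1", 0.843),
    ("EITU1786393", 0.990),
    ("45G1", 0.963),
    ("LTU1", 0.814),
]
print(keep_container_numbers(results))  # [('EITU1786393', 0.99)]
```

A layout filter like this only checks the shape of the string. Combining it with a check-digit test would also catch numbers with the right layout but a misread character.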