Faster-RCNN 訓練自己資料集的坑記錄

阿新 • • 發佈：2019-01-06

主要照這篇部落格進行訓練配置，因為沒有GPU所以好多坑，CPU訓練可以參見這篇部落格

正所謂，跑通了的都一樣，錯誤千萬樣。按照教程來也是坑多

訓練：

 python train_faster_rcnn_alt_opt.py  --net_name ZF --weights /home/lys/py-faster-rcnn/data/imagenet_models/ZF.v2.caffemodel --cfg /home/lys/py-faster-rcnn/experiments/cfgs/faster_rcnn_alt_opt.yml  --imdb voc_2007_trainval

error1：

Cannot use GPU in CPU-only Caffe: check mode.

solution1：

把py-faster-rcnn/tools/下的所有py檔案中的GPU註釋掉，然後mode設為cpu。示例如下：

#    caffe.set_mode_gpu()
    caffe.set_mode_cpu()
#    if args.gpu_id is not None:
#        caffe.set_device(args.gpu_id)

error2：

Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "train_faster_rcnn_alt_opt.py", line 125, in train_rpn
    roidb, imdb = get_roidb(imdb_name)
  File "train_faster_rcnn_alt_opt.py", line 62, in get_roidb
    imdb = get_imdb(imdb_name)
  File "/home/lys/py-faster-rcnn/tools/../lib/datasets/factory.py", line 38, in get_imdb
    return __sets[name]()
  File "/home/lys/py-faster-rcnn/tools/../lib/datasets/factory.py", line 20, in <lambda>
    __sets[name] = (lambda split=split, year=year: pascal_voc(split, year))
  File "/home/lys/py-faster-rcnn/tools/../lib/datasets/pascal_voc.py", line 39, in __init__
    self._image_index = self._load_image_set_index()
  File "/home/lys/py-faster-rcnn/tools/../lib/datasets/pascal_voc.py", line 83, in _load_image_set_index
    'Path does not exist: {}'.format(image_set_file)
AssertionError: Path does not exist: /home/lys/py-faster-rcnn/data/VOCdevkit2007/VOC2007/ImageSets/Main/trainval.txt

solution2：

智障錯誤。只顧了檢視trainval.txtz在不在了，沒有建立VOCdevkit2007資料夾，直接就VOC2007了

error3：

Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "train_faster_rcnn_alt_opt.py", line 125, in train_rpn
    roidb, imdb = get_roidb(imdb_name)
  File "train_faster_rcnn_alt_opt.py", line 68, in get_roidb
    roidb = get_training_roidb(imdb)
  File "/home/lys/py-faster-rcnn/tools/../lib/fast_rcnn/train.py", line 118, in get_training_roidb
    imdb.append_flipped_images()
  File "/home/lys/py-faster-rcnn/tools/../lib/datasets/imdb.py", line 108, in append_flipped_images
    boxes = self.roidb[i]['boxes'].copy()
  File "/home/lys/py-faster-rcnn/tools/../lib/datasets/imdb.py", line 67, in roidb
    self._roidb = self.roidb_handler()
  File "/home/lys/py-faster-rcnn/tools/../lib/datasets/pascal_voc.py", line 112, in gt_roidb
    for index in self.image_index]
  File "/home/lys/py-faster-rcnn/tools/../lib/datasets/pascal_voc.py", line 217, in _load_pascal_annotation
    cls = self._class_to_ind[obj.find('name').text.lower().strip()]
KeyError: 'leftatrial'

solution3：

cls = self._class_to_ind[obj.find('name').text.lower().strip()]把lower()去掉，上面提到的第一篇部落格有講

error4：

I0627 10:57:37.710443 10173 solver.cpp:81] Creating training net from train_net file: models/pascal_voc/ZF/faster_rcnn_alt_opt/stage1_rpn_train.pt
F0627 10:57:37.710464 10173 io.cpp:36] Check failed: fd != -1 (-1 vs. -1) File not found: models/pascal_voc/ZF/faster_rcnn_alt_opt/stage1_rpn_train.pt

solution4：

把/home/lys/py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt下的4個stage1_fast_rcnn_solver30k40k類的檔案設定絕對路徑

train_net: "/home/lys/py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt/stage1_rpn_train.pt"上文提到的第二篇部落格

error5：

F0627 11:27:37.913828 10633 smooth_L1_loss_layer.cpp:54] Not Implemented Yet

solution5：

實現這個檔案中的兩個函式，然後進入到caffe-fast-rcnn下重新make一下。參照這篇部落格(template處缺少<typename Dtype>)和這篇不過我目前沒有更改roi檔案，，果然還是得改。。。。。github上也有這個問題

// ------------------------------------------------------------------
// Fast R-CNN
// Copyright (c) 2015 Microsoft
// Licensed under The MIT License [see fast-rcnn/LICENSE for details]
// Written by Ross Girshick
// ------------------------------------------------------------------

#include "caffe/fast_rcnn_layers.hpp"

namespace caffe {

template <typename Dtype>
void SmoothL1LossLayer<Dtype>::LayerSetUp(
  const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  SmoothL1LossParameter loss_param = this->layer_param_.smooth_l1_loss_param();
  sigma2_ = loss_param.sigma() * loss_param.sigma();
  has_weights_ = (bottom.size() >= 3);
  if (has_weights_) {
    CHECK_EQ(bottom.size(), 4) << "If weights are used, must specify both "
      "inside and outside weights";
  }
}

template <typename Dtype>
void SmoothL1LossLayer<Dtype>::Reshape(
  const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  LossLayer<Dtype>::Reshape(bottom, top);
  CHECK_EQ(bottom[0]->channels(), bottom[1]->channels());
  CHECK_EQ(bottom[0]->height(), bottom[1]->height());
  CHECK_EQ(bottom[0]->width(), bottom[1]->width());
  if (has_weights_) {
    CHECK_EQ(bottom[0]->channels(), bottom[2]->channels());
    CHECK_EQ(bottom[0]->height(), bottom[2]->height());
    CHECK_EQ(bottom[0]->width(), bottom[2]->width());
    CHECK_EQ(bottom[0]->channels(), bottom[3]->channels());
    CHECK_EQ(bottom[0]->height(), bottom[3]->height());
    CHECK_EQ(bottom[0]->width(), bottom[3]->width());
  }
  diff_.Reshape(bottom[0]->num(), bottom[0]->channels(),
      bottom[0]->height(), bottom[0]->width());
  errors_.Reshape(bottom[0]->num(), bottom[0]->channels(),
      bottom[0]->height(), bottom[0]->width());
  // vector of ones used to sum
  ones_.Reshape(bottom[0]->num(), bottom[0]->channels(),
      bottom[0]->height(), bottom[0]->width());
  for (int i = 0; i < bottom[0]->count(); ++i) {
    ones_.mutable_cpu_data()[i] = Dtype(1);
  }
}

template <typename Dtype>
void SmoothL1LossLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  //NOT_IMPLEMENTED;

  // cpu implementation
  CHECK_EQ(bottom[0]->count(1), bottom[1]->count(1))
      << "Inputs must have the same dimension.";
  int count = bottom[0]->count();
  caffe_sub(count, 
            bottom[0]->cpu_data(), 
            bottom[1]->cpu_data(),
            diff_.mutable_cpu_data());

  if(has_weights_){
    caffe_mul(count, 
              bottom[2]->cpu_data(), 
              diff_.cpu_data(), 
              diff_.mutable_cpu_data());
  }
  // f(x) = 0.5 * (sigma * x)^2          if |x| < 1 / sigma / sigma
  //        |x| - 0.5 / sigma / sigma    otherwise
  const Dtype* in = diff_.cpu_data();
  Dtype* out = errors_.mutable_cpu_data();
  for(int index=0; index<count; ++index){
    Dtype val = in[index];
    Dtype abs_val = abs(val);
    if(abs_val < 1.0 / sigma2_){
        out[index] = 0.5 * val * val * sigma2_;
    }
    else{
        out[index] = abs_val - 0.5 / sigma2_;
    }
  }

  if(has_weights_){
    caffe_mul(count, bottom[3]->cpu_data(), out, errors_.mutable_cpu_data());
  }

  // compute loss
  Dtype loss = caffe_cpu_dot(count, ones_.cpu_data(), errors_.cpu_data());
  top[0]->mutable_cpu_data()[0] = loss / bottom[0]->num();
  // end cpu implementation

}

template <typename Dtype>
void SmoothL1LossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  //NOT_IMPLEMENTED;
  // cpu implementation
  int count = diff_.count();
  const Dtype* in = diff_.cpu_data();
  Dtype* out = diff_.mutable_cpu_data();
  for(int index=0; index < count; index++){
    Dtype val = in[index];
    Dtype abs_val = abs(val);
    if(abs_val < 1.0 / sigma2_){
        out[index] = sigma2_ *  val;
    } 
    else{
        out[index] = (Dtype(0) < val) - (val < Dtype(0));
    }
  }

  for(int i=0; i<2; ++i){
    if(propagate_down[i]){
        const Dtype sign = (i == 0) ? 1 : -1;
        const Dtype alpha = sign * top[0]->cpu_diff()[0] / bottom[i]->num();
        caffe_cpu_axpby(
            count, 
            alpha, 
            out,//diff_.cpu_data(), 
            Dtype(0), 
            bottom[i]->mutable_cpu_diff());

        if(has_weights_){
            caffe_mul(
                count, 
                bottom[2]->cpu_data(), 
                bottom[i]->cpu_diff(), 
                bottom[i]->mutable_cpu_data());
            caffe_mul(
                count,
                bottom[3]->cpu_data(),
                bottom[i]->cpu_diff(),
                bottom[i]->mutable_cpu_data());
        }
    }
  }
  // end cpu implementation
}

#ifdef CPU_ONLY
STUB_GPU(SmoothL1LossLayer);
#endif

INSTANTIATE_CLASS(SmoothL1LossLayer);
REGISTER_LAYER_CLASS(SmoothL1Loss);

}  // namespace caffe

error6：一天了。。。。我已經從第一個錯誤犯到第六個了。。。。加油。。。。。。

Process Process-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "train_faster_rcnn_alt_opt.py", line 198, in train_fast_rcnn
    max_iters=max_iters)
  File "/home/lys/py-faster-rcnn/tools/../lib/fast_rcnn/train.py", line 160, in train_net
    model_paths = sw.train_model(max_iters)
  File "/home/lys/py-faster-rcnn/tools/../lib/fast_rcnn/train.py", line 101, in train_model
    self.solver.step(1)
  File "/home/lys/py-faster-rcnn/tools/../lib/roi_data_layer/layer.py", line 144, in forward
    blobs = self._get_next_minibatch()
  File "/home/lys/py-faster-rcnn/tools/../lib/roi_data_layer/layer.py", line 63, in _get_next_minibatch
    return get_minibatch(minibatch_db, self._num_classes)
  File "/home/lys/py-faster-rcnn/tools/../lib/roi_data_layer/minibatch.py", line 55, in get_minibatch
    num_classes)
  File "/home/lys/py-faster-rcnn/tools/../lib/roi_data_layer/minibatch.py", line 100, in _sample_rois
    fg_inds, size=fg_rois_per_this_image, replace=False)
  File "mtrand.pyx", line 1176, in mtrand.RandomState.choice (numpy/random/mtrand/mtrand.c:18822)
TypeError: 'numpy.float64' object cannot be interpreted as an index

solution6:

調整numpy版本，部落格

python -c "import numpy;print numpy.version.version"#檢視numpy版本，1.12.1

sudo pip install -U numpy==1.11.0

訓練終於告一段落。。。。。感謝各路部落格大神

再訓練又出錯，不過可以避免的，我明明記得刪掉pkl了。。。。

error7:

Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "train_faster_rcnn_alt_opt.py", line 125, in train_rpn
    roidb, imdb = get_roidb(imdb_name)
  File "train_faster_rcnn_alt_opt.py", line 68, in get_roidb
    roidb = get_training_roidb(imdb)
  File "/home/lys/py-faster-rcnn/tools/../lib/fast_rcnn/train.py", line 122, in get_training_roidb
    rdl_roidb.prepare_roidb(imdb)
  File "/home/lys/py-faster-rcnn/tools/../lib/roi_data_layer/roidb.py", line 27, in prepare_roidb
    roidb[i]['image'] = imdb.image_path_at(i)
IndexError: list index out of range

solution7:

刪除py-faster-rcnn/data/cache/ 資料夾下的.pkl檔案，或者改名備份，重新訓練即可。部落格最後的error2

error8:

obj for obj in objs if int(obj.find('difficult').text) == 0]

zhushudiao

error9:

Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "train_faster_rcnn_alt_opt.py", line 195, in train_fast_rcnn
    max_iters=max_iters)
  File "/home/amax/py-faster-rcnn/tools/../lib/fast_rcnn/train.py", line 160, in train_net
    model_paths = sw.train_model(max_iters)
  File "/home/amax/py-faster-rcnn/tools/../lib/fast_rcnn/train.py", line 111, in train_model
    model_paths.append(self.snapshot())
  File "/home/amax/py-faster-rcnn/tools/../lib/fast_rcnn/train.py", line 73, in snapshot
    self.bbox_stds[:, np.newaxis])
ValueError: operands could not be broadcast together with shapes (84,4096) (8,1)

測試：

將py-faster-rcnn/output/faster_rcnn_alt_opt/voc_2007_trainval/下的ZF_faster_rcnn_final.caffemodel複製到/py-faster-rcnn/data/faster_rcnn_models/下，在tools下執行

python demo.py --net zf --cpu

調整引數：參見部落格

學習率之類：

py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt中的solve檔案設定

迭代次數

py-faster-rcnn/tools/train_faster_rcnn_alt_opt.py中修改
py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt裡對應的solver檔案（有4個）也修改，stepsize小於上面修改的數值

Faster-RCNN 訓練自己資料集的坑記錄

Faster-RCNN 訓練自己資料集的坑記錄

使用pytorch版faster-rcnn訓練自己資料集

py-faster-rcnn訓練自己資料集需要修改的引數

Fast RCNN 訓練自己資料集 (1編譯配置)

Fast RCNN 訓練自己資料集 (2修改資料讀取介面)

用Faster RCNN訓練自己的資料集

Caffe學習系列——Faster-RCNN訓練自己的資料集

Faster RCNN 訓練自己的資料集遇到的一些問題

不能再詳細！！！手把手教你用Faster-RCNN訓練自己的資料集

深度學習Caffe實戰筆記（20）Windows平臺 Faster-RCNN 訓練自己的資料集

利用faster rcnn 訓練自己的資料集——kitti資料集

win10 tensorflow faster rcnn訓練自己的資料集（一、製作VOC2007資料集）

使用faster rcnn訓練自己的資料（py-faster-rcnn ）

faster-rcnn訓練自己的資料時報錯too many indices

《錯誤手記-01》 facenet使用預訓練模型fine-tune重新訓練自己資料集報錯

Kaldi中thchs30訓練自己資料集的步驟

yolov3訓練自己資料集可參考文章

Yolov3訓練自己資料集+資料分析

【YOLO初探】之 keras-yolov3訓練自己資料集

YOLOv3在訓練自己資料集時的速度慢問題

Faster-RCNN 訓練自己資料集的坑記錄

相關推薦