Linux Tips 7: Converting GBK to UTF-8

Converting the encoding of file contents

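Java source code edited on Windows often shows garbled Chinese characters when opened on Linux. The cause is the file encoding: files written on Windows are usually GBK, while Linux generally expects UTF-8.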
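
To see why the same text is unreadable under the wrong encoding, it can help to look at the raw bytes. The following is a minimal sketch, assuming iconv and od are available and that the script itself is saved as UTF-8; hex_bytes is a hypothetical helper name introduced only for this illustration.

```shell
#!/bin/sh
# hex_bytes (hypothetical helper): print stdin as one run of hex digits.
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
    echo
}

# The two characters below are stored as UTF-8 because this script is UTF-8.
printf '中文' | hex_bytes                           # e4b8ade69687 (6 bytes in UTF-8)

# The same two characters re-encoded as GBK occupy different bytes.
printf '中文' | iconv -f UTF-8 -t GBK | hex_bytes   # d6d0cec4 (4 bytes in GBK)
```

A program that assumes UTF-8 cannot decode the GBK byte sequence, which is why the text shows up garbled.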

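In vim, the set fileencoding command shows which encoding vim is using for the file. Note that for a GBK file, vim with its default settings typically does not detect GBK and falls back to latin1: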

// file created on Linux
fileencoding=utf-8
// file created on Windows
fileencoding=latin1

The simplest fix is to save the file as UTF-8 on Windows in the first place. On Linux, the iconv tool can convert the file to another encoding.

$ iconv --help
Usage: iconv [OPTION...] [FILE...]
Convert encoding of given files from one encoding to another.

 Input/Output format specification:
  -f, --from-code=NAME       encoding of original text
  -t, --to-code=NAME         encoding for output

 Information:
  -l, --list                 list all known coded character sets

 Output control:
  -c                         omit invalid characters from output
  -o, --output=FILE          output file
  -s, --silent               suppress warnings
      --verbose              print progress information

  -?, --help                 Give this help list
      --usage                Give a short usage message
  -V, --version              Print program version

$ iconv -f GBK -t UTF-8 test.java -o test2.java

Once the conversion is done, the garbled Chinese characters are gone.
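
Running the conversion a second time on a file that is already UTF-8 may fail or garble it, so a guard can be useful. The sketch below is one possible approach rather than part of the original workflow: to_utf8 is a hypothetical helper name, and the check relies on iconv exiting with a non-zero status when its input is not valid in the declared source encoding. It uses shell redirection instead of -o, so it does not depend on that option being available.

```shell
#!/bin/sh
# to_utf8 FILE (hypothetical helper): convert FILE from GBK to UTF-8 in place,
# skipping files that already decode cleanly as UTF-8.
to_utf8() {
    src=$1

    # A clean UTF-8 -> UTF-8 pass means the file is already valid UTF-8.
    if iconv -f UTF-8 -t UTF-8 "$src" >/dev/null 2>&1; then
        echo "skip (already UTF-8): $src"
        return 0
    fi

    # Convert into a temporary file first so a failed run never
    # leaves a half-written original behind.
    tmp=$(mktemp) || return 1
    if iconv -f GBK -t UTF-8 "$src" >"$tmp"; then
        # Overwrite the contents (rather than mv) to keep the file's permissions.
        cat "$tmp" >"$src"
        rm -f "$tmp"
        echo "converted: $src"
    else
        rm -f "$tmp"
        echo "failed: $src" >&2
        return 1
    fi
}
```

For example, to_utf8 test.java would convert the file from the example above in place. Pure ASCII files are also valid UTF-8, so they are skipped unchanged.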

Converting files in batch

1. Recreate the directory structure

$ find com -type d -exec mkdir -p com2/{} \;

2. Convert the files

$ find com -type f -exec iconv -f GBK -t UTF-8 {} -o com2/{} \;
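
The two steps can also be combined into one small script. This is a sketch under a few assumptions, not the only way to do it: convert_tree is a hypothetical helper name, file names are assumed not to contain newlines, and every regular file in the tree is assumed to be GBK text. Unlike the commands above, the output lands directly under the destination directory (com2/...) rather than under com2/com/..., and redirection is used instead of -o.

```shell
#!/bin/sh
# convert_tree SRC DST (hypothetical helper): mirror SRC into DST,
# converting every regular file from GBK to UTF-8.
convert_tree() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1

    # Step 1: recreate the directory structure under DST.
    (cd "$src" && find . -type d) | while IFS= read -r dir; do
        mkdir -p "$dst/$dir"
    done

    # Step 2: convert each file into the matching location under DST.
    (cd "$src" && find . -type f) | while IFS= read -r file; do
        iconv -f GBK -t UTF-8 "$src/$file" >"$dst/$file" ||
            echo "failed: $file" >&2
    done
}
```

With the directories from the example, the call would be convert_tree com com2. Files that iconv cannot decode as GBK are reported on standard error so they can be inspected by hand.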

Converting file and directory names

For this, the convmv tool is needed.

$ convmv
Your Perl version has fleas #22111 #37757 #49830
convmv 1.15 - converts filenames from one encoding to another
Copyright (C) 2003-2011 Bjoern JACKE <[email protected]>

 USAGE: convmv [options] FILE(S)
-f enc     encoding *from* which should be converted
-t enc     encoding *to* which should be converted
-r         recursively go through directories
-i         interactive mode (ask for each action)
--nfc      target files will be normalization form C for UTF-8 (Linux etc.)
--nfd      target files will be normalization form D for UTF-8 (OS X etc.)
--qfrom    be quiet about the "from" of a rename (if it screws up your terminal e.g.)
--qto      be quiet about the "to" of a rename (if it screws up your terminal e.g.)
--exec c   execute command instead of rename (use #1 and #2 and see man page)
--list     list all available encodings
--lowmem   keep memory footprint low (see man page)
--nosmart  ignore if files already seem to be UTF-8 and convert if posible
--notest   actually do rename the files
--replace  will replace files if they are equal
--unescape convert%20ugly%20escape%20sequences
--upper    turn to upper case
--lower    turn to lower case
--parsable write a parsable todo list (see man page)
--help     print this help

To recursively convert the names of all directories and files under the tech directory:

$ sudo convmv -f gbk -t utf-8 -r --notest tech/
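
Because convmv only renames files when --notest is given, the same command without that option works as a dry run. The sketch below wraps this into a preview-then-confirm step; preview_then_rename is a hypothetical helper name, and the prompt is an addition for illustration rather than a convmv feature.

```shell
#!/bin/sh
# preview_then_rename DIR (hypothetical helper): show the planned renames,
# then apply them only after confirmation.
preview_then_rename() {
    dir=$1

    # Without --notest, convmv only prints what it would rename.
    convmv -f gbk -t utf-8 -r "$dir" </dev/null

    printf 'Apply these renames? [y/N] '
    read -r answer
    case $answer in
        [yY]*) convmv -f gbk -t utf-8 -r --notest "$dir" </dev/null ;;
        *)     echo "nothing renamed" ;;
    esac
}
```

Calling preview_then_rename tech/ would first list the planned renames for the directory from the example. Whether sudo is needed depends on who owns the files.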

Also note that zip archives created on Windows can sometimes introduce the same kind of garbled text.