
Python learning: these new Python 3 features are really handy


January 31, 2018 14:02:18
																														</div>
		</div>
	</div>
</div>
<article>
	<div id="article_content" class="article_content clearfix csdn-tracking-statistics" data-pid="blog" data-mod="popu_307" data-dsm="post">
							            <div class="markdown_views">
<h3 id="概述"><a name="t0"></a>Overview</h3>

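As Python is used more and more widely in machine learning and data science, the number of Python libraries is growing very fast. But Python itself has a serious problem: Python 2 and Python 3 are incompatible with each other, and many Python 2 open-source libraries on GitHub do not work with Python 3, so many Python 2 users are reluctant to migrate to Python 3.

Python 3 changes a lot and fixes many of Python 2's shortcomings, and its standard library has grown considerably, for example with libraries for coroutines (such as asyncio). Below are some of the features Python 3 provides, as better reasons to migrate from Python 2 to Python 3.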

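File path handling: pathlib

Anyone who has used Python 2 has probably used os.path to deal with all kinds of path problems, for example joining file paths with os.path.join(). In Python 3, pathlib does this conveniently: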

from pathlib import Path

dataset = 'wiki_images'
datasets_root = Path('/path/to/datasets/')

train_path = datasets_root / dataset / 'train'
test_path = datasets_root / dataset / 'test'

for image_path in train_path.iterdir():
    with image_path.open('rb') as f:  # note: open is a method of the Path object; 'rb' for binary image data
        pass  # do something with an image

Compared with os.path.join(), pathlib is less error-prone, more convenient and more readable. pathlib also has many other features:

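from pathlib import Path

p = Path('/path/to/image.png')  # any Path object
p.exists()                  # does the path exist?
p.is_dir()                  # is it a directory?
p.parts                     # a property, not a method: ('/', 'path', 'to', 'image.png')
p.with_name('sibling.png')  # only change the name, but keep the folder
p.with_suffix('.jpg')       # only change the extension, but keep the folder and the name
p.chmod(0o644)              # change the permission bits (mode)
p.rmdir()                   # remove the directory (it must be empty)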
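The following is a minimal, self-contained sketch of several of these methods. It runs inside a temporary directory, so nothing real is touched. The folder and file names (`images`, `cat.png`, `dog.png`) are made up for illustration.

```python
import tempfile
from pathlib import Path

# Work in a throwaway directory so the example never touches real files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    img = root / 'images' / 'cat.png'      # '/' joins path components
    img.parent.mkdir(parents=True)         # create the 'images' folder
    img.write_bytes(b'\x89PNG')            # tiny placeholder file

    exists = img.exists()                  # True
    is_dir = img.parent.is_dir()           # True
    last_parts = img.parts[-2:]            # ('images', 'cat.png')
    sibling = img.with_name('dog.png')     # same folder, new name
    as_jpg = img.with_suffix('.jpg')       # same folder and stem, new extension

    img.unlink()                           # delete the file...
    img.parent.rmdir()                     # ...so the now-empty folder can be removed
    removed = not img.parent.exists()      # True

print(exists, is_dir, last_parts, sibling.name, as_jpg.name, removed)
```

`with_name` and `with_suffix` only compute new paths. They do not rename anything on disk.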

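Type hints

Type hints help catch typos and type errors in complex projects. In Python 2 this depended on the IDE inferring types. Different IDEs inferred them differently, and the result was only a hint, never a hard constraint. Consider the following code, where the argument could be a numpy.array, astropy.Table or astropy.Column, bcolz, cupy, mxnet.ndarray, and so on: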

import numpy

def repeat_each_entry(data):
    """ Each entry in the data is doubled
    <blah blah nobody reads the documentation till the end>
    """
    index = numpy.repeat(numpy.arange(len(data)), 2)
    return data[index]

The same function also accepts a pandas.Series argument, but then it goes wrong at runtime:

import pandas

repeat_each_entry(pandas.Series(data=[0, 1, 2], index=[3, 4, 5]))  # integers are looked up as index labels: a wrong result or an error, depending on the pandas version

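And this is just one function. A large project has many such functions, and the code can easily go off the rails. Well-defined parameter types are therefore very important in large projects. Python 2 has no annotation syntax for this, but Python 3 does: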

from typing import Union
def repeat_each_entry(data: Union[numpy.ndarray, bcolz.carray]):  # body as before

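Some IDEs, such as JetBrains' PyCharm, already check type hints, so if you use one of them you can rely on its built-in checks. If, like me, you use the Sublime Text editor, the third-party tool mypy can do the checking.

PS: type hints currently have limited support for ndarrays/tensors.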
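Type hints in Python 3 are metadata: the interpreter stores them but does not check them, which is why a separate checker such as PyCharm or mypy is needed. The small sketch below uses a hypothetical `scale` function to make this visible.

```python
from typing import List, Union, get_type_hints


def scale(values: List[float], factor: Union[int, float] = 2) -> List[float]:
    """Multiply every value by factor."""
    return [v * factor for v in values]


# The annotations are stored on the function object...
hints = get_type_hints(scale)

# ...but Python itself does not enforce them: a str is not a List[float],
# yet the call runs, because iterating 'ab' yields 'a' and 'b'.
result = scale('ab', 2)  # ['aa', 'bb']
```

Running mypy over a file containing the last call should report an incompatible argument type for `'ab'`.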

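Runtime type checking

Normally, function annotations only help people understand the code, and the interpreter does nothing else with them. You can use the third-party enforce package to check types at runtime.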

from typing import List

import enforce  # third-party package

@enforce.runtime_validation
def foo(text: str) -> None:
    print(text)

foo('Hi') # ok
foo(5)    # fails: enforce raises an exception

@enforce.runtime_validation
def any2(x: List[bool]) -> bool:
    return any(x)

any ([False, False, True, False]) # True
any2([False, False, True, False]) # True

any (['False']) # True
any2(['False']) # fails: 'False' is a str, not a bool

any ([False, None, "", 0]) # False
any2([False, None, "", 0]) # fails: not every element is a bool
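`enforce` is a third-party package. To show the idea behind runtime validation without it, here is a minimal hand-rolled decorator sketch. The name `check_types` is hypothetical. Unlike `enforce`, it only checks plain classes such as `str` or `int`, not generics like `List[bool]`.

```python
import functools
import inspect


def check_types(func):
    """Check arguments against plain-class annotations (e.g. str, int) at call time."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = sig.parameters[name].annotation
            # Skip unannotated parameters and non-class annotations such as List[bool].
            if expected is inspect.Parameter.empty or not isinstance(expected, type):
                continue
            if not isinstance(value, expected):
                raise TypeError(f'{name}={value!r} is not {expected.__name__}')
        return func(*args, **kwargs)

    return wrapper


@check_types
def foo(text: str) -> None:
    print(text)


foo('Hi')  # ok, prints Hi
try:
    foo(5)  # rejected before foo's body runs
except TypeError as exc:
    error_message = str(exc)  # 'text=5 is not str'
```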

The @ operator for matrix multiplication (Python 3.5+)

Consider the following code:

import numpy as np

# l2-regularized linear regression: ||AX - b||^2 + alpha * ||X||^2 -> min
# (A, b and alpha are assumed to be defined already)

# Python 2
X = np.linalg.inv(np.dot(A.T, A) + alpha * np.eye(A.shape[1])).dot(A.T.dot(b))
# Python 3
X = np.linalg.inv(A.T @ A + alpha * np.eye(A.shape[1])) @ (A.T @ b)

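With @, the code becomes more readable. Because other array libraries such as tensorflow also implement the operator, the same expression also ports more easily between numpy, tensorflow and similar libraries.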
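`@` is an ordinary operator that any class can support by defining `__matmul__` (PEP 465, Python 3.5+), which is how numpy arrays and other array types share the same syntax. The toy sketch below illustrates this. The `Matrix` class is hypothetical and exists only for illustration.

```python
class Matrix:
    """Tiny pure-Python matrix, just enough to demonstrate the @ operator."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __matmul__(self, other):
        # 'a @ b' calls a.__matmul__(b)
        cols = list(zip(*other.rows))
        return Matrix([[sum(x * y for x, y in zip(row, col)) for col in cols]
                       for row in self.rows])

    def __repr__(self):
        return f'Matrix({self.rows})'


A = Matrix([[1, 2], [3, 4]])
B = Matrix([[5, 6], [7, 8]])
C = A @ B  # Matrix([[19, 22], [43, 50]])
```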

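Recursive file search with the ** wildcard

In Python 2, searching for files recursively is awkward even with the glob module. In Python 3.5+, glob supports the ** wildcard together with recursive=True, which makes it simple.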

import glob

# Python 2
found_images = \
    glob.glob('/path/*.jpg') \
  + glob.glob('/path/*/*.jpg') \
  + glob.glob('/path/*/*/*.jpg') \
  + glob.glob('/path/*/*/*/*.jpg') \
  + glob.glob('/path/*/*/*/*/*.jpg') 

# Python 3
found_images = glob.glob('/path/**/*.jpg', recursive=True)

It works even better together with pathlib, introduced above:

# Python 3
import pathlib
found_images = pathlib.Path('/path/').glob('**/*.jpg')  # a lazy generator of Path objects
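The following sketch compares three ways of searching recursively. It runs against a small hypothetical directory tree created in a temporary folder.

```python
import glob
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: images at three depths plus one non-image file.
    for rel in ['a.jpg', 'x/b.jpg', 'x/y/c.jpg', 'x/notes.txt']:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()

    # '**' matches zero or more directories, so top-level files are found too.
    with_glob = sorted(Path(p).name for p in glob.glob(str(root / '**' / '*.jpg'), recursive=True))
    with_pathlib = sorted(p.name for p in root.glob('**/*.jpg'))
    with_rglob = sorted(p.name for p in root.rglob('*.jpg'))

print(with_glob)  # ['a.jpg', 'b.jpg', 'c.jpg']
```

`Path.rglob(pattern)` is shorthand for `Path.glob('**/' + pattern)`.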

The print function

Printing to a specific file

import sys

print >>sys.stderr, "critical error"      # Python 2
print("critical error", file=sys.stderr)  # Python 3

Joining values without str.join

# Python 3
print(*array, sep='\t')
print(batch, epoch, loss, accuracy, time, sep='\t')
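The small sketch below combines the `file=`, `sep=` and `end=` keyword arguments. It writes to an in-memory `io.StringIO` buffer instead of a real file, and the values are arbitrary.

```python
import io

buf = io.StringIO()  # any object with a write() method works as file=
values = [1, 2.5, 'three']

# No str() conversion or str.join needed: print converts each argument itself.
print(*values, sep='\t', file=buf)
print('loss', 0.25, sep='=', end=';\n', file=buf)

output = buf.getvalue()  # '1\t2.5\tthree\nloss=0.25;\n'
```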

Overriding print

# Python 3
_print = print # store the original print function
def print(*args, **kwargs):
    pass  # do something useful, e.g. store output to some file

Or, for example, the following code:

import builtins
import contextlib

@contextlib.contextmanager
def replace_print():
    _print = builtins.print  # save the original print function
    # or use some other function here
    builtins.print = lambda *args, **kwargs: _print('new printing', *args, **kwargs)
    try:
        yield
    finally:
        builtins.print = _print  # restore it even if the block raises

with replace_print():
    print('hello')  # code here invokes the replaced print: "new printing hello"

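Although this code does override print, it is not recommended: while the block runs, it changes print for every module in the program.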
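If the goal is only to send `print` output somewhere else, the standard library already covers it. `contextlib.redirect_stdout` (Python 3.4+) temporarily swaps `sys.stdout` and leaves `builtins.print` untouched. This is a swapped-in technique rather than the code above. Here is a minimal sketch:

```python
import contextlib
import io

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    print('captured')  # goes into buf instead of the console

print_output = buf.getvalue()  # 'captured\n'
```

It still has a global side effect on `sys.stdout`, so it is not a good fit for library code or threaded programs.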

String formatting

Python 2's string formatting is verbose and clumsy. We would typically write something like this to log progress:

# Python 2
print('{batch:3} {epoch:3} / {total_epochs:3}  accuracy: {acc_mean:0.4f}±{acc_std:0.4f} time: {avg_time:3.2f}'.format(
    batch=batch, epoch=epoch, total_epochs=total_epochs, 
    acc_mean=numpy.mean(accuracies), acc_std=numpy.std(accuracies),
    avg_time=time / len(data_batch)
))

# Python 2 (too error-prone during fast modifications, please avoid):
print('{:3} {:3} / {:3}  accuracy: {:0.4f}±{:0.4f} time: {:3.2f}'.format(
    batch, epoch, total_epochs, numpy.mean(accuracies), numpy.std(accuracies),
    time / len(data_batch)
))

The output is:

120  12 / 300  accuracy: 0.8180±0.4649 time: 56.60

With f-strings, introduced in Python 3.6, this is much simpler:

# Python 3.6+
print(f'{batch:3} {epoch:3} / {total_epochs:3}  accuracy: {numpy.mean(accuracies):0.4f}±{numpy.std(accuracies):0.4f} time: {time / len(data_batch):3.2f}')
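The quick check below shows that an f-string and the equivalent `str.format` call produce the same text, since both use the same format-spec mini-language (`:3`, `:0.4f`). The values are made up.

```python
batch, epoch, total_epochs = 120, 12, 300
accuracies = [0.5, 1.0]

old = '{batch:3} {epoch:3} / {total:3} acc: {acc:0.4f}'.format(
    batch=batch, epoch=epoch, total=total_epochs,
    acc=sum(accuracies) / len(accuracies))
new = f'{batch:3} {epoch:3} / {total_epochs:3} acc: {sum(accuracies) / len(accuracies):0.4f}'
# both are '120  12 / 300 acc: 0.7500'
```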

f-strings are also handy for building queries or generating code snippets:

query = f"INSERT INTO STATION VALUES (13, '{city}', '{state}', {latitude}, {longitude})"
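One caution: interpolating values into SQL with an f-string breaks on quotes in the data and is open to SQL injection. For values (not table or column names), DB-API placeholders are safer. Here is a sketch using the standard `sqlite3` module with an in-memory database. The table layout and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE STATION (ID INTEGER, CITY TEXT, STATE TEXT, LAT REAL, LON REAL)')

city, state, latitude, longitude = "Coeur d'Alene", 'ID', 47.68, -116.78

# An f-string would produce broken SQL here because of the apostrophe;
# '?' placeholders let sqlite3 pass the values safely.
conn.execute('INSERT INTO STATION VALUES (13, ?, ?, ?, ?)',
             (city, state, latitude, longitude))

row = conn.execute('SELECT CITY, STATE FROM STATION WHERE ID = 13').fetchone()
conn.close()
```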

Strict ordering

The following comparisons are illegal in Python 3 (they raise TypeError):

# All these comparisons are illegal in Python 3
3 < '3'
2 < None
(3, 4) < (3, None)
(4, 5) < [4, 5]

# False in both Python 2 and Python 3
(4, 5) == [4, 5]

Values of different types cannot be sorted together:

sorted([2, '1', 3])  # invalid for Python 3, in Python 2 returns [2, 3, '1']
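The sketch below shows what happens with mixed types in Python 3. It also shows how to get a deterministic order anyway by stating the comparison key explicitly. The choice of key is up to you, and `int` and `str` are just examples.

```python
values = [10, '9', 3]

try:
    sorted(values)
    mixed_sort_failed = False
except TypeError:  # Python 3 refuses to order an int against a str
    mixed_sort_failed = True

# If a mixed ordering is really intended, state it with a key function:
by_number = sorted(values, key=int)  # [3, '9', 10]: compare numerically
by_text = sorted(values, key=str)    # [10, 3, '9']: compare as strings
```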

Unicode and NLP

s = '您好'
print(len(s))
print(s[:2])

Output:

Python 2: 6, then ��
Python 3: 2, then 您好


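x = u'со'  # Cyrillic letters
x += 'co'  # Latin letters: ok
x += 'со'  # Cyrillic letters in a plain str: UnicodeDecodeError in Python 2, fine in Python 3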

The code above fails in Python 2 but runs fine in Python 3. All Python 3 strings are Unicode, which is very convenient for NLP. More examples:

'a' < type < u'a'  # Python 2: True
'a' < u'a'         # Python 2: False

from collections import Counter
Counter('Möbelstück')
# Python 2: Counter({'\xc3': 2, 'b': 1, 'e': 1, 'c': 1, 'k': 1, 'M': 1, 'l': 1, 's': 1, 't': 1, '\xb6': 1, '\xbc': 1})
# Python 3: Counter({'M': 1, 'ö': 1, 'b': 1, 'e': 1, 'l': 1, 's': 1, 't': 1, 'ü': 1, 'c': 1, 'k': 1})
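The sketch below illustrates the Python 3 model: `str` holds Unicode code points and `bytes` is a separate type, so lengths and slices count characters. The byte count assumes UTF-8 encoding.

```python
from collections import Counter

s = '您好'
text_len = len(s)                  # 2 characters (code points)
utf8_len = len(s.encode('utf-8'))  # 6 bytes: what Python 2 counted
first_two = s[:2]                  # '您好', no broken bytes

counts = Counter('Möbelstück')     # counts characters, not UTF-8 bytes
```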

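Dictionaries

In CPython 3.6+, dict preserves insertion order by default and behaves much like OrderedDict. From Python 3.7, this is guaranteed by the language.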

import json
x = {str(i):i for i in range(5)}
json.loads(json.dumps(x))
# Python 2
{u'1': 1, u'0': 0, u'3': 3, u'2': 2, u'4': 4}
# Python 3
{'0': 0, '1': 1, '2': 2, '3': 3, '4': 4}
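The quick sketch below shows what insertion order means in practice. It also shows one remaining difference: `OrderedDict` equality is order-sensitive, while plain `dict` equality is not.

```python
import json
from collections import OrderedDict

x = {str(i): i for i in range(5)}
x['a'] = 'new'  # a new key is appended at the end
del x['2']      # deleting a key keeps the order of the others

keys = list(x)                                # ['0', '1', '3', '4', 'a']
round_trip = list(json.loads(json.dumps(x)))  # the JSON round trip keeps that order

plain_equal = {'a': 1, 'b': 2} == {'b': 2, 'a': 1}              # True: order ignored
ordered_equal = OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)  # False: order compared
```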

Likewise, since Python 3.6 the items in **kwargs keep the order in which the arguments were passed:

from collections import OrderedDict

from torch import nn

# Python 2
model = nn.Sequential(OrderedDict([
          ('conv1', nn.Conv2d(1,20,5)),
          ('relu1', nn.ReLU()),
          ('conv2', nn.Conv2d(20,64,5)),
          ('relu2', nn.ReLU())
        ]))

# Python 3.6+, how it *can* be done, not supported right now in pytorch
model = nn.Sequential(
    conv1=nn.Conv2d(1,20,5),
    relu1=nn.ReLU(),
    conv2=nn.Conv2d(20,64,5),
    relu2=nn.ReLU())
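The minimal sketch below shows why ordered `**kwargs` matters for APIs like the one above. `MiniSequential` is a hypothetical stand-in, not PyTorch's `nn.Sequential`.

```python
class MiniSequential:
    """Hypothetical container: applies keyword-named steps in the order they were passed."""

    def __init__(self, **steps):
        self.steps = dict(steps)  # Python 3.6+: keeps the keyword order

    def __call__(self, x):
        for step in self.steps.values():
            x = step(x)
        return x


model = MiniSequential(double=lambda x: 2 * x, inc=lambda x: x + 1)
# model(5) == 11: double first (10), then inc (11); the reverse order would give 12
```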

Iterable unpacking

# handy when amount of additional stored info may vary between experiments, but the same code can be used in all cases
model_parameters, optimizer_parameters, *other_params = load(checkpoint_name)

# picking the last two values from a sequence
*prev, next_to_last, last = values_history

# This also works with any iterable, so if you have a function that yields e.g. qualities,
# below is a simple way to take only the last two values
*prev, next_to_last, last = iter_train(args)
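Here is a runnable sketch of the last case. `iter_train` is a hypothetical generator that simply yields the epoch index as its "quality".

```python
def iter_train(n_epochs):
    """Hypothetical generator: yields one quality value (here just the epoch index) per epoch."""
    for epoch in range(n_epochs):
        yield epoch


*prev, next_to_last, last = iter_train(5)  # prev == [0, 1, 2], next_to_last == 3, last == 4

first, *rest = 'abc'                       # first == 'a', rest == ['b', 'c']
```

The starred name always receives a list, possibly empty. Unpacking raises `ValueError` if the iterable has fewer items than there are non-starred names.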

A more efficient default pickle protocol

# Python 2
import cPickle as pickle
import numpy
print len(pickle.dumps(numpy.random.normal(size=[1000, 1000])))
# result: 23691675

# Python 3
import pickle
import numpy
len(pickle.dumps(numpy.random.normal(size=[1000, 1000])))
# result: 8000162

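The pickled array is about one third of its Python 2 size, because Python 2's pickle.dumps defaults to the text-based protocol 0 while Python 3 defaults to a binary protocol.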
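The sketch below makes the protocol difference visible without numpy. It pickles the same list with protocol 0 (Python 2's default) and with Python 3's default protocol. The exact byte counts depend on the Python version, so only their relative size matters here.

```python
import pickle

data = list(range(1000))

# Protocol 0 is the old text-based format; Python 3 defaults to a binary
# protocol (3 in Python 3.0-3.7, 4 from Python 3.8).
size_protocol_0 = len(pickle.dumps(data, protocol=0))
size_default = len(pickle.dumps(data))

print(pickle.DEFAULT_PROTOCOL, size_protocol_0, size_default)
```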

Safer list comprehensions

labels = ...  # some initial value
predictions = [model.predict(data) for data, labels in dataset]

# labels are overwritten in Python 2
# labels are not affected by comprehension in Python 3
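Here is a runnable version of the example above. It uses a hypothetical `dataset`, with `str.upper` standing in for `model.predict`.

```python
labels = 'initial value'
dataset = [('x1', 'y1'), ('x2', 'y2')]  # hypothetical (data, labels) pairs

predictions = [data.upper() for data, labels in dataset]  # ['X1', 'X2']

# In Python 3 the loop variables live in the comprehension's own scope,
# so the outer 'labels' is untouched (in Python 2 it would now be 'y2').
```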

Simpler super()

# Python 2
class MySubClass(MySuperClass):
    def __init__(self, name, **options):
        super(MySubClass, self).__init__(name='subclass', **options)

# Python 3
class MySubClass(MySuperClass):
    def __init__(self, name, **options):
        super().__init__(name='subclass', **options)
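Here is a runnable sketch with a minimal hypothetical `MySuperClass`. Zero-argument `super()` works because the compiler gives methods defined in a class body an implicit reference to that class. For the same reason, it only works inside such methods.

```python
class MySuperClass:
    """Hypothetical base class that just records what it receives."""

    def __init__(self, name, **options):
        self.name = name
        self.options = options


class MySubClass(MySuperClass):
    def __init__(self, name, **options):
        # Equivalent to super(MySubClass, self).__init__(...)
        super().__init__(name='subclass', **options)


obj = MySubClass('ignored', verbose=True)
```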

Multiple unpacking

Merging two dicts

x = dict(a=1, b=2)
y = dict(b=3, d=4)
# Python 3.5+
z = {**x, **y} 
# z = {'a': 1, 'b': 3, 'd': 4}, note that value for `b` is taken from the latter dict.
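The sketch below shows the merge rules: later mappings win on duplicate keys, and literal entries can be mixed in. Python 3.9 later added the `x | y` operator for the same merge.

```python
x = dict(a=1, b=2)
y = dict(b=3, d=4)

z = {**x, **y}           # {'a': 1, 'b': 3, 'd': 4}: y's 'b' wins
z_rev = {**y, **x}       # {'b': 2, 'd': 4, 'a': 1}: now x's 'b' wins
z_extra = {**x, 'e': 5}  # {'a': 1, 'b': 2, 'e': 5}
```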

In Python 3.5+ this works not only for dicts: lists, tuples and sets can be combined just as easily.

[*a, *b, *c] # list, concatenating 
(*a, *b, *c) # tuple, concatenating 
{*a, *b, *c} # set, union 
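Here is a small sketch with concrete inputs. The operands can be any iterables, and they need not be of the same type.

```python
a, b, c = [1, 2], (2, 3), range(3, 5)

as_list = [*a, *b, *c]   # [1, 2, 2, 3, 3, 4]
as_tuple = (*a, *b, *c)  # (1, 2, 2, 3, 3, 4)
as_set = {*a, *b, *c}    # {1, 2, 3, 4}: duplicates collapse
```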
# Python 3.5+
do_something(**{**default_settings, **custom_settings})

# Also possible, this code also checks there is no intersection between keys of dictionaries
do_something(**first_args, **second_args)
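The sketch below runs both calls with a hypothetical `do_something` that simply returns its keyword arguments. The settings dicts are made up.

```python
def do_something(**kwargs):
    """Hypothetical function that just returns the keyword arguments it received."""
    return kwargs


default_settings = {'lr': 0.1, 'epochs': 10}
custom_settings = {'lr': 0.01}

merged = do_something(**{**default_settings, **custom_settings})  # custom 'lr' wins

try:
    do_something(**default_settings, **custom_settings)  # 'lr' given twice
    duplicate_rejected = False
except TypeError:
    duplicate_rejected = True
```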

Integer types

Python 2 has two integer types, int and long. Python 3 has only one, int, which has arbitrary precision. For example:

import numbers

isinstance(x, numbers.Integral) # Python 2, the canonical way
isinstance(x, (long, int))      # Python 2
isinstance(x, int)              # Python 3, easier to remember
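The sketch below shows what a single `int` type means in practice. The values are chosen for illustration.

```python
import numbers
import sys

big = 2 ** 100                                   # arbitrary precision, no overflow
past_machine_word = big > sys.maxsize            # True: larger than a machine word, still an int
same_type = type(big) is type(1)                 # True: no separate long type
is_integral = isinstance(big, numbers.Integral)  # True: the abstract check still works
```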

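Summary

Python 3 offers many new features that make coding more convenient while also improving safety and performance, and the Python developers have long recommended migrating to Python 3 as soon as possible. Of course, the cost of migration varies from system to system. I hope this article helps you move from Python 2 to Python 3.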

				</article>