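27 File Uploads in Django
Having covered Django's Form module, we'll close this part of the course with one last common scenario: file uploads.
1. File upload experiments in Django
As before, let's get straight to it and look at Django's upload support through three upload examples.
Experiment 1: A simple file upload
Prepare a local file, upload.txt, to be uploaded to the /root/test/django directory on the server.
Prepare the template file that displays the upload button: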
<form method="post" action="/hello/file_upload/" enctype="multipart/form-data">
    {% csrf_token %}
    {{ forms }}<br>
    <input type="submit" value="Submit">
</form>
Next, write the Form class and the view function:
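import os

from django import forms
from django.http import HttpResponse
from django.shortcuts import render
from django.views.decorators.csrf import csrf_exempt

class FileUploadForm(forms.Form):
    file = forms.FileField(label="File upload")

def handle_uploaded_file(f):
    # Save the upload under /root/test/django, writing it chunk by chunk
    # so that a large file is never read into memory all at once.
    save_path = os.path.join('/root/test/django', f.name)
    with open(save_path, 'wb+') as fp:
        for chunk in f.chunks():
            fp.write(chunk)

@csrf_exempt
def file_upload(request, *args, **kwargs):
    error_msg = ""
    if request.method == 'POST':
        # Uploaded files arrive in request.FILES, not in request.POST
        form = FileUploadForm(request.POST, request.FILES)
        if form.is_valid():
            handle_uploaded_file(request.FILES['file'])
            return HttpResponse('Upload succeeded')
        error_msg = "Upload failed"
    else:
        form = FileUploadForm()
    return render(request, 'test_file_upload.html', {'forms': form, "error_msg": error_msg})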
Write the URLconf:
urlpatterns = [
    # ...
    # File upload test
    path('file_upload/', views.file_upload)
]
Those few steps are all a simple file upload needs. Next, start the development server and test the upload in the browser.
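Besides the browser, a short script can exercise the view. The sketch below uses only the standard library to build a multipart/form-data body by hand and POST it; because file_upload is decorated with @csrf_exempt, no CSRF token is needed. The helper names, the random boundary and the URL in the usage comment (Django's default runserver address plus the /hello/ prefix from the form's action) are assumptions, not part of the original project.

```python
import os
import urllib.request
import uuid


def build_multipart(field_name, filename, content, boundary=None):
    """Encode one file as a multipart/form-data body; returns (body, content_type)."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()  # closing boundary ends the body
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def upload(url, path, field_name="file"):
    """POST a local file to url; only works while the development server is running."""
    with open(path, "rb") as fp:
        content = fp.read()
    body, content_type = build_multipart(field_name, os.path.basename(path), content)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode()


# Example (assumed address of the dev server):
# print(upload("http://127.0.0.1:8000/hello/file_upload/", "upload.txt"))
```

The field name defaults to "file" so that it matches the file field of FileUploadForm.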
Experiment 2: Handling uploaded files with a model
Step 1: set MEDIA_ROOT in settings.py. It is the root directory under which uploaded files are stored.
# first_django_app/settings.py
# ...
MEDIA_ROOT = '/root/test/'
# ...
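Step 2: prepare the model class for file uploads:
# hello_app/models.py
from django.db import models
# ...
class FileModel(models.Model):
    # max_length=20: longer file names exceed this column's limit
    name = models.CharField('Uploaded file name', max_length=20)
    upload_file = models.FileField(upload_to='django')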
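Note: the upload_to argument, together with MEDIA_ROOT in settings.py, determines where uploaded files are stored; here they end up in /root/test/django/. upload_to can take several forms, for example upload_to='django/%Y/%m/%d', whose strftime placeholders are replaced with the upload date. It also accepts a callable, which receives the model instance and the original file name and returns the storage path (including the file name) relative to MEDIA_ROOT.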
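As a sketch of the callable form, the hypothetical function below reproduces the date-based layout of upload_to='django/%Y/%m/%d' and also keeps only the base name of the uploaded file. Per Django's documentation, the callable receives the model instance and the original file name and must return a Unix-style (forward-slash) path, file name included, relative to MEDIA_ROOT. The model field in the last comment requires Django and is shown only for context.

```python
import datetime
import os


def dated_upload_path(instance, filename):
    """Hypothetical upload_to callable: django/YYYY/MM/DD/<file name>."""
    today = datetime.date.today()
    # os.path.basename drops any directory part a client may have sent.
    return f"django/{today:%Y/%m/%d}/{os.path.basename(filename)}"


print(dated_upload_path(None, "upload.txt"))  # django/<year>/<month>/<day>/upload.txt

# In hello_app/models.py the field would then be declared as:
# upload_file = models.FileField(upload_to=dated_upload_path)
```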
Step 3: generate the table for this model with the following commands:
(django-manual) [root@server first_django_app]# python manage.py makemigrations hello_app
(django-manual) [root@server first_django_app]# python manage.py migrate hello_app
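Once these two commands have run, the corresponding table exists in the database. By default the table name is [app label]_[model class name in lowercase], i.e. hello_app_filemodel.
Step 4: prepare the corresponding view function and template: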
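# hello_app/views.py
from django.http import HttpResponse
from django.shortcuts import render

from .models import FileModel
# ...
def file_upload2(request, *args, **kwargs):
    if request.method == 'POST':
        # The key must match the name attribute of <input type="file"> in the template
        upload_file = request.FILES['file']
        # create() builds and saves the instance; saving the FileField writes the file to disk
        FileModel.objects.create(name=upload_file.name, upload_file=upload_file)
        return HttpResponse('Upload succeeded')
    return render(request, 'test_file_upload2.html', {})
# ...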
(django-manual) [root@server first_django_app]# cat templates/test_file_upload2.html
{% load static %}
<form method="post" action="/hello/file_upload2/" enctype="multipart/form-data">
    {% csrf_token %}
    <label>Choose a file to upload:</label><input type="file" name="file">
    <div><input type="submit" value="Submit" style="margin-top:10px"></div>
</form>
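Note: the file is saved slightly differently from Experiment 1. Here we only need to create and save a model instance (FileModel.objects.create() does both): the file is written automatically to the directory given by MEDIA_ROOT and upload_to, and a new row is added to the database.
Write the corresponding URLconf: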
# hello_app/urls.py
# ...
urlpatterns = [
    # ...
    # File upload test
    path('file_upload2/', views.file_upload2)
]
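Next, start the server as usual and test the upload in the browser.
Experiment 3: Uploading multiple files
Uploading several files at once is also fairly simple: changing a single line of front-end code lets the form submit multiple files in one go. Change the template as follows: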
<!-- Original line: <label>Choose a file to upload:</label><input type="file" name="file"> -->
<label>Choose files to upload:</label><input type="file" name="files" multiple="">
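Then make a small change to the view function:
def file_upload2(request, *args, **kwargs):
    if request.method == 'POST':
        # Get the list of uploaded files (request.FILES['files'] would return only the last one)
        upload_files = request.FILES.getlist('files')
        # Iterate over the files and save each one
        for f in upload_files:
            FileModel.objects.create(name=f.name, upload_file=f)
        return HttpResponse('Upload succeeded')
    return render(request, 'test_file_upload2.html', {})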
Finally, start the server and test uploading several files at once.
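Why getlist()? request.FILES is a MultiValueDict (from django.utils.datastructures), which can hold several values per key, and indexing it with [] returns only the last value. So request.FILES['files'] would see just one of the selected files. The toy class below is a hypothetical, stdlib-only imitation of those two lookups, not Django's implementation:

```python
class MiniMultiValueDict(dict):
    """Hypothetical, much-simplified stand-in for Django's MultiValueDict."""

    def __getitem__(self, key):
        # Like MultiValueDict, [] returns only the LAST value stored for key.
        return super().__getitem__(key)[-1]

    def getlist(self, key, default=None):
        # Return every value stored for key, or [] when the key is missing.
        if key in self:
            return list(super().__getitem__(key))
        return [] if default is None else default

    def appendlist(self, key, value):
        # dict.setdefault does not go through the overridden __getitem__.
        self.setdefault(key, []).append(value)


files = MiniMultiValueDict()
files.appendlist("files", "a.txt")  # two files submitted under one input name
files.appendlist("files", "b.txt")
print(files["files"])          # b.txt
print(files.getlist("files"))  # ['a.txt', 'b.txt']
```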
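2. Analysis of Django's file upload code
2.1 Basic classes related to file uploads in Django
This section analyzes the parts of Django's code that deal with file uploads. Let's start with a few basic classes.
FileProxyMixin: a mixin that forwards file methods to an underlying file object. Here is its source: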
# Source path: django/core/files/utils.py
class FileProxyMixin:
    """
    A mixin class used to forward file methods to an underlaying file
    object. The internal file object has to be called "file"::

        class FileProxy(FileProxyMixin):
            def __init__(self, file):
                self.file = file
    """

    encoding = property(lambda self: self.file.encoding)
    fileno = property(lambda self: self.file.fileno)
    flush = property(lambda self: self.file.flush)
    isatty = property(lambda self: self.file.isatty)
    newlines = property(lambda self: self.file.newlines)
    read = property(lambda self: self.file.read)
    readinto = property(lambda self: self.file.readinto)
    readline = property(lambda self: self.file.readline)
    readlines = property(lambda self: self.file.readlines)
    seek = property(lambda self: self.file.seek)
    tell = property(lambda self: self.file.tell)
    truncate = property(lambda self: self.file.truncate)
    write = property(lambda self: self.file.write)
    writelines = property(lambda self: self.file.writelines)

    @property
    def closed(self):
        return not self.file or self.file.closed

    def readable(self):
        if self.closed:
            return False
        if hasattr(self.file, 'readable'):
            return self.file.readable()
        return True

    def writable(self):
        if self.closed:
            return False
        if hasattr(self.file, 'writable'):
            return self.file.writable()
        return 'w' in getattr(self.file, 'mode', '')

    def seekable(self):
        if self.closed:
            return False
        if hasattr(self.file, 'seekable'):
            return self.file.seekable()
        return True

    def __iter__(self):
        return iter(self.file)
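Note: as the code shows, a class that inherits from this mixin must have an instance attribute named file for it to work. The attributes and methods forwarded by the mixin closely match those of the file object returned by Python's open(), as the shell experiment below confirms.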
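The forwarding trick itself is easy to reproduce. Below is a hypothetical, trimmed-down mixin (not Django's class) that forwards three methods in the same way, applied to an io.StringIO. Each property returns the wrapped object's bound method, so calling w.read(5) really calls self.file.read(5).

```python
import io


class MiniFileProxyMixin:
    """Hypothetical, trimmed-down version of the FileProxyMixin pattern."""

    # Each property looks up the bound method on the wrapped object at access time.
    read = property(lambda self: self.file.read)
    seek = property(lambda self: self.file.seek)
    tell = property(lambda self: self.file.tell)

    @property
    def closed(self):
        return not self.file or self.file.closed


class Wrapper(MiniFileProxyMixin):
    def __init__(self, file):
        self.file = file  # the mixin requires an attribute named "file"


w = Wrapper(io.StringIO("hello proxy"))
print(w.read(5))  # hello
print(w.tell())   # 5
w.seek(0)
print(w.closed)   # False
```

Because the lookup happens on every access, replacing self.file (as File.open() does when it reopens a closed file) automatically redirects all forwarded calls to the new object.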
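File: Django's base wrapper class around a file object; the uploaded-file classes are built on top of it. Let's go straight to the source: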
# Source path: django/core/files/base.py
class File(FileProxyMixin):
    DEFAULT_CHUNK_SIZE = 64 * 2 ** 10

    def __init__(self, file, name=None):
        self.file = file
        if name is None:
            name = getattr(file, 'name', None)
        self.name = name
        if hasattr(file, 'mode'):
            self.mode = file.mode

    def __str__(self):
        return self.name or ''

    def __repr__(self):
        return "<%s: %s>" % (self.__class__.__name__, self or "None")

    def __bool__(self):
        return bool(self.name)

    def __len__(self):
        return self.size

    @cached_property
    def size(self):
        if hasattr(self.file, 'size'):
            return self.file.size
        if hasattr(self.file, 'name'):
            try:
                return os.path.getsize(self.file.name)
            except (OSError, TypeError):
                pass
        if hasattr(self.file, 'tell') and hasattr(self.file, 'seek'):
            pos = self.file.tell()
            self.file.seek(0, os.SEEK_END)
            size = self.file.tell()
            self.file.seek(pos)
            return size
        raise AttributeError("Unable to determine the file's size.")

    def chunks(self, chunk_size=None):
        """
        Read the file and yield chunks of ``chunk_size`` bytes (defaults to
        ``File.DEFAULT_CHUNK_SIZE``).
        """
        chunk_size = chunk_size or self.DEFAULT_CHUNK_SIZE
        try:
            self.seek(0)
        except (AttributeError, UnsupportedOperation):
            pass

        while True:
            data = self.read(chunk_size)
            if not data:
                break
            yield data

    def multiple_chunks(self, chunk_size=None):
        """
        Return ``True`` if you can expect multiple chunks.

        NB: If a particular file representation is in memory, subclasses should
        always return ``False`` -- there's no good reason to read from memory in
        chunks.
        """
        return self.size > (chunk_size or self.DEFAULT_CHUNK_SIZE)

    # ...

    def open(self, mode=None):
        if not self.closed:
            self.seek(0)
        elif self.name and os.path.exists(self.name):
            self.file = open(self.name, mode or self.mode)
        else:
            raise ValueError("The file cannot be reopened.")
        return self

    def close(self):
        self.file.close()
Here we find the chunks() method we used to save the uploaded file in Experiment 1. Let's try the File class in Django's interactive shell to see what it can do.
(django-manual) [root@server first_django_app]# python manage.py shell
Python 3.8.1 (default, Dec 24 2019, 17:04:00)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from django.core.files import File
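A File instance must be associated with a file object when it is created. We use upload.txt, the file uploaded in Experiment 1, as the argument (its first line, 測試上傳檔案, is Chinese for "test upload file"):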
>>> fp = open('/root/test/django/upload.txt', 'r+')
>>> f = File(fp)
Now we can try out the various attributes and methods of the File object:
>>> f.name
'/root/test/django/upload.txt'
>>> f.size
47
# With a chunk size of 20 bytes, check whether the file would be read in more than one chunk
>>> f.multiple_chunks(20)
True
# The default chunk size is 64 KB; 47 bytes is far smaller, so a single chunk suffices
>>> f.multiple_chunks()
False
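We can use the chunks() method to read the file piece by piece and process each piece as needed:
>>> for c in f.chunks():
...     print('Chunk read: {}'.format(c))
...
Chunk read: 測試上傳檔案
xxxxx
spyinx test upload
>>> for c in f.chunks(20):
...     print('Chunk read: {}'.format(c))
...
Chunk read: 測試上傳檔案
xxxxx
spyinx
Chunk read:  test upload
We tried two forms above. In the first, the whole file is read in one go without chunking, because the default chunk size is larger than the file. In the second, we pass a smaller chunk size, so each read returns at most 20 units and we print each piece in turn. Because the file was opened in text mode ('r+'), read(20) counts characters rather than bytes, whereas multiple_chunks() compares the byte count reported by size.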
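The loop inside File.chunks() does not depend on Django, so it can be reproduced with the standard library alone. In this sketch, iter_chunks is a hypothetical helper name and the data is an in-memory binary stream, so chunk sizes are counted in bytes. A 47-byte payload split at 20 bytes gives pieces of 20, 20 and 7.

```python
import io

DEFAULT_CHUNK_SIZE = 64 * 2 ** 10  # same default as django.core.files.File


def iter_chunks(fp, chunk_size=None):
    """Yield successive pieces of fp, mimicking the loop in File.chunks()."""
    chunk_size = chunk_size or DEFAULT_CHUNK_SIZE
    try:
        fp.seek(0)  # start from the beginning, as File.chunks() does
    except (AttributeError, io.UnsupportedOperation):
        pass  # non-seekable streams are read from their current position
    while True:
        data = fp.read(chunk_size)
        if not data:
            break
        yield data


payload = io.BytesIO(b"x" * 47)  # 47 bytes, the same size as upload.txt above
print([len(c) for c in iter_chunks(payload, 20)])  # [20, 20, 7]
print([len(c) for c in iter_chunks(payload)])      # [47]
```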
Next, let's look at the two upload-related file classes, TemporaryUploadedFile and InMemoryUploadedFile. Both inherit from UploadedFile, which in turn inherits from File.
# Source path: django/core/files/uploadedfile.py
class UploadedFile(File):
    """
    An abstract uploaded file (``TemporaryUploadedFile`` and
    ``InMemoryUploadedFile`` are the built-in concrete subclasses).

    An ``UploadedFile`` object behaves somewhat like a file object and
    represents some file data that the user submitted with a form.
    """

    def __init__(self, file=None, name=None, content_type=None, size=None, charset=None, content_type_extra=None):
        super().__init__(file, name)
        self.size = size
        self.content_type = content_type
        self.charset = charset
        self.content_type_extra = content_type_extra

    def __repr__(self):
        return "<%s: %s (%s)>" % (self.__class__.__name__, self.name, self.content_type)

    def _get_name(self):
        return self._name

    def _set_name(self, name):
        # Sanitize the file name so that it can't be dangerous.
        if name is not None:
            # Just use the basename of the file -- anything else is dangerous.
            name = os.path.basename(name)

            # File names longer than 255 characters can cause problems on older OSes.
            if len(name) > 255:
                name, ext = os.path.splitext(name)
                ext = ext[:255]
                name = name[:255 - len(ext)] + ext

        self._name = name

    name = property(_get_name, _set_name)
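The file-name sanitizing in _set_name() is plain Python, so it can be tried without Django. The function below is a standalone copy of the logic shown above; the name sanitize_upload_name is ours, not Django's. It shows that path components sent by a client are dropped and that over-long names are cut to 255 characters with the extension kept.

```python
import os


def sanitize_upload_name(name):
    """Standalone copy of the logic in UploadedFile._set_name, for experimentation."""
    if name is not None:
        # Drop any directory components a client might send, e.g. "../../etc/passwd".
        name = os.path.basename(name)
        # Cap the length at 255 characters while keeping the extension.
        if len(name) > 255:
            name, ext = os.path.splitext(name)
            ext = ext[:255]
            name = name[:255 - len(ext)] + ext
    return name


print(sanitize_upload_name("../../etc/passwd"))       # passwd
print(len(sanitize_upload_name("a" * 300 + ".txt")))  # 255
print(sanitize_upload_name("a" * 300 + ".txt")[-4:])  # .txt
```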
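Compared with the File base class, UploadedFile mainly adds several instance attributes (size, content_type, charset and content_type_extra) and sanitizes the file name through the name property; the other methods are essentially unchanged. Next, let's look at the two classes that inherit from it: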
class TemporaryUploadedFile(UploadedFile):
    """
    A file uploaded to a temporary location (i.e. stream-to-disk).
    """
    def __init__(self, name, content_type, size, charset, content_type_extra=None):
        _, ext = os.path.splitext(name)
        file = tempfile.NamedTemporaryFile(suffix='.upload' + ext, dir=settings.FILE_UPLOAD_TEMP_DIR)
        super().__init__(file, name, content_type, size, charset, content_type_extra)

    def temporary_file_path(self):
        """Return the full path of this file."""
        return self.file.name

    def close(self):
        try:
            return self.file.close()
        except FileNotFoundError:
            # The file was moved or deleted before the tempfile could unlink
            # it. Still sets self.file.close_called and calls
            # self.file.file.close() before the exception.
            pass


class InMemoryUploadedFile(UploadedFile):
    """
    A file uploaded into memory (i.e. stream-to-memory).
    """
    def __init__(self, file, field_name, name, content_type, size, charset, content_type_extra=None):
        super().__init__(file, name, content_type, size, charset, content_type_extra)
        self.field_name = field_name

    def open(self, mode=None):
        self.file.seek(0)
        return self

    def chunks(self, chunk_size=None):
        self.file.seek(0)
        yield self.read()

    def multiple_chunks(self, chunk_size=None):
        # Since it's in memory, we'll never have multiple chunks.
        return False
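Both classes are short and their logic is clear: TemporaryUploadedFile opens a freshly created temporary file on disk, while InMemoryUploadedFile keeps the uploaded content in memory. With these two classes in mind, let's look at the matching upload handlers. One uses TemporaryUploadedFile to store the upload in a temporary file; the other uses InMemoryUploadedFile to keep the upload's content in memory: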
# Source path: django/core/files/uploadhandler.py
class TemporaryFileUploadHandler(FileUploadHandler):
    """
    Upload handler that streams data into a temporary file.
    """
    def new_file(self, *args, **kwargs):
        """
        Create the file object to append to as data is coming in.
        """
        super().new_file(*args, **kwargs)
        # self.file is a TemporaryUploadedFile wrapping an open temporary file
        self.file = TemporaryUploadedFile(self.file_name, self.content_type, 0, self.charset, self.content_type_extra)

    # Write each received chunk of data into the temporary file
    def receive_data_chunk(self, raw_data, start):
        self.file.write(raw_data)

    # Called once the whole file has been received
    def file_complete(self, file_size):
        # Move the file pointer back to the beginning
        self.file.seek(0)
        # Record the final file size
        self.file.size = file_size
        return self.file


class MemoryFileUploadHandler(FileUploadHandler):
    """
    File upload handler to stream uploads into memory (used for small files).
    """

    def handle_raw_input(self, input_data, META, content_length, boundary, encoding=None):
        """
        Use the content_length to signal whether or not this handler should be
        used.
        """
        # Check the content-length header to see if we should
        # If the post is too large, we cannot use the Memory handler.
        self.activated = content_length <= settings.FILE_UPLOAD_MAX_MEMORY_SIZE

    def new_file(self, *args, **kwargs):
        super().new_file(*args, **kwargs)
        if self.activated:
            self.file = BytesIO()
            raise StopFutureHandlers()

    def receive_data_chunk(self, raw_data, start):
        """Add the data to the BytesIO file."""
        if self.activated:
            self.file.write(raw_data)
        else:
            return raw_data

    def file_complete(self, file_size):
        """Return a file object if this handler is activated."""
        if not self.activated:
            return

        self.file.seek(0)
        return InMemoryUploadedFile(
            file=self.file,
            field_name=self.field_name,
            name=self.file_name,
            content_type=self.content_type,
            size=file_size,
            charset=self.charset,
            content_type_extra=self.content_type_extra
        )
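To see how the two handlers cooperate, here is a stdlib-only simulation of the handler chain. The classes are hypothetical imitations, not Django's API, and are modeled on how Django's multipart parser uses these hooks. Every handler's handle_raw_input() sees the total content length. The memory handler activates only if that length is within the limit (2621440 bytes, i.e. 2.5 MB, is Django's default FILE_UPLOAD_MAX_MEMORY_SIZE), and then raises StopFutureHandlers from new_file(). A receive_data_chunk() that returns None stops the chunk from reaching later handlers. The real parser also tracks per-handler byte counters and field names, which are omitted here, and tempfile.TemporaryFile stands in for NamedTemporaryFile.

```python
import io
import tempfile


class StopFutureHandlers(Exception):
    """Local stand-in for django.core.files.uploadhandler.StopFutureHandlers."""


class MiniMemoryHandler:
    """Hypothetical, simplified imitation of MemoryFileUploadHandler."""

    def __init__(self, max_memory_size):
        self.max_memory_size = max_memory_size
        self.activated = False
        self.file = None

    def handle_raw_input(self, content_length):
        # Take over only if the whole request body fits under the limit.
        self.activated = content_length <= self.max_memory_size

    def new_file(self):
        if self.activated:
            self.file = io.BytesIO()
            raise StopFutureHandlers()  # later handlers never get new_file()

    def receive_data_chunk(self, raw_data):
        if self.activated:
            self.file.write(raw_data)
            return None  # chunk consumed, do not pass it on
        return raw_data  # hand the chunk to the next handler

    def file_complete(self, file_size):
        if not self.activated:
            return None
        self.file.seek(0)
        return ("memory", self.file.read())


class MiniTemporaryHandler:
    """Hypothetical, simplified imitation of TemporaryFileUploadHandler."""

    def __init__(self):
        self.file = None

    def handle_raw_input(self, content_length):
        pass  # always willing to accept data

    def new_file(self):
        self.file = tempfile.TemporaryFile()

    def receive_data_chunk(self, raw_data):
        self.file.write(raw_data)
        return None

    def file_complete(self, file_size):
        self.file.seek(0)
        data = self.file.read()
        self.file.close()
        return ("temporary", data)


def simulate_upload(payload, max_memory_size=2621440, chunk_size=4):
    """Feed one file through the handler chain in the default order."""
    handlers = [MiniMemoryHandler(max_memory_size), MiniTemporaryHandler()]
    for handler in handlers:
        handler.handle_raw_input(len(payload))
    for handler in handlers:
        try:
            handler.new_file()
        except StopFutureHandlers:
            break
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        for handler in handlers:
            chunk = handler.receive_data_chunk(chunk)
            if chunk is None:
                break
    for handler in handlers:
        result = handler.file_complete(len(payload))
        if result is not None:
            return result
    return None


print(simulate_upload(b"small file"))                     # ('memory', b'small file')
print(simulate_upload(b"small file", max_memory_size=4))  # ('temporary', b'small file')
```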
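2.2 Tracing the file upload flow in Django
This part is a bit complex and dry. I'll simplify the code as much as possible and use the earlier upload experiments, adding print statements to the source, to help us understand the whole upload process.
Question to consider: why can we get the uploaded file through request.FILES['file']? Django stores the file information there for us, but how does it process the uploaded file?
Our goal now is to answer that question. The code involved can be complex, so for now we won't study its details; we only want to understand the overall process and what Django does for us.
First, print the request argument of the view function: it is an instance of django.core.handlers.wsgi.WSGIRequest, which we introduced much earlier. Let's focus on the FILES property of WSGIRequest: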
# Source location: django/core/handlers/wsgi.py
# ...
class WSGIRequest(HttpRequest):
    # ...
    @property
    def FILES(self):
        if not hasattr(self, '_files'):
            self._load_post_and_files()
        return self._files
    # ...
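The hasattr() guard in FILES means the request body is parsed at most once, and only when it is first needed. The toy class below is hypothetical and needs no Django; it reproduces just this lazy, cached pattern and counts how often the parsing step runs.

```python
class LazyFilesRequest:
    """Hypothetical toy request that imitates WSGIRequest.FILES caching."""

    def __init__(self, raw_files):
        self._raw_files = raw_files
        self.parse_count = 0  # how many times the "expensive" parsing ran

    @property
    def FILES(self):
        # Same guard as WSGIRequest.FILES: parse only if _files does not exist yet.
        if not hasattr(self, "_files"):
            self._load_post_and_files()
        return self._files

    def _load_post_and_files(self):
        # Stand-in for the real multipart parsing.
        self.parse_count += 1
        self._files = dict(self._raw_files)


req = LazyFilesRequest({"file": b"hello"})
print(req.parse_count)  # 0: nothing is parsed until FILES is read
req.FILES
req.FILES
print(req.parse_count)  # 1: the second access reused the cached _files
```

The same _files attribute is what the upload_handlers setter in HttpRequest checks, which is why handlers can no longer be changed once the upload has been parsed.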
This gives us a rough idea of where FILES comes from: the self._load_post_and_files() method sets self._files, and that is the value FILES returns. Next we go one level deeper into self._load_post_and_files(), again without chasing every detail.
# Source location: django/http/request.py
class HttpRequest:
    """A basic HTTP request."""
    # ...
    def _load_post_and_files(self):
        """Populate self._post and self._files if the content-type is a form type"""
        if self.method != 'POST':
            self._post, self._files = QueryDict(encoding=self._encoding), MultiValueDict()
            return
        if self._read_started and not hasattr(self, '_body'):
            self._mark_post_parse_error()
            return

        if self.content_type == 'multipart/form-data':
            if hasattr(self, '_body'):
                # Use already read data
                data = BytesIO(self._body)
            else:
                data = self
            try:
                self._post, self._files = self.parse_file_upload(self.META, data)
            except MultiPartParserError:
                # An error occurred while parsing POST data. Since when
                # formatting the error the request handler might access
                # self.POST, set self._post and self._file to prevent
                # attempts to parse POST data again.
                self._mark_post_parse_error()
                raise
        elif self.content_type == 'application/x-www-form-urlencoded':
            self._post, self._files = QueryDict(self.body, encoding=self._encoding), MultiValueDict()
        else:
            self._post, self._files = QueryDict(encoding=self._encoding), MultiValueDict()
    # ...
Uploads are usually submitted through an HTML form, and the content-type is in most cases multipart/form-data. So the key line that produces the _files attribute is:
self._post, self._files = self.parse_file_upload(self.META, data)
Let's keep tracing the self.parse_file_upload() method:
class HttpRequest:
    """A basic HTTP request."""
    # ...
    def _initialize_handlers(self):
        self._upload_handlers = [uploadhandler.load_handler(handler, self)
                                 for handler in settings.FILE_UPLOAD_HANDLERS]

    @property
    def upload_handlers(self):
        if not self._upload_handlers:
            # If there are no upload handlers defined, initialize them from settings.
            self._initialize_handlers()
        return self._upload_handlers

    @upload_handlers.setter
    def upload_handlers(self, upload_handlers):
        if hasattr(self, '_files'):
            raise AttributeError("You cannot set the upload handlers after the upload has been processed.")
        self._upload_handlers = upload_handlers

    def parse_file_upload(self, META, post_data):
        """Return a tuple of (POST QueryDict, FILES MultiValueDict)."""
        self.upload_handlers = ImmutableList(
            self.upload_handlers,
            warning="You cannot alter upload handlers after the upload has been processed."
        )
        parser = MultiPartParser(META, post_data, self.upload_handlers, self.encoding)
        return parser.parse()
    # ...
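These three methods are fairly simple; their main job is to obtain the handlers that process uploaded files. The value of settings.FILE_UPLOAD_HANDLERS comes from global_settings.py rather than from the project's settings.py (which does not set this parameter by default). We can, however, set FILE_UPLOAD_HANDLERS in settings.py to override the default handlers.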
# Source location: django/conf/global_settings.py
# ...
# List of upload handler classes to be applied in order.
FILE_UPLOAD_HANDLERS = [
    'django.core.files.uploadhandler.MemoryFileUploadHandler',
    'django.core.files.uploadhandler.TemporaryFileUploadHandler',
]
# ...
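For example, a project that wants every upload written to disk, regardless of size, could override the list in its own settings.py. This is a sketch with assumed values. FILE_UPLOAD_TEMP_DIR is a real Django setting (it is the dir argument passed to NamedTemporaryFile in TemporaryUploadedFile above), but the path below is made up.

```python
# first_django_app/settings.py (illustrative override; values are assumptions)

# Drop MemoryFileUploadHandler so that every upload is streamed to a temporary file.
FILE_UPLOAD_HANDLERS = [
    'django.core.files.uploadhandler.TemporaryFileUploadHandler',
]

# Where TemporaryUploadedFile creates its files (default None = the system temp dir).
# Hypothetical path; the directory must already exist and be writable.
FILE_UPLOAD_TEMP_DIR = '/root/test/tmp'
```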
Finally, the core of the parse_file_upload() method is a single statement:
parser = MultiPartParser(META, post_data, self.upload_handlers, self.encoding)
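Then parser.parse() is called to produce the result. parser.parse() itself is fairly complex, so here we only skim its overall structure and leave the details for further study after class: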
# Source location: django/http/multipartparser.py
class MultiPartParser:
    # ...
    def parse(self):
        """
        Parse the POST data and break it into a FILES MultiValueDict and a POST
        MultiValueDict.

        Return a tuple containing the POST and FILES dictionary, respectively.
        """
        from django.http import QueryDict

        encoding = self._encoding
        handlers = self._upload_handlers

        # HTTP spec says that Content-Length >= 0 is valid
        # handling content-length == 0 before continuing
        if self._content_length == 0:
            return QueryDict(encoding=self._encoding), MultiValueDict()

        # See if any of the handlers take care of the parsing.
        # This allows overriding everything if need be.
        for handler in handlers:
            result = handler.handle_raw_input(
                self._input_data,
                self._meta,
                self._content_length,
                self._boundary,
                encoding,
            )
            # Check to see if it was handled
            if result is not None:
                return result[0], result[1]

        # Create the data structures to be used later.
        self._post = QueryDict(mutable=True)
        self._files = MultiValueDict()

        # Instantiate the parser and stream:
        stream = LazyStream(ChunkIter(self._input_data, self._chunk_size))

        # Whether or not to signal a file-completion at the beginning of the loop.
        old_field_name = None
        counters = [0] * len(handlers)

        # Number of bytes that have been read.
        num_bytes_read = 0
        # To count the number of keys in the request.
        num_post_keys = 0
        # To limit the amount of data read from the request.
        read_size = None

        # ...

        # Signal that the upload has completed.
        # any() shortcircuits if a handler's upload_complete() returns a value.
        any(handler.upload_complete() for handler in handlers)
        self._post._mutable = False
        return self._post, self._files
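As you can see, this method ends up with self._post and self._files and returns them. If you're interested, add print() calls at these key points to inspect the values of self._post and self._files; doing so helps the process stick.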
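To see what the parser has to produce, the sketch below splits a hand-written multipart/form-data body into POST fields and files. It deliberately swaps in the standard library's email parser instead of Django's MultiPartParser, so it ignores streaming, upload handlers and size limits. Treat it as an illustration of the wire format and of the POST/FILES split, not of Django's implementation; the function name parse_multipart is hypothetical.

```python
from email.parser import BytesParser


def parse_multipart(body, content_type):
    """Split a multipart/form-data body into (post, files) dicts of lists."""
    # The email parser expects headers first, so prepend the request's Content-Type.
    message = BytesParser().parsebytes(
        b"Content-Type: " + content_type.encode() + b"\r\n\r\n" + body
    )
    post, files = {}, {}
    for part in message.get_payload():  # one Message object per form field
        name = part.get_param("name", header="content-disposition")
        filename = part.get_filename()
        data = part.get_payload(decode=True)  # raw bytes of this field
        if filename is None:
            post.setdefault(name, []).append(data.decode())  # ordinary field -> POST
        else:
            files.setdefault(name, []).append((filename, data))  # file field -> FILES
    return post, files


body = (
    b"--XyZ\r\n"
    b'Content-Disposition: form-data; name="title"\r\n\r\n'
    b"demo\r\n"
    b"--XyZ\r\n"
    b'Content-Disposition: form-data; name="files"; filename="a.txt"\r\n'
    b"Content-Type: text/plain\r\n\r\n"
    b"hello\r\n"
    b"--XyZ\r\n"
    b'Content-Disposition: form-data; name="files"; filename="b.txt"\r\n'
    b"Content-Type: text/plain\r\n\r\n"
    b"world\r\n"
    b"--XyZ--\r\n"
)
post, files = parse_multipart(body, "multipart/form-data; boundary=XyZ")
print(post)   # {'title': ['demo']}
print(files)  # {'files': [('a.txt', b'hello'), ('b.txt', b'world')]}
```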
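3. Summary
This section first demonstrated Django's file upload feature with three upload experiments, then analyzed the classes and configuration parameters involved in handling uploads. This section also completes Part 3. It involved a lot of source-code reading, which can feel dry but is genuinely interesting. If you follow it carefully and keep reading and debugging the code after class, you will be well on your way to mastering Django and to solving the problems you run into on your own.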