27 File Uploads in Django

Having finished the Django Form module, we close this part with one last common scenario: file uploads.

1. File upload experiments in Django

As before, let's skip the preamble and look at Django's upload support through a few hands-on examples.


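Experiment 1: prepare a local file, upload.txt, and upload it to the /root/test/django directory on the server. The template, test_file_upload.html, contains the upload form: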


<form method="post" action="/hello/file_upload/" enctype="multipart/form-data">
    {% csrf_token %}
    {{ forms }}
    <input type="submit" value="Submit">
</form>

Next, write the Form class and the view function:

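import os

from django import forms
from django.http import HttpResponse
from django.shortcuts import render
from django.views.decorators.csrf import csrf_exempt


class FileUploadForm(forms.Form):
    file = forms.FileField(label="File upload")


def handle_uploaded_file(f):
    # Write the upload to disk chunk by chunk instead of reading it all at once
    save_path = os.path.join('/root/test/django', f.name)
    with open(save_path, 'wb+') as fp:
        for chunk in f.chunks():
            fp.write(chunk)


@csrf_exempt
def file_upload(request, *args, **kwargs):
    error_msg = ""
    if request.method == 'POST':
        # Named "form" so it does not shadow the imported forms module
        form = FileUploadForm(request.POST, request.FILES)
        if form.is_valid():
            handle_uploaded_file(request.FILES['file'])
            return HttpResponse('Upload succeeded')
        error_msg = "Error"
    else:
        form = FileUploadForm()
    # The template still reads the form from the "forms" context key
    return render(request, 'test_file_upload.html', {'forms': form, "error_msg": error_msg})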

Write the URLconf configuration:

urlpatterns = [
    # ...
    # File upload test
    path('file_upload/', views.file_upload),
]



Experiment 2: handling uploaded files with a model

Step 1: set MEDIA_ROOT in settings.py. This is the root directory under which uploaded files are stored;

# first_django_app/settings.py
# ...

MEDIA_ROOT = '/root/test/'

# ...


Step 2: define the model class, then generate and apply the migrations;

# hello_app/models.py
# ...

class FileModel(models.Model):
    name = models.CharField('Uploaded file name', max_length=20)
    upload_file = models.FileField(upload_to='django')

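Note: the upload_to argument and the MEDIA_ROOT setting in settings.py together determine the directory where uploaded files are saved. upload_to can take several forms, for example upload_to='django/%Y/%m/%d', where the date placeholders are filled in at upload time. It can also be a callable: Django calls it with the model instance and the original file name, and it returns the path (relative to MEDIA_ROOT, file name included) under which the file is stored.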
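
As a minimal sketch of the callable form, the function below builds a per-day path for each upload. The function name dated_upload_path, the 'django/' prefix and the extra today parameter are assumptions made for this illustration; Django itself only passes the instance and the file name.

```python
import datetime
import os


def dated_upload_path(instance, filename, today=None):
    """Return a path such as 'django/2020/01/31/upload.txt' (relative to MEDIA_ROOT).

    `today` is a hypothetical extra parameter that only exists to make the
    function easy to test; Django calls the function with (instance, filename).
    """
    today = today or datetime.date.today()
    # Keep only the base name so a crafted file name cannot add directories.
    safe_name = os.path.basename(filename)
    return "django/{:%Y/%m/%d}/{}".format(today, safe_name)


# In models.py this would be wired up as:
#     upload_file = models.FileField(upload_to=dated_upload_path)
print(dated_upload_path(None, "upload.txt", datetime.date(2020, 1, 31)))
```

Passing the function object (not a call) to upload_to lets the path be computed for every saved file rather than once at import time.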


(django-manual) [root@server first_django_app]# python manage.py makemigrations hello_app
(django-manual) [root@server first_django_app]# python manage.py migrate hello_app



Step 3: prepare the corresponding view function;

# hello_app/views.py
# ...
def file_upload2(request, *args, **kwargs):
    if request.method == 'POST':
        # The key must match the name attribute of the file input in the template
        upload_file = request.FILES['file']
        FileModel.objects.create(name=upload_file.name, upload_file=upload_file)
        return HttpResponse('Upload succeeded')
    return render(request,'test_file_upload2.html',{})

# ...
(django-manual) [root@server first_django_app]# cat templates/test_file_upload2.html 
{% load staticfiles %}

<form method="post" action="/hello/file_upload2/" enctype="multipart/form-data">
    {% csrf_token %}
    <label>Choose a file to upload:</label><input type="file" name="file">
    <div><input type="submit" value="Submit" style="margin-top:10px"></div>
</form>


Write the corresponding URLconf configuration, as follows:

# hello_app/urls.py
# ...

urlpatterns = [
    # ...
    # File upload test
    path('file_upload2/', views.file_upload2),
]





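Experiment 3: uploading several files at once. First change the file input in the template so that it accepts multiple files:

<!-- Original line: <label>Choose a file to upload:</label><input type="file" name="file"> -->
<label>Choose files to upload:</label><input type="file" name="files" multiple="">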


Then read the whole list of files in the view and save each one:

def file_upload2(request, *args, **kwargs):
    if request.method == 'POST':
        # Get the list of uploaded files
        upload_files = request.FILES.getlist('files')
        # Iterate over the files and save each one
        for f in upload_files:
            FileModel.objects.create(name=f.name, upload_file=f)
        return HttpResponse('Upload succeeded')
    return render(request, 'test_file_upload2.html', {})
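
The difference between request.FILES['files'] and request.FILES.getlist('files') comes from the dictionary type Django uses for FILES, where indexing returns only the last value stored under a key. The class below is a tiny stand-in written for this illustration (MiniMultiValueDict is a hypothetical name, not Django's implementation) that mimics just that behaviour:

```python
class MiniMultiValueDict(dict):
    """Tiny stand-in for a dictionary that keeps several values per key."""

    def __getitem__(self, key):
        # Indexing returns only the last value stored under the key.
        return super().__getitem__(key)[-1]

    def getlist(self, key):
        # getlist() returns every value, or an empty list for a missing key.
        return list(super().get(key, []))

    def appendlist(self, key, value):
        # Store one more value under the key.
        self.setdefault(key, []).append(value)


files = MiniMultiValueDict()
files.appendlist("files", "a.txt")
files.appendlist("files", "b.txt")
print(files["files"])
print(files.getlist("files"))
```

This is why the multi-file view must call getlist(): plain indexing would silently save only one of the selected files.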


2. Analysis of Django's file upload code

2.1 Base classes related to file uploads in Django

This section analyzes the Django code related to file uploads. We start with a few base classes:

FileProxyMixin class: a helper mixin that forwards file methods to the wrapped file object. Here is its source:

# Source path: django/core/files/utils.py

class FileProxyMixin:
    """
    A mixin class used to forward file methods to an underlaying file
    object.  The internal file object has to be called "file"::

        class FileProxy(FileProxyMixin):
            def __init__(self, file):
                self.file = file
    """

    encoding = property(lambda self: self.file.encoding)
    fileno = property(lambda self: self.file.fileno)
    flush = property(lambda self: self.file.flush)
    isatty = property(lambda self: self.file.isatty)
    newlines = property(lambda self: self.file.newlines)
    read = property(lambda self: self.file.read)
    readinto = property(lambda self: self.file.readinto)
    readline = property(lambda self: self.file.readline)
    readlines = property(lambda self: self.file.readlines)
    seek = property(lambda self: self.file.seek)
    tell = property(lambda self: self.file.tell)
    truncate = property(lambda self: self.file.truncate)
    write = property(lambda self: self.file.write)
    writelines = property(lambda self: self.file.writelines)

    @property
    def closed(self):
        return not self.file or self.file.closed

    def readable(self):
        if self.closed:
            return False
        if hasattr(self.file, 'readable'):
            return self.file.readable()
        return True

    def writable(self):
        if self.closed:
            return False
        if hasattr(self.file, 'writable'):
            return self.file.writable()
        return 'w' in getattr(self.file, 'mode', '')

    def seekable(self):
        if self.closed:
            return False
        if hasattr(self.file, 'seekable'):
            return self.file.seekable()
        return True

    def __iter__(self):
        return iter(self.file)

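Note: as you can see, a class that inherits from this mixin must have an instance attribute named file for it to work. The attributes on the mixin are almost identical to those of the file object returned by Python's open(), which the experiment later in this section confirms.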
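
To see the forwarding idea in isolation, here is a reduced sketch with only three forwarded methods. MiniFileProxyMixin and TextProxy are names invented for this illustration, and io.StringIO stands in for a real file:

```python
import io


class MiniFileProxyMixin:
    """Forward a few file methods to the object stored in self.file."""

    read = property(lambda self: self.file.read)
    seek = property(lambda self: self.file.seek)
    tell = property(lambda self: self.file.tell)

    @property
    def closed(self):
        return not self.file or self.file.closed


class TextProxy(MiniFileProxyMixin):
    def __init__(self, file):
        # The mixin only works if the wrapped object is stored as "file".
        self.file = file


proxy = TextProxy(io.StringIO("spyinx test upload"))
print(proxy.read(6))   # forwarded to StringIO.read
print(proxy.tell())    # forwarded to StringIO.tell
print(proxy.closed)
```

Each property returns the bound method of the wrapped object, so proxy.read(6) behaves exactly like calling read(6) on the underlying file.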

File class: the base class Django defines for files, uploaded ones included. Let's go straight to the source.

class File(FileProxyMixin):
    DEFAULT_CHUNK_SIZE = 64 * 2 ** 10

    def __init__(self, file, name=None):
        self.file = file
        if name is None:
            name = getattr(file, 'name', None)
        self.name = name
        if hasattr(file, 'mode'):
            self.mode = file.mode

    def __str__(self):
        return self.name or ''

    def __repr__(self):
        return "<%s: %s>" % (self.__class__.__name__, self or "None")

    def __bool__(self):
        return bool(self.name)

    def __len__(self):
        return self.size

    @cached_property
    def size(self):
        if hasattr(self.file, 'size'):
            return self.file.size
        if hasattr(self.file, 'name'):
            try:
                return os.path.getsize(self.file.name)
            except (OSError, TypeError):
                pass
        if hasattr(self.file, 'tell') and hasattr(self.file, 'seek'):
            pos = self.file.tell()
            self.file.seek(0, os.SEEK_END)
            size = self.file.tell()
            self.file.seek(pos)
            return size
        raise AttributeError("Unable to determine the file's size.")

    def chunks(self, chunk_size=None):
        """
        Read the file and yield chunks of ``chunk_size`` bytes (defaults to
        ``File.DEFAULT_CHUNK_SIZE``).
        """
        chunk_size = chunk_size or self.DEFAULT_CHUNK_SIZE
        try:
            self.seek(0)
        except (AttributeError, UnsupportedOperation):
            pass

        while True:
            data = self.read(chunk_size)
            if not data:
                break
            yield data

    def multiple_chunks(self, chunk_size=None):
        """
        Return ``True`` if you can expect multiple chunks.

        NB: If a particular file representation is in memory, subclasses should
        always return ``False`` -- there's no good reason to read from memory in
        chunks.
        """
        return self.size > (chunk_size or self.DEFAULT_CHUNK_SIZE)

    # ...

    def open(self, mode=None):
        if not self.closed:
            self.seek(0)
        elif self.name and os.path.exists(self.name):
            self.file = open(self.name, mode or self.mode)
        else:
            raise ValueError("The file cannot be reopened.")
        return self

    def close(self):
        self.file.close()

Here we can see the chunks() method that Experiment 1 used to save the uploaded file. Let's now try the File class in the Django shell to see what it can do.

(django-manual) [root@server first_django_app]# python manage.py shell
Python 3.8.1 (default, Dec 24 2019, 17:04:00) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from django.core.files import File

As we saw, a File instance must be associated with a file object. We use upload.txt, the file uploaded in Experiment 1, as the constructor argument:

>>> fp = open('/root/test/django/upload.txt', 'r+')
>>> f = File(fp)

Now we can try out the attributes and methods of the File object:

>>> f.name
>>> f.size
# With a chunk size of 20 bytes, check whether the file needs to be read in chunks
>>> f.multiple_chunks(20)
# The default chunk size is 64 KB; 47 bytes is far smaller, so no chunking is needed
>>> f.multiple_chunks()

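We can use the chunks() method to read the file contents chunk by chunk and then do whatever we want with each chunk, as follows:

>>> for c in f.chunks():
...     print('Read this time: {}'.format(c))

spyinx test upload
>>> for c in f.chunks(20):
...     print('Read this time: {}'.format(c))

Read this time: test upload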
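
The same chunked-reading loop can be written without Django. The sketch below is a simplified stand-in for File.chunks() rather than Django's code; iter_chunks and copy_in_chunks are hypothetical helper names, and a 47-byte in-memory buffer plays the role of upload.txt:

```python
import io

DEFAULT_CHUNK_SIZE = 64 * 2 ** 10  # 64 KB, the same default as File


def iter_chunks(fileobj, chunk_size=DEFAULT_CHUNK_SIZE):
    """Yield successive chunks of at most chunk_size bytes."""
    fileobj.seek(0)
    while True:
        data = fileobj.read(chunk_size)
        if not data:
            break
        yield data


def copy_in_chunks(src, dst, chunk_size=DEFAULT_CHUNK_SIZE):
    """Copy src to dst chunk by chunk and return the number of chunks."""
    count = 0
    for chunk in iter_chunks(src, chunk_size):
        dst.write(chunk)
        count += 1
    return count


source = io.BytesIO(b"x" * 47)
target = io.BytesIO()
print(copy_in_chunks(source, target, chunk_size=20))  # chunks of 20, 20 and 7 bytes
print(target.getvalue() == source.getvalue())
```

Reading in fixed-size pieces keeps memory use bounded by the chunk size, which is the reason handle_uploaded_file() in Experiment 1 loops over chunks() instead of calling read() once.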


Next, let's look at the two file classes related to uploads: TemporaryUploadedFile and InMemoryUploadedFile. Both inherit from UploadedFile, which in turn inherits from File.

# Source path: django/core/files/uploadedfile.py
class UploadedFile(File):
    """
    An abstract uploaded file (``TemporaryUploadedFile`` and
    ``InMemoryUploadedFile`` are the built-in concrete subclasses).

    An ``UploadedFile`` object behaves somewhat like a file object and
    represents some file data that the user submitted with a form.
    """

    def __init__(self, file=None, name=None, content_type=None, size=None, charset=None, content_type_extra=None):
        super().__init__(file, name)
        self.size = size
        self.content_type = content_type
        self.charset = charset
        self.content_type_extra = content_type_extra

    def __repr__(self):
        return "<%s: %s (%s)>" % (self.__class__.__name__, self.name, self.content_type)

    def _get_name(self):
        return self._name

    def _set_name(self, name):
        # Sanitize the file name so that it can't be dangerous.
        if name is not None:
            # Just use the basename of the file -- anything else is dangerous.
            name = os.path.basename(name)

            # File names longer than 255 characters can cause problems on older OSes.
            if len(name) > 255:
                name, ext = os.path.splitext(name)
                ext = ext[:255]
                name = name[:255 - len(ext)] + ext

        self._name = name

    name = property(_get_name, _set_name)
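
The name setter above is worth a closer look, because the file name comes from the client and cannot be trusted. The standalone function below repeats the same two checks on a plain string; sanitize_upload_name is a hypothetical name used only for this sketch:

```python
import os


def sanitize_upload_name(name, max_length=255):
    """Apply the same checks as UploadedFile._set_name to a plain string."""
    if name is None:
        return None
    # Drop any directory part sent by the client.
    name = os.path.basename(name)
    # Shorten over-long names while keeping the extension.
    if len(name) > max_length:
        stem, ext = os.path.splitext(name)
        ext = ext[:max_length]
        name = stem[:max_length - len(ext)] + ext
    return name


print(sanitize_upload_name("../../etc/passwd"))
print(len(sanitize_upload_name("a" * 300 + ".txt")))
```

Keeping only the base name means a request cannot steer the saved file outside the intended directory by sending a path instead of a file name.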

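Compared with the File base class, this class mainly adds several instance attributes; its methods are largely unchanged. Next, let's look at the two classes that inherit from it: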

class TemporaryUploadedFile(UploadedFile):
    """
    A file uploaded to a temporary location (i.e. stream-to-disk).
    """
    def __init__(self, name, content_type, size, charset, content_type_extra=None):
        _, ext = os.path.splitext(name)
        file = tempfile.NamedTemporaryFile(suffix='.upload' + ext, dir=settings.FILE_UPLOAD_TEMP_DIR)
        super().__init__(file, name, content_type, size, charset, content_type_extra)

    def temporary_file_path(self):
        """Return the full path of this file."""
        return self.file.name

    def close(self):
        try:
            return self.file.close()
        except FileNotFoundError:
            # The file was moved or deleted before the tempfile could unlink
            # it. Still sets self.file.close_called and calls
            # self.file.file.close() before the exception.
            pass

class InMemoryUploadedFile(UploadedFile):
    """
    A file uploaded into memory (i.e. stream-to-memory).
    """
    def __init__(self, file, field_name, name, content_type, size, charset, content_type_extra=None):
        super().__init__(file, name, content_type, size, charset, content_type_extra)
        self.field_name = field_name

    def open(self, mode=None):
        self.file.seek(0)
        return self

    def chunks(self, chunk_size=None):
        self.file.seek(0)
        yield self.read()

    def multiple_chunks(self, chunk_size=None):
        # Since it's in memory, we'll never have multiple chunks.
        return False

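Both classes are short and their logic is clear. TemporaryUploadedFile wraps a temporary file on disk, while InMemoryUploadedFile keeps the uploaded content in memory. With these two classes in mind, let's look at the corresponding upload handlers: one uses TemporaryUploadedFile to store the upload in a temporary file, and the other uses InMemoryUploadedFile to hold the upload in memory: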

class TemporaryFileUploadHandler(FileUploadHandler):
    """
    Upload handler that streams data into a temporary file.
    """
    def new_file(self, *args, **kwargs):
        """
        Create the file object to append to as data is coming in.
        """
        super().new_file(*args, **kwargs)
        # This object wraps the handle of an open temporary file
        self.file = TemporaryUploadedFile(self.file_name, self.content_type, 0, self.charset, self.content_type_extra)

    # Write the received data into the temporary file
    def receive_data_chunk(self, raw_data, start):
        self.file.write(raw_data)

    # Called once the whole file has been received
    def file_complete(self, file_size):
        # Move the file pointer back to the start
        self.file.seek(0)
        # Record the file size
        self.file.size = file_size
        return self.file

class MemoryFileUploadHandler(FileUploadHandler):
    """
    File upload handler to stream uploads into memory (used for small files).
    """

    def handle_raw_input(self, input_data, META, content_length, boundary, encoding=None):
        """
        Use the content_length to signal whether or not this handler should be
        used.
        """
        # Check the content-length header to see if we should
        # If the post is too large, we cannot use the Memory handler.
        self.activated = content_length <= settings.FILE_UPLOAD_MAX_MEMORY_SIZE

    def new_file(self, *args, **kwargs):
        super().new_file(*args, **kwargs)
        if self.activated:
            self.file = BytesIO()
            raise StopFutureHandlers()

    def receive_data_chunk(self, raw_data, start):
        """Add the data to the BytesIO file."""
        if self.activated:
            self.file.write(raw_data)
        else:
            return raw_data

    def file_complete(self, file_size):
        """Return a file object if this handler is activated."""
        if not self.activated:
            return

        self.file.seek(0)
        return InMemoryUploadedFile(
            file=self.file,
            field_name=self.field_name,
            name=self.file_name,
            content_type=self.content_type,
            size=file_size,
            charset=self.charset,
            content_type_extra=self.content_type_extra
        )
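
The choice between the two handlers comes down to one comparison against FILE_UPLOAD_MAX_MEMORY_SIZE, whose default in Django is 2621440 bytes (2.5 MB). The sketch below shows that decision in isolation; open_upload_buffer is a hypothetical helper, and the real handlers do more than pick a buffer:

```python
import io
import tempfile

# Django's default value of FILE_UPLOAD_MAX_MEMORY_SIZE (2.5 MB).
MAX_MEMORY_SIZE = 2621440


def open_upload_buffer(content_length, max_memory_size=MAX_MEMORY_SIZE):
    """Pick an in-memory or on-disk buffer based on the request size."""
    if content_length <= max_memory_size:
        return io.BytesIO()
    return tempfile.NamedTemporaryFile(suffix=".upload")


small = open_upload_buffer(47)
large = open_upload_buffer(10 * 1024 * 1024)
print(type(small).__name__)
print(hasattr(large, "name"))  # a temporary file has a path on disk
large.close()
```

Small uploads avoid disk I/O entirely, while large ones never occupy more memory than a single chunk.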

2.2 Tracing the file upload flow in Django

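This part is somewhat complex and dry. I will simplify the code as much as possible and, using the upload experiments above, add a few print statements to the source to help us follow the whole upload process.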

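Question to consider: why can we get the file through request.FILES['file'] after an upload? Django stores the file information there for us, but how does it process the uploaded file?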

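Our goal is to answer that question. The code involved is fairly complex, so for now we will not dig into the details; we only want to understand the overall process and what Django does for us.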

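First, if we print the request argument of the view function, we find that it is an instance of django.core.handlers.wsgi.WSGIRequest, which was introduced much earlier. Let's focus on the FILES property of the WSGIRequest class: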

# Source location: django/core/handlers/wsgi.py
# ...
class WSGIRequest(HttpRequest):
    # ...
    @property
    def FILES(self):
        if not hasattr(self, '_files'):
            self._load_post_and_files()
        return self._files
    # ...

Now we roughly know where the value of FILES comes from: self._load_post_and_files() sets self._files, and that is the value FILES returns. The next step is to look into self._load_post_and_files(), without chasing every detail.
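
Before moving on, the load-on-first-access pattern behind FILES can be sketched in plain Python. MiniRequest and its parse_count counter are inventions for this illustration; only the shape of the FILES property mirrors the code above:

```python
class MiniRequest:
    """Stand-in request that parses its body only on first access."""

    def __init__(self, raw_body):
        self.raw_body = raw_body
        self.parse_count = 0

    def _load_post_and_files(self):
        # Pretend parsing is expensive and record how often it runs.
        self.parse_count += 1
        self._files = {"file": self.raw_body}

    @property
    def FILES(self):
        # Parse once, then reuse the cached result.
        if not hasattr(self, "_files"):
            self._load_post_and_files()
        return self._files


request = MiniRequest("upload.txt")
request.FILES
request.FILES
print(request.parse_count)
```

A view that never touches request.FILES therefore never pays for parsing the request body.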

# Source location: django/http/request.py

class HttpRequest:
    """A basic HTTP request."""
    # ...
    def _load_post_and_files(self):
        """Populate self._post and self._files if the content-type is a form type"""
        if self.method != 'POST':
            self._post, self._files = QueryDict(encoding=self._encoding), MultiValueDict()
            return
        if self._read_started and not hasattr(self, '_body'):
            self._mark_post_parse_error()
            return

        if self.content_type == 'multipart/form-data':
            if hasattr(self, '_body'):
                # Use already read data
                data = BytesIO(self._body)
            else:
                data = self
            try:
                self._post, self._files = self.parse_file_upload(self.META, data)
            except MultiPartParserError:
                # An error occurred while parsing POST data. Since when
                # formatting the error the request handler might access
                # self.POST, set self._post and self._file to prevent
                # attempts to parse POST data again.
                self._mark_post_parse_error()
                raise
        elif self.content_type == 'application/x-www-form-urlencoded':
            self._post, self._files = QueryDict(self.body, encoding=self._encoding), MultiValueDict()
        else:
            self._post, self._files = QueryDict(encoding=self._encoding), MultiValueDict()
    # ...

Uploads are usually submitted through an HTML form, so the content type is multipart/form-data most of the time. The key statement that populates the _files attribute is therefore:

self._post, self._files = self.parse_file_upload(self.META, data)
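
To make the multipart/form-data format concrete, the sketch below builds a small request body by hand and splits it into ordinary fields and files. It uses the standard library's email parser as a stand-in; Django uses its own streaming MultiPartParser instead, and split_multipart, the boundary string and the sample field names are assumptions made for this illustration:

```python
from email import message_from_bytes

BOUNDARY = "demo-boundary"

# A hand-written body with one text field and one file field.
body = (
    "--{b}\r\n"
    'Content-Disposition: form-data; name="title"\r\n'
    "\r\n"
    "hello\r\n"
    "--{b}\r\n"
    'Content-Disposition: form-data; name="file"; filename="upload.txt"\r\n'
    "Content-Type: text/plain\r\n"
    "\r\n"
    "spyinx test upload\r\n"
    "--{b}--\r\n"
).format(b=BOUNDARY).encode("ascii")


def split_multipart(body, boundary):
    """Split a multipart/form-data body into plain fields and files."""
    header = "Content-Type: multipart/form-data; boundary={}\r\n\r\n".format(boundary)
    message = message_from_bytes(header.encode("ascii") + body)
    post, files = {}, {}
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        data = part.get_payload(decode=True)
        filename = part.get_filename()
        if filename is None:
            # A part without a file name is an ordinary form field.
            post[name] = data.decode("utf-8")
        else:
            files[name] = (filename, data)
    return post, files


post, files = split_multipart(body, BOUNDARY)
print(post)
print(files)
```

The two dictionaries correspond to what a view sees as request.POST and request.FILES, which is why a single parsing step fills both.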

Let's continue by tracing the self.parse_file_upload() method.

class HttpRequest:
    """A basic HTTP request."""
    # ...

    def _initialize_handlers(self):
        self._upload_handlers = [uploadhandler.load_handler(handler, self)
                                 for handler in settings.FILE_UPLOAD_HANDLERS]

    @property
    def upload_handlers(self):
        if not self._upload_handlers:
            # If there are no upload handlers defined, initialize them from settings.
            self._initialize_handlers()
        return self._upload_handlers

    @upload_handlers.setter
    def upload_handlers(self, upload_handlers):
        if hasattr(self, '_files'):
            raise AttributeError("You cannot set the upload handlers after the upload has been processed.")
        self._upload_handlers = upload_handlers

    def parse_file_upload(self, META, post_data):
        """Return a tuple of (POST QueryDict, FILES MultiValueDict)."""
        self.upload_handlers = ImmutableList(
            self.upload_handlers,
            warning="You cannot alter upload handlers after the upload has been processed."
        )
        parser = MultiPartParser(META, post_data, self.upload_handlers, self.encoding)
        return parser.parse()
    # ...

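The functions involved here are all simple; their main job is to obtain the handlers that process uploaded files. By default, settings.FILE_UPLOAD_HANDLERS takes its value from global_settings.py rather than from the project's settings.py, which does not set it by default. We can, however, set FILE_UPLOAD_HANDLERS in settings.py to override the default handlers.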

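# Source location: django/conf/global_settings.py

# ...

# List of upload handler classes to be applied in order.
FILE_UPLOAD_HANDLERS = [
    'django.core.files.uploadhandler.MemoryFileUploadHandler',
    'django.core.files.uploadhandler.TemporaryFileUploadHandler',
]
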
# ...
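
As a sketch of such an override, a project that wants every upload streamed to disk could list only the temporary-file handler in its own settings.py. Whether this suits a given project is an assumption of the example; both setting names are standard Django settings:

```python
# settings.py (project-level override; a sketch, adjust to your needs)

# Always stream uploads to a temporary file, never keep them in memory.
FILE_UPLOAD_HANDLERS = [
    "django.core.files.uploadhandler.TemporaryFileUploadHandler",
]

# Directory for the temporary files; None means the system default.
FILE_UPLOAD_TEMP_DIR = None
```

Because the handlers are applied in order, removing the memory handler from the list is enough to change where small uploads are buffered.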

Finally, the core of parse_file_upload() is a single statement:

parser = MultiPartParser(META, post_data, self.upload_handlers, self.encoding)

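It then calls parser.parse() to get the result. Note that parser.parse() is fairly complex; we only skim its overall structure here and leave the details for further study after class: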

# Source location: django/http/multipartparser.py

class MultiPartParser:
    # ...
    def parse(self):
        """
        Parse the POST data and break it into a FILES MultiValueDict and a POST
        MultiValueDict.

        Return a tuple containing the POST and FILES dictionary, respectively.
        """
        from django.http import QueryDict

        encoding = self._encoding
        handlers = self._upload_handlers

        # HTTP spec says that Content-Length >= 0 is valid
        # handling content-length == 0 before continuing
        if self._content_length == 0:
            return QueryDict(encoding=self._encoding), MultiValueDict()

        # See if any of the handlers take care of the parsing.
        # This allows overriding everything if need be.
        for handler in handlers:
            result = handler.handle_raw_input(
                self._input_data,
                self._meta,
                self._content_length,
                self._boundary,
                encoding,
            )
            # Check to see if it was handled
            if result is not None:
                return result[0], result[1]

        # Create the data structures to be used later.
        self._post = QueryDict(mutable=True)
        self._files = MultiValueDict()

        # Instantiate the parser and stream:
        stream = LazyStream(ChunkIter(self._input_data, self._chunk_size))

        # Whether or not to signal a file-completion at the beginning of the loop.
        old_field_name = None
        counters = [0] * len(handlers)

        # Number of bytes that have been read.
        num_bytes_read = 0
        # To count the number of keys in the request.
        num_post_keys = 0
        # To limit the amount of data read from the request.
        read_size = None

        # ...

        # Signal that the upload has completed.
        # any() shortcircuits if a handler's upload_complete() returns a value.
        any(handler.upload_complete() for handler in handlers)
        self._post._mutable = False
        return self._post, self._files

As we can see, the function ends up with self._post and self._files and returns them. If you are interested, add print() calls at these key points and inspect self._post and self._files; it helps the process sink in.
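
The part of parse() elided above is where the handlers do their work. The sketch below is a heavily simplified model of that dispatch, not Django's implementation: every class and function name in it is a stand-in, the method signatures are reduced, and the on-disk handler writes to a BytesIO instead of a real temporary file. It shows the two rules that matter: a handler that raises StopFutureHandlers in new_file() shuts out the handlers after it, and a handler that returns a chunk from receive_data_chunk() passes it on to the next one.

```python
import io


class StopFutureHandlers(Exception):
    """Raised by a handler that wants later handlers to be skipped."""


class MemoryHandler:
    def __init__(self, limit):
        self.limit = limit

    def start(self, content_length):
        # Only handle the upload if it is small enough.
        self.activated = content_length <= self.limit

    def new_file(self):
        if self.activated:
            self.buffer = io.BytesIO()
            raise StopFutureHandlers()

    def receive_data_chunk(self, chunk):
        if self.activated:
            self.buffer.write(chunk)
            return None   # chunk consumed
        return chunk      # pass the chunk on to the next handler

    def file_complete(self):
        return ("memory", self.buffer.getvalue()) if self.activated else None


class DiskHandler:
    def start(self, content_length):
        pass

    def new_file(self):
        self.buffer = io.BytesIO()  # stands in for a temporary file

    def receive_data_chunk(self, chunk):
        self.buffer.write(chunk)
        return None

    def file_complete(self):
        return ("disk", self.buffer.getvalue())


def run_upload(handlers, chunks):
    """Feed one file, given as a list of chunks, through the handler chain."""
    total = sum(len(c) for c in chunks)
    for handler in handlers:
        handler.start(total)
    active = []
    for handler in handlers:
        active.append(handler)
        try:
            handler.new_file()
        except StopFutureHandlers:
            break
    for chunk in chunks:
        for handler in active:
            chunk = handler.receive_data_chunk(chunk)
            if chunk is None:
                break
    for handler in active:
        result = handler.file_complete()
        if result is not None:
            return result


print(run_upload([MemoryHandler(limit=10), DiskHandler()], [b"abc", b"def"]))
print(run_upload([MemoryHandler(limit=4), DiskHandler()], [b"abc", b"def"]))
```

With the default handler order, this is how a small upload ends up as an InMemoryUploadedFile and a large one as a TemporaryUploadedFile in request.FILES.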

3. Summary

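This section first demonstrated Django's file upload support with three upload experiments. We then analyzed the classes and configuration parameters involved in file uploads. With this section, Part 3 is complete. It covered a lot of source code, which can be dry, but it is also very interesting. If you follow it carefully and keep reading and debugging the code afterwards, you will be well on your way to becoming a Django expert who can solve problems independently.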