Simple Examples of Downloading Files with Python

百晓生 2019-3-19

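In this tutorial, you will learn how to download files from the web using different Python modules. You will download regular files, web pages, YouTube videos, Google Drive files, files from Amazon S3, and files from other sources.

Finally, you will learn how to handle the challenges you may run into, such as downloading files that redirect, downloading large files, and running multithreaded downloads.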

Using requests

You can download a file from a URL with the requests module.

Consider the following code:

import requests
url = 'https://www.python.org/static/img/python-logo@2x.png'
myfile = requests.get(url)
# Write the response body to disk; the with block closes the file afterwards
with open('c:/users/LikeGeeks/downloads/PythonImage.png', 'wb') as f:
    f.write(myfile.content)

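The get method of the requests module fetches the URL, and the response is stored in a variable named myfile. The content of that variable (the response body, as bytes) is then written to a file.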
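When the save path is not fixed in advance, the local file name can be taken from the URL itself. The following is a minimal sketch and not part of the original example: the helper name filename_from_url and the fallback name download.bin are assumptions made for illustration.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download.bin"):
    """Return the last path segment of url, or default if there is none."""
    # urlsplit separates the path from the query string and fragment
    path = urlsplit(url).path
    # unquote turns escapes such as %20 back into plain characters
    name = posixpath.basename(unquote(path))
    return name or default
```

The result can be passed to open() in place of a hard-coded name, for example open(filename_from_url(url), 'wb').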

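Using wget

You can also download a file from a URL with Python's wget module. The wget module can be installed with pip as follows:

pip install wget

The following code downloads the Python logo image: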

import wget
url = "https://www.python.org/static/img/python-logo@2x.png"
wget.download(url, 'c:/users/LikeGeeks/downloads/pythonLogo.png')

In this code, the URL and the path where the image will be stored are passed to the download method of the wget module.

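Downloading a file that redirects

In this section, you will use requests to download from a URL that redirects to another URL pointing to a .pdf file. The URL is:

https://readthedocs.org/projects/python-guide/downloads/pdf/latest/

To download this PDF file, use the following code: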

import requests
url = 'https://readthedocs.org/projects/python-guide/downloads/pdf/latest/'
myfile = requests.get(url, allow_redirects=True)
# Write the final (redirected) response body; the with block closes the file
with open('c:/users/LikeGeeks/documents/hello.pdf', 'wb') as f:
    f.write(myfile.content)

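In this code, the first step is to specify the URL. We then fetch it with the get method of the requests module. Setting allow_redirects to True tells get to follow redirects (this is also the default for get in requests), so the content of the final, redirected response is what ends up in the variable myfile.

Finally, we open a file and write the fetched content to it.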
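To see which redirects were actually followed, requests keeps the intermediate responses in the history attribute of the final response. The sketch below is an illustration only: redirect_chain is a hypothetical helper name, and it expects the object returned by requests.get.

```python
def redirect_chain(response):
    """Return (status_code, url) pairs for every hop, final response last."""
    # response.history holds the intermediate redirect responses, oldest first
    hops = [(r.status_code, r.url) for r in response.history]
    # status_code and url on the response itself describe the final hop
    hops.append((response.status_code, response.url))
    return hops
```

Calling redirect_chain(myfile) after the download above would list each redirect followed by the URL that finally served the PDF.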

Downloading large files in chunks

Consider the following code:

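import requests
url = 'https://example.com/PythonBook.pdf'  # placeholder URL; substitute the large file you want
r = requests.get(url, stream=True)
with open('PythonBook.pdf', 'wb') as f:
    # Read the body 1024 bytes at a time instead of loading it all into memory
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:
            f.write(chunk)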

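First, we use the get method of the requests module as before, but this time we set the stream argument to True.

Then, we create a file named PythonBook.pdf in the current working directory and open it for writing.

Next, we specify the chunk size to download at a time. We set it to 1024 bytes, iterate over the chunks, and write each one to the file until no chunks remain.

While the chunks are downloading, the Python shell looks like this:

[Screenshot: Python shell output during the chunked download]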
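The chunked loop described above can be wrapped in a small reusable function. This is a sketch under stated assumptions: save_in_chunks is a hypothetical name, and response is expected to be the object returned by requests.get(url, stream=True).

```python
def save_in_chunks(response, path, chunk_size=1024):
    """Write a streamed response to path chunk by chunk; return bytes written."""
    written = 0
    with open(path, "wb") as f:
        # iter_content yields the body in pieces of at most chunk_size bytes
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    return written
```

Only one chunk is held in memory at a time, and the returned byte count can be compared with the Content-Length header when the server sends one.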

Downloading multiple files (parallel/bulk download)

To download several files at once, import the following modules:

import os
import requests
from time import time
from multiprocessing.pool import ThreadPool

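We import the os and time modules; the time function measures how long the downloads take. ThreadPool runs a function across a pool of worker threads.

Let's create a simple function that writes the response to a file in chunks: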

def url_response(url):
    path, url = url
    r = requests.get(url, stream = True)
    with open(path, 'wb') as f:
        for ch in r:
            f.write(ch)

urls is a list of (path, URL) tuples; each tuple gives the file path to write to and the URL of the page to download.

urls = [("Event1", "https://www.python.org/events/python-events/805/"),
("Event2", "https://www.python.org/events/python-events/801/"),
("Event3", "https://www.python.org/events/python-events/790/"),
("Event4", "https://www.python.org/events/python-events/798/"),
("Event5", "https://www.python.org/events/python-events/807/"),
("Event6", "https://www.python.org/events/python-events/807/"),
("Event7", "https://www.python.org/events/python-events/757/"),
("Event8", "https://www.python.org/events/python-user-group/816/")]

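Inside the function, the URL is passed to requests.get as in the previous section. The function then opens the file (the path given in the tuple) and writes the page content to it.

Now we can call this function for each URL one after another, or for all URLs at the same time. Let's first call it for each URL in a for loop and note the time: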

start = time()
for x in urls:
    url_response(x)
print(f"Time to download: {time() - start}")
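The text above also mentions calling the function for all URLs at the same time, which is what the imported ThreadPool is for. The following is a minimal sketch of that variant, not the original author's listing: download_all and its threads parameter are names chosen here, and worker is expected to be a function such as url_response that accepts one (path, URL) tuple.

```python
from multiprocessing.pool import ThreadPool
from time import time


def download_all(urls, worker, threads=4):
    """Call worker on every (path, url) pair using a pool of threads.

    Returns the elapsed time in seconds.
    """
    start = time()
    with ThreadPool(threads) as pool:
        # imap_unordered hands items to idle threads and yields each
        # result as soon as its worker finishes
        for _ in pool.imap_unordered(worker, urls):
            pass
    return time() - start
```

With the definitions above, print(f"Time to download: {download_all(urls, url_response)}") would report the parallel timing for comparison with the for loop. Because the work is mostly waiting on the network, threads are usually sufficient here.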

Download progress bar

The progress bar is a UI widget from the clint module. To install clint, type the following command:

pip install clint

Code:

import requests
from clint.textui import progress
url = 'http://do1.dr-chuck.com/pythonlearn/EN_us/pythonlearn.pdf'
r = requests.get(url, stream=True)
with open("LearnPython.pdf", "wb") as Pypdf:
    total_length = int(r.headers.get('content-length'))
    # expected_size counts 1024-byte chunks, so chunk_size must be 1024 as well
    for ch in progress.bar(r.iter_content(chunk_size=1024), expected_size=(total_length/1024) + 1):
        if ch:
            Pypdf.write(ch)

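In this code, we imported the requests module and then the progress widget from clint.textui. The only difference from the chunked download is the for loop: the chunk iterator is wrapped in the bar function of the progress module, which draws the bar as the content is written to the file.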
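The bar advances one step per chunk taken from the iterator, so expected_size should be the number of chunks rather than the number of bytes, and it has to be computed with the same chunk size that is passed to iter_content. The sketch below shows that calculation; expected_chunks is a hypothetical helper name, and the result is only an estimate of how many chunks will arrive.

```python
import math


def expected_chunks(content_length, chunk_size=1024):
    """Estimate how many chunks a download will produce, or None if unknown."""
    if content_length is None:
        # The server sent no Content-Length header, so the total is unknown
        return None
    return math.ceil(int(content_length) / chunk_size)
```

For example, expected_chunks(r.headers.get('content-length')) can be passed as expected_size; it returns None instead of raising an error when the header is missing.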

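Downloading a web page with urllib

In this section, we will download a web page using urllib.

urllib is part of the Python standard library, so you do not need to install it.

The following line of code downloads a web page: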

urllib.request.urlretrieve('url', 'path')

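Specify the URL to download and the path where it should be saved:

import urllib.request
urllib.request.urlretrieve('https://www.python.org/', 'c:/users/LikeGeeks/documents/PythonOrganization.html')

In this code, we used the urlretrieve function and passed it the URL of the file and the path to save it to. The file extension is .html.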
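The Python documentation lists urlretrieve under its legacy interface, so it can be useful to know the equivalent built from urlopen. The following is a minimal sketch of that alternative, not the approach used in this article; fetch_to_file is a hypothetical helper name.

```python
import shutil
import urllib.request


def fetch_to_file(url, path):
    """Download url and save the response body to path; return path."""
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        # copyfileobj streams the body in blocks rather than reading it all at once
        shutil.copyfileobj(response, out)
    return path
```

It is called with the same two arguments as urlretrieve, for example fetch_to_file('https://www.python.org/', 'PythonOrganization.html').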

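Downloading through a proxy

If you need to use a proxy to download files, you can use the ProxyHandler class of the urllib.request module. Consider the following code:

import urllib.request
# 127.0.0.2 is the example proxy address; list every URL scheme that should use it
myProxy = urllib.request.ProxyHandler({'http': '127.0.0.2', 'https': '127.0.0.2'})
openProxy = urllib.request.build_opener(myProxy)
# Install the opener so that urlretrieve sends its requests through the proxy
urllib.request.install_opener(openProxy)
urllib.request.urlretrieve('https://www.python.org/')

In this code, we created the proxy handler and built an opener from it by calling build_opener. Because urlretrieve only uses the globally installed opener, we installed ours with install_opener before requesting the page. The proxy mapping is keyed by URL scheme, so an https:// URL needs an 'https' entry.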
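If only some requests should go through the proxy, the opener can be used directly instead of being installed globally. This is a sketch under stated assumptions: make_proxy_opener is a hypothetical helper name, and the address passed to it must be a proxy that is actually reachable on your network.

```python
import urllib.request


def make_proxy_opener(proxy_address):
    """Build an opener that sends HTTP and HTTPS requests through proxy_address."""
    handler = urllib.request.ProxyHandler(
        {"http": proxy_address, "https": proxy_address}
    )
    # build_opener returns an OpenerDirector; no global state is changed
    return urllib.request.build_opener(handler)


# Example use (requires a working proxy at the given address):
# opener = make_proxy_opener("127.0.0.2")
# with opener.open("https://www.python.org/") as response:
#     html = response.read()
```

Requests made with urllib.request.urlopen or urlretrieve elsewhere in the program are unaffected, since the opener is never installed.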

Article link: https://it72.com/12519.htm