Speakers: Tom Augspurger, James Crist, Martin Durant
The libraries that power data analysis in Python are essentially limited to a single CPU core and to datasets that fit in RAM. Attendees will see how dask can parallelize their workflows while they still write what looks like normal Python, NumPy, or pandas code.
Dask is a parallel computing framework with a focus on analytical computing. We'll start with `dask.delayed`, which helps parallelize your existing Python code. We'll demonstrate `dask.delayed` on a small example, introducing concepts at the heart of dask such as the *task graph* and the *schedulers* that execute tasks. We'll compare this approach to the simpler, but less flexible, parallelization methods available in the standard library, such as `concurrent.futures`.
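As a rough illustration of that comparison, the sketch below runs the same tiny computation first with `concurrent.futures` and then with `dask.delayed`. The functions `inc` and `add` are hypothetical stand-ins, not taken from the tutorial materials, and the `scheduler="threads"` keyword assumes a reasonably recent dask release. The dask half is skipped if dask is not installed.

```python
from concurrent.futures import ThreadPoolExecutor


def inc(x):
    # Hypothetical stand-in for an expensive step.
    return x + 1


def add(x, y):
    # Hypothetical stand-in for a step that combines two results.
    return x + y


# Standard library: each task is submitted eagerly, and we wire the
# results together by hand with .result().
with ThreadPoolExecutor(max_workers=2) as pool:
    fx = pool.submit(inc, 1)
    fy = pool.submit(inc, 2)
    futures_total = add(fx.result(), fy.result())

# dask.delayed: calls are recorded lazily as nodes in a task graph;
# nothing runs until .compute() hands the graph to a scheduler.
try:
    from dask import delayed
except ImportError:  # dask is an optional dependency for this sketch
    delayed = None

if delayed is not None:
    x = delayed(inc)(1)
    y = delayed(inc)(2)
    total = delayed(add)(x, y)  # still lazy: a graph of three tasks
    dask_total = total.compute(scheduler="threads")
else:
    dask_total = None

print(futures_total, dask_total)
```

In the `dask.delayed` version, the two `inc` calls have no dependency on each other, so a scheduler is free to run them in parallel before `add`.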
Attendees will see the high-level collections dask provides for writing regular Python, NumPy, or pandas code that is then executed in parallel on datasets that may be larger than memory. These high-level collections provide a familiar API, but the execution model is very different. We'll discuss concepts like the GIL, serialization, and other headaches that come up with parallel programming. We'll use dask's various schedulers to illustrate the differences between multi-threaded, multi-process, and distributed computing.
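To make the "familiar API, different execution model" point concrete, here is a minimal sketch using `dask.array`. The array size and chunk shape are arbitrary choices for illustration, not values from the tutorial, and the block is skipped if dask or NumPy is not installed.

```python
try:
    import dask.array as da
except ImportError:  # dask and NumPy are optional for this sketch
    da = None

if da is not None:
    # A 1000 x 1000 array of ones, split into sixteen 250 x 250 chunks.
    # Only the chunk layout is recorded here; no data is allocated yet.
    x = da.ones((1000, 1000), chunks=(250, 250))

    # NumPy-style expression. This builds a task graph over the chunks
    # rather than computing a value.
    lazy_total = (x + x.T).sum()

    # The scheduler is chosen at compute time. "threads" is used here;
    # "processes" and "synchronous" are the other single-machine options.
    result = lazy_total.compute(scheduler="threads")
else:
    result = None

print(result)
```

Because each chunk is an ordinary NumPy array, the same expression can be evaluated chunk by chunk, which is what allows the full dataset to exceed available memory.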
Dask includes a distributed scheduler for executing task graphs on a cluster of machines. We’ll provide each person access to their own cluster.
Slides can be found at: https://speakerdeck.com/pycon2018 and https://github.com/PyCon/2018-slides
Tom Augspurger, James Crist, Martin Durant - Parallel Data Analysis with Dask - PyCon 2018
Published on 10 May 2018