Last summer Microsoft rebranded the Azure Kusto Query engine as Azure Data Explorer. While it does not support fully elastic scaling, it at least allows scaling a cluster up and out via an API or the Azure portal to adapt to different workloads. It also offers Parquet support out of the box, which prompted me to spend some time looking into it.
#python #pydata #azure #parquet
Apache Parquet is a columnar file format for
working with gigabytes of data. pyarrow exposes efficient reading and writing
of Parquet files to Python. Statistics stored in the file metadata allow clients
to use predicate pushdown to read only the relevant subsets of data, reducing I/O.
Organizing data by column allows for better
compression, as data is more homogeneous. Better compression also reduces the
bandwidth required to read the input.
#python #pydata #parquet #arrow #pandas
Using Infrastructure-as-Code principles, with configuration through machine-processable definition files, in combination with the adoption of cloud computing provides faster feedback cycles in development and testing and lowers the risk of deploying to production.
This talk will give an overview of how to deploy web services on the Azure Cloud with different tools such as Azure Resource Manager Templates, the Azure SDK for Python, and the Azure module for Ansible, and will present best practices learned while moving a company into the Azure Cloud.