PyCon US 2024 Recap
PyCon US is the largest Python conference in the world. This year, over two thousand people from around the world gathered at the David L. Lawrence Convention Center in Pittsburgh, Pennsylvania, to meet and talk about all things Python.
PyCon runs over the course of one week: two days of tutorials, three days of talks, and two days of sprints (hacking on open-source software). I attended the talks and one day of the sprints, and really enjoyed it.
Here are my top highlights of the conference:
- Great Tables by Michael Chow and Richard Iannone from the Posit PBC team. This was a super engaging presentation about the “Great Tables” package for creating beautiful tables with Python. In the talk, Michael and Richard shared examples of using Great Tables with a Polars data frame. I’m excited to try out this package at work and in my personal projects.
- Side note: I was very surprised to see such a strong presence from the Posit team (formerly RStudio) at PyCon. I had always assumed that Posit was mostly focused on R, but this is not the case. In the hallway, I bumped into Winston Chang from Posit, who showed me his latest work on building PyShiny (a Python version of R’s Shiny). Pretty cool to see Posit branching out from R and into the Python space.
- DuckDB Overview by Alex Monahan. This fun and easy-to-follow talk gave a great introduction to DuckDB and its features. DuckDB is relatively new (around five years old) but has gained a ton of traction in the data space. Alex works at MotherDuck (a DuckDB cloud service company) and did a fantastic job highlighting the power of DuckDB.
- Dask DataFrame 2.0 by Patrick Hoefler. This talk gave a comprehensive overview of Dask’s newest improvements, like a better shuffle algorithm, query optimization, and more efficient memory usage with a PyArrow backend. Patrick, who is a Dask maintainer, also shared benchmark results comparing Dask against Spark, Polars, and DuckDB. As a data scientist who uses PySpark, I found this talk especially relevant, and it has inspired me to explore Dask further.
- Speed is Not All You Need for Data Processing by Kevin Kho and Han Wang. This talk gave an honest take on benchmarking and highlighted the caveats of measuring performance solely by a package’s execution time. I think this topic is super important and relevant for data scientists, especially now that benchmarks are widely used to evaluate new data frame packages in the Python ecosystem. I really enjoyed listening to this one!
- Open space discussions. This was probably my favourite part of the conference. Highly recommend checking them out! There were several rooms dedicated to spontaneous open space discussions where people could chat about any topic. I joined a few discussions about data frames, AI, and Bayesian statistics. It was really nice to hear different opinions and experiences on these topics. For example, in the Bayesian statistics open space, I met a software engineer who codes primarily in Julia - that was definitely unexpected.
- Diversity. PyCon is truly a global event. I met conference attendees from many countries, including South Korea, Nicaragua, Nigeria, and Poland, to name a few. I also met people from a range of backgrounds, including data scientists, data engineers, software engineers, developer advocates, and grad students. It was also great to meet people from both small startups and large companies. As a data scientist, I like sharing notes and talking about tech stacks with data scientists from other companies.
In summary, PyCon US was an amazing experience and I’m excited to attend again next year.