Pandas, one of the most useful Python packages for data analysis, released version 0.14.0 yesterday, an update from 0.13.1. Info on the update can be found here.
Pandas provides a DataFrame object that will be familiar to R users (you can also think of it as an Excel-style table). All the stuff you'd want to do with an Excel table--aggregate fields, create pivot tables, do some basic stats, add/remove data, create simple visuals--can be accomplished with DataFrames, along with SQL-style actions like table joins and querying. Pandas is really, really useful.
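To make that list a bit more concrete, here is a minimal sketch of a few of those operations. The `sales` and `managers` tables and their column names are made up for illustration, and the calls assume a reasonably recent pandas rather than any particular release.

```python
import pandas as pd

# Hypothetical example data: unit sales by region and product.
sales = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "product": ["a", "b", "a", "b"],
    "units": [10, 20, 30, 40],
})

# Hypothetical lookup table to join against.
managers = pd.DataFrame({
    "region": ["east", "west"],
    "manager": ["Ann", "Bob"],
})

# Aggregate a field: total units per region.
totals = sales.groupby("region")["units"].sum()

# Pivot table: regions as rows, products as columns.
pivot = sales.pivot_table(index="region", columns="product",
                          values="units", aggfunc="sum")

# SQL-style left join on the shared "region" column.
joined = sales.merge(managers, on="region", how="left")

# SQL-style filter using a query string.
big = sales.query("units > 15")

print(totals)
print(pivot)
print(joined)
print(big)
```

Each result is itself a pandas object, so the steps can be chained or fed into further aggregation.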
I'll hopefully do a write-up on the major changes in the latest version once I have it installed. For some reason pip install runs into problematic dependencies, even though all of the related packages I'm using are sufficiently recent.
Today I learned that a package's dependencies can be found in its setup.py file, as the list of packages associated with the install_requires key.
For example, Pandas's setup.py file is located here: https://github.com/pydata/pandas/blob/master/setup.py. Searching for install_requires yields:
- python-dateutil >= 2
- pytz >= 2011k
- numpy >= 1.6 (subject to change)
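Rather than searching by eye, the same lookup can be sketched in a few lines of Python. This is only a rough illustration: the `find_install_requires` helper and the `SAMPLE_SETUP` text are hypothetical, the code needs Python 3.8 or later, and it only finds requirements written as literal strings next to the `install_requires` name. A setup.py that builds the list dynamically would need more work.

```python
import ast


def _string_items(node):
    """Return the string literals inside a list or tuple node."""
    if isinstance(node, (ast.List, ast.Tuple)):
        return [elt.value for elt in node.elts
                if isinstance(elt, ast.Constant) and isinstance(elt.value, str)]
    return []


def find_install_requires(source):
    """Collect literal requirement strings tied to 'install_requires'.

    The source is parsed, never executed, so nothing gets imported or run.
    """
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            # Form 1: setup(..., install_requires=[...])
            for kw in node.keywords:
                if kw.arg == "install_requires":
                    found.extend(_string_items(kw.value))
        elif isinstance(node, ast.Dict):
            # Form 2: {"install_requires": [...]}
            for key, value in zip(node.keys, node.values):
                if isinstance(key, ast.Constant) and key.value == "install_requires":
                    found.extend(_string_items(value))
    return found


# Hypothetical setup.py contents, using the requirements listed above.
SAMPLE_SETUP = '''
from setuptools import setup

setup(
    name="example_pkg",
    version="0.1",
    install_requires=["python-dateutil >= 2", "pytz >= 2011k", "numpy >= 1.6"],
)
'''

print(find_install_requires(SAMPLE_SETUP))
# ['python-dateutil >= 2', 'pytz >= 2011k', 'numpy >= 1.6']
```

Parsing with `ast` instead of importing the file avoids running the package's build code just to read a list.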
All of these packages are up-to-date in my main installation of Python 3.3, so I don't really know why this is generating errors--hopefully I'll resolve this shortly.
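One way to double-check what a given interpreter actually sees is to ask it directly. The sketch below is an assumption-heavy aside: it uses `importlib.metadata`, which only exists in Python 3.8 and later (so not in 3.3), and the `installed_versions` helper is a made-up name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


# The three requirements from the list above.
report = installed_versions(["python-dateutil", "pytz", "numpy"])
for name, ver in report.items():
    print(name, ver if ver is not None else "not installed")
```

Running this with the same interpreter that pip installs into helps rule out the case where the up-to-date packages live in a different Python installation.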
Regarding dependencies, charting the connections between packages seems to be a bit of a hobby for some folk. Olivier Girardot has created an interactive visualization of the dependencies between packages in this blog post. Based on the picture below you can clearly see... well, you can't really see anything except complexity and inter-dependency.
As someone new to Python I'm still getting used to package management, but the learning curve is pretty light and I have to say it's pretty fun using tools versioned well below 1.0. Version numbers here bear little resemblance to those in the consumer tech world, where apps and hardware move quickly to reach the 2.0 threshold, but all the same--it still feels like you're participating in something nascent and evolving.