James Beveridge - Blog
Sat, 12 Mar 2016 06:42:27 -0800

Pandas & Dependencies
Sun, 01 Jun 2014 20:18:08 GMT
http://www.jamesbev.com/blog/pandas-dependencies

Pandas, one of the most useful packages in Python for data analysis, just released version 0.14.0 yesterday. This is an update from 0.13.1. Info on the update can be found here.

Pandas provides a DataFrame object that will be familiar to R users (you can also think of it as an Excel-style table). All the stuff you'd want to do with an Excel table--aggregate fields, create pivot tables, do some basic stats, add/remove data, create simple visuals--can be accomplished with DataFrames, along with SQL-style actions like table joins and querying. Pandas is really, really useful.
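To make that a bit more concrete, here's a minimal sketch of a few of those Excel-style and SQL-style operations. The `sales` and `managers` tables, their column names, and their values are all made up for illustration, and this assumes pandas is already installed.

```python
import pandas as pd

# Hypothetical example data; none of these names or numbers come from a real dataset.
sales = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100, 150, 80, 120],
})
managers = pd.DataFrame({
    "region": ["east", "west"],
    "manager": ["Ana", "Bo"],
})

# Aggregate a field: total revenue per region (think SUMIF in a spreadsheet).
revenue_by_region = sales.groupby("region")["revenue"].sum()

# Pivot table: regions as rows, quarters as columns.
pivot = sales.pivot_table(
    index="region", columns="quarter", values="revenue", aggfunc="sum"
)

# SQL-style join on a shared key column.
joined = sales.merge(managers, on="region", how="left")

# SQL-style WHERE clause: keep only rows with revenue above 100.
big_quarters = sales[sales["revenue"] > 100]
```

Each result is itself a pandas object, so these steps can be chained the same way you'd stack formulas or subqueries.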

I'll hopefully do a write-up on the major changes in the latest version once I have it installed--for some reason there appear to be problematic dependencies when I try to pip install, even though all of the related packages I'm using are sufficiently recent.

Today I learned dependencies can be found within each package's setup.py file as a list of packages associated with the install_requires key.

For example, Pandas's setup.py file is located here: https://github.com/pydata/pandas/blob/master/setup.py. Searching for install_requires yields:
  • python-dateutil >= 2
  • pytz >= 2011k
  • numpy >= 1.6 (subject to change)
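Rather than searching by eye, the same lookup can be sketched with the standard library's `ast` module, which reads a setup.py without running it. This is only a rough sketch: `find_install_requires` and `SAMPLE_SETUP` are hypothetical names of my own, the sample file is invented (it is not Pandas's actual setup.py), and it only handles a list written out literally, either as a keyword argument or as a dict key. It needs Python 3.8 or newer.

```python
import ast

# Hypothetical setup.py contents, made up for illustration.
SAMPLE_SETUP = '''
from setuptools import setup

setup(
    name="example_pkg",
    version="0.1",
    install_requires=[
        "python-dateutil >= 2",
        "pytz >= 2011k",
        "numpy >= 1.6",
    ],
)
'''


def find_install_requires(setup_source):
    """Return the first literal install_requires list found, or None."""
    tree = ast.parse(setup_source)
    for node in ast.walk(tree):
        candidates = []
        # Form 1: setup(..., install_requires=[...])
        if isinstance(node, ast.keyword) and node.arg == "install_requires":
            candidates.append(node.value)
        # Form 2: {"install_requires": [...]} in a kwargs dict
        elif isinstance(node, ast.Dict):
            for key, value in zip(node.keys, node.values):
                if isinstance(key, ast.Constant) and key.value == "install_requires":
                    candidates.append(value)
        for candidate in candidates:
            try:
                return ast.literal_eval(candidate)
            except ValueError:
                # The value is computed at runtime, not a plain literal; skip it.
                continue
    return None


if __name__ == "__main__":
    for requirement in find_install_requires(SAMPLE_SETUP):
        print(requirement)
```

If a package builds its requirement list dynamically, this returns None and you're back to reading the file yourself.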

All of these packages are up-to-date in my main installation of Python 3.3, so I don't really know why this is generating errors--hopefully I'll resolve this shortly.
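One way to double-check what's actually installed is to ask Python directly. A small sketch, assuming Python 3.8 or newer, since `importlib.metadata` isn't available on the 3.3 install mentioned above; `installed_versions` is a hypothetical helper name, not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # The three dependencies listed in the setup.py excerpt above.
    report = installed_versions(["python-dateutil", "pytz", "numpy"])
    for name, ver in report.items():
        print(name, ver or "not installed")
```

This only reports versions; comparing them against the `>=` constraints is still left to you (or to pip).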

Regarding dependencies, charting the connections between packages seems to be a bit of a hobby for some folk. Olivier Girardot has created an interactive visualization of the dependencies between packages in this blog post. Based on the picture below you can clearly see... well, you can't really see anything except complexity and inter-dependency. 

As someone new to Python I'm still getting used to package management, but the learning curve is pretty light and I have to say it's pretty fun using tools versioned well below 1.0. Version numbers bear little resemblance to the consumer tech world, where apps and hardware move quickly to reach the 2.0 threshold, but all the same--it still feels like you're participating in something nascent and evolving.

Data Science - Datasets and Links
Thu, 06 Mar 2014 21:34:34 GMT
http://www.jamesbev.com/blog/data-science-datasets-and-links

Just updated pages on datasets and useful links regarding data science. Check them out!

First Contact
Thu, 06 Mar 2014 19:21:47 GMT
http://www.jamesbev.com/blog/first-contact

Hi, I'm James Beveridge. This is my site. I'm an analytics guy who has primarily worked for the past 8 years on financial modeling, forecasting, customer segmentation, customer lifetime value, ROI and channel interaction for major online advertisers.

My background is math and improv.  I see permeable borders between virtually all disciplines and am constantly trying to delve into new areas (albeit with highly varying success).  I've been increasingly interested in machine learning and data science over the past 3 years and am currently tackling the amorphous field from a few different angles--self-education in Python, online and offline courses, brushing up on my linear algebra (eigenvalues!  orthogonality!!) and statistics, attending hackathons and nurturing a couple pet projects. I'll be using this site to share the stuff I'm learning, provide some hopefully useful content/direction, and complain about any challenges I'm having.

I also like to travel and just returned from a 3-month trip around SE Asia with my wife. This is not a normal thing for us--like many things people do, we thought and talked about it for years until all of a sudden the opportunity presented itself to just do it. We had a travel blog if you'd like to check it out. I'll hopefully be sharing some content related to travel here too.