It's been over a month since I wrote this article, which gave a little detail about how things work on my end. Things can really change in a month! Frankly, I've lost track of everything that's new since that article, so I'm just going to do another overview, this time focusing more on the technical details of what I do here...
About The Data
RonPaulGraphs.com now has data from 14 unique web sites (not including several sources of lookup/reference data), and my data directory contains 33 "feeds" created from those sources. That's over 51 megabytes of data, and it grows every day.
Each source of data is collected on a schedule that I feel makes the most sense given how often it updates. Intrade, YouTube, Eventful and a couple of other sources only update once per day, so I only grab them once per day. The Ron Paul and Huckabee data tend to change in interesting ways quite frequently, so I update them every couple of minutes. Other data, like RonPaulBlimp.com, RonPaulForums.com, teaparty07.com and a few others, changes unpredictably or subtly enough that an update every one to three hours is sufficient. The November 5th and November 11th data no longer need updating, obviously. The final category is data I update manually when it changes, such as my personal donations, primary data, chipins/PACs, and several lookup sources like the census population data used for my per capita figures. I can adjust each schedule as needed, which I occasionally do for the Ron Paul financial/donor data on heavy donation days to capture more donor names/locations per hour.
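For the technically curious, here's a minimal sketch of how per-feed schedules like these could be wired up in Java with the JDK's ScheduledExecutorService. It's an illustration under assumptions, not the site's actual code: the FeedScheduler class, the feed names and the fetch tasks are all hypothetical.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs each data feed's fetch task on its own fixed-delay schedule. */
class FeedScheduler {
    private final ScheduledExecutorService pool;
    private final Map<String, ScheduledFuture<?>> tasks = new ConcurrentHashMap<>();

    FeedScheduler(int threads) {
        this.pool = Executors.newScheduledThreadPool(threads);
    }

    /** Schedules a feed; calling again with the same name replaces its old schedule. */
    void schedule(String feed, Runnable fetch, Duration every) {
        Runnable guarded = () -> {
            try {
                fetch.run();
            } catch (RuntimeException e) {
                // An exception escaping a scheduled task suppresses all of its later runs,
                // so log it and let the next scheduled fetch try again.
                System.err.println(feed + " fetch failed: " + e.getMessage());
            }
        };
        // Fixed delay: the wait is measured from the end of one fetch to the start of the next.
        ScheduledFuture<?> next = pool.scheduleWithFixedDelay(
                guarded, 0, every.toMillis(), TimeUnit.MILLISECONDS);
        ScheduledFuture<?> old = tasks.put(feed, next);
        if (old != null) {
            old.cancel(false); // let an in-progress fetch finish, but stop future runs
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

Usage would look like `scheduler.schedule("donors", fetchDonors, Duration.ofMinutes(2))` for a fast-moving feed and `Duration.ofDays(1)` for a daily one; on a heavy donation day, calling `schedule("donors", fetchDonors, Duration.ofSeconds(30))` swaps in a tighter interval. A fixed delay (rather than a fixed rate) means a slow remote site never causes fetches to pile up back to back.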
Some of the data comes to me in easily digestible CSV format; some of it I have to extract from HTML pages or XML. In most cases I simply parse the data and toss it into a set of fairly consistent CSV files for graphing and mapping. For some of the bigger data feeds, I keep the data I need to work with in a database, both for performance and as a way to offload some of the complexity to database queries. Not all the data is in the database mostly because of the organic evolution of the software, and because it's just so darn easy to create and read CSV files... and sometimes that's all you need.
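As a rough illustration of that parse-then-CSV step, here's a sketch using only the JDK. The CsvFeed class, the two-column timestamp,value layout and the regex-based scrape are assumptions made for the example; a real page is usually better handled with a proper HTML or XML parser than a regex.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical two-column "timestamp,value" CSV feed that a fetcher appends to and a grapher reads. */
class CsvFeed {
    private final Path file;

    CsvFeed(Path file) {
        this.file = file;
    }

    /** Pulls a number out of scraped HTML: capture group 1, with thousands separators removed. */
    static double extract(String html, Pattern pattern) {
        Matcher m = pattern.matcher(html);
        if (!m.find()) {
            throw new IllegalArgumentException("pattern not found: " + pattern);
        }
        return Double.parseDouble(m.group(1).replace(",", ""));
    }

    /** Appends one row, writing the header first if the file doesn't exist yet. */
    void append(Instant when, double value) throws IOException {
        List<String> lines = new ArrayList<>();
        if (!Files.exists(file)) {
            lines.add("timestamp,value");
        }
        // Instant.toString() is ISO-8601 (e.g. 2007-11-05T12:00:00Z) and never contains a comma,
        // so these two columns need no CSV quoting.
        lines.add(when + "," + value);
        Files.write(file, lines, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Reads every value back in file order, skipping the header row. */
    List<Double> values() throws IOException {
        List<Double> out = new ArrayList<>();
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        if (lines.isEmpty()) {
            return out;
        }
        for (String line : lines.subList(1, lines.size())) {
            out.add(Double.parseDouble(line.substring(line.indexOf(',') + 1)));
        }
        return out;
    }
}
```

Keeping every feed in the same simple shape is what makes CSV attractive here: any graphing or mapping step can read any feed with the same few lines of code.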
The Code
The software that generates the graphs and HTML pages you see when you visit RonPaulGraphs is all custom code written in the Java programming language, using a number of free and/or open source libraries. Some of the prominent libraries and technologies include jQuery, JFreeChart, Velocity and a bunch of stuff from Apache Commons.
The source code and data live in my personal SVN repository, which makes it easy to update my local environment to match my server and vice versa.
Around 700 files, about 70 megabytes in all, make up the bits necessary to create the site you see.
The Process
Data fetching is done by separate processes, each on its own schedule, as mentioned above.
HTML and graph generation and site updates are done every 5 minutes by another process. Each time it runs, it gets the latest code and data and then runs through the entire site generation, which takes about 2 minutes.
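One detail worth thinking about when a site is regenerated every 5 minutes is that a visitor may request a page while it's being rewritten. A common remedy, sketched here assuming the site is a plain directory of static files, is to write each page to a temporary file and then atomically rename it over the old one. The StaticPublisher class and its ${name} placeholder syntax are hypothetical; the render method is a toy stand-in, not Velocity's API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;

/** Hypothetical helper: renders pages from simple templates and publishes them safely. */
class StaticPublisher {

    /** Replaces each ${key} placeholder with its value (a toy stand-in for a template engine). */
    static String render(String template, Map<String, String> values) {
        String out = template;
        for (Map.Entry<String, String> e : values.entrySet()) {
            out = out.replace("${" + e.getKey() + "}", e.getValue());
        }
        return out;
    }

    /** Writes html to a temp file beside target, then renames it into place in one step. */
    static void publish(Path target, String html) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Files.createDirectories(dir);
        // The temp file lives in the same directory so the rename stays on one file system.
        Path tmp = Files.createTempFile(dir, target.getFileName().toString(), ".tmp");
        try {
            Files.write(tmp, html.getBytes(StandardCharsets.UTF_8));
            // On Linux this is a rename(2): readers see either the old page or the new one.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move; cleans up after a failure
        }
    }
}
```

Each 5-minute cycle would then call render and publish for every page once the repository update has pulled in the latest code and data, so a slow 2-minute generation run never leaves a half-written page on the server.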
When I need to make a change (add a new graph, chipin, link, page, whatever), I make it on my MacBook Pro, test it, and check it into the aforementioned SVN repository; my server takes over from there. The server running all this stuff is a Linux box hosted by godaddy.com... I'm also the system admin... my least favorite part.
The Site
The site you see is entirely disconnected from the data; it is merely the static result of the process described above. The customization options on the front page and all the maps are features that run in your browser, based on the JavaScript and HTML that I generate. The information you customize is stored only in your cookies, so I have no way of knowing how those features are actually being used...
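Because the maps are drawn in the browser from generated JavaScript, the server side only has to bake the numbers into the page. Here's a hedged sketch of that idea, which also shows how a census lookup table can turn raw per-state totals into per capita values. The MapDataWriter class, the stateData variable name and the scaling factor are assumptions for illustration, not the site's actual format.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: computes per capita values and emits them as a JavaScript object literal. */
class MapDataWriter {

    /** Divides each state's total by its population and multiplies by `per` (e.g. 100000). */
    static Map<String, Double> perCapita(Map<String, Double> totals, Map<String, Long> population, double per) {
        Map<String, Double> out = new TreeMap<>(); // sorted keys give stable output between runs
        for (Map.Entry<String, Double> e : totals.entrySet()) {
            Long pop = population.get(e.getKey());
            if (pop == null || pop == 0) {
                continue; // no census figure: leave the state off the map rather than divide by zero
            }
            out.put(e.getKey(), e.getValue() / pop * per);
        }
        return out;
    }

    /** Builds a <script> block for browser-side map code; keys are assumed to be plain state codes. */
    static String toScript(String varName, Map<String, Double> values) {
        StringBuilder sb = new StringBuilder("<script>var ").append(varName).append(" = {");
        boolean first = true;
        for (Map.Entry<String, Double> e : values.entrySet()) {
            if (!first) {
                sb.append(", ");
            }
            first = false;
            // Locale.ROOT guarantees a '.' decimal separator, which JavaScript requires.
            sb.append('"').append(e.getKey()).append("\": ")
              .append(String.format(Locale.ROOT, "%.2f", e.getValue()));
        }
        return sb.append("};</script>").toString();
    }
}
```

Since the page is static, any per-visitor customization (which states to highlight, which measure to show) has to happen in the browser against this embedded data, which fits with settings living only in cookies.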
When I made my first blog post, I was pretty excited to have just shy of 2,000 visitors in a day... Well, I probably don't have to tell you that number has gone up :) Sixty blog posts and over 40 days later, I get 4 times that many on a "bad day", and on my best day (November 5th) I had over 144,000 visitors!
Hundreds of sites link to RonPaulGraphs and hundreds more have embedded images and/or use "my" figures for reference.
I hear from dozens (hundreds?) of people every week with feedback, questions, suggestions and complaints.
Me
I don't even want to tell you how many hours I've put into this project (I could only guess anyway)... but needless to say, it's more than a hobby at this point. It's still fun for me and has been a fantastic experience. I look forward to every email and can't wait to check out the numbers when I wake up every morning.
I have learned technologies and techniques that I may never have had the incentive to use in my day job. Solved problems I may never have encountered. Learned from people I may never have met.
I've talked to people in the campaign and have been within one degree of separation of a national political figure I really admire. I think I've had a positive impact on this campaign, and like many of you, I feel like we're writing at least a page or two of political history, and with any luck, a whole book.
For my first attempt at getting involved in the world of politics, this has been tremendously rewarding and has certainly changed my life.
Thanks to every one of you who have been so supportive. Some of the best ideas have come from the community, and I do my best to implement them as they come in. Keep up the great work!
dan