Washburn's World

My take on the world. My wife often refers to this as the WWW (Weird World of Washburn)

Location: Germantown, Wisconsin, United States

I am a simple country boy transplanted from the Piehl Township in northern Wisconsin to the Milwaukee metropolitan area who came down "sout" in 1980 for college and have stayed in the area since.
If this blog is something you wish to support, consider a donation.

Wednesday, November 28, 2007

Texas Email Footprint

My recent scuffling with the Governor of Texas has gotten me thinking about email retention. I am looking for a long-term home for the emails I expect to receive, so the natural question is: how much data will that be?

What are the storage requirements for one year’s worth of email generated by the Governor's office?

There are 300 people on staff with the Governor's office. According to email research from 2000, the average worker gets 40 email messages in his office every day, and the size of an email averages 18.5 kilobytes.

This seems low to me and the research is old, so I increased both size and volume roughly 5-fold. This yields:
    300 people getting 200 email messages each day, where each email is 100K in size.
The total storage footprint for email from the whole office is then 6 gigabytes per day. Assuming the email traffic on Saturday and Sunday is the same as any other day, one year's worth of traffic is 2.2 terabytes.
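
The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes decimal units (1 GB = 1,000,000 KB, 1 TB = 1,000 GB), and the constant and function names are illustrative, not from any tool.

```python
# Back-of-the-envelope email storage estimate using the figures in the post.
# Assumption: decimal units (1 GB = 1,000,000 KB, 1 TB = 1,000 GB).

STAFF = 300             # people in the Governor's office
EMAILS_PER_DAY = 200    # per person (the 2000 study's 40, increased 5-fold)
EMAIL_SIZE_KB = 100     # per message (18.5 KB increased 5-fold, rounded up)
DAYS_PER_YEAR = 365     # weekends counted like any other day

KB_PER_GB = 1_000_000
GB_PER_TB = 1_000


def daily_gb(staff, emails_per_day, email_size_kb):
    """Total email volume for the whole office, in GB per day."""
    return staff * emails_per_day * email_size_kb / KB_PER_GB


def yearly_tb(gb_per_day, days=DAYS_PER_YEAR):
    """One year of traffic at a constant daily volume, in TB."""
    return gb_per_day * days / GB_PER_TB


per_day = daily_gb(STAFF, EMAILS_PER_DAY, EMAIL_SIZE_KB)
per_year = yearly_tb(per_day)
print(f"{per_day:.1f} GB per day, {per_year:.2f} TB per year")
# prints "6.0 GB per day, 2.19 TB per year"
```

Changing any one of the three inputs scales the result linearly, so halving the assumed message size halves the yearly total.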

I will splurge and go for 2.5 terabytes.

This will cost me $1,500 for an off-the-shelf, USB-compliant unit.

It seems to me that it would be cheaper for Governor Perry to spend $2,000 to $4,000 for proper archive storage than to delete emails after only 7 days because there is too little space, as the Governor's spokesman has stated here.

Since I may have to buy this unit, a donation would be appreciated.


Anonymous said...

Just like most stuff you do: short-sighted, out of touch with reality, old data, lazy research, and finding old data that fits your arguments. Come on John ... figures from a 2000 study from Berkeley, no less.
With a little more effort you could come up with better data. It took me 30 seconds to find several examples of the data storage crisis we are and will all be dealing with in the near future.
I agree with your open records demands, but why be a @#$% head about it? Offer some solutions to the mandate, and no, more storage capability is not going to cut it. Now for your reading pleasure, some excerpts for those who do not wish to go further into cyberspace .... http://www.networkworld.com/news/2007/030707-study-world-needs-more-data.html
"The amount of data that is created globally is set to increase to 988 exabytes (that's 988 billion gigabytes) by 2010 while the capacity of storage systems is predicted to be just 600 exabytes"
"We've already seen a massive increase in the amount of data stored: IDC estimates that the amount of data held in the world grew from 5 exabytes in 2003 to 161 exabytes in 2006 -- the equivalent of 12 stacks of books, each extending more than 93 million miles from the earth to the sun. The research company estimates that in 2007, for the first time, we will see the amount of data created exceed the storage capacity available."
IDC predicts that by 2010, while nearly 70% of the digital universe will be created by individuals, businesses of all sizes, agencies, governments and associations will be responsible for the security, privacy, reliability and compliance of at least 85% of that same digital universe. In 2006, just the e-mail traffic from one person to another (excluding spam) accounted for 6 exabytes (or 3%) of the world's data.
"The incredible growth and sheer amount of the different types of information being generated from so many different places represents more than just a worldwide information explosion of unprecedented scale," said John Gantz, chief research officer and senior vice president, IDC. "From a technology perspective, organizations will need to employ ever-more sophisticated techniques to transport, store, secure and replicate the additional information that is being generated every day."
And finally ....
"This massive increase in data will cause further headaches for sysadmins as they struggle to get to grips with VoIP, greater demands on compliance, the growing use of video, the ever-increasing dependence on e-mail and the rise in surveillance systems."
Come on John, real solutions not just more frivolous emails adding to the problem!!!

Thu Nov 29, 01:33:00 PM CST  
Anonymous said...

Sorry John, I'm finished wasting my time with your nonsense. You are giving us serious folks looking to improve the system a bad name. You started out good but have lost focus. Another good soldier gone bad :-(

Fri Nov 30, 12:04:00 AM CST  
Dave H. said...

As Mr. Anonymous said, data creation is growing quickly, but, unlike him, I do not consider this to be a crisis in the making. And similarly, John, I think you are taking an unnecessarily expensive approach. First, I do believe you are overestimating the data volume significantly, probably by a factor of 20 (based on my own average received email message size). Secondly, is there really a need to keep all this data available "live" online? Could it not be stored on a more archival type of media, such as DVD? A double layer data DVD holds about 8.5 GB, which should easily retain any given day's worth of information. (Or, by my own estimate, as much as a month's worth.) You could even keep the most recent 100 days' worth online, then transition to offline storage as scheduled. By my estimate, you should be able to get away with an annual expense in the range of $400 even if you use a disk per day. Or, if you simply archive information as space requirements dictate, I would guess your yearly expenses might range around $25 for the media. Perhaps you can develop a more accurate estimate based on the data volume of the first response you receive.
All of the above set aside, I can foresee an alarmingly large cost involved in the data transfer if you are billed hourly for staff costs to collect the information. Is there any way to avoid that?
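
A rough check of the DVD figures in this comment, as a minimal sketch. It assumes the post's 6 GB per day, the factor-of-20 reduction suggested above, decimal gigabytes, and no compression or filesystem overhead; the names are illustrative only.

```python
import math

POST_GB_PER_DAY = 6.0       # daily volume estimated in the post
OVERESTIMATE_FACTOR = 20    # reduction suggested in this comment
DVD_CAPACITY_GB = 8.5       # double layer data DVD
DAYS_PER_YEAR = 365


def days_per_disc(gb_per_day, capacity_gb=DVD_CAPACITY_GB):
    """How many days of email fit on one disc at a constant daily volume."""
    return capacity_gb / gb_per_day


def discs_per_year(gb_per_day, capacity_gb=DVD_CAPACITY_GB, days=DAYS_PER_YEAR):
    """Discs needed for a year when each disc is filled before starting the next."""
    return math.ceil(gb_per_day * days / capacity_gb)


reduced = POST_GB_PER_DAY / OVERESTIMATE_FACTOR  # 0.3 GB per day

print(f"{days_per_disc(POST_GB_PER_DAY):.1f} days per disc at the post's volume")
print(f"{days_per_disc(reduced):.1f} days per disc at the reduced volume")
print(f"{discs_per_year(reduced)} discs per year at the reduced volume")
```

At the reduced volume a disc holds a little over 28 days of email, which matches the "month's worth" estimate; even at the post's full volume a single day still fits on one disc.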

Mon Dec 03, 11:51:00 AM CST  
Anonymous said...

Storing for storing sake will not be an option in the near future.

Mon Dec 03, 02:34:00 PM CST  
Dingo said...

Your post is neither nonsense nor frivolous, as your "anonymous" commenters argue.

They also believe that: a) your numbers are suspect, and b) storage capacity is a significant, perhaps insurmountable hurdle. The first point is irrelevant and the second is debatable.

Clearly, Governor Perry's emails SHOULD be kept for more than seven days, because emails are public records and might also have historical value. We're not talking about "storing for storing sake."

How long should his emails be kept? How should all this data be stored?

As Republican State Senator Jeff Wentworth (San Antonio) said:

"It sounds like something that we oughta have hearings about, what the practice is government-wide... Arguments for keeping the emails for a certain period of time and arguments against it, for whatever reason."

Tue Dec 04, 08:24:00 PM CST  
A. S. said...

This little calculation of what storage space costs and a basic misunderstanding of how enterprise applications are implemented dramatically oversimplify your evaluation of the situation.

While I wholeheartedly agree that accountability is very, very, very important, please consider that the ability to account for every little email, document, snippet, costs the taxpayers money. Take your calculation and extrapolate it out to, let's say, a school district with 10,000 employees and 70,000+ children and more than just e-mail that requires retention.

For grins, please locate and read Local Schedule SD - Records of Public School Districts for a better education on what items need to be retained. This applies to all State and Local Agencies.

The amount of data that a typical state or local agency moves on a given day would cause most people to have a meltdown. At peak my organization moves nearly 30 gigabytes per second just back and forth to the internet. Logging those transactions (required by law under CIPA) alone requires a huge amount of storage space.
Internal data volumes for production ERP and Student Systems are far higher.

Who defines the increments at which revisions get tracked? How granular must things be to satisfy your curiosity? Backing up data is a great thing but are you interested in a snapshot in time? Or do we retain every message and every revision to every document that is made? Each edit along the way?

Your use of a USB drive for retention has the hallmark of a novice discussing things technical. Generally speaking, enterprise users build systems for redundancy and fault tolerance. Using a single drive would be like driving around in a thorny wood with no spare. Tape backup, you say? With the amount of data to archive these days, it has become impossible to schedule a backup window large enough to back all of this constantly changing data up.

This means lots and lots and lots of drives, using RAID in order to meet the requirements for retention. Ever consider what would happen if we didn't? What? You lost little Johnny's grades for the first three years of school? The state requires us to keep student grading data in perpetuity.

We now move a step further down the road to disaster recovery. We have a ton of information backed up, but how do we get it back if a hurricane hits? (Never happened in the State of Texas.) We, like other organizations in tornado-prone areas, like to have offsite backups, ours in the form of a disaster recovery site. Stop being able to pay 10,000 teachers and watch what happens to educational commitment.

So what do we have now?

1. Multiple Systems with lots of data requiring long to near-eternal retention.

2. A rapidly changing technological environment. How do I keep a tape drive from 1988 able to retrieve that format of data? How much will it cost to convert that data?

3. A method for retrieving useful information from it.

4. People who are already overloaded doing useful work, being tasked to retrieve data so bloggers can have something to do, or to pursue personal crusades, whose time is better used providing resources to educate children.

5. Need for technology that is fault tolerant but that is nowhere near being "affordable".

6. Need for disaster recovery (which is nothing short of duplicate systems).

7. The cost to purchase this equipment

8. The salaries and benefits for competent employees to maintain it

9. The cost for yearly hardware and software licensing and upkeep.

10. The cost in people hours to install, test, convert users and systems to this.

Now here is the rub...

It's not buying it that kills you, it's keeping it running that really costs.

Taxpayers do not want to pay for this. I can't say I blame them, as I am one of them. Most of them believe like you do that the 1.5 TB drive from Fry's should be enough. They think that the $300.00 Dell Weekly Desktop Special is a fine desktop unit (maintenance? TCO? Anti-Virus? Productivity Software? Client Costs for anything Microsoft can nickel and dime a person for), but they do want to be assured that they can see an email sent to someone in 1999 when they get in a snit. Knowing what it cost you is nothing. You may be getting charged for this, but it doesn't begin to offset the real cost to the public to sate your curiosity. You are not the only one with this "need".

While saying all of this, I have watched one of my neighboring sister school districts go down in flames, and rightly so, for things that they have done that I as a public employee find repugnant, inexcusable and, quite frankly, unconscionable. That aside, it still costs someone, somewhere money to do these things. Part of this burden is the maintenance of public trust. The other portion of it is a "pain in the ass" that takes money away from organizations that I once, as an outsider, considered bloated, but have come to understand are simply pressed to use money for educating children instead of for maintaining the huge barrage of federal and state required, and invariably unfunded, mandates.

As you live in Wisconsin, I would ask that you be somewhat more considerate of the taxpayers of the State of Texas or any other state for that matter, and stop trying to spend our money. Better yet help me pay my 6k property tax bill each year. Your 600+ dollars is a tenth of it.

I know my rant is about Public Schools and not State Government, but what impacts one impacts us all. I am no fan of Rick Perry either. By no means am I defending him, but it bothers me to let simplistic generalizations go without at least some attempt to put facts to the matter.

On a personal note, I am with you on Rick Perry retaining all emails. I operate with the assumption that any item capable of recording information on a daily basis that belongs to the District is subject to the scrutiny of the public, and I make my staff and everyone that uses our systems aware of that fact. Full disclosure is the cornerstone of right action. Anyone that has anything to hide has no place in public service. Perhaps I am overly deontological about this whole thing, but balance is the keyword. I have a duty of ethical behavior to the taxpayers of the state. I expect as a taxpayer in the state that other public officials are of a like mind. But I also know that in the real world, the higher the resolution or the wider the space, the greater the cost.

I am only willing to pay for so much.

Tue Nov 11, 07:06:00 PM CST  
