Publisher application programming interfaces
The major social-media publishers offer APIs that third parties can program against in order to collect content directly from the publisher. Through an API, it is possible to capture all of the data and metadata that the publisher makes available (for example, for a Facebook page) and then map that data into a preservation repository. A major consideration with the API method is bandwidth, because social-media sites create massive volumes of content. In 2011, content aggregator Gnip estimated that Twitter generated 35 MB per second of sustained network traffic. That is a lot of content to ingest. Given that volume, it is wise to use third-party applications that already connect to the social-media publishers directly rather than building and scaling a connector in-house.
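To put the Gnip figure in perspective, the following is a rough back-of-the-envelope sketch of what a sustained 35 MB per second works out to over a day. It assumes decimal units (1 TB = 1,000,000 MB) and a constant rate; the function name is illustrative only.

```python
# Rough storage estimate from a sustained ingest rate.
# Assumption: decimal units (1 TB = 1,000,000 MB) and a constant rate all day.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def daily_volume_tb(rate_mb_per_s: float) -> float:
    """Convert a sustained rate in MB/s into TB per day."""
    return rate_mb_per_s * SECONDS_PER_DAY / 1_000_000


# 35 MB/s is the 2011 Gnip estimate quoted in the text.
print(f"{daily_volume_tb(35):.2f} TB/day")  # prints "3.02 TB/day"
```

At roughly 3 TB per day, the full stream is far more than most organizations need; in practice a collection would be scoped to specific accounts or pages.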
There are many ways to execute an API collection approach. Many third-party vendors build connectors to social-media publishers and then provide applications that allow customers to collect and preserve content as needed. One approach is to have employees authorize an application that sits on the social-media site and have that application monitor and collect all information. This can be done automatically at the company firewall and gives the company an opportunity to restate policies and capture login information with the informed consent of users. This practice has user privacy implications that should be carefully evaluated by counsel, especially for global corporations with users or customers located in countries with strong privacy protections.
In the context of collecting and preserving social media, a proxy approach is one where a company requires employees to interface with social media through a proxy server so that interactions can be monitored and captured.
The most comprehensive approach to social-media collection and preservation would combine the API and proxy methods. Doing so would ensure complete capture of all of a user's social-media content. But this approach is probably overkill for any but the most highly regulated organizations (and even then, it will only be a small subset of employees in regulated companies that need to be monitored so closely).
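As a minimal sketch of what combining the two methods might involve, the snippet below merges items captured through an API with items captured at a proxy and records which source saw each one. The record shape (a dict with an "id" key) and the rule that the API copy wins when both exist are assumptions made for illustration, not a description of any vendor's product.

```python
from typing import Dict, List


def merge_captures(api_items: List[dict], proxy_items: List[dict]) -> List[dict]:
    """Merge two capture streams by item id, noting which source saw each item.

    Assumption: every record is a dict with a unique string "id".
    When both sources hold an item, the API copy is kept because it
    typically carries the publisher's metadata.
    """
    merged: Dict[str, dict] = {}
    for item in proxy_items:
        merged[item["id"]] = {**item, "sources": ["proxy"]}
    for item in api_items:
        if item["id"] in merged:
            sources = merged[item["id"]]["sources"] + ["api"]
        else:
            sources = ["api"]
        merged[item["id"]] = {**item, "sources": sources}
    return sorted(merged.values(), key=lambda record: record["id"])


api = [{"id": "1", "text": "a"}, {"id": "2", "text": "b"}]
proxy = [{"id": "2", "text": "b (as seen at proxy)"}, {"id": "3", "text": "c"}]
for record in merge_captures(api, proxy):
    print(record["id"], record["sources"])
```

Items that appear under only one source show where a single method would have left a gap, which is the argument for combining them.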
It is worth noting that social-media publishers themselves make some collection possible, for example with Twitter's "public follow" and Facebook's "download your information." But these methods have limitations and are not suitable for many cases. Twitter's public-follow feature enables access to all the past Tweets of a specified user and any new Tweets in real time without generating a formal "follow" request, but with a limit of 3,200 past Tweets. And the feature works only if the user allows Tweets to be public.
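The effect of the 3,200-Tweet cap is easy to quantify. The sketch below is simple arithmetic, not a call to any Twitter API; the function name and the sample account size are hypothetical.

```python
from typing import Tuple

HISTORY_LIMIT = 3_200  # cap on past Tweets cited in the text


def reachable_history(total_tweets: int, limit: int = HISTORY_LIMIT) -> Tuple[int, int]:
    """Return (collectable, out_of_reach) counts for an account's past Tweets."""
    collectable = min(total_tweets, limit)
    return collectable, total_tweets - collectable


# Hypothetical account that has posted 5,000 Tweets.
collectable, missing = reachable_history(5_000)
print(collectable, missing)  # prints "3200 1800"
```

For an active account, then, a sizable share of the history may be out of reach by this route, which is one reason it is not suitable for many preservation cases.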
Developing clear policies on social media
Another lesson from the past: When email archiving first started, companies archived all emails, typically through journaling. That led to bloated archives that broke down and became more expensive than they were worth.