EnableCloudStorageForAttachments

Extension of Attachment store to support Cloud Storage.

Inspired by WillNorris's new S3LinkPlugin and the DBIStoreContrib listeners for Search & Query optimization.

Instead of just adding explicit S3 (or other Cloud storage) links to topics, could we extend Store to make better use of Cloud Storage? This might also address CDN-type applications in the core. As with the DBIStoreContrib, this proposal is to augment attachment storage with a Cloud offload in parallel to the local file system.
  • The "Listeners" interface would need to be extended to support Listeners in the attach operations of Store.  The only event currently defined is update, which is triggered by the following actions:
    • moveTopic
    • moveWeb
    • saveTopic
    • repRev
    • delRev
    • remove
  • Add to the "Listeners" interface an attach event triggered by the following actions:
    • saveAttachment - Trigger transfer of the attachment into the remote storage.
    • copyAttachment - Same as save for the target topic.
    • moveAttachment -
    • remove - tell listeners that an attachment has been removed, vs. the entire topic, which triggers an update event.
  • Attachment URL rendering needs to be extended to be pluggable, so that the Cloud or CDN URL can be substituted when available. This could be hooked into either of two locations: the ATTACHURL / ATTACHURLPATH macros, or the Foswiki::getPubUrl function.
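
The listener extension described above can be sketched roughly as follows. This is a language-neutral illustration written in Python, not Foswiki's Perl Store code: the names `Store`, `StoreEvent`, `add_listener` and `notify` are hypothetical, and the rule that a `remove` carrying an attachment name is an attach event (rather than an update event) is an assumption drawn from the bullet list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StoreEvent:
    kind: str             # "update" or "attach"
    action: str           # e.g. "saveTopic" or "saveAttachment"
    web: str
    topic: str
    attachment: str = ""  # empty for topic-level events


Listener = Callable[[StoreEvent], None]


class Store:
    """Hypothetical store that notifies listeners of update and attach events."""

    UPDATE_ACTIONS = {"moveTopic", "moveWeb", "saveTopic", "repRev", "delRev", "remove"}
    ATTACH_ACTIONS = {"saveAttachment", "copyAttachment", "moveAttachment", "remove"}

    def __init__(self) -> None:
        self._listeners: Dict[str, List[Listener]] = {"update": [], "attach": []}

    def add_listener(self, kind: str, listener: Listener) -> None:
        self._listeners[kind].append(listener)

    def notify(self, action: str, web: str, topic: str, attachment: str = "") -> StoreEvent:
        # Assumption: an action that names an attachment is an attach event;
        # without an attachment name it is a topic-level update event.
        kind = "attach" if attachment else "update"
        allowed = self.ATTACH_ACTIONS if attachment else self.UPDATE_ACTIONS
        if action not in allowed:
            raise ValueError(f"{action} is not a valid {kind} action")
        event = StoreEvent(kind, action, web, topic, attachment)
        for listener in self._listeners[kind]:
            listener(event)
        return event
```

A cloud synchronization listener would register for the attach kind only, so ordinary topic saves never touch the remote storage.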

Questions / Topics for discussion

  • How is a unique remote storage key generated from Web/Topic/Attachment?
    • May depend upon capabilities of the cloud store.
    • md5sum the file? (avoids duplication and issues with moving the file between attachments, etc).
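
The two key strategies raised here can be compared with a small sketch. It is written in Python for illustration only; the function names and the `attachments` / `pub` prefixes are hypothetical and not part of any Foswiki API.

```python
import hashlib


def content_key(data: bytes, prefix: str = "attachments") -> str:
    """Key derived from the file content (md5), independent of web and topic.

    Identical files map to the same key, so moving an attachment between
    topics does not require a new upload.
    """
    return f"{prefix}/{hashlib.md5(data).hexdigest()}"


def path_key(web: str, topic: str, attachment: str, prefix: str = "pub") -> str:
    """Key that mirrors the local pub hierarchy: [prefix]/Web/Topic/Attachment."""
    return f"{prefix}/{web}/{topic}/{attachment}"
```

The content key avoids duplicates but hides the Web/Topic/Attachment structure, whereas the path key keeps that structure at the cost of a re-upload (or a remote rename) whenever an attachment moves.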

  • Can the status & key of the cloud storage copy be saved into the Topic / Attachment metadata by the "Listener"?
    • Probably not a good place - Don't want to create extra revisions when updating the storage status

  • Is a local mapping of Web/Topic/Attachment -> Cloud/Location with a Stale/Current flag appropriate? Stored in Working?
    • working/work-areas/cloudstore/web-topic/storage-map, stored as one file per web/topic
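
A per-topic map with a Stale/Current flag might look like the sketch below. It is a Python illustration under stated assumptions: the JSON file format, the `StorageMap` class and its method names are all hypothetical, and only the directory layout follows the path suggested above.

```python
import json
from pathlib import Path
from typing import Optional


class StorageMap:
    """Hypothetical map of attachment name -> remote key and state for one web/topic."""

    def __init__(self, work_area: Path, web: str, topic: str) -> None:
        # Mirrors working/work-areas/cloudstore/web-topic/storage-map
        self.path = Path(work_area) / f"{web}-{topic}" / "storage-map"
        self.entries = json.loads(self.path.read_text()) if self.path.exists() else {}

    def mark_current(self, attachment: str, key: str) -> None:
        self.entries[attachment] = {"key": key, "state": "current"}
        self._save()

    def mark_stale(self, attachment: str) -> None:
        # Called when the local copy changes before the remote copy is refreshed.
        entry = self.entries.setdefault(attachment, {"key": None})
        entry["state"] = "stale"
        self._save()

    def remote_key(self, attachment: str) -> Optional[str]:
        """Return the remote key only when the remote copy is current."""
        entry = self.entries.get(attachment)
        if entry and entry["state"] == "current":
            return entry["key"]
        return None

    def _save(self) -> None:
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.entries, indent=2))
```

Because the map lives in the working area rather than in topic metadata, updating the storage status creates no extra topic revisions.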

  • Can the synchronization into the remote storage be deferred? (This complicates determining which URL to generate for the attachment.)
    • This is preferable. All of this needs to be handled anyway; otherwise a cloud outage is a big issue.
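
Deferred synchronization could be modelled as a queue that is drained whenever the remote storage is reachable. The Python sketch below is illustrative only: `DeferredSync`, its methods, and the convention that the upload callable raises `OSError` during an outage are assumptions, not an existing interface.

```python
from typing import Callable, Dict


class DeferredSync:
    """Hypothetical queue of attachments waiting to be copied to remote storage."""

    def __init__(self, upload: Callable[[str, bytes], str]) -> None:
        # upload(name, data) returns the remote key; assumed to raise OSError on outage
        self._upload = upload
        self._pending: Dict[str, bytes] = {}
        self.synced: Dict[str, str] = {}

    def enqueue(self, name: str, data: bytes) -> None:
        # A new save invalidates any earlier remote copy and replaces any queued one.
        self.synced.pop(name, None)
        self._pending.pop(name, None)
        self._pending[name] = data

    def pending(self) -> int:
        return len(self._pending)

    def drain(self) -> int:
        """Upload queued items in order; stop at the first failure and keep the rest."""
        uploaded = 0
        for name, data in list(self._pending.items()):
            try:
                key = self._upload(name, data)
            except OSError:
                break
            del self._pending[name]
            self.synced[name] = key
            uploaded += 1
        return uploaded
```

Only names present in `synced` would be eligible for a cloud URL; everything still pending keeps being served from the local pub directory.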

  • How to hook the pub and attach URLs.
    • Add a new Plugin handler - renderAttachURLHandler
    • If the handler provides an alternate location, it is used instead of the default location. It returns null to default to the local storage.
    • Possibly overrides 4 macros - VarATTACHURL, VarATTACHURLPATH, VarPUBURL, VarPUBURLPATH
      • PUB* versions overridden only if the remote storage key is in the form of [some prefix]/Web/Topic/Attachment
      • *PATH versions overridden to be absolute links. (consistent with current implementation - relative links are optional, decided by core).
    • Honor the static context - provide only local locations for access by publishing tools like GenPDF (See AddStaticContext)
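
The intended behaviour of such a handler can be sketched as below. This is a Python illustration, not the Foswiki plugin API: `render_attach_url`, its parameters, and the `/pub` default are hypothetical stand-ins for the proposed renderAttachURLHandler hook.

```python
from typing import Callable, Optional

# A handler returns an alternate URL, or None to fall back to local storage.
AttachUrlHandler = Callable[[str, str, str], Optional[str]]


def render_attach_url(
    web: str,
    topic: str,
    attachment: str,
    handler: Optional[AttachUrlHandler] = None,
    static_context: bool = False,
    pub_url: str = "/pub",
) -> str:
    """Return the handler's alternate location when it provides one, else the local URL."""
    local = f"{pub_url}/{web}/{topic}/{attachment}"
    # In the static context (publishing tools such as GenPDF) only local URLs are used.
    if static_context or handler is None:
        return local
    remote = handler(web, topic, attachment)
    return remote if remote else local
```

A handler backed by a storage map would return the remote URL only for attachments whose remote copy is current, which gives the local fallback for free.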

  • How does ATTACHURL decide whether to render the local copy or the CDN copy? Test for availability? Reference metadata? "Assume it's there"?

  • If hooked into getPubUrl, do we need Func API extensions to request only the Local URL vs. the cloud URL?

  • Does the cloud storage hold old revisions or only the most current?

  • Security implications.

  • What should be configurable in the configure interface?
    • Storage access passwords, etc.
    • Controls for which webs are eligible for remote storage
    • Alternate location per web or topic regex (Use common CDNs for jQuery?)

  • How to handle unavailability of the Cloud storage
    • Deferred updates when back online
    • Fallback to local copy in ATTACHURL
    • Mark CDN copy as stale in metadata during save?

  • Considerations for remote unavailability, for example if the storage is reachable by the server but not by the client. Add a URL param to force all links to be local? Make it a configurable user preference setting?

-- GeorgeClark - 05 Dec 2010

Discussion

  • This could be implemented either using the new Listener interface, or using various before/after Plugin Handlers. Listeners seem to be a better way to go, even if it delays release.

  • Also, the Pub URL rewrite could be done in one of the Page handlers, but adding a hook to the pubUrl function seems better to me than scraping the URLs out of the pages.

-- GeorgeClark - 06 Dec 2010

So basically, you're proposing a modernised DistributedServersPlugin (as it does the rewrite, and can use round-robin URLs to distribute the load, allowing more simultaneous requests on the browser side).

Excellent! :)

-- SvenDowideit - 07 Dec 2010

Looks like a great idea to me. And it would be nice if this could go somewhere, I mean somewhere else than where DistributedServersPlugin went :)

-- OlivierRaginel - 07 Dec 2010

From IRC, SvenDowideit suggested that instead of proposing a hook for the current ATTACH and PUB macros, the initial implementation should use the existing handlers, and that the Feature proposal DeprecateContextlessURLConstructs would be the place to handle this longer term.

-- GeorgeClark - 07 Dec 2010

One more big issue will be extensions that bypass the Foswiki API for storage of attachments. The ImageGallery thumbnails for example will be missed. Some further discussion from #foswiki:

(12:29:02 PM) gac410: hey pharvey - regarding our discussion last night about attachments and cloud storage,  ImageGalleryPlugin is an issue.  Since all the thumbnails and reduced format images are written directly to a pub directory, they would all be missed  :-(
(12:29:47 PM) gac410: And they are outside of the pub/Web/Topic hierarchy so even an auto-attach type solution would fail.
(12:30:15 PM) MichaelDaum: thats for a specific reason
(12:30:51 PM) MichaelDaum: theres no working area like url + directory that is served by the http server other than somewhere under /pub
(12:31:24 PM) gac410: I understand - but if they are not attachments,  then Store Listeners can't stage them into a Cloud store.  
(12:31:24 PM) MichaelDaum: and polluting the attachments area is bad as well...as done by the ImagePlugin for instance...
(12:31:49 PM) MichaelDaum: which of course leads to another problem: have a central place for ImageGalleryP and ImageP to store their thumbnails
(12:31:58 PM) gac410: Managed by Store!
(12:32:47 PM) MichaelDaum: for now the Store is only taking care of in-band data vs out-of-band data like these thumbnails
(12:34:28 PM) MichaelDaum: same holds for any other rendition of attachments being cached somewhere
(12:34:50 PM) gac410: I know that - hence the issues.    My motivation on the Cloud storage was my uplink bandwidth being consumed by images.   If I can't offload the galleries,  then i don't get anywhere near as much benefit.
(12:34:51 PM) MichaelDaum: as I noted on the %URL{}% feature proposal  ... what was the name of the topic.
(12:35:14 PM) MichaelDaum: gac410, yap
(12:35:14 PM) gac410: Yes -I've been following that discussion too.
(12:35:28 PM) pharvey: Development.DeprecateContextlessURLConstructs
(12:35:40 PM) ***MichaelDaum waiting for foswikibot to pickitup
(12:35:44 PM) gac410: Foswiki:Development.DeprecateContextlessURLConstructs
(12:35:59 PM) pharvey: FoswikiBot: Development.DeprecateContextlessURLConstructs
(12:36:10 PM) pharvey: it's asleep :)
(12:36:14 PM) pharvey: and I should be too.
(12:36:15 PM) MichaelDaum: good idea
(12:36:16 PM) pharvey: g'night
(12:36:20 PM) MichaelDaum: nite
(12:36:23 PM) pharvey left the room (quit: Quit: ChatZilla 0.9.87 [Iceweasel 4.0.1/20110430090312]).
(12:36:23 PM) gac410: It's not sighned in.    nite
(12:36:51 PM) gac410: Oh - no there it is - must be sleeping then.   
(12:37:00 PM) MichaelDaum: it would really be cool to manage some distant store via Foswiki::Store ...
(12:37:18 PM) gac410: Yeah -  That's sort of what I'm ruminating about.
(12:37:39 PM) gac410: http://foswiki.org/Development/EnableCloudStorageForAttachments
(12:37:40 PM) FoswikiBot: [ EnableCloudStorageForAttachments < Development < Foswiki ]
(12:37:43 PM) MichaelDaum: I'd really be interested to combine this with my recent cmis work
(12:38:50 PM) gac410: My thought are local store is the "master",   Listeners post to a queue for attachments,  and then a backend syncs them to store and maintains some index for %URL [[ur]] and other mechanisms to modify.
(12:40:09 PM) gac410: Preferably making the backend pluggable,   so AttachCloudStore::AmazonS3    or AttachCloudStore::WhaeverStore ... would implement the synchronization for AttachCloudStore.pm
(12:41:25 PM) gac410: Hopefully that way foswiki would remain fully available in the event the cloud becomes unreachable.  And with limited uplink bandwidth,  the synchronization can run independent of the web serving.
(12:43:27 PM) gac410: Interesting bot - can map from url back to breadcrumbs,  but not the useful direction of interwiki link > real url
(12:44:28 PM) MichaelDaum: hm the delay since the sync completed might be a problem.
(12:45:01 PM) gac410: Well - the URL munger has to detect if sync completed.   If not -delivers the master pub/  url.  
(12:45:01 PM) MichaelDaum: if I understand you correctly, the same %URL expression would change its value once the data it points to has been synced over to the cloud
(12:45:05 PM) gac410: Yes
(12:45:17 PM) MichaelDaum: that might need to interact with the PageCache
(12:45:48 PM) MichaelDaum: the page cached earlier still has got the old url in it.
(12:45:58 PM) gac410: ugh - yes indeed.    Also I'm not the trusting sort,  so if the cloud becomes "unavailable", then urls should fall back to the master version as well.
(12:46:00 PM) MichaelDaum: it has to be recomputed once the sync is finished
(12:46:15 PM) MichaelDaum: ya.
(12:46:27 PM) MichaelDaum: cool idea this all actually
(12:47:09 PM) MichaelDaum: though best would be to upload directly to the cloud
(12:47:14 PM) MichaelDaum: wouldnt it
(12:47:34 PM) gac410: So I figured %URL needs to either directly or async needs to "ping"  - hey cloud are you available.    Also add a urlparam to force disable cloud in the event the issue is reachabilite from the client vs. the server.
(12:47:34 PM) MichaelDaum: otherwise you count the same bandwidth twice at least
(12:47:45 PM) MichaelDaum: 1. browser -> foswiki, 2. foswiki -> cloud
(12:47:56 PM) MichaelDaum: better would be browser->cloud
(12:48:17 PM) MichaelDaum: so foswiki has more the role of instrumenting the browser to upload to the right location
(12:48:49 PM) gac410: Yes - there is a double hit on save -    but then if the cloud is unavailable  like some recent outages - then the attachments are possibly lost.    
(12:48:52 PM) MichaelDaum: different story for content that materializes on the foswiki server directly
(12:49:21 PM) MichaelDaum: wheres my data if the cloud goes puff anyway ;)
(12:49:33 PM) gac410: Assuming that there is a very high ratio of read to save,  double hit during save is not that bad.
(12:50:42 PM) gac410: If cloud goes puff,  server should be totally reliable serving directly.   Actually it would be nice if the client location was also taken into account -  so when I'm local to the server network(s),  I never read from the cloud.
(12:50:50 PM) MichaelDaum: interesting also: just push some specific renditions of a video that just has been saved to the cloud....or even multiple formats of it
(12:51:15 PM) gac410: Yes - that would be good too.   Keep a master in one format and push the alternates.
(12:52:31 PM) MichaelDaum: hard to say which line is fatter: the one from the browser to foswiki or to the cloud server
(12:52:51 PM) gac410: I'm still old school - very distrusting of cloud.   I want to take my hard drive out and put it in another server and have a 100% complete store.     But cloud has huge benefits for acceleration and distribution of content closer to the clients.
(12:53:50 PM) MichaelDaum: most of my clients wont use a cloud feature. thats corporate intranets with very restrictive access rights.
(12:54:11 PM) MichaelDaum: more interesting are public servers with some cheap s3 for screencasts and stuff. that'd be noice.
(12:55:05 PM) gac410: Well... if the AttachCloudStore  is pluggable,  then an enterprise version could distribute the large files internal to the intranet but closer to users.  
(12:55:06 PM) MichaelDaum: this however can be dealt with totally without the foswiki store interacting with it
(12:55:29 PM) MichaelDaum: like: just have your posting embed the player pointing to content from somewhere else.
(12:55:38 PM) gac410: Yes.   
(12:56:33 PM) MichaelDaum: distributing large files ... thats what enterprise document management servers are good in. better interact with these products and offload this job to them.
(12:57:19 PM) MichaelDaum: not that I want to neglect the use of a more tightly integrated cloud store in foswiki. but the reality check is rather important.
(12:58:00 PM) gac410: AttachCloudStore::scp    could do a copy of attachments "across the pond"      Good point about edm servers -  though I'm not familiar with them.     Could the interface to an EDM be one of the pluggable attachment stores?
(12:58:17 PM) MichaelDaum: yes definitely
(12:58:42 PM) MichaelDaum: thats what I was aiming at with CmisPlugin
(01:00:19 PM) gac410: I don't want to get in the way of other uses - but my priority right now is a buried dsl uplink when Bots or users start downloading big images,  plus the javascript, css, etc.  all of which are growing.

-- GeorgeClark - 13 Jun 2011

ImageGalleryPlugin and ImagePlugin don't bypass the Store API. There simply isn't a part of the Store API that covers their requirements.

-- MichaelDaum - 14 Jun 2011

I have to tell you, looking at the VC store code, it's already possible to implement a unique storage key for web/topic/attachment, and it's already possible to have each file in a different location. With the non-VC stores it's actually a little harder, but the design lets you do it.

-- SvenDowideit - 01 Nov 2012
 

BasicForm

TopicClassification BrainStorming
TopicSummary Extend the Store and Attach rendering to support Cloud based storage
InterestedParties GeorgeClark, WillNorris
Topic revision: r10 - 01 Nov 2012, SvenDowideit