The v2 specs must include a basic over-the-air (OTA) upgrade system that allows new code to be shipped over the network, which node operators can then easily upgrade to (optionally, automatically). I think we can implement this as an RM application with a special resource that the node (optionally) reads after each block.
Basic requirements:
New releases (code directly, or Git commit & remote information) distributed over the P2P network
Optional embedded external identity that is allowed to authorize upgrades
Automatic recompilation & restart after an upgrade is activated
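To make the requirements above concrete, here is a minimal sketch of the check a node might run against the special resource after each block. Everything in it is hypothetical: `Release`, `should_activate` and the injected `verify` callable are illustrative names, not part of any existing spec. The signature scheme is deliberately left abstract (Python's standard library has no public-key signature primitive), so `verify` stands in for whatever scheme the authorizing identity actually uses.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical shape of the "special resource" the node reads after each block.
@dataclass(frozen=True)
class Release:
    version: int      # monotonically increasing release number (assumption)
    code: bytes       # code shipped directly over the P2P network
    code_hash: str    # hex SHA-256 digest of `code`
    signer: str       # identity that authorized this release
    signature: bytes  # signature by `signer` over `code_hash` (scheme left abstract)

# Stand-in for a real signature check: (signer, message, signature) -> bool.
Verify = Callable[[str, bytes, bytes], bool]

def should_activate(release: Release, current_version: int,
                    authorized_signer: Optional[str], verify: Verify) -> bool:
    """Decide whether to activate `release` automatically (hypothetical policy)."""
    if release.version <= current_version:
        return False  # not newer than what we already run
    if hashlib.sha256(release.code).hexdigest() != release.code_hash:
        return False  # shipped code does not match the advertised hash
    if authorized_signer is None:
        return False  # no embedded identity: the operator must approve manually
    if release.signer != authorized_signer:
        return False  # only the embedded external identity may authorize upgrades
    return verify(release.signer, release.code_hash.encode(), release.signature)
```

Returning `False` when no identity is configured keeps automatic activation opt-in; an operator can still inspect and apply the release by hand.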
Sandboxing is always impossible. Nor do I see why we would want it, unless you mean something different from what I think?
Hot reloading is possible. We should also have a way to specify a transition function (similar to Common Lisp's update-instance-for-redefined-class) that converts an engine's state into the format expected by the new code.
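A per-engine transition function in the spirit of update-instance-for-redefined-class could look roughly like the sketch below. The registry, the `transition` decorator, `migrate`, and the "mempool" example with its field names are all hypothetical; the point is only that each version step has its own function, and upgrading across several versions chains them in order.

```python
from typing import Any, Callable, Dict, Tuple

State = Dict[str, Any]
Transition = Callable[[State], State]

# Hypothetical registry: (engine name, old version) -> function producing the
# state expected by version old + 1.
TRANSITIONS: Dict[Tuple[str, int], Transition] = {}

def transition(engine: str, from_version: int):
    """Decorator registering a state transition for one engine version step."""
    def register(fn: Transition) -> Transition:
        TRANSITIONS[(engine, from_version)] = fn
        return fn
    return register

def migrate(engine: str, state: State, from_version: int, to_version: int) -> State:
    """Apply transitions one version at a time, like chaining
    update-instance-for-redefined-class across successive redefinitions."""
    for v in range(from_version, to_version):
        try:
            step = TRANSITIONS[(engine, v)]
        except KeyError:
            raise LookupError(f"no transition for {engine} v{v} -> v{v + 1}") from None
        state = step(state)
    return state

# Hypothetical example: v2 of a "mempool" engine renames a field,
# and v3 adds a new field with a default value.
@transition("mempool", 1)
def _mempool_1_to_2(old: State) -> State:
    return {"pending": old["txs"]}

@transition("mempool", 2)
def _mempool_2_to_3(old: State) -> State:
    return {**old, "max_size": 1000}
```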
It would be bad to have a mix of old and new engines in the same system, so we presumably want the upgrade to be atomic. But what if the transition function needs to talk to other engines, or start new engines, while the update is in progress?
A moderately annoying point is updating native libraries.
It might be a good idea to do the update offline, even though online is possible, since that leaves fewer mechanisms to implement and test (snapshot/restore within a single version is something we already have to support). We still need a transition function, but perhaps there can be just one that operates on the entire snapshot. Since it has access to all the state, the problem I mentioned (transition functions needing to talk to other engines) goes away.
It would be cool to have transparent interoperability with arbitrary external data sources, including Git. Presumably that's a ways out, so for the time being sending the code directly probably makes more sense.
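The single whole-snapshot transition might be sketched as follows. The snapshot shape (engine name mapped to a dict of state), `offline_upgrade`, and the "ledger"/"fees" example are assumptions for illustration. Because the transition sees every engine's state at once, it can move data between engines or create state for a new engine without any engine-to-engine messaging, and because it works on a copy, a failure leaves the old snapshot intact.

```python
import copy
from typing import Any, Callable, Dict

# Assumed snapshot shape: engine name -> that engine's serialized state.
Snapshot = Dict[str, Dict[str, Any]]
SnapshotTransition = Callable[[Snapshot], Snapshot]

def offline_upgrade(snapshot: Snapshot, transition: SnapshotTransition) -> Snapshot:
    """Run one transition over the whole snapshot.

    The transition works on a deep copy, so if it raises, the original
    snapshot is untouched and the node can restart on the old version.
    Nothing is committed until this returns; persisting the new snapshot
    is left to the caller.
    """
    return transition(copy.deepcopy(snapshot))

# Hypothetical example: v2 introduces a "fees" engine whose state is split
# out of the existing "ledger" engine, which a per-engine function could
# not do without talking to another engine.
def v1_to_v2(s: Snapshot) -> Snapshot:
    ledger = s["ledger"]
    s["fees"] = {"collected": ledger.pop("fees_collected", 0)}
    return s
```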
What do you mean by "sandboxing"? I mean limiting the scope of the host OS that the node has access to, so that e.g. distributed code cannot read /etc/passwd. Docker, for example, sandboxes processes this way: not perfectly, but pretty well, from my understanding.
Yes, agreed.
I'm not sure how to address this entirely yet. One option could be for the upgrade itself to specify how engines need to be restarted. Perhaps some upgrades only upgrade specific engines (and do so in ways that are backwards-compatible from the perspective of an observer of the engine), while others may require a full node restart.
Yes, we should also be able to do upgrades offline. If online upgrading is too hard, we can punt on it until later; it's not absolutely critical for v2, just a nice-to-have.
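One way the upgrade could "specify how engines need to be restarted" is a per-engine restart policy in the upgrade manifest, escalated to a full node restart whenever any engine requires it. This is only a sketch: `Restart`, `UpgradeManifest`, `Plan`, `plan_upgrade`, and the engine names used in practice are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List

class Restart(Enum):
    HOT_RELOAD = "hot_reload"  # swap code in place; observers see no change
    ENGINE = "engine"          # restart just this engine
    NODE = "node"              # requires a full node restart

@dataclass
class UpgradeManifest:
    # Hypothetical: for each engine it touches, the upgrade declares how that
    # engine must be restarted. Engines not listed are left alone.
    engines: Dict[str, Restart] = field(default_factory=dict)

@dataclass
class Plan:
    full_restart: bool
    hot_reload: List[str]
    restart: List[str]

def plan_upgrade(manifest: UpgradeManifest) -> Plan:
    """Escalate to a full node restart if any engine demands it; otherwise
    hot-reload or restart only the engines the upgrade names."""
    if Restart.NODE in manifest.engines.values():
        return Plan(full_restart=True, hot_reload=[], restart=[])
    hot = sorted(n for n, r in manifest.engines.items() if r is Restart.HOT_RELOAD)
    rst = sorted(n for n, r in manifest.engines.items() if r is Restart.ENGINE)
    return Plan(full_restart=False, hot_reload=hot, restart=rst)
```

Escalating to the most disruptive policy keeps the atomicity concern simple: either the whole node restarts, or only engines whose upgrades are declared safe to apply individually are touched.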
I don't think it makes sense to claim that any sandbox can be relied on as a primary security measure (especially if that sandbox's name isn't "v8" or "jsc", but even then). (Fun trivia: a security researcher I know has a cluster of cheap ARM boards used for browser tabs. Every new browser tab gets its own completely isolated hardware. This seems like a very good idea, and the only reason I haven't done it is that it's a lot of work I haven't gotten around to yet.) The primary security measure has to be that you trust the signature on the code. Running in a sandbox, then, might not be a bad idea as a defense-in-depth measure, but is not necessarily super interesting. (And that's ignoring the hypothetical future where all the interesting stuff an attacker might want access to is part of the Anoma node anyway.)
One option could be for the upgrade itself to specify how engines need to be restarted. Perhaps some upgrades only upgrade specific engines (and do so in ways that are backwards-compatible from the perspective of an observer of the engine), while others may require a full node restart.
Where I suspect this leads: figuring out how to make every upgrade fully online. For example, if an expensive data-format migration is needed, do it incrementally and maintain the data in both the old and the new format until it's done. Not necessarily a bad idea, but it seems like a lot of engineering effort. We should decide whether this is something we want to invest in, but I'm not sure half measures make sense: if we're OK asking people to tolerate downtime sometimes, then it should probably be OK to ask them to tolerate it every time.
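To give a sense of the engineering effort involved, here is a minimal sketch of an incremental migration that keeps data in both formats until it finishes. `DualFormatStore` and its `to_new`/`to_old` converters are hypothetical; a real version would also need persistence, concurrency control, and crash recovery, which is where most of the effort would go. Note that writes must be converted back to the old format too, so the old copy never goes stale while old-version code might still read it.

```python
from typing import Any, Callable, Dict, Optional

class DualFormatStore:
    """Hypothetical store that maintains data in both the old and the new
    format while an expensive migration runs incrementally."""

    def __init__(self, old: Dict[str, Any],
                 to_new: Callable[[Any], Any], to_old: Callable[[Any], Any]):
        self.old = dict(old)            # kept complete until migration finishes
        self.new: Dict[str, Any] = {}   # records converted so far
        self.to_new, self.to_old = to_new, to_old
        self._pending = list(self.old)  # keys not yet converted

    def get(self, key: str) -> Optional[Any]:
        """Read in the new format, converting on the fly if not yet migrated."""
        if key in self.new:
            return self.new[key]
        if key in self.old:
            return self.to_new(self.old[key])
        return None

    def put(self, key: str, value: Any) -> None:
        """Write in both formats so neither copy goes stale mid-migration."""
        self.new[key] = value
        self.old[key] = self.to_old(value)

    def migrate_step(self, batch: int) -> bool:
        """Convert up to `batch` pending records; True once all are migrated."""
        for _ in range(min(batch, len(self._pending))):
            key = self._pending.pop()
            if key not in self.new:  # skip keys already written via put()
                self.new[key] = self.to_new(self.old[key])
        return not self._pending

    def finish(self) -> Dict[str, Any]:
        """Drop the old format once migration is complete."""
        if self._pending:
            raise RuntimeError("migration still in progress")
        self.old.clear()
        return self.new
```

For example, with amounts stored as strings in the old format and as integers in the new one, `DualFormatStore(data, to_new=int, to_old=str)` lets the node keep serving reads and writes while `migrate_step` is called a batch at a time between blocks.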