Now that the repository lock is reentrant, it's good to have the
methods take into account direct mode as well. In direct mode,
we explicitly don't want to deal with any kind of locking, because
we accept to manipulate stale data. In order to hide locking code
from the outside and have it transparently managed inside entropy.*
methods, we must respect requests made in direct mode.
The new EntropySQLiteRepository uses ResourceLock, and gains support
for reentrancy, anti-deadlock safety measures (only for nested calls),
unification of memory and file repositories locking code (the semantics
was already the same).
In latency sensitive code paths, the performance penality caused
by file lock contention and memory cache invalidation is too high.
This problem happens in Rigo, which is extremely latency sensitive.
Since we don't want to crap on the user, a way to solve this is
letting API consumers skip the memory cache and read data directly
from the database store. The trade off is that data may be stale,
incomplete, or invalid, but as long as the consumer is aware of this,
that's fine.
Firstly, rwsem is semantically different from flock (but this was known) and
this may confuse the API consumer. Secondly, the locking infrastructure is
purely meant for inter-process synchronization, threads synchronization is
not a current use case.
The old "dependencies" metadata is deprecated. It was found that
the generated metadata might get corrupted by colliding atom strings.
The new implementation avoids collisions completely and is more
efficient.
The original idea was to avoid doing cursor and connection resources
cleanup (left by old and dead threads) synchronously every time
_connection() and/or _cursor() is accessed. This strategy also had
a huge drawback: with no activity on the object, resources were
left hanging there forever.
This commit introduces a better strategy for transparent and automatic
cleanup of resources belonging to terminated threads: every time a new
thread_id arrives at _cursor() or _connection(), a new daemon thread
starts and synchronizes with the caller through a simple Thread.join()
(because it's a daemon thread, we can join() daemon threads as well,
even if this is not really compliant with the specs, but it seems to work
just fine in Python).
When the caller thread is joined, it is possible to start the resources
cleanup procedure, carefully taking into account that thread_ids are
recycled and thus there might be clashing with newly created threads.
This helped a design issue to emerge from the sand (like a zombie
at the seaside): it is impossible to cleanup resources left by the
MainThread because this thread never ends living, and if it dies,
everything dies, obviously. So, the first implementation of this new
strategy was NOT touching the MainThread resources but then, the old
behaviour was to kill them as well on EntropyRepository.close().
So, the final version of this patch kept the old buggy behaviour of
touching MainThread stuff (nein, nein, nein, nein would Hitler say).
However, a new keyword argument "safe" has been added to the close()
method so it is possible to start migrating code to the dark side of the
power.
This means nothing really changed for API consumers yet, just entropy.db.sql
code being more efficient (no weird for loops and synchronous crap)
and actually faster (multi-threading ftw).