Rewrite the serialization code of the I/O events coming in when
the Installed Packages Repository is modified (at filesystem level)
to better deal with bursts of events.
The new code uses a "baton" Semaphore as a mutex that can be passed
between threads. The MainThread event handler function tries to
acquire the Semaphore in non-blocking mode; if it succeeds, it spawns
a thread that executes all the operations (acquire locks in blocking
mode, calculate updates, etc.) and releases the Semaphore once done.
Olympic win!
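
The "baton" idea can be sketched roughly as follows. This is only an
illustration, not the actual Entropy code: the names baton,
on_repository_event and process_updates are made up, and skipping the
event when the Semaphore is already taken is an assumption.

```python
import threading

# Hypothetical names, not taken from the Entropy codebase.
baton = threading.Semaphore(1)
processed = []


def process_updates():
    """Placeholder for the real work (take locks, calculate updates)."""
    processed.append(threading.current_thread().name)


def _worker():
    try:
        process_updates()
    finally:
        # The baton was acquired by the event handler thread and is
        # released here, in the worker thread. A Semaphore allows this,
        # while an RLock must be released by the thread that owns it.
        baton.release()


def on_repository_event():
    """Called from the MainThread for every filesystem event."""
    if not baton.acquire(blocking=False):
        # Assumption: a worker is already running, so this event of
        # the burst is simply skipped.
        return None
    worker = threading.Thread(target=_worker, name="UpdatesWorker")
    worker.daemon = True
    worker.start()
    return worker
```

The handler never blocks the MainThread: it either hands the baton to a
new worker or returns immediately.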
The original idea was to avoid doing cursor and connection resources
cleanup (left by old and dead threads) synchronously every time
_connection() and/or _cursor() is accessed. This strategy also had
a huge drawback: with no activity on the object, resources were
left hanging there forever.
This commit introduces a better strategy for transparent and automatic
cleanup of resources belonging to terminated threads: every time a new
thread_id arrives at _cursor() or _connection(), a new daemon thread
starts and waits for the caller to terminate through a simple
Thread.join() (join() can be called on daemon threads as well; even if
this is not really compliant with the specs, it seems to work just
fine in Python).
Once the caller thread has been joined, the resources cleanup procedure
can start, carefully taking into account that thread_ids are recycled
and thus might clash with those of newly created threads.
This helped a design issue to emerge from the sand (like a zombie
at the seaside): it is impossible to clean up resources left by the
MainThread because this thread never terminates, and if it dies,
everything dies, obviously. So, the first implementation of this new
strategy was NOT touching the MainThread resources but then, the old
behaviour was to kill them as well on EntropyRepository.close().
So, the final version of this patch kept the old buggy behaviour of
touching MainThread stuff (nein, nein, nein, nein would Hitler say).
However, a new keyword argument "safe" has been added to the close()
method so it is possible to start migrating code to the dark side of the
power.
This means nothing really changed for API consumers yet, just entropy.db.sql
code being more efficient (no weird for loops and synchronous crap)
and actually faster (multi-threading ftw).
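
A minimal sketch of the watcher-thread cleanup described above, under
stated assumptions: ResourcePool, get and _watch are hypothetical names,
not the entropy.db.sql implementation, and this sketch simply never
watches the MainThread (the close() behaviour discussed above is not
modelled).

```python
import threading


class ResourcePool:
    """Hypothetical per-thread resource holder with automatic cleanup."""

    def __init__(self, factory, cleanup):
        self._factory = factory
        self._cleanup = cleanup
        self._lock = threading.Lock()
        # thread ident -> (owner Thread object, resource)
        self._resources = {}

    def get(self):
        owner = threading.current_thread()
        with self._lock:
            entry = self._resources.get(owner.ident)
            if entry is not None and entry[0] is owner:
                return entry[1]
            resource = self._factory()
            self._resources[owner.ident] = (owner, resource)
        if owner is not threading.main_thread():
            # The MainThread never terminates, so it gets no watcher.
            watcher = threading.Thread(
                target=self._watch, args=(owner, resource))
            watcher.daemon = True
            watcher.start()
        return resource

    def _watch(self, owner, resource):
        owner.join()  # returns once the owning thread has terminated
        with self._lock:
            entry = self._resources.get(owner.ident)
            # Thread idents are recycled: drop the entry only if it
            # still belongs to the thread that was being watched.
            if entry is not None and entry[0] is owner:
                del self._resources[owner.ident]
        self._cleanup(resource)
```

Storing the Thread object next to the resource is one way to tell a
recycled ident apart from the thread that originally owned the entry.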
As explained in the code comments, this is mandatory for scenarios
in which the iterator has to run multiple times because transactions
can be rolled back and replayed indefinitely.
The licensename column is declared as UNIQUE, so multiple threads
inserting rows can cause unique constraint violations. Considering the
nature of the data, using "INSERT OR REPLACE" can be considered safe
and is actually the wanted behaviour.
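
For illustration, this is how "INSERT OR REPLACE" behaves on a UNIQUE
column with the sqlite3 module. The table layout below is a simplified
assumption; only the UNIQUE licensename column comes from the text
above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema.
conn.execute(
    "CREATE TABLE licensedata ("
    "licensename VARCHAR UNIQUE, text BLOB, compressed INTEGER)")


def insert_license(name, text):
    # A plain INSERT would raise sqlite3.IntegrityError on the second
    # call with the same name; OR REPLACE overwrites the old row.
    conn.execute(
        "INSERT OR REPLACE INTO licensedata VALUES (?, ?, ?)",
        (name, text, 0))


insert_license("GPL-2", b"first copy")
insert_license("GPL-2", b"second copy")
rows = conn.execute(
    "SELECT licensename, text FROM licensedata").fetchall()
```

After both calls the table holds a single row for "GPL-2", carrying the
data of the last insert.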
This bug caused a load of problems with the ca-certificates.
Example of partial readline():
0|obj|/usr/share/ca-certificates/mozilla/NetLock_Arany_=Class_Gold=_F\xc3\x85
and the next call:
\xc2\x91tan\xc3\x83\xc2\xbas\xc3\x83\xc2\xadtv\xc3\x83\xc2\xa1ny.crt\n
Try to work around it by reading ahead if the line does not end with \n.
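
The read-ahead workaround could look roughly like this. It is a sketch,
not the committed code: read_full_line is a hypothetical helper, and
ChoppyStream is a test double that only simulates partial readline()
results.

```python
import io


def read_full_line(stream):
    """Call readline() until the data ends with a newline or EOF."""
    chunks = []
    while True:
        chunk = stream.readline()
        if not chunk:
            break  # EOF, return whatever was collected
        chunks.append(chunk)
        if chunk.endswith(b"\n"):
            break
    return b"".join(chunks)


class ChoppyStream:
    """Test double whose readline() returns at most `limit` bytes."""

    def __init__(self, data, limit):
        self._buf = io.BytesIO(data)
        self._limit = limit

    def readline(self):
        return self._buf.readline(self._limit)


stream = ChoppyStream(b"0|obj|/some/long/path.crt\n1|dir|/x\n", 8)
first = read_full_line(stream)
second = read_full_line(stream)
```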
Wrap the Cursor object and execute every method through
a proxy function that catches adapter-specific exceptions and
translates them into entropy.db.exceptions ones. This way Entropy
eventually becomes sqlite3 agnostic and adapters for several storage
engines can be written without affecting the rest of the codebase.
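
A minimal sketch of such a cursor proxy, assuming sqlite3 as the
adapter. The exception classes below are hypothetical stand-ins for the
ones in entropy.db.exceptions, and CursorProxy is an illustrative name.

```python
import sqlite3


# Hypothetical stand-ins for the entropy.db.exceptions classes.
class RepositoryError(Exception):
    pass


class OperationalError(RepositoryError):
    pass


class IntegrityError(RepositoryError):
    pass


# Most specific adapter exceptions first, generic fallback last.
_TRANSLATIONS = (
    (sqlite3.IntegrityError, IntegrityError),
    (sqlite3.OperationalError, OperationalError),
    (sqlite3.Error, RepositoryError),
)


class CursorProxy:
    """Wrap a cursor and translate adapter-specific exceptions."""

    def __init__(self, cursor):
        self._cursor = cursor

    def __getattr__(self, name):
        attr = getattr(self._cursor, name)
        if not callable(attr):
            return attr  # plain attributes such as rowcount

        def proxy(*args, **kwargs):
            try:
                return attr(*args, **kwargs)
            except sqlite3.Error as err:
                for adapter_exc, generic_exc in _TRANSLATIONS:
                    if isinstance(err, adapter_exc):
                        raise generic_exc(*err.args) from err
                raise

        return proxy
```

Callers then only need to catch the generic classes, whichever adapter
is behind the proxy. Note that execute() still returns the underlying,
unwrapped cursor in this sketch.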
entropy.db.mysql and entropy.db.sqlite now subclass
EntropySQLRepository. Methods will be moved there during the
forthcoming overhaul. Implement ModuleProxy support and
alleviate the exception class objects issue (sqlite3 based
exceptions are thrown by entropy.db.sqlite and oursql based
exceptions are thrown by entropy.db.mysql, and there is apparently
no easy/quick fix for this, besides wrapping all the
cursor calls).