The module entropy_path_loader (used only when running from within the
checkout; it is not even installed otherwise) now provides the _entropy
namespace.
(Alternatives to this entropy_path_loader change would be to reorganise
the file layout; to drop support for running from the checkout as is,
perhaps requiring virtualenvs; or to require sourcing a script that sets
PYTHONPATH. As implemented, however, the change is not intrusive and is
well isolated: it is not used in normal operation after installation.
Basically, it only adjusts sys.path and provides the _entropy
namespace.)
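
A minimal sketch of what such a loader could look like, assuming the
namespace is a synthetic module whose __path__ points at checkout
directories. The function name setup_checkout and the subdirectory
arguments are hypothetical, not taken from the real module:

```python
import os
import sys
import types


def setup_checkout(checkout_root, subdirs, namespace="_entropy"):
    """Make checkout directories importable and expose them as a namespace.

    Hypothetical helper: the real entropy_path_loader may be organised
    differently. It only touches sys.path and sys.modules.
    """
    paths = [os.path.join(checkout_root, name) for name in subdirs]

    # Step 1: plain sys.path adjustment.
    for path in paths:
        if path not in sys.path:
            sys.path.insert(0, path)

    # Step 2: register a synthetic package so that "import <namespace>.x"
    # resolves "x" inside the checkout directories.
    module = sys.modules.get(namespace)
    if module is None:
        module = types.ModuleType(namespace)
        module.__path__ = []
        sys.modules[namespace] = module
    for path in paths:
        if path not in module.__path__:
            module.__path__.append(path)
    return module
```

After such a call, a package living in one of the listed directories can
be imported as a submodule of the namespace without being installed.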
It looks like Portage now stores unicode paths correctly in its metadata,
unlike before. We need to make sure that we parse the "CONTENTS" file, and
content metadata in general, using the correct encoding.
This will allow us to store and retrieve such metadata from the sqlite3
database correctly and also match the stored paths with the filesystem
paths exactly.
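
As an illustration only, here is a sketch of decoding a CONTENTS file
explicitly before parsing it. It assumes UTF-8 and the usual Portage
line shapes ("dir PATH", "obj PATH MD5 MTIME", "sym PATH -> TARGET
MTIME"); parse_contents is a hypothetical name, not the Entropy API:

```python
def parse_contents(raw, encoding="utf-8"):
    """Return (kind, path) tuples from the raw bytes of a CONTENTS file."""
    entries = []
    # Decode once, explicitly, so paths become text in a known encoding.
    # Split on "\n" only: str.splitlines() would also break on other
    # Unicode line separators that may legally appear in file names.
    for line in raw.decode(encoding).split("\n"):
        if not line:
            continue
        kind, rest = line.split(" ", 1)
        if kind == "obj":
            # The path may contain spaces; md5 and mtime are the last two fields.
            path, _md5, _mtime = rest.rsplit(" ", 2)
        elif kind == "sym":
            path = rest.split(" -> ", 1)[0]
        else:
            path = rest
        entries.append((kind, path))
    return entries
```

Paths parsed this way are str objects, so they can be bound as sqlite3
parameters and compared directly with decoded filesystem paths.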
This commit may need a bit more real-life testing. Backward compat
wrt old Entropy and Portage tbz2 files should be as expected.
Unit tests attached.
- runs as non-root
- does not require being in entropy/portage group
- in fact, it's better (better isolation) to run as such
- thus does not modify running system
The wrapper script is ugly but very convenient.
It uses simple rules without the need for complex regexes. It is
not as powerful as the existing approaches, so they complement each other.
For example, this:
foo/bar (.*)app-crypt/pinentry(.*)\[gtk\] \1app-crypt/pinentry-gtk2\2
expresses an intention that can be stated more simply:
rewrite foo/bar from-dep=app-crypt/pinentry to-dep=app-crypt/pinentry-gtk2 if-dep-has-use=gtk drop-use=gtk
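
To make the intended semantics concrete, here is a rough sketch of how
such a rule line could be parsed and applied. The key names come from
the example above; parse_rule, apply_rule and the simplified dependency
handling (substring match, single trailing USE block) are assumptions,
not the actual implementation:

```python
import re


def parse_rule(line):
    """Split 'rewrite <package> key=value ...' into (package, options)."""
    action, package, *pairs = line.split()
    if action != "rewrite":
        raise ValueError("unsupported action: %r" % action)
    options = dict(pair.split("=", 1) for pair in pairs)
    return package, options


def apply_rule(options, dep):
    """Apply one parsed rule to a dependency string (simplified)."""
    # Separate an optional trailing USE block such as "[gtk,qt]".
    match = re.match(r"^(.*?)(?:\[([^\]]*)\])?$", dep)
    body, use_block = match.group(1), match.group(2)
    use_flags = use_block.split(",") if use_block else []

    # Naive substring match; a real matcher would compare package names.
    if options["from-dep"] not in body:
        return dep
    required = options.get("if-dep-has-use")
    if required is not None and required not in use_flags:
        return dep

    body = body.replace(options["from-dep"], options["to-dep"], 1)
    dropped = options.get("drop-use")
    use_flags = [flag for flag in use_flags if flag != dropped]
    if use_flags:
        return "%s[%s]" % (body, ",".join(use_flags))
    return body
```

Under these assumptions, a dependency that lacks the gtk USE flag is
left untouched, which matches what the regex form above would do.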
This makes it possible to avoid directory hot spots on repository mirrors.
This commit requires some mileage and real-world testing, but it
seems to run well on a relatively small repository.
No backward compatibility issues have been reported.
Now that the repository lock is reentrant, it's good to have the
methods take into account direct mode as well. In direct mode,
we explicitly don't want to deal with any kind of locking, because
we accept manipulating stale data. In order to hide locking code
from the outside and have it transparently managed inside entropy.*
methods, we must respect requests made in direct mode.
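
A minimal sketch of the idea, with hypothetical names (Repository,
direct, _shared) and a threading.RLock standing in for the real
inter-process lock:

```python
import contextlib
import threading


class Repository:
    """Toy repository whose methods hide locking from the caller."""

    def __init__(self):
        # Stand-in only: the real lock is an inter-process file lock.
        self._lock = threading.RLock()
        self.direct = False
        self.lock_acquisitions = 0

    @contextlib.contextmanager
    def _shared(self):
        if self.direct:
            # Direct mode: the caller accepted stale data, skip locking.
            yield
            return
        with self._lock:
            self.lock_acquisitions += 1
            yield

    def list_packages(self):
        with self._shared():
            return ["app-misc/foo"]
```

Callers never see the lock; toggling direct is the only knob they need.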
The new EntropySQLiteRepository uses ResourceLock and gains support
for reentrancy, anti-deadlock safety measures (only for nested calls),
and unified locking code for memory and file repositories (the semantics
were already the same).
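
For illustration, reentrancy on top of flock can be sketched with a
depth counter, as below. This is not the ResourceLock implementation;
ReentrantFileLock is a hypothetical class, Unix-only (fcntl), and it
does not track which thread owns the lock:

```python
import fcntl
import os


class ReentrantFileLock:
    """Exclusive flock that the same holder may acquire repeatedly."""

    def __init__(self, path):
        self._path = path
        self._fd = None
        self._depth = 0

    def acquire(self):
        if self._depth == 0:
            # Only the outermost call touches the file lock.
            fd = os.open(self._path, os.O_CREAT | os.O_RDWR, 0o644)
            try:
                fcntl.flock(fd, fcntl.LOCK_EX)
            except OSError:
                os.close(fd)
                raise
            self._fd = fd
        self._depth += 1

    def release(self):
        if self._depth == 0:
            raise RuntimeError("lock is not held")
        self._depth -= 1
        if self._depth == 0:
            # Only the outermost release drops the file lock.
            fcntl.flock(self._fd, fcntl.LOCK_UN)
            os.close(self._fd)
            self._fd = None

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.release()
        return False
```

Nested "with lock:" blocks in the same process then no longer block on
a lock that the process already holds.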
In latency-sensitive code paths, the performance penalty caused
by file lock contention and memory cache invalidation is too high.
This problem happens in Rigo, which is extremely latency sensitive.
Since we don't want to crap on the user, a way to solve this is
letting API consumers skip the memory cache and read data directly
from the database store. The trade-off is that data may be stale,
incomplete, or invalid, but as long as the consumer is aware of this,
that's fine.
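
The cache bypass could look roughly like the sketch below. CachedStore,
its direct attribute and the packages table are made up for the
example; only the idea (skip the memory cache, read the store directly)
comes from the description above:

```python
import sqlite3


class CachedStore:
    """Reads go through a memory cache unless direct mode is enabled."""

    def __init__(self, connection):
        self._conn = connection
        self._cache = {}
        self.direct = False

    def get_name(self, package_id):
        if not self.direct and package_id in self._cache:
            return self._cache[package_id]
        row = self._conn.execute(
            "SELECT name FROM packages WHERE id = ?", (package_id,)
        ).fetchone()
        name = row[0] if row else None
        if not self.direct:
            # Direct reads never populate (or invalidate) the cache.
            self._cache[package_id] = name
        return name
```

A latency-sensitive consumer would enable direct for its reads and
treat every result as a best-effort snapshot.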
Firstly, rwsem is semantically different from flock (this was already
known), which may confuse the API consumer. Secondly, the locking
infrastructure is meant purely for inter-process synchronization; thread
synchronization is not a current use case.
The old "dependencies" metadata is deprecated. It was found that
the generated metadata might get corrupted by colliding atom strings.
The new implementation avoids collisions completely and is more
efficient.