This website was mostly translated automatically from the
German version. We
apologize for any translation errors.
x
Paper-based contact tracing
Many contact tracing systems, such as Luca or Recover, rely
primarily on tech tools like smartphone apps to document visits.
This is problematic for a number of reasons. Paper-based or
analogue protocols are also offered by other systems such as Luca
(e.g. in the form of "key tags"), but static QR codes have a number
of data protection problems.
Zilp-Zalp, on the other hand, relies primarily on a paper-based
protocol that uses technological tools such as web applications
only in a supportive manner, and essentially works completely
without technological tools for the user during visits. Such a
protocol has many advantages in practice:
Users do not have to install smartphone applications from
sources that may not be trusted.
It is not necessary to carry a smartphone or have internet
connectivity to document visits.
The procedure can also be used by people who have no experience
in using smartphone apps or cannot use them for other reasons.
Basic ideas
To meet the contact tracing requirements outlined in the
overview, the following
must be in place:
A site operator must be able to document a
user 's visit in a legally
compliant manner.
A health department (GA) (plural:
GÄ) must be able to work with an infected user to
identify and contact (with minimal effort) possible at-risk
contacts of that user based on that
user 's visit history.
Basically, for robust contact tracing we need to be able to
identify possible intersections between visits of individual users.
This generally requires a documentation of the visits of individual
users, as well as a procedure to determine for a specific visit of
a user all visits of other users that occurred in the same locality
and in the same time period.
Possible strategies
In principle, there are various strategies for implementing
these requirements. Probably the most obvious but not necessarily
the most privacy-friendly strategy is to manage visit data
centrally and to determine relevant visits from this central data
storage as needed. This approach is followed by Luca, among others.
The problem here is that central data storage creates a multitude
of possibilities for monitoring users that are difficult to solve
technologically or organizationally.
Therefore, a more privacy-friendly solution is to store contact
data in a decentralized manner. To evaluate this, we first divide
the contact tracing problem into two sub-problems:
GÄs must be able to reconstruct visit histories
of individual users and to obtain relevant visits
of other users to these histories.
GPs must also be able to identify contact details
of individuals belonging to the extracted
visits.
Here we see that we * can consider the problem of documenting
*visits independently of * the problem of storing contact
information.
First, we create a possibility to store contact data in such a
way that they can only be processed by GÄ for contact tracing on an
ad hoc basis and are not accessible to any other
actors in the system, even in encrypted form.
Furthermore, we create a way to document
visits in a robust and data-efficient manner.
Protocol (v0.5)
Change history
v0.5
Location data can each be encrypted with a group key and added
to visit data. This allows the data to be stored anonymously in a
backend, which increases the availability of the data and reduces
the risk of loss in contrast to fully decentralized storage. Data
cannot be assigned to a specific location by the backend.
An ombudsman process is described that governs data access in
cases where a health department requests data without being able to
produce a user's secret key.
An extension has been added to describe the manual collection
of contact details by the operator of a locality, and a process for
health boards to access this data collected on behalf of the user
for a specific purpose.
Zilp-Zalp's paper-based, decentralized protocol includes the
following actors:
Users who provide their contact
information and visit history to GÄ for
contact tracing purposes.
Operators oflocalities that
document visits by users to
enable contact tracing of GEs.
Health departments (HAs) using visit
histories and contact information to do
contact tracing.
To enable the exchange of this data between the actors,
Zilp-Zalp implements an infrastructure with different services:
Web applications for operators, GÄ and users
(for the latter here only needed to create QR codes).
An API for sharing visit histories and contact
information.
The encryption of data for GÄs as well as the authentication of
their public requests is done by public-key encryption. In the
following, we assume that GÄs each have a pair of keys for signing
and - encrypting/decrypting data, and that other actors can verify
the trustworthiness of the public keys of these pairs via a
suitable mechanism (e.g., a root certificate that is delivered
together with the web application).
Initialization
Users in the system would like to make their contact data
available to GÄ on an ad hoc basis. We assume here that users would
like to store the data in such a way that GÄ can access the data
without any further action on the part of the user (but in a way
that is comprehensible to the user).
The contact data should only be processed by trustworthy actors
and generally be available to as few actors as possible in the
system (whether in encrypted or unencrypted form).
To enter contact data, users first open the web application and
enter relevant data such as name, address, telephone number and
e-mail address (a validation of this data is described below). The
application then generates two randomly generated symmetric keys $K
_ a$ and $K _ b$, which are combined using a suitable key
derivation procedure to form a key $K _ c$. The application now
encrypts the user's contact data symmetrically with key $K _ c$,
adds key $K _ a$ to this encrypted data, encrypts this data
asymmetrically with the public GÄ data key, and transmits this data
to the API, which stores it in a backend and returns a random
identifier $I_D$. The data stored there cannot be decrypted by any
actor without knowledge of the key $K _ b$ as well as the private
key of the GÄ. The latter is initially under the control of the
user and can only reach a GÄ that can decrypt it via the user or
via an operator.
Furthermore, the user's application generates a random value $H
_ s$, from which a pseudo-random series of further values $H _ 1, H
_ 2, \ldots H _ n$ is generated using a suitable method. The web
application now stores $H _ s$, $I _ D$ and $ K _ b$ together in a
data structure and encrypts them with the public GÄ data key. This
data remains with the user and is only passed on to a GA for
contact tracing.
Now the application generates value pairs consisting of $H _ i$
($ \ge 1$) on the one hand and $K _ b$ and $I _ D$ on the other
hand, whereby $H _ i$ is unencrypted and $(K _ b, I _ D)$ is
encrypted individually for each value pair with the GÄ data key.
The public key $U _ i ^ \mathrm{pub} $ generated for this data set
is appended to the data set, and the application stores the
associated private key $U _ i ^ \mathrm{priv} $ for possible later
transfer to the health authority. These pairs are used for contact
tracking and shared with public operators.
The application then generates QR codes from the data, hands
them to the user for printing, and then deletes all data. The data
that is transferred to the health department in the case of contact
tracing $ ( K _ b, I _ D, H _ s, U _ 1 ^ \mathrm{priv} \ldots U _ n
^ \mathrm{priv} ) $, as well as the data that the user himself uses
to check and adjust his data can either be stored in files, or
locally encrypted and then stored in the backend. In the latter
case, the application generates a random password and ID that the
user must write down in order to recover the data later. Local
storage as a file is preferable from a data protection point of
view.
Note: Currently, deriving $K _ c$ from $(K _ a, K _ b)$ does not
increase the security of the system, as $K _ b $ is kept
permanently with the user's contact information. In an extension of
the Proktoll, however, it is planned to separate the key $K _ b$
from the contact data in an additional step of by a GA, to store it
separately from the beginning or to encrypt it asymmetrically and
to give another party control over the decryption. It was therefore
left in the minutes for the time being.
Sequence diagram
The following sequence diagram summarizes the initialization
process.
User application
Backend
Generate secret keys $K_a$, $K_b$ and
$H_s$, derive $K_c$.
Query current (and future, if applicable)
GA keys from the backend $K_{\mathrm{ga}}^{\mathrm{pub}}$.
Return GA key
$K_\mathrm{ga}^\mathrm{pub}$
Prepare contact information $D_k$.
Encrypt contact details
$D_{k}^e=\mathrm{enc}_{s}(D_k, K_c)$, encrypt with GA key
$D_{k,\mathrm{ga}}^e = \mathrm{enc}_{a}((D_{k}^e,K_a),
K_{\mathrm{ga}}^{\mathrm{pub}})$.
Query backend storage from
$D_{k,\mathrm{ga}}^e$, get $I_D$.
Save $D_{k,\mathrm{ga}}^e$, generate
$I_D$ and return it.
Encrypted data for health department
$D^e_{\mathrm{ga}} = \mathrm{enc}_{a}((K_b, I_D, H_s,
U_1^{priv}\ldots U_n^{priv}),
K_{\mathrm{ga}}^{\mathrm{pub}})$.
Generate QR codes from
$D_{\mathrm{op}}^i$ and $D^e_{\mathrm{ga}}$, print QR codes.
Delete all data.
Optional: Initialization with data validation
As part of the normal initialization, no validation of the
contact data provided by the user takes place. If validation of
this data is desired, the initialization must be performed with the
help of a trusted third party. For this purpose, this third party
operates a special version of the web application via which users
initially initialize their data in exactly the same way as above.
In contrast to the normal initialization, however, the third party
checks the data before encryption (e.g. by comparing it with an
identification document) and confirms its correctness. The web
application then signs each value pair of the user with a signature
that certifies the presence of correct user data for the value
pair. These signatures $ S _ i $ are applied to the user's QR codes
in addition to the value pairs. An operator's web application can
read and confirm this signature when scanning a QR code. The
operator can hereby confirm that validated user data belongs to the
QR code. In addition, the third party can restrict the validity of
the QR codes. This is useful to prevent reuse in the system.
Trusted third parties could be, for example, state institutions
but also, if necessary, private sector actors (e.g. post offices)
that already have experience with the validation of data. However,
implementing such a system is likely to require a great deal of
effort and create an additional risk, as another third party will
have access to a user's data during initialisation. The benefits
should therefore be weighed against the effort involved.
Validation via third party APIs as done in other centralized
systems can theoretically be done as well, e.g. the web application
can only allow the creation of QR codes after certain data like a
phone number or an email address has been validated via an external
service. Since the web application (or generally any client
application) is under the control of the user they can easily
manipulate it to bypass validation. That this is feasible has
already been demonstrated. Client-side validation of data therefore
only discourages non-technical, cooperative users from providing
false data (this is not to say that such additional validation is
completely useless, but it should by no means be considered secure
or reliable).
Visit documentation
To document the visit of a location to operators, users simply
give them a random QR code. In addition to the QR code, it is
possible to enter additional meta data such as the exact time of
arrival and the length of stay in order to increase the accuracy of
the documentation. The operator then records the QR codes received
over a given period (e.g. one day) using the web application.
Initially, they are only stored locally. Operators can also capture
additional meta-data to improve accuracy for contact tracing.
The operator's web application groups visit data as close to
real time as possible (within a few hours). Individual visit data
can exist in several groups (overlapping). For each group, the web
application generates an asymmetric key pair. All visit records of
the group are encrypted with the public key of this pair. The
corresponding private key is encrypted with the public key of each
visit record and stored with the record. In addition, the operator
encrypts the locality data in each case with the respective group
keys of a visit data record and appends them there.
As soon as it is clear that no further visit records will be
assigned to a group, the web application deletes the private key
belonging to the group. The public key is stored together with the
group data. Similarly, the web application deletes a user's
original visit data as soon as it is clear that they will not be
added to another group.
The data encrypted in this way can only be accessed with the
help of the GA data key and a matching private key of a visit
record belonging to the group. These private keys are under the
control of the user and are only transmitted to the GA by the user
in the event of an infection. Accordingly, even if the GA could
access all operator data, it can only decrypt epidemiologically
relevant data for which a user has provided the matching private
key.
Users
Operator
Provide QR code with operator data.
Process QR code using web application for
operators, add relevant metadata (e.g. dwell time, exact
location).
Add visit data to relevant infection
groups, encrypt with group public key. Encrypt private group key
with public key of visit data. Store data together.
Delete non-group-encrypted visit
data.
Delete private group keys that are no
longer needed.
Manual visit recording
The operator must also be able to record contact data for people
who do not use Zilp-Zalp and store it in encrypted form. However,
this raises the question of how a purpose limitation can be
achieved for this data, since in this case the person has not gone
through any initialization in the Zilp-Zalp system. There are
several possibilities for this:
A one-time initialisation can be carried out with the support
of the operator. In this case, the person must be given the
corresponding secret keys, for example in the form of a QR code
printout or by noting two numbers (in the case of encrypted
storage).
Key derivation can be performed using the person's contact
details and locality data (for example, by merging and hashing the
person's name and address and locality). If the person contacts the
health department for contact tracing, the health department can
reconstruct the user's secret keys based on the given data (name of
the person and details of the locality visited) and then perform
contact tracing regularly. However, this weakens the security and -
privacy guarantees for that person and all other users in their
infection community.
It is possible to completely dispense with the storage of the
person's secret key; the person's visit data can then only be
decrypted for a specific purpose via a group key of the
corresponding infection community or with the help of the ombuds
process.
In principle, operators can also hand out prefabricated QR code
blocks to people, which they assign to a person with the help of
initialization. However, this involves the risk that operators can
view the entire QR codes of a block. However, this risk can be
limited by technical means.
Storage of visit data
Visit data can either be stored locally, or stored in a
(federated) backend. Local storage is privacy-friendly, but also
poses availability risks, as the operator of a locality must ensure
that the data is stored in a fail-safe manner. In addition, local
storage may slow down contact tracking. Since Zilp-Zalp already
integrates a mechanism that protects user visit data with the help
of an assignment to infection communities, the purpose limitation
is already given here. Accordingly, visit data can also be stored
in a backend, which can then automatically respond to hashes
written out. Here, the operator's web application can add the
locality data to all encrypted visit data and encrypt it with the
appropriate group key as well. The backend cannot assign individual
visit data to specific locations. Only via metadata (IP address of
the uploading location) would it be possible for a backend to
establish an assignment if necessary.
The backend should again be federated to reduce the risk of
metadata analysis. The storage of data can also take place
asynchronously.
Contact Tracking
In order to identify possible risk contacts of an infected user,
the user first hands over the QR code (either digital or analogue)
to the GA. This can decrypt it with the private GÄ data key,
whereby it receives the values $H _ s ^ l$, $I _ D ^ l$ and $K _ b
^ l$ ($l$ denotes here the data of the $l$-th user). With the value
$I _ D$ the GA can receive the encrypted user data from the
backend, which can be decrypted with the help of the private data
key as well as $K _ b$. Furthermore, the GA can use $H _ s$ to
create all hash values $H _ i$ of the user. It publishes these
values via the backend (together with other hash values to protect
the anonymity of the user). The operators' web applications
regularly download the list of these values and match them with the
locally stored hash values. If there is a match, all visit data
related to these hash values $H _ i$ (e.g. determined by comparing
the visit times) are transferred to the backend via the public API
after confirmation by the operator (if necessary, the data can be
encrypted again with the GÄ data key). From there they can be
recalled by the GA. This can decrypt data from users who have
formed an infection community with it and have accordingly been
encrypted with the same group key. To do this, the GA first
decrypts the group key with the matching private key $U _ i ^
{pub}$ that belongs to the original visit data. With this and the
private GÄ data key, it can in turn decrypt the values $ I _ D ^
k$, and $K _ b ^ k$ of relevant users, which in turn can be used to
query and decrypt the user's contact data from the backend.
However, since the GA does not have the key $ H _ s ^ k$ from this
user, it cannot reconstruct the user's visit history without
consent. On the contrary, the active cooperation of this user is
necessary. Nor can the GA - decrypt visits or - contact data of
users who have not formed an infection community with the original
user.
Sequence diagram
The following sequence diagrams show the contact follow-up
process. For reasons of clarity, the process was divided into three
steps.
Transfer of user data to the GA
First, the GA must receive the GA data from the user in order to
initiate further contact follow-up.
Users
Backend
Health Department
Transfer GA data to health department
(analog or digital).
Decode GA data $D_{\mathrm{ga}} =
\mathrm{dec}_{a}(D^e_\mathrm{ga}, K_{\mathrm{ga}}^{\mathrm{priv}})$
to get $(K_b, I_D, H_s, U_1^\mathrm{priv}\ldots
U_n^\mathrm{priv})$.
Request contact details from backend
using $I_D$.
Return contact information to
$I_D$.
Decode user data $D_{k,\mathrm{ga}} =
\mathrm{dec}_{a}(D_{k,\mathrm{ga}}^e,
K_{\mathrm{ga}}^{\mathrm{priv}})$ to get $(D_{k}^e,K_a)$. Derive
$K_c$ from $K_a$ and $K_b$, decode $D_{k} = \mathrm{dec}_s(D_{k}^e,
K_c)$ to get user data.
Regenerate $H_1 \ldots H_n$ from
$H_s$.
Request publication of values $H_1 \ldots
H_n$ (aggregated with other values to maintain anonymity).
Post the values $H_1 \ldots H_n$ with a
request to send relevant contact details.
Tendering of hashes
The GA then writes out relevant hashes for contact tracing and
waits for feedback from operators. Important: In order to prevent
the submission of manipulated data, operators must always also
submit the data available for the hash tendered. These data cannot
be falsified by an operator without knowledge of the key $K _ b$,
GÄ can thus exclude manipulated or incorrect data. An operator may
still return irrelevant data, but such behaviour can be traced back
to the operator in a number of ways and penalised accordingly. An
operator always returns complete group data, which contains visit
data encrypted with a private group key. The private key in turn
was encrypted with the public keys of the visit data of all group
members.
Operator
Backend
Request hash values written out.
Return written out hashes $H_i, H_j,
\ldots$.
Check hashes against stored visit data.
Search for relevant visit data $D_{op}$.
Request storage of the relevant data
$D_{op}$ associated with the corresponding hash $H_{i}$. Also
provide source data for the requested hash value.
Store data, associated with associated
hashes $H_{i}$.
Processing of relevant contact data
Finally, the GA processes the data of the operators. However,
epidemiological group data can only be decrypted if the GA receives
from the user a matching private key to visit data containing the
encrypted private key of the respective group data. Without the
presence of such a key, the GA cannot decrypt the data even if all
data and all other keys are present.
Backend
Health Department
Request data on hashes written out.
Return relevant data to the hashes.
Decrypt GA data $D_{\mathrm{ga,op}} =
\mathrm{dec}_{a}(\mathrm{dec}_{a}(D^e_\mathrm{ga,op},
K_{g_j}^\mathrm{priv}), K_{\mathrm{ga}}^{\mathrm{priv}})$ from each
record to get $(K_b, I_D)$, where the group key $K_{g_j}^{priv}$
can be decrypted using the matching user key $U_{i}^{priv}$ (or
without it using the Ombuds process).
Request contact details from backend
using $I_D$.
Return contact information to
$I_D$.
Decode user data $D_{k,\mathrm{ga}} =
\mathrm{dec}_{a}(D_{k,\mathrm{ga}}^e,
K_{\mathrm{ga}}^{\mathrm{priv}})$ to get $(D_{k}^e,K_a)$. Derive
$K_c$ from $K_a$ and $K_b$, decode $D_{k} = \mathrm{dec}_s(D_{k}^e,
K_c)$ to obtain user data. Verify and use data for contact
tracing.
Ombuds process
Scenarios are conceivable in which contact tracing must be
performed by a health department without the health department
knowing a user's secret keys. For example, it is conceivable that a
user loses his secret keys, but can provide the health department
with a list of visited locations. The health department should then
be able to use the visit data collected from these locations using
Zilp-Zalp for contact tracing. At the same time, it must be ensured
that health authorities or other actors cannot use this possibility
to decode visit data for any purpose.
One possibility to design such a process is the use of an
ombudsman office, which monitors and controls the data request as a
neutral third party. For this purpose, a key pair can be generated
for this body, the public key being made available to operators.
Group keys are then additionally encrypted with this key. If a
health authority requests visit data from an operator, it must
request decryption of the corresponding group key from the
ombudsman service in order to make the data usable. The ombudsman
checks the request and decrypts the group key if necessary. This
process is then publicly documented if necessary.
The disadvantage of this procedure is that ombud keys in turn
enable the global decryption of visit data. The operator of a
locality can therefore be used as a further trust authority.
However, sole control of decryption by operators should be avoided
as well, since they cannot exercise an effective control function
vis-à-vis public authorities, as the
accessibility of guest lists to police authorities has already
shown.
In general, requesting visit data through this mechanism should
be an absolute exception; accordingly, the process can be equipped
with a strong additional control mechanism in the form of an ombuds
process without greatly reducing its effectiveness.
In our view, the following aspects of the protocol can still be
improved:
Data storage in the backend : In the current
draft, a user's contact data is stored in encrypted form in a
backend so that it can be retrieved by a GA if necessary. In
principle, decryption is only possible if the GA also has one half
of the user's key, which the user has either transmitted directly
to the GA or which was transmitted by an operator as part of a
visit data query. Nevertheless, any centralized data storage poses
a threat to user privacy. A variant of this protocol can therefore
delay the centralization of this data. However, this in turn has
disadvantages, since in this case, GÄs can only obtain users'
contact data with their active assistance. If users are difficult
to reach, this can delay and hinder effective contact tracing. In
this sense, the privacy of users must be weighed against the
interest of the GA.
Variants
To further enhance user privacy, different variants of the
protocol can be created, which are discussed in the following
sections.
Event-related data release
Visit data collection using QR data also works without storing
contact data in a central backend. Accordingly, the collection and
storage of this data can be delayed as follows:
Instead of providing encrypted contact data directly to the
backend when creating QR codes, the user's web application can
initially only request a $ I _ D$ value and a token $ Z $ from the
backend and simultaneously store the public part of an asymmetric
key pair generated by the application for it there. The backend
stores this together with $ Z $ and $I _ D$. The associated private
key, $ I _ D$ and $ Z $ can be made available by the application to
the user for storage.
If the corresponding visit data is found during contact
tracing, the backend determines that no contact data exists for
them yet. It then writes out the token $ Z $ via a public list for
completion.
The user can periodically submit the data $ Z $ and $ I _ D $
to the web application, which then recognizes from the published
token $ Z $ that the user's data has been requested for contact
tracking. It then prompts the user to provide the data, encrypts it
with the GÄ public data key, signs the data with the private key of
the generated key pair, and communicates it via the API to the
backend, which stores it.
The GA can now retrieve the data regularly from the backend via
the API.
This procedure is more privacy-friendly, but may not be
practical for a paper-based procedure as it relies on the active
cooperation of the user. Users who do not regularly open the web
application will not know that their data has been requested.
However, the process is very well suited for digital contact
tracing via app. In this case, it delays the storage of personal
information until it is really needed.
The delayed provision of contact details associated with the
procedure must again be balanced against the protection of the
privacy of individual users.
Extensions
The following sections describe possible extensions to the
protocol that go beyond the basic functionality.
Review by the user
During initialization, in addition to the GA data, a data
package can also be generated for the user, which can contain,
among other things, the secret $ H _ s $ (this data package can
also be protected with a password of your choice).
The user can make this data package, which can also be provided
as a QR code, available to the web application. The web application
can use the data to reconstruct the user's hashes $ H _ i $ and,
with the help of the backend, check whether and by whom these
hashes were written out. The user thus receives information as to
whether his visit data has been requested by a GA and, in the event
of non-notification, can request further data on the use from this
office using his GA data.
The data package can also be used as described above to have
hashes declared invalid, e.g. in the event of loss or theft.
Paper-based contact tracing
Many contact tracing systems, such as Luca or Recover, rely primarily on tech tools like smartphone apps to document visits. This is problematic for a number of reasons. Paper-based or analogue protocols are also offered by other systems such as Luca (e.g. in the form of "key tags"), but static QR codes have a number of data protection problems.
Zilp-Zalp, on the other hand, relies primarily on a paper-based protocol that uses technological tools such as web applications only in a supportive manner, and essentially works completely without technological tools for the user during visits. Such a protocol has many advantages in practice:
Basic ideas
To meet the contact tracing requirements outlined in the overview, the following must be in place:
Basically, for robust contact tracing we need to be able to identify possible intersections between visits of individual users. This generally requires a documentation of the visits of individual users, as well as a procedure to determine for a specific visit of a user all visits of other users that occurred in the same locality and in the same time period.
Possible strategies
In principle, there are various strategies for implementing these requirements. Probably the most obvious but not necessarily the most privacy-friendly strategy is to manage visit data centrally and to determine relevant visits from this central data storage as needed. This approach is followed by Luca, among others. The problem here is that central data storage creates a multitude of possibilities for monitoring users that are difficult to solve technologically or organizationally.
Therefore, a more privacy-friendly solution is to store contact data in a decentralized manner. To evaluate this, we first divide the contact tracing problem into two sub-problems:
Here we see that we * can consider the problem of documenting *visits independently of * the problem of storing contact information.
Protocol (v0.5)
Change history
v0.5
Zilp-Zalp's paper-based, decentralized protocol includes the following actors:
To enable the exchange of this data between the actors, Zilp-Zalp implements an infrastructure with different services:
The encryption of data for GÄs as well as the authentication of their public requests is done by public-key encryption. In the following, we assume that GÄs each have a pair of keys for signing and - encrypting/decrypting data, and that other actors can verify the trustworthiness of the public keys of these pairs via a suitable mechanism (e.g., a root certificate that is delivered together with the web application).
Initialization
Users in the system would like to make their contact data available to GÄ on an ad hoc basis. We assume here that users would like to store the data in such a way that GÄ can access the data without any further action on the part of the user (but in a way that is comprehensible to the user).
The contact data should only be processed by trustworthy actors and generally be available to as few actors as possible in the system (whether in encrypted or unencrypted form).
To enter contact data, users first open the web application and enter relevant data such as name, address, telephone number and e-mail address (a validation of this data is described below). The application then generates two randomly generated symmetric keys $K _ a$ and $K _ b$, which are combined using a suitable key derivation procedure to form a key $K _ c$. The application now encrypts the user's contact data symmetrically with key $K _ c$, adds key $K _ a$ to this encrypted data, encrypts this data asymmetrically with the public GÄ data key, and transmits this data to the API, which stores it in a backend and returns a random identifier $I_D$. The data stored there cannot be decrypted by any actor without knowledge of the key $K _ b$ as well as the private key of the GÄ. The latter is initially under the control of the user and can only reach a GÄ that can decrypt it via the user or via an operator.
Furthermore, the user's application generates a random value $H _ s$, from which a pseudo-random series of further values $H _ 1, H _ 2, \ldots H _ n$ is generated using a suitable method. The web application now stores $H _ s$, $I _ D$ and $ K _ b$ together in a data structure and encrypts them with the public GÄ data key. This data remains with the user and is only passed on to a GA for contact tracing.
Now the application generates value pairs consisting of $H _ i$ ($ \ge 1$) on the one hand and $K _ b$ and $I _ D$ on the other hand, whereby $H _ i$ is unencrypted and $(K _ b, I _ D)$ is encrypted individually for each value pair with the GÄ data key. The public key $U _ i ^ \mathrm{pub} $ generated for this data set is appended to the data set, and the application stores the associated private key $U _ i ^ \mathrm{priv} $ for possible later transfer to the health authority. These pairs are used for contact tracking and shared with public operators.
The application then generates QR codes from the data, hands them to the user for printing, and then deletes all data. The data that is transferred to the health department in the case of contact tracing $ ( K _ b, I _ D, H _ s, U _ 1 ^ \mathrm{priv} \ldots U _ n ^ \mathrm{priv} ) $, as well as the data that the user himself uses to check and adjust his data can either be stored in files, or locally encrypted and then stored in the backend. In the latter case, the application generates a random password and ID that the user must write down in order to recover the data later. Local storage as a file is preferable from a data protection point of view.
Note: Currently, deriving $K _ c$ from $(K _ a, K _ b)$ does not increase the security of the system, as $K _ b $ is kept permanently with the user's contact information. In an extension of the Proktoll, however, it is planned to separate the key $K _ b$ from the contact data in an additional step of by a GA, to store it separately from the beginning or to encrypt it asymmetrically and to give another party control over the decryption. It was therefore left in the minutes for the time being.
Sequence diagram
The following sequence diagram summarizes the initialization process.
Optional: Initialization with data validation
As part of the normal initialization, no validation of the contact data provided by the user takes place. If validation of this data is desired, the initialization must be performed with the help of a trusted third party. For this purpose, this third party operates a special version of the web application via which users initially initialize their data in exactly the same way as above. In contrast to the normal initialization, however, the third party checks the data before encryption (e.g. by comparing it with an identification document) and confirms its correctness. The web application then signs each value pair of the user with a signature that certifies the presence of correct user data for the value pair. These signatures $ S _ i $ are applied to the user's QR codes in addition to the value pairs. An operator's web application can read and confirm this signature when scanning a QR code. The operator can hereby confirm that validated user data belongs to the QR code. In addition, the third party can restrict the validity of the QR codes. This is useful to prevent reuse in the system.
Trusted third parties could be, for example, state institutions but also, if necessary, private sector actors (e.g. post offices) that already have experience with the validation of data. However, implementing such a system is likely to require a great deal of effort and create an additional risk, as another third party will have access to a user's data during initialisation. The benefits should therefore be weighed against the effort involved.
Validation via third party APIs as done in other centralized systems can theoretically be done as well, e.g. the web application can only allow the creation of QR codes after certain data like a phone number or an email address has been validated via an external service. Since the web application (or generally any client application) is under the control of the user they can easily manipulate it to bypass validation. That this is feasible has already been demonstrated. Client-side validation of data therefore only discourages non-technical, cooperative users from providing false data (this is not to say that such additional validation is completely useless, but it should by no means be considered secure or reliable).
Visit documentation
To document the visit of a location to operators, users simply give them a random QR code. In addition to the QR code, it is possible to enter additional meta data such as the exact time of arrival and the length of stay in order to increase the accuracy of the documentation. The operator then records the QR codes received over a given period (e.g. one day) using the web application. Initially, they are only stored locally. Operators can also capture additional meta-data to improve accuracy for contact tracing.
The operator's web application groups visit data as close to real time as possible (within a few hours). Individual visit data can exist in several groups (overlapping). For each group, the web application generates an asymmetric key pair. All visit records of the group are encrypted with the public key of this pair. The corresponding private key is encrypted with the public key of each visit record and stored with the record. In addition, the operator encrypts the locality data in each case with the respective group keys of a visit data record and appends them there.
As soon as it is clear that no further visit records will be assigned to a group, the web application deletes the private key belonging to the group. The public key is stored together with the group data. Similarly, the web application deletes a user's original visit data as soon as it is clear that they will not be added to another group.
The data encrypted in this way can only be accessed with the help of the GA data key and a matching private key of a visit record belonging to the group. These private keys are under the control of the user and are only transmitted to the GA by the user in the event of an infection. Accordingly, even if the GA could access all operator data, it can only decrypt epidemiologically relevant data for which a user has provided the matching private key.
Manual visit recording
The operator must also be able to record contact data for people who do not use Zilp-Zalp and store it in encrypted form. However, this raises the question of how a purpose limitation can be achieved for this data, since in this case the person has not gone through any initialization in the Zilp-Zalp system. There are several possibilities for this:
In principle, operators can also hand out prefabricated QR code blocks to people, which they assign to a person with the help of initialization. However, this involves the risk that operators can view the entire QR codes of a block. However, this risk can be limited by technical means.
Storage of visit data
Visit data can either be stored locally, or stored in a (federated) backend. Local storage is privacy-friendly, but also poses availability risks, as the operator of a locality must ensure that the data is stored in a fail-safe manner. In addition, local storage may slow down contact tracking. Since Zilp-Zalp already integrates a mechanism that protects user visit data with the help of an assignment to infection communities, the purpose limitation is already given here. Accordingly, visit data can also be stored in a backend, which can then automatically respond to hashes written out. Here, the operator's web application can add the locality data to all encrypted visit data and encrypt it with the appropriate group key as well. The backend cannot assign individual visit data to specific locations. Only via metadata (IP address of the uploading location) would it be possible for a backend to establish an assignment if necessary.
The backend should again be federated to reduce the risk of metadata analysis. The storage of data can also take place asynchronously.
Contact Tracking
In order to identify possible risk contacts of an infected user, the user first hands over the QR code (either digital or analogue) to the GA. This can decrypt it with the private GÄ data key, whereby it receives the values $H _ s ^ l$, $I _ D ^ l$ and $K _ b ^ l$ ($l$ denotes here the data of the $l$-th user). With the value $I _ D$ the GA can receive the encrypted user data from the backend, which can be decrypted with the help of the private data key as well as $K _ b$. Furthermore, the GA can use $H _ s$ to create all hash values $H _ i$ of the user. It publishes these values via the backend (together with other hash values to protect the anonymity of the user). The operators' web applications regularly download the list of these values and match them with the locally stored hash values. If there is a match, all visit data related to these hash values $H _ i$ (e.g. determined by comparing the visit times) are transferred to the backend via the public API after confirmation by the operator (if necessary, the data can be encrypted again with the GÄ data key). From there they can be recalled by the GA. This can decrypt data from users who have formed an infection community with it and have accordingly been encrypted with the same group key. To do this, the GA first decrypts the group key with the matching private key $U _ i ^ {pub}$ that belongs to the original visit data. With this and the private GÄ data key, it can in turn decrypt the values $ I _ D ^ k$, and $K _ b ^ k$ of relevant users, which in turn can be used to query and decrypt the user's contact data from the backend. However, since the GA does not have the key $ H _ s ^ k$ from this user, it cannot reconstruct the user's visit history without consent. On the contrary, the active cooperation of this user is necessary. Nor can the GA - decrypt visits or - contact data of users who have not formed an infection community with the original user.
Sequence diagram
The following sequence diagrams show the contact follow-up process. For reasons of clarity, the process was divided into three steps.
Transfer of user data to the GA
First, the GA must receive the GA data from the user in order to initiate further contact follow-up.
Tendering of hashes
The GA then writes out relevant hashes for contact tracing and waits for feedback from operators. Important: In order to prevent the submission of manipulated data, operators must always also submit the data available for the hash tendered. These data cannot be falsified by an operator without knowledge of the key $K _ b$, GÄ can thus exclude manipulated or incorrect data. An operator may still return irrelevant data, but such behaviour can be traced back to the operator in a number of ways and penalised accordingly. An operator always returns complete group data, which contains visit data encrypted with a private group key. The private key in turn was encrypted with the public keys of the visit data of all group members.
Processing of relevant contact data
Finally, the GA processes the data of the operators. However, epidemiological group data can only be decrypted if the GA receives from the user a matching private key to visit data containing the encrypted private key of the respective group data. Without the presence of such a key, the GA cannot decrypt the data even if all data and all other keys are present.
Ombuds process
Scenarios are conceivable in which contact tracing must be performed by a health department without the health department knowing a user's secret keys. For example, it is conceivable that a user loses his secret keys, but can provide the health department with a list of visited locations. The health department should then be able to use the visit data collected from these locations using Zilp-Zalp for contact tracing. At the same time, it must be ensured that health authorities or other actors cannot use this possibility to decode visit data for any purpose.
One possibility to design such a process is the use of an ombudsman office, which monitors and controls the data request as a neutral third party. For this purpose, a key pair can be generated for this body, the public key being made available to operators. Group keys are then additionally encrypted with this key. If a health authority requests visit data from an operator, it must request decryption of the corresponding group key from the ombudsman service in order to make the data usable. The ombudsman checks the request and decrypts the group key if necessary. This process is then publicly documented if necessary.
The disadvantage of this procedure is that ombud keys in turn enable the global decryption of visit data. The operator of a locality can therefore be used as a further trust authority. However, sole control of decryption by operators should be avoided as well, since they cannot exercise an effective control function vis-à-vis public authorities, as the accessibility of guest lists to police authorities has already shown.
In general, requesting visit data through this mechanism should be an absolute exception; accordingly, the process can be equipped with a strong additional control mechanism in the form of an ombuds process without greatly reducing its effectiveness.
Risk analysis
A detailed risk analysis can be found in a separate document.
Opportunities for improvement
In our view, the following aspects of the protocol can still be improved:
Variants
To further enhance user privacy, different variants of the protocol can be created, which are discussed in the following sections.
Event-related data release
Visit data collection using QR data also works without storing contact data in a central backend. Accordingly, the collection and storage of this data can be delayed as follows:
This procedure is more privacy-friendly, but may not be practical for a paper-based procedure as it relies on the active cooperation of the user. Users who do not regularly open the web application will not know that their data has been requested. However, the process is very well suited for digital contact tracing via app. In this case, it delays the storage of personal information until it is really needed.
The delayed provision of contact details associated with the procedure must again be balanced against the protection of the privacy of individual users.
Extensions
The following sections describe possible extensions to the protocol that go beyond the basic functionality.
Review by the user
During initialization, in addition to the GA data, a data package can also be generated for the user, which can contain, among other things, the secret $ H _ s $ (this data package can also be protected with a password of your choice).
The user can make this data package, which can also be provided as a QR code, available to the web application. The web application can use the data to reconstruct the user's hashes $ H _ i $ and, with the help of the backend, check whether and by whom these hashes were written out. The user thus receives information as to whether his visit data has been requested by a GA and, in the event of non-notification, can request further data on the use from this office using his GA data.
The data package can also be used as described above to have hashes declared invalid, e.g. in the event of loss or theft.