Commit 68b834c1 authored by Michael Breu

Preliminary finished documentation

parent 8453e0f2
Merge request !222: Bringing the december release into production
# Step 1
# preparation of a generic jhipster docker container
## Step 1
docker login sharing-codeability.uibk.ac.at:5051
# Step 2
## Step 2
docker build -t sharing-codeability.uibk.ac.at:5051/development/sharing/codeability-sharing-platform/root-jhipster .
# Step 3
## Step 3
docker push sharing-codeability.uibk.ac.at:5051/development/sharing/codeability-sharing-platform/root-jhipster
......@@ -3,7 +3,7 @@
FileHooks
=========
The fileHooks project is a simple infrastructure for forwarding
The fileHooks project (https://sharing-codeability.uibk.ac.at/development/sharing/file-hooks) is a simple infrastructure for forwarding
events from GitLab to the GitSearch REST service at http://sharing_search:8080/api/gitlab/eventListener.
The services GitLab and Elasticsearch are considered backend services.
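The forwarding step can be pictured with a minimal sketch. It assumes that the hook receives the GitLab event as JSON on standard input (as GitLab file hooks do) and that the REST service accepts the unmodified event as a JSON ``POST`` body; the function names and the payload handling are illustrative assumptions, not the actual fileHooks code.

```python
import json
import urllib.request

# URL taken from the text above; everything else is an illustrative assumption.
EVENT_LISTENER_URL = "http://sharing_search:8080/api/gitlab/eventListener"


def build_forward_request(event, url=EVENT_LISTENER_URL):
    """Wrap a GitLab event (a dict) in an HTTP POST request with a JSON body."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def forward_event(stream, url=EVENT_LISTENER_URL, timeout=10):
    """Read one JSON event from ``stream`` and forward it; return the HTTP status."""
    event = json.load(stream)
    request = build_forward_request(event, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A hook script built on this sketch would call ``forward_event(sys.stdin)`` and log any non-2xx status.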
......@@ -12,40 +12,6 @@ This section describes the fileHooks used in GitLab and the infrastructure setup
Finally, some tips to handle errors are provided.
GitSearch Indexer
=================
.. _ref_gitsearch_indexer:
The GitSearch Indexer listens to requests via the REST service at http://sharing_search:8080/api/gitlab/eventListener.
It is responsible for validating and updating the Elasticsearch index.
This GitSearch Indexer does two tasks:
1. Health check and validation: It informs the user who modified the project via email if the metadata information is incomplete or invalid after a modification in a repository was conducted.
Validation happens on the ``master`` branch of all projects in the group ``sharing``. Projects in all other groups are checked as well; however, if they do not contain metadata, the check is skipped.
The indexer is mainly triggered by push events, but also by moving or renaming a project and/or groups/namespaces.
The check proceeds as follows:
First, the root directory of the repository is checked for files named ``metadata.json``, ``metadata.yaml``, or ``metadata.yml``.
There must be exactly one such file, otherwise the check fails.
Subsequently, the correctness of all metadata files is validated (also dependent metadata files, if it is a collection).
If an error occurs, an email is sent to the user who pushed the changes.
Metadata checks comprise:
- the syntactical correctness of the metadata file as a YAML or JSON file (results in an error)
- the presence of the required fields (results in an error)
- the presence of the required fields in the dependent metadata files (results in an error)
- checks against the vocabulary service at https://oeresource.logic.at/en/meta/api/v1?format=json (results in a warning)
The check fails if there is an error, but is accepted if there are only warnings. In both cases the author is informed by e-mail.
2. It keeps the Elasticsearch index up-to-date by adding/updating/deleting files according to the triggered GitLab event.
Only the ``main``-branch (or ``master`` if ``main`` does not exist) and the group ``sharing`` (including subgroups and all subprojects) are indexed in Elasticsearch.
Metadata files (``metadata.json``, ``metadata.yaml``, or ``metadata.yml``) at the project root are indexed in the alias ``metadata``.
Finally, the GitSearch Indexer provides functionality to recreate the index and to recheck all projects. During this task, all event processing is paused.
Infrastructure Setup
......@@ -63,7 +29,6 @@ Subsequently, the setup for GitLab, PlantUml Elasticsearch is shown.
The setup of the services GitLab search and MySQL is discussed in the section :ref:`ref_git_search`.
To create all containers for the backend in production, a docker-compose script is provided in ``src/main/docker/gitlab-setup/``.
It can be executed as follows:
......@@ -72,10 +37,6 @@ It can be executed as follows:
- ``docker-compose create``
- ``docker-compose start``
Files in the ``setup/config/`` directory are ignored by git by default, so writing secrets into a copy
prevents accidentally committing them.
The following environment variables are set within the config files.
No modification should be required for those if the correct config file is used.
......@@ -114,67 +75,15 @@ Installing the Filehooks package
In the previous section, the container infrastructure is set up.
When this is successfully done, the filehooks code needs to be installed in the GitLab container.
There is another script in the ``setup`` directory for this job:
There is a script in the ``setup`` directory of the file-hooks project for this job:
.. code-block::
./install_filehooks_locally.sh --create-index
./install_filehooks_locally.sh
This script copies files from the repository into the GitLab container
and sets up the code such that it is run whenever GitLab emits an event.
The ``--create-index`` flag causes the initial index to be created in ElasticSearch.
Local Development Setup
~~~~~~~~~~~~~~~~~~~~~~~
For development, it can be beneficial to have the setup running
on the development machine.
In order to have access to the GitLab and ElasticSearch containers via http,
modify ``setup/docker-compose.yml`` and enable the lines marked with the comment
.. code-block::
# add this for your local testing setup
It might also be useful to remove the lines saying ``restart: always``.
For local development the ``local`` config file can be used directly:
Afterwards, the same setup procedure as for deployments can be used.
The configuration for local development does not need to be copied and modified
if the defaults are used.
.. code-block::
cd setup
./setup-infrastructure.sh config/local
When this completes, the filehooks code needs to be installed as described previously:
.. code-block::
./install_filehooks_locally.sh --create-index
At this point, GitLab should be reachable at http://localhost:10082 and ElasticSearch at http://localhost:9200.
To view the entire index check http://localhost:9200/metadata/_search.
The index will be updated whenever a repository or group in the GitLab group "sharing" is updated.
Execution logs of the filehooks code can be found in the GitLab container at ``/var/log/gitlab/gitlab-rails/trigger_project_update.log``
for normal logs and ``/var/log/gitlab/gitlab-rails/file_hook.log`` for crashes due to uncaught exceptions.
A convenient way to observe these files is running
.. code-block::
docker exec -it sharing_gitlab tail -f /var/log/gitlab/gitlab-rails/file_hook.log
for errors and similarly for the logs.
To deploy modifications of the code to GitLab, the relevant files need to be copied to the mounted volume.
The script ``install_filehooks_locally.sh`` does this automatically when called without any arguments.
This makes it possible to install new code quickly without having to restart the container.
The GitLab container can be accessed interactively by running
......@@ -351,3 +260,38 @@ FileHooks
- ``/var/log/gitlab/gitlab-rails/file_hook.log``: Fatal errors (e.g., unexpected exceptions) are logged in this file.
- ``/var/log/gitlab/gitlab-rails/trigger_project_update.log``: General logging information for the fileHook ``trigger_project_update.py`` is logged in this file.
GitSearch Indexer
=================
.. _ref_gitsearch_indexer:
The GitSearch Indexer listens to requests via the REST service at http://sharing_search:8080/api/gitlab/eventListener.
It is responsible for validating and updating the Elasticsearch index.
This GitSearch Indexer does two tasks:
1. Health check and validation: It informs the user who modified the project via email if the metadata information is incomplete or invalid after a modification in a repository was conducted.
Validation happens on the ``master`` branch of all projects in the group ``sharing``. Projects in all other groups are checked as well; however, if they do not contain metadata, the check is skipped.
The indexer is mainly triggered by push events, but also by moving or renaming a project and/or groups/namespaces.
The check proceeds as follows:
First, the root directory of the repository is checked for files named ``metadata.json``, ``metadata.yaml``, or ``metadata.yml``.
There must be exactly one such file, otherwise the check fails.
Subsequently, the correctness of all metadata files is validated (also dependent metadata files, if it is a collection).
If an error occurs, an email is sent to the user who pushed the changes.
Metadata checks comprise:
- the syntactical correctness of the metadata file as a YAML or JSON file (results in an error)
- the presence of the required fields (results in an error)
- the presence of the required fields in the dependent metadata files (results in an error)
- checks against the vocabulary service at https://oeresource.logic.at/en/meta/api/v1?format=json (results in a warning)
The check fails if there is an error, but is accepted if there are only warnings. In both cases the author is informed by e-mail.
2. It keeps the Elasticsearch index up-to-date by adding/updating/deleting files according to the triggered GitLab event.
Only the ``main``-branch (or ``master`` if ``main`` does not exist) and the group ``sharing`` (including subgroups and all subprojects) are indexed in Elasticsearch.
Metadata files (``metadata.json``, ``metadata.yaml``, or ``metadata.yml``) at the project root are indexed in the alias ``metadata``.
Finally, the GitSearch Indexer provides functionality to recreate the index and to recheck all projects. During this task, all event processing is paused.
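The rules described in this section can be summarised in a short sketch. It is not the indexer's implementation: the function names are made up for illustration, ``REQUIRED_FIELDS`` is a hypothetical placeholder for the real list of required fields, and only JSON metadata is handled in order to stay within the standard library (YAML parsing and the vocabulary check are left out).

```python
import json

METADATA_FILENAMES = ("metadata.json", "metadata.yaml", "metadata.yml")

# Hypothetical required fields; the real list is defined by the platform.
REQUIRED_FIELDS = ("title", "license")


def find_metadata_file(root_files):
    """Return the single metadata file in the repository root, or raise ValueError."""
    found = [name for name in root_files if name in METADATA_FILENAMES]
    if len(found) != 1:
        raise ValueError(f"expected exactly one metadata file, found {len(found)}: {found}")
    return found[0]


def select_indexed_branch(branches):
    """Prefer ``main``, fall back to ``master``, return None if neither exists."""
    for candidate in ("main", "master"):
        if candidate in branches:
            return candidate
    return None


def check_json_metadata(text, required=REQUIRED_FIELDS):
    """Return a list of error messages for a JSON metadata document (empty if valid)."""
    try:
        metadata = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"syntax error: {exc}"]
    if not isinstance(metadata, dict):
        return ["metadata must be a JSON object"]
    return [f"missing required field: {field}" for field in required if field not in metadata]
```

An empty list from ``check_json_metadata`` corresponds to a check without errors; warnings from the vocabulary service would have to be collected separately, since they do not make the check fail.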
......@@ -11,22 +11,16 @@ Concept
Ideas
~~~~~
The main idea of the frontend of the CodeAbility Sharing Platform is providing a search engine which allows searching the projects provided in any repository that lies within the group ``sharing``.
The main idea of the frontend of the CodeAbility Sharing Platform is providing a search engine which allows searching the projects provided in any repository that lies within the group ``sharing``.
Additionally, other repositories are indexed if they contain a metadata file in the root directory of the repository.
The search engine should also allow users to search/filter for specific metadata (e.g., license(s), programming language(s)).
Search results should be presented in a visually appealing way so that users can easily find and use resources without any prior knowledge (especially of Git).
Moreover, the Sharing Platform should provide a starting and an imprint page, as well as data privacy pages.
Challenges
~~~~~~~~~~
- How can we ensure that a user is allowed to see specific content? How can we synchronize user permissions with GitLab?
- How can we add additional statistics not recorded by GitLab, like the number of downloads and views?
Technology
~~~~~~~~~~
The prototype uses a monolithic architecture and `Spring Boot <https://spring.io/projects/spring-boot>`_ with Java as a backend and `Angular <https://angular.io/>`_ as frontend technology.
It was generated with `JHipster <https://www.jhipster.tech>`_ v6.10.0.
The application was generated with `JHipster <https://www.jhipster.tech>`_ v7.9.3.
`JWT <https://jwt.io/>`_-tokens are used for authentication.
In production `MySQL <https://www.mysql.com/>`_ is utilized as database management system.
H2 with disk-based persistence is employed in development.
......@@ -39,67 +33,30 @@ Production/Development environment
Before starting the search, please ensure that the backend is up and running!
Afterward, the following steps can be executed to start the search.
1. Log in to the Docker registry
.. code-block:: bash
docker login docker.uibk.ac.at:443
2. Clone the GitLab search project
3. Navigate to the folder ``src/main/docker``
4. Set the required environment variables
+----------------------+----------------------------------------+------------------------------------------------+
| Environment variable | Production | Development |
+======================+========================================+================================================+
| MYSQL_HOME | /mnt/qt-sharing-codeability/mysql | /mnt/qt-codeability-austria/sharing/mysql |
+----------------------+----------------------------------------+------------------------------------------------+
5. Start the docker containers
.. code-block:: bash
docker-compose -f gitsearch.yml up
.. note::
When this setup is executed for the first time, the default password of the users ``admin`` (``Sharing-Search - default - admin``) and ``user`` (``Sharing-Search - default - user``) should be changed!
The default password can be found in KeePass.
The new password of admin and user should be added to KeePass (``Sharing-Search - Dev - admin`` and ``Sharing-Search - Dev - user`` for development and ``Sharing-Search - Prod - admin`` and ``Sharing-Search - Prod - user`` for production).
The CI/CD pipeline is configured to automatically build and deploy the application to the development (CI task 'deploy') or the production environment (CI task 'deploy-prod').
The required environment variables are defined in the GitLab project settings (CI/CD -> Variables) and .env file.
Local development environment
-----------------------------
A local development environment may help develop, test, and debug the application.
The following parts should run on your development machine to set up a development environment locally.
Backend
~~~~~~~
To set up the entire backend of the CodeAbility Sharing Platform, execute the following steps.
It is, however, quite resource intensive and requires a lot of setup. Therefore, the recommended way to develop the GitLab search is to use the Docker infrastructure
on the development environment.
1. Navigate to the folder ``setup``
2. Open the ports and add the public network to the service ``elastic`` in the file ``docker-compose.yml``:
That is, use the GitLab instance at https://sharing.codeability-austria.uibk.ac.at/dashboard.
Additionally, to use the Elasticsearch index, open an SSH tunnel to the Elasticsearch Docker container.
.. code-block:: YAML
ports:
- '9200:9200'
- '9300:9300'
networks:
- backend
- frontend
.. code-block::
ssh -L 9200:sharing_elasticsearch:9200 <username>@codeability-austria.uibk.ac.at
3. Execute the script ``setup-infrastructure.sh`` (TODO: add script documentation)
or on Windows (e.g., with PuTTY)
.. note::
This setup is somewhat resource intensive. For developing the GitLab search, it may be sufficient to run only an elasticsearch server and add data 'manually' to elasticsearch.
For example, SSH port forwarding could be used ``ssh -L 9200:192.168.48.4:9200 <username>@codeability-austria.uibk.ac.at``. For this to work, the port ``9200`` needs to be made available. In the default configuration it cannot be accessed this way. If you want to enable such access using the setup script, edit ``setup/docker-compose.yml`` as described previously. Note that such access via SSH requires being connected to UIBK's internal network (possibly via VPN).
.. image:: puttyConfig.png
:width: 300pt
:alt: Putty tunnel config
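Once the tunnel is open, it can be useful to confirm that the forwarded port actually accepts connections before starting the application. The following is a minimal sketch; the helper name ``is_port_open`` is an assumption made for illustration, and the port ``9200`` is the one forwarded by the ``ssh -L`` command above.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, ``is_port_open("localhost", 9200)`` should return ``True`` while the tunnel is up. It only checks that something is listening on the port, not that the listener is Elasticsearch.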
......@@ -131,9 +88,7 @@ Authentication and Integration with GitLab
It is important that the user of the search platform does not need to register or log in twice for the CodeAbility repository.
In the long run we should integrate with existing authentication systems such as EduId (or even Google or Facebook).
For a short term solution we use the sharing platform GitLab as an authentication (and authorisation) provider.
To this end a keycloak server is installed.
Basic Concepts
~~~~~~~~~~~~~~
......
......@@ -56,7 +56,7 @@ Development Server
- OS: Debian 10
.. note::
The development server is shared with other development services (e.g., Artemis).
......
Code Documentation
==================
The package ``filehooks`` implements the main functionality for the backend of the CodeAbility Sharing Platform.
Since relative imports do not work reliably, this module has to be installed in GitLab.
The documentation for filehooks can be found at https://development.pages.sharing-codeability.uibk.ac.at/sharing/file-hooks/
This package can be installed with ``pip`` using the following command.
.. code-block:: bash
pip3 install .
.. note::
When installing this package manually, the API token for GitLab, the email username, and the email password have to be set in ``conf.production.ini`` before the installation!
filehooks module
----------------
.. automodule:: filehooks
:members:
:undoc-members:
:show-inheritance:
scripts
-------
Collection of all scripts which can be installed in GitLab to extend its functionality.
.. note::
Those scripts assume that the package ``filehooks`` is installed!
.. toctree::
:maxdepth: 1
trigger_project_update
Tests
-----
Tests are written using the Python testing framework ``pytest``.
Unit tests
~~~~~~~~~~
Unit tests can be found in ``tests/filehooks`` and can be executed manually by the following command:
.. code-block:: bash
pytest --cov-report term-missing --cov=filehooks/ tests/filehooks
After each change to the code, it should be ensured that code coverage remains at 100%.
On every push event, unit tests are automatically executed by GitLab CI/CD.
Integration tests
~~~~~~~~~~~~~~~~~
Since the code depends heavily on the GitLab behavior, which was mocked in the unit tests,
integration tests are provided to check that the entire system works as expected.
.. note::
The current implementation of the integration tests is not very robust: some tests may fail when executed too quickly, even though they would pass given enough time.
The integration tests are not executed automatically by GitLab CI/CD.
Those tests can be executed locally by running ``./run_integration_tests.sh``.
In order to run the integration tests in isolation, they run in a dedicated
set of containers.
These are created automatically upon calling the script mentioned above,
if they do not exist already.
They have the same names as the containers which are created for production,
but with the postfix ``_integration``.
Data used by these containers and the container running the integration tests
is stored in ``/tmp/sharing/integration/`` by default.
This location can be configured in the ``run_integration_tests.sh`` script.
.. note::
Some tests use the configuration file ``filehooks/conf/conf.test.ini``. Please ensure that the correct values are set before test execution.
.. note::
The containers use quite a lot of memory, so running the integration tests while
the normal set of containers is running could be problematic on systems with
too little memory/swap space.
Linter
------
The tools ``pylint`` and ``flake8`` are used for static code analysis.
Experience shows that ``pylint`` is stricter and more verbose.
However, if ``flake8`` finds a potential issue, it is worth checking.
Some default settings of the tools had to be adjusted. It should be ensured that the ``pylint``-score always reaches 10 points.
If a potential issue is fine the way it is, suppress it.
Automated code checks
---------------------
This project uses git hooks to automatically check the code.
Git hooks are scripts which are automatically executed upon certain git events.
They can be installed on the client side (the developer's machine)
and on the server side (the git server which hosts the repository, e.g. a GitLab instance).
Installation
~~~~~~~~~~~~
For development, a client-side hook is used to check code that is about to be committed.
This is done by a pre-commit hook.
Client-side hooks need to be installed by each developer on their machine:
1. Install the python package ``pre-commit``, e.g. by running ``pip install pre-commit``. For other installation methods, see https://pre-commit.com/#installation.
2. Go to the root directory of the project and run ``pre-commit install``.
Working with code checks in place
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When these two steps succeeded, git will automatically run some checks on the code before every commit.
If any check fails, the commit is aborted.
Some checks which auto-format files fail if they do any formatting.
When this happens, the changes they do are not put into the git index
and need to be added manually, e.g. by running ``git add <auto-formatted file>``
in order to include them in the commit.
Some IDEs which allow committing from within them might not do a good job at displaying the error messages.
Running ``git commit`` from a terminal should give much better feedback,
including colored messages and information about which check is running at the moment.
Checks in use
~~~~~~~~~~~~~
The pre-commit hook runs several checks.
The detailed configuration is found in the pre-commit configuration file ``.pre-commit-config.yaml``.
These checks are run:
- some generic hooks (pre-commit defaults)
- ``isort`` sorts import statements (changes are not added to git index automatically)
- ``black`` formats code (changes are not added to git index automatically)
- ``mypy`` checks type annotations
- ``pyright`` checks type annotations
- ``flake8`` linter
- ``pylint`` linter
- ``pytest`` runs unit tests
Please keep in mind that the integration tests are not run automatically.
Since it takes several minutes to run them, it would not be reasonable to include them here.
Disabling checks
~~~~~~~~~~~~~~~~
Sometimes it might be necessary to temporarily disable one of the checks.
One way to do it is forcing the commit, which just skips all checks.
However, this is discouraged, since most checks will probably be useful.
For example, when pylint finds an issue in the code which you intend to fix in a later commit
it still makes sense to auto-format the code. In such cases the ``SKIP`` environment variable can be used:
``SKIP=pylint,flake8 git commit``
sets this variable and initiates a commit.
The variable takes a comma-separated list of checks to skip.
The names of the tools can be found in the configuration file for pre-commit, ``.pre-commit-config.yaml``.
There, look for the values of the ``id`` keys.
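When commits are created from a script rather than from an interactive shell, the same ``SKIP`` variable can be set programmatically. The sketch below is an illustration only; the helper names ``commit_env`` and ``commit_skipping`` are assumptions and not part of the project.

```python
import os
import subprocess


def commit_env(skip_ids, base_env=None):
    """Return a copy of the environment with SKIP set to a comma-separated hook list."""
    env = dict(os.environ if base_env is None else base_env)
    env["SKIP"] = ",".join(skip_ids)
    return env


def commit_skipping(skip_ids, message):
    """Run ``git commit`` with the given pre-commit hook ids skipped."""
    return subprocess.run(
        ["git", "commit", "-m", message],
        env=commit_env(skip_ids),
        check=False,  # let the caller inspect the return code
    )
```

``commit_skipping(["pylint", "flake8"], "WIP")`` is then equivalent to running ``SKIP=pylint,flake8 git commit -m WIP`` in a shell.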
GitLab CI
~~~~~~~~~
Since the git-hook based checks need to be installed by each developer on their machine,
it could happen that someone forgets to use them.
GitLab continuous integration jobs are used to run most of the same checks
in our GitLab instance. This requires a GitLab runner to be installed.
The configuration is found in ``.gitlab-ci.yml``.
Each push to GitLab starts a pipeline which runs the configured tools.
Linter failures are indicated as warnings,
unit test failures as errors which abort the pipeline.
It is possible to view the command line output of the tool in GitLab,
and in some cases artifacts can be downloaded.
Commandline Interface
---------------------
A command-line interface for Elasticsearch index manipulation is provided. It supports the following operations:
- List indexes (``list-index``)
- Create index (``create-index``)
- Delete index (``delete-index``)
- Delete unused MetaData indices (``delete-unused-indices``)
- List aliases (``list-alias``)
- Switch aliases (``switch-alias``)
- Reindex (``reindex``)
Workflow
~~~~~~~~
1. Start a shell in the ``sharing_gitlab`` docker container
.. code-block:: bash
docker exec -it sharing_gitlab /bin/bash
2. Navigate to ``utils`` (e.g., ``/file-hooks-src/utils``)
3. Run ``python3 cli.py`` with the appropriate arguments
.. note::
Running ``python3 cli.py`` with no arguments yields the help output.
Commands
~~~~~~~~
``list-index``
^^^^^^^^^^^^^^
- *Functionality:* List all elasticsearch indexes
- *Usage:*
.. code-block:: shell
python3 cli.py list-index
``create-index``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- *Functionality:* Creates new indexes for metadata information
- *Usage:*
.. code-block:: shell
python3 cli.py create-index \
-idx-metadata IDX_METADATA
- *Arguments:*
- IDX_METADATA: Name of new index for metadata information
``delete-index``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- *Functionality:* Deletes elasticsearch index
- *Usage:*
.. code-block:: shell
python3 cli.py delete-index IDX
- *Arguments:*
- IDX: Name of index to be deleted
``delete-unused-indices``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- *Functionality:* Deletes currently unused metadata indices in elasticsearch
- *Usage:*
.. code-block:: shell
python3 cli.py delete-unused-indices
``list-alias``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- *Functionality:* List all elasticsearch aliases
- *Usage:*
.. code-block:: shell
python3 cli.py list-alias
``switch-alias``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- *Functionality:* Removes the alias for the old metadata index and adds a new alias for the new metadata index
- *Usage:*
.. code-block:: shell
python3 cli.py switch-alias \
--old-idx-metadata OLD_IDX_METADATA \
--new-idx-metadata NEW_IDX_METADATA
- *Arguments:*
- OLD_IDX_METADATA: Name of old index for metadata information
- NEW_IDX_METADATA: Name of new index for metadata information
.. note::
- Running ``python3 cli.py -h`` yields the help output.
- Running ``python3 cli.py <command> -h`` yields the help for the specified command.
``reindex``
^^^^^^^^^^^
- *Functionality:* Creates a new index for metadata, fills it, and switches the alias.
- *Usage:*
.. code-block:: shell
python3 cli.py reindex
Example
~~~~~~~
Assuming you have executed steps 1 and 2 from the workflow, reindexing might look like this:
.. code-block::
$ python3 cli.py list-index
list-index
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open idx_metadata_0 KGQ0aWacREKM44aEYLWkNA 1 1 4 3 41.1kb 41.1kb
$ python3 cli.py list-alias
list-alias
alias index filter routing.index routing.search is_write_index
metadata idx_metadata_0 - - - true
$ python3 cli.py create-index idx_metadata_1
$ python3 cli.py list-index
list-index
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open idx_metadata_1 LaA64koiRESp4eF-Of1GTA 1 1 4 0 27.9kb 27.9kb
yellow open idx_metadata_0 KGQ0aWacREKM44aEYLWkNA 1 1 4 3 41.1kb 41.1kb
$ python3 cli.py switch-alias --old-idx-metadata idx_metadata_0 --new-idx-metadata idx_metadata_1
switch-alias
You are about to switch the following alias:
'idx_metadata_0 -> 'idx_metadata_1'
Would you like to continue? [Y/n]
Y
The aliases were switched!
$ python3 cli.py list-alias
list-alias
alias index filter routing.index routing.search is_write_index
metadata idx_metadata_1 - - - true
$ python3 cli.py delete-index idx_metadata_0
delete-index
You are about to delete the indexes in the list ['idx_metadata_0']. Would you like to continue? [Y/n]
Y
The indexes in the list ['idx_metadata_0'] were deleted!
.. warning::
If users add content between the index creation and switching of the alias, this content might not be indexed until it is changed again!
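When scripting the workflow above, the tabular output of ``list-index`` and ``list-alias`` has to be interpreted, and a name for the new index has to be chosen. The following sketch shows one way to do both. It assumes that the echoed command name (the first output line in the example) has already been removed, that no cell is empty, and that indexes follow the ``idx_metadata_<n>`` naming seen in the example; both helper names are hypothetical.

```python
import re


def parse_cat_table(text):
    """Parse whitespace-separated table output (header row first) into a list of dicts."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def next_index_name(existing, prefix="idx_metadata_"):
    """Return the prefix followed by one more than the highest numeric suffix in use."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)$")
    numbers = []
    for name in existing:
        match = pattern.match(name)
        if match:
            numbers.append(int(match.group(1)))
    # With no matching index, numbering starts at 0.
    return f"{prefix}{max(numbers, default=-1) + 1}"
```

Applied to the first ``list-index`` output of the example, ``next_index_name`` yields ``idx_metadata_1``, which matches the name used in the ``create-index`` call.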