El Géo Paso - Documentation¶
Description: Simple web application performing statistical analysis on job offers published on GeoRezo.
Author(s) and contributor(s): Julien M. (guts@github, geojulien@twitter)
Project version: 1.5.0
Source code: https://github.com/Guts/elgeopaso/
Last updated: 12 December 2022
Introduction¶
Technical documentation for the El Géo Paso project: dynamic statistics on geomatics job offers published on GeoRezo, the French-speaking geomatics forum.
Technical knowledge base for the El Géo Paso project. This base, expanded as time and motivation allow (in other words, it is incomplete), aims to keep the project from becoming a black box and to make it easier to resume development, which is discontinuous and irregular (volunteering, my love).
Functional description¶
Retrieval from GeoRezo¶
Every hour, the latest published offers are retrieved from the RSS feed of the GeoRezo Job forum and stored raw in a dedicated table.
Each new offer is analyzed using the NLTK natural language toolkit and matching rules stored in the database, which can be customized through the administration interface.
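As a rough illustration of the matching step, the sketch below maps label variants found in an offer title to a canonical contract type. It relies only on the Python standard library rather than NLTK. `CONTRACT_VARIATIONS`, `normalize` and `match_contract` are hypothetical names with invented values; in the project, the correspondences are stored in the database.

```python
import re
import unicodedata
from typing import Dict, Optional

# Hypothetical correspondences between label variants and canonical values.
# The real ones live in the database; these are invented for illustration.
CONTRACT_VARIATIONS = {
    "cdi": "CDI",
    "cdd": "CDD",
    "stage": "Stage",
    "stagiaire": "Stage",
}


def normalize(text: str) -> str:
    """Lowercase the text and strip accents so that matching is tolerant."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_accents.lower()


def match_contract(title: str, variations: Dict[str, str]) -> Optional[str]:
    """Return the canonical value of the first title word found in the mapping."""
    for word in re.findall(r"\w+", normalize(title)):
        if word in variations:
            return variations[word]
    return None


print(match_contract("[CDD] Géomaticien à Lyon", CONTRACT_VARIATIONS))
```

Keeping the variants in a plain mapping mirrors the idea of editable correspondences: adding a new spelling only requires a new entry, not a code change.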
Data representation¶
The data is then presented in several modes:
global metrics;
absolute number of offers per period;
proportional values by various criteria (contract types…).
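The difference between absolute and proportional values can be sketched as follows. The function name `contract_shares` and the sample data are assumptions made for illustration; they are not taken from the project code.

```python
from collections import Counter
from typing import Dict, Iterable


def contract_shares(contracts: Iterable[str]) -> Dict[str, float]:
    """Return the share of each contract type as a percentage of all offers."""
    counts = Counter(contracts)  # absolute number of offers per contract type
    total = sum(counts.values())
    # With no offers the comprehension is empty, so no division by zero occurs.
    return {name: round(100 * count / total, 1) for name, count in counts.items()}


# Invented sample: one contract type per offer.
sample = ["CDI", "CDD", "CDI", "Stage"]
print(Counter(sample))
print(contract_shares(sample))
```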
Technical description¶
To replicate the project, see the documentation in the repository wiki.
Database¶
*Model generated automatically by Django Extensions graph-models (pydot)*
Software components¶
The project is developed in Python 3.5.x with the Django framework and several extensions:
feedparser to consume the RSS feed
NLTK for semantic analysis
DRF (Django REST Framework) and drf-yasg to provide the REST API and its automated documentation
Django Extensions as the Django development toolbox
On the website interface side, the usual suspects are used:
Django Suit to style the administration interface
The site is served on the Web by gunicorn and nginx or Apache, depending on the platform (development or production).
Project history¶
Authors¶
The original idea came from Pierre Vernier and Julien Moura, carried along as we were by the much-missed momentum of Geotribu!
GeoRezo volunteers also helped revive the momentum whenever it faded: Yves Jacolin on the technical side, Marc Isenmann with his unfailing interest in analyzing the offers he has moderated for so many years, and Bruno Iratchet with his occasional but no less faithful support.
Why El Paso¶
Why El Paso? Since time immemorial, choosing a name for a software project has been an agonizing dilemma (and that is probably an understatement). Born over fajitas and rum, the name El Paso imposed itself as obvious. Properly translated, it stands for the small step that separates a job seeker from a position. A single offer, a single step… no, actually, none of that: it is simply because the fajitas were good!
Project management¶
Version naming¶
The project follows SemVer (Semantic Versioning).
See:
the Wikipedia section dedicated to this naming convention
Examples:
1.0.0
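A minimal sketch of how a version number of this form can be parsed and compared. It only handles the core MAJOR.MINOR.PATCH part, not pre-release or build metadata, and `parse_version` is a hypothetical helper rather than something provided by the project.

```python
import re
from typing import Tuple

# Core SemVer form only: three dot-separated non-negative integers.
SEMVER_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def parse_version(version: str) -> Tuple[int, ...]:
    """Convert 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    match = SEMVER_PATTERN.match(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    return tuple(int(part) for part in match.groups())


# Tuples compare numerically, field by field, unlike plain strings.
print(parse_version("1.10.0") > parse_version("1.9.3"))
```

Comparing tuples of integers avoids the classic pitfall of string comparison, where "1.10.0" would sort before "1.9.3".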
Pre-production¶
The site is deployed in pre-production on Heroku.
Prerequisites¶
Heroku account: the free tier is enough for a basic deployment
Heroku command-line tool: Heroku CLI
Deploy¶
Automated deployment¶
The application can be deployed automatically via the app.json file.
More information: https://devcenter.heroku.com/articles/heroku-button
Step-by-step deployment¶
Commands run on Windows 10 with WSL enabled.
# authenticate
heroku login
# add convenient plugin to pull/push from/to `.env` files
# https://github.com/xavdid/heroku-config
heroku plugins:install heroku-config
# create app in Europe
heroku create elgeopaso-dev --region eu
# add PostgreSQL database and schedule backup
heroku addons:create heroku-postgresql:hobby-dev --version=12
heroku pg:backups schedule --at "02:00 Europe/Paris" DATABASE_URL
# On bash, use single quotes for the time zone: heroku pg:backups schedule --at '02:00 Europe/Paris' DATABASE_URL
# set some environment variables pointing to application's settings
heroku config:set PYTHONHASHSEED=random
heroku config:set WEB_CONCURRENCY=4
heroku config:set DJANGO_DEBUG=False
heroku config:set DJANGO_SETTINGS_MODULE=elgeopaso.settings.production
heroku config:set DJANGO_ALLOWED_HOSTS=elgeopaso-dev.herokuapp.com
heroku config:set DJANGO_ADMIN_URL=admin
heroku config:set DJANGO_SECRET_KEY=$(wsl -- openssl rand -base64 64)
# on bash: heroku config:set DJANGO_SECRET_KEY="$(openssl rand -base64 64)"
# now, let's deploy
git push heroku master:master
heroku run python manage.py migrate
heroku run python manage.py createsuperuser --email elpaso@georezo.net
heroku run python manage.py check --deploy
heroku open
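For environments where openssl is not available, a value comparable to the one passed to DJANGO_SECRET_KEY above can be produced with the Python standard library. This is only a sketch under that assumption; `generate_secret_key` is a hypothetical helper, not part of the project.

```python
import base64
import secrets


def generate_secret_key(n_bytes: int = 64) -> str:
    """Return n_bytes of cryptographically strong randomness, base64-encoded."""
    return base64.b64encode(secrets.token_bytes(n_bytes)).decode("ascii")


# The result is a single line, which is convenient for an environment variable.
print(generate_secret_key())
```

The output can then be passed to `heroku config:set DJANGO_SECRET_KEY=...`, quoted as in the bash variant above.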
Git Flow¶
The project relies on the Heroku integration with GitHub to deploy test/development and pre-production versions:
Review Apps: temporary deployments matching a pull request
a free application: automated deployment from master to https://elgeopaso-dev.herokuapp.com/.
Branches¶
origin/master: main branch
corresponds to pre-production
pull request required: no commit can be pushed directly
automatically deployed to Heroku: https://elgeopaso-dev.herokuapp.com/
origin/develop: generic branch for active development
origin/housekeeping: branch dedicated to routine maintenance, dependency updates, etc.
Typical workflow¶
A new branch is created, or an existing one is used.
Changes are made in this branch and pushed to the main branch (master) through a pull request. A temporary deployment is made to a partly random URL. Example:
work on improving RSS reading to handle encoding issues: https://github.com/Guts/elgeopaso/pull/9
matching temporary deployment: https://el-geo-paso-rss-parser-xxsprem.herokuapp.com/ (the URL is shown on the pull request)
Once the changes are complete and validated, they are merged into the main branch, which is automatically deployed to Heroku: https://elgeopaso-dev.herokuapp.com/
When a new version is finalized, a version number is added with a git tag.
To understand commit tagging, see https://git-scm.com/book/en/v2/Git-Basics-Tagging or the "Divers - Utilitaires" page.
Deployment¶
From the production server¶
Initial Git configuration¶
Add your user to the users group:
sudo adduser geotribu users
Generate an SSH key pair:
ssh-keygen -f ~/.ssh/git_elgeopaso_rsa -t rsa -b 4096 -C "elpaso@georezo.net"
Add the public key, with read-only access, in the Deploy keys section of the repository: https://github.com/Guts/elgeopaso/settings/keys.
See the official GitHub documentation: https://developer.github.com/v3/guides/managing-deploy-keys/#deploy-keys
Configure the destination folder¶
A fork of François Romain's script is used:
# fetch the script
mkdir ~/scripts
cd ~/scripts
git clone git@gist.github.com:36672e8730244764b4a047f6584bd66d.git git-flow-deploy
# run the script
source git-flow-deploy/project-create elgeopaso
# edit the git hook
cd /srv/git/elgeopaso.git/hooks/
sudo nano post-receive
# paste the content of the file: .deploy/git-hooks/post-receive
Resources:
See the related blog post: https://medium.com/@francoisromain/vps-deploy-with-git-fea605f1303b
From the local machine¶
Add the remote repository matching the server:
git remote add deploy-prod ssh://geotribu@elgeopaso.georezo.net/srv/git/elgeopaso.git/
To publish (for example from master):
git push --follow-tags deploy-prod master
Using Apache to serve the site¶
To serve the application with Apache, keep these points in mind:
by default, Apache does not support WSGI, so the mod_wsgi module for Apache is required.
by default on Ubuntu 18.04, this module is compiled with Python 3.6. However, the module must be compiled with the same Python version as the one used by the application.
Prerequisites¶
# add repo with latest Apache version
sudo add-apt-repository ppa:ondrej/apache2
# install apache and dependencies
sudo apt install apache2 apache2-dev brotli
# enable brotli module
sudo a2enmod brotli
Deploy the Django application with the Apache WSGI module¶
1. Identify the right module version and the Python path¶
Install and use the module included in the application’s virtual environment¶
# in project folder
cd /var/www/elgeopaso
source .venv/bin/activate
# install mod_wsgi Python module
python -m pip install mod-wsgi==4.7.*
# run the config command to get the directives values
mod_wsgi-express module-config
> LoadModule wsgi_module "/var/www/elgeopaso/.venv/lib/python3.7/site-packages/mod_wsgi/server/mod_wsgi-py37.cpython-37m-x86_64-linux-gnu.so"
> WSGIPythonHome "/var/www/elgeopaso/.venv"
Install and use the module directly in Apache¶
# open a root shell
sudo su -
# in project folder
cd /var/www/elgeopaso
source .venv/bin/activate
# install mod_wsgi Python module
python -m pip install mod-wsgi==4.7.*
# run the config command to get the directives values
mod_wsgi-express install-module
> LoadModule wsgi_module "/usr/lib/apache2/modules/mod_wsgi-py37.cpython-37m-x86_64-linux-gnu.so"
> WSGIPythonHome "/var/www/elgeopaso/.venv"
2. Update the Apache configuration¶
# edit Apache module wsgi loader
sudo nano /etc/apache2/mods-available/wsgi.load
# paste the line output in the previous step. For example:
LoadModule wsgi_module "/usr/lib/apache2/modules/mod_wsgi-py37.cpython-37m-x86_64-linux-gnu.so"
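Since the module must match the Python version used by the application, a quick sanity check can compare the version encoded in the module file name with the running interpreter. The sketch below assumes the `mod_wsgi-pyXY` naming pattern seen in the sample output above; `module_python_version` is a hypothetical helper.

```python
import re
import sys
from typing import Tuple


def module_python_version(load_module_line: str) -> Tuple[int, int]:
    """Return the (major, minor) Python version encoded in a mod_wsgi file name."""
    match = re.search(r"mod_wsgi-py(\d)(\d+)", load_module_line)
    if match is None:
        raise ValueError("no mod_wsgi module name found in: " + load_module_line)
    return int(match.group(1)), int(match.group(2))


# Sample directive copied from the output shown above.
line = 'LoadModule wsgi_module "/usr/lib/apache2/modules/mod_wsgi-py37.cpython-37m-x86_64-linux-gnu.so"'
module_version = module_python_version(line)
running_version = tuple(sys.version_info[:2])

if module_version != running_version:
    print(f"mod_wsgi built for Python {module_version}, interpreter is {running_version}")
else:
    print("mod_wsgi and the interpreter use the same Python version")
```

Run it with the interpreter of the application's virtual environment so that `sys.version_info` reflects the version the application actually uses.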
3. Enable site, reload and restart¶
Validate configuration syntax:
sudo apache2ctl -t
Enable virtual hosts:
sudo a2ensite elpaso.conf
sudo a2ensite elpaso-ssl.conf
At the end, restart Apache server:
sudo service apache2 restart
4. Generate the SSL certificate with Let’s Encrypt¶
This mostly reproduces the official documentation: https://certbot.eff.org/lets-encrypt/ubuntubionic-apache.
# work in the home directory
mkdir ~/letsencrypt
cd ~/letsencrypt
# register the certbot (Let's Encrypt) package repository
sudo apt update
sudo apt install software-properties-common
sudo add-apt-repository universe
sudo add-apt-repository ppa:certbot/certbot
sudo apt update
# install certbot
sudo apt-get install certbot python3-certbot-apache
# start the process and choose elgeopaso.georezo.net
sudo certbot --apache
Test automatic renewal:
sudo certbot renew --dry-run
Common commands¶
# check full version and compilation details
apache2ctl -V
# help
apache2ctl -h
# list enabled modules
apache2ctl -M
Resources¶
Scheduled tasks¶
The project relies on a few recurring tasks:
retrieving job offers from GeoRezo
clearing the cache
generating the reports sent by email
generating the GeoJSON files for the interactive maps
renewing the SSL certificate through Let’s Encrypt
In production, cron is used.
Scheduling settings¶
To edit the scheduled tasks run by cron using nano:
export VISUAL=nano; crontab -e
Insert:
# El Paso
@hourly cd /var/www/elgeopaso && /var/www/elgeopaso/.venv/bin/python /var/www/elgeopaso/manage.py rss2db
@daily cd /var/www/elgeopaso && /var/www/elgeopaso/.venv/bin/python /var/www/elgeopaso/manage.py clear_cache
30 23 * * 7 cd /var/www/elgeopaso && /var/www/elgeopaso/.venv/bin/python /var/www/elgeopaso/manage.py report
05 00 * * 7 cd /var/www/elgeopaso && /var/www/elgeopaso/.venv/bin/python /var/www/elgeopaso/manage.py map_builder
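# Let's Encrypt
# note: the user column (root) is only valid in /etc/crontab or /etc/cron.d/ files, not in a crontab edited with `crontab -e`
0 2 * * * root /bin/bash /home/geotribu/letsencrypt/scripts/cron.sh > /home/geotribu/log/cron/letsencrypt.log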
For the syntax, the crontab.guru website is a good resource.
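To make the structure of such entries explicit, the sketch below splits a crontab line into its five schedule fields and its command, and expands the `@hourly` and `@daily` nicknames to their five-field equivalents. `split_cron_line` is a hypothetical helper written for illustration; it does not handle entries that carry a user column.

```python
from typing import Dict, Tuple

# Field order of a standard five-field cron schedule.
FIELD_NAMES = ("minute", "hour", "day_of_month", "month", "day_of_week")

# Nicknames and their five-field equivalents, as documented in crontab(5).
NICKNAMES = {"@hourly": "0 * * * *", "@daily": "0 0 * * *"}


def split_cron_line(line: str) -> Tuple[Dict[str, str], str]:
    """Split a crontab entry into its schedule fields and its command."""
    line = line.strip()
    first, _, rest = line.partition(" ")
    if first in NICKNAMES:
        schedule, command = NICKNAMES[first], rest.strip()
    else:
        # The first five whitespace-separated tokens are the schedule.
        parts = line.split(None, 5)
        schedule, command = " ".join(parts[:5]), parts[5]
    return dict(zip(FIELD_NAMES, schedule.split())), command


fields, command = split_cron_line(
    "30 23 * * 7 cd /var/www/elgeopaso && python manage.py report"
)
print(fields)   # day_of_week 7 means Sunday
print(command)
```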
elgeopaso¶
elgeopaso package¶
Subpackages¶
elgeopaso.accounts package¶
Subpackages¶
- class elgeopaso.accounts.migrations.0001_initial.Migration(name, app_label)[source]¶
Bases: django.db.migrations.migration.Migration
- dependencies = [('auth', '__first__')]¶
- initial = True¶
- operations = [<CreateModel name='Subscription', fields=[('id', <django.db.models.fields.AutoField>), ('report_hour', <django.db.models.fields.BooleanField>), ('report_week', <django.db.models.fields.BooleanField>), ('user', <django.db.models.fields.related.OneToOneField>)], options={'verbose_name': 'Abonnements mail'}>]¶
Submodules¶
- class elgeopaso.accounts.models.Subscription(id, user, report_hour, report_week)[source]¶
Bases: django.db.models.base.Model
- exception DoesNotExist¶
- exception MultipleObjectsReturned¶
- id¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- objects = <django.db.models.manager.Manager object>¶
- report_hour¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- report_week¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- user¶
Accessor to the related object on the forward side of a one-to-one relation.
In the example:
class Restaurant(Model):
    place = OneToOneField(Place, related_name='restaurant')
Restaurant.place is a ForwardOneToOneDescriptor instance.
- user_id¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
elgeopaso.api package¶
Submodules¶
- class elgeopaso.api.serializers.ContractSerializer(*args, **kwargs)[source]¶
Bases: rest_framework.serializers.ModelSerializer
- class elgeopaso.api.serializers.JobSerializer(*args, **kwargs)[source]¶
Bases: rest_framework.serializers.ModelSerializer
- class elgeopaso.api.serializers.OfferSerializer(*args, **kwargs)[source]¶
Bases: rest_framework.serializers.ModelSerializer
- class elgeopaso.api.serializers.PlaceSerializer(*args, **kwargs)[source]¶
Bases: rest_framework.serializers.ModelSerializer
Application URLs settings.
Learn more here:
- class elgeopaso.api.views.ContractViewSet(**kwargs)[source]¶
Bases: rest_framework.viewsets.ReadOnlyModelViewSet
API endpoint that allows contract types to be viewed.
- basename = None¶
- description = None¶
- detail = None¶
- name = None¶
- queryset¶
- serializer_class¶
- suffix = None¶
- class elgeopaso.api.views.JobViewSet(**kwargs)[source]¶
Bases: rest_framework.viewsets.ReadOnlyModelViewSet
API endpoint that allows jobs to be viewed.
- basename = None¶
- description = None¶
- detail = None¶
- name = None¶
- queryset¶
- serializer_class¶
- suffix = None¶
- class elgeopaso.api.views.OfferViewSet(**kwargs)[source]¶
Bases: rest_framework.viewsets.ReadOnlyModelViewSet
API endpoint that allows offers to be viewed.
- basename = None¶
- description = None¶
- detail = None¶
- name = None¶
- queryset¶
- serializer_class¶
- suffix = None¶
- class elgeopaso.api.views.PlaceVariationsViewSet(**kwargs)[source]¶
Bases: rest_framework.viewsets.ReadOnlyModelViewSet
Places used as reference to parse raw offers.
- basename = None¶
- description = None¶
- detail = None¶
- name = None¶
- queryset¶
- serializer_class¶
alias of elgeopaso.api.serializers.PlaceVariationsSerializer
- suffix = None¶
- class elgeopaso.api.views.PlaceViewSet(**kwargs)[source]¶
Bases: rest_framework.viewsets.ModelViewSet
Places used as reference to parse raw offers.
- basename = None¶
- description = None¶
- detail = None¶
- name = None¶
- queryset¶
- serializer_class¶
- suffix = None¶
elgeopaso.cms package¶
Subpackages¶
- class elgeopaso.cms.migrations.0001_cms.Migration(name, app_label)[source]¶
Bases: django.db.migrations.migration.Migration
- dependencies = [('auth', '__first__')]¶
- initial = True¶
- operations = [<CreateModel name='Article', fields=[('id', <django.db.models.fields.AutoField>), ('title', <django.db.models.fields.CharField>), ('content', <ckeditor.fields.RichTextField>), ('created', <django.db.models.fields.DateTimeField>), ('updated', <django.db.models.fields.DateTimeField>), ('author', <django.db.models.fields.related.ForeignKey>)], options={'verbose_name': 'Contenu éditorial', 'verbose_name_plural': 'Contenus éditoriaux', 'ordering': ['title'], 'get_latest_by': 'updated'}>, <CreateModel name='Category', fields=[('id', <django.db.models.fields.AutoField>), ('name', <django.db.models.fields.CharField>), ('description', <django.db.models.fields.TextField>), ('slug_name', <django.db.models.fields.SlugField>)], options={'verbose_name': 'Type de contenu', 'verbose_name_plural': 'Types de contenu', 'ordering': ['name']}>, <AddField model_name='article', name='category', field=<django.db.models.fields.related.ForeignKey>>, <AddField model_name='article', name='slug_title', field=<django.db.models.fields.SlugField>>, <AddField model_name='article', name='ext_url', field=<django.db.models.fields.URLField>>]¶
- replaces = [('cms', '0001_initial'), ('cms', '0002_auto_20180115_1329'), ('cms', '0003_auto_20180115_1415'), ('cms', '0004_auto_20180117_0846'), ('cms', '0005_article_ext_url')]¶
- class elgeopaso.cms.migrations.0003_auto_20180308_1427.Migration(name, app_label)[source]¶
Bases: django.db.migrations.migration.Migration
- dependencies = [('cms', '0002_auto_20180308_1417')]¶
- operations = [<AddField model_name='article', name='published', field=<django.db.models.fields.BooleanField>>, <AlterField model_name='article', name='content', field=<ckeditor_uploader.fields.RichTextUploadingField>>]¶
- class elgeopaso.cms.migrations.0004_auto_20180628_2357.Migration(name, app_label)[source]¶
Bases: django.db.migrations.migration.Migration
- dependencies = [('cms', '0003_auto_20180308_1427')]¶
- operations = [<AlterField model_name='article', name='author', field=<django.db.models.fields.related.ForeignKey>>, <AlterField model_name='article', name='category', field=<django.db.models.fields.related.ForeignKey>>]¶
Submodules¶
- class elgeopaso.cms.admin.ArticleAdmin(model, admin_site)[source]¶
Bases: django.contrib.admin.options.ModelAdmin
- fieldsets = (('Métadonnées', {'fields': ('category', 'author')}), ('Titre', {'fields': ('title', 'slug_title')}), ('Contenu', {'classes': ('full-width',), 'fields': ('content',)}), ('Divers', {'fields': ('ext_url',)}), ('Publication', {'fields': ('published',)}))¶
- list_display = ('title', 'slug_title', 'category', 'created', 'updated')¶
- list_filter = ('category', 'author', 'published')¶
- property media¶
- ordering = ('created',)¶
- prepopulated_fields = {'slug_title': ('title',)}¶
- search_fields = ('title', 'content')¶
- class elgeopaso.cms.admin.CategoryAdmin(model, admin_site)[source]¶
Bases: django.contrib.admin.options.ModelAdmin
- list_display = ('name', 'slug_name', 'description')¶
- list_filter = ('name',)¶
- property media¶
- ordering = ('name',)¶
- prepopulated_fields = {'slug_name': ('name',)}¶
- search_fields = ('name', 'description')¶
Application settings.
- class elgeopaso.cms.models.Article(id, author, category, title, slug_title, content, ext_url, published, created, updated)[source]¶
Bases: django.db.models.base.Model
- exception DoesNotExist¶
- exception MultipleObjectsReturned¶
- author¶
Accessor to the related object on the forward side of a many-to-one or one-to-one (via ForwardOneToOneDescriptor subclass) relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Child.parent is a ForwardManyToOneDescriptor instance.
- author_id¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- category¶
Accessor to the related object on the forward side of a many-to-one or one-to-one (via ForwardOneToOneDescriptor subclass) relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Child.parent is a ForwardManyToOneDescriptor instance.
- category_id¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- content¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- created¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- ext_url¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- get_next_by_created(*, field=<django.db.models.fields.DateTimeField: created>, is_next=True, **kwargs)¶
- get_next_by_updated(*, field=<django.db.models.fields.DateTimeField: updated>, is_next=True, **kwargs)¶
- get_previous_by_created(*, field=<django.db.models.fields.DateTimeField: created>, is_next=False, **kwargs)¶
- get_previous_by_updated(*, field=<django.db.models.fields.DateTimeField: updated>, is_next=False, **kwargs)¶
- id¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- objects = <django.db.models.manager.Manager object>¶
- published¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- property short_content¶
- slug_title¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- title¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- updated¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- class elgeopaso.cms.models.Category(id, name, slug_name, description)[source]¶
Bases: django.db.models.base.Model
- exception DoesNotExist¶
- exception MultipleObjectsReturned¶
- article_set¶
Accessor to the related objects manager on the reverse side of a many-to-one relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Parent.children is a ReverseManyToOneDescriptor instance.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- description¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- id¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- name¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- objects = <django.db.models.manager.Manager object>¶
- slug_name¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
Application URLs settings.
elgeopaso.jobs package¶
Subpackages¶
Content parser.
Title parser.
- class elgeopaso.jobs.analyzer.georezo.parsers.title.TitleParser(offer_id, input_title)[source]¶
Bases: object
Parse the title of offers published on GeoRezo to extract information.
- parse_contract_type()[source]¶
Extraction of types of contracts: CDI, CDD, mission, volontariat, etc.
In theory, the offer’s title is formatted to contain the type between []…
- Return type
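The bracket convention mentioned above can be illustrated with a short regular expression. This is a sketch only: `extract_bracketed_type` and the sample title are hypothetical and do not reproduce the actual implementation of `parse_contract_type()`.

```python
import re
from typing import Optional

# First non-empty run of characters enclosed in square brackets.
BRACKETS = re.compile(r"\[([^\]]+)\]")


def extract_bracketed_type(title: str) -> Optional[str]:
    """Return the text found between the first pair of brackets, if any."""
    match = BRACKETS.search(title)
    return match.group(1).strip() if match else None


# Invented title following the expected "[TYPE] ..." layout.
print(extract_bracketed_type("[CDD] Technicien SIG - Lyon (69)"))
```

Titles that do not follow the convention return `None`, which leaves room for a fallback such as matching known label variants in the rest of the title.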
Module in charge of analyzing raw offers from GeoRezo: extracting contract type, place, etc. from title and abstract.
Name: GeoRezo Jobs RSS Parser Purpose: Parse GeoRezo RSS Python: 3.7+
- class elgeopaso.jobs.crawlers.georezo_rss_parser.GeorezoRssParser(feed_base_url='https://georezo.net/extern.php?fid=10', feed_length_param='show', items_to_parse=50, user_agent='ElGeoPaso/DEV +https://elgeopaso.georezo.net/')[source]¶
Bases: object
Handy module to parse GeoRezo job offers through RSS.
- Parameters
feed_base_url (str) – URL to the feed. Defaults to: "https://georezo.net/extern.php?fid=10" - optional
feed_length_param (str) – name of the URL parameter to specify the number of items. Defaults to: "show" - optional
items_to_parse (int) – number of items to request from the feed. Defaults to: 50 - optional
user_agent (str) – HTTP user-agent. Defaults to: "ElGeoPaso/DEV +https://elgeopaso.georezo.net/" - optional
- CRAWLER_LATEST_METADATA = 'crawler_georezo_rss_latest.json'¶
- FEED_DATETIME_RAW_FORMAT = '%a, %d %b %Y %H:%M:%S %z'¶
- FEED_DATETIME_RAW_FORMAT_ARROW = 'ddd, D MMM YYYY HH:mm:ss Z'¶
- classmethod extract_offer_id_from_url(in_url)[source]¶
Parse input URL to extract RSS item ID = job offer ID.
- Parameters
in_url (str) – input URL as string. In GeoRezo RSS, it’s:
in raw XML: “<guid isPermaLink="true">https://georezo.net/forum/viewtopic.php?pid=331081#p331081</guid>”
parsed by feedparser: entry.id = “https://georezo.net/forum/viewtopic.php?pid=331144#p331144”
- Returns
offer ID
- Return type
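One way to obtain the offer ID from such a URL is to read the `pid` query parameter with the standard library. This is a sketch under that assumption; `extract_offer_id` is a hypothetical function and may differ from what `extract_offer_id_from_url()` actually does.

```python
from urllib.parse import parse_qs, urlparse


def extract_offer_id(url: str) -> int:
    """Return the value of the 'pid' query parameter as an integer."""
    query = parse_qs(urlparse(url).query)
    if "pid" not in query:
        raise ValueError(f"no 'pid' parameter in URL: {url}")
    return int(query["pid"][0])


# URL taken from the parameter description above.
print(extract_offer_id("https://georezo.net/forum/viewtopic.php?pid=331081#p331081"))
```

Parsing the query string rather than slicing the text keeps the function robust to the `#p…` fragment and to additional parameters.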
- classmethod load_previous_crawler_metadata(from_source='./last_id_georezo.txt')[source]¶
Retrieve last parsed item ID from specified source.
- Parameters
from_source (str) – where to load the ID. Defaults to: "./last_id_georezo.txt"
- Raises
NotImplementedError – [description]
ValueError – [description]
- Returns
dictionary with previous crawler execution metadata
- Return type
- parse_new_offers(ignore_encoding_errors=True, only_new_offers=True)[source]¶
Parse RSS feed, handle errors and filter on new offers.
- Parameters
- Returns
list of offers whose identifier is greater than the latest parsed one
- Return type
- save_parsing_metadata(feed_parsed, save_type='json')[source]¶
Dumps some metadata from parsed feed to track behavior and enforce future usage into a structured JSON file.
- Parameters
feed_parsed (feedparser.FeedParserDict) – parsed feed
save_type (str) – type of save to perform. Defaults to: "json" - optional
- Returns
dictionary of saved data
- Return type
- Example
[
  {
    "encoding": "ISO-8859-1",
    "entries_required": 50,
    "entries_total": 50,
    "feed_updated_converted": "2020-03-10 13:07:06+01:00",
    "feed_updated_parsed": [2020, 3, 10, 12, 7, 6, 1, 70, 0],
    "feed_updated_raw": "Tue, 10 Mar 2020 13:07:06 +0100",
    "latest_offer_id": 331132,
    "status": 200,
    "version": "rss20"
  }
]
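The relation between the three `feed_updated_*` values in the example can be reproduced with the standard library and the `FEED_DATETIME_RAW_FORMAT` pattern listed above. This is an illustrative sketch, not the project's own conversion code; it assumes the default C/English locale, since `%a` and `%b` are locale-dependent.

```python
from datetime import datetime

# Same pattern as the FEED_DATETIME_RAW_FORMAT class attribute.
FEED_DATETIME_RAW_FORMAT = "%a, %d %b %Y %H:%M:%S %z"

# Raw value from the example metadata.
feed_updated_raw = "Tue, 10 Mar 2020 13:07:06 +0100"

parsed = datetime.strptime(feed_updated_raw, FEED_DATETIME_RAW_FORMAT)

# Timezone-aware datetime, matching "feed_updated_converted".
print(parsed)  # 2020-03-10 13:07:06+01:00

# UTC time tuple, matching "feed_updated_parsed".
print(list(parsed.utctimetuple()))  # [2020, 3, 10, 12, 7, 6, 1, 70, 0]
```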
Command used to import a CSV from a Georezo database export.
python manage.py csv2db --input-csv ./georezo/georezo_db_backup_2016-2017.csv
- class elgeopaso.jobs.management.commands.csv2db.Command(stdout=None, stderr=None, no_color=False, force_color=False)[source]¶
Bases: django.core.management.base.BaseCommand
- args = '<foo bar ...>'¶
- create_parser(*args, **kwargs)[source]¶
Create and return the
ArgumentParser
which will be used to parse the arguments to this command.
- handle(*args, **options)[source]¶
The actual logic of the command. Subclasses must implement this method.
- help = 'Import CSV data into the project database'¶
- class elgeopaso.jobs.management.commands.map_builder.Command(stdout=None, stderr=None, no_color=False, force_color=False)[source]¶
Bases: django.core.management.base.BaseCommand
- args = '<foo bar ...>'¶
- build_geojson_fr_departements(in_geojson)[source]¶
Parse input GeoJSON and update needed values to display maps.
- Parameters
in_geojson (Path) – Path to the input GeoJSON
- help = '\n Commands to generate geojson files used for map visualization.\n '¶
- class elgeopaso.jobs.management.commands.reset_analisis.Command(stdout=None, stderr=None, no_color=False, force_color=False)[source]¶
Bases: django.core.management.base.BaseCommand
- args = '<foo bar ...>'¶
- create_parser(*args, **kwargs)[source]¶
Create and return the
ArgumentParser
which will be used to parse the arguments to this command.
- help = 'Empty tables and launch GeorezoOfferAnalizer from the whole georezo_rss table.'¶
Custom Django management command to parse GeoRezo feed and launch analysis. See: https://docs.djangoproject.com/fr/2.2/howto/custom-management-commands/
- class elgeopaso.jobs.management.commands.rss2db.Command(stdout=None, stderr=None, no_color=False, force_color=False)[source]¶
Bases: django.core.management.base.BaseCommand
Commands to manage offers sync and analysis.
Two main steps:
Crawl GeoRezo RSS to get new offers, analyze them and store them into the database.
Relaunch offer analysis on offers which have been manually modified (through the admin)
- Parameters
BaseCommand ([type]) – [description]
- Raises
ValueError – [description]
- Returns
[description]
- Return type
[type]
- add_arguments(parser)[source]¶
Add arguments to the CLI.
- Parameters
parser (CommandParser) – command parser
- args = '<foo bar ...>'¶
- create_parser(*args, **kwargs)[source]¶
Super a command parser.
- Returns
[description]
- Return type
CommandParser
- handle(*args, **options)[source]¶
The actual logic of the command. Subclasses must implement this method.
- help = 'Commands to manage offers sync and analisis. 2 main steps:\n 1. Crawl GeoRezo RSS to get new offers, analyze it and store into\n the database.\n\n 2. Relaunch offer analisis on offers which have been manually\n modified (through the admin)'¶
- now = <Arrow [2022-12-12T14:22:50.866877+01:00]>¶
- class elgeopaso.jobs.migrations.0001_initial.Migration(name, app_label)[source]¶
Bases: django.db.migrations.migration.Migration
- dependencies = []¶
- initial = True¶
- operations = [<CreateModel name='Contract', fields=[('abbrv', <django.db.models.fields.CharField>), ('name', <django.db.models.fields.CharField>), ('comment', <django.db.models.fields.TextField>), ('created', <django.db.models.fields.DateTimeField>), ('updated', <django.db.models.fields.DateTimeField>)], options={'verbose_name': 'Type de contrat', 'verbose_name_plural': 'Types de contrats', 'ordering': ['abbrv']}>, <CreateModel name='GeorezoRSS', fields=[('id_rss', <django.db.models.fields.IntegerField>), ('title', <django.db.models.fields.CharField>), ('content', <django.db.models.fields.TextField>), ('pub_date', <django.db.models.fields.DateTimeField>), ('created', <django.db.models.fields.DateTimeField>), ('updated', <django.db.models.fields.DateTimeField>), ('source', <django.db.models.fields.BooleanField>), ('to_update', <django.db.models.fields.BooleanField>)], options={'verbose_name_plural': "Offres d'emploi brutes issues du RSS de GeoRezo", 'db_table': 'georezo_rss', 'get_latest_by': 'pub_date', 'unique_together': {('id_rss', 'pub_date', 'source')}}>, <CreateModel name='JobPosition', fields=[('name', <django.db.models.fields.CharField>), ('comment', <django.db.models.fields.CharField>), ('created', <django.db.models.fields.DateTimeField>), ('updated', <django.db.models.fields.DateTimeField>)], options={'verbose_name': 'Métier', 'verbose_name_plural': 'Métiers', 'ordering': ['name']}>, <CreateModel name='Place', fields=[('name', <django.db.models.fields.CharField>), ('code', <django.db.models.fields.CharField>), ('scale', <django.db.models.fields.CharField>), ('created', <django.db.models.fields.DateTimeField>), ('updated', <django.db.models.fields.DateTimeField>)], options={'verbose_name': 'Lieu', 'verbose_name_plural': 'Lieux', 'ordering': ['code']}>, <CreateModel name='Source', fields=[('id', <django.db.models.fields.AutoField>), ('name', <django.db.models.fields.CharField>), ('url', <django.db.models.fields.URLField>), ('comment', 
<django.db.models.fields.TextField>), ('created', <django.db.models.fields.DateTimeField>), ('updated', <django.db.models.fields.DateTimeField>)], options={'verbose_name': "Source de l'offre", 'verbose_name_plural': 'Sources'}>, <CreateModel name='Technology', fields=[('id', <django.db.models.fields.AutoField>), ('name', <django.db.models.fields.CharField>), ('license', <django.db.models.fields.CharField>), ('type_soft', <django.db.models.fields.CharField>), ('created', <django.db.models.fields.DateTimeField>), ('updated', <django.db.models.fields.DateTimeField>)], options={'verbose_name': 'Technologie', 'verbose_name_plural': 'Technologies', 'ordering': ['name']}>, <CreateModel name='TechnologyVariations', fields=[('id', <django.db.models.fields.AutoField>), ('label', <django.db.models.fields.CharField>), ('name', <django.db.models.fields.related.ForeignKey>)], options={'verbose_name': 'Variante des technologies', 'verbose_name_plural': 'Variantes des technologies'}>, <CreateModel name='PlaceVariations', fields=[('id', <django.db.models.fields.AutoField>), ('label', <django.db.models.fields.CharField>), ('name', <django.db.models.fields.related.ForeignKey>)], options={'verbose_name': 'Variante de lieu', 'verbose_name_plural': 'Variantes des lieux'}>, <CreateModel name='Offer', fields=[('id', <django.db.models.fields.AutoField>), ('id_rss', <django.db.models.fields.IntegerField>), ('title', <django.db.models.fields.CharField>), ('content', <django.db.models.fields.TextField>), ('pub_date', <django.db.models.fields.DateTimeField>), ('week', <django.db.models.fields.IntegerField>), ('created', <django.db.models.fields.DateTimeField>), ('updated', <django.db.models.fields.DateTimeField>), ('contract', <django.db.models.fields.related.ForeignKey>), ('jobs_positions', <django.db.models.fields.related.ManyToManyField>), ('place', <django.db.models.fields.related.ForeignKey>), ('raw_offer', <django.db.models.fields.related.OneToOneField>), ('source', 
<django.db.models.fields.related.ForeignKey>), ('technologies', <django.db.models.fields.related.ManyToManyField>)], options={'verbose_name': "Offre d'emploi", 'verbose_name_plural': "Offres d'emploi", 'ordering': ['id_rss'], 'get_latest_by': 'pub_date'}>, <CreateModel name='JobPositionVariations', fields=[('id', <django.db.models.fields.AutoField>), ('label', <django.db.models.fields.CharField>), ('name', <django.db.models.fields.related.ForeignKey>)], options={'verbose_name': 'Variante de métier', 'verbose_name_plural': 'Variantes des métiers'}>, <CreateModel name='ContractVariations', fields=[('id', <django.db.models.fields.AutoField>), ('label', <django.db.models.fields.CharField>), ('name', <django.db.models.fields.related.ForeignKey>)], options={'verbose_name': 'Variante du type de contrat', 'verbose_name_plural': 'Variantes des types de contrats'}>]¶
- class elgeopaso.jobs.migrations.0003_variations_unique.Migration(name, app_label)[source]¶
Bases: django.db.migrations.migration.Migration
- dependencies = [('jobs', '0002_remove_offer_week')]¶
- operations = [<AlterField model_name='contractvariations', name='label', field=<django.db.models.fields.CharField>>, <AlterField model_name='jobpositionvariations', name='label', field=<django.db.models.fields.CharField>>, <AlterField model_name='placevariations', name='label', field=<django.db.models.fields.CharField>>, <AlterField model_name='technologyvariations', name='label', field=<django.db.models.fields.CharField>>]¶
Submodules¶
Application in administration panel.
- class elgeopaso.jobs.admin.ContractVariationsAdmin(model, admin_site)[source]¶
Bases: django.contrib.admin.options.ModelAdmin
- list_display = ('label', 'name')¶
- list_filter = ('name',)¶
- property media¶
- model¶
- search_fields = ('label',)¶
- class elgeopaso.jobs.admin.ContractVariationsInline(parent_model, admin_site)[source]¶
Bases: django.contrib.admin.options.TabularInline
- list_display = ('name', 'label')¶
- property media¶
- model¶
- class elgeopaso.jobs.admin.ContractsAdmin(model, admin_site)[source]¶
Bases: django.contrib.admin.options.ModelAdmin
- inlines = (<class 'elgeopaso.jobs.admin.ContractVariationsInline'>,)¶
- list_display = ('abbrv', 'name', 'comment')¶
- list_filter = ('abbrv',)¶
- property media¶
- ordering = ('abbrv',)¶
- readonly_fields = ('created', 'updated')¶
- search_fields = ('name', 'abbrv')¶
- class elgeopaso.jobs.admin.GeorezoRSSAdmin(model, admin_site)[source]¶
Bases: django.contrib.admin.options.ModelAdmin
- actions = [<function GeorezoRSSAdmin.offers_to_update>]¶
- date_hierarchy = 'pub_date'¶
- fieldsets = (('Modifier', {'fields': ('title', 'content', 'to_update')}), ('Date', {'fields': ('pub_date', 'created', 'updated')}), ('Référence', {'fields': ('id_rss',)}), ('Autres', {'fields': ('source', 'show_clean_offer')}))¶
- formfield_overrides = {<class 'django.db.models.fields.CharField'>: {'widget': <django.forms.widgets.TextInput object>}, <class 'django.db.models.fields.TextField'>: {'widget': <django.forms.widgets.Textarea object>}}¶
- list_display = ('id_rss', 'title', 'short_content', 'pub_date', 'created', 'updated')¶
- list_display_links = ('id_rss', 'title')¶
- list_filter = ('pub_date', 'created', 'updated', 'to_update')¶
- property media¶
- readonly_fields = ('created', 'id_rss', 'pub_date', 'source', 'updated', 'show_clean_offer')¶
- search_fields = ('title', 'content')¶
- class elgeopaso.jobs.admin.JobPositionAdmin(model, admin_site)[source]¶
Bases: django.contrib.admin.options.ModelAdmin
- inlines = (<class 'elgeopaso.jobs.admin.JobPositionVariationsInline'>,)¶
- list_display = ('name', 'comment')¶
- list_filter = ('name',)¶
- property media¶
- readonly_fields = ('created', 'updated')¶
- search_fields = ('name',)¶
- class elgeopaso.jobs.admin.JobPositionVariationsAdmin(model, admin_site)[source]¶
Bases: django.contrib.admin.options.ModelAdmin
- list_display = ('label', 'name')¶
- list_filter = ('name',)¶
- property media¶
- model¶
alias of elgeopaso.jobs.models.JobPosition
- search_fields = ('label',)¶
- class elgeopaso.jobs.admin.JobPositionVariationsInline(parent_model, admin_site)[source]¶
Bases: django.contrib.admin.options.TabularInline
- list_display = ('name', 'label')¶
- property media¶
- model¶
- class elgeopaso.jobs.admin.OfferAdmin(model, admin_site)[source]¶
Bases: django.contrib.admin.options.ModelAdmin
- date_hierarchy = 'pub_date'¶
- fieldsets = (('Contenu', {'fields': ('title', 'content')}), ('Date', {'fields': ('pub_date', 'yearweek', 'created', 'updated')}), ('Informations extraites', {'fields': ('contract', 'technologies', 'place', 'jobs_positions')}), ('Autres', {'fields': ('show_raw_offer', 'source')}))¶
- list_display = ('id_rss', 'title', 'short_content', 'contract', 'place', 'pub_date')¶
- list_filter = ('raw_offer__to_update', 'pub_date', 'contract', 'technologies', 'place')¶
- property media¶
- ordering = ('-pub_date',)¶
- readonly_fields = ('content', 'contract', 'created', 'jobs_positions', 'place', 'pub_date', 'show_raw_offer', 'source', 'technologies', 'title', 'updated', 'yearweek')¶
- search_fields = ('title', 'content')¶
- class elgeopaso.jobs.admin.PlaceAdmin(model, admin_site)[source]¶
Bases: django.contrib.admin.options.ModelAdmin
- inlines = (<class 'elgeopaso.jobs.admin.PlaceVariationsInline'>,)¶
- list_display = ('name', 'code', 'scale')¶
- list_filter = ('scale',)¶
- property media¶
- ordering = ('code',)¶
- readonly_fields = ('created', 'updated')¶
- search_fields = ('name', 'code')¶
- class elgeopaso.jobs.admin.PlaceVariationsAdmin(model, admin_site)[source]¶
Bases: django.contrib.admin.options.ModelAdmin
- list_display = ('label', 'name')¶
- list_filter = ('name',)¶
- property media¶
- model¶
- search_fields = ('label',)¶
- class elgeopaso.jobs.admin.PlaceVariationsInline(parent_model, admin_site)[source]¶
Bases: django.contrib.admin.options.TabularInline
- list_display = ('name', 'label')¶
- property media¶
- model¶
- class elgeopaso.jobs.admin.SourcesAdmin(model, admin_site)[source]¶
Bases: django.contrib.admin.options.ModelAdmin
- list_display = ('name', 'url', 'comment')¶
- list_filter = ('name',)¶
- property media¶
- ordering = ('name',)¶
- readonly_fields = ('created', 'updated')¶
- class elgeopaso.jobs.admin.TechnoVariationsInline(parent_model, admin_site)[source]¶
Bases: django.contrib.admin.options.TabularInline
- list_display = ('name', 'label')¶
- property media¶
- model¶
- class elgeopaso.jobs.admin.TechnologyAdmin(model, admin_site)[source]¶
Bases: django.contrib.admin.options.ModelAdmin
- inlines = (<class 'elgeopaso.jobs.admin.TechnoVariationsInline'>,)¶
- list_display = ('name', 'license', 'type_soft')¶
- list_filter = ('name', 'license', 'type_soft')¶
- property media¶
- ordering = ('name',)¶
- readonly_fields = ('created', 'updated')¶
- search_fields = ('name',)¶
Application settings.
Filters.
Learn more here: https://django-filter.readthedocs.io/en/master/
- class elgeopaso.jobs.filters.OfferFilter(data=None, queryset=None, *, request=None, prefix=None)[source]¶
Bases: django_filters.rest_framework.filterset.FilterSet
Filters related to search within offers.
- class Meta[source]¶
Bases: object
- fields = ['contract', 'place', 'technologies', 'pub_date', 'content', 'title', 'raw_offer__to_update']¶
- model¶
alias of elgeopaso.jobs.models.Offer
- base_filters = {'content': <django_filters.filters.CharFilter object>, 'contract': <django_filters.filters.ModelChoiceFilter object>, 'date': <django_filters.filters.DateFromToRangeFilter object>, 'place': <django_filters.filters.ModelChoiceFilter object>, 'pub_date': <django_filters.filters.IsoDateTimeFilter object>, 'raw_offer__to_update': <django_filters.rest_framework.filters.BooleanFilter object>, 'technologies': <django_filters.filters.ModelMultipleChoiceFilter object>, 'title': <django_filters.filters.CharFilter object>}¶
- declared_filters = {'content': <django_filters.filters.CharFilter object>, 'date': <django_filters.filters.DateFromToRangeFilter object>, 'raw_offer__to_update': <django_filters.rest_framework.filters.BooleanFilter object>, 'title': <django_filters.filters.CharFilter object>}¶
Application database models.
- class elgeopaso.jobs.models.Contract(abbrv, name, comment, created, updated)[source]¶
Bases: django.db.models.base.Model
- exception DoesNotExist¶
- exception MultipleObjectsReturned¶
- abbrv¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- comment¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- contractvariations_set¶
Accessor to the related objects manager on the reverse side of a many-to-one relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Parent.children is a ReverseManyToOneDescriptor instance.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- created¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- get_next_by_created(*, field=<django.db.models.fields.DateTimeField: created>, is_next=True, **kwargs)¶
- get_next_by_updated(*, field=<django.db.models.fields.DateTimeField: updated>, is_next=True, **kwargs)¶
- get_previous_by_created(*, field=<django.db.models.fields.DateTimeField: created>, is_next=False, **kwargs)¶
- get_previous_by_updated(*, field=<django.db.models.fields.DateTimeField: updated>, is_next=False, **kwargs)¶
- name¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- objects = <django.db.models.manager.Manager object>¶
- offer_set¶
Accessor to the related objects manager on the reverse side of a many-to-one relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Parent.children is a ReverseManyToOneDescriptor instance.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- updated¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- class elgeopaso.jobs.models.ContractVariations(id, label, name)[source]¶
Bases: django.db.models.base.Model
- exception DoesNotExist¶
- exception MultipleObjectsReturned¶
- ND = 'UNDEFINED'¶
- id¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- label¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- name¶
Accessor to the related object on the forward side of a many-to-one or one-to-one (via ForwardOneToOneDescriptor subclass) relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Child.parent is a ForwardManyToOneDescriptor instance.
- name_id¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- objects = <django.db.models.manager.Manager object>¶
- class elgeopaso.jobs.models.GeorezoRSS(*args, **kwargs)[source]¶
Bases: django.db.models.base.Model
GeoRezo RAW offers.
- exception DoesNotExist¶
- exception MultipleObjectsReturned¶
- clean_offer¶
Accessor to the related object on the reverse side of a one-to-one relation.
In the example:
class Restaurant(Model):
    place = OneToOneField(Place, related_name='restaurant')
Place.restaurant is a ReverseOneToOneDescriptor instance.
- content¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- created¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- get_next_by_created(*, field=<django.db.models.fields.DateTimeField: created>, is_next=True, **kwargs)¶
- get_next_by_updated(*, field=<django.db.models.fields.DateTimeField: updated>, is_next=True, **kwargs)¶
- get_previous_by_created(*, field=<django.db.models.fields.DateTimeField: created>, is_next=False, **kwargs)¶
- get_previous_by_updated(*, field=<django.db.models.fields.DateTimeField: updated>, is_next=False, **kwargs)¶
- id_rss¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- objects = <django.db.models.manager.Manager object>¶
- property offre_traitee¶
- pub_date¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- property short_content¶
- source¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- title¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- to_update¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- updated¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- class elgeopaso.jobs.models.JobPosition(name, comment, created, updated)[source]¶
Bases: django.db.models.base.Model
- exception DoesNotExist¶
- exception MultipleObjectsReturned¶
- comment¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- created¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- get_next_by_created(*, field=<django.db.models.fields.DateTimeField: created>, is_next=True, **kwargs)¶
- get_next_by_updated(*, field=<django.db.models.fields.DateTimeField: updated>, is_next=True, **kwargs)¶
- get_previous_by_created(*, field=<django.db.models.fields.DateTimeField: created>, is_next=False, **kwargs)¶
- get_previous_by_updated(*, field=<django.db.models.fields.DateTimeField: updated>, is_next=False, **kwargs)¶
- jobpositionvariations_set¶
Accessor to the related objects manager on the reverse side of a many-to-one relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Parent.children is a ReverseManyToOneDescriptor instance.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- name¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- objects = <django.db.models.manager.Manager object>¶
- offer_set¶
Accessor to the related objects manager on the forward and reverse sides of a many-to-many relation.
In the example:
class Pizza(Model):
    toppings = ManyToManyField(Topping, related_name='pizzas')
Pizza.toppings and Topping.pizzas are ManyToManyDescriptor instances.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- updated¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- class elgeopaso.jobs.models.JobPositionVariations(id, label, name)[source]¶
Bases: django.db.models.base.Model
- exception DoesNotExist¶
- exception MultipleObjectsReturned¶
- ND = 'UNDEFINED'¶
- id¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- label¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- name¶
Accessor to the related object on the forward side of a many-to-one or one-to-one (via ForwardOneToOneDescriptor subclass) relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Child.parent is a ForwardManyToOneDescriptor instance.
- name_id¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- objects = <django.db.models.manager.Manager object>¶
- class elgeopaso.jobs.models.Offer(id, created, updated, id_rss, raw_offer, source, title, content, pub_date, contract, place)[source]¶
Bases: django.db.models.base.Model
- exception DoesNotExist¶
- exception MultipleObjectsReturned¶
- content¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- contract¶
Accessor to the related object on the forward side of a many-to-one or one-to-one (via ForwardOneToOneDescriptor subclass) relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Child.parent is a ForwardManyToOneDescriptor instance.
- contract_id¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- created¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- get_next_by_created(*, field=<django.db.models.fields.DateTimeField: created>, is_next=True, **kwargs)¶
- get_next_by_updated(*, field=<django.db.models.fields.DateTimeField: updated>, is_next=True, **kwargs)¶
- get_previous_by_created(*, field=<django.db.models.fields.DateTimeField: created>, is_next=False, **kwargs)¶
- get_previous_by_updated(*, field=<django.db.models.fields.DateTimeField: updated>, is_next=False, **kwargs)¶
- id¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- id_rss¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- jobs_positions¶
Accessor to the related objects manager on the forward and reverse sides of a many-to-many relation.
In the example:
class Pizza(Model):
    toppings = ManyToManyField(Topping, related_name='pizzas')
Pizza.toppings and Topping.pizzas are ManyToManyDescriptor instances.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- objects = <django.db.models.manager.Manager object>¶
- place¶
Accessor to the related object on the forward side of a many-to-one or one-to-one (via ForwardOneToOneDescriptor subclass) relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Child.parent is a ForwardManyToOneDescriptor instance.
- place_id¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- pub_date¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- raw_offer¶
Accessor to the related object on the forward side of a one-to-one relation.
In the example:
class Restaurant(Model):
    place = OneToOneField(Place, related_name='restaurant')
Restaurant.place is a ForwardOneToOneDescriptor instance.
- raw_offer_id¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- property short_content: str¶
Return the first 300 characters of the offer summary (content).
- Returns
the first 300 characters of offer.content
- Return type
str
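The truncation described above can be sketched in plain Python. This is only an illustration: `OfferSketch` and `SHORT_CONTENT_LENGTH` are hypothetical names standing in for the Django model, and the project's actual implementation may differ.

```python
from dataclasses import dataclass

# Assumption: the limit of 300 comes from the docstring; the constant name is illustrative.
SHORT_CONTENT_LENGTH = 300


@dataclass
class OfferSketch:
    """Hypothetical stand-in for the Offer model, reduced to its content field."""

    content: str

    @property
    def short_content(self) -> str:
        # Slicing never raises, even when content is shorter than the limit.
        return self.content[:SHORT_CONTENT_LENGTH]
```

A short text is returned unchanged, while a longer one is cut to exactly 300 characters.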
- source¶
Accessor to the related object on the forward side of a many-to-one or one-to-one (via ForwardOneToOneDescriptor subclass) relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Child.parent is a ForwardManyToOneDescriptor instance.
- source_id¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- technologies¶
Accessor to the related objects manager on the forward and reverse sides of a many-to-many relation.
In the example:
class Pizza(Model):
    toppings = ManyToManyField(Topping, related_name='pizzas')
Pizza.toppings and Topping.pizzas are ManyToManyDescriptor instances.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- title¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- updated¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- property yearweek: str¶
Return the week from the publication date (monday as first day). See: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
- Returns
year and week (format: “YYYYWW”)
- Return type
str
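One way to obtain such a "YYYYWW" string with the standard library is sketched below. It assumes the `%W` directive (week of the year, Monday as first day), which matches the docstring's wording but is not confirmed by the source; the standalone function `yearweek` is a hypothetical equivalent of the property.

```python
from datetime import datetime


def yearweek(pub_date: datetime) -> str:
    """Return year and week number as "YYYYWW", with weeks starting on Monday."""
    # %W: week of the year with Monday as first day. Days before the
    # first Monday of the year fall in week "00".
    return pub_date.strftime("%Y%W")
```

For instance, `yearweek(datetime(2022, 12, 12))` returns `"202250"`, and 1 January 2022 (a Saturday) falls in week `"00"`. ISO week numbering (`%G%V`) would give different results around the new year.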
- class elgeopaso.jobs.models.Place(name, code, scale, created, updated)[source]¶
Bases: django.db.models.base.Model
- COUNTRY = 'COUNTRY'¶
- DPT = 'DEPARTEMENT'¶
- exception DoesNotExist¶
- exception MultipleObjectsReturned¶
- SCALES = (('DEPARTEMENT', 'Département français'), ('TOM', 'Territoire français'), ('COUNTRY', 'Pays'), ('UNDEFINED', 'Indéfini'))¶
- TOM = 'TOM'¶
- UNDEFINED = 'UNDEFINED'¶
- code¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- created¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- get_next_by_created(*, field=<django.db.models.fields.DateTimeField: created>, is_next=True, **kwargs)¶
- get_next_by_updated(*, field=<django.db.models.fields.DateTimeField: updated>, is_next=True, **kwargs)¶
- get_previous_by_created(*, field=<django.db.models.fields.DateTimeField: created>, is_next=False, **kwargs)¶
- get_previous_by_updated(*, field=<django.db.models.fields.DateTimeField: updated>, is_next=False, **kwargs)¶
- get_scale_display(*, field=<django.db.models.fields.CharField: scale>)¶
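Django generates `get_scale_display()` for fields declared with choices: it maps the stored value to its human-readable label. The lookup can be approximated in plain Python as follows; `scale_display` is a hypothetical helper, and the tuple is copied from `Place.SCALES` above.

```python
# Choices copied from Place.SCALES: (stored value, human-readable label)
SCALES = (
    ("DEPARTEMENT", "Département français"),
    ("TOM", "Territoire français"),
    ("COUNTRY", "Pays"),
    ("UNDEFINED", "Indéfini"),
)


def scale_display(value: str) -> str:
    """Return the label for a stored value, or the value itself when unknown."""
    return dict(SCALES).get(value, value)
```

Falling back to the raw value mirrors what Django does when a stored value is not part of the declared choices.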
- name¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- objects = <django.db.models.manager.Manager object>¶
- offer_set¶
Accessor to the related objects manager on the reverse side of a many-to-one relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Parent.children is a ReverseManyToOneDescriptor instance.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- placevariations_set¶
Accessor to the related objects manager on the reverse side of a many-to-one relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Parent.children is a ReverseManyToOneDescriptor instance.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- scale¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- updated¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- class elgeopaso.jobs.models.PlaceVariations(id, label, name)[source]¶
Bases: django.db.models.base.Model
- exception DoesNotExist¶
- exception MultipleObjectsReturned¶
- ND = 'UNDEFINED'¶
- id¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- label¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- name¶
Accessor to the related object on the forward side of a many-to-one or one-to-one (via ForwardOneToOneDescriptor subclass) relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Child.parent is a ForwardManyToOneDescriptor instance.
- name_id¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- objects = <django.db.models.manager.Manager object>¶
- class elgeopaso.jobs.models.Source(id, name, url, comment, created, updated)[source]¶
Bases: django.db.models.base.Model
- exception DoesNotExist¶
- exception MultipleObjectsReturned¶
- comment¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- created¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- get_next_by_created(*, field=<django.db.models.fields.DateTimeField: created>, is_next=True, **kwargs)¶
- get_next_by_updated(*, field=<django.db.models.fields.DateTimeField: updated>, is_next=True, **kwargs)¶
- get_previous_by_created(*, field=<django.db.models.fields.DateTimeField: created>, is_next=False, **kwargs)¶
- get_previous_by_updated(*, field=<django.db.models.fields.DateTimeField: updated>, is_next=False, **kwargs)¶
- id¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- name¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- objects = <django.db.models.manager.Manager object>¶
- offer_set¶
Accessor to the related objects manager on the reverse side of a many-to-one relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Parent.children is a ReverseManyToOneDescriptor instance.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- updated¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- url¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- class elgeopaso.jobs.models.Technology(id, name, license, type_soft, created, updated)[source]¶
Bases: django.db.models.base.Model
- DEV = 'LANGUAGE'¶
- exception DoesNotExist¶
- exception MultipleObjectsReturned¶
- ND = 'UNDEFINED'¶
- OSS = 'OSS'¶
- PROPRIETARY = 'PROPRIETARY'¶
- SOFTWARE = 'SOFTWARE'¶
- TYPE_LICENSE = (('OSS', 'Libre'), ('PROPRIETARY', 'Propriétaire'), ('UNDEFINED', 'Indéfini'))¶
- TYPE_SOFT = (('LANGUAGE', 'Language de programmation'), ('SOFTWARE', 'Logiciel'), ('UNDEFINED', 'Indéfini'))¶
- created¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- get_license_display(*, field=<django.db.models.fields.CharField: license>)¶
- get_next_by_created(*, field=<django.db.models.fields.DateTimeField: created>, is_next=True, **kwargs)¶
- get_next_by_updated(*, field=<django.db.models.fields.DateTimeField: updated>, is_next=True, **kwargs)¶
- get_previous_by_created(*, field=<django.db.models.fields.DateTimeField: created>, is_next=False, **kwargs)¶
- get_previous_by_updated(*, field=<django.db.models.fields.DateTimeField: updated>, is_next=False, **kwargs)¶
- get_type_soft_display(*, field=<django.db.models.fields.CharField: type_soft>)¶
- id¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- license¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- name¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- objects = <django.db.models.manager.Manager object>¶
- offer_set¶
Accessor to the related objects manager on the forward and reverse sides of a many-to-many relation.
In the example:
class Pizza(Model):
    toppings = ManyToManyField(Topping, related_name='pizzas')
Pizza.toppings and Topping.pizzas are ManyToManyDescriptor instances.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- technologyvariations_set¶
Accessor to the related objects manager on the reverse side of a many-to-one relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Parent.children is a ReverseManyToOneDescriptor instance.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- type_soft¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- updated¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- class elgeopaso.jobs.models.TechnologyVariations(id, label, name)[source]¶
Bases: django.db.models.base.Model
- exception DoesNotExist¶
- exception MultipleObjectsReturned¶
- ND = 'UNDEFINED'¶
- id¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- label¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- name¶
Accessor to the related object on the forward side of a many-to-one or one-to-one (via ForwardOneToOneDescriptor subclass) relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Child.parent is a ForwardManyToOneDescriptor instance.
- name_id¶
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- objects = <django.db.models.manager.Manager object>¶
Application URLs settings.
Learn more here: https://docs.djangoproject.com/fr/2.2/topics/http/urls/
Application views.
Learn more here: https://docs.djangoproject.com/fr/2.2/topics/http/views/
- elgeopaso.jobs.views.calc_compare_week_by_year()[source]¶
Calculate the percentage variation between the number of offers published this week and the number published during the same week last year.
- Returns
variation percentage (can be negative)
- Return type
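The arithmetic behind such a year-over-year comparison can be sketched as below. The function name, its parameters and the handling of an empty reference week are assumptions made for illustration; the view itself takes no arguments and reads the counts from the database.

```python
def variation_percentage(current: int, previous: int) -> float:
    """Percentage change from previous to current; negative when the count dropped."""
    if previous == 0:
        # Assumption: with no offers in the reference week the variation is
        # undefined, so this sketch returns 0.0 instead of dividing by zero.
        return 0.0
    return (current - previous) / previous * 100
```

With 15 offers this week against 10 for the same week last year the result is `50.0`; with 5 against 10 it is `-50.0`.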
- elgeopaso.jobs.views.get_contracts_by_technos(request)[source]¶
Count offers by contract type for the given software.
- elgeopaso.jobs.views.get_countries_top5(request)[source]¶
Count offers by countries other than France.
- elgeopaso.jobs.views.get_fr_dpts_top10(request)[source]¶
Count offers by French departments, including DOM TOM.
- elgeopaso.jobs.views.get_offers_by_period(request)[source]¶
Get the number of offers per period (year, month, week). Month and week are still to do. Called via AJAX.
- elgeopaso.jobs.views.get_types_contract_by_period(request)[source]¶
Get the count of each contract type per period (year, month, week). Month and week are still to do. Called via AJAX.
- elgeopaso.jobs.views.stats_contrats(request)[source]¶
Render statistics by contract type on the contracts page.
elgeopaso.settings package¶
Submodules¶
Base settings to build other settings files upon.
Settings built upon base for local development.
Settings built upon base for production.
Settings built upon base for running tests.
elgeopaso.utils package¶
Submodules¶
Project utilities.
Tool.
- class elgeopaso.utils.text_toolbelt.TextToolbelt[source]¶
Bases: object
Tools to manipulate text: tokenize, clean, etc.
- classmethod remove_html_markups(html_text, cleaner='bs-lxml')[source]¶
Very basic cleaner for HTML markups.
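The default `cleaner='bs-lxml'` suggests the method relies on BeautifulSoup with the lxml parser. As a dependency-free illustration of the same idea, the sketch below swaps in the standard library's `html.parser`; `strip_html` and `_TextExtractor` are hypothetical names, not part of the project.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self) -> None:
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def strip_html(html_text: str) -> str:
    """Return html_text without tags, with runs of whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()
    # Joining with a space keeps words from adjacent blocks apart.
    return " ".join(" ".join(parser.chunks).split())
```

Joining the chunks with a space avoids gluing together the text of neighbouring paragraphs, at the cost of splitting a word that is only partly wrapped in an inline tag.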
- classmethod tokenize(input_content)[source]¶
Extract the words mentioned in the offers. The goal is to perform a semantic analysis. Mainly based on NLTK: https://www.nltk.org/.
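To give an idea of what tokenization involves, here is a deliberately simplified sketch that replaces NLTK with a regular expression. The stop word list is a tiny illustrative sample, and `simple_tokenize` is a hypothetical name, not the project's method.

```python
import re

# Assumption: a tiny illustrative stop word list; NLTK ships a much fuller French one.
STOP_WORDS = {"de", "la", "le", "les", "des", "un", "une", "et", "en", "du"}

# One or more word characters; in Python 3, \w also matches accented letters.
WORD_PATTERN = re.compile(r"\w+")


def simple_tokenize(text: str) -> list:
    """Lowercase the text, split it into words and drop stop words."""
    words = WORD_PATTERN.findall(text.lower())
    return [word for word in words if word not in STOP_WORDS]
```

For example, `simple_tokenize("Ingénieur en géomatique et Python")` keeps `ingénieur`, `géomatique` and `python`. A real pipeline would also need to handle punctuation inside technology names, which this pattern splits.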
Submodules¶
elgeopaso.urls module¶
Project URLs settings.
Learn more here: https://docs.djangoproject.com/fr/2.2/topics/http/urls/
To add a new path:
# first import include() and path()
from django.urls import include, path
# then add the new path to urlpatterns:
path('jobs/', include('jobs.urls')),
elgeopaso.wsgi module¶
WSGI config for elgeopaso project.
It exposes the WSGI callable as a module-level variable named application.
For more information on this file, see https://docs.djangoproject.com/en/2.1/howto/deployment/wsgi/
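In a Django project this variable is normally produced by `django.core.wsgi.get_wsgi_application()`. To show what a WSGI server expects from it, the stand-in below implements the callable contract by hand; the response body is made up for the example and this is not the project's code.

```python
def application(environ, start_response):
    """Minimal WSGI callable: receives the request environ, returns an iterable of bytes."""
    body = b"El Geo Paso"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    # The server passes start_response; it must be called before returning the body.
    start_response("200 OK", headers)
    return [body]
```

Any WSGI server can import the module and call `application` once per request, which is why the name is exposed at module level.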
Contributing Guidelines¶
First off, thanks for considering contributing to this project!
These are mostly guidelines, not rules. Use your best judgment, and feel free to propose changes to this document in a pull request.
Git flow¶
Creating branches¶
A new branch should always be created from the main branch, master, except in specific cases, which must be justified.
Naming pattern¶
The pattern is: {category}/{slugified-description}, where:
category is the type of work. Can be: feature, bug, tooling, refactor, test, chore, release, hotfix, docs, ci, deploy or release-candidate.
slugified-description is the description of the work, slugified.
Example: feature/improve-encoding
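The naming pattern can be checked mechanically. The sketch below is not part of the project's tooling; it assumes that "slugified" means lowercase letters and digits separated by single hyphens, and the name `is_valid_branch_name` is hypothetical.

```python
import re

# Categories listed in the naming pattern above.
CATEGORIES = (
    "feature", "bug", "tooling", "refactor", "test", "chore",
    "release", "hotfix", "docs", "ci", "deploy", "release-candidate",
)

# Assumption: a slug is lowercase alphanumeric words joined by single hyphens.
BRANCH_PATTERN = re.compile(
    r"^(?:%s)/[a-z0-9]+(?:-[a-z0-9]+)*$" % "|".join(map(re.escape, CATEGORIES))
)


def is_valid_branch_name(name: str) -> bool:
    """Return True if `name` follows {category}/{slugified-description}."""
    return BRANCH_PATTERN.match(name) is not None


for candidate in ("feature/improve-encoding", "Feature/Improve Encoding", "improve-encoding"):
    print(candidate, is_valid_branch_name(candidate))
```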
Merge Requests workflow¶
Rules¶
Code coverage must stay equal or increase, never decrease. If you write new code, write new tests.
The code must run without any error on the CI.
Using the draft status¶
A draft Merge Request is a merge request that is not ready to be merged, but whose code is published so that teammates can follow the development.
Comments are welcome, but they should be global and address the design rather than the details (wait until the draft status is removed).
Code Style¶
Make sure your code roughly follows PEP-8 and stays consistent with the rest of the code. Related tools:
docstrings: sphinx-style is used to write technical documentation.
formatting: black is used to automatically format the code without debate.
sorted imports: isort is used to sort imports.
static analysis (linter): flake8 is used to catch careless mistakes and keep the source code healthy.
Git hooks¶
We use git hooks through pre-commit to enforce and automatically check some “rules”. Please install it before pushing any commit: pre-commit install.
See the relevant configuration file: .pre-commit-config.yaml.
Development¶
Prerequisites¶
OS¶
Windows¶
Version 10 or later
enable the Windows Subsystem for Linux (WSL)
install the packages you need, for example openssl:
sudo apt update && sudo apt upgrade && sudo apt install openssl libssl-dev
using the new Terminal is strongly recommended
Linux¶
Compatible distributions:
Debian
Ubuntu 16.04 or 18.04
Software¶
Docker (at least Engine and Compose)
Python 3.7 (64-bit)
PostgreSQL 12
Run locally¶
The commands below were run on Windows 10 with WSL enabled.
Installation¶
# create virtual env
# on Linux: python3.7 -m venv .venv
py -3.7 -m venv .venv
# activate it
.\.venv\Scripts\activate
# on Linux: source .venv/bin/activate
# upgrade install tooling
python -m pip install --upgrade pip
# install requirements
python -m pip install -U -r .\requirements\local.txt
# on Linux: python -m pip install -U -r requirements/local.txt
# download NLTK packages - please refer to `nltk.txt`
python -m nltk.downloader punkt stopwords
# optionally, install pre-commit git-hooks
pre-commit install
Configuration¶
Rename the example.env file to .env and fill it in. Note that a Django secret key can be generated with OpenSSL through WSL: wsl -- openssl rand -base64 64 (copy and paste the output into the .env file).
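Where OpenSSL is not at hand, an equivalent key can be produced with Python's standard library. This is a sketch of the same idea (64 random bytes, Base64-encoded), not a command taken from the project; the function name `generate_secret_key` is hypothetical.

```python
import base64
import secrets


def generate_secret_key(n_bytes: int = 64) -> str:
    """Return n_bytes of cryptographically secure randomness, Base64-encoded."""
    return base64.b64encode(secrets.token_bytes(n_bytes)).decode("ascii")


key = generate_secret_key()
# Line to paste into the .env file
print(f'DJANGO_SECRET_KEY="{key}"')
```

Unlike the OpenSSL command, which wraps its Base64 output across lines, this prints the key on a single line, which is convenient for an .env file.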
Database¶
Initialize the database:
# apply migrations to database
python manage.py migrate
# create the super user
python manage.py createsuperuser
To load base records (technologies, jobs, contract types, etc.), use loaddata; see the commands in the deployment file: .deploy/release-tasks.sh
Run¶
# launch development web server
python .\manage.py runserver
# on Linux: python manage.py runserver
# alternatively, use the enhanced command from django-extensions
python .\manage.py runserver_plus
Open the browser at the address shown in the terminal. Default: http://localhost:8000/.
With HTTPS¶
For the best development experience, it is preferable to serve the application over HTTPS. This is possible with runserver_plus from Django Extensions (see the documentation).
# create folder where to store certificate and key
mkdir certs
# generate SSL certificate and key
wsl -- openssl req -nodes -new -x509 -days 365 -keyout certs/serverKey.key -out certs/serverCert.cert
# on Linux: remove 'wsl -- '
# then serve the application over HTTPS with runserver_plus from django-extensions
python .\manage.py runserver_plus --cert-file .\certs\serverCert.cert --key-file .\certs\serverKey.key
Open the browser at the address shown in the terminal. Default: https://localhost:8000/. Accept the risk warning raised for self-signed certificates.
Docker¶
Docker prerequisites¶
Docker 2.2+ or, in detail:
Docker Engine 19.03+
Docker Compose 1.25+
Docker configuration¶
Rename the example.env file to docker.env and fill it in:
# DEVELOPMENT
DJANGO_DEBUG=1
USE_DOCKER=1
# GLOBAL
DJANGO_ADMIN_URL="admin"
DJANGO_PROJECT_FOLDER="elgeopaso"
DJANGO_SECURE_SSL_REDIRECT=0
DJANGO_SETTINGS_MODULE="elgeopaso.settings.production"
WEB_CONCURRENCY=4
# SECURITY
DJANGO_SECRET_KEY="change_me_with_generated_key"
DJANGO_ALLOWED_HOSTS="localhost, 0.0.0.0, 127.0.0.1"
# EMAIL
REPORT_RECIPIENTS="elpaso@georezo.net,"
SMTP_USER="elpaso@georezo.net"
SMTP_PSWD=
# PostgreSQL
# ------------------------------------------------------------------------------
POSTGRES_HOST=database
POSTGRES_PORT=5432
POSTGRES_DB=elgeopaso-dev
POSTGRES_USER=elgeopaso
POSTGRES_PASSWORD=elgeopaso
DATABASE_URL="postgres://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST}:${POSTGRES_PORT}/${POSTGRES_DB}"
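To see how the DATABASE_URL line is assembled from the POSTGRES_* values and how it can be read back, here is a small stdlib sketch. It only illustrates the URL structure with the sample values above; it is not how the project's settings parse the variable.

```python
from urllib.parse import urlsplit

# Sample values copied from the docker.env example above
env = {
    "POSTGRES_HOST": "database",
    "POSTGRES_PORT": "5432",
    "POSTGRES_DB": "elgeopaso-dev",
    "POSTGRES_USER": "elgeopaso",
    "POSTGRES_PASSWORD": "elgeopaso",
}

# Same template as the DATABASE_URL line, using Python placeholders
database_url = (
    "postgres://{POSTGRES_USER}:{POSTGRES_PASSWORD}"
    "@{POSTGRES_HOST}:{POSTGRES_PORT}/{POSTGRES_DB}"
).format(**env)

# Split the URL back into its components
parts = urlsplit(database_url)
print(parts.scheme, parts.username, parts.hostname, parts.port, parts.path.lstrip("/"))
```

The host is `database` rather than `localhost` because, inside Docker Compose, containers typically reach each other by service name.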
Usage¶
Launch the application¶
docker-compose -f docker-compose.dev.yml up -d
Open the browser at http://localhost:8000.
Base data and analyses¶
Once the application is running:
docker-compose -f docker-compose.dev.yml run --rm webapp sh .deploy/release-tasks.sh
Documentation¶
Prerequisites¶
Install the additional dependencies:
python -m pip install -U -r requirements/documentation.txt
Build the documentation¶
sphinx-build -b html docs docs/_build
Open the docs/_build/index.html file in a browser.
Write with live rendering¶
sphinx-autobuild -b html docs/ docs/_build
Open http://localhost:8000.