Aleph filter
This module skips publications that are already in Aleph.
Note
The module uses fuzzy lookup; see name_to_vector() and compare_names().
- harvester.filters.aleph_filter.name_to_vector(name)
Convert a name to an ASCII vector.
Example
>>> name_to_vector("ing. Franta Putšálek")
['putsalek', 'franta', 'ing']
Parameters: name (str) – Name which will be vectorized.
Returns: Vector created from name.
Return type: list
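The documented example suggests that the name is lowercased, stripped of accents and punctuation, and split into tokens ordered from longest to shortest. A minimal Python 3 sketch that reproduces that output follows. The function name `name_to_vector_sketch` and every step inside it are assumptions inferred from the example, not the module's actual code.

```python
import unicodedata


def name_to_vector_sketch(name):
    """Hypothetical re-implementation inferred from the documented example."""
    # Decompose accented characters, then drop the non-ASCII combining marks.
    decomposed = unicodedata.normalize("NFKD", name.lower())
    ascii_name = decomposed.encode("ascii", "ignore").decode("ascii")

    # Keep only letters and spaces, which removes dots and other punctuation.
    cleaned = "".join(ch for ch in ascii_name if ch.isalpha() or ch == " ")

    # Assumption: tokens are ordered longest first, as in the example output.
    return sorted(cleaned.split(), key=len, reverse=True)


print(name_to_vector_sketch("ing. Franta Putšálek"))
# ['putsalek', 'franta', 'ing']
```

Ordering tokens by length would put the surname or given name ahead of short titles and initials, which may be why the example output starts with 'putsalek'.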
- harvester.filters.aleph_filter.compare_names(first, second)
Compare two names in a more complicated, but more error-tolerant, way.
The algorithm uses vector comparison.
Example
>>> compare_names("Franta Putšálek", "ing. Franta Putšálek")
100.0
>>> compare_names("F. Putšálek", "ing. Franta Putšálek")
50.0
Parameters: first (str) – First name to compare; second (str) – Second name to compare.
Returns: Percentage of similarity.
Return type: float
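One scoring rule that is consistent with both documented examples is to count how many tokens of the first name's vector also occur in the second name's vector. The sketch below is a guess at that rule, not the module's confirmed algorithm; `compare_names_sketch` and the `_to_vector` helper are hypothetical names introduced for illustration.

```python
import unicodedata


def _to_vector(name):
    # Hypothetical helper: lowercase ASCII tokens, longest first.
    text = unicodedata.normalize("NFKD", name.lower())
    text = text.encode("ascii", "ignore").decode("ascii")
    text = "".join(ch for ch in text if ch.isalpha() or ch == " ")
    return sorted(text.split(), key=len, reverse=True)


def compare_names_sketch(first, second):
    """Hypothetical scoring rule inferred from the documented examples."""
    first_vec = _to_vector(first)
    second_vec = _to_vector(second)

    # Only as many tokens as the shorter vector holds are considered.
    considered = first_vec[:len(second_vec)]
    if not considered:
        return 0.0

    # Count considered tokens of the first name that occur in the second.
    hits = sum(1 for token in considered if token in second_vec)
    return hits / len(considered) * 100


print(compare_names_sketch("Franta Putšálek", "ing. Franta Putšálek"))  # 100.0
print(compare_names_sketch("F. Putšálek", "ing. Franta Putšálek"))      # 50.0
```

Under this assumed rule an initial such as "F." does not match "Franta", which is why the second call scores only 50.0.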
- harvester.filters.aleph_filter.filter_publication(publication, cmp_authors=True)
Filter publications based on data from Aleph.
Parameters: publication (obj) – Publication instance.
Returns: None if the publication was found in Aleph or publication if not.
Return type: obj/None
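Because the function returns either the publication or None, a caller can apply it to a batch and keep only the non-None results. The sketch below illustrates that contract with stand-ins: `Publication`, `drop_known`, and `fake_filter` are hypothetical names, and `fake_filter` only imitates the documented return values without contacting Aleph.

```python
from collections import namedtuple

# Hypothetical stand-in for the harvester's publication objects.
Publication = namedtuple("Publication", ["title", "authors"])


def drop_known(publications, filter_func):
    """Keep only the publications for which filter_func does not return None."""
    kept = []
    for publication in publications:
        result = filter_func(publication)
        if result is not None:  # None means "already in Aleph"
            kept.append(result)
    return kept


# Stand-in filter with the documented contract; a real run would pass
# filter_publication instead, which needs access to Aleph.
known_titles = {"Old book"}


def fake_filter(publication):
    return None if publication.title in known_titles else publication


pubs = [Publication("Old book", ["A"]), Publication("New book", ["B"])]
print([p.title for p in drop_known(pubs, fake_filter)])  # ['New book']
```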