cvExtractor

What is this sorcery?

Thanks to cvExtractor, the days of manually copying and pasting your information into a new CV template will soon be long gone. This Python script identifies the various components of a TalentLink CV (important: it only works with TalentLink profiles for now) and transforms them into a JSON file following the FRESH schema, which can then be run through hackmyresume to apply the data to a different template. Stay tuned for updates on completion of the Europass template...

Like the idea? Let's see how it works!

cvExtractor is designed to be simple and easy to use. Below is a step-by-step guide to installing and running the program.

Step 1: Download

Download the contents of the git repository to a location of your choice on your computer. (The shorter the path, the easier.)

Also download your CV from TalentLink as a .txt file, and save it in a folder called CVs inside the same folder that contains the newly downloaded cvExtractor repository.

Step 2: Running the script

Once you have completed the previous step, the script should be able to run and generate an output JSON file inside the folder, named results_filename.json, where filename is the name of the .txt file containing your TalentLink CV.
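The naming convention above can be sketched in a few lines of Python. This is only an illustration, not the code in cvExtractor.py: the helper names output_path_for and write_results are hypothetical, the sample data is a placeholder rather than a valid FRESH document, and writing the result next to the CV file is an assumption.

```python
import json
from pathlib import Path


def output_path_for(cv_path):
    """Return the results_<filename>.json path next to the given CV file."""
    cv_path = Path(cv_path)
    return cv_path.with_name(f"results_{cv_path.stem}.json")


def write_results(cv_path, data):
    """Write the extracted data as UTF-8 JSON and return the output path."""
    out_path = output_path_for(cv_path)
    with open(out_path, "w", encoding="utf-8") as handle:
        # ensure_ascii=False keeps characters such as umlauts readable.
        json.dump(data, handle, indent=2, ensure_ascii=False)
    return out_path


if __name__ == "__main__":
    # Placeholder data only; the real script fills in the FRESH fields.
    print(output_path_for("CVs/my_cv.txt"))
```

For a CV saved as CVs/my_cv.txt, this sketch would name the output CVs/results_my_cv.json.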

Step 3: Applying extracted data to new templates

Before tackling this final stage, a few preparations are needed:

1. Download and install the latest version of Node.js

2. Now, in a terminal, you can install hackmyresume with the following command:

npm install hackmyresume -g

Once everything is correctly installed, you can proceed to the next step.

When you are ready to generate your resume, run the build command below, referencing the location of the folder where you installed the repository:

hackmyresume build results_filename.json TO out/resume.all -t positive

and you should see a terminal that looks somewhat like this:

If the above image looks familiar, then congratulations! You have successfully generated a new resume in various formats (html, doc, json, yml), which you will find in a subfolder called 'out'.
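If you would rather trigger the build from Python, for example straight after the extraction, the command can be wrapped as sketched here. The function names build_command and generate_resume are hypothetical and not part of cvExtractor; the arguments simply mirror the hackmyresume command shown in this step.

```python
import shutil
import subprocess


def build_command(json_path, out_target="out/resume.all", theme="positive"):
    """Assemble the hackmyresume build command as an argument list."""
    return ["hackmyresume", "build", json_path, "TO", out_target, "-t", theme]


def generate_resume(json_path):
    """Run hackmyresume if it is on PATH; return True on success."""
    executable = shutil.which("hackmyresume")
    if executable is None:
        print("hackmyresume not found; install it with: npm install hackmyresume -g")
        return False
    command = build_command(json_path)
    command[0] = executable  # use the resolved path (helps on Windows)
    completed = subprocess.run(command, check=False)
    return completed.returncode == 0
```

Calling generate_resume("results_filename.json") should then behave like typing the command in the terminal, assuming hackmyresume was installed globally as described.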

Personal Reflection on Project

Progress

The project has reached the stage of a stable prototype. It is nowhere near a final product, but it provides a strong foothold for the next steps when the time comes.

What is left to do?

There is a long road ahead, but hopefully the Activity Log (access to which can be requested) will give the path more structure. The biggest issue, in my opinion, is that so far the Python script is written with only TalentLink profiles in mind and would not extract the correct information from any other source. Ideally, the code should be adapted to work universally, no matter where the CV comes from. For this to be achievable, some of the methods currently used would become redundant (as they rely on pattern matching) and would need to be replaced by methods along the lines of NLP. Furthermore, there are some bugs that need to be squashed (more detailed information is in the Activity Log), which could be done with more expert Python knowledge and time. Lastly, the adaptation of the Europass template to match the JSON output is still in the works and would be ideal for the final presentation of the product.

Suggestions for future developers

For now, this program is a good start but needs some adjustments. One thing to keep in mind is the dynamic and non-standardized nature of CVs, as the ideal program should work with any CV (this would also make the product easier to commercialize). Here are my thoughts on how to approach this:

Scrap the pattern matching methods and replace them with recognition of particular keywords. Although this will reduce the accuracy of identification, it could allow more CVs to be parsed by the script, not just those in the TalentLink layout.
Perhaps look into HTML scraping (if the CV is online), although this may be a bit difficult considering TalentLink can be a bit temperamental.
Reduce the reliance on dictionaries and sets where order matters. Sets are unordered, and dictionaries only preserve insertion order from Python 3.7 onwards, which can produce errors/bugs when translating to a new template (the current example is the name matching).
Set a more general encoding to lower the error rate/script failure rate, and to ensure that special characters like umlauts are printed correctly.
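As a rough illustration of how the keyword, ordering, and encoding suggestions above could fit together, here is a minimal sketch. It is not taken from cvExtractor.py: the keyword table, the three-word heading limit, and the function names are all assumptions made for the example.

```python
# Hypothetical keyword table; a real one would need tuning per CV source.
SECTION_KEYWORDS = {
    "education": ("education", "degree", "university"),
    "employment": ("experience", "employment", "work history"),
    "languages": ("languages",),
    "skills": ("skills", "competencies"),
}


def detect_section(line):
    """Return the section a short heading line belongs to, or None."""
    lowered = line.strip().lower()
    if len(lowered.split()) > 3:
        return None  # long lines are treated as content, not headings
    for section, keywords in SECTION_KEYWORDS.items():
        if any(lowered.startswith(keyword) for keyword in keywords):
            return section
    return None


def split_sections(lines):
    """Group lines under detected headings, keeping document order."""
    sections = []  # list of (name, lines) pairs, so order is explicit
    current = None
    for line in lines:
        if not line.strip():
            continue
        name = detect_section(line)
        if name is not None:
            current = (name, [])
            sections.append(current)
        elif current is not None:
            current[1].append(line.strip())
    return sections


def read_cv(path):
    """Read a CV as UTF-8, replacing undecodable bytes instead of failing."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.read().splitlines()
```

Using a list of pairs instead of a set keeps the sections in the order they appear in the CV, and opening the file with an explicit encoding avoids depending on the platform default. Heading detection by keyword is deliberately crude here and will misread some short content lines as headings.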
