Script that collects and generates a JSON file of your CV data to parse into new CV template

pwc-technology-be/cvExtractor

cvExtractor

What is this sorcery?

The days of manually copying and pasting your information into a new CV template are numbered, thanks to cvExtractor. This Python script identifies the various components of a TalentLink CV (important: it only works with TalentLink profiles for now) and transforms them into a JSON file following the FRESH schema, which can then be run through hackmyresume to apply the data to a different template. Stay tuned for updates on completion of the Europass template...
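To give a feel for the target format, here is a minimal sketch of a FRESH-shaped JSON document built in Python. The helper name `build_fresh_resume` and the sample values are hypothetical, and the field names reflect a general reading of the FRESH schema rather than cvExtractor's actual output, so treat it as an illustration only.

```python
import json


def build_fresh_resume(name, email, jobs):
    """Assemble a minimal FRESH-shaped dictionary from already-extracted values.

    Hypothetical helper: the field names are an assumption based on the
    FRESH schema, not a copy of what cvExtractor really emits.
    """
    return {
        "name": name,
        "meta": {"format": "FRESH@0.6.0"},
        "contact": {"email": email},
        "employment": {
            "history": [
                {"employer": employer, "position": position, "start": start}
                for employer, position, start in jobs
            ]
        },
    }


# Made-up sample data, purely for illustration.
resume = build_fresh_resume(
    "Jane Doe",
    "jane.doe@example.com",
    [("Example Corp", "Consultant", "2017-09")],
)

# ensure_ascii=False keeps characters such as umlauts readable in the output.
print(json.dumps(resume, indent=2, ensure_ascii=False))
```

A file with this shape is the kind of input hackmyresume expects when it builds a resume from a FRESH document.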

Like the idea? Let's see how it works!

cvExtractor is designed to be simple and easy to use. Below is a step-by-step guide to installing and running the program.

Step 1: Download

Download the contents of the git repository to a location of your choice on your computer. (The shorter the path, the easier.)

Also download your CV from TalentLink as a .txt file, and save it in a folder called CVs inside the same folder that contains the newly downloaded cvExtractor repository.

Step 2: Running the script

Once you've completed the previous step, run the script. It should generate an output JSON file inside the folder under the name results_filename.json, where filename is the name of the .txt file containing your TalentLink CV.
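The naming convention above can be sketched roughly as follows. This is not the script's real code: `output_path_for`, `process_folder` and the `extract` callback are hypothetical names, and writing the result next to the input file is an assumption.

```python
import json
from pathlib import Path


def output_path_for(cv_path):
    """Return the results_<filename>.json path next to the given .txt CV."""
    cv_path = Path(cv_path)
    return cv_path.with_name(f"results_{cv_path.stem}.json")


def process_folder(cv_dir, extract):
    """Run `extract` on every .txt file in cv_dir and write one JSON file per CV.

    `extract` is a stand-in for the real parsing logic: it takes the CV text
    and returns a JSON-serializable dictionary.
    """
    written = []
    for cv_path in sorted(Path(cv_dir).glob("*.txt")):
        # errors="replace" keeps the run going if a byte cannot be decoded.
        text = cv_path.read_text(encoding="utf-8", errors="replace")
        out_path = output_path_for(cv_path)
        out_path.write_text(
            json.dumps(extract(text), indent=2, ensure_ascii=False),
            encoding="utf-8",
        )
        written.append(out_path)
    return written
```

For example, `process_folder("CVs", my_extractor)` would turn `CVs/jane.txt` into `CVs/results_jane.json`, with `my_extractor` being whatever function does the parsing.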

Step 3: Applying extracted data to new templates

To complete this final stage, a few preparations are needed:

1. Download and install the latest version of Node.js

2. In a terminal, install hackmyresume with the following command:

npm install hackmyresume -g

Once everything is correctly installed, you can proceed to the next step.

When you are ready to generate your resume, run hackmyresume from the folder that contains the generated JSON file (or give the full path to that file):

hackmyresume build results_filename.json TO out/resume.all -t positive

and you should see terminal output that looks somewhat like this:

[Screenshot: snapshot1]

If your terminal resembles the screenshot above, congratulations! You have successfully generated a new resume in various formats (html, doc, json, yml), which you will find in a subfolder called 'out'.

Personal Reflection on Project

Progress

The project has reached the stage of a stable prototype. It is nowhere near a final product, but it provides a strong foothold for the work that remains.

What is left to do?

There is a long road ahead, but hopefully the Activity Log (access to which can be requested) will give the path more structure. The biggest issue, in my opinion, is that the Python script has so far been written with only TalentLink profiles in mind and would not extract the correct information from any other source. Ideally, the code should be adapted to work universally, no matter the source of the CV. For this to be achievable, some of the methods currently used would become redundant (as they rely on pattern matching) and would need to be replaced by methods along the lines of NLP. Furthermore, there are some bugs that need to be squashed (more detailed information is in the Activity Log), which would take more expert Python knowledge and time. Lastly, the adaptation of the Europass template to match the JSON output is still in the works and would be ideal for the final presentation of the product.

Suggestions for future developers

For now, this program is a good start but needs some adjustments. One thing to keep in mind is the dynamic, non-standardized nature of CVs: the ideal program should work with any CV (which would also make the product easier to commercialize). Here are my thoughts on how to approach this:

  • Scrap the pattern-matching methods and replace them with recognition of particular keywords. Although this will reduce the accuracy of identification, it could allow more CVs to be parsed by the script, not just those in the TalentLink layout.
  • Perhaps look into HTML scraping (if the CV is online), although this may be difficult considering TalentLink can be a bit temperamental.
  • Reduce the use of dictionaries and sets where order matters: sets are unordered, and dictionaries only preserve insertion order from Python 3.7 onwards, which can produce errors/bugs when translating to a new template (the current example is the name matching).
  • Set a more general encoding, such as UTF-8, to reduce the error and script failure rate, and to ensure that special characters like umlauts are printed correctly.
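As a starting point for the first, third and fourth suggestions, here is a minimal sketch, not taken from the current script. The keyword table and the names `classify_heading`, `split_sections` and `decode_cv_bytes` are hypothetical, and falling back to cp1252 when UTF-8 fails is an assumption about what the input files might contain.

```python
# Hypothetical keyword table: section names and trigger words are illustrative.
SECTION_KEYWORDS = [
    ("employment", ("experience", "employment", "work history")),
    ("education", ("education", "studies")),
    ("skills", ("skills", "competences")),
]


def classify_heading(line):
    """Return the section name if the line looks like a heading, else None."""
    lowered = line.strip().lower()
    if not lowered or len(lowered) > 40:  # long lines are unlikely to be headings
        return None
    for section, keywords in SECTION_KEYWORDS:
        if any(keyword in lowered for keyword in keywords):
            return section
    return None


def split_sections(text):
    """Group lines under the most recent recognized heading.

    A list of (section, lines) pairs is used instead of a set, so the
    order of the document is preserved explicitly.
    """
    sections = []
    current = None
    for line in text.splitlines():
        section = classify_heading(line)
        if section is not None:
            current = (section, [])
            sections.append(current)
        elif current is not None and line.strip():
            current[1].append(line.strip())
    return sections


def decode_cv_bytes(raw):
    """Decode CV bytes as UTF-8, falling back to cp1252 (an assumed fallback)."""
    try:
        return raw.decode("utf-8-sig")  # also strips a leading byte order mark
    except UnicodeDecodeError:
        return raw.decode("cp1252", errors="replace")
```

As the first bullet warns, keyword matching is less precise than layout-specific patterns: a short content line that happens to contain a word like "skills" would be taken for a heading, so a real implementation would need extra checks.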
