Synthetic Data for Jane Lila

This repository contains synthetic data curated specifically for Jane Lila. The data supports testing and development of a Personal AI Assistant (PAI), including evaluation of the assistant's capabilities, its personalized interactions, and its overall functionality. The dataset includes emails, messages, and other relevant text examples, intended to cover tasks such as contextual understanding, response generation, and user intent recognition.
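This README does not specify a file format or schema for the samples. The sketch below assumes a hypothetical JSON Lines layout with `id`, `kind`, `text`, and `tasks` fields, and shows how samples could be tagged with the evaluation tasks named above and then selected per task. The field names, the `SyntheticSample` type, and the example texts are illustrative only and are not taken from this repository.

```python
from dataclasses import dataclass
from enum import Enum
import json


class SampleKind(Enum):
    """Text types named in this README ("other" is a catch-all bucket)."""
    EMAIL = "email"
    MESSAGE = "message"
    OTHER = "other"


class EvalTask(Enum):
    """Evaluation tasks named in this README."""
    CONTEXTUAL_UNDERSTANDING = "contextual_understanding"
    RESPONSE_GENERATION = "response_generation"
    INTENT_RECOGNITION = "intent_recognition"


@dataclass(frozen=True)
class SyntheticSample:
    """One synthetic text example (hypothetical schema, not the repository's)."""
    sample_id: str
    kind: SampleKind
    text: str
    tasks: tuple[EvalTask, ...]


def parse_sample(line: str) -> SyntheticSample:
    """Parse one JSON Lines record that uses the assumed field names."""
    raw = json.loads(line)
    return SyntheticSample(
        sample_id=raw["id"],
        kind=SampleKind(raw["kind"]),
        text=raw["text"],
        tasks=tuple(EvalTask(t) for t in raw.get("tasks", [])),
    )


def samples_for_task(samples: list[SyntheticSample], task: EvalTask) -> list[SyntheticSample]:
    """Keep only the samples tagged for the given evaluation task."""
    return [s for s in samples if task in s.tasks]


# Hypothetical records for illustration.
records = [
    '{"id": "e1", "kind": "email", "text": "Can we move the call to Friday?", "tasks": ["intent_recognition"]}',
    '{"id": "m1", "kind": "message", "text": "Running late, start without me.", "tasks": ["response_generation"]}',
]
samples = [parse_sample(r) for r in records]
print([s.sample_id for s in samples_for_task(samples, EvalTask.INTENT_RECOGNITION)])  # ['e1']
```

Tagging each sample with the tasks it is meant to exercise keeps a single pool of synthetic text reusable across the different evaluations.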


Categories of Personal Data

The following categories of personal data can be simulated:

  1. Basic Personal Information: Full name, Address (home or work), Phone number, Email address, Date of birth, Gender, Nationality, Marital status.

  2. Identification Numbers: Social Security Number (SSN), Passport number, Driver's license number, National identification card number, Tax identification number, Employee or student ID numbers.

  3. Financial Information: Bank account details, Credit or debit card numbers, Loan or debt information, Transaction history, Credit score.

  4. Online Identifiers: IP address, Device IDs (e.g., MAC address, IMEI), Browser cookies, Login credentials (e.g., usernames), Social media handles, Tracking data (e.g., geolocation data).

  5. Health Information (Sensitive Personal Data): Medical history, Diagnoses and test results, Health insurance details, Genetic data, Biometric data (e.g., fingerprints, facial recognition data), Disability status.

  6. Employment Information: Job title and position, Employment history, Salary and benefits details, Performance evaluations, Workplace attendance records.

  7. Education Information: Academic records, School or university details, Certificates and diplomas, Grades or performance reports.

  8. Behavioral Data: Purchase history, Browsing history, Clickstream data, Search history, Communication patterns (e.g., call logs, messaging activity).

  9. Sensitive Personal Information: Racial or ethnic origin, Political opinions, Religious or philosophical beliefs, Trade union membership, Sexual orientation or preferences.

  10. Location Data: Real-time GPS location, Historical location data, Geotagged photos or videos.

  11. Biometric Data: Fingerprints, Facial recognition profiles, Voice recognition patterns, Handwriting samples.

  12. Family and Relationships: Names and details of relatives, Emergency contact information, Information about dependents or children.

  13. Communication Records: Emails, Chat or messaging logs, Recorded phone calls, Voice notes.

  14. Miscellaneous Data: Photos and videos, Professional certifications, Hobbies and interests, Memberships in clubs or organizations, Volunteer activity records.
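When generating a synthetic profile, it can help to check which of the categories above the profile actually populates. The sketch below encodes the 14 categories as an enum and maps a handful of hypothetical profile field names (for example `passport_number` and `gps_location`, which are not defined by this repository) to a single category each. The email address is a placeholder.

```python
from enum import Enum


class PersonalDataCategory(Enum):
    """The 14 categories listed above, in the same order."""
    BASIC_PERSONAL_INFORMATION = 1
    IDENTIFICATION_NUMBERS = 2
    FINANCIAL_INFORMATION = 3
    ONLINE_IDENTIFIERS = 4
    HEALTH_INFORMATION = 5
    EMPLOYMENT_INFORMATION = 6
    EDUCATION_INFORMATION = 7
    BEHAVIORAL_DATA = 8
    SENSITIVE_PERSONAL_INFORMATION = 9
    LOCATION_DATA = 10
    BIOMETRIC_DATA = 11
    FAMILY_AND_RELATIONSHIPS = 12
    COMMUNICATION_RECORDS = 13
    MISCELLANEOUS_DATA = 14


# Hypothetical profile field names; each is assigned to exactly one category.
FIELD_CATEGORY = {
    "full_name": PersonalDataCategory.BASIC_PERSONAL_INFORMATION,
    "email_address": PersonalDataCategory.BASIC_PERSONAL_INFORMATION,
    "passport_number": PersonalDataCategory.IDENTIFICATION_NUMBERS,
    "credit_score": PersonalDataCategory.FINANCIAL_INFORMATION,
    "ip_address": PersonalDataCategory.ONLINE_IDENTIFIERS,
    "diagnoses": PersonalDataCategory.HEALTH_INFORMATION,
    "job_title": PersonalDataCategory.EMPLOYMENT_INFORMATION,
    "grades": PersonalDataCategory.EDUCATION_INFORMATION,
    "purchase_history": PersonalDataCategory.BEHAVIORAL_DATA,
    "religious_beliefs": PersonalDataCategory.SENSITIVE_PERSONAL_INFORMATION,
    "gps_location": PersonalDataCategory.LOCATION_DATA,
    "voice_pattern": PersonalDataCategory.BIOMETRIC_DATA,
    "emergency_contact": PersonalDataCategory.FAMILY_AND_RELATIONSHIPS,
    "emails": PersonalDataCategory.COMMUNICATION_RECORDS,
    "hobbies": PersonalDataCategory.MISCELLANEOUS_DATA,
}


def missing_categories(profile: dict) -> list[PersonalDataCategory]:
    """Return, in list order, the categories with no non-empty field in the profile."""
    covered = {
        FIELD_CATEGORY[field]
        for field, value in profile.items()
        if field in FIELD_CATEGORY and value  # empty values do not count as coverage
    }
    return [c for c in PersonalDataCategory if c not in covered]


profile = {"full_name": "Jane Lila", "email_address": "jane@example.com", "emails": []}
print(len(missing_categories(profile)))  # 13
```

Some categories above overlap. For example, biometric data appears under both Health Information and Biometric Data. The sketch assigns each field to a single category for simplicity. A field could instead map to a set of categories if overlaps need to be tracked.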