What ChatGPT Knows about You: OpenAI’s Journey Towards Data Privacy | by Andrea Valenzuela


Companies that let their users request their personal data thereby comply with the aforementioned GDPR regulation. Nevertheless, there is a catch: the file format can make the data unreadable for most of the population. In this case, we got both HTML and JSON files. While HTML files can be read directly in a browser, JSON files can be more difficult to interpret. I personally think that new regulations should also enforce a readable format for the data. But for the time being…

Let’s explore the files one by one to get the most out of this new feature!

The first file is chat.html, which contains my entire chat history with ChatGPT. Conversations are stored with their corresponding titles. The user’s questions and ChatGPT’s answers are labeled as user and assistant, respectively.

If you have ever trained an AI model yourself, this labeling system will sound familiar to you.

Let’s observe a sample conversation from my history:

Self-made screenshot from my ChatGPT history. The conversation title is highlighted in blue. User/Assistant labels are highlighted in red and green, respectively.

Have you ever noticed the thumbs-up and thumbs-down icons (👍👎) next to a ChatGPT answer?

ChatGPT treats this information as feedback on a given answer, which then helps in training the chatbot.

This feedback is stored in the message_feedback.json file, which contains any feedback you provided to ChatGPT using the thumbs icons. It is stored in the following format:

[{"message_id": <MESSAGE ID>, "conversation_id": <CONVERSATION ID>, "user_id": <USER ID>, "rating": "thumbsDown", "content": "{\"tags\": [\"not-helpful\"]}"}]

A thumbsDown rating marks an answer as wrongly generated, while a thumbsUp rating marks a correctly generated one.
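To make the feedback file easier to read, here is a minimal Python sketch that counts ratings and tags. It assumes the file is a list of records shaped like the sample above, and that the content field is itself a JSON document stored as a string, so it has to be decoded a second time. The function name summarize_feedback and the inline sample records are made up for illustration.

```python
import json
from collections import Counter


def summarize_feedback(records):
    """Count ratings and tags in a list of feedback records."""
    ratings = Counter()
    tags = Counter()
    for record in records:
        ratings[record.get("rating", "unknown")] += 1
        content = record.get("content")
        # Assumption: "content" is a JSON string, so it needs a second decode.
        if isinstance(content, str) and content:
            try:
                decoded = json.loads(content)
            except json.JSONDecodeError:
                continue  # skip content that is not valid JSON
            if isinstance(decoded, dict):
                tags.update(decoded.get("tags", []))
    return ratings, tags


# Hypothetical records mirroring the format shown above.
sample = [
    {"message_id": "m1", "conversation_id": "c1", "user_id": "u1",
     "rating": "thumbsDown", "content": '{"tags": ["not-helpful"]}'},
    {"message_id": "m2", "conversation_id": "c1", "user_id": "u1",
     "rating": "thumbsUp", "content": "{}"},
]

ratings, tags = summarize_feedback(sample)
print(dict(ratings))
print(dict(tags))
```

To run it on a real export, replace sample with the result of json.load on the opened message_feedback.json file.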

There is also a file (user.json) containing the following personal data from the user:

"id": <USER ID>, "email": <USER EMAIL>, "chatgpt_plus_user": [true

Some platforms are known for creating a model of the user based on their usage of the platform. For example, if the Google searches of a user are mostly about programming, Google is likely to infer that the user is a programmer and use this information to show personalized advertisements.

OpenAI could do the same with the information from the conversations, but it is currently obliged to include any such inferred information in the exported data.

⚠️ FYI: you can see what Google knows about you from Gmail by clicking on Account >> Data & Privacy >> Personalized Ads >> My Ad Center.

Another file, conversations.json, contains the conversation history along with some metadata, such as the creation time, several identifiers, and the model behind ChatGPT, among others.

⚠️ Metadata provides information about the main data, such as its origin, meaning, location, ownership, and creation. It is related to the main data, but it is not part of it.
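As a rough illustration of how the metadata in conversations.json could be inspected, the Python sketch below lists each conversation's title and creation time. The field names title and create_time, and the idea that the creation time is a Unix timestamp in seconds, are assumptions made for this example and may not match your export exactly. The function name list_conversations and the sample data are hypothetical.

```python
from datetime import datetime, timezone


def list_conversations(conversations):
    """Return (creation time, title) pairs sorted by creation time."""
    rows = []
    for conv in conversations:
        # Assumed keys: "title" and "create_time" (Unix seconds).
        title = conv.get("title") or "(untitled)"
        created = conv.get("create_time")
        if isinstance(created, (int, float)):
            when = datetime.fromtimestamp(created, tz=timezone.utc)
            when = when.strftime("%Y-%m-%d %H:%M")
        else:
            when = "unknown"
        rows.append((when, title))
    return sorted(rows)


# Hypothetical sample standing in for the parsed conversations.json file.
sample = [
    {"title": "A320 Hydraulic System Failure", "create_time": 1683000000.0},
    {"title": None, "create_time": None},
]

rows = list_conversations(sample)
for when, title in rows:
    print(when, "|", title)
```

With a real export, sample would be replaced by the result of json.load on the opened conversations.json file.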

Let’s explore the same conversation about the A320 Hydraulic System Failure shown in the first example, this time in JSON format. The conversation itself consists of the following Q&A:

Self-made screenshot from the Regenerate response button in ChatGPT.


What ChatGPT Knows about You: OpenAI’s Journey Towards Data Privacy | by Andrea Valenzuela | May, 2023
