No documentation to run from a pandas dataframe #25
Comments
I managed to build the DataFrame below, which matches the reliability_data used as the standard example in this project. (Ignore the decimal handling; I need it for my own data.) An assertion will verify that the two are the same, if you want to check. I also replaced null values with "N/A", since I'm reading Excel files.
My full implementation from a while back, with single-rater logic that is now redundant because that case was fixed after I wrote the code
If you add the code below at line 310 of krippendorff.krippendorff.py, it will process a DataFrame naturally:

    if type(reliability_data).__name__ == "DataFrame":
        # Coders are columns, so transpose to get one row per coder.
        data = reliability_data.T.values.tolist()
        data_tuple = tuple(' '.join(map(str, row)) for row in data)
        # "*" and "N/A" mark missing ratings and become NaN.
        reliability_data = [[round(-(1/float(val))+2, 4) if isinstance(val, (int, float)) and float(val) < 1 else round(float(val), 4)
                             if val != "*" and val != "N/A" else np.nan for val in coder.split()] for coder in data_tuple]
        # list.sort() returns None, so use sorted(); NaN is left out because it is not a rating value.
        value_domain = sorted({val for sublist in reliability_data for val in sublist if not np.isnan(val)})

It should look like this:

    if (reliability_data is None) == (value_counts is None):
        raise ValueError("Either reliability_data or value_counts must be provided, but not both.")

    if type(reliability_data).__name__ == "DataFrame":
        data = reliability_data.T.values.tolist()
        data_tuple = tuple(' '.join(map(str, row)) for row in data)
        reliability_data = [[round(-(1/float(val))+2, 4) if isinstance(val, (int, float)) and float(val) < 1 else round(float(val), 4)
                             if val != "*" and val != "N/A" else np.nan for val in coder.split()] for coder in data_tuple]
        value_domain = sorted({val for sublist in reliability_data for val in sublist if not np.isnan(val)})

    # Don't know if it's a `list` or NumPy array. If it's the latter, the truth value is ambiguous. So, ask for `None`.
    if value_counts is None:
        reliability_data = np.asarray(reliability_data)

Expected input, where df is what should be passed as reliability_data:

    import pandas as pd

    data = {
        "coder A": ["*", "*", "*", "*", "*", "3", "4", "1", "2", "1", "1", "3", "3", "*", "3"],
        "coder B": ["1", "*", "2", "1", "3", "3", "4", "3", "*", "*", "*", "*", "*", "*", "*"],
        "coder C": ["*", "*", "2", "1", "3", "4", "4", "*", "2", "1", "1", "3", "3", "*", "4"]
    }
    df = pd.DataFrame(data)
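For anyone who would rather not patch the library, the same conversion can probably be done on the caller's side before passing the data in. Below is a minimal sketch, assuming coders are columns, ratings are numeric strings, and anything non-numeric (such as "*" or "N/A") marks a missing rating. The helper name `dataframe_to_reliability_data` is hypothetical and not part of krippendorff, and this sketch skips the decimal rescaling from the snippet above.

```python
import numpy as np
import pandas as pd


def dataframe_to_reliability_data(df):
    """Turn a coders-as-columns DataFrame into a (coders x units) float array.

    Hypothetical helper, not part of the krippendorff package.
    """
    # errors="coerce" turns every non-numeric cell ("*", "N/A", ...) into NaN.
    # Note that it also silently turns typos into NaN.
    numeric = df.apply(pd.to_numeric, errors="coerce")
    # Transpose so that each row is one coder and each column is one unit.
    return numeric.T.to_numpy(dtype=float, na_value=np.nan)


df = pd.DataFrame({
    "coder A": ["*", "*", "*", "*", "*", "3", "4", "1", "2", "1", "1", "3", "3", "*", "3"],
    "coder B": ["1", "*", "2", "1", "3", "3", "4", "3", "*", "*", "*", "*", "*", "*", "*"],
    "coder C": ["*", "*", "2", "1", "3", "4", "4", "*", "2", "1", "1", "3", "3", "*", "4"],
})

reliability_data = dataframe_to_reliability_data(df)
print(reliability_data.shape)  # (3, 15)
```

The resulting array should then be usable as the `reliability_data` argument, for example `krippendorff.alpha(reliability_data=reliability_data, level_of_measurement="nominal")`, without any change to the library itself.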
The pandas DataFrame is the most widely used tool for loading CSVs. Please add documentation on how to calculate the reliability matrix from DataFrames.