CSV option is greyed out #56

abdulrehmanmian · 2020-08-09T23:26:42Z

I just finished with a 100 GB JSONL file, but the CSV option is greyed out. How can I solve this?

mihirp161 · 2020-10-05T19:40:35Z

I believe that's too big for this app, so you may have to do the conversion yourself. If you don't have enough RAM, your best bet is to read the file into a Python or R environment in chunks: write each chunk to CSV, clear the memory, and repeat until the last line (there are many ways to do this; search online). Or, if memory is limited, you can use the Linux terminal (I'm not sure about Windows, but there may be a similar method there):

The following Linux command splits the JSONL file into chunks of 50,000 lines each (split takes a single input file, so name it explicitly rather than using a *.jsonl glob):
split -l 50000 --additional-suffix=.jsonl YOUR_FILE.jsonl ./FOLDER_WHERE_JSONL_FILE_IS/GIVE_OUTPUT_FILE_PREFIX_
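Since Python also runs on Windows, a portable equivalent of the split command above can be sketched with only the standard library. This is a minimal sketch, not part of any tool: the function name `split_jsonl` is hypothetical, and the numeric suffixes (`00000`, `00001`, ...) are my own choice, whereas GNU split names its pieces `aa`, `ab`, and so on.

```python
from itertools import islice


def split_jsonl(input_path, output_prefix, lines_per_chunk=50000):
    """Copy input_path into numbered chunk files of at most lines_per_chunk
    lines each (output_prefix + '00000.jsonl', '00001.jsonl', ...) and
    return the list of paths written. Only one chunk is held in memory."""
    paths = []
    # newline='' copies each line's original line ending unchanged
    with open(input_path, encoding='utf-8', newline='') as src:
        while True:
            # islice takes the next lines_per_chunk lines from the open file
            chunk = list(islice(src, lines_per_chunk))
            if not chunk:
                break  # end of file reached
            path = f"{output_prefix}{len(paths):05d}.jsonl"
            with open(path, 'w', encoding='utf-8', newline='') as dst:
                dst.writelines(chunk)
            paths.append(path)
    return paths
```

For example, `split_jsonl('YOUR_FILE.jsonl', 'GIVE_OUTPUT_FILE_PREFIX_')` (placeholder names) would produce the same 50,000-line pieces as the command above.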

I hope this helps. Good luck :-)
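The chunked Python route described in this comment (read a chunk, write it to CSV, free it, repeat) can be sketched as below. This is a hedged illustration: the function name and the three columns in `FIELDS` are hypothetical placeholders, so adjust them to the fields you actually need. Each pass through the loop rebinds `chunk` and `rows`, so the previous chunk can be garbage-collected and memory use stays bounded by `chunk_size`.

```python
import csv
import json
from itertools import islice

# Hypothetical column choice: top-level keys to copy from each JSON record.
FIELDS = ['id_str', 'created_at', 'full_text']


def jsonl_to_csv_in_chunks(input_path, output_path, chunk_size=50000):
    """Convert a JSONL file to CSV, holding at most chunk_size lines in
    memory at a time. Returns the number of data rows written."""
    total = 0
    with open(input_path, encoding='utf-8') as src, \
         open(output_path, 'w', encoding='utf-8', newline='') as dst:
        writer = csv.writer(dst)
        writer.writerow(FIELDS)  # header row
        while True:
            chunk = list(islice(src, chunk_size))  # next chunk of raw lines
            if not chunk:
                break  # end of file reached
            rows = []
            for raw in chunk:
                if not raw.strip():
                    continue  # skip blank lines
                record = json.loads(raw)
                rows.append([record.get(f) for f in FIELDS])
            writer.writerows(rows)  # missing keys (None) become empty fields
            total += len(rows)
    return total
```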

rtrad89 · 2020-10-12T08:48:56Z

Is #51 related?

P.S. You may have closed the Hydrator too soon. You need to give it time until the CSV option appears, and then, after clicking it, wait even longer until it finishes converting the file. If you close it in the middle of the conversion, the option stays deactivated no matter what.

rtrad89 · 2020-10-12T10:47:25Z


Here's a basic Python 3 snippet. Just replace the input filename (tweets_20200501-V2.jsonl) with your JSONL file, and the output filename (tweets_20200501.csv) with whatever name you want for the CSV:

# -*- coding: utf-8 -*-
"""
Adapted from https://stackoverflow.com/a/46653313/3429115

Creates a .csv file from a Twitter .jsonl file.
The fields have to be set manually.
"""

import csv
import json
from datetime import datetime

INPUT_PATH = 'tweets_20200501-V2.jsonl'  # your JSONL file
OUTPUT_PATH = 'tweets_20200501.csv'      # name of the CSV to create

FIELDS = ['id', 'created_at', 'retweet_id', 'user_screen_name',
          'user_followers_count', 'user_friends_count',
          'retweet_count', 'favorite_count', 'text']  # field names


def extract_json(fileobj):
    """
    Iterates over an open JSONL file and yields decoded lines,
    skipping blank ones. Closes the file once it has been
    read completely.
    """
    with fileobj:
        for line in fileobj:
            if line.strip():
                yield json.loads(line)


data_json = open(INPUT_PATH, mode='r', encoding='utf-8')  # opens the JSONL file
data_python = extract_json(data_json)

# newline='' lets the csv module write its own line endings
with open(OUTPUT_PATH, mode='w', encoding='utf-8', newline='') as csv_out:
    # csv.writer quotes any field containing commas, quotes or newlines
    # and doubles embedded quotes, so no manual escaping is needed
    writer = csv.writer(csv_out)
    writer.writerow(FIELDS)

    print(f"{datetime.utcnow()}: Output file created. Starting conversion..")

    for i, line in enumerate(data_python):
        # screen_name and followers/friends live one level down, under 'user';
        # the retweeted tweet's ID lives under 'retweeted_status' (absent for original tweets)
        user = line.get('user') or {}
        retweeted = line.get('retweeted_status') or {}
        row = [line.get('id_str'),
               line.get('created_at'),
               retweeted.get('id_str', ''),
               user.get('screen_name'),
               user.get('followers_count'),
               user.get('friends_count'),
               line.get('retweet_count'),
               line.get('favorite_count'),
               line.get('full_text') or line.get('text') or '',  # full_text in extended mode, text otherwise
               ]
        writer.writerow(row)  # None values are written as empty fields

        if i % 100000 == 0 and i > 0:
            print(f"{datetime.utcnow()}: {i} tweets done...")

    print("All tweets done. Saving the csv...")

print("Done.")
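Whether the quoting is done by hand (wrap the field in double quotes and double any embedded quote, the RFC 4180 convention) or left to Python's csv module, tweet text containing commas, quotes or line breaks has to survive a round trip. A quick standard-library check; the sample text below is made up:

```python
import csv
import io

# Made-up tweet text containing a comma, double quotes and a newline.
text = 'He said "hi",\nthen left'

buf = io.StringIO()
csv.writer(buf).writerow(['123', text])  # default dialect: minimal quoting, '\r\n' row ending
encoded = buf.getvalue()
print(repr(encoded))  # → '123,"He said ""hi"",\nthen left"\r\n'

# Reading it back recovers the original fields exactly.
decoded = next(csv.reader(io.StringIO(encoded, newline='')))
print(decoded)
```

Only the text field gets quoted, because the ID contains no special characters; that matches what the script above needs for the `full_text` column.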

edsu mentioned this issue on Apr 30, 2021 in "Fewer Tweets in CSV than hydrated" #88 (closed).
