Hole | Wins | Losses | Score | +20 |
---|---|---|---|---|
1 | 37 | 43 | +8 | (+1) |
10 | 33 | 48 | +20 | (E) |
Avg | 35 | 46 | +42 | |
Here is statsmodels code for fitting a linear regression and running ANOVA on the text data:
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
# Load data into a Pandas DataFrame
data = pd.read_csv('text_data.csv')
# Fit a linear regression model to the data
model = ols(formula='value ~ time', data=data).fit()
# Print the summary of the model
print(model.summary())
# Perform ANOVA on the data
anova_table = anova_lm(model)
# Print the ANOVA table
print(anova_table)
Note that this code assumes the text data is in a CSV file named text_data.csv, with the values to be modeled in a column named value and the corresponding time points in a column named time. You will need to modify the code to suit your specific use case.
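Since the formula 'value ~ time' depends on both column names being present, it may help to check the CSV header before fitting. The following is a minimal sketch using only the standard library; the helper name missing_columns and the sample contents are hypothetical, not part of the original code.

```python
import csv
import io

# Column names assumed by the formula 'value ~ time'
REQUIRED_COLUMNS = ("value", "time")

def missing_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]

# Hypothetical sample contents standing in for text_data.csv
sample = "time,value\n1,2.0\n2,4.1\n3,5.9\n"
print(missing_columns(sample))               # []
print(missing_columns("time,score\n1,2\n"))  # ['value']
```

If the returned list is non-empty, either rename the columns in the file or adjust the formula to match the names that are actually there.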
Also note that I did not process the text data itself: it was provided in a very large format that would need preprocessing before it could be used with statsmodels.
Here's an example of how you might pre-process this text data for a simple linear regression analysis:
import pandas as pd
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
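def preprocess_text(text):
    # Requires the NLTK tokenizer, stopwords, and wordnet resources (see nltk.download)
    lemmatizer = WordNetLemmatizer()
    stop_words = set(stopwords.words('english'))
    # Split the text into tokens, drop English stop words, then lemmatize
    tokens = word_tokenize(text)
    filtered_tokens = [t for t in tokens if t.lower() not in stop_words]
    filtered_tokens = [lemmatizer.lemmatize(t) for t in filtered_tokens]
    return ' '.join(filtered_tokens)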
text_data = pd.read_csv('text_data.csv', usecols=['value', 'time'])
text_data['value'] = text_data['value'].apply(preprocess_text)
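The preprocessing step leaves the value column holding cleaned strings, while a linear regression needs a numeric response. One possible way to bridge that gap, shown here purely as an assumption about what you might want to model, is to derive a number from each cleaned string, such as its token count. The helper token_count and the sample rows below are hypothetical.

```python
def token_count(cleaned_text):
    """Count whitespace-separated tokens in an already-cleaned string."""
    return len(cleaned_text.split())

# Hypothetical rows of (time, cleaned text), shaped like preprocess_text output
rows = [(1, "model fit well"), (2, "residual look random"), (3, "")]

# Replace each text with a numeric feature that a regression can use
numeric_rows = [(t, token_count(text)) for t, text in rows]
print(numeric_rows)  # [(1, 3), (2, 3), (3, 0)]
```

Token count is only one candidate. Whether it is a meaningful quantity to regress on time depends on your data and your question.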
Updated: July 18, 2025 at 3:55 AM